Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board David Hutchison Lancaster University, UK Takeo Kanade Carnegie Mellon University, Pittsburgh, PA, USA Josef Kittler University of Surrey, Guildford, UK Jon M. Kleinberg Cornell University, Ithaca, NY, USA Alfred Kobsa University of California, Irvine, CA, USA Friedemann Mattern ETH Zurich, Switzerland John C. Mitchell Stanford University, CA, USA Moni Naor Weizmann Institute of Science, Rehovot, Israel Oscar Nierstrasz University of Bern, Switzerland C. Pandu Rangan Indian Institute of Technology, Madras, India Bernhard Steffen University of Dortmund, Germany Madhu Sudan Massachusetts Institute of Technology, MA, USA Demetri Terzopoulos University of California, Los Angeles, CA, USA Doug Tygar University of California, Berkeley, CA, USA Gerhard Weikum Max-Planck Institute of Computer Science, Saarbruecken, Germany
5615
Constantine Stephanidis (Ed.)
Universal Access in Human-Computer Interaction Intelligent and Ubiquitous Interaction Environments 5th International Conference, UAHCI 2009 Held as Part of HCI International 2009 San Diego, CA, USA, July 19-24, 2009 Proceedings, Part II
Volume Editor Constantine Stephanidis Foundation for Research and Technology - Hellas Institute of Computer Science N. Plastira 100, Vassilika Vouton 70013, Heraklion, Crete, Greece and University of Crete Department of Computer Science Crete, Greece E-mail:
[email protected]

Library of Congress Control Number: Applied for
CR Subject Classification (1998): H.5, I.3, I.2.10, I.4, I.5
LNCS Sublibrary: SL 3 – Information Systems and Application, incl. Internet/Web and HCI
ISSN 0302-9743
ISBN-10 3-642-02709-1 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-02709-3 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. springer.com © Springer-Verlag Berlin Heidelberg 2009 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper SPIN: 12704804 06/3180 543210
Foreword
The 13th International Conference on Human–Computer Interaction, HCI International 2009, was held in San Diego, California, USA, July 19–24, 2009, jointly with the Symposium on Human Interface (Japan) 2009, the 8th International Conference on Engineering Psychology and Cognitive Ergonomics, the 5th International Conference on Universal Access in Human–Computer Interaction, the Third International Conference on Virtual and Mixed Reality, the Third International Conference on Internationalization, Design and Global Development, the Third International Conference on Online Communities and Social Computing, the 5th International Conference on Augmented Cognition, the Second International Conference on Digital Human Modeling, and the First International Conference on Human Centered Design. A total of 4,348 individuals from academia, research institutes, industry and governmental agencies from 73 countries submitted contributions, and 1,397 papers that were judged to be of high scientific quality were included in the program. These papers address the latest research and development efforts and highlight the human aspects of the design and use of computing systems. The papers accepted for presentation thoroughly cover the entire field of human-computer interaction, addressing major advances in knowledge and effective use of computers in a variety of application areas. This volume, edited by Constantine Stephanidis, contains papers in the thematic area of Universal Access in Human–Computer Interaction, addressing the following major topics:

• Universal Access in the Home Environment
• Ambient Intelligence and Ambient Assisted Living
• Mobile and Ubiquitous Interaction
• Alternative Interaction Techniques and Devices
• Intelligence, Adaptation and Personalization
The remaining volumes of the HCI International 2009 proceedings are:

• Volume 1, LNCS 5610, Human–Computer Interaction––New Trends (Part I), edited by Julie A. Jacko
• Volume 2, LNCS 5611, Human–Computer Interaction––Novel Interaction Methods and Techniques (Part II), edited by Julie A. Jacko
• Volume 3, LNCS 5612, Human–Computer Interaction––Ambient, Ubiquitous and Intelligent Interaction (Part III), edited by Julie A. Jacko
• Volume 4, LNCS 5613, Human–Computer Interaction––Interacting in Various Application Domains (Part IV), edited by Julie A. Jacko
• Volume 5, LNCS 5614, Universal Access in Human–Computer Interaction––Addressing Diversity (Part I), edited by Constantine Stephanidis
• Volume 7, LNCS 5616, Universal Access in Human–Computer Interaction––Applications and Services (Part III), edited by Constantine Stephanidis
• Volume 8, LNCS 5617, Human Interface and the Management of Information––Designing Information Environments (Part I), edited by Michael J. Smith and Gavriel Salvendy
• Volume 9, LNCS 5618, Human Interface and the Management of Information––Information and Interaction (Part II), edited by Gavriel Salvendy and Michael J. Smith
• Volume 10, LNCS 5619, Human Centered Design, edited by Masaaki Kurosu
• Volume 11, LNCS 5620, Digital Human Modeling, edited by Vincent G. Duffy
• Volume 12, LNCS 5621, Online Communities and Social Computing, edited by A. Ant Ozok and Panayiotis Zaphiris
• Volume 13, LNCS 5622, Virtual and Mixed Reality, edited by Randall Shumaker
• Volume 14, LNCS 5623, Internationalization, Design and Global Development, edited by Nuray Aykin
• Volume 15, LNCS 5624, Ergonomics and Health Aspects of Work with Computers, edited by Ben-Tzion Karsh
• Volume 16, LNAI 5638, The Foundations of Augmented Cognition: Neuroergonomics and Operational Neuroscience, edited by Dylan Schmorrow, Ivy Estabrooke and Marc Grootjen
• Volume 17, LNAI 5639, Engineering Psychology and Cognitive Ergonomics, edited by Don Harris
I would like to thank the Program Chairs and the members of the Program Boards of all thematic areas, listed below, for their contribution to the highest scientific quality and the overall success of HCI International 2009.
Ergonomics and Health Aspects of Work with Computers Program Chair: Ben-Tzion Karsh Arne Aarås, Norway Pascale Carayon, USA Barbara G.F. Cohen, USA Wolfgang Friesdorf, Germany John Gosbee, USA Martin Helander, Singapore Ed Israelski, USA Waldemar Karwowski, USA Peter Kern, Germany Danuta Koradecka, Poland Kari Lindström, Finland
Holger Luczak, Germany Aura C. Matias, Philippines Kyung (Ken) Park, Korea Michelle M. Robertson, USA Michelle L. Rogers, USA Steven L. Sauter, USA Dominique L. Scapin, France Naomi Swanson, USA Peter Vink, The Netherlands John Wilson, UK Teresa Zayas-Cabán, USA
Human Interface and the Management of Information Program Chair: Michael J. Smith Gunilla Bradley, Sweden Hans-Jörg Bullinger, Germany Alan Chan, Hong Kong Klaus-Peter Fähnrich, Germany Michitaka Hirose, Japan Jhilmil Jain, USA Yasufumi Kume, Japan Mark Lehto, USA Fiona Fui-Hoon Nah, USA Shogo Nishida, Japan Robert Proctor, USA Youngho Rhee, Korea
Anxo Cereijo Roibás, UK Katsunori Shimohara, Japan Dieter Spath, Germany Tsutomu Tabe, Japan Alvaro D. Taveira, USA Kim-Phuong L. Vu, USA Tomio Watanabe, Japan Sakae Yamamoto, Japan Hidekazu Yoshikawa, Japan Li Zheng, P.R. China Bernhard Zimolong, Germany
Human–Computer Interaction Program Chair: Julie A. Jacko Sebastiano Bagnara, Italy Sherry Y. Chen, UK Marvin J. Dainoff, USA Jianming Dong, USA John Eklund, Australia Xiaowen Fang, USA Ayse Gurses, USA Vicki L. Hanson, UK Sheue-Ling Hwang, Taiwan Wonil Hwang, Korea Yong Gu Ji, Korea Steven Landry, USA
Gitte Lindgaard, Canada Chen Ling, USA Yan Liu, USA Chang S. Nam, USA Celestine A. Ntuen, USA Philippe Palanque, France P.L. Patrick Rau, P.R. China Ling Rothrock, USA Guangfeng Song, USA Steffen Staab, Germany Wan Chul Yoon, Korea Wenli Zhu, P.R. China
Engineering Psychology and Cognitive Ergonomics Program Chair: Don Harris Guy A. Boy, USA John Huddlestone, UK Kenji Itoh, Japan Hung-Sying Jing, Taiwan Ron Laughery, USA Wen-Chin Li, Taiwan James T. Luxhøj, USA
Nicolas Marmaras, Greece Sundaram Narayanan, USA Mark A. Neerincx, The Netherlands Jan M. Noyes, UK Kjell Ohlsson, Sweden Axel Schulte, Germany Sarah C. Sharples, UK
Neville A. Stanton, UK Xianghong Sun, P.R. China Andrew Thatcher, South Africa
Matthew J.W. Thomas, Australia Mark Young, UK
Universal Access in Human–Computer Interaction Program Chair: Constantine Stephanidis Julio Abascal, Spain Ray Adams, UK Elisabeth André, Germany Margherita Antona, Greece Chieko Asakawa, Japan Christian Bühler, Germany Noelle Carbonell, France Jerzy Charytonowicz, Poland Pier Luigi Emiliani, Italy Michael Fairhurst, UK Dimitris Grammenos, Greece Andreas Holzinger, Austria Arthur I. Karshmer, USA Simeon Keates, Denmark Georgios Kouroupetroglou, Greece Sri Kurniawan, USA
Patrick M. Langdon, UK Seongil Lee, Korea Zhengjie Liu, P.R. China Klaus Miesenberger, Austria Helen Petrie, UK Michael Pieper, Germany Anthony Savidis, Greece Andrew Sears, USA Christian Stary, Austria Hirotada Ueda, Japan Jean Vanderdonckt, Belgium Gregg C. Vanderheiden, USA Gerhard Weber, Germany Harald Weber, Germany Toshiki Yamaoka, Japan Panayiotis Zaphiris, UK
Virtual and Mixed Reality Program Chair: Randall Shumaker Pat Banerjee, USA Mark Billinghurst, New Zealand Charles E. Hughes, USA David Kaber, USA Hirokazu Kato, Japan Robert S. Kennedy, USA Young J. Kim, Korea Ben Lawson, USA
Gordon M. Mair, UK Miguel A. Otaduy, Switzerland David Pratt, UK Albert “Skip” Rizzo, USA Lawrence Rosenblum, USA Dieter Schmalstieg, Austria Dylan Schmorrow, USA Mark Wiederhold, USA
Internationalization, Design and Global Development Program Chair: Nuray Aykin Michael L. Best, USA Ram Bishu, USA Alan Chan, Hong Kong Andy M. Dearden, UK
Susan M. Dray, USA Vanessa Evers, The Netherlands Paul Fu, USA Emilie Gould, USA
Sung H. Han, Korea Veikko Ikonen, Finland Esin Kiris, USA Masaaki Kurosu, Japan Apala Lahiri Chavan, USA James R. Lewis, USA Ann Light, UK James J.W. Lin, USA Rungtai Lin, Taiwan Zhengjie Liu, P.R. China Aaron Marcus, USA Allen E. Milewski, USA
Elizabeth D. Mynatt, USA Oguzhan Ozcan, Turkey Girish Prabhu, India Kerstin Röse, Germany Eunice Ratna Sari, Indonesia Supriya Singh, Australia Christian Sturm, Spain Adi Tedjasaputra, Singapore Kentaro Toyama, India Alvin W. Yeo, Malaysia Chen Zhao, P.R. China Wei Zhou, P.R. China
Online Communities and Social Computing Program Chairs: A. Ant Ozok, Panayiotis Zaphiris Chadia N. Abras, USA Chee Siang Ang, UK Amy Bruckman, USA Peter Day, UK Fiorella De Cindio, Italy Michael Gurstein, Canada Tom Horan, USA Anita Komlodi, USA Piet A.M. Kommers, The Netherlands Jonathan Lazar, USA Stefanie Lindstaedt, Austria
Gabriele Meiselwitz, USA Hideyuki Nakanishi, Japan Anthony F. Norcio, USA Jennifer Preece, USA Elaine M. Raybourn, USA Douglas Schuler, USA Gilson Schwartz, Brazil Sergei Stafeev, Russia Charalambos Vrasidas, Cyprus Cheng-Yen Wang, Taiwan
Augmented Cognition Program Chair: Dylan D. Schmorrow Andy Bellenkes, USA Andrew Belyavin, UK Joseph Cohn, USA Martha E. Crosby, USA Tjerk de Greef, The Netherlands Blair Dickson, UK Traci Downs, USA Julie Drexler, USA Ivy Estabrooke, USA Cali Fidopiastis, USA Chris Forsythe, USA Wai Tat Fu, USA Henry Girolamo, USA
Marc Grootjen, The Netherlands Taro Kanno, Japan Wilhelm E. Kincses, Germany David Kobus, USA Santosh Mathan, USA Rob Matthews, Australia Dennis McBride, USA Robert McCann, USA Jeff Morrison, USA Eric Muth, USA Mark A. Neerincx, The Netherlands Denise Nicholson, USA Glenn Osga, USA
Dennis Proffitt, USA Leah Reeves, USA Mike Russo, USA Kay Stanney, USA Roy Stripling, USA Mike Swetnam, USA Rob Taylor, UK
Maria L.Thomas, USA Peter-Paul van Maanen, The Netherlands Karl van Orden, USA Roman Vilimek, Germany Glenn Wilson, USA Thorsten Zander, Germany
Digital Human Modeling Program Chair: Vincent G. Duffy Karim Abdel-Malek, USA Thomas J. Armstrong, USA Norm Badler, USA Kathryn Cormican, Ireland Afzal Godil, USA Ravindra Goonetilleke, Hong Kong Anand Gramopadhye, USA Sung H. Han, Korea Lars Hanson, Sweden Pheng Ann Heng, Hong Kong Tianzi Jiang, P.R. China
Kang Li, USA Zhizhong Li, P.R. China Timo J. Määttä, Finland Woojin Park, USA Matthew Parkinson, USA Jim Potvin, Canada Rajesh Subramanian, USA Xuguang Wang, France John F. Wiechel, USA Jingzhou (James) Yang, USA Xiu-gan Yuan, P.R. China
Human Centered Design Program Chair: Masaaki Kurosu Gerhard Fischer, USA Tom Gross, Germany Naotake Hirasawa, Japan Yasuhiro Horibe, Japan Minna Isomursu, Finland Mitsuhiko Karashima, Japan Tadashi Kobayashi, Japan
Kun-Pyo Lee, Korea Loïc Martínez-Normand, Spain Dominique L. Scapin, France Haruhiko Urokohara, Japan Gerrit C. van der Veer, The Netherlands Kazuhiko Yamazaki, Japan
In addition to the members of the Program Boards above, I also wish to thank the following volunteer external reviewers: Gavin Lew from the USA, Daniel Su from the UK, and Ilia Adami, Ioannis Basdekis, Yannis Georgalis, Panagiotis Karampelas, Iosif Klironomos, Alexandros Mourouzis, and Stavroula Ntoa from Greece. This conference would not have been possible without the continuous support and advice of the Conference Scientific Advisor, Prof. Gavriel Salvendy, as well as the dedicated work and outstanding efforts of the Communications Chair and Editor of HCI International News, Abbas Moallem.
I would also like to thank the members of the Human–Computer Interaction Laboratory of ICS-FORTH, and in particular Margherita Antona, George Paparoulis, Maria Pitsoulaki, Stavroula Ntoa, and Maria Bouhli, for their contribution toward the organization of the HCI International 2009 conference.

Constantine Stephanidis
HCI International 2011
The 14th International Conference on Human–Computer Interaction, HCI International 2011, will be held jointly with the affiliated conferences in the summer of 2011. It will cover a broad spectrum of themes related to human–computer interaction, including theoretical issues, methods, tools, processes and case studies in HCI design, as well as novel interaction techniques, interfaces and applications. The proceedings will be published by Springer. More information about the topics, as well as the venue and dates of the conference, will be announced through the HCI International Conference series website: http://www.hci-international.org/
General Chair
Professor Constantine Stephanidis
University of Crete and ICS-FORTH
Heraklion, Crete, Greece
Email: [email protected]

Table of Contents
Part I: Universal Access in the Home Environment Key Properties in the Development of Smart Spaces . . . . . . . . . . . . . . . . . . Sergey Balandin and Heikki Waris
3
Design a Multi-Touch Table and Apply to Interior Furniture Allocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chien-Hsu Chen, Ken-Hao Nien, and Fong-Gong Wu
13
Implementation of a User Interface Model for Systems Control in Buildings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Szu-Cheng Chien and Ardeshir Mahdavi
20
A Web-Based 3D System for Home Design . . . . . . . . . . . . . . . . . . . . . . . . . . Anthony Chong, Ji-Hyun Lee, and Jieun Park Attitudinal and Intentional Acceptance of Domestic Robots by Younger and Older Adults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Neta Ezer, Arthur D. Fisk, and Wendy A. Rogers Natural Language Interface for Smart Homes . . . . . . . . . . . . . . . . . . . . . . . . Mar´ıa Fern´ andez, Juan Bautista Montalv´ a, Maria Fernanda Cabrera-Umpierrez, and Mar´ıa Teresa Arredondo Development of Real-Time Face Detection Architecture for Household Robot Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dongil Han, Hyunjong Cho, Jaekwang Song, Hyeon-Joon Moon, and Seong Joon Yoo Appropriate Dynamic Lighting as a Possible Basis for a Smart Ambient Lighting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Lajos Izs´ o A New Approach for Accessible Interaction within Smart Homes through Virtual Reality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Viveca Jimenez-Mixco, Rafael de las Heras, Juan-Luis Villalar, and Mar´ıa Teresa Arredondo
29
39 49
57
67
75
A Design of Air-Condition Remote Control for Visually Impaired People . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cherng-Yee Leung, Yan-Ting Yao, and Su-Chen Chuang
82
Verb Processing in Spoken Commands for Household Security and Appliances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ioanna Malagardi and Christina Alexandris
92
Thermal Protection of Residential Buildings in the Period of Energy Crisis and Its Influence on Comfort of Living . . . . . . . . . . . . . . . . . . . . . . . . Przemyslaw Nowakowski Design for All Approach with the Aim to Support Autonomous Living for Elderly People in Ordinary Residences – An Implementation Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Claes Tj¨ ader Speech Input from Older Users in Smart Environments: Challenges and Perspectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ravichander Vipperla, Maria Wolters, Kallirroi Georgila, and Steve Renals Sympathetic Devices: Communication Technologies for Inclusion Across Housing Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Claudia Winegarden and Brian Jones
100
108
117
127
Part II: Ambient Intelligence and Ambient Assisted Living Design Framework for Ambient Assisted Living Platforms . . . . . . . . . . . . . Patricia Abril-Jim´enez, Cecilia Vera-Mu˜ noz, Maria Fernanda Cabrera-Umpierrez, Mar´ıa Teresa Arredondo, and Juan-Carlos Naranjo
139
Ambient Intelligence in Working Environments . . . . . . . . . . . . . . . . . . . . . . Christian B¨ uhler
143
Towards a Framework for the Development of Adaptive Multimodal User Interfaces for Ambient Assisted Living Environments . . . . . . . . . . . . Marco Blumendorf and Sahin Albayrak
150
Workflow Mining Application to Ambient Intelligence Behavior Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Carlos Fern´ andez, Juan-Pablo L´ azaro, and Jose Miguel Bened´ı
160
Middleware for Ambient Intelligence Environments: Reviewing Requirements and Communication Technologies . . . . . . . . . . . . . . . . . . . . . Yannis Georgalis, Dimitris Grammenos, and Constantine Stephanidis A Hybrid Approach for Recognizing ADLs and Care Activities Using Inertial Sensors and RFID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Albert Hein and Thomas Kirste Towards Universal Access to Home Monitoring for Assisted Living Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rezwan Islam, Sheikh I. Ahamed, Chowdhury S. Hasan, and Mohammad Tanviruzzaman
168
178
189
An Approach to and Evaluations of Assisted Living Systems Using Ambient Intelligence for Emergency Monitoring and Prevention . . . . . . . . Thomas Kleinberger, Andreas Jedlitschka, Holger Storf, Silke Steinbach-Nordmann, and Stephan Prueckner
199
Anamorphosis Projection by Ubiquitous Display in Intelligent Space . . . . Jeong-Eom Lee, Satoshi Miyashita, Kousuke Azuma, Joo-Ho Lee, and Gwi-Tae Park
209
AAL in the Wild – Lessons Learned . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Edith Maier and Guido Kempter
218
A Modelling Framework for Ambient Assisted Living Validation . . . . . . . Juan-Carlos Naranjo, Carlos Fern´ andez, Pilar Sala, Michael Hellenschmidt, and Franco Mercalli
228
Methods for User Experience Design of AAL Services . . . . . . . . . . . . . . . . . Pilar Sala, Juan-Pablo L´ azaro, J. Artur Serrano, Katrin M¨ uller, and Juan-Carlos Naranjo
238
Self Care System to Assess Cardiovascular Diseases at Home . . . . . . . . . . Elena Villalba, Ignacio Peinado, and Mar´ıa Teresa Arredondo
248
Ambient Intelligence and Knowledge Processing in Distributed Autonomous AAL-Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ralph Welge, Helmut Faasch, and Eckhard C. Bollow
258
Configuration and Dynamic Adaptation of AAL Environments to Personal Requirements and Medical Conditions . . . . . . . . . . . . . . . . . . . . . . Reiner Wichert
267
Part III: Mobile and Ubiquitous Interaction Designing Universally Accessible Networking Services for a Mobile Personal Assistant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ioannis Basdekis, Panagiotis Karampelas, Voula Doulgeraki, and Constantine Stephanidis Activity Recognition for Everyday Life on Mobile Phones . . . . . . . . . . . . . Gerald Bieber, J¨ org Voskamp, and Bodo Urban
279
289
Kinetic User Interface: Interaction through Motion for Pervasive Computing Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Pascal Bruegger and B´eat Hirsbrunner
297
On Efficiency of Adaptation Algorithms for Mobile Interfaces Navigation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Vlado Glavinic, Sandi Ljubic, and Mihael Kukec
307
Accessible User Interfaces in a Mobile Logistics System . . . . . . . . . . . . . . . Harald K. Jansson, Robert Bjærum, Riitta Hellman, and Sverre Morka
317
Multimodal Interaction for Mobile Learning . . . . . . . . . . . . . . . . . . . . . . . . . Irina Kondratova
327
Acceptance of Mobile Entertainment by Chinese Rural People . . . . . . . . . Jun Liu, Ying Liu, Hui Li, Dingjun Li, and Pei-Luen Patrick Rau
335
Universal Mobile Information Retrieval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . David Machado, Tiago Barbosa, Sebasti˜ ao Pais, Bruno Martins, and Ga¨el Dias
345
ActionSpaces: Device Independent Places of Thought, Memory and Evolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rudolf Melcher, Martin Hitz, and Gerhard Leitner Face Recognition Technology for Ubiquitous Computing Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kanghun Jeong, Seongrok Hong, Ilyang Joo, Jaehoon Lee, and Hyeon-Joon Moon
355
365
Location-Triggered Code Execution – Dismissing Displays and Keypads for Mobile Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Wolfgang Narzt and Heinrich Schmitzberger
374
Mobile Interaction: Automatically Adapting Audio Output to Users and Contexts on Communication and Media Control Scenarios . . . . . . . . . Tiago Reis, Lu´ıs Carri¸co, and Carlos Duarte
384
Interactive Photo Viewing on Ubiquitous Displays . . . . . . . . . . . . . . . . . . . . Han-Sol Ryu, Yeo-Jin Yoon, Seon-Min Rhee, and Soo-Mi Choi
394
Mobile Audio Navigation Interfaces for the Blind . . . . . . . . . . . . . . . . . . . . Jaime S´ anchez
402
A Mobile Communication System Designed for the Hearing-Impaired . . . Ji-Won Song and Sung-Ho Yang
412
A Study on the Icon Feedback Types of Small Touch Screen for the Elderly . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Wang-Chin Tsai and Chang-Franw Lee Ubiquitous Accessibility: Building Access Features Directly into the Network to Allow Anyone, Anywhere Access to Ubiquitous Computing Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gregg C. Vanderheiden
422
432
Using Distributed Processing to Create More Powerful, Flexible and User Matched Accessibility Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gregg C. Vanderheiden
438
Spearcon Performance and Preference for Auditory Menus on a Mobile Phone . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Bruce N. Walker and Anya Kogan
445
Design and Evaluation of Innovative Chord Input for Mobile Phones . . . Fong-Gong Wu, Chia-Wei Chang, and Chien-Hsu Chen
455
Part IV: Alternative Interaction Techniques and Devices The Potential of the BCI for Accessible and Smart e-Learning . . . . . . . . . Ray Adams, Richard Comley, and Mahbobeh Ghoreyshi Visualizing Thermal Traces to Reveal Histories of Human-Object Interactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tomohiro Amemiya Interacting with the Environment through Non-invasive Brain-Computer Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Febo Cincotti, Lucia Rita Quitadamo, Fabio Aloise, Luigi Bianchi, Fabio Babiloni, and Donatella Mattia Movement and Recovery Analysis of a Mouse-Replacement Interface for Users with Severe Disabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Caitlin Connor, Emily Yu, John Magee, Esra Cansizoglu, Samuel Epstein, and Margrit Betke Sonification System of Maps for Blind – Alternative View . . . . . . . . . . . . . Gintautas Daunys and Vidas Lauruska
467
477
483
493
503
Scanning-Based Human-Computer Interaction Using Intentional Muscle Contractions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Torsten Felzer, Rainer Nordmann, and Stephan Rinderknecht
509
Utilizing an Accelerometric Bracelet for Ubiquitous Gesture-Based Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Albert Hein, Andr´e Hoffmeyer, and Thomas Kirste
519
A Proposal of New Interface Based on Natural Phenomena and So on (2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ichiro Hirata, Toshiki Yamaoka, Akio Fujiwara, Sachie Yamamoto, Daijirou Yamaguchi, Mayuko Yoshida, and Rie Tutui Timing and Accuracy of Individuals with and without Motor Control Disabilities Completing a Touch Screen Task . . . . . . . . . . . . . . . . . . . . . . . . Curt B. Irwin and Mary E. Sesto
528
535
Gaze and Gesture Activity in Communication . . . . . . . . . . . . . . . . . . . . . . . Kristiina Jokinen
537
Augmenting Sticky Notes as an I/O Interface . . . . . . . . . . . . . . . . . . . . . . . . Pranav Mistry and Pattie Maes
547
Sonification of Spatial Information: Audio-Tactile Exploration Strategies by Normal and Blind Subjects . . . . . . . . . . . . . . . . . . . . . . . . . . . . Marta Olivetti Belardinelli, Stefano Federici, Franco Delogu, and Massimiliano Palmiero What You Feel Is What You Get: Mapping GUIs on Planar Tactile Displays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Maria Schiewe, Wiebke K¨ ohlmann, Oliver Nadig, and Gerhard Weber
557
564
Multitouch Haptic Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Michael Schmidt and Gerhard Weber
574
Free-form Sketching with Ball B-Splines . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rongqing Song, Zhongke Wu, Mingquan Zhou, and Xuefeng Ao
583
BC(eye): Combining Eye-Gaze Input with Brain-Computer Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Roman Vilimek and Thorsten O. Zander
593
Colorimetric and Photometric Compensation for Optical See-Through Displays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Christian Weiland, Anne-Kathrin Braun, and Wolfgang Heiden
603
A Proposal of New Interface Based on Natural Phenomena and so on (1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Toshiki Yamaoka, Ichiro Hirata, Akio Fujiwara, Sachie Yamamoto, Daijirou Yamaguchi, Mayuko Yoshida, and Rie Tutui
613
Part V: Intelligence, Adaptation and Personalisation Managing Intelligent Services for People with Disabilities and Elderly People . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Julio Abascal, Borja Bonail, Luis Gardeazabal, Alberto Lafuente, and Zigor Salvador A Parameter-Based Model for Generating Culturally Adaptive Nonverbal Behaviors in Embodied Conversational Agents . . . . . . . . . . . . . Afia Akhter Lipi, Yukiko Nakano, and Matthias Rehm
623
631
Intelligence on the Web and e-Inclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Laura Burzagli and Francesco Gabbanini
641
Accelerated Algorithm for Silhouette Fur Generation Based on GPU . . . Gang Yang and Xin-yuan Huang
650
An Ortho-Rectification Method for Space-Borne SAR Image with Imaging Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xufei Gao, Xinyu Chen, and Ping Guo
658
Robust Active Appearance Model Based Upon Multi-linear Analysis against Illumination Variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gyeong-Sic Jo, Hyeon-Joon Moon, and Yong-Guk Kim
667
Modeling and Simulation of Human Interaction Based on Mutual Beliefs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Taro Kanno, Atsushi Watanabe, and Kazuo Furuta
674
Development of Open Platform Based Adaptive HCI Concepts for Elderly Users . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jan-Paul. Leuteritz, Harald Widlroither, Alexandros Mourouzis, Maria Panou, Margherita Antona, and Asterios Leonidis User Individual Differences in Intelligent Interaction: Do They Matter? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jelena Naki´c and Andrina Grani´c Intelligent Interface for Elderly Games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Changhoon Park User Interface Adaptation of Web-Based Services on the Semantic Web . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Nikolaos Partarakis, Constantina Doulgeraki, Asterios Leonidis, Margherita Antona, and Constantine Stephanidis
684
694 704
711
Measuring Psychophysiological Signals in Every-Day Situations . . . . . . . . Walter Ritter
720
Why Here and Now . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Antonio Rizzo, Elisa Rubegni, and Maurizio Caporali
729
A Framework for Service Convergence via Device Cooperation . . . . . . . . . Seungchul Shin, Do-Yoon Kim, and Sung-young Yoon
738
Enhancements to Online Help: Adaptivity and Embodied Conversational Agents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . J´erˆ ome Simonin and No¨elle Carbonell
748
Adaptive User Interfaces: Benefit or Impediment for Lower-Literacy Users? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ivar Solheim
758
Adaptative User Interfaces to Promote Independent Ageing . . . . . . . . . . . Cecilia Vera-Mu˜ noz, Mercedes Fern´ andez-Rodr´ıguez, Patricia Abril-Jim´enez, Mar´ıa Fernanda Cabrera-Umpi´errez, Mar´ıa Teresa Arredondo, and Sergio Guill´en
766
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
771
Key Properties in the Development of Smart Spaces

Sergey Balandin and Heikki Waris
Nokia Research Centre, Itamerenkatu 11-13, 00180 Helsinki, Finland
[email protected], [email protected]

Abstract. This paper is targeted at improving and expanding the understanding of the Smart Spaces concept by the R&D community. Through the identification of key properties based on an analysis of evolving trends in the mobile industry, developers are provided with recommendations that improve the adoption of Smart Spaces. It is especially important to understand how Smart Spaces can change the whole services ecosystem and the role that mobile devices will play. The paper discusses some core technologies being developed in the industry that might play a dominant role in future Smart Spaces. Special attention is given to the latest trend towards a networked inter-device architecture for mobile devices and the new possibilities it opens. From there the discussion expands into general properties of Smart Spaces, and the paper summarizes functional and non-functional properties. By understanding the properties and their implications for the development and adoption of Smart Spaces, developers are better equipped to ensure that the needs of the various stakeholders are taken into account. For this purpose, the paper proposes a set of questions that can be used to estimate how well a planned Smart Space fares when compared against each of the properties.

Keywords: Smart Spaces, Future Mobile Devices, Properties, Taxonomy.
1 Introduction

Nowadays people are surrounded by dozens of devices that serve different purposes and, importantly, most of these devices already have sufficient processing power, memory and communication capabilities, plus advanced internal control and management systems. This gives us an opportunity to revise the basic principle of how services are organized and delivered to users. A similar trend can already be observed in the Internet, where services increasingly offer the user the possibility to upgrade related software packages or even replace them with corresponding distributed network services. Another similar trend can be seen in the success of global image repositories such as Picasa and the recently announced Google repository. Instead of placing all service components on the same physical device, services are implemented in a distributed manner with the involvement of multiple devices. The main research question addressed by this paper is what is the role of mobile devices in this global trend and what are the technical and especially non-technical
properties that the developers of Smart Spaces should consider. This paper is targeted at initiating a discussion on how to facilitate the broad adoption of Smart Spaces. The paper is organized as follows. The next section gives a general definition and overview of Smart Spaces. We then give an overview of the core technologies that we believe will have a key impact on Smart Spaces in the near future. The subsequent chapter discusses the main points to be considered in the development of Smart Spaces; it contains a discussion of issues that we consider critical for the success of Smart Spaces as commercial products. The paper is concluded with a summary of the main findings, which we hope will also influence future work in the field, and the list of references.
2 Definition of Smart Spaces

In the book by Diane Cook and Sajal Das the following formal definition of Smart Spaces is given: "Smart Space is able to acquire and apply knowledge about its environment and to adapt to its inhabitants in order to improve their experience in that environment" [1, 2]. This definition assumes continuous interaction of the user with the surrounding environment, aimed at continuously adapting the services to the current needs of the user. This interaction is enabled by sensing functionality that gathers information about the space and the user; adaptation functionality for reacting to the detected changes; and effecting functionality for changing the surrounding space to benefit the user. Based on the definition, the main focus of Smart Spaces is on the user. The general view of the Smart Spaces hierarchy is depicted in Figure 1.
Fig. 1. Hierarchical layers of Smart Spaces with user in the center
Obvious key concepts for any Smart Space are mobility, distribution and context awareness. These are addressed by the recent advances in wireless networking technologies as well as processing and storage capabilities, which have moved mobile and consumer electronics devices beyond their traditional areas of application and allow their use for a broader scope of services. The significant computing power and high-speed data connections of modern mobile devices allow them to become information processing and communication hubs that perform rather complex
computations locally and distribute the results. This lets multiple devices interact with each other and form ad-hoc dynamic, distributed computation platforms. Together, they form a space where, via a number of wireless technologies, the users can access a huge variety of services. Similarly, existing and future services form spaces that cater for a variety of needs ranging from browsing to interactive video conversations. These services surround the user all the time and have access to large amounts of data. Over time they can learn the users' needs and personal preferences, making it possible to build even more advanced services that proactively predict those needs and propose valuable services in the given environment before the users realize it themselves. These layers, each of which can utilize a number of technologies, form a smart environment (Smart Space). A further important aspect is that Smart Spaces improve the interaction between users and their physical environments, allowing more efficient consumption of available resources such as energy.
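The sensing, adaptation and effecting functionalities described above can be pictured as a simple control loop. The following minimal Python sketch is only an illustration of that loop under our own assumptions; the class and function names (SmartSpaceLoop, add_sensor, adapt, and so on) are hypothetical and do not belong to any framework discussed in this paper.

from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Observation:
    """A single reading gathered by the sensing functionality."""
    source: str   # e.g. "living_room.temperature"
    value: float

class SmartSpaceLoop:
    """Minimal sense -> adapt -> effect cycle; all names are hypothetical."""

    def __init__(self) -> None:
        self.sensors: List[Callable[[], Observation]] = []
        self.effectors: Dict[str, Callable[[float], None]] = {}

    def add_sensor(self, read: Callable[[], Observation]) -> None:
        self.sensors.append(read)

    def add_effector(self, name: str, act: Callable[[float], None]) -> None:
        self.effectors[name] = act

    def adapt(self, context: Dict[str, float]) -> Dict[str, float]:
        """Map the sensed context to actions.

        A trivial rule keeps room temperature near 21 degrees; a real
        Smart Space would learn the user's preferences over time.
        """
        actions: Dict[str, float] = {}
        temp = context.get("living_room.temperature")
        if temp is not None and abs(temp - 21.0) > 0.5:
            actions["heater"] = 21.0 - temp   # positive: heat up, negative: cool down
        return actions

    def step(self) -> None:
        # Sense: gather one observation from every registered sensor.
        context = {obs.source: obs.value for obs in (read() for read in self.sensors)}
        # Adapt and effect: compute actions and apply them to the space.
        for name, value in self.adapt(context).items():
            if name in self.effectors:
                self.effectors[name](value)

# Usage: one simulated sensor and one effector, single cycle of the loop.
loop = SmartSpaceLoop()
loop.add_sensor(lambda: Observation("living_room.temperature", 19.0))
loop.add_effector("heater", lambda delta: print(f"heater adjustment: {delta:+.1f} K"))
loop.step()

In a deployed Smart Space the adapt step would of course be driven by learned user preferences rather than a fixed rule, but the separation of sensing, adaptation and effecting stays the same.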
3 Overview of the Related Core Technologies

Sensors play a key role in the development of Smart Spaces as the main sources of the context describing the physical world. Multiple sensors allow the continuous observation of the characteristics of the space, which can be collected and processed by a number of devices, which in turn allows the required actions to be taken. As a result, we can automate many services that currently require overprovision of resources or human intervention. The success of the Smart Spaces concept thus depends on whether a standard solution for information representation and communication between the sensors and processing devices will be applied.

Another source of massive amounts of information is the World Wide Web, which is especially important when there is a need for interpretation of the obtained information, access to generic data and so on. In this respect the main enabler for Smart Spaces is the Semantic Web [3] and its underlying technologies, such as the Resource Description Framework (RDF) [4], which provides information representation, including structure and semantics, in a machine-readable form. The Semantic Web is an enabler for creating a true web of information and opens the door for the creation of sophisticated Smart Space services where most of the informational interactions happen in an automatic fashion. It completely changes the nature of applications from the current monolithic form to highly distributed, mobile and agent-like entities.

Devices need to act as information processing and storage units, and the resulting services need to be delivered to the consumers. We believe that the mobile device, being available to users and possessing significant internal processing power and data storage, should be a central component of personal Smart Spaces. For interaction between the mobile device and the smart objects surrounding it, the most efficient approach seems to be the expansion of intra-device connectivity solutions. Unfortunately, the mobile industry today has no optimized interface similar to ISA [5], USB [6], PCI [7] or PCI Express [8] in the PC world. This has strong historical reasons, especially the need to optimize device performance as much as possible. As a consequence, a large number of sometimes incompatible interface alternatives exist for connecting purpose-specific components, and strongly monolithic mobile device architectures include extension busses such as I2C [9] and
SPI [10], which provide a bandwidth of at most a few Mbit/s. The current situation contradicts the target of easy expansion outside of the device. Out of the listed PC world solutions, including also FireWire [11], SATA [12] and eSATA, the most important standards are PCI Express and USB. Neither fits well for the mobile industry due to being designed with different requirements in mind. However, USB is being used as an external connection interface for mobile devices and other peripherals. Especially the USB 3.0 standard might carve out a niche in the mobile industry if it is aligned with the currently developed core mobile device technologies. PCI Express was designed and optimized as a solution with backwards compatibility to PCI. Since the PCI interface is not used in the mobile industry, it is unlikely that it will become a key technology for future Smart Space devices, although the technology and industry convergence is still an open story.

Another very interesting angle concerns the technologies developed in the space industry. SpaceWire is a standard for high-speed links and networks for use onboard spacecraft. The standard is free for download from the ESA [13], and after a thorough study and modeling we have found that this technology has good potential for intra- and inter-device communications. However, a number of restrictions made it suboptimal for mobile devices. Among the most critical limiting factors, the PHY uses DS (Data-Strobe) coding, which does not scale well in terms of bandwidth of a single link; the standard has minimal support for Network layer functionality and no definition of the Transport layer; it does not have Quality of Service (QoS) support; and finally, uncertainty about its future made us drop it from the list of candidates.

The development of a new standard for the mobile industry was started by the MIPI alliance [14]. The new standard targets a PHY with 4 pins for a bi-directional 1 Gbit/s link with ultra-low power consumption. As a result, the targeted solution has a Bit Error Rate (BER) of 10^-14 (i.e. 1 error every 30 hours at a link speed of 1 Gbit/s) for chip-to-chip connections, making it impossible to ignore transmission errors as is done with PC busses. The corresponding protocol stack solution, UniPro, provides mechanisms for detecting errors and recovering from them, as well as many other capabilities such as node discovery, QoS and network management. UniPro provides many opportunities for the efficient handling of intra- and inter-device connectivity. To enable the integration of the mobile device and its surrounding equipment, the potential of a wireless extension of MIPI UniPro has been identified and is being researched. The development of this extension will support the device federation concept, making all surrounding devices in the Smart Space into logical sub-modules of the mobile device internal network. Such a low-level device interconnect will significantly speed up communication and reduce power consumption, but any potential drawbacks of the approach are still to be discovered and investigated.

A further enabler technology is the Network on Terminal Architecture (NoTA) [15], a modular service-based interconnect-centric architecture for embedded devices with a basic underlying paradigm similar to what is used for Web Services. NoTA is based on the Device Interconnect Protocol (DIP), which can be implemented on various physical interfaces ranging from MIPI high-speed serial interfaces to wireless transports, e.g. Bluetooth. The NoTA core, DIP and related system services are open for all interested parties. Several services and solutions on top of NoTA have already been proposed, a number of related publications are available, and a good general overview with further references can be found in [16].
Another very interesting technology is the Multi-device, Multipart, and Multivendor (M3) framework. M3 can be built on top of NoTA or other communication platforms. It extends the principles of rapid product innovation of Internet mash-up services to cover services in physical devices. We recommend two papers on M3: one describing an example of workload balancing for the RDF store below the semantic information brokers (SIB) [17], and one providing the high-level definition of the related Smart Spaces architecture [18].
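To give a flavour of how RDF-based information sharing between devices might look, the sketch below uses the open-source rdflib Python package. The in-memory graph merely stands in for the shared RDF store that a semantic information broker would manage, and the namespace, class and property names are invented for illustration rather than taken from M3 or NoTA.

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import XSD

# Hypothetical vocabulary for illustration only; not defined by M3 or NoTA.
SS = Namespace("http://example.org/smartspace#")

# The in-memory graph stands in for the shared RDF store behind an SIB.
store = Graph()
store.bind("ss", SS)

# One device publishes a sensor reading as RDF triples.
sensor = URIRef("http://example.org/devices/kitchen-thermometer")
store.add((sensor, SS.measures, SS.Temperature))
store.add((sensor, SS.hasValue, Literal(22.5, datatype=XSD.double)))
store.add((sensor, SS.locatedIn, SS.Kitchen))

# Another device queries the same store with SPARQL, knowing nothing about
# the publisher beyond the shared vocabulary.
results = store.query(
    """
    SELECT ?device ?value WHERE {
        ?device ss:measures ss:Temperature ;
                ss:hasValue ?value ;
                ss:locatedIn ss:Kitchen .
    }
    """,
    initNs={"ss": SS},
)
for device, value in results:
    print(device, float(value))

The point of the example is the decoupling: producer and consumer never exchange application-specific messages, they only agree on the shared vocabulary and the common store.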
4 Main Points to Consider in the Development of Smart Spaces

This chapter describes properties that we currently see as relevant. As described in the previous chapters, the technical enablers are finally available and adopted by the consumers. Therefore, the main emphasis is on non-technical properties that primarily address challenges related to usability and commercial deployment rather than specific technical problems, although they can often be addressed by technical solutions. We feel that this is beneficial for the R&D community because, despite their importance to the broad acceptance of the research results, the properties typically become relevant only later during the development process.

4.1 Technical Properties

The product properties of Smart Space systems can be split into two categories. Functional properties are dependent on the functionality that the Smart Space should offer to its users, and are outside the scope of this paper because our assumption is that Smart Spaces can be used to provide arbitrary functionality and therefore the desired composition of these properties varies case by case. The R&D community is already adequately addressing non-functional properties such as resource awareness and security, which may be difficult or expensive to incorporate into the solutions once the products have been deployed. There are further technical properties that are not related to any particular Smart Space as a single product, but rather to the efficiency of the process that creates them as a group or category of products. The following elaborates on these in more detail and presents questions that can be used to estimate how well a work-in-progress Smart Space can be productized.

Interoperability of the devices and services in the Smart Space is critical since Smart Spaces are unlikely to comply with a single particular architecture. For a specific Smart Space, it is possible to create a successful stand-alone system that fulfills its business objectives. However, in the absence of adequate interoperability mechanisms the Smart Space will not be able to achieve economies-of-scale and ecosystem benefits that come from cost efficient mass production and the ability to maximize the value add of investments through specialization and reuse. Questions:

1. Is the Smart Space composed of components that interoperate using a clear set of common interfaces (low integration effort of planned system)?
2. Is the Smart Space providing an interoperability mechanism for non-predefined components (low integration effort for extensions or enhancements)?
3. Is the Smart Space constructed from components that can be re-used in other Smart Spaces (lower risk for invested effort, support for evolution)?
4. Is the Smart Space allowing all components to be implemented using any technical solutions (low adaptation effort by developers and businesses)?
5. Is the Smart Space composition implementable using easily available and well-known development methodology and tools (efficient development effort)?

All "yes": high interoperability; All "no": low interoperability, more standard solutions should be adopted to lower the development costs.

Smart spaces are inherently versatile as unique combinations of devices and services serving some purposes in a particular context. The nature of Smart Spaces as systems deployed in a physical space also makes it more expensive to upgrade them in a managed fashion as time goes by. It is important to be able to easily extend the functionality of the Smart Space as it emerges over time. Questions:
1. Is the Smart Space providing access to Internet functionality as de-facto standards?
2. Is the Smart Space based on popular device platforms and Internet solutions?
3. Is the Smart Space supporting the addition and modification of components?
4. Is the Smart Space applicable to components with a wide performance range?
5. Is the Smart Space supporting use of functionality from a different Smart Space?
All "yes": high extensibility; All “no”: low extensibility, later enhancements should be supported, or access to complementary or additional functionality provided. The complexity of developing and operating the Smart Space determines how easily many other properties can be improved. Logical complexity increases the risk involved in starting to develop it as a product, whereas implementation complexity reduces efficiency of installation, maintenance and upgrading. Questions: 1. 2. 3. 4. 5.
1. Is the Smart Space logically coherent and simple for an average developer?
2. Is the Smart Space installable and maintainable cost efficiently by a non-expert?
3. Is the Smart Space following a logical classification, supporting marketing efforts?
4. Is the Smart Space adhering to a governance model to manage features and IPR?
5. Is the Smart Space available in verified configurations, for distribution channels?
All "yes": simple to develop and operate; All “no”: the development and operation should be made easier. 4.2 Non-technical Properties For the adoption of Smart Spaces it is crucial to go beyond the technology enabler development, demonstrators and small trials. Smart spaces must address real and everyday consumer needs in a way that generates demand for the technical solutions. In particular, their accessibility needs to be targeted to suit the intended users of the various Smart Spaces, and the Smart Space must promise enough commercial added value compared to the costs involved. We are presenting a set of further questions that can be used subjectively to estimate how a Smart Space addresses some key properties. If some property is addressed particularly weakly, the researcher or developer may want to determine whether that is intentional or whether to focus available resources to improve that. The first property to focus on is the generality or specificity of intended users, because the brief first impression must convince the intended user of the value that the Smart Space can provide, and being attractive to more potential users will increase the chances that more will become users. Questions:
1. Can the expected user be from any age group (flexibility and reception of novelty)?
2. Can the expected user be of any occupation or life situation (habits, social needs)?
3. Can the Smart Space be used with any level of attention (effort/means to interact)?
4. Can the Smart Space be used with any level of technical skill?
5. Can the Smart Space be used regardless of the level of mental or physical abilities?
All "yes": intended for a very generic user and thus a potentially large user base, adoption more determined by the rest of the properties. All “no”: requires a very specific user type, other circumstances (e.g. location) need to make it likely that such users would be available in sufficient numbers to make the deployment successful. The next challenge is to make the users aware of the existence of the Smart Space, which may be something very purpose specific in a particular physical space, composed of arbitrary physical elements that are not obvious indicators to any user that there would be a Smart Space in the area. Questions: 1. 2. 3. 4. 5.
1. Is the Smart Space associated with a concrete, visible object (position/coverage)?
2. Is the Smart Space associated with a recognizable or familiar object or person?
3. Is the Smart Space prominently labeled or indicated (sensory perception)?
4. Is the Smart Space in a physically and information-wise uncluttered area?
5. Is the Smart Space in a context occurring frequently with other similar spaces (possibility to extrapolate or intrapolate, or to memorize for re-use)?

All "yes": easy to observe; All "no": hard to observe or attract attention, existence and availability should be bootstrapped to the environment, or communicated via some other means such as advertisements or training, until a sufficient level of awareness has been established among the intended user base.

Users can be aware of the availability of the Smart Space, but they were not involved in its preparation and do not know that it offers potentially attractive services or value. There is no general means to make all users understand the value of all potential Smart Spaces a priori, but functional familiarity with their representations may be possible. Questions:

1. Is the Smart Space serving a similar purpose as an associated object (extrapolate)?
2. Is the Smart Space used in similar ways by different users (examples)?
3. Is the Smart Space performing in a similar range as the associated object?
4. Is the Smart Space starting from a common user need in the context (motivation)?
5. Is the Smart Space involved in the daily habits of its users (likelihood of learning)?

All "yes": easy to comprehend; All "no": hard to comprehend, contents and value proposition should be communicated via some other means such as instructions, or simplified so that the functionality is comprehensible in expected usage situations.

When the users start to interact with a newly encountered unique Smart Space available in a particular location, it may be their only occasion to use the system. It is important to serve the intended users by adapting to the interaction types that suit them best in the given circumstances. Questions:

1. Is the Smart Space usable with any modality (ability to serve users on their terms)?
2. Is the Smart Space usable by interacting with a concrete object (low interaction learning effort)?
3. Is the Smart Space usable as an extension of existing object functionality (low cognitive learning effort)?
4. Is the Smart Space usable with different methods leading to a function (ability to serve different user logics and approaches)?
5. Is the Smart Space usable with a similar effort regardless of the level of expertise (ability to serve users of various capabilities)?

All "yes": intuitive interaction; All "no": interaction requires meticulous effort and should be made easier through more alternatives suiting different users, or better integration with objects existing in the space or in the user's possession.

User interactions with a unique Smart Space can never fully satisfy the needs of all intended users: better adaptation to the individual user's imported configurations and preferences can compensate for the limitations. Questions:

1. Does the Smart Space provide parameters to configure most of its functionality?
2. Does the Smart Space identify the parameters unambiguously for portability?
3. Is the Smart Space linked with user accessible example configurations (ability to learn how to adapt the system)?
4. Is the Smart Space capable of exporting and importing configurations (ability to automatically apply selected configurations)?
5. Is the Smart Space capable of applying partially fitting configurations (portability of settings across similar but different systems)?

All "yes": user specific preferences can be intuitively applied; All "no": tailoring requires meticulous effort and should be made easier by adopting preference descriptions commonly used by comparable users and services.

The final condition for a successful Smart Space is commercial viability. There needs to be a balance between the investments on deployment and operating costs and the expected income for all stakeholders. Attempting to estimate these may feel useless, but may also help in adjusting the ambition levels of the development effort. Questions:

1. Is the Smart Space fully sponsored by any of multiple committed business parties?
2. Is the Smart Space making the contributions of stakeholders visible to their potential clients (support for advertisement funded business model)?
3. Is the Smart Space operation clearly profitable after potential costs?
4. Is the Smart Space composed of elements that are fully reusable in other spaces (ability to recoup investments in case of lifetime expiration or failure)?
5. Is the Smart Space managing the rights of all stakeholders investing in it (reduce business risk over the lifetime of the system)?

All "yes": low business risk; All "no": business risk is high and the ambitions of the R&D effort should be considered accordingly.

The attractiveness must also be made known to developers and users: what it can provide them, how well it does that, and how it can be used subsequently. Questions:

1. Is the Smart Space suggesting potentially useful non-requested functionality (value beyond expectations)?
2. Is the Smart Space capable of recognizing the user's interest in similar spaces (ability to speed up adoption and distribution through network effect)?
3. Is the Smart Space detecting and repeating successful usage patterns (automation)?
4. Is the Smart Space detecting and correcting unsuccessful usage patterns?
5. Is the Smart Space conveying an image of continuity backed up by credible sponsors (trust that it is worth the personal resources invested in using it)?
All "yes": further use of the deployed technologies and solutions is encouraged. All "no": further use beyond the immediate reason for which the user started to interact is discouraged, and any economies-of-scale benefits are difficult to obtain.
Finally, for Smart Spaces to become successful as a broad category of systems available in physical locations, it is important to support a healthy ecosystem of multiple actors with versatile development capabilities and business interests. Questions:
1. Can the Smart Space be deployed in multiple combinations (ability to incorporate elements from multiple vendors)?
2. Can the Smart Space be deployed at multiple levels of quality (ability to adapt and apply the space in multiple environments)?
3. Is the Smart Space exempt from regulatory or other non-user-imposed constraints (reliability of available functionality)?
4. Is the Smart Space capable of self-configuration to accommodate enhancements?
5. Does the Smart Space offer light-weight licensing for enhancements?
All "yes": the freedom to build additional or complementary business using the Smart Space is unconstrained. All "no": the Smart Space constrains external innovation, and any ecosystem benefits are difficult to obtain.
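The five-question checklists above lend themselves to a simple subjective scoring, in the spirit of the taxonomy suggested in the conclusions. The sketch below is only illustrative and is not part of the original work: the category names and the threshold for flagging a weak category are our own assumptions.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Our own labels for the property categories whose checklists are listed above. */
enum Category {
    OBSERVABILITY, COMPREHENSIBILITY, INTERACTION, PERSONALIZATION,
    COMMERCIAL_VIABILITY, ATTRACTIVENESS, ECOSYSTEM
}

/** Aggregates subjective yes/no answers into a per-category score (0..5). */
class SmartSpaceAssessment {
    private final Map<Category, Integer> yesCounts = new EnumMap<>(Category.class);

    void record(Category category, List<Boolean> answersToFiveQuestions) {
        int yes = (int) answersToFiveQuestions.stream().filter(a -> a).count();
        yesCounts.put(category, yes);
    }

    /** Categories with at most one "yes" suggest reconsidering the R&D assumptions. */
    List<Category> weakCategories() {
        return yesCounts.entrySet().stream()
                .filter(e -> e.getValue() <= 1)
                .map(Map.Entry::getKey)
                .toList();
    }
}
```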
5 Discussion and Conclusions The main purpose of this paper is to initiate a discussion on how Smart Spaces could be broadly adopted by users in their everyday lives, by paying attention to pragmatic product issues. The paper gives an overview of existing technologies that, in our opinion, will play a key role in future Smart Spaces. An important observation is that both efficient communication and service development frameworks have to be proposed and widely accepted in order to guarantee the broad success of Smart Spaces. It is clear that the Smart Spaces concept is an opportunity for the consumer electronics and services industries to get even closer to the users, proactively assist them, and as a result optimize the consumption of critical resources. It is natural for mobile devices to become the personalized access point and interface to the surrounding Smart Spaces, owing to their availability to the users and their significant processing and storage capabilities. For example, the management functionality should inform the Smart Space about the user's preferences and determine how to obtain the user's favorite service from the modules available in the given space. By having access to a large amount of personal information (e.g., calendar, email, etc.) and being carried by the user, the device can learn the individual's preferences and thus find or build up new services and offer them to the user at the most convenient time. We have noted that the R&D community is well capable of addressing functional and non-functional product properties. However, for solutions intended to be deployed commercially as Smart Space products, there are additional properties that we encourage researchers and developers to take into account at an early phase, in order to increase the probability that their results will reach the market. We have presented properties related to the efficiency of product creation; to the usability of arbitrary Smart Spaces in the physical space; and to their deployment as commercial products. Within each of these categories we have proposed a set of key properties and presented a list of five simple questions that allow developers to subjectively estimate how easy it would be to make the leap from a technical Smart
Space solution to a sustainable product desired by users and valuable to businesses. We do not expect the questions to be answered in the affirmative in all or even most of the categories for any prospective Smart Space. However, a poor score in any category should prompt the developer to reconsider the assumptions of the R&D effort. Finally, the questions can be translated into a taxonomy and used for classifying Smart Space concepts and implementations.
Design a Multi-Touch Table and Apply to Interior Furniture Allocation Chien-Hsu Chen, Ken-Hao Nien, and Fong-Gong Wu Department of Industrial Design, National Cheng-Kung University No.1, University Road, Tainan 701, Taiwan
[email protected],
[email protected],
[email protected] Abstract. This study is based on the integration of FTIR multi-touch technology with Industrial Design to produce a multi-touch table. A multi-touch system interface is also developed through this study. Furniture allocation is used as the content to provide users with practical operating experience on the multi-touch interface. The process includes FTIR-related structural testing, hardware technology and specifications, and the exterior design. The system interface, developed in FLASH, includes an image recognition system and a multi-touch application. This study not only uses the easy-to-use characteristics of multi-touch technology but also integrates PV3D to link the 3D scene with the user interface, providing a real-time 3D simulation image so that the user can view the result of the furniture allocation while operating the user interface. Observations and interviews were conducted with the users to evaluate the advantages and remaining problems of the multi-touch technology for future study and development. Keywords: Multi-Touch, Interior Design.
1 Introduction With the development and popularization of computers, massive amounts of digital information have led us into the digital age, in which computing is closely linked with our lives. Digital documents, music, maps, mail, and so on have changed the way we write, save data, and send information; digital information has not only changed our lives but also made them more convenient. The keyboard and mouse remain the most popular way to interact with digital information, no matter how simple or complex the task, but some scholars have recently proposed different views on this style of interaction. Ishii and Ullmer [1] proposed a new concept in human-machine interaction: Tangible Bits. They argued that users should be able to grasp or manipulate digital information without having to use a keyboard or mouse. For this purpose, they described three key concepts of Tangible Bits: interactive surfaces; the coupling of bits with graspable physical objects; and ambient media for background awareness. The goal is to integrate digital information into the physical environment
so that it can be used easily and through natural operation, reducing the learning required in the digital world. Enabling information to be manipulated directly and intuitively is also called a Tangible User Interface (TUI). Beyond tangible user interfaces, Mitsubishi Electric Research Laboratories (MERL) presented a study on a new type of interface [2]: a touch screen that users can operate directly. Unlike a single-touch screen, it has multi-touch and multi-user characteristics that enrich the operation interface. Before MERL, Rekimoto [3] presented related multi-touch research in the SmartSkin project, and Han [4] at NYU presented a multi-touch technology based on frustrated total internal reflection (FTIR). Although these multi-touch technologies differ, they share the same purpose: to let users directly manipulate virtual information on the screen with their fingers, and to make it possible for multiple users to work together simultaneously. Computer interface operation is thus no longer limited to the traditional mouse and keyboard, and new operating modes continue to be proposed. Both TUI and multi-touch aim at more natural interaction behavior. Much of this research applies such interactive interfaces to tabletops or large-scale displays, where the user does not need to look at a computer screen or operate a keyboard and mouse; this changes how interfaces are operated and increases their degrees of freedom. There is also recent research on combining digital information with our physical environment to provide intuitive interfaces through different combinations. For instance, Park et al. [5] presented a series of future smart household appliances in the "smart home". Without complex technology or user interfaces, these can provide awareness of information according to the user's demands and lead the user to a better living experience; in some cases, digital information is combined with the surrounding environment, for instance furniture, walls, windows, or tabletops. Sukeda et al. [6] likewise presented the concept of information-accessing furniture, embedding technical equipment in tables, walls, and mirrors to provide different information in different situations, in the hope of helping users obtain information from daily life more easily. This related research shows that information display and the interaction between humans and digital information will no longer rely only on a screen, keyboard, or mouse; the development of new interaction interfaces will help such concepts come true. Although multi-touch is still at an initial stage of development, it is becoming popular and expanding quickly because of its simple and intuitive operation. The goal of this study is therefore to integrate multi-touch technology and design from the point of view of Industrial Design. To this end, a multi-touch table was built and an interface for interior furniture allocation was designed to demonstrate its application and to provide a different kind of user experience.
2 Design and Implementation First, we had to choose a multi-touch technology and understand the related technical equipment. After that, we could integrate it with the design process.
Considering the low technical complexity, low cost, and ease of design implementation, we finally chose the FTIR technology. At the beginning of the design, we tested the related materials and equipment in order to determine the relevant sizes and technologies, including the projector position and size, the capturing area of the camera, and the electric circuit of the IR LEDs. After several tests, we recorded the relative positions of the projector, camera, and mirror, and the image area, in order to design a foundation for them (Fig. 1).
Fig. 1. The design of the foundation
Next, we extended the foundation design into the table design; several idea sketches were drawn and an outward-appearance concept was chosen (Fig. 2). We then manufactured the table in cooperation with a furniture company.
Fig. 2. The outward appearance of the table design
On the two sides (right and left) of the projection screen, the electric circuits of the IR LEDs were built. Each side has five groups of infrared LEDs, giving ten groups in total. Each group consists of five infrared LEDs in series with one 130-ohm resistor, with a distance of 1.8 cm between adjacent LEDs; the ten groups are connected in parallel. During the manufacturing process, some assembly and equipment testing was carried out for final checking (Fig. 3). In the final stage, color and pattern planning was done to complete the final work (Fig. 4).
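As a rough sanity check on the LED circuit (the supply voltage and LED operating point are not stated in the paper, so the figures below are purely assumed values): with a 12 V supply, five IR LEDs of about 1.5 V forward voltage each, and a target current of roughly 35 mA, the required series resistance would be

\[ R = \frac{V_{\text{supply}} - 5\,V_f}{I} = \frac{12\,\text{V} - 5 \times 1.5\,\text{V}}{0.035\,\text{A}} \approx 129\,\Omega, \]

which is consistent with the 130-ohm resistor used in each group.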
Fig. 3. Assembling and equipment testing
Fig. 4. Final accomplishment
3 Software System After the hardware design was completed, we created a system in FLASH, developing a multi-touch image recognition component called Blob Tracking and applying it to interior furniture allocation. Blob Tracking traces the positions of the white points produced when the user touches the screen and sends the point data to the FLASH application. We can then use these data to define different manipulation meanings: for example, a single point can drag a target, while two points can rotate or scale the target. This not only achieves the characteristics of multi-touch but also reduces
the difficulty of computer programming for industrial designers, so that they can concentrate on the interface design application.
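As a minimal illustration of this touch-point interpretation, the sketch below maps one tracked blob to a drag and two blobs to a rotate/scale gesture. The original system was written in FLASH/ActionScript; the Java version here uses invented class and method names and only shows the logic.

```java
import java.util.List;

/** A tracked touch point reported by the blob-tracking component. */
record Blob(double x, double y) {}

/** Hypothetical interface for a manipulable on-screen furniture object. */
interface Manipulable {
    void moveBy(double dx, double dy);
    void scaleBy(double factor);
    void rotateBy(double radians);
}

/** Interprets the current set of blobs as a drag or a rotate/scale gesture. */
class GestureInterpreter {
    private List<Blob> previous = List.of();

    void update(List<Blob> current, Manipulable target) {
        if (current.size() == 1 && previous.size() == 1) {
            // One finger: translate the target by the finger's movement.
            target.moveBy(current.get(0).x() - previous.get(0).x(),
                          current.get(0).y() - previous.get(0).y());
        } else if (current.size() == 2 && previous.size() == 2) {
            // Two fingers: compare the vector between the touch points now and before.
            double oldDx = previous.get(1).x() - previous.get(0).x();
            double oldDy = previous.get(1).y() - previous.get(0).y();
            double newDx = current.get(1).x() - current.get(0).x();
            double newDy = current.get(1).y() - current.get(0).y();
            double oldLen = Math.hypot(oldDx, oldDy);
            if (oldLen > 0) {
                target.scaleBy(Math.hypot(newDx, newDy) / oldLen);
                target.rotateBy(Math.atan2(newDy, newDx) - Math.atan2(oldDy, oldDx));
            }
        }
        previous = current;
    }
}
```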
4 Interior Furniture Allocation For the interface content, we designed an interior furniture allocation application. Through the characteristics of multi-touch, users can drag and rotate furniture to complete the allocation from a bird's-eye view. In a single-point GUI environment, objects usually have to be manipulated through menus, whereas multi-touch can manipulate objects directly, which is more intuitive and simple, and richer manipulations can be added through different gesture definitions. The multi-touch environment therefore reduces the complexity of manipulation and provides users with easy-to-use controls. In the past, it was also usual to complete the allocation task in a single view and only afterwards produce a 3D rendering; this process is time-consuming, and it is not easy to imagine the 3D space from only one view. We therefore designed a simultaneous 3D view that users can watch while they are allocating the furniture. 4.1 Interface This study combines FLASH with PV3D (Papervision3D), a 3D engine that can simulate 3D in a FLASH application, to design the interior furniture allocation interface and simulate the 3D rendered image. Users can simply drag or rotate furniture to place it in the desired interior position. While allocating the furniture, the user can view a real-time 3D image of the current allocation, or switch to a first-person viewpoint to inspect the result and modify the allocation (Fig. 5). The furniture in the interface is modeled on IKEA products, with the living room category chosen as the system subject. The 3D models were constructed in 3DS Max; to reduce the system load, the models were built with a low polygon count, and all textures were reworked in Photoshop to improve the quality of the 3D models.
Fig. 5. Interface (panels: user interface and 3D scene)
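To illustrate how the bird's-eye plan and the real-time 3D view can be kept in sync, the following sketch updates a 3D object's transform whenever its 2D plan counterpart is moved or rotated. It is written in Java with invented names purely for illustration; the actual system used FLASH and PV3D.

```java
/** 2D state of a furniture item on the bird's-eye plan (lengths in centimetres, angle in radians). */
class PlanItem {
    double x, y;       // position on the floor plan
    double rotation;   // rotation around the vertical axis
}

/** Hypothetical handle on the corresponding object in the 3D scene. */
interface SceneObject3D {
    void setPosition(double x, double y, double z);
    void setYaw(double radians);
}

/** Mirrors every change of the 2D plan item onto the 3D scene object. */
class PlanTo3DSync {
    void apply(PlanItem item, SceneObject3D object) {
        // The plan's x/y plane becomes the 3D scene's horizontal plane; furniture stands on the floor (height 0).
        object.setPosition(item.x, 0.0, item.y);
        object.setYaw(item.rotation);
    }
}
```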
4.2 Playing Experience We invited 20 users to operate the interface. During the operation, the users had to complete a furniture allocation without any restrictions (Fig. 6). The results of the interviews and observations are as follows. Users found multi-touch intuitive, convenient, and unconstrained, and manipulation easier and faster than with a mouse: with a mouse, objects usually have to be manipulated through right-click menus or other icons, whereas multi-touch manipulates objects directly, so users could quickly arrange furniture in their own interior space. The 3D view gave them a more realistic feeling of the space, including relative positions, relative heights, and overall realism, and the whole scene can also be seen from different camera views; most users therefore felt that the simultaneous 3D screen helped them a lot. Furthermore, some users pointed out that multi-touch allows multiple users, so they could discuss the furniture allocation with their family, or a designer and client could use it as a shared discussion platform.
Fig. 6. Playing experience
In addition, some users mentioned that although multi-touch is very intuitive and simple to operate, the gestures must be reasonable and not too complicated; if too many complicated gestures had to be remembered, the effect would be negative. Another question worth noting is accuracy: it is not easy to manipulate small objects on the interface because fingers have their own limitations, so there are some shortcomings in accuracy to be improved in the future.
5 Conclusion After this experience of integrating technology with design, we discovered that there are still small details to be improved, such as the assembly in the manufacturing process, and we came to understand the difficulties and open questions that arise from such an integration. Nevertheless, this study shows that multi-touch technology can be integrated into the traditional furniture design process and demonstrates its feasibility for future furniture design. In addition, FLASH-based multi-touch image-recognition software was developed as an SDK for multi-touch interface design in the
FLASH environment, so that designers can use it to develop different multi-touch applications. We also developed an interior furniture allocation system to demonstrate the design work of the multi-touch table. According to the results of this study, most users have a positive opinion of multi-touch and think it will become a new trend. However, most users encountered one main problem when operating the multi-touch table: insensitivity. Because of this insensitivity, users have to manipulate the table with more force, which imperceptibly increases the effort of manipulation. After this study, we will therefore look for materials and methods to improve this problem, and will start to plan a larger multi-touch screen as well as other form factors, looking forward to possible developments in the future.
References
1. Ishii, H., Ullmer, B.: Tangible bits: towards seamless interfaces between people, bits and atoms. In: Pemberton, S. (ed.) Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI 1997, pp. 234–241. ACM Press, New York (1997)
2. Shen, C.: Multi-User Interface and Interactions on Direct-Touch Horizontal Surfaces: Collaborative Tabletop Research at MERL. In: Proceedings of the First IEEE International Workshop on Horizontal Interactive Human-Computer Systems (2006)
3. Rekimoto, J.: SmartSkin: an infrastructure for freehand manipulation on interactive surfaces. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems: Changing Our World, Changing Ourselves, CHI 2002, pp. 113–120. ACM Press, New York (2002)
4. Han, J.Y.: Low-cost multi-touch sensing through frustrated total internal reflection. In: Proceedings of the 18th Annual ACM Symposium on User Interface Software and Technology, UIST 2005, pp. 115–118. ACM Press, New York (2005)
5. Park, S., Won, S., Lee, J., Kim, S.: Smart home – digitally engineered domestic life. Personal Ubiquitous Comput. 7(3-4), 189–196 (2003)
6. Sukeda, H., Horry, Y., Maruyama, Y., Hoshino, T.: Information-Accessing Furniture to Make Our Everyday Lives More Comfortable. IEEE Transactions on Consumer Electronics 52(1) (February 2006)
Implementation of a User Interface Model for Systems Control in Buildings Szu-Cheng Chien and Ardeshir Mahdavi Department of Building Physics and Building Ecology, Vienna University of Technology Karlsplatz 13 (259.3), A-1040, Vienna, Austria
[email protected],
[email protected] Abstract. Occupant control actions in a building (i.e., user interactions with environmental systems for heating, cooling, ventilation, lighting, etc.) can significantly affect both the indoor climate and the environmental performance of buildings. Nonetheless, relatively few systematic (long-term and high-resolution) efforts have been made to observe and analyze the means and patterns of such user interactions with building systems. Specifically, the necessary requirements for the design and testing of hardware and software systems for user-system interfaces have not been formulated in a rigorous and reliable manner. This paper describes the prototyping of a new generation of user interface model for building systems in sentient buildings. The outcome of these efforts, when realized as a web-based user interface, would allow the occupants to achieve desirable indoor climate conditions with higher levels of connectivity between occupants and sentient environments. Keywords: sentient buildings, user interface, environmental controls.
1 Introduction An increasing number of sophisticated devices and systems are being incorporated in so-called high-tech buildings. Particularly in large and technologically sophisticated buildings, the occupants, confronted with complex and diversified manipulation possibilities for environmental controls, are forced to deal with these devices via a wide range of distinct and uncooperative interfaces. These situations can lead to frustration for the occupants as they attempt to achieve comfortable (visual/thermal, emotional, and psychological) conditions. Occupant control actions in buildings (i.e., user interactions with environmental systems for heating, cooling, ventilation, lighting, etc.) can significantly affect both the indoor climate and the environmental performance of buildings. Nonetheless, relatively few systematic (long-term and high-resolution) efforts have been made to observe and analyze the means and patterns of such user interactions with building systems. Specifically, the necessary requirements for the design and testing of hardware and software systems for user-system interfaces have not been formulated in a rigorous and reliable manner. Thus, we focus in this paper on an effort to further articulate the implementation of an adequate user interface system that can facilitate effective communication and
interaction between occupants and environmental systems in sentient buildings [4]. An initial result of this effort is a prototypical interface design named BECO ("Built Environment Communicator"). BECO serves as a user interface model for building systems in an experimental project concerning "self-actualizing sentient buildings" [1]. We first discuss related work and the results of previous research concerning the comparative evaluation of market products (interfaces) for user-based control of building systems. Secondly, in order to better understand the genesis of the result, we describe the testbed infrastructure and system architecture. We then elaborate on the implementation of the proposed interface model in terms of implemented services, layout design, and navigation.
2 Background 2.1 Related Work As to the role of user interfaces in the context of intelligent built environments, there are a number of precedents. For example, the ubiquitous communicator – the user interface of the PAPI intelligent house in Japan – was developed as a communication device that enables the occupants to communicate with people, physical objects, and places [7]. Another example of this type of user interface is one from Samsung: Samsung's Homevita system gives occupants an overview of their home network and allows them to manage daily household tasks such as controlling lights, air conditioners, and even washing machines [8]. More recent works on the integration of user interfaces into intelligent environments include the Swiss House project at Harvard University [3] and the Interactive Space project by SONY [6]. In contrast to the above approaches, we concentrate on the exploration and translation of systematic user interface requirements and functionalities (for office environments) into prototypical designs, whereby users' control behavior is considered. These requirements are then implemented in terms of a prototypical user interface model mainly supporting user interactions with building systems for indoor climate control. 2.2 Previous Research In previous research efforts [1, 2], the requirements and functionalities of user interfaces for building systems have been explored. We compared twelve commercial user-interface products for building control systems. These products were classified as follows: A type ("physical" devices), B type (control panels), and C type (web-based interfaces). Thereby, we considered three dimensions, namely control options, information types, and hardware. The results were arranged in terms of: 1) comparison matrices of the selected products based on three dimensions, namely control options, provision of information, and hardware, and 2) product comparisons/evaluations by the authors based on seven criteria (functional coverage, environmental information feedback, intuitiveness, mobility, network, input, and output). Subsequently, we conducted an experiment in which forty participants examined and evaluated a subset of the above user interfaces for buildings' control systems, mainly in view of three evaluative categories (first impressions, user interface layout design, and ease of learning). Comparison results of the selected user interface products for intelligent
environments warrant certain conclusions regarding their features and limitations and inform efforts to develop new interface designs. Control Options and Functional Coverage – In sentient environments, one key point is how the occupants interact with the multitude of environmental control devices and how they deal with the associated information loads (technical instructions, interdependence of environmental systems and their aggregate effects on indoor conditions) in an effective and convenient manner. The result of the above-mentioned study implies that limited functional coverage and intuitiveness of use often correlate. This suggests that an overall high functional coverage may impose a large cognitive load on (new) users. Provision of Information – If it is true that more informed occupants would make better control decisions, then user interfaces for sentient buildings should provide appropriate and well-structured information to the users regarding outdoor and indoor environmental conditions as well as regarding the state of relevant control devices. Most B- and C-type products in our study provide the users with some information, such as the state of the devices. However, they do not sufficiently inform the occupants regarding indoor and outdoor environmental conditions. This implies that the occupants are expected to modulate the environment under conditions of insufficient information. Mobility and Re-configurability – The hardware dimension addresses two issues, namely: 1) mobility: user interfaces with spatially fixed locations versus mobile interfaces; and 2) re-configurability: the possibility to technologically upgrade a user interface without replacing the hardware, which may decrease the cost of rapid obsolescence of technology protocols. C-type terminals, such as PDAs and laptops connected to controllers via the Internet, facilitate mobility. In contrast, type-A and type-B products are typically wall-mounted and thus less mobile. As far as re-configurability is concerned, the user interface software may be easily upgraded in type-B and type-C products, whereas the conventional type-A products are software-wise rather difficult to upgrade. Input and Output – Certain type-B and type-C products provide the users with richer manipulation possibilities that – if transparent to the user – could support them in performing a control task. There are other products (particularly type-A), however, that are rather restricted in presenting to the users clearly and comprehensively the potentially available manipulation and control space. Nonetheless, as our results suggest, type-A products are more positively evaluated than the more modern/high-tech (type-B and C) products, especially in view of first impressions and ease of learning. Here, we see a challenge: modern (high-tech) interface products that offer high functional coverage must also pay attention to the cognitive user requirements, so that formulation and execution of control commands are not overly complicated.
3 Built Environment Communicator The observations analyzed in the previous section informed the resulting interface, named BECO ("Built Environment Communicator", see Fig. 1), which serves as a user interface model for building systems in the research project "self-actualizing
Fig. 1. A screen shot of BECO in a web browser
sentient buildings" [1]. In this section, the testbed infrastructure and system architecture are first described. The features of this user interface model are then introduced in view of implemented services, layout design, and navigation. 3.1 Testbed Infrastructure A testbed infrastructure has been set up to simulate office-based sentient environments in which a set of services are deployed and seamlessly integrated. The testbed is installed for the "self-actualizing sentient buildings" research project [1] as a 1:1 mock-up of two office rooms located in our Building Automation Laboratory at the Vienna University of Technology, Department of Building Physics and Building Ecology. This testbed infrastructure involves a system controller associated with a variety of network protocols (based on the Internet, LAN, and the LON network), devices, and services. In order to create a realistic office environment, this existing light-weight testbed is equipped with systems for heating, lighting, ventilation, shading, and de-/humidification. These devices include: 1) HVAC system; 2) radiator; 3) electrical windows; 4) electrical shading; 5) ambient lighting system (2 luminaires and 1 task spot for each room); 6) de-/humidification system (see Fig. 2). 3.2 System Architecture Our interface development is based on Silverlight 2, which is a major tool for building rich interactive user experiences that incorporate user interface and media [10]. Visual Studio 2008 (based on C#, as a .NET language) is used as the development tool for coding this Silverlight-based user interface framework, and Adobe Illustrator for layout and graphic design. Specifically, in order to make the interface more graphical and interactive, XAML (Extensible Application Markup Language) is used as a user interface markup language to create (dynamic) user interface elements and animations. Also, Microsoft SQL Server, a relational database management system produced by Microsoft, serves as the database server of this interface application. ASP.NET AJAX is used to improve performance in the browser
Fig. 2. Schematic representation of the equipped devices in a test room (Lab 1)
by making communications between the web-based interface and the database server asynchronous. In addition, a specific socket-based communication protocol is used to connect to the model-based service via a socket port. 3.3 Implemented Services All identified system services are implemented and aggregated in a web-based interface that provides a central portal through which the occupants can access all control services. Thereby, we consider four aspects, namely control options, provision of information, settings, and hardware. Control options – Three control groups considered essential for the occupants of an office [1] are implemented in order to accommodate the occupants' preferences in controlling their environment. These control groups are "Home" (control via perceptual values/parameters), "Devices" (control via devices), and "Scenes" (control via scenes). All deployed control groups have been integrated in BECO, providing a "one-for-all", consistent interface that unifies the control solutions. The realization of the above-mentioned control groups may be further customized via user-based definitions of spatial (micro-zoning) and/or temporal (schedule) extensions. An example of a spatial extension is a user-customized assignment of a control device state to a certain location (e.g., Lab 1 or Lab 2). Such a spatial extension is deployed in all three control groups, namely "Home", "Devices", and "Scenes". An example of a temporal extension is a user-defined time-based variation of (a schedule for) the position of a certain device/scene. Such a temporal extension is employed in the "Devices" and "Scenes" control groups.
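As an illustration of how a scene carrying these spatial (micro-zoning) and temporal (schedule) extensions might be represented, consider the minimal sketch below. The types and field names are invented for illustration only and do not reflect the actual BECO implementation (which is based on Silverlight/C#).

```java
import java.time.LocalTime;
import java.util.Map;
import java.util.Optional;

/** A user-defined scene: a named set of device states, optionally bound to a zone and a schedule. */
record Scene(
        String name,                       // e.g. "Presentation"
        Map<String, String> deviceStates,  // device id -> target state, e.g. "blind-1" -> "closed"
        Optional<String> zone,             // spatial extension, e.g. "Lab1"
        Optional<Schedule> schedule        // temporal extension
) {}

/** A simple daily time window during which the scene is applied. */
record Schedule(LocalTime from, LocalTime to) {
    boolean activeAt(LocalTime t) {
        return !t.isBefore(from) && t.isBefore(to);
    }
}
```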
Provision of information – Information groups implement a schematic information service for the office-based environment, which continuously updates information from the building information model. The primary information groups are general information, the information booth, and information extensions. General information, at the bottom of the layout, provides the occupants with user information, time, and date. The occupants can query context information (i.e., indoor/outdoor information) and control task information (regarding device states) via the information booth. Also, room surveillance (linked to an IP camera) and location information may be obtained separately by the occupants. Among these information groups, the information booth, room image, and location information are divided into sections and placed into panels that allow the occupants to open one or two, or close all, at a time. Settings – "Settings" include general settings and scene settings. General settings pertain, for example, to the startup page (based on "Home" and "Devices"), measurement units (metric or English system), and suggestion notification marking. Scene setting includes manipulation steps such as setting control states (regarding the control devices in the control options) and assigning a name/icon. Also, the occupants may assign a scene setting to a timeline/date setting as an optional extension. Hardware – Occupants may use mobile interfaces (e.g., a laptop and/or tablet PC) to call up this web-based interface model – BECO – and achieve the desired indoor climate via the Internet, regardless of spatial limits. Also, the interface is software-wise easy to upgrade, providing the occupants and building management with high re-configurability and flexibility potential. 3.4 Layout Design In order to achieve a clear visual hierarchy and semantic structure, this section discusses certain strategies to organize the versatile groups and objects in this interface model. Layout framework – Users typically favor an interface that is easy to use/learn and to navigate, independent of the functional coverage range (see Section 2). Keeping the user interface simple and clear makes it easier for users to adapt to it. Furthermore, changes in the appearance of the layout should clearly relate to users' intentions and operations. Thus, the first step in the design is to achieve a visually consistent and easily recognizable framework. Firstly, a closure grouping strategy is deployed to form a focal point for short-term user-system interactions (see Fig. 3). Then, related attributes are gathered together and separated from other distinct attributes. For example, most information groups are consistently placed on the right side of the layout to keep them unambiguously separate from the control groups in view of navigation memory. Center stage – The primary job of a transient-posture user interface, with its short-term usage patterns, is to accomplish an indoor climate control task. To establish a visual hierarchy and guide the occupants' focus immediately to the main control zone where the most important tasks take place, an obvious and large area is anchored in the center of this interface layout, whereas the auxiliary contents are clustered around the "center stage" [9] in small panels/pieces (see Fig. 3).
Fig. 3. (a) Interface layout; (b) Closure grouping; (c) Layout zoning in terms of attributes; (d) Visual hierarchy: center stage and auxiliary content
Use of color – In order to undertake a wide range of assigned tasks, this user interface is designed and organized into many subsections within the layout. In addition to using the above-mentioned layout framework to integrate them visually, making each subsection distinct and immediately capturing the users' attention is also an important issue. In our layout, five series of high-contrast colors are assigned together with the layout framework to identify and "echo" separate attributes in the user interface layout. 3.5 Navigation As to the navigation experience, instead of offering many "jumps" to satisfy a wide range of flexibility/functional coverage, the key issue is to provide straightforward manipulation memory that helps the occupants get around safely within a quasi "one-page" depth. The strong layout framework discussed in Section 3.4, consistently shown on each sequence page, makes learning and retaining the required manipulation sequence easy and considerably relieves the occupants' cognitive burden of handling varying page content. Moreover, certain cognitively friendly user patterns are used to support the occupants whilst offering richness in manipulation options: Card stack – A number of control options are required for this interface, whereas the occupants may need only one group at a time. Thereby, the control options are grouped into three separate "cards" [9] with titled tabs (i.e., "Home", "Devices", and "Settings") that allow the occupants to access them one at a time.
Accordion – Instead of overwhelming the occupants, each information group on the right-hand side of the layout (context, surveillance image, and location information) is embedded in an accordion-like panel and may be opened and closed separately from the others simply when needed. However, the occupants may also trigger these three groups simultaneously and keep them all in view. In this way, the occupants experience a neat layout while being offered richness in manipulation options. Target guiding – Guiding the occupants through too many jumps may distract their attention and let them get lost easily in navigation. Two patterns (control "in place" and sequence guiding) are used to guide the occupants to accomplish the control task effectively, whereby the perceived complexity of the interface is decreased. Continuous scrolling – Going through long lists of items may also impose a cognitive burden on the occupants. In order to present a long set of items effectively in the "Devices" control group and the context information panel, a pattern of continuous scrolling is used to support the occupants' rapid selection/review of the items. The occupants may click an arrow to invoke the scrolling; in response to the click, the list of items on the display is scrolled through horizontally or vertically, so the occupants can visually jump to the desired items. Terms/icons – Labels (e.g., iconic buttons, tags, and text items) are used to communicate knowledge visually/verbally and to enhance navigation. For example, in order to convey the cognitive message regarding the main control tasks to the occupants, the "Home" and "Devices" control groups are presented in terms of large, language-neutral icons. Also, by assigning short and easy-to-understand titles, certain text items (together with mapped icons) are made convenient for the occupants to use. To better portray the navigation of the interface, an illustrative scenario with manipulation steps is described, demonstrating how the occupant adjusts the indoor climate conditions. In this example scenario, a company manager is working and finds the room air too warm. Thus, she calls up "control via perceptual values" in the "Home" control group and chooses the "Temperature" option (see Fig. 4). A control box is triggered in the main control zone of the interface screen. She presses the "cool" button twice. In this way, she has control over the temperature of the room, while the model-based system [5] translates her input with its own simulation-based approach
Fig. 4. The occupant adjusts the indoor climate conditions by control via perceptual values
to trigger an appropriate control action involving the related devices. Subsequently, the system changes the state of the HVAC system, the position of the blinds, and the window of her office room. Meanwhile, the animated icon in the control box becomes cooler by two levels, as feedback on the temperature transition. Once the control task is finished, she clicks somewhere else to close the control box and the screen reverts to the default view of the "Home" control group.
4 Conclusion The present paper demonstrated the translation of systematic user interface requirements and functionalities (for office environments) into prototypical designs, whereby users' control behavior is considered. The proposed user interface model mainly supports user interactions with building systems for indoor climate control. With easily recognizable icons and well-structured navigation possibilities, a wide range of control options is provided to the occupants. The implemented interface prototype provides a testable basis for future developments in user interface technologies for sentient buildings. Acknowledgements. The research presented in this paper is supported, in part, by a grant from FWF (Fonds zur Förderung der wissenschaftlichen Forschung), project Nr. L219-N07. We also thank the Ministry of Education of Taiwan for its support of this work.
References 1. Chien, S.C., Mahdavi, A.: User Interfaces for Building Systems Control: from Requirements to Prototype. In: 7th European Conference on Product and Process Modelling, pp. 369–374. CRC Press, Sophia Antipolis (2008) 2. Chien, S.C., Mahdavi, A.: User Interfaces for Occupant Interactions with Environmental Systems in Buildings. In: 24th International conference on Passive and Low Energy Architecture, pp. 780–787. RPS Press, Singapore (2007) 3. Huang, J.: Inhabitable Interfaces: Digital Media- Transformations in Human Communication. In: Messaris, P., Humphrey, L. (eds.), pp. 275–286. Peter Lang, New York (2006) 4. Mahdavi, A.: Anatomy of a cogitative building. In: 7th European Conference on Product and Process Modelling, pp. 13–21. CRC Press, Sophia Antipolis (2008) 5. Mahdavi, A., Spasojevic, B.: Energy-efficient Lighting Systems Control via Sensing and Simulations. In: 6th European Conference on Product and Process Modelling, pp. 431– 436. Taylor & Francis, London (2006) 6. Rekimoto, J.: Organic interaction technologies: from stone to skin. Commun. ACM 51(6), 38–44 (2008) 7. Sakamura, K.: Houses of the Future- TRON House & PAPI: Insight The Smart Environments. In: Chiu, M.L. (ed.), pp. 203–222. Archidata Press, Taipei (2005) 8. Samsung Homevita, http://support-cn.samsung.com/homevita/ 9. Tidwell, J.: Designing Interfaces: Patterns for Effective Interaction Design. O’Reilly, Sebastopol (2005) 10. Wenz, C.: Essential Silverlight 2 Up-to-Date. O’Reilly, Sebastopol (2008)
A Web-Based 3D System for Home Design Anthony Chong, Ji-Hyun Lee, and Jieun Park Graduate School of Culture Technology, KAIST, Daejeon, Republic of Korea
[email protected],
[email protected],
[email protected] Abstract. Buying a home is a big investment and one of the most important decisions made in one's life. After purchasing an apartment, home owners are interested in giving their home a unique design identity. They often seek expert interior designers to assist in designing the home and bringing out its uniqueness. In the current interior design industry, designers have to meet the owners often to discuss the designs and alter the housing layout design according to the owners' preferences. This process is usually repeated many times before a finalized housing design layout is accepted by the owners. In this paper, we propose a rule-based housing design system that generates many alternative housing design layouts based on the designer's initial housing design layout. Designers are therefore able to produce alternative housing designs for the owners and also to explore alternatives, generated by the rule-based design system, that they have not encountered before. Keywords: Housing design, rule-based system, web-based system.
1 Introduction Buying a home is a big investment and one of the most important decisions made in one's life. Selecting a house or an apartment often involves many considerations for a buyer: location, cost, family size, neighbors, and even the house's feng-shui (Chinese geomancy), especially for Asian buyers, who believe that certain home layouts will affect their wealth, health, career, and family harmony. The Korean housing market, which accounts for 43.4% of the construction industry [1], faces a highly competitive environment due to various customer needs, the growth of the housing supply ratio (expected to attain 11.6%), and changes in housing policies: the Korean Ministry of Construction and Transportation has set up a plan to construct 500,000 housing units every year from 2003 to 2012, so that by 2012 five million housing units will have been constructed in Korea [2]. However, in most countries (including Korea), apartments and houses are often built with standardized plans that traditionally reflect common house types, and these apartments often come with a standard design. After purchasing such an apartment, home owners are interested in giving their home a unique design identity, and they often seek expert interior designers to assist in designing the home and bringing out its uniqueness.
In a customer-oriented environment, however, the design of housing requires intensive communication between the customer and the designer, as well as complicated processes in the design and construction phases. Designers and owners often need to meet very regularly to determine and confirm the house designs on paper. Initial design concepts are then further defined, criticized, and rejected, or revised, refined, and developed until a housing design is accepted by the clients. Changes to the design are often made on the spot for many reasons, for example the budget, the materials of the furniture, the lighting, and so on. Rough plan proposals are reviewed and revised until a finished plan that meets all requirements is presented and approved. In housing design, any decision an interior designer takes is likely to have implications that cut across multiple aspects. Removing a room's balcony, for example, may result in a bigger, more comfortable room with more space, but at the same time the room will feel warmer during summer and colder during winter, along with a noise insulation problem. The interconnectedness of such seemingly isolated issues is what makes interior design a highly complex activity. The paper is organized as follows: Section 2 is the literature review, describing housing design concept development and listing some of the commercial home design software products available on the market. Section 3 introduces the basic design rules. Section 4 describes the application of the rule-based home design system to a design scenario. Section 5 concludes the paper.
2 Literature Review 2.1 Housing Design Concept Development The final design of a housing layout accepted by clients often involves intensive discussion between the customers and the designer. This is the information acquisition process, in which time should be spent gathering as many facts as possible about the inhabitants of a home, getting an idea of who they are and what they like. It is important for a designer to conceptualize the future housing layouts his or her clients would prefer. There are four basic elements in planning a house design that a designer has to consider, understand, and gather before the graphic stage of planning any housing design drawing on paper [3]. The designer has to form (1) a Client Profile to find out the number, ages, sex, activities, and relationships of the people who will make up the household, in order to meet the special needs and interests of each individual and of the group. Normally this planning also involves the degree of privacy and interaction. The designer also has to know and understand the clients' lifestyle. For example, if the clients have a hobby of keeping fish, the designer has to consider the fish tank during the housing design stage. The designer has to know whether the clients like to invite friends to their home every weekend or prefer to spend the weekend quietly working on a hobby in the house; in this case, the living room's size has to be kept in mind during the design stage. The designer also needs to consider (2) the Functional Goals during the design of the housing layout. The lifestyle of the family determines the home functions for which it is
being planned. The designer needs to consider any special needs of the clients, who might need a home office with separate access or special facilities for an elderly or disabled person. When designing the layout of the house, the designer also needs to look at (3) the Equipment Needs: the designer needs to know what kind of equipment the clients will have in the house, for example electrical appliances (television, sound systems), during the planning of the housing design. Finally, the designer needs to look into (4) the Space Requirements, which are based on a careful study of the activities, behavior patterns, developmental needs, and desires of the clients. The clients might love to read and have a big collection of books that keeps growing slowly; the designer has to take this information into consideration when allocating space during the design process. Given the complexity of housing design, it is necessary to allow creative ideas and inspiration to develop and evolve freely. At this stage of the design process, creativity begins to synthesize all the previously gathered data with professional housing design knowledge and experience into a concept that will determine the outcome of the housing layout design. The designer uses bubble diagrams from adjacency studies and schematic drawings to generate a floor plan via a series of sketches that begin to allocate more concrete layout shapes to activity spaces. A two-dimensional floor plan drawing may seem plain and is sometimes hard to read, but it is important in determining the kind of life possible in any given space. The designer must first digest, analyze, and evaluate the gathered information in a systematic way. The design process provides a number of ways to organize and translate information into solutions. Interior zoning and adjacency studies are often used as design principles. Under the zoning principle, regardless of the size of the house, space divides itself into zones that group similar kinds of activities according to the degree of privacy or sociable interaction. The designer normally starts the design from three kinds of activities: social (living, dining, and balcony), private (bedroom), and work (kitchen) activities. Under the adjacency principle, the relationships between various spaces and activities can be outlined roughly using a bubble diagram, grouped and organized according to the zoning principles. The bubbles represent interior spaces and their importance and relationship to each other, and the connecting lines between bubbles indicate access and flow. It is easier to do this with the abstract tool of a bubble than to begin defining spaces with walls and the technical relationships between rooms, access, and flow. Errors can be seen by turning what was a thought into concrete relationships on paper. This process is critical, precedes formal space planning, and helps to clarify how spaces relate and flow before floor plans are drawn. Schematic drawings are made to help visualize concepts. They are refinements of the bubble diagrams used in the analysis, with greater detail, more accurate proportions, and measurements, suggesting how the space might look and feel. A bubble diagram with its corresponding schematic diagram is shown in Figure 1. 2.2 Home Design Software Products Designers often use 3D modeling software (for example 3D Studio Max and Maya) to assist in their job. However, in recent years, commercial home design software
Fig. 1. A bubble diagram and its 2D schematic representation. In the bubble diagram, larger bubbles represent larger spaces; overlapping bubbles are common spaces that are accessible from another space (kitchen and living room).
products can be found on the market: 1) Punch! Professional Home Design Suite [4], 2) Better Homes and Gardens Home Designer Suite [5], 3) Instant Architect [6], 4) Total 3D Home & Landscape Design Suite [7], 5) My Virtual Home [8], 6) 3D Home Architect & Landscape Design Deluxe Suite [9], 7) Instant Home Design [10], 8) Your Custom Home [11], 9) Design Workshop Classic [12], and 10) Quickie Architect [13]. All these packages are easy-to-use applications offering various degrees of features, such as 3D household object libraries, with license fees involved. These commercial home design software products are not web-based and their functionalities are limited. Therefore, we propose to modify an open-source home design application to allow the introduction of design rules that generate new housing layouts automatically. The next section describes the basic rules for developing the rule-based housing design system.
3 Design Rules In housing design, the designer often requires intensive communication with the customers to finalize the housing design layout with them. During this period of communication, the designer often draws multiple design layouts to show to the clients, which is very time-consuming for both parties. In this work, we aim to develop common housing design rules that allow the computer to help the designer generate housing layouts automatically. The four elements described in the previous section are often used by designers to collect information from their clients; the designer uses this information to manually design the first housing design layout for the clients. Taking this first housing design layout, the basic rules are introduced into the home design software to generate multiple new housing design layouts based on the initial one. The 3D coordinates of all objects in the housing design layout are defined to ensure that objects face the correct direction as determined by the initial housing design layout. The 15 basic rules are as follows (a sketch of how such rules might be checked programmatically is given after the list).
1. All objects (furniture) must be within the dimensions of the walls of their individual room. The designer is assumed to have considered the furniture style for each individual room during the initial design.
2. Toilets cannot be relocated, because of the drainage piping system of the apartment building.
3. Windows cannot be moved or resized, because window locations in apartments are fixed.
4. The living room area dimensions cannot be changed, as the designer has already considered the lifestyle of the clients.
5. All doors must have a clearance of more than 1 meter facing inwards and outwards. This ensures that doors are not blocked by other objects in the design.
6. Every enclosed area (four walls) generates a door. This prevents enclosed areas without a door.
7. Door dimensions cannot be changed.
8. All four edges of a door must be in collision with the wall and the floor. This ensures that no stand-alone doors appear in the new housing layout design. The collision detection algorithm is used.
9. At least one edge of every wall must be adjacent to another wall. This ensures that no stand-alone walls appear. The collision detection algorithm is used.
10. No furniture may penetrate any other object, enforced by the collision detection algorithm.
11. There must be a gap of no less than 2 meters between two parallel walls. This ensures a proper housing layout.
12. The x-axis of the L-shaped couch always points in the direction of the LCD TV.
13. The backside of every bookshelf must collide with a wall. The collision detection algorithm is used.
14. The headboard of every bed must collide with a wall. The collision detection algorithm is used.
15. The base (-ve X-axis) of every object always points in the direction of the floor. This prevents objects from appearing upside down.
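For illustration, several of these rules reduce to simple 2D bounding-box tests. The sketch below is not the authors' implementation: the class and method names and the use of java.awt.geom rectangles are assumptions, and a real system would operate on the object model of the home design software.

```java
import java.awt.geom.Rectangle2D;
import java.util.List;

/** Illustrative checks for rules 1, 5 and 10 on axis-aligned 2D bounding boxes. */
public class LayoutRuleChecks {

    /** Rule 1: every piece of furniture must lie within the walls of its room. */
    static boolean furnitureInsideRoom(Rectangle2D room, List<Rectangle2D> furniture) {
        return furniture.stream().allMatch(room::contains);
    }

    /** Rule 5: the clearance zone in front of and behind a door (more than 1 m deep)
        must not be intersected by any other object. */
    static boolean doorHasClearance(Rectangle2D clearanceZone, List<Rectangle2D> objects) {
        return objects.stream().noneMatch(clearanceZone::intersects);
    }

    /** Rule 10: no two pieces of furniture may penetrate each other (pairwise collision test). */
    static boolean noFurnitureCollision(List<Rectangle2D> furniture) {
        for (int i = 0; i < furniture.size(); i++) {
            for (int j = i + 1; j < furniture.size(); j++) {
                if (furniture.get(i).intersects(furniture.get(j))) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

A layout generator could repeatedly propose modified layouts and discard any candidate for which one of these predicates fails, which is essentially the role played by the collision detection algorithm referred to in the rules above.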
These are the common rules that the home design software must receive; with these rules, multiple new housing design layouts can be generated automatically by the computer. Depending on the owners' requirements and needs, the rules can be changed and updated in the home design software. In the next section, we introduce a free open source home design application (Sweet Home 3D) to demonstrate our work.
4 Implementation

4.1 Software Environment

In our work, Sweet Home 3D (http://sweethome3d.sourceforge.net/), a free open-source interior design application with a 3D preview, was used for designing and modeling an apartment. Sweet Home 3D enables designers to lay out a house easily and to make changes quickly with the home owner's concepts in mind. In this section, we use this
software together with the basic rules defined in the previous section to generate many alternative housing design layouts based on the designer's initial layout. Designers are therefore able to offer the owners alternative designs produced by the framework and to explore alternatives generated by the design framework that they had not encountered before. This rule-based housing design system offers designers a promising application for the housing domain because it saves the time and effort of drawing multiple housing layout designs to show to customers. Sweet Home 3D is written in Java and offers the ability to extend its functionality, which allows rule-based design to be added for generating multiple housing design layouts. Sweet Home 3D also has an applet version which supports web-based home design.

4.2 A Design Scenario

We describe a scenario in this section to demonstrate our rule-based housing design concept. A mother and her adult son purchase a new apartment. The mother (62) is a retired teacher with over 30 years of experience in knitting who now teaches private knitting classes in the comfort of her apartment. Her son (30) is a freelance 3D computer animator who works at home, remotely accepting computer animation and modeling jobs from his agency located in another city. The designer will design the housing layout and furnish the following rooms for the clients:

1. A knitting room for the mother
2. A dining/living room
3. One bedroom for the mother
4. One bedroom/home office for the son
5. A kitchen
6. Two toilets
- The bedroom/home office: a quiet, orderly work environment for the son, who works from home and will not have on-site clients. The home office needs a computer server (to store his data and work), a computer with a large monitor, and professional books. The son has requested a desk, a bookshelf, an ergonomic office chair; appropriate lighting; floor, wall and window treatments; and furnishings, finishes, and accessories (FF&A) for his office.
- The knitting room: a well-lit, inviting, creative environment for teaching knitting to students of all ages. The space should be big enough to accommodate up to 5 students. The mother has requested storage for knitting equipment (needles, yarn).
- The living room: serves as a combined relaxing social and informal entertaining area, with natural light from windows. Access to the home office and the knitting room will be through the living room. The client would like an informal designated eating and serving area to seat a maximum of 8 people.
- Bedrooms: standard furnished common bedrooms.
Client preferences: the clients have asked the designer to begin with the rooms as if they were white boxes. Architectural details are to be appropriately drawn to scale, and all floor, wall, window, and ceiling treatments are to be decided by the designer. Based on the owners' requirements, the designer created the initial housing layout shown in figure 2. This initial housing design layout reflects the information gathered from the owners, and all of their requirements are met: two bedrooms, two toilets, one knitting room, a kitchen and one living room with enough space to entertain 8 people. The mother's bedroom (upper left-hand corner) is next to the knitting room, separated by a single partition. The son's room is at the lower right-hand corner.
Fig. 2. The initial housing design layout created by the designer based on the information gathered from the owners
4.3 Implementation of the Design Scenario

We executed the rule-based Sweet Home 3D to generate new alternative housing designs, shown in figures 3, 4, 5 and 6, based on the initial housing design. Each figure shows a different housing layout in comparison with the designer's initial design. In figure 3, a new partition wall has been created in the son's room (lower right-hand corner) to create a mini-office environment. One of its edges is adjacent to the side wall, obeying the basic rule, and no object collides with the new wall. The partition between the mother's room (upper left-hand corner) and the knitting area has been extended to provide an enclosure for the mother's room, and an extra door has been created for it. The two toilets and the windows remain at their original locations. As shown in figure 4, the couch and bookshelf in the living room have been shifted, but not outside the dimensions of the living room. The locations of the mother's room (now at the lower left-hand corner) and the kitchen (now at the upper left-hand corner) are switched. A corridor has been created between the knitting room and the mother's room, obeying the rule of a gap of more than 2 m between parallel walls. The doors for the toilet and the knitting room have been relocated.
Fig. 3. Computed layout based on the design rules, in comparison with figure 2 (designer's design)
Fig. 4. Computed layout based on the design rules, in comparison with figure 2 (designer's design)
As shown in figure 5 below, the couch and bookshelf in the living room have been shifted. The kitchen is now at a new location (upper left-hand corner), the knitting room is now at the lower left-hand corner, and the mother's room has been shifted to the center. A new door for the mother's room has been created; it obeys the rule of a clearance of at least 1 meter. The locations of the bookshelf in the living room and of the bed itself do not allow a door on the other walls to have a 1-meter clearance. A new wall partition has been created in the son's room to ensure better privacy. The doors for the toilet and the knitting room have been shifted. Figure 6 below shows that the couch and bookshelf in the living room have been shifted and that the son's room is now at a new location (upper center). The area dimensions of the living room remain the same, obeying the dimension rule for the living room.
Fig. 5. Computed layout based on the design rules, in comparison with figure 2 (designer's design)
Fig. 6. Computed layout based on the design rules, in comparison with figure 2 (designer's design)
5 Conclusion and Future Work

We proposed a rule-based system in which Sweet Home 3D accepts the rules listed in section 3 to generate new alternative housing designs from the initial housing design. The designer has a first meeting with the owners to collect their design layout preferences, uses Sweet Home 3D to create the initial design, and then introduces the rules into the modified Sweet Home 3D software to generate multiple housing design layouts for the clients at the second meeting. This significantly reduces the time and effort needed for both parties to reach an agreement on the housing design layout. For future work, the system could be improved by using a Case-Based Design approach in which the past experience of home designers is stored and updated. This allows designers to look at each problem as a new case while the computer searches for a related solution in the database; the system then revises former solutions and adapts them to the new situation. Designers can thus re-use stored past experience to assist them in their housing design.
References
1. Kim, Y.S., Oh, Y.K., Kim, J.J.: A planning model for apartment development projects reflecting client requirements. Journal of Construction Engineering and Management, Korea Institute of Construction Engineering and Management 5(3), 88–96 (2004)
2. Kang, P.M.: Directions of Korean Housing Policy. In: 3rd ASIAN Forum Conference for the Field of Architecture and Building Construction, Tokyo, Japan, January 27-29 (2004), http://www.asian-forum.net/conference_2004/session_pdf/3-4%20R%20Korea%20G%20Kang.pdf
3. Nissen, L.A., Faulkner, R., Faulkner, S.: Inside Today's Home, 6th edn. Harcourt Brace (1994)
4. Punch! Professional Home Design Suite, http://www.punchsoftware.com/index.htm
5. Better Homes and Gardens Home Designer Suite, http://www.homedesignersoftware.com/products/
6. Instant Architect, http://www.imsidesign.com/
7. Total 3D Home & Landscape Design Suite, http://www.individualsoftware.com/products/home_garden_design/total3d_home_landscape/
8. My Virtual Home, http://mvh.com.au
9. 3D Home Architect & Landscape Design Deluxe Suite, http://www.punchsoftware.com/index.htm
10. Instant Home Design, http://www.topics-ent.com/
11. Your Custom Home, http://www.valusoft.com
12. Design Workshop Classic, http://www.artifice.com/dw_classic.html
13. Quickie Architect, http://www.upperspace.com/
Attitudinal and Intentional Acceptance of Domestic Robots by Younger and Older Adults Neta Ezer1, Arthur D. Fisk2, and Wendy A. Rogers2 1
Georgia Institute of Technology, Department of Industrial Design 247 4th St. Atlanta, GA 30332-0170, USA 2 Georgia Institute of Technology, School of Psychology 654 Cherry St. Atlanta, GA 30332-0170, USA {neta.ezer,af7,wr43}@gatech.edu
Abstract. A study was conducted to examine the expectations that younger and older individuals have about domestic robots and how these expectations relate to robot acceptance. In a questionnaire participants were asked to imagine a robot in their home and to indicate how much items representing technology, social partner, and teammate acceptance matched their robot. There were additional questions about how useful and easy to use they thought their robot would be. The dependent variables were attitudinal and intentional acceptance. The analysis of the responses of 117 older adults (aged 65-86) and 60 younger adults (aged 18-25) indicated that individuals thought of robots foremost as performance-directed machines, less so as social devices, and least as unproductive entities. The robustness of the Technology Acceptance Model to robot acceptance was supported. Technology experience accounted for the variance in robot acceptance due to age. Keywords: Domestic Robots, Older Adults, Technology Acceptance.
1 Introduction As robots are entering the domestic environment, a question to ask is: Will people be accepting of robots in their homes? This is an important and interesting question because robots have the potential to assist their human owners in many ways, but at the same time may be perceived as altering the social environment of the home. Robots would be considered disruptive technologies, as they would not simply be new versions of existing technologies. Disruptive technologies are often not accepted as readily as incremental innovations [9], [10]. This question about robot acceptance is particularly relevant to older adults. Robots are currently being designed to help older adults live in their homes longer, by helping them to perform activities such as medication management and to provide emergency monitoring [4]. There is a need to understand, first, what older adults’ perceptions are of a robot in their home and second, what variables can predict whether older adults would be accepting of such a robot.
1.1 Robot Acceptance In the Technology Acceptance Model (TAM), acceptance is defined as a combination of attitudes, intentions, and behaviors towards a technology [6]. In the model, perceived usefulness and perceived ease of use of a technology are incorporated into consumers’ attitudes about the technology. These attitudes predict intentions to buy or use the technology and actual behaviors involving acquiring and using the technology [7]. The relationship between perceived usefulness and perceived ease of use and technology acceptance has been demonstrated for numerous information technologies [11], [17]. The acceptance of a robot for the home, however, may involve alternative predictors. Other expectations about a robot, for example its social abilities, may be more predictive of attitudinal and intentional acceptance of that robot. Robots may also carry out tasks in which they behave as teammates with their human owners [5]. Thus, variables that are generally predictive of acceptance of humans as social partners (e.g., friendliness) and as teammates (e.g., motivation) may be more predictive of acceptance of robots than those described in the TAM. 1.2 Older Adults and Robots Several research projects are currently underway to design robots for the older adult population [1]. In the future, robots may help older individuals learn new skills, manage finances, and remember to take their medication, among other things. A robot may be especially effective for these types of activities because it can be a socially engaging and intelligently dynamic device [3], [12], [16]. Although there are many potential benefits of assistive robots in the home for older adults, older individuals might not be as accepting as younger adults of such a device in the home. Older adults may be especially concerned about how difficult a new device will be to learn [8]. On the other hand, older adults appear willing to accept technology if it allows them to live independently in their home [15]. Consequently, if older adults perceive a robot in their home as helpful rather than intrusive, they may be just as accepting of it as younger adults. Despite the growing interest in developing robots for older adults, few studies have investigated this age group’s acceptance of robots. The studies that have been conducted have generally measured responses of older adults to specific robots with limited functionality [2], [13], [14]. For example, older adults expressed excitement about a nurse-robot that helped them navigate through a building [13]. These studies provide evidence that older adults may accept certain robots in certain situations. They do not, however, reveal more general attitudes and perceptions older individuals have about robots, which could be used to predict acceptance for a wider variety of robot types in the context of the home. There is a need to understand the relationship between older adults’ expectations of domestic robots and their acceptance of them. 1.3 Overview of Study An exploratory survey study was used to understand younger and older adults’ prototypical characteristics of domestic robots and the relationship between these characteristics and robot acceptance. Acceptance was limited to attitudinal and intentional acceptance because most robots designed for domestic use are still in the research and
early development phase. It was predicted that perceived robot characteristics related to social partner and teammate acceptance would add significant predictive power to acceptance over that explained by perceived usefulness, and perceived ease of use alone. Additionally, there were two possible predicted patterns of age-related differences in robot acceptance. If older adults thought of robots as beneficial to them, they were predicted to be as accepting as younger adults of a robot in their home; if they did not see the benefit, they were expected to be less accepting than younger adults of a robot in their home.
2 Method 2.1 Sample Questionnaires were sent to 2500 younger adults (18-28 yrs) and 2500 older adults (65-86 yrs) in the Atlanta Metropolitan area and surrounding counties using an age-targeted list with a 65% hit rate. Forty-three packets were returned as undeliverable. Of the total questionnaire packets sent, 177 included completed questionnaires from individuals in the targeted age groups (110 packets contained only sweepstakes entry forms and 23 respondents were not of the correct age). The effective response rate was 5.6%. The response sample was composed of 60 younger adults (M = 22.7 yrs, SD = 3.2) and 117 older adults (M = 72.2 yrs, SD = 5.7). The younger and older adult samples were 21.7% and 53% male, respectively. Participants indicated living independently either in a house, apartment, or condominium. There were no older adults who indicated living in a nursing home or assisted living facility. 2.2 Questionnaire A separate page was included with the questionnaire instructing participants to imagine that someone gave them a robot for their home and to draw and describe this robot. This page was to be filled out before participants began the questionnaire. The questionnaire contained four sections: 1) Views about Robots, 2) Robot Tasks, 3) Technology/Robot Experience, and 4) Demographics. The Robot Tasks section of the questionnaire will not be discussed in this paper. Views about Robots. The first part of the section contained 48 Likert-type items of possible robot characteristics. The items were developed through an extensive literature review of variables predictive of technology/machine, social partner, and teammate acceptance. The instructions were for participants to indicate how much each item matched the characteristics of the robot they had imagined in their home from 1 = “not at all” to 5 = “to a great extent”. The second part of the section included four statements about perceived usefulness (performance, productivity, effectiveness and usefulness) and four statements about perceived ease of use (easy to learn to use, easy to become skilled at, easy to get technology to do what user wants, and overall ease of use). The instructions were for participants to indicate how much they agreed with each of the eight statements about
the robot they imagined in their home. A Likert scale was used from 1 = “strongly disagree” to 5 = “strongly agree”. The last part of the section contained items about the attitudinal and intentional acceptance of the robot that participants had imagined in their home. There were three 5-point scales for attitudes (Bad-Good, Unfavorable-Favorable, and Negative-Positive) and three 5-point scales for intentions (No Intention-Strong Intention, Unlikely-Likely, and Not Buy It-Buy It). Participants were instructed to circle the number on each scale representing their attitudes about the robot and their intentions to buy the robot if it were available for purchase. Technology and Robot Experience. The technology and robot experience parts of the questionnaire consisted of 20 technology items and six robot items, respectively. Participants were asked to indicate on a Likert-type scale how often they had used each technology in the past year from 1 = “not at all” to 5 = “to a great extent (several times a week)”. The robot items were categories of existent robots: manufacturing, lawn mowing, mopping, vacuum cleaning, guarding, and entertaining. Participants were asked to indicate how much experience they had with each on a Likert-type scale from 1 = “no experience with this robot” to 5 = “I have and use this robot”. 2.3 Procedure The questionnaires and supporting materials were mailed to residents in the Atlanta area. Recipients were given four weeks to complete and return the questionnaire. A reminder postcard was mailed two weeks after the initial mailing. Recipients could mail back a sweepstakes entry form to win one of fifty $50 checks. 3 Results 3.1 Technology and Robot Experience Participants were each given a technology experience score from the mean of their responses to the frequency of using 18 technologies in the past year. Home medical device and non-digital camera were excluded due to a lack of significant correlations with the other items. A score of 1.0 on the technology experience scale would indicate no experience and a score of 5.0 would indicate daily experience with the items that were presented. An ANOVA with age (younger, older) as the grouping variable showed younger adults (M = 4.05, SD = .44) as having significantly more experience with technology than older adults (M = 3.38, SD = .66), F(1, 175) = 48.9, p
3 Results 3.1 Technology and Robot Experience Participants were each given a technology experience score from the mean of their responses to the frequency of using 18 technologies in the past year. Home medical device and non-digital camera were excluded due to a lack of significant correlations with the other items. A score of 1.0 on the technology experience scale would indicate no experience and a score of 5.0 would indicate daily experience with the items that were presented. An ANOVA with age (younger, older) as the grouping variable showed younger adults (M = 4.05, SD = .44) as having significantly more experience with tech2 nology than older adults (M = 3.38, SD = .66), F(1, 175) = 48.9, p 65) addressed the assessment from health professionals as CUORE’s main strength. Moreover, most patients stated that being remotely monitored increased their feeling of security and comfort. Younger patients, on the other hand, valuated the system as a tool for self-managing their condition, enhancing their motivation through self-assessment of vital signs such as weight and blood pressure. CUORE’s solution for medication management was highly appreciated by both groups, as most patients (7 out of 10) considered medication management as one of their main problems. Nevertheless, while most elderly patients considered that the medication management area would be useful for their informal caregivers and professionals’ helpers, younger patients valuated the use of the system for automatic reminders and education on the medicines. Education on symptoms and medication was highly valuated by both groups, while younger patients had reservations about education on lifestyle, as they considered it may be intrusive and annoying. Education should be displayed by prompting messages and should not be compulsory. Prompted advices should give access to additional material in standard formats such as paper or video. The general impression of the system was positive. The only hesitations came from elderly patients who described themselves as reluctant to technology. Nevertheless, most of the patients who had recognized to be scared of new technologies later recognized that, after the introductory explanations, it had been easy for them to interact with the system. The MSV-404 was considered as scary and difficult to use by most of the patients. Most patients already have measuring devices at home, such as weight scales, blood pressure cuffs, and considered it preferable to continue using devices they already know. Moreover, some of them stated the need to add new sensors to the system – i.e. glucometers – for patients with co-morbidities. Four patients took the system home and used it for one day or two, depending of the availability of the VSM. After this time, the interviewers went to the patient’s home, where the patient was asked to fill the same questionnaire they filed during the first encounter. The next figures show these 4 patients’ impressions before and after using the system in a real environment. The next figures show the patients’ impressions about the PDA before and after taking the system home. The results show that three of the patients did not significantly changed their minds after using the system in a real environment. The only patient that changed his mind had connection problems between the PDA and then at the end of the first day, which made his impressions much worse after taking the system home. 
It is worth noting that health care systems have to be very carefully designed and implemented in order to guarantee the patient’s adherence and confidence. Interviews with professionals aim to explore the business opportunities, to identify all actors involved in the health care process of chronic patients and to identify the barriers and challenges that arise when designing a holistic approach to treatments and health care. The selected professionals include cardiologists, electrophysiologists, general practitioners, nurses, pharmacists and social workers. The format of the interviews is similar to the format that was used with patients. During the storyboard phase, all professionals were shown the whole system, including the patient side. The tangible phase was different for each professional, depending on their profile and
expected role. Thus, hospital professionals were asked to interact with the web portal, while pharmacists and social workers were asked to interact with the PDA and the web portal. After that, professionals were asked some open questions and were prompted to fill in a scoring sheet questionnaire in order to gather quantitative data on their impressions of and insights about the system. The scoring sheet questionnaire comprises 10 questions, listed below:
1. This concept is a good solution for this health condition.
2. This concept will improve the quality of health.
3. This concept will reduce the effectiveness of care.
4. This concept will damage my relationship with the patient/client.
5. This concept will improve communication in the professional team.
6. This concept will increase my workload.
7. This concept will complicate my way of working.
8. This concept will provide me with reliable information.
9. I think health professionals would not easily adopt this concept.
10. I will recommend/prescribe this concept.
The results showed that the response towards the system was mostly positive. Most professionals stated the importance of having quick access to all the information about the evolution of the patients' vital signs between visits and about their treatment. The web portal was highly appreciated, but most hospital professionals stated the need to have all information regarding the patient's treatment and health record displayed on a single screen. The medication section was also highly appreciated. Nevertheless, professionals of all profiles stated their concern regarding the difficulty of introducing a system like this into the current health care system. 9 out of 10 professionals stated that they would recommend or prescribe this system to their patients or clients, as they considered the system useful for enhancing the patient's motivation and adherence to the medication regime. Nevertheless, cardiologists had doubts about the reliability of the gathered data.
6 Conclusions

The final results were very promising. Patients and professionals who took part in the process showed a high interest in the system, and we learned how to continue our research towards finding the best way to implement solutions for personal health systems. A detailed analysis of how to enhance the individual experience of incorporating this system into the daily routine on a long-term basis requires further study. A study is underway on the behavioral components of e-health, aiming to create a communication framework and to increase the patient's interest in such systems. A future framework considers the analysis of different variables to assure motivation in long-term use. Likewise, we must evaluate the long-term impact on the quality of life of heart patients and on their health status. In the coming years we will face a social change in which these systems may play an important role in chronic patients' daily lives, supporting a quality of life that helps prevent and treat chronic diseases. Besides, from the economic point of view, we also need new ways of coping with the growing number of people demanding care.
Acknowledgments. This work has succeeded thanks to the close collaboration with Hospital San Carlos of Madrid, Spain and the ITACA Institute (Valencia, Spain). CUORE is an integrated system of the NUADU project (ITEA 05003).
Ambient Intelligence and Knowledge Processing in Distributed Autonomous AAL-Components Ralph Welge, Helmut Faasch, and Eckhard C. Bollow Institut für verteilte autonome Systeme und Technologien (VauST), Leuphana University of Lueneburg, Volgershall 1, 21339 Lueneburg, Germany {faasch,welge,bollow}@leuphana.de
Abstract. With the development of computers with respect to integration, size and performance, we observe a rapid spread of computational intelligence into all areas of our daily life. We show how to build ad-hoc networks with our middleware to generate emergent intelligence in the behavior of the complete network. Our approach demonstrates the application of AAL components (components for ambient assisted living (AAL)). Questions of sustainable development also arise here: increasing consumption of resources and energy in the production phase and shortened periods of use. Ambient computing and ambient intelligence show a high potential to change society's treatment of resources and energy, and the interaction with "intelligent" things will change our conception of production and consumption. Keywords: Autonomous Systems, AAL (Ambient Assisted Living), Knowledge Representation, Services for Human-Computer-Interfaces, Ad-hoc Network, Semantic Method Invocation.
1 Introduction

Ambient Intelligence networks are built using different types of so-called Smart Nodes, which are implemented in different categories. First, there are the Mobile Smart Nodes (MSN), a kind of PDA (Personal Digital Assistant) carried by human beings. In addition, there are Thin Embedded Smart Nodes (TESN) and Full Embedded Smart Nodes (FESN). The Smart Nodes act as intelligent clients which all communicate with each other in ad-hoc networks and offer different types of services to one another. While operating independently and in concert at the same time, the network exhibits emergent effects: together, the Smart Nodes provide intelligent behavior, and human beings are assisted and supported by the network. Application fields are found in AAL (ambient assisted living), in energy management and in many other areas [1], [2], [3]. Embedding Smart Nodes into everyday objects leads to an Internet of Things [4]. The support of human activity takes place without overruling the human will.
2 Prerequisites and Methods

In the following, we present the prerequisites and methods for establishing the intelligent ambient network.
2.1 Ambient Intelligence Platform

To implement an Ambient Intelligence system, we need appropriate hardware and software platforms to establish a modular base infrastructure for a highly adaptable network. Our approach provides a transparent interconnection between users and ubiquitous knowledge [5]. Based on ambient networks consisting of so-called Smart Nodes, hybrid objects as well as remote services are offered through classical Internet services and channeled into the ambient network. Figure 1 gives an overview of the communication layers. The standard used for communication between Smart Nodes is IEEE 802.15.4. This standard defines a Personal Area Network (PAN) which has been developed with a special focus on low energy consumption. While IEEE 802.15.4 provides the physical and data link layers, we use the standard IP protocol as the network layer. This allows easy integration of Ambient Intelligence networks into existing IT networks. In addition, it enables the ambient network to incorporate services from the IT backend which may not be provided completely by Smart Nodes alone. Moreover, it allows a classic IT infrastructure to participate in the ambient network. The Smart Nodes form a special network – a MANET1 – using the Ad hoc On-Demand Distance Vector (AODV, RFC 3561) routing protocol [6]. A proprietary convergence layer is responsible for the interface between the IEEE 802.15.4 data link and the IP network layer; this includes address mapping and mechanisms for joining and leaving the network. The upper levels of communication are implemented using XML-RPC2, which is used for calling services and for the exchange of data. At the top level is the Semantic Decision Layer, which uses OWL3 to represent human preferences, abilities of individual nodes, sensor attribute-value pairs, etc. All of these are used to draw conclusions while trying to match the preferences of humans and the ambient intelligence services [7].
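To make the service-call layer concrete, the following sketch shows how a client could invoke a service on a Smart Node over XML-RPC using the Apache XML-RPC client library. The library choice, the endpoint URL and the method name light.setState are assumptions made for illustration; the paper does not specify which XML-RPC implementation the Smart Nodes use.

```java
import java.net.URL;
import org.apache.xmlrpc.client.XmlRpcClient;
import org.apache.xmlrpc.client.XmlRpcClientConfigImpl;

public class SmartNodeCall {
    public static void main(String[] args) throws Exception {
        // Configure the client with the (hypothetical) HTTP endpoint of a Smart Node.
        XmlRpcClientConfigImpl config = new XmlRpcClientConfigImpl();
        config.setServerURL(new URL("http://192.168.15.10:8080/xmlrpc"));

        XmlRpcClient client = new XmlRpcClient();
        client.setConfig(config);

        // Call a service exported by the node, e.g. switching a lamp on.
        Object result = client.execute("light.setState", new Object[]{Boolean.TRUE});
        System.out.println("Node replied: " + result);
    }
}
```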
Fig. 1. Network Stack
1 Mobile Ad-hoc Network. 2 Extensible Markup Language – Remote Procedure Call. 3 Web Ontology Language.
2.2 Network Structure and Routing

The IP overlay network is implemented in a way that reflects the topology of the underlying IEEE 802.15.4 network. Due to the limited range of a single node, a routing protocol has to be applied to transfer data over multiple hops from source to destination. Due to the dynamic nature of the Ambient Intelligence network (most of the nodes may enter or leave the network at any time and some nodes are mobile), static routes are not feasible. We use the Ad hoc On-Demand Distance Vector (AODV, RFC 3561) protocol for routing. AODV is a reactive distance vector routing protocol developed for mobile, dynamic ad hoc networks. Routes are discovered on demand by broadcasting RouteQueries through the network using the expanding ring search algorithm. The destination node, or an intermediate node which currently knows a route to the destination, replies with a RouteReply describing the discovered route. Like other distance vector protocols, a network node basically stores only information about the destination, the number of hops to the destination and the next hop. No complete route information is stored, which conserves memory in contrast to link-state protocols. In an Ambient Intelligence network it is not necessary for all nodes to provide routing capabilities. In our design only some of the nodes, the Full Embedded Smart Nodes (FESN), provide routing capabilities. FESNs are stationary nodes forming an environmental network. In the deployment phase the network can be planned in such a way that, irrespective of the current mobile users' positions, every part of the Ambient Intelligence network is always reachable. The FESNs form the backbone of the Ambient Intelligence network and communicate using AODV. Another kind of node, embedded into sensors and actuators, the so-called Thin Embedded Smart Nodes (TESN), have no routing capabilities. Instead they associate with FESNs using IEEE 802.15.4 and communicate only through their FESN. The same is true for Mobile Smart Nodes (MSN). This allows the TESNs to be technically simpler and thus less expensive than FESNs. From an IEEE 802.15.4 point of view, the FESNs are the PAN coordinators for the TESNs and MSNs.
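As an illustration of the per-destination state such a distance-vector node keeps, the sketch below models a minimal routing table with a "keep the shorter route" update on incoming RouteReplies. It is illustrative only: real AODV (RFC 3561) additionally tracks destination sequence numbers, route lifetimes and precursor lists, and the class names here are hypothetical.

```java
import java.net.InetAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Minimal distance-vector routing table: destination -> (next hop, hop count). */
public class RoutingTable {

    public record Route(InetAddress nextHop, int hopCount) {}

    private final Map<InetAddress, Route> routes = new ConcurrentHashMap<>();

    /** Called when a RouteReply arrives: keep the entry only if it is new or shorter. */
    public void onRouteReply(InetAddress destination, InetAddress nextHop, int hopCount) {
        routes.merge(destination, new Route(nextHop, hopCount),
                (current, candidate) -> candidate.hopCount() < current.hopCount() ? candidate : current);
    }

    /** Next hop towards a destination, or null if a RouteQuery must be broadcast first. */
    public Route lookup(InetAddress destination) {
        return routes.get(destination);
    }
}
```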
Fig. 2. Living Lab
A fourth kind of Smart Node is named Convergence Node. This is a special kind of FESN which, in addition to AODV capabilities, has a second interface to connect to classic IT networks, i.e. it serves as a bridge between IP based Ethernet networks and the Ambient Intelligence network (see figure 2). 2.3 OWL-Discovery–Middleware Each room has a coordinator, the FESN (Full Embedded Smart Node), which collects the self-descriptions of the energy loads (lamps, heaters, etc.) sent and stored by TESNs connected to the energy consumers. The self-descriptions are modeled in OWL and are transferred via HTTP. The FESN receives and keeps these descriptions.
Fig. 3. Network Structure
Additionally, the FESN holds abstract room information, for example the room's location and volume, as well as a context description identifying the room and the FESN's IP address. Figure 3 illustrates one possible scenario. Figure 4 shows a typical sequence of actions: when a person enters the building, the person's MSN (Mobile Smart Node) connects to a reachable FESN (1). After the MSN has received an IP address from the FESN, the MSN transfers a "Context Request" to the FESN. The context specifies the current interest of the person, e.g. the person's office that is to be controlled. The FESN answers with a "Context Response" containing the IP address of the FESN associated with the context (2). These requests and responses are encoded in OWL and are transmitted via UDP. Next, the MSN sends its complete preference profile, modeled in OWL, to the FESN associated with the desired context (3). This FESN now infers the user's preferences using the accumulated self-descriptions of all TESNs located in the room (4). After completion, the FESN controls electrical consumers (e.g. light and heater) by sending an XML-RPC call via HTTP to the TESNs according to the result of the inference process (5, 6).
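A rough sketch of the MSN side of steps (1)–(3) is given below, using plain Java UDP sockets for the context request/response and leaving the OWL payloads as opaque strings. The port number, payload handling and helper names are assumptions for illustration, not the project's actual wire format.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class MobileSmartNode {

    private static final int DISCOVERY_PORT = 4711;   // hypothetical port

    /** Steps (2)-(3): ask a reachable FESN for the context FESN, then send the preference profile. */
    public static void requestContext(InetAddress reachableFesn,
                                      String contextRequestOwl,
                                      String preferenceProfileOwl) throws Exception {
        try (DatagramSocket socket = new DatagramSocket()) {
            // Send the OWL-encoded "Context Request" via UDP.
            byte[] request = contextRequestOwl.getBytes(StandardCharsets.UTF_8);
            socket.send(new DatagramPacket(request, request.length, reachableFesn, DISCOVERY_PORT));

            // Receive the "Context Response" naming the FESN responsible for the desired context.
            byte[] buffer = new byte[8192];
            DatagramPacket response = new DatagramPacket(buffer, buffer.length);
            socket.receive(response);
            String contextResponseOwl =
                    new String(response.getData(), 0, response.getLength(), StandardCharsets.UTF_8);
            InetAddress contextFesn = InetAddress.getByName(parseFesnAddress(contextResponseOwl));

            // Transfer the complete OWL preference profile to that FESN (step 3).
            byte[] profile = preferenceProfileOwl.getBytes(StandardCharsets.UTF_8);
            socket.send(new DatagramPacket(profile, profile.length, contextFesn, DISCOVERY_PORT));
        }
    }

    /** Placeholder: a real implementation would parse the OWL response with an RDF/OWL library. */
    private static String parseFesnAddress(String contextResponseOwl) {
        return "192.168.15.20";   // hypothetical address
    }
}
```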
Fig. 4. OWL-Discovery Sequence
2.4 OWL Modelling

Modelling the Ambient Intelligence system using OWL and preliminary tests with a prototype implementation have shown that expressing simple properties as well as semantic preferences in a formal language like OWL offers clear advantages over plain classical building automation systems when dealing with human-centric systems. While a traditional building automation system can control some lights and manage the air-conditioning for an entire multi-storey building, its possibilities are quickly exhausted when dealing with individual humans in the sense of an Ambient Intelligence system. The basic difference is that traditional building automation systems may include control loops with feedback values from sensors, but the individual human is not part of this loop; such systems do not factor actual human characteristics and actual human behaviour into their designs. In our approach the users carry their own preferences within the MSNs. These preferences are more complex than "actor x = off, actor y = on", which would be sufficient within the domain of a building automation system. Instead, preferences are like "I want a temperature of 20°C when working". To respond appropriately, the system must have an "idea" about the context "working": Where does the user work? When does the user work? And finally: how do I get the temperature to 20°C? As a final consequence the system maps the user's desire to simple instructions a classical automation system might carry out, by specifying a set point for a specific controller. Formalizing properties, facts, and rules which enable the system to come to a decision resulting in appropriate actions is possible with OWL. The expressiveness of OWL surpasses the simple example above, but things quickly get more complicated. In an AAEM system there might be energy constraints: What to do if energy is scarce
and the user wants a high temperature or some other energy-intensive service? The system might override the preferred temperature and deliver a few degrees less. For other services, cutting the service level might not be possible; for example, it is not possible to supply "50% of a laser printer". This means that detailed information about energy sinks and their utility for the user is needed – and therefore must be formalized. A final example introduces a scenario where a learning capability of the system is required: if a user leaves his current "work context" heated to 20°C, the system has to decide whether to keep the temperature at 20°C or to turn the heating down for the sake of energy savings. The system might learn that, if the user leaves the room and enters the canteen within a certain timeframe, it is not reasonable to turn off the heating; if, on the other hand, he leaves his current context on Wednesdays to reappear in colleague X's office, he most probably will not return for the next 3 hours, so that in the meantime a lowering of the temperature is advisable. Rules like this, which can be either programmed into the system or learned during usage, have to be encoded in a formalized, machine-readable way using a framework like OWL. These examples show that for human-centric Ambient Intelligence systems much more structured information about devices and the environment is needed than in a traditional building automation system.
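To make this concrete, the sketch below encodes the preference "20°C when working" as RDF statements using the Apache Jena API on the IT side. This is purely illustrative: the namespace and property names are assumptions, and the paper's embedded nodes use their own OWL-capable ontology engine rather than a desktop RDF library.

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;

public class PreferenceExample {
    public static void main(String[] args) {
        String ns = "http://example.org/aal#";   // hypothetical namespace
        Model model = ModelFactory.createDefaultModel();

        Property hasPreference = model.createProperty(ns, "hasPreference");
        Property appliesInContext = model.createProperty(ns, "appliesInContext");
        Property targetTemperature = model.createProperty(ns, "targetTemperatureCelsius");

        Resource workContext = model.createResource(ns + "Working");
        Resource preference = model.createResource(ns + "TemperaturePreference1")
                .addProperty(appliesInContext, workContext)
                .addLiteral(targetTemperature, 20);

        model.createResource(ns + "UserA").addProperty(hasPreference, preference);

        // Serialize, e.g. for transfer from the MSN to the FESN responsible for the context.
        model.write(System.out, "TURTLE");
    }
}
```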
3 Semantic Ambient Network Using SMI – Semantic Method Invocation

We address the fragmented market of low-power, low-cost, low-data-rate embedded networking devices. The architecture is designed for seamless integration of mobile users into building environments consisting of wireless sensors, actuators as well as things of daily use. The current embedded implementation is based on NXP's LPC21xx (ARM7) microprocessor architecture and TI's IEEE 802.15.4 compliant CC2420 transceiver for wireless communications [8]. SUN's Java platform is supported for IT developments.

3.1 Introducing Semantics into Ambient Networks

In addition to typical ad hoc network tasks there are further challenges: while discovering all available network nodes, a user usually demands room functions from embedded devices, even if he has never encountered them before. He is not aware of the nodes or of the location of services; only his context is of interest. Concepts of services using hard-coded IDs and textual descriptions fail here. In our project we adopt the Semantic Web ideas propagated by the World Wide Web Consortium (W3C) to address the problem of personal mobile devices understanding local environments. The Semantic Web project focuses on information retrieval using infrastructure networks processed by web agents rather than information retrieval using Personal Area Networks processed by mobile nodes, but there is no restriction on transferring the ideas. W3C's Semantic Web idea. Today, web content is formatted for human readers rather than machines in terms of software agents. Common techniques for information
processing are based on keywords. These techniques do a reasonably good job for human web users but are not a feasible solution for users exploring the local environment, e.g. a room. Search results come in high quantity as a result of low precision and depend highly on the vocabulary used – the main problem of today's WWW. The method of presenting result lists of web pages can be characterized as keyword location finding rather than information retrieval. With the Semantic Web idea the W3C introduces the next-generation web, in which the meaning of a web site's content plays a larger role than content management solutions – the main challenge of the current web generation. The next-generation web is characterized by knowledge retrieval and processing based on formal languages describing resources, called objects. Knowledge should be organized as concepts to be retrieved and processed unambiguously by software agents, and keyword-based search algorithms identifying only words should be replaced by semantic interpretation of formal descriptions. Meta-data, meaning "data describing data", play a key role: metadata describe the affiliation of data contents to formal characteristics, introducing semantic aspects. The Semantic Web community introduces the term ontology. The term, originating from philosophy, has been defined by T.R. Gruber: "An ontology is an explicit and formal specification of a conceptualization". It provides for a shared understanding of a domain; for example, an ontology prevents two applications from using one term with different meanings in one semantic context. The results will be precise navigation through the Internet and search engines with high-precision information retrieval. The evolution of Internet technologies dealing with knowledge management is a continuous process in terms of layers of a growing protocol suite. The following protocols have been standardized so far:
• XML is the language to develop structured data content with a user-defined vocabulary. XML does not define a way to express semantics; it is suitable for data exchange at the document level. Using XML Schema, the structure of XML documents can be restricted.
• The Resource Description Framework (RDF) may be considered a resource description data model. RDF is defined in terms of an XML-based syntax. It enables the expression of statements describing application-specific objects (resources) and the relations between them [9].
• The description language RDF Schema (RDFS) offers language components for the hierarchical organization of objects. It introduces classes and subclasses, properties and sub-properties, ranges, domains, restrictions and, last but not least, relationships. It can be used as a simple language for writing ontologies representing knowledge.
• OWL (Web Ontology Language) is used to interpret retrieved information. OWL extends RDFS, offering a complex vocabulary adding disjointness of classes, cardinality and other useful features for knowledge representation. Furthermore, it restricts RDFS to be decidable, thus enabling suitable support for reasoning. It is the current state of W3C's Web Ontology Language.
Logic is an essential prerequisite for the definition of declarative knowledge, depending on the respective application; it is represented by a formal language in terms of sentences expressing declarative knowledge.
3.2 SMI – Semantic Method Invocation: A Semantic Ambient Network Approach

Using standard tools, RDF Schema documents can be created describing the semantics of objects and methods. At run time, each networking node embedded into the environment represents an object containing an RDF Schema self-description. Using the Discovery Service of the L3-NET middleware (L3-Net: low-power, low-cost, low-data-rate network), a mobile device can retrieve the RDF Schema documents from the SMI servers of available embedded devices, understanding the syntax and semantics of exported software methods [10]. This enables the mobile user to find suitable methods for his needs and to learn how to operate those methods to fulfil his needs. This process is called "matching". After successfully discovering suitable methods from the objects embedded into the environment, the methods of remote SMI servers have to be bound to a stub processed by the mobile device. Any function of any embedded SMI server can be bound to the stub, forming one temporary mobile client class which is instantiated immediately. Through this mechanism, embedded node software methods, including device driver services, can be accessed using easy-to-use method calls of Java objects processed by standard IT systems, while at the same time hiding all details of the distributed system, such as message-passing-based communication and remote execution, from the user. This allows developers without an embedded background to develop applications for distributed systems. Development of a corresponding Remote Procedure Call mechanism for non-Java mobile devices is in progress. We developed a suitable ontology engine for embedded devices for this task. It allows the user to control the whole room using his mobile device in an implicitly driven manner. This means that after announcing his identity and preferences, stored in terms of an RDF Schema document on the mobile device, the device automatically finds appropriate methods, freeing the user from having to care about this. Of course the user is able to specialize or generalize this proposal to fully meet his needs. In contrast to common ontological systems, our SMI approach has some special requirements and constraints. As we are using embedded processors – NXP's ARM7-based LPC2148 – to support "small, deeply embedded things", it has severe resource restrictions concerning both memory and processor power. Furthermore, we have a highly dynamic system, which is uncommon for traditional systems, too. And last but not least, we need a reliable system which has to control the extension of its knowledge base and ensure its correctness. An actual challenge is to enable distributed ontologies. OWL, tested first, does not have suitable mechanisms for this and thus had to be carefully extended in order to allow ontological communication in a consistent manner [11]. Usually, ontological systems tend to interpret contradicting information as an extension of the existing ontology. In our system we have to make sure the ontology is only extended in a wanted and correct manner, and thus we had to develop a system that treats contradicting information as wrong.
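The binding of remote SMI methods to a local Java stub can be pictured with the standard java.lang.reflect.Proxy mechanism: a dynamic proxy implements a service interface and forwards every call to a transport such as the XML-RPC client shown earlier. The interface, class names and transport abstraction below are illustrative assumptions, not the actual SMI implementation.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class SmiStubFactory {

    /** Transport abstraction, e.g. an XML-RPC call to the embedded SMI server. */
    public interface RemoteInvoker {
        Object call(String remoteMethod, Object[] args) throws Exception;
    }

    /** Hypothetical local view of a discovered lamp service. */
    public interface LampService {
        void switchOn();
        Integer getBrightness();
    }

    /** Builds a client-side stub that forwards every method call to the remote node. */
    @SuppressWarnings("unchecked")
    public static <T> T bind(Class<T> serviceInterface, RemoteInvoker invoker) {
        InvocationHandler handler = (proxy, method, args) -> {
            Object result = invoker.call(serviceInterface.getSimpleName() + "." + method.getName(),
                                         args == null ? new Object[0] : args);
            return method.getReturnType() == void.class ? null : result;
        };
        return (T) Proxy.newProxyInstance(serviceInterface.getClassLoader(),
                new Class<?>[]{serviceInterface}, handler);
    }
}
```

A caller would then write, for example, `LampService lamp = SmiStubFactory.bind(LampService.class, invoker); lamp.switchOn();` without seeing any of the underlying message passing, which is the effect the SMI stub binding described above aims for.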
4 Conclusions

The methods and procedures of the developed ambient intelligence platform offer a broad range of options and opportunities for human-centred man-machine
interfaces – especially in the fields of AAL and energy management – by establishing ad-hoc networks with both fixed and mobile nodes. There is no need for a centralized knowledge base with all information and parameters stored in one big memory; instead, the system knowledge is spread across all smart nodes. These smart nodes collect and store data in order to offer the different services. By working together, they produce emergent, decentralized intelligent behaviour for individual services. The energy management application gives a good example of such services: energy consumption data and the status of the surroundings, i.e. rooms and buildings, are collected at their origin. Furthermore, the different energy consumers can be controlled directly and in a decentralized manner. This concept differs completely from traditional automated building control systems with big control centres. The intelligent networks take human wishes into account, make suggestions and provide information. These networks are learning systems based on knowledge-based methods; the learning process depends on user behaviour and acts and reacts on user patterns. We are moving towards the Internet of Things from all directions: we simulate in software and we implement hardware, using an embedded middleware to bring Things to life. The middleware enables us to connect both sides by standard IP networking mechanisms. It is easy to design and implement a movable node in a software simulation, but this is a completely different task in hardware. It is quite easy to have a hardware node that is designed to work as a router, but establishing a dynamic ad-hoc routing protocol for highly mobile Things is still challenging.
References
1. Kunze, C., Holtmann, C., Schmidt, A., Stork, W.: Kontextsensitive Technologien und Intelligente Sensorik für Ambient-Assisted-Living-Anwendungen. In: AAL Kongress 2008, VDE Verlag, Berlin/Offenbach (2008), ISBN 978-3-8007-3076-6
2. Kamenik, J., Nee, O., Pielot, M., Martens, B., Brucke, M.: IDEAAL – an integrated development environment for AAL. OFFIS e.V., Oldenburg, Germany. VDE Verlag, Berlin/Offenbach, ISBN 978-3-8007-3076-6
3. Welge, R., Faasch, H., Bollow, E.C.: Ambient Assisted Living – Human Centric Assistance System. In: aps+pc, Workshop Proceedings (2008), ISBN 978-3-935786-49-2
4. ITU: ITU Internet Reports 2005: The Internet of Things. ITU, Geneva (November 2005)
5. Locatelli, M.P., Vizari, G.: Awareness in Collaborative Ubiquitous Environments: The Multilayered Multi-Agent Situated System Approach
6. Perkins, C., et al.: Ad hoc On-Demand Distance Vector (AODV) Routing, IETF RFC 3561
7. Poole, D., Mackworth, A., Goebel, R.: Computational Intelligence – A Logical Approach. Oxford Press, Oxford (2006)
8. Coexistence Assessment of Industrial Wireless Protocols in the Nuclear Facility Environment. U.S. Nuclear Regulatory Commission (2007)
9. RDF/XML Syntax Specification (Revised), http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/
10. Welge, R.: Sensor Networking with L3-NET – Characteristics. A SELF-X Middleware based on standard TCP/IP protocols. Embedded World 2006, Nürnberg, WEKA Zeitschriftenverlag (2006), ISBN 3-7723-0143-6
11. OWL Web Ontology Language XML Presentation Syntax, http://www.w3.org/TR/owl-xmlsyntax/
Configuration and Dynamic Adaptation of AAL Environments to Personal Requirements and Medical Conditions Reiner Wichert Fraunhofer Alliance Ambient Assisted Living Fraunhoferstrasse 5, 64283 Darmstadt, Germany
[email protected] Abstract. AAL concepts have been shaping scientific and market-oriented research landscapes for many years now [1]. Population development demands have made residing and receiving care in one’s own home a better alternative than institutionalized inpatient care. This reality has been reflected in open calls for proposals, as well as in numerous European and domestic projects, and has resulted in a considerable number of applications and product concepts with AAL ties. Unfortunately, it is already foreseeable that these project results will not be implemented in a comprehensive fashion, as individual applications and products can only be combined into a comprehensive solution with a great deal of effort and potential cost. Through stereotypical projects and prototypes, as well as concrete usage scenarios, this paper will extrapolate the added value resulting from integrating individual products into coherent comprehensive solutions within the framework of the complete supply and value chain. Business and technological obstacles will be identified and pathways shown by which AAL concepts and visions can lead to a better reality for all of those concerned, from healthcare recipients to those bearing the costs. Keywords: Ambient Assisted Living, User Interfaces, Elderly People, End User Configuration, AAL Platform.
1 Increase in the Elderly Living Alone In 2005, there were 82.4 million people living in Germany. According to prognoses of the German Statistical Office, this number will decrease to between 68.7 and 79.5 million residents by the year 2050. At the same time, the number of 80-plus-year-old residents will rise from 4 million (2005) to approximately 10 to 11.7 million (2050) [2]. With increasing age, the proportion of those living alone also increases. In 2000, 44 percent of private households occupied by 65- to 70-year-old primary residents were single occupancy dwellings. In light of increasing divorce rates and the growing number of single, as well as single parent households, this trend is expected to continue [3]. Age in and of itself is not necessarily an indicator for being in need of care, but with increasing age, a higher percentage of the population grows dependent on assistance, support and care giving. C. Stephanidis (Ed.): Universal Access in HCI, Part II, HCII 2009, LNCS 5615, pp. 267–276, 2009. © Springer-Verlag Berlin Heidelberg 2009
96% of those 70 and older have at least one internal, neurological or orthopedic disease process requiring treatment, whereas 30% have five or more [4]. Despite the paradigm "out-patient before in-patient", data from care giving statistics reveal a trend toward professional care. Especially in the area of in-patient care, the number of those in need of care has risen over the last several years: in 2005, 676,000 people were in nursing home care; in 1999, this number was just 573,000. Current developments, such as the conceptualization and implementation of alternative forms of habitation, take into consideration the discussion of an "out-patient conversion" of care giving. The largely undesired increase in the number of residents in in-patient facilities highlights the current care giving dilemma, namely, that it is very difficult for those in need of outside care and support to continue to live in their accustomed environments. The realization of technologically supported, AAL-based concepts can contribute to closing these gaps in care. 1.1 The Desire for Independent Living in the Golden Years The majority of the elderly want to remain in their accustomed environment, even as the need for outside support and care giving increases. Institutionalized forms of habitation, on the other hand, are experiencing a decreasing level of acceptance according to a representative survey by the Schader-Stiftung [5]. Independence and self-determination have high social value, also among the elderly population. Within the framework of the Fraunhofer IAO project "Pflege 2020" (Care giving 2020), 500 people aged 55-75 were questioned as to their desires and needs for future care giving. Key topics of the representative survey included desired services, forms of habitation, and technological applications. A few guiding universal themes could be identified as fundamental hallmarks of a high quality of life according to those surveyed; they can be characterized by the keywords "security", "participation", "individuality" and "daily structure".
Fig. 1. Application areas for AAL and “Personal Health” concepts in the housing industry
The effects of the demographic transformation pose enormous challenges for the housing industry, as well. One such challenge is that of keeping older residents with limited mobility and diminishing health in existing housing and avoiding vacancies. Suboptimal housing structures from the 1950s/60s often make a senior-appropriate transformation difficult. Technological support for senior living does not have to be limited to the adaptation of the personal living space, however. It is more pertinent to develop comprehensive concepts that link individual living spaces with a residential quarters-oriented infrastructure. Only the linking of ambient technologies with individualized, health-oriented service concepts can meet the needs and desires of elderly residents and become the foundation for new business models for the housing industry. 1.2 State of the Art in AAL and Personal Health Currently, further development of technological solutions is occurring in a number of predominantly European research ventures. At the beginning of 2007, a total of 16 individual AAL projects were launched within the 6th Framework Program by the European Commission, with such diverse thematic focal points as, e.g., social integration, support for daily life, security and mobility (EU-IST PERSONA) [6], semantic service-oriented infrastructure (AMIGO) [7], special support for the blind (HAH Hearing at Home), secure »white goods« (EASY LINE+), Entertainment and Health (OLDES), mobile (cellular) support within the home and elsewhere (ENABLE), support for »Daily Life« and Health Care and Communication (NETCARITY), »Health« vital function monitoring, activities, position (CAALYX, EMERGE), automation between »white goods«, entertainment with variable user interface (IN-HOME), scalable, adaptive, customizable add-ons for personal assistance sensor systems (SHARE-IT), monitoring of daily activities and incorporation of biofeedback (SENSACTION-AAL), and many more. 2008 saw the launch of additional projects from the 7th Framework Program exploring these very topics. Through the continuation of pilot projects, such as "SmarterWohnenNRW" (Smarter Living in North Rhine-Westphalia) or the application project "Sophia", with complete Internet integration, as well as that of communication and telemedicine, better conditions for the future implementation of AAL solutions in rental housing are being prepared. For many of these projects, the term "personal health" – like the term AAL – has been playing an increasingly important role. Similar to how the "personal computer" established itself as an extension and complement to professional computer technology, "personal health" denotes the accessibility of devices previously used only by medical personnel – but also that of respective information and service options – now available to the private user. "Personal health" also characterizes the direction of a paradigm transformation from traditional healthcare to person-centered, individualized prevention, diagnostics, therapy and care giving. This transformational process is supported by developments in the area of telemonitoring, as well as further developments in personalized medicine, which enable the person-oriented integration of digital patient data (images, vital function monitoring, demographic and anamnestic data, lab findings) through the intensive application of information technology and telematics (eHealth), while incorporating the latest developments in biotechnology, genomics, and pharmacology [8], [9].
The technology applied in "personal health care" encompasses in particular wearable medical devices or systems conceptualized for diagnostic and therapy-accompanying application in the home environment. Such a telemonitoring system typically consists of medical sensors and a base station, either worn by or located in the immediate vicinity of the user. This base station captures data delivered by the sensors, prepares the data if necessary and makes it available via a wired or wireless transmission system to the stationary (AAL) infrastructure, or, if required, to a doctor, hospital or telemedical service provider, where further evaluative steps or data storage can occur. The sensor devices placed on or in the body communicate with the base station via a wireless network of limited range (Body Area Network / Personal Area Network). The base station can be either a stationary personal computer system with a fixed network connection or a mobile device (Smartphone, PDA, etc.) with wireless transmission technology (GSM, UMTS, WLAN). In order to develop markets for "personal health" systems and applications – initially based on conditions in the US – an international alliance dubbed "Continua Health Alliance" (www.continuaalliance.org) was founded in 2006 and currently encompasses approximately 170 companies. A prerequisite to the realization of "personal health" is the availability of reasonably priced, stand-alone, user-manageable system components, as well as their cross-manufacturer interoperability in "open" systems. To enable this availability, Continua guidelines will be compiled as "recipes" for the development of interoperable products supported by a comprehensive system of international norms and industry standards. In addition to fitness and wellness, the intended application areas also encompass chronic disease management outside of clinical environments, as well as support for independent senior living, with the goal of living as long as possible in one's home environment.
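The data flow described above can be summarised in a short sketch: body-worn sensors deliver samples to a base station, which prepares them and forwards them to a telemedical service provider. The sketch is ours, not part of any of the cited systems; the endpoint URL, field names and batch size are assumptions, and a real base station would use standardised transports (BAN/PAN towards the sensors, GSM/UMTS/WLAN towards the backend).

```python
# Illustrative base-station sketch only; all names and the endpoint are our own.
import json
import time
from urllib import request


def prepare(sample):
    """Minimal 'preparation' step: attach a timestamp, keep only known fields."""
    return {"time": time.time(),
            "pulse": sample.get("pulse"),
            "spo2": sample.get("spo2")}


def forward(samples, endpoint="https://example.invalid/telemedicine"):
    """Send a prepared batch to the (hypothetical) service-provider endpoint."""
    body = json.dumps(samples).encode()
    req = request.Request(endpoint, data=body,
                          headers={"Content-Type": "application/json"})
    return request.urlopen(req, timeout=5)      # raises on network errors


def base_station_loop(read_sensor, batch_size=10):
    """read_sensor() stands in for the BAN/PAN link to the body-worn sensors."""
    batch = []
    while True:
        batch.append(prepare(read_sensor()))
        if len(batch) >= batch_size:
            forward(batch)
            batch.clear()
```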
2 Individualized Housing Development and Adaptation While the need and market potential for universal AAL applications is clear, there is currently a lack of marketable products that significantly rise above purely isolated applications. Viable, innovative services remain on our wish list, a distant, albeit desirable dream. Some of the isolated applications currently available include home emergency systems, which are constructed as pure alert signallers, sensors for light control, or device-specific user interfaces. These applications could only be linked together with a great deal of effort, with any emergent alterations requiring the involvement and expertise of system specialists, thereby increasing solution costs considerably. In addition, sensors and other hardware components, as well as individual functionalities, have a tendency to require multiple installations with multiple associated costs, as the systems are only offered as complete packages, data exchange formats and protocols are not compatible, and components from one application cannot be used by another application. Likewise, it is impossible to generate higher value functionalities through combining layers of individual functions. In other words, targeted AAL systems are not realizable.
By contrast, future AAL solutions for the care and support of the elderly must be based on a flexible, expandable platform and be modular so as to remain adaptable to the individual's changing needs, lifestyle and health. 2.1 Realizing the Vision through an Integrated Concept This scenario presupposes that it will actually be possible to dynamically adapt existing housing to the requirements of age-appropriate living. It is not enough that new components and devices can independently integrate themselves into the current infrastructure. Tools are also required that allow service providers to optimize available resources (services, sensors, device functions) for these infrastructures. With the targeted configuration option, any combination of existing functionalities can be reused in new applications, ultimately leading to the targeted universality, as well as the associated reduction in costs. The Fraunhofer Alliance AAL, the Fraunhofer frontline theme "Assisted Personal Health", as well as the Fraunhofer innovation cluster "Personal Health", each with expertise in their specific technological fields and in cooperation with external partners, are all contributors in realizing this vision. It is essential to include the complete chain of players in the healthcare field and to get medical professionals, health insurance companies, health associations and organizations, social and health service providers, healthcare lobbyists, housing industry specialists, psychologists, as well as the respective technology developers, all together at one table, in order to develop new forms of cooperation between all participants. It is equally essential to link healthcare assistance in the sense of AAL with personalized information processing, information transfer and information management according to the "personal health" paradigm and to further develop these linked components into one comprehensive, universal system. This integrated process chain approach appears to contradict the demands for quicker realization and marketability raised by the respective industries. The participating institutes want to counter this supposed contradiction by stating that whereas exemplary existing prototypes close to production are first being further developed into marketable products, standard interfaces are being simultaneously prepared for a later integration of existing platforms. 2.2 Adaptation for Future Needs and Health Issues The goal is to equip existing housing with ambient technology such that it can be adapted for future needs and health issues as easily as possible. The focus is on people with chronic illnesses, who can be provided care via telemonitoring. Linking this to flexible services ensures comprehensive care. Telemedical care can take place, for instance, through a medical service centre and emergency care through a nearby urgent care centre. In addition to the recording of vital functions, the detection of accidental falls is a priority. The participation of individual residents in societal life and their connection to the outside world are to be supported by telecommunication. Reminder functions simplify the structuring of the day's activities and simultaneously improve the quality of care and the self-management abilities of the chronically ill.
An accompanying evaluation would prove useful for purposes of ascertaining which technological components should be linked together to optimize the adaptation of the residences to the individual needs of the residents. In a further step, resident requirements, results and experiences in existing housing should be compiled, and fundamental infrastructure prerequisites for future accommodations in housing should be formulated for the housing industry. In summary, the following objectives in particular are essential: (1) Support for independent, autonomous living for the resident in later years and with potential health-related limitations, (2) Enabling a life led with a high level of security and social living quality, (3) Enhancement of the self-management of chronic diseases and increase in compliance through supportive ambient functions, (4) Expansion of the service portfolio for health-related and social service providers, as well as for the housing industry, (5) Development of an intuitively operable configuration tool for device adaptation and data access, and (6) Development of adaptable technical installations for increased flexibility in view of changing living requirements for existing and new housing.
3 Solution: Provision of Flexible and Expandable Platforms The overriding technological objective is the provision of a flexible and expandable platform for the care and support of the elderly in their home environment. New housing complexes need to be fundamentally constructed such that each residence can be individually adapted to the respective residents. Existing housing is to be retrofitted such that it can be dynamically adapted to the requirements of an aging population if at all possible. It would appear reasonable to follow a two-step approach: In phase one, AAL technologies are to be integrated into existing housing. In phase two, the fundamental infrastructure requirements for the future housing industry are to be developed in new housing construction on the basis of an expandable platform. Further, the integration of sensors worn on the body and medical devices is to be enabled and evaluated incorporating a central basis of information. Additional components and devices must be fundamentally capable of autonomous integration into these infrastructures [10], [11]. There are validated project results from EU-IST projects, such as PERSONA, AMIGO or SOPRANO, with a special focus on dynamic distributed infrastructures for the self-organization of devices, sensors and services. These results should be taken under consideration [12]. The PERSONA infrastructure e.g. with its four communication buses aims at the provision of mechanisms that facilitate the independent development of components that nonetheless are able to collaborate in a self-organized way as they come together and build up an ensemble. Therefore, the buses act as brokers that resolve the dependencies at runtime using the registration parameters of their members and semantic match-making algorithms [13]. The open nature of such systems must allow dynamic plugability of components distributed over several physical nodes. It consists of a middleware solution for open distributed systems dealing with seamless connectivity and adequate support for interoperability that makes use of ontological technologies and defines appropriate protocols along with an upper ontology for sharing context [14].
Fig. 2. Decentralized software infrastructure (PERSONA)
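The broker role of such buses can be illustrated with a deliberately simplified sketch in which components register their capabilities and the bus resolves requests at runtime. This is our reduction of the idea to a plain capability match; PERSONA itself relies on ontology-based semantic match-making, which is not reproduced here.

```python
# Rough sketch of a broker-style bus: components register with parameters,
# and the bus resolves requests at runtime by matching capabilities.
# This is our simplification, not the PERSONA implementation.
class ContextBus:
    def __init__(self):
        self.members = []                      # (component, registered capabilities)

    def register(self, component, provides):
        self.members.append((component, set(provides)))

    def request(self, needed):
        """Return the first registered member whose capabilities cover the request."""
        needed = set(needed)
        for component, provides in self.members:
            if needed <= provides:
                return component
        return None                            # no match: the request cannot be served


bus = ContextBus()
bus.register("lamp_controller", provides={"light.on", "light.dim"})
bus.register("blind_controller", provides={"blind.open", "blind.close"})
print(bus.request({"light.dim"}))              # -> lamp_controller
```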
3.1 Configuration Tools – Constraints and Functions On the basis of these infrastructures, tools for service providers can be designed, which enable the optimization and configuration of available resources. The targeted configurations should enable higher value functions resulting from a cooperation of resources, thereby generating an added value that has been unattainable up to this point. It is essential that any needs or relevant situations be automatically recognized, analyzed and associated with the call for corresponding functions. Unfortunately, it is almost never the case that a direct deduction of a reaction is possible on the basis of an event, since (1) situations are not always directly measurable and, therefore, conclusions regarding the situation cannot be based on single events. Rather, it is imperative to draw conclusions based on several events or facts (event aggregation). The situation “resident has fallen”, for instance, can be recognized with a greater degree of certainty, if, in addition to an alert sent by one of the acceleration sensors located in the resident’s cane that the cane has fallen on the ground, the camera-based analysis of the positioning of a human form (“is in the prone position”) is also reported and taken under consideration. It should likewise be taken into account that (2) required functions are not always provided by individual devices and components found in a given environment, but that the desired effect can perhaps only be achieved through the combination of several available functions (composition of services). For instance, a service provider receives an alert about a fall and there is an immediate automated address generated through the resident’s environment, asking whether he is alright. In this scenario, the combination of several functions could prevent false alarms, etc. Unfortunately, associations (links) of situations with the respective functions could change, meaning something that was wanted a particular way up until now could suddenly be interpreted and handled differently (i.e., if a resident’s health situation were to suddenly change).
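A minimal sketch of such event aggregation, using the fall example from above, could look as follows; the event sources, labels and the time window are our assumptions, not the actual implementation.

```python
# Sketch of event aggregation (our illustration): the situation "resident has
# fallen" is only raised when several independent events agree within a window.
from datetime import datetime, timedelta

WINDOW = timedelta(seconds=30)


def fall_detected(events):
    """events: list of (timestamp, source, observation) tuples."""
    cane = [t for t, src, obs in events
            if src == "cane_accelerometer" and obs == "cane_dropped"]
    camera = [t for t, src, obs in events
              if src == "camera_analysis" and obs == "person_prone"]
    # Aggregate: both observations must occur close together in time.
    return any(abs(tc - tv) <= WINDOW for tc in cane for tv in camera)


now = datetime.now()
events = [(now, "cane_accelerometer", "cane_dropped"),
          (now + timedelta(seconds=12), "camera_analysis", "person_prone")]
print(fall_detected(events))    # True -> trigger the composed alarm service
```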
Configuration tools can also serve to make adaptations to the individual preferences, capabilities and limitations of the resident or for the specific health situation (e.g., the question as to whether the neighbour, a relative, nursing home personnel, or a combination of the above should be notified of a certain event). These tools become an essential complement to open systems, which can continue to evolve over longer periods of time. The software infrastructure of PERSONA, for instance, is already in a position to integrate new components ad hoc or to aggregate events and services by means of a script. 3.2 Intuitive Interaction Concepts In the (not so distant) future, novel interaction forms will essentially shape everyday life as we know it. Interaction concepts for the control of objects in AAL environments will no longer be centrally realized, as is common, for instance, with the PC. Instead, they will be implemented through networks of (computer) nodes that will interpret user commands and distribute them by way of existing communication infrastructures to the end devices that can best realize the task at hand. Multimodal interaction concepts, such as speech and gesture recognition or computer vision, require computationally intensive algorithms, which can only be executed by stationary computers. Should additional intelligent deductions from existing information be required, the temporarily increased computational effort can still be generated quickly enough through distributed (computer) nodes. Applications of such interaction concepts include speech interfaces, 3-dimensional interactive video interfaces or emotive interfaces for robots [15]. The potential applications of novel interaction forms can be illustrated by the home environment: In contrast to current concepts with central controls, where functionalities are laboriously programmed and the user must also remember which functions are being activated by which keys, interaction in the AAL environment is decoupled from the hardware. The user no longer uses commands to control devices. Rather, he provides goals that are then interpreted and automatically realized. For instance, if the user acoustically provides the goal "brighter", first, the room in which the user currently finds himself will be ascertained. Then, the system will check which options are available for increasing the brightness in this room: Are there blinds which can be opened? What kind of lamps are available? With all actions, the status of the environment is ascertained as well, as it makes no sense, for instance, to open the blinds at night. The preferences and other goals of the user are also taken into account. So, for watching television, the system could select indirect lighting, but for a work situation or for reading, direct lighting could be chosen. It is apparent that intelligent environments also require a configuration of the rules, as each user has a preference for his own personal settings and would like to make any modifications himself. In contrast to the approach presented in Section 3.1, the users possess less technical know-how in the handling of rules in complex control systems. Conventional menu-based approaches, such as those found with mobile telephones, fail due to the sheer number of modification options. Novel interaction forms are therefore necessary if the environment is to be configured by the residents themselves and accepted by them (end-user configuration), and they offer extensive research potential.
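The goal-based resolution of "brighter" described above can be illustrated with the following sketch; the device model, the night-time rule and the television preference are our assumptions and merely indicate how such goal interpretation might be structured.

```python
# Illustrative sketch of goal-based interaction: the user states a goal
# ("brighter") and the system selects a suitable device from the room context.
# Device names and rules are our assumptions, not an existing implementation.
def resolve_brighter(room, context):
    options = []
    if "blinds" in room["devices"] and not context.get("is_night", False):
        options.append(("blinds", "open"))          # daylight is preferred
    for lamp in room["devices"].get("lamps", []):
        options.append(("lamp", lamp))
    # Apply a user preference, e.g. indirect light while watching television.
    if context.get("activity") == "tv" and ("lamp", "indirect") in options:
        return ("lamp", "indirect")
    return options[0] if options else None


room = {"devices": {"blinds": True, "lamps": ["indirect", "reading"]}}
print(resolve_brighter(room, {"is_night": True, "activity": "tv"}))
# -> ('lamp', 'indirect'): blinds are excluded at night, preference applied
```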
4 Conclusion The AAL vision is that one day sensors and systems will give seniors a helping hand in their own homes by measuring, monitoring and raising alarms if necessary. To reach this goal, a great deal of scientific and market-oriented research has been done in the past years. Unfortunately, people normally have many problems that cannot be solved with a single product. Thus it is already foreseeable that the resulting project outcomes will not work together, since individual applications and products can only be combined into a comprehensive solution with a great deal of effort. Future AAL applications, however, must be both flexible and expandable, specifically incorporating "personal health" components, in order to be dynamically adaptable to individual demands and respective medical conditions. By contrast, with the current closed system concepts, sensors or functionalities may have to be installed and paid for multiple times, as the functionality of one application cannot be used by another application. The next step should now be to bring the industry together at one table in order to work out common standards. Only in that way will the products become efficient and also cheaper. If we do not change our strategies in AAL from individual products towards coherent comprehensive solutions within the framework of the complete supply and value chain, there is a considerable risk that, in the end, a lot of money will have been spent on AAL system solutions and AAL will have been only a huge bubble.
References 1. Emiliani, P.L., Stephanidis, C.: Universal access to ambient intelligence environments: Opportunities and challenges for people with disabilities. IBM Systems Journal 44(3), 605–619 (2005) 2. Federal Statistical Office: Population of Germany till 2050. 11th coordinated population forecast. Wiesbaden, p. 43 (2006) 3. Cirkel, M., et al.: Produkte und Dienstleistung für mehr Lebensqualität im Alter – Expertise. Gelsenkirchen, p. 8 (2004) 4. Robert Koch Institut: Themenheft 10: Gesundheit im Alter, Gesundheitsberichterstattung des Bundes. Berlin (2005a) 5. Heinze, R.G., et al.: Neue Wohnung auch im Alter – Folgerungen aus dem demographischen Wandel für Wohnungspolitik und Wohnungswirtschaft. Schader-Stiftung, Darmstadt (1997) 6. Avatangelou, E., Dommarco, R.F., Klein, M., Müller, S., Nielsen, C.F., Soriano, S., Pilar, M., Schmidt, A., Tazari, M.-R., Wichert, R.: Conjoint PERSONA - SOPRANO Workshop. In: Sala Soriano, M.P., Schmidt, A., Tazari, M.-R., Wichert, R. (eds.) Constructing Ambient Intelligence: AmI 2007 Workshops, pp. 448–464. Springer, Heidelberg (2008) 7. Georgantas, N., Ben Mokhtar, S., Bromberg, Y., Issarny, V., Kalaoja, J., Kantarovitch, J., Gerodolle, A., Mevissen, R.: The Amigo Service Architecture for the Open Networked Home Environment. In: 5th Working IEEE/IFIP Conf. on Software Architecture (WICSA 2005), pp. 295–296 (2005) 8. Blobel, B., Norgall, T.: Standard based Information and Communication – The Personal Health Paradigma. HL7-Mitteilungen, Heft 21/2006, pp. 33–40 (2006)
9. Norgall, T., Blobel, B., Pharow, P.: Personal Health – The Future Care Paradigm. In: Medical and Care Compunetics 3. Series Studies in Health Technology and Informatics, vol. 121, pp. 299–306. IOS Press, Amsterdam (2006) 10. Aarts, E., Encarnação, J.L.: Into Ambient Intelligence. In: Aarts, E., Encarnaçao, J. (eds.) True Visions: Tales on the Realization of Ambient Intelligence, ch. 1. Springer, Heidelberg (2005) 11. Wichert, R., Tazari, M.-R., Hellenschmidt, M.: Architektonische Requirements for Ambient Intelligence. IT - Information Technology, 13–20 (January 2008) 12. Hellenschmidt, M., Wichert, R.: Rule-Based Modelling of Intelligent Environment Behaviour. In: Künstliche Intelligenz: KI, vol. 2, pp. 24–29 (2007) 13. Furfari, F., Tazari, M.R.: Realizing ambient assisted living spaces with the PERSONA platform. ERCIM News (74), 47–48 (2008) 14. Fides-Valero, Á., Freddi, M., Furfari, F., Tazari, M.-R.: The PERSONA Framework for Supporting Context-Awareness in Open Distributed Systems. In: Aarts, E., Crowley, J.L., de Ruyter, B., Gerhäuser, H., Pflaum, A., Schmidt, J., Wichert, R. (eds.) AmI 2008. LNCS, vol. 5355, pp. 91–108. Springer, Heidelberg (2008) 15. Adam, S., Mukasa, K.S., Breiner, K., Trapp, M.: An Apartment-based Metaphor for Intuitive Interaction with Ambient Assisted Living Applications. In: Proceedings of HCI 2008, Liverpool, May 1-9 (2008)
Designing Universally Accessible Networking Services for a Mobile Personal Assistant Ioannis Basdekis1, Panagiotis Karampelas2, Voula Doulgeraki1, and Constantine Stephanidis1,3 1
Institute of Computer Science, Foundation for Research and Technology – Hellas (FORTH), Greece 2 Hellenic American University, Athens, Greece 3 Computer Science Department, University of Crete, Greece {johnbas,vdoulger,cs}@ics.forth.gr,
[email protected] Abstract. At present, a tendency towards smaller computer sizes and at the same time increasingly inaccessible web content can be noted. Despite the worldwide recognized importance of Web accessibility, the lack of accessibility of web services has an increasingly negative impact on all users. In order to address this issue, W3C has released a recommendation on Mobile Web Best Practices, supplementary to the Web Content Accessibility Guidelines. This paper presents the design and prototype development of universally accessible networking services that fully comply with those standards. Validation and expert accessibility evaluation on the XHTML Basic prototypes present 100% compliance. The followed design process is presented in details, outlining general as well as specific issues and related solutions that may be of interest to other designers. The results will be further verified through user tests on implemented services. Keywords: Web accessibility, mobile accessibility, user interface design, device independence, prototyping.
1 Introduction Since its creation, the mission of the World Wide Web Consortium (W3C) has been to lead the Web to its full potential. The first goal that specifies this mission1 is Web for Everyone (previously Universal Access) while the second is Web on Everything (previously Interoperability). Ten years ago web users had limited access to software, let alone Web services (eServices) that were designed specifically for desktop computers, as there was no alternative way of accessing the Internet. In parallel, assistive technology solutions were scarce, expensive to purchase, limited to specific age or disability categories, and in most cases incompatible with other hardware and software applications [1]. At present, a tendency towards smaller computer sizes and at the same time increasingly inaccessible web content can be noted. Users have more freedom to choose 1
W3C goals: http://www.w3.org/Consortium/mission
C. Stephanidis (Ed.): Universal Access in HCI, Part II, HCII 2009, LNCS 5615, pp. 279–288, 2009. © Springer-Verlag Berlin Heidelberg 2009
their preferred hardware-software combination for communication and work through a Web browser (i.e., desktop browser, speech browser, speech synthesizer, Braille display, mobile browser, car browser, etc). Therefore, there is increased demand for web material (i.e., content, digital services) interchangeable and accessible at any time and place. For example, a substantial growth can be observed in mobile Web usage and demand for mobile Web services (mServices). Recent studies indicate that the 27% of European and the 28% of US mobile subscribers who currently do not use mobile data services intend to start using them in the next two years [2]. Following this trend, new and existing eServices are being (re)designed in order to be accessed through mobile devices as well as traditional PCs, and serve the demand for 24/7 web access. However, as studies indicate, web material which is designed basically on visual concepts is largely inaccessible to people with disability [3, 4], raising as a consequence barriers to all mobile device users as well [5]. Therefore, and despite the worldwide recognized importance of eAccessibility, the lack of accessibility of eServices has an increasingly negative impact on all users, and especially those for whom Web access may be one of the main paths to address communication needs and support independent living. In addition to problems occurring because of inaccessible content, handheld mobile devices (such as PDA’s, smart-phones, mobile phones, Blackberries, Notebook PCs, ultra-mobile PCs, and others) can present usability problems as well. The use of a pointing device, touch screen or tiny buttons for input, and a small screen for output, is unsuitable for many users, so these options are not really helpful especially to those who are blind or unable to use a stylus. Additionally, installed browsers on mobile devices may vary in the way they interpret web pages without fully complying with markup standards of W3C (e.g., XHTML Basic, cHTML, CSS and others). Due to platform and hardware differentiations between mobile devices (e.g., sound generation), available assistive technology products are targeted mainly to some well-known device types or major operating systems rather than providing a global solution that works everywhere. Furthermore, mobile operating systems provide minimal or no built-in accessibility support. Inevitably, the rising mobile environment introduces hard constraints to interaction design as the technical characteristics which need to be addressed are much more complicated with respect to accessibility barriers on desktop solutions. As a consequence of the above, the development of fully accessible and interoperable eServices introduces new challenges to the accessibility provisions that have to be adopted from the early design stages [6]. As in the case of eServices, the accessibility limitations of the mobile Web Services (mServices) can also be addressed with the use of assistive technology products. To this effect, the design process of mServices is even more demanding, since the considerations mentioned previously have to be addressed; nevertheless, mobile accessibility is still feasible. This paper presents the design and prototype development of fully accessible web services, available through mobile devices as well as traditional desktop PCs equipped with assistive technology. 
The aim of the work presented is to identify the main challenges and propose experience-based practical design guidelines that web developers may follow in order to comply with W3C de facto standards for mobile accessibility.
2 Related Work As with existing standards and guidelines for web accessibility and usability, many design guidelines for mServices have existed since the late 1990s [7, 19]. Nevertheless, mobile web content providers are still not paying specific attention to accessibility, and they are unaware of the benefits of providing accessible solutions. Moreover, current specialized implementation platforms do not help Web developers in integrating accessibility into Web services, and accessibility of mServices is not supported in existing development suites. In order to address this issue, the W3C's Mobile Web Initiative (MWI) released in July 2008 the Mobile Web Best Practices (MWBP) version 1.0, supplementary to the Web Content Accessibility Guidelines (WCAG) versions 1.0 and 2.0. The aforementioned document sets out an additional series of recommendations designed to improve the user experience of the Web on mobile devices, without exceptions. Since the delivery of accessible and interoperable eServices should also address legal issues and satisfy the constraints arising from user requirements and devices' technical specifications, the whole design process spans an exponentially large design solution space, which makes compliance with W3C standards such as WCAG and MWBP essential (Figure 1).
Fig. 1. Rely on Web standards and guidelines for delivering Web content to mobile devices
Functionality targeted to desktop access is often transferred in the design process of mobile services, without considering any special adaptation. On the other hand, providing “text-only” versions of existing websites is a technique largely discredited by people with disability. As a result, it makes little sense developing separate mobile sites for disabled users. After all, content and services delivered through the web are the same, no matter how many different versions may occur as a result of possible adaptations, customisations or different versions to be used for a variety of devices. MWBP, although not a W3C recommendation, presents practical solutions that help deliver a full web experience on mobile devices rather than offering a separate-butequal treatment. It seems that the philosophy of those practices contradicts other service-oriented standards for mobile usage under development, such as for example the global standard of the International Air Transport Association (IATA) for global 2
2 W3C-MWI, Mobile Web Best Practices 1.0: http://www.w3.org/TR/mobile-bp/
3 W3C-WAI, Web Content Accessibility Guidelines 1.0: http://www.w3.org/TR/WCAG10/
4 W3C-WAI, Web Content Accessibility Guidelines 2.0: http://www.w3.org/TR/WCAG20/
mobile phone check-in using two-dimensional (2D) bar5. For example, it is difficult to imagine displaying a 2D bar code image to a passenger’s Braille mobile phone. Schilit et al. [8] discuss various techniques can be followed to fit desktop content into a small display. Accordingly, the following strategies, ordered by resources needed, can be followed to ensure that an existing eService can be used on a PDA or other browser-equipped mobile device: 1. Keep the same eService (as the desktop design) and perhaps make use of scaling techniques or specific web browsing systems that reduce the size of the working area. The latest fit-to-screen features that are being incorporated in some web browsers allow automatic web page size adjustments (e.g., Mobile Opera6, Internet Explorer Mobile7, Handweb8 , Plamscape, and others9). Although such a solution can be handy to experienced users, those with visual disabilities will suffer from reduced readability and face scrolling problems, not to mention that the on-the-fly scaling cannot reorganize in an optimal way designs targeted to bigger displays. 2. Apply automated re-authoring techniques that involve removing all presentation information (i.e., Cascading style sheets, images) and produce raw HTML, or even utilize alternative presentation information (i.e., Cascading style sheets for handheld) by keeping the same markup. Such an automatic process, which is similar to proxy transcoding, may produce user friendly versions for mobile experience in a cost effective way. Examples of such tools and services are Power Browser [9], Mobile Google10, AvantGo11, and Skweezer.net. These solutions cannot work effectively though for eServices with broken markup beyond repair (i.e., the web page contains invalid HTML), since the result will look differently in different browsers, and in most cases tend to render well only in basic html markup. In addition, markup resources size is not reduced, so utilization through a mobile device may result in awkward behavior (e.g., scrolling) and increased costs due to mobile transfer fees. As traditional web services are usually developed with desktop computers in mind, their conventional web pages will not be adequately displayed on mobile devices. 3. Perform adaptations in content and/or in interface elements appropriate for enhancing the mobile experience. This process can include transcoding markup to be compatible with device formats, altering or rearranging the structure and the navigation, and introducing a new content structure. This method can be further classified according to the resulting transformed pages provided to the users, e.g., single column, fisheye visualization [10; 11], and overview-detail [12]. Examples of systems delivering such experience are Opera SSR, Fishnet [13], the Document 5
5 IATA Resolution 792: Bar Coded Boarding Pass (BCBP), version 2: http://www.iata.org/NR/rdonlyres/2BD57802-6D96-4D9A-8501-5349C807C854/0/BarCodedBoardingPassStandardIATAPSCResolution792.pdf
6 Opera Software: http://www.opera.com/mobile/
7 Microsoft: http://www.microsoft.com/windowsmobile/en-us/downloads/microsoft/internetexplorer-mobile.mspx
8 Smartcode Software: http://www.palmblvd.com/software/pc/HandWeb-1999-02-19-palm-pc.html
9 Wikipedia has a more comprehensive listing: http://en.wikipedia.org/wiki/Microbrowser
10 http://www.google.com/mobile/
11 http://www.avantgo.com/frontdoor/index.html
Segmentation and Presentation System (DSPS) [14] and the Stanford Power Browser [15]. However, these solutions cannot be easily generalized. 4. Design and create new mServices from the beginning and constantly evaluate the outcomes against design standards. This process is complex to address for both web designers and developers, as it requires substantial effort, planning, deep knowledge of recent standards and well trained personnel. Although it is possible to reuse some of the principles and practical solutions delivered in the desktop version, design and implementation of these solutions implies the creation of new mobile web templates which is a time consuming procedure. The result of such process provides, in theory, the best experience for mobile users. Nevertheless, maintaining a specific mobile site which does not “look like its big brother” is inconsistent with Device Independence principles. When dealing with new web services, the optimal solution is obviously to provide universal accessibility at an early stage during the design phase (e.g., by means of evaluation and redesign on early mock-ups and design prototypes against accessibility standards, because accessibility is more expensive if introduced later in the design phase [16].
3 Design Process for Embedding Accessibility in Mobile Services It is argued that web accessibility can be achieved only if accessibility standards are applied from day one of the design. In the case of mobile Web services, the designer should comply with even more strict constraints than for desktop solutions, since the screen size of the mobile device or the interaction style may be totally different from the desktop environment. To this purpose, design and usability guidelines for mobile design can contribute significantly towards ensuring that the final outcome addresses functional limitations such as visual disabilities, hearing impairments, motor disabilities, speech disabilities and some types of cognitive disabilities. From a usability point of view, applicable principles can be derived from guidelines improving mobile web usability [13]. For example, excellent usability experiments demonstrate that the most effective navigation hierarchy for use with mobile devices is one with only four to eight items on each level [17]. The provision of a universally accessible web service, with mechanisms12 consistent among all devices in use [20], implies producing the intersection13 of all relevant standards and guidelines, design according to this larger set of rules, perform tests and at the end re-evaluate and re-visit the designs. In this recurrent process, user feedback is also critical, because it whittles away the design space and so eliminates possible alternatives. Once the design space has been documented, the resulting designs need to be encapsulated into reusable and extensible design components. The above process has been followed in the context of the Greek nationally funded project “Universally Accessible eServices for Disabled People". The aim of the project is to promote the equal participation of people with disability in e-government services, by the implementation of an accessible portal. 12 13
12 WCAG, Guideline 13: Provide clear navigation mechanisms.
13 Set theory: the intersection of sets A and B is the set whose members are members of both A and B.
Fig. 2. Design templates for mServices: the main (navigation) page (left) and the first page for email services (right)
Fig. 3. Home page of amea.net (main options translated in English) displayed on a HTC-TYTN II (left) and a Fujitsu Siemens Pocket Loox N500 screen capture (right)
The portal will offer personalized and informative accessible Web services, available through mobile devices as well as traditional desktop PCs equipped with assistive technology. To this purpose and in addition to adhering to aforementioned accessibility standards and generic design principles, the iteration processes involving experts in the field of accessibility as well as end users yielded specific design guidelines. With the stabilization of these guidelines, detailed design mock-ups for all the services were elaborated (Figure 2). Based on the design mockups, markup templates (XHTML Basic 1.1, CSS 1.0) have been implemented to serve as a compass for the implementation team. These templates have been exhaustively tested against
aforementioned guidelines and full compliance has been achieved (Figure 3). Refinement based on the actual usage of the mServices is expected in the future and to this purpose user tests have been scheduled.
4 Design Experience The practical experience acquired during the design process outlined in Section 3 in the context of the project “Universally Accessible eServices for Disabled People" resulted into the consolidation of the following set of guidelines: 1. Use of standards • Comply with WCAG 1.0 levels AAA (including subjective 14.1 whenever possible), with the use of valid XHTML. Tools that may be useful are the Bobby software of the Center for Applied Technology14, the W3C’s Markup Validation Service15, the Colour Contrast Analyser16, and the WAVE Toolbar17. • Comply 100% with MWBP 1.0, consult relationship documents18 and make use of valid XHTML Basic 1.1. Available validation tools include W3C’s mobileOK Checker19 and TAW mobileOK Basic Checker20. • Perform manual checks (e.g., rendering without style sheets, test the accuracy of alternative text descriptions, etc). 2. • • • • • • • 3. • • • •
14 Bobby: no longer supported.
15 Markup Validation Service: http://validator.w3.org/
16 Colour Contrast Analyser: https://addons.mozilla.org/en-US/firefox/addon/7313
17 WAVE Toolbar: https://addons.mozilla.org/en-US/firefox/addon/6720
18 W3C, Relationship between Mobile Web Best Practices (MWBP) and Web Content Accessibility Guidelines (WCAG): http://www.w3.org/TR/mwbp-wcag/
19 W3C mobileOK Checker: http://validator.w3.org/mobile/
20 TAW mobileOK Basic Checker: http://validadores.tawdis.net/mobileok/en/
Bobby: no longer supported. Markup Validation Service: http://validator.w3.org/ 16 Colour Contrast Analyser: https://addons.mozilla.org/en-US/firefox/addon/7313 17 WAVE Toolbar: https://addons.mozilla.org/en-US/firefox/addon/6720 18 W3C, Relationship between Mobile Web Best Practices (MWBP) and Web Content Accessibility Guidelines (WCAG): http://www.w3.org/TR/mwbp-wcag/ 19 W3C mobileOK Checker: http://validator.w3.org/mobile/ 20 TAW mobileOK Basic Checker: http://validadores.tawdis.net/mobileok/en/ 15
286
I. Basdekis et al.
• Use of icons defined in stylesheets to avoid double announcements of alternative descriptions. • Avoid relying on color alone, but use the color coding in a consistent manner to help users correlate colors with services (learning disabilities). Comply with the “color opponent process”. • Use graphic icons only for orientation. 4. Data Form Completion • Provide error messages at the beginning of the (refreshed) form with links to errors. • Provide one-click login for unregistered users. • Auto fill default information. • Provide simple search as well as advanced search options such as history. Table 1 provides a summary of the service-specific guidelines emerged: Table 1. Examples of additional service-specific guidelines for the design and implementation of mServices Service E-mail News
Message board Chat Contacts
Blogs
User defined shortcuts Site map
Guideline Place the most important task first Provide each time just one free-text area on each screen Display the picture list after the content of the article with alternative descriptions Use article pagination to increase readability if necessary Flatten message-responses hierarchy for simplicity Place attachments and responses at the end of the message Provide access to the list of participants first Refresh the content on user demand Use contacts filtering based on letters Use an index where the letters will be visible only when there are contacts Use multiple pages (cards) with the contact details Focus on the current topic All replies/comments displayed should be associated with the current topic Use archiving mechanism for past topics Place that option high in the menu Allow the user to define the shortcuts up to a task level Use a list of the main tasks of the eservices with explanatory description
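As referenced in guideline 4 above, the following sketch (our illustration, not code from the project) indicates how a purely server-side form handler could build the error summary that is placed at the top of the refreshed page, with links pointing to the offending fields; the field names and messages are invented for the example.

```python
# Server-side sketch (ours) of guideline 4: after a failed submission the
# refreshed page starts with an error summary whose entries link to the
# offending fields, so no JavaScript is required on the client.
from html import escape


def validate(form):
    errors = {}
    if not form.get("email"):
        errors["email"] = "Please enter your e-mail address."
    if not form.get("subject"):
        errors["subject"] = "Please enter a subject."
    return errors


def error_summary(errors):
    """Return an XHTML fragment placed at the top of the refreshed form."""
    if not errors:
        return ""
    items = "".join('<li><a href="#%s">%s</a></li>' % (escape(f), escape(msg))
                    for f, msg in errors.items())
    return '<ul class="errors">%s</ul>' % items


print(error_summary(validate({"email": "", "subject": "Meeting"})))
```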
5 Discussion/Future Work This paper proposes the adoption of specific guidelines in the context of designing and developing networking mServices mainly targeted to people with disability. By following strict accessibility standards from the beginning of the design process, it is possible to deliver mServices that fully comply with even harder restrictions than for eServices, without compromising functionality. The presented design guidelines emerged as one of the results of an iterative design process involving web accessibility experts as well as users with disability. A conclusion stemming from this
experience is that the provision of universally accessible web services in a mobile context requires more intensive efforts than traditional web accessibility. This is mainly due to the fact that practical guidelines have to be derived from both MWBP and WCAG in the context of the specific services being developed. Overall, it is claimed that this experience contributes towards improving the production of cost-effective, high-quality, accessible and interoperable Web material by designers with no previous knowledge of accessibility guidelines. Initial tests prove that it is possible to develop mServices that fully comply with W3C's accessibility guidelines; however, more user tests and heuristic evaluations are required to further validate this process. In the context of the project "Universally Accessible eServices for Disabled People", user-based tests will follow, targeted at the refinement of the mServices. User tests are necessary for the fine-tuning of the final outcome, based on a specific PDA device equipped with a mobile screen reader. To this purpose, the HTC-TYTN II and Mobile Speak Pocket have been selected among the candidates. Acknowledgments. This research has been conducted within the Greek nationally funded project "Universally Accessible eServices for Disabled People". The authors would like to thank the Panhellenic Association of the Blind (www.pst.gr), acting as the Project contractor, for their support. The project is funded by the Greek Government under the 3rd Community Support Framework, and the accessible and interoperable web services will be available at http://www.ameanet.gr
References 1. Blair, M.E.: U.S. education policy and assistive technology: Administrative implementation. Invited paper for the Korea Institute of Special Education (KISE) in preparation for the KISE International Symposium (2006) 2. Nielsen Group, Survey of over 50,000 consumers reveals mobile operators’ issues and opportunities (2008), http://www.tellabs.com/news/2009/index.cfm/nr/53.cfm 3. Cabinet Office: eAccessibility of public sector services in the European Union (2005), http://www.cabinetoffice.gov.uk/e-government/eaccessibility 4. Nomensa: United Nations global audit of web accessibility (2006), http://www.un.org/esa/socdev/enable/documents/ fnomensarep.pdf 5. W3C-WAI, Shared Web Experiences: Barriers Common to Mobile Device Users and People with Disabilities, http://www.w3.org/WAI/mobile/experiences 6. Basdekis, I., Alexandraki, C., Mourouzis, A., Stephanidis, C.: Incorporating Accessibility in Web-Based Work Environments: Two Alternative Approaches and Issues Involved. In: Proceedings of the 11th International Conference on Human-Computer Interaction (HCI International 2005), Las Vegas, Nevada, USA, July 22-27 (2005) 7. Jones, M., Marsden, G., Mohd-Nasir, N., Boone, K., Buchanan, G.: Improving Web interaction on small displays. Computer Networks: The International Journal of Computer and Telecommunications Networking 31(11-16), 1129–1137 (1999) 8. Schilit, B.N., Trevor, J., Hilbert, D.M., Koh, T.K.: Web interaction using very small Internet devices. Comput. 35(10), 37–45 (2002)
9. Buyukkokten, O., Molina, H.G., Paepcke, A., Winograd, T.: Power Browser: Efficient Web Browsing for PDAs. In: Proc. Conf. Human Factors in Computing Systems (CHI 2000), pp. 430–437. ACM Press, New York (2000) 10. George, F.: Generalized Fisheye Views. Human Factors in computing systems. In: CHI 1986 conference proceedings, pp. 16–23. ACM, New York (1986) 11. Gutwin, C., Fedak, C.: Interacting with big interfaces on small screens: a comparison of fisheye, zoom, and panning techniques. In: Proceedings of Graphics Interface 2004, London, Ontario, Canada, May 17-19, 2004, pp. 145–152 (2004) 12. Xiao, X., Luo, Q., Hong, D., Fu, H., Xie, X., Ma, W.-Y.: Browsing on small displays by transforming Web pages into hierarchically structured subpages. TWEB 3(1), 4 (2009) 13. Buchanan, G., Farrant, S., Jones, M., Thimbleby, H., Marsden, G., Pazzani, M.: Improving mobile internet usability. In: Proceedings of the 10th international conference on World Wide Web, Hong Kong, May 01-05, 2001, pp. 673–680 (2001) 14. Hoi, K.K., Lee, D.L., Xu, J.: Document Visualization on Small Displays, pp. 262–278 (2003) 15. Buyukkokten, O., Molina, H.G., Paepcke, A., Winograd, T.: Power browser: Efficient web browsing for pdas. In: Proceedings of the Conference on Human Factors in Computing Systems CHI 2000 (2000) 16. Clark, J.: Building accessible websites. New Riders (2003) 17. Geven, A., Sefelin, R., Tscheligi, M.: Depth and breadth away from the desktop: the optimal information hierarchy for mobile use. In: Mobile HCI 2006, pp. 157–164 (2006) 18. Card sorting: a definitive guide by Donna Spencer and Todd Warfel on 2004/04/07, http://www.boxesandarrows.com/view/ card_sorting_a_definitive_guide 19. Karampelas, P., Akoumianakis, D., Stephanidis, C.: User interface design for PDAs: Lessons and experience with the WARD-IN-HAND prototype. In: Proceedings of the 7th ERCIM Workshop, User Interfaces for All, Paris (Chantilly), France, October 24-25, pp. 474–485 20. Karampelas, P., Basdekis, I., Stephanidis, C.: Web user interface design strategy: Designing for device independence. In: Proceedings of 13th International Conference on HumanComputer Interaction (HCI International 2009), San Diego, California USA, July 19-24 (2009)
Activity Recognition for Everyday Life on Mobile Phones Gerald Bieber, Jörg Voskamp, and Bodo Urban Fraunhofer-Institut fuer Graphische Datenverarbeitung, Rostock, Germany {gerald.bieber,joerg.voskamp,bodo.urban}@igd-r.fraunhofer.de
Abstract. Mobile applications for activity monitoring are regarded as a high-potential field for the efficient improvement of health care solutions. The measurement of physical activity under everyday conditions should be as easy as using an automatic weighing machine. Up to now, physical activity monitoring has required special sensor devices that are not suitable for everyday usage. Movement pattern recognition based on acceleration data enables the use of standard mobile phones for the measurement of physical activity. Now, just by being carried in a pocket, a standard phone provides information about the type, intensity and duration of the performed activity. Within the project DiaTrace, we developed the method and algorithm to detect activities like walking, jumping, running, cycling or car driving. Based on activity measurement, this application also calculates the calories consumed over the day, shares activity progress with friends or family and can deliver details about the different kinds of transportation used during a business trip. The DiaTrace application can easily be used today on standard phones that are already equipped with the required sensors. Keywords: Physical Activity Monitoring, Sensor Location, Mobile Assistance, Acceleration Sensor, Pattern Recognition, feature extraction, DiaTrace.
1 Motivation Mobile applications for activity monitoring are regarded as a high-potential field for the efficient improvement of health care solutions. The measurement of physical activity under everyday conditions should be as easy as using an automatic weighing machine. The determination of physical activities in everyday life suffers from the lack of suitable sensors and algorithms. Distributed multi-sensor systems provide high recognition accuracy, but they are very unhandy and inconvenient and cannot be used in daily life. Single-sensor systems achieve a sufficient recognition rate only in laboratory scenarios [8] and with a fixed sensor location and orientation, which is in general at the hip, wrist or upper arm. The requirements these recognition systems place on wearing position and hardware do not suit the real-life scenario. The concern of everyday usage is not to have an additional sensing device but to integrate this functionality into a standard device such as a mobile phone, which should be easy to handle and should accurately detect everyday activities. C. Stephanidis (Ed.): Universal Access in HCI, Part II, HCII 2009, LNCS 5615, pp. 289–296, 2009. © Springer-Verlag Berlin Heidelberg 2009
2 Related Work

In 2001, Richard W. DeVaul of MIT started scientific research on physical activity recognition using acceleration sensors. This research was carried out within the MIThril project (the name Mithril comes from a book by Tolkien), with the aim of expediting context-aware wearable computing for daily life. Important follow-up work was derived from this research group, e.g. by S. Intille, A. Pentland et al., who broadened the work towards comprehensive context awareness and ambient intelligence. The MIThril project was not continued after 2003 [5]. The Finnish research institute VTT subsequently pushed the research forward within the scope of the nationwide project Palantir (the name Palantir likewise comes from a book by Tolkien), together with the Finnish partners Nokia, Suunto, Clothing+ and Tekes. The project [8] was discontinued in 2006; VTT currently continues the research in the project Ramose, which is related to motion tracking. The research activities of Intel Research, in cooperation with the University of Washington, are focused on an activity logging system called iMote, a platform that supports the vision of ambient intelligence. Approaches for detecting and classifying basic physical activities (e.g. walking, running) can also be used for research on the quality of movement execution; in this way the progress of, e.g., multiple sclerosis can be examined, as done by the Sylvia Lawry Centre for MS Research in Munich, Germany. At present, groups that are especially active in algorithm development include the University of Technology Darmstadt, Georgia Tech (group Abowd), Lancaster University (group Gellersen), ETHZ (group Mattern), the University of Linz (group Ferscha), the University of Kagawa (group Tarumi), VTT Finland and likewise the University of Rostock (group Kirste), Germany. The current work shows that physical activity recognition with just one high-performance acceleration sensor is possible in laboratory environments. The research challenge is the development of a suitable preprocessing method and the identification of relevant features for activity recognition in everyday life.
3 Mobile Phone as Sensor Device

In this paper we describe a novel concept of using a mobile phone, without any additional devices, for physical activity recognition. This enables permanent, non-obtrusive activity monitoring for everyday usage. The latest generation of mobile phones uses acceleration sensors for orientation detection when taking pictures in landscape or portrait orientation. The acceleration sensor, often also called a g- or tilt sensor, is also becoming popular as a new input interface for games, where a steering wheel or squash racquet is simulated by moving the mobile phone. Some manufacturers use the sensor for new interaction techniques, such as Sony Ericsson's "shake control" for controlling the music player. The quality criteria for acceleration sensors can be summarized as measurement range, sampling rate, sampling stability, quantization and noise. The acceleration sensor of a mobile phone was designed for purposes other than activity recognition, so its performance is quite low (e.g. 20 Hz sampling, 3 bit/g quantization). Usually, the sensor requirements for activity recognition are much higher than what the
acceleration sensors of mobile phones are able to provide. Sampling rates of over 100 Hz are usually used; a lower rate is regarded as impractical because of sensor noise. The hardware limitations of mobile phones therefore lead to the need for better preprocessing and suitable feature extraction.

3.1 Wearing Position

Current motion detection systems require a predefined location of the acceleration sensor. Various phones (e.g. the Sony Ericsson 560) already provide very simple pedometer functionality, but the phone must then be fixed to the belt. This works for training or sport sessions, but in general it is not very suitable for users to have a predefined wearing position for their phone; a specific wearing position does not match the wearing behavior of users in everyday life. In [4], a survey of about 1549 participants from 11 cities on 4 continents provides characteristics of how mobile phones are carried whilst users are out and about in public spaces. Typical phone locations are described as follows:

• Trouser pockets,
• Shoulder bags,
• Belt enhancements,
• Backpacks,
• Upper body pockets and
• Purses.
To offer common pedometer functionality that is easy and uncomplicated for everybody to use, it is very important to detect physical activity for every possible wearing location. Acceleration sensing for physical activity places basic requirements on the sensor signal. In our laboratory we estimated the acceleration forces occurring when simulating the use of sports equipment as follows:

− Bowling (hand): ~4 g
− Jogging (hip): ~5 g
− Basketball (hand): ~6 g
− Jumping (hip): ~7-9 g
− Playing, romping (hip): ~11 g
− Tennis, golf (hand): >16 g
− Boxing without partner (hand): >16 g
For comparison, the acceleration of the body during the first steps of a run is about 0.4 g, a rollercoaster reaches up to 4 g, a human survives a permanent acceleration of 10 g, and a tennis ball experiences up to 1000 g during the start phase [10].

3.2 Sampling Rate

The muscles of the human body are controlled by information transferred by nerves. The response time of humans depends on the kind of signal (acoustic signals cause a longer response time than optical ones). In addition, the temperature of the muscles, the psychological and physical constitution, as well as external factors such
as drugs, alcohol, nicotine or medication influence the response time. The average optical response time of a human is approx. 220 ms [1]. The trill in piano playing is indicated in the literature [7] as at most 10 cycles per second, and for stringed instruments as 13 cycles per second. A reflex, however, is a direct reaction without processing in the brain and occurs within approx. t = 0.06 seconds, which corresponds to 1/t = 16 Hz. According to the Shannon sampling theorem, twice this rate is necessary, so the sampling rate should be a minimum of 32 cycles per second. Researchers in similarly oriented projects [3] use a similar frequency. This sampling rate is relevant for body movements; artificial movements, e.g. engine vibration while driving a car, introduce additional frequency bands which are not covered. However, it can be assumed that the selected sampling rate of 32 Hz is sufficient.

3.3 Relevant Activity Types

A mobile device which is carried by the user for the entire day is influenced by the user's physical activity. The everyday usage of the mobile device requires the consideration of the relevant user activities (activity types). The everyday behavior of young people and children consists of only a few activity types [6]: the most frequently performed activities are lying (ca. 9 hours), sitting (ca. 9 hours), standing (ca. 5 hours) and being active (ca. 1 hour) [2]. For the determination of energy consumption, some activity types such as sitting, standing or lying can be summarized as "resting". Locomotion is typically performed by walking, jogging, cycling or car driving and should each be represented by a separate activity type. Fuzzy activities such as cleaning, gardening or household work are classified as "being active". For everyday usage, this leads to the following activity list:

• Device not present
• Resting (sleeping, sitting)
• Walking
• Running / jogging
• Bicycle riding
• Car driving
• Being active (gardening, cleaning etc.)
This list is not exhaustive and can be extended, but it already allows an estimation of the daily calorie consumption by means of the individual metabolic equivalent of each activity. DiaTrace supports the detection of each of the given activity types plus jumping.

3.4 Mobile Phone Requirements

Mobile devices with a Java ME (J2ME) development environment, such as the Sony Ericsson W910i or W760i, provide a sensor API (JSR-256) for easy access to the acceleration sensor. The integrated sensors of these devices provide a sampling rate of 20 Hz, which is lower than the requested 32 Hz. In addition, the samples are not equidistant in time. Fig. 1 illustrates the sampling distribution and shows the strong irregularity.
Fig. 1. Varying sampling rate of acceleration values
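On such Java ME devices, the raw values behind Fig. 1 are obtained through the JSR-256 Sensor API mentioned above. The following minimal sketch shows one possible acquisition loop; the class name and buffer size are our own choices, error handling is omitted, and the exact channel data types depend on the device.

import javax.microedition.io.Connector;
import javax.microedition.sensor.Data;
import javax.microedition.sensor.DataListener;
import javax.microedition.sensor.SensorConnection;
import javax.microedition.sensor.SensorInfo;
import javax.microedition.sensor.SensorManager;

// Illustrative JSR-256 acquisition sketch (not the DiaTrace source code).
public class AccelerationReader implements DataListener {

    private SensorConnection sensor;

    public void start() throws java.io.IOException {
        // Look up an acceleration sensor; the first match is used here.
        SensorInfo[] infos = SensorManager.findSensors("acceleration", null);
        if (infos.length == 0) {
            return; // no accelerometer available on this device
        }
        sensor = (SensorConnection) Connector.open(infos[0].getUrl());
        // Deliver the samples in small buffers; one Data object per channel (axis).
        sensor.setDataListener(this, 8);
    }

    public void dataReceived(SensorConnection conn, Data[] data, boolean isDataLost) {
        for (int axis = 0; axis < data.length; axis++) {
            // Depending on the device the channel delivers int or double values;
            // per-sample time-stamps can be requested via an extended setDataListener overload.
            double[] values = data[axis].getDoubleValues();
            // forward the values to preprocessing / feature extraction here
        }
    }

    public void stop() throws java.io.IOException {
        if (sensor != null) {
            sensor.removeDataListener();
            sensor.close();
        }
    }
}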
During normal phone usage (e.g. calling), the sampling rate varies even more, with gaps of up to some 1/10 s. The device, however, provides the acceleration data with an exact time-stamp. These strong constraints lead to the concept of a basic reconstruction of the input data.

3.5 Preprocessing and Data Conditioning

The very strong variability of the sampling rate of mobile phones requires preprocessing and data conditioning of the acceleration values. Every acceleration value is delivered with an exact time-stamp, which enables data conditioning within the preprocessing module. We designed a preprocessing module that eliminates the effect of the low and varying sampling rate: DiaTrace reconstructs the true course of acceleration by interpolating the sampled acceleration values of each axis. This preprocessing compensates for the varying sampling rate as well as for the rough quantization and yields a new input signal for the pattern recognition. Using relevant features extracted from this signal, DiaTrace makes a long-term assessment of daily activities possible.
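One straightforward way to realize such a reconstruction is to linearly interpolate the time-stamped raw values onto a fixed 32 Hz grid. The helper below is only a sketch of this idea for a single axis, with names of our own choosing; it is not the actual DiaTrace preprocessing code and assumes at least two raw samples with ascending time-stamps.

// Illustrative linear resampling of irregularly sampled acceleration values
// onto a fixed-rate grid (single axis); not the DiaTrace implementation.
public final class Resampler {

    // t: time-stamps in ms (ascending), a: raw values (same length), hz: target rate (e.g. 32)
    public static double[] resample(long[] t, double[] a, int hz) {
        double step = 1000.0 / hz;
        int n = (int) ((t[t.length - 1] - t[0]) / step) + 1;
        double[] out = new double[n];
        int j = 0; // raw sample to the left of the current grid point
        for (int i = 0; i < n; i++) {
            double ti = t[0] + i * step;
            while (j < t.length - 2 && t[j + 1] < ti) {
                j++;
            }
            double span = t[j + 1] - t[j];
            double w = span > 0 ? (ti - t[j]) / span : 0.0; // weight of the right neighbour
            out[i] = (1.0 - w) * a[j] + w * a[j + 1];       // linear interpolation
        }
        return out;
    }
}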
4 Sample Application

DiaTrace is a mobile application which provides assistive functionality. DiaTrace measures everyday activities and reminds the user to be more active if necessary; otherwise it congratulates the user. In a cooperative scenario, such as long-term support, this comfortable all-day activity monitoring enables a new kind of social connectedness, because group members can see what the other users are doing during the day.
Fig. 2. Phone with integrated sensor showing actual activity

Fig. 3. Activity recognition by a mobile phone over an entire day

Fig. 4. Activity top ten of the buddies

Fig. 5. Electronic medals
The physical activity of a person can be shared with friends. The mobile device with integrated acceleration sensor is able to send the activity level automatically to other buddies, so DiaTrace can be connected to a community platform. The mobile phone ranks the activity levels and displays a top-ten list together with the current activity type. Another motivational instrument to encourage more physical activity is the achievement of electronic medals. In addition, the activities can be transferred to a personal web space, where they are analyzed by intensity and the daily energy consumption is calculated. The medical relevance of DiaTrace for overweight children is currently being evaluated in a medical study, in which eating is additionally monitored by taking photos of the food with the mobile phone. The application showed that physical activity monitoring with a standard mobile phone is possible. The evaluation showed that the recognition rate for the type of physical activity is higher than 95% when the phone is worn in the front trouser pocket. The correctness is lower for other wearing locations; some activity types (e.g. cycling)
are falsely detected (e.g. as car driving) when the phone is carried in a jacket or bag. The good recognition performance is made possible by the preprocessing of the data and a suitable feature selection.
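The daily energy estimate mentioned above can be derived from the recognized activity durations with the common metabolic-equivalent (MET) approximation, in which 1 MET corresponds to roughly 1 kcal per kilogram of body weight per hour. The sketch below only illustrates this principle; the MET figures are rounded textbook values and not the ones used by DiaTrace.

// Rough calorie estimate from recognized activity durations using the common
// MET approximation (1 MET ~ 1 kcal per kg of body weight per hour).
// The MET figures are rounded, illustrative values, not DiaTrace's own.
public final class EnergyEstimator {

    static double kcal(double met, double weightKg, double hours) {
        return met * weightKg * hours;
    }

    public static void main(String[] args) {
        double weightKg = 75.0;
        double total = 0.0;
        total += kcal(1.0, weightKg, 9.0); // resting (sitting, lying)
        total += kcal(3.5, weightKg, 1.0); // walking
        total += kcal(8.0, weightKg, 0.5); // running / jogging
        total += kcal(6.0, weightKg, 0.5); // bicycle riding
        System.out.println("Estimated daily energy consumption: " + total + " kcal");
    }
}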
5 Conclusions

In this paper, we presented the DiaTrace project, which allows the identification of physical activity in everyday life on a standard mobile phone. A three-dimensional acceleration sensor, as already integrated in standard phones, can be used to determine physical activity by domain-specific feature extraction. By using data mining techniques and preprocessing of the acceleration data, suitable features can be identified that allow a high-quality and robust classification of physical activity. The proof-of-concept prototype achieves a recognition rate of over 95% for the activity types resting, walking, running, cycling and car driving, simply by wearing the device in the front trouser pocket. The activity level can be shared with friends or buddies and may be helpful for appraising sporting activity. The application can be used for monitoring the daily calorie consumption by including the metabolic equivalent of each activity type. This technique also enables the support of medical applications. We envision the setup of a physical activity database for a homogeneous appraisal of activity recognition results. Furthermore, we are working on a combination of physical activity monitoring with emotion-sensing devices such as EREC [9], which would allow for an even better personalized, sensitive assistance.
References 1. Biermann, H., Weißmantel, H.: Benutzerfreundliches und Seniorengerechtes Design (SENSI Regelkatalog), VDE-Fortschrittsberichte, Reihe 1 Konstruktionstechnik Nr.247 (2003), ISBN -318-324701-1 2. Bös, K., Worth, A., Opper, E., Oberger, J., Romahn, N., Wagner, M., Woll, A.: MotorikModul: Motorische Leistungsfähigkeit und körperlich-sportliche Aktivität von Kindern und Jugendlichen in Deutschland (i.V.). Forschungsendbericht zum Motorik-Modul, KIGGS (2007) 3. Bouten, C.V.C., Koekkoek, K.T.M., Verduin, M., Kodde, R., Janssen, J.D.: A Triaxial Accelerometer and Portable Data Processing Unit for the Assessment of Daily Physical Activity. IEEE Transactions On Biomedical Engineering 44(3), 136–147 (1997) 4. Chipchase, J., Yanqing, C., Ichikawa, F.: Where’s The Phone? Selected Data, survey, NOKIA (2007) 5. DeVaul, R., Sung, M., Gips, J., Pentland, A.: MIThril: Applications and Architecture. In: Proc. 7th IEEE International Symposium on Wearable Computers, White Planes, NY, USA, October 21-23 (2003), http://www.media.mit.edu/wearables/mithril/ 6. Gesundheitssurvey, Mensink: Körperliche Aktivität, Gesundheitswesen 61, Sonderheft 2, Robert Koch-Institut, p. 126, Berlin (1999) 7. Lange, H.: Allgemeine Musiklehre und Musikalische Ornamentik. Franz Steiner Verlag (2001) ISBN-13: 978-3515056786
8. Pärkkä, J., Ermes, M., Korpipaä, P., Mäntyjärvi, J., Peltola, J., Korhonen, I.: Activity Classification Using Realistic Data From Wearable Sensors. IEEE Transaction on Information Technology in Biomedicine 10(1) (2006) 9. Peter, C., Ebert, E., Beikirch, H.: A Wearable Multi-sensor System for Mobile Acquisition of Emotion-Related Physiological Data. In: Tao, J., Tan, T., Picard, R.W. (eds.) ACII 2005. LNCS, vol. 3784, pp. 691–698. Springer, Heidelberg (2005) 10. Wikipedia (2009), http://de.wikipedia.org/wiki/Beschleunigung (last access: February 23, 2009)
Kinetic User Interface: Interaction through Motion for Pervasive Computing Systems Pascal Bruegger and Béat Hirsbrunner Pervasive and Intelligence Research Group Department of Informatics - University of Fribourg - Switzerland {pascal.bruegger,beat.hirsbrunner}@unifr.ch
Abstract. In this paper we present a semantic model for the conception of pervasive computing systems based on object or user motions. We describe a system made of moving entities, observers and views. More specifically, we focus on tracking the implicit interaction between entities and their environment. We integrate the user's motion as the primary input modality, as well as the contexts in which the interaction takes place, and combine user activities with contexts to create situations. We illustrate this new concept of motion-awareness with examples of applications built on this model. Keywords: Pervasive computing, Ubiquitous computing, Motion-awareness, Kinetic User Interface, HCI.
1 Introduction

In this paper, we explore a new human-computer interaction (HCI) paradigm for pervasive computing systems where location-awareness and motion tracking are considered as the first input modality. We call it the Kinetic User Interface (KUI) [1]. Nowadays many projects, such as EasyLiving [2] or GUIDE [3], have developed ubiquitous computing (Ubicomp, which we use as equivalent to pervasive computing in this paper) technologies like mobile devices or applications, and have enhanced the human experience, for instance by providing contextualised services mainly according to the user's location. However, most current context-aware systems are limited to external parameters and do not take into account user-centric dimensions. In our model, we consider the user's activity as a way to reflect his or her goals and intentions. The paper formalizes KUI as a system composed of entities and observers. Kinetic objects (entities), possibly living things, interacting naturally with their environment, are observed by agents (observers) which analyse their activities and contexts. The model focuses on implicit interaction and the unobtrusive interface of motion-aware computing systems. The challenge consists in modelling the physical "world" in which entities live and interact into a conceptual system representing this world in a simple and flexible manner. We propose a generic model and a programming framework that can be used to develop motion-aware and situation-aware applications. In section 2, we define our model of a system made of entities, observers and views. In section 3, we present the concept of motion-awareness. In section 4, we
describe the implementation of this model and in section 5, we present the application domain with the description of three KUI enabled projects based on our model.
2 KUI Model: A Systemic Approach

The approach we have chosen for our semantic model is based on General System Theory (GST) and the work of three authors [4],[5],[6] who propose interesting visions of systems. For Alain Bouvier ([6], p. 18), a system (a complex organised unit) is a set of elements in dynamic interaction, organised to reach a certain goal and differentiated within its environment; it has an identity and represents a "finalised whole". General System Theory as defined by von Bertalanffy [4] describes systems in sciences such as biology, chemistry, physics and psychology, and gives the framework and concepts to model the specific systems studied in these sciences. There exist different types of systems, such as inert or dead, living or evolutionary, open (exchanging matter with their environment) or closed. For instance, we can see the world as a whole as an extremely complex living but closed system. For physicists, this perception of the world is not correct and this vision is reductive: the world (our planet) is one component of the solar system and is part of the equilibrium of that system. Boulding [5] writes that an "individual" - atom, molecule, animal, man, crystal - (an entity) interacts with its environment in almost all disciplines. Each of these individuals exhibits "behaviour", action or change, and this behaviour is considered to be related in some way to the environment of the individual, that is, to other individuals with which it comes into contact or into some relationship. The important points in Boulding's definition are that:

• the entity's actions (activities, behaviour) are related to its environment;
• entities come into relationships.

In KUI, systems are open and dynamic (living). Their complexity evolves over time with respect to their components: components can join and leave systems, increasing or reducing their size. We have included two concepts which are not present in the chosen authors' definitions:

• the observer: who/what is observing the system;
• the view: the observer's point of view.

We define a system as a set of observable, interacting and interdependent objects, physical or virtual, forming an integrated whole. The system includes different types of objects: entities, observers, and views.

2.1 Entities

Entities are the observable elements of the system. They can be physical or virtual: living things (humans, animals), moving objects (cars, planes), places (rooms, buildings). An entity is made of contexts and performs activities (Fig. 1).
Fig. 1. Basic structure of an entity
Contexts. As defined by A. Dey et al. [8], a context is any information that can be used to characterise the situation of an entity. In our model, contexts are used to define the attributes of an entity. Contexts do not include the activity: the activity is influenced by the environment and therefore by the contexts in which it is performed. We will see later that contexts provide relevant information to the observer in the situation analysis. We use the following contexts in our model: identity, location, role, status, structure, relations.

Identity and location. The identity is the name of the entity and must be unique in order for the entity to be differentiated within the system [6]. The location is the address where the entity can be observed; it is made of an address and a time. The dynamic behaviour of an entity makes the address possibly dynamic, so time must be taken into consideration.

Role. We have defined two entity roles in our model: 1) actor and 2) place. They indicate to the observer what it should focus on; for instance, motions and activities are the focus when the entity is an actor. Roles are dynamic and, according to the point of view of the observer, an entity is sometimes an actor and sometimes a place. For example, a cruise boat can be observed as an actor when we consider its kinetic properties (cruising around the globe), and it is considered as a place when we focus on its structure (passengers, cabins, decks).

Status. The status provides the entity's kinetic information to the observer. An entity has two possible statuses: mobile (motion capabilities) or static (fixed).

Structure. The structure of an entity can be simple or complex. A simple entity does not contain other entities; it is called an atom (e.g. a walking person). In contrast, an entity with a complex structure is said to be composed (e.g. a house). A composed entity contains other entities and must have at least one atom. The structure of an entity is dynamic and evolves over time: for example, a house is made of rooms and contains inhabitants, and each time a person changes room, the structure is modified.

Relations. Relations determine the type of interaction the entity has with its environment. They provide information about the state of entities and contribute to evaluating the situation. We consider two types of relations between entities: spatio-temporal relations and interactional relations. A spatio-temporal relation defines the "physical" connection between entities: when an actor is near a place or another actor at the same time, a temporary relation exists. Our model of spatio-temporal relations is inspired by the spatial relationships used in GIS [10]. We differentiate four types of spatio-temporal relations:
1. Proximity (next to)
2. Containment (inside)
3. Contiguity (juxtaposed)
4. Coincidence (overlap)
Relations 1 and 2 are created between an actor and another actor or a place, while relations 3 and 4 concern places. We call interactional any relation between actors that is needed to carry out complex activities. These relations are parameters used to determine the feasibility of an activity. We have identified three types of interactional relations: 1) collaborative, 2) causal and 3) conditional. Collaborative relations are set up to carry out activities that cannot be achieved by one actor alone. For instance, fixing a vertical pillar on the street requires the intervention of a crane driver, who lifts up the pillar and keeps it stable, while a worker bolts the pillar onto the plinth. In this case the completion of the main activity is possible only if at least two specialised activities are combined. A causal relation is present when activity A causes activity B; for instance, if a car (entity A) moves, it implies that the driver (entity B) moves as well. A causal relation is useful to check the "validity" of the detected activity. Conditional relations are created when activities have to be performed in a given order. As with causal relations, they allow checking the validity of an activity: activity B can be done only if activity A has been done before.

Activities in places. Activities are controlled within a place. Places have rules that determine which activities are authorised, which are forbidden and which are negotiable. We introduce the concept of white and black activity lists. White-listed activities are the authorised activities that can be carried out with no reaction from the observer; they are accepted as such. Black-listed activities are, on the contrary, forbidden and provoke an immediate reaction from the observer. We also take into consideration what we call the "grey" list: if an activity is not explicitly declared in the white or black list, it is "negotiable", which gives the freedom to evaluate it and to reason about the situation. Activity lists (black and white) allow the observer to react quickly when something is going on in a place: the observer simply checks whether the ongoing activity is explicitly present in one of the lists.

2.2 Observers and Views

In the previous section, we have detailed the system by its entities. The second part of the system consists of the observation of these entities. Observers are the agents which observe the moving entities populating a system. They collect and analyse information (activities and contexts) about actors and places and possibly react to dangerous or inappropriate situations in which actors could be. Observers are specific and analyse one or a small number of situations: our vision is to have more observers but less programming complexity per observer.
To illustrate this concept, we take the example of UN (United Nations) observers placed at the border between two countries during a ceasefire. Their role is to watch, to observe the movements of the troops of both countries (the situation) and to react or notify when they detect violations of the rules. These rules are established in advance and must be respected by the actors in the field; for instance, soldiers must not cross the no-man's land. A UN observer analyses the actors' activities and contexts (location, time) and reports any detected incident or violation to the higher level, his hierarchical superior.

Non-intrusive behaviour of observers. Weiser [9] introduced the concept of calm technology. In his view, the user is more and more surrounded by computing devices and sensors, and it becomes necessary to limit direct interaction with the computing systems in order to avoid unneeded cognitive load and to let users concentrate on their main activity. Our concept of observer is inspired by Weiser's idea: there is no interference with actors and places, the observer reports only problematic situations to the higher level and lets the application decide what to do.

Views. Entities are observed under certain points of view. Observers can select different points of view to analyse the same situation, and each point of view represents a focus on the situation; many observers can use similar views for different situation analyses. A view is a multi-dimensional filter placed between an observer and the entities. It allows or constrains the observer to focus on a certain part of the system; the focus goes from the root structure down to one atom. Our model of view has two dimensions: range and level. The range is the parameter that determines the scope of the observation (e.g. the ocean or only a cruise boat), and the level is the parameter that gives the granularity of the observation (e.g. decks, or decks and cabins, or passengers). As an analogy, a photographer uses different types of lenses according to the level of observation: a wide-angle lens gives a large view of the landscape (the range) but loses details like bees on flowers, whereas if the focus is a bee on a flower, a macro lens is needed and the level changes. The photographer cannot have the level of a bee and a wide landscape at the same time; this range/level limitation is resolved in our model.
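To make the entity, place and view notions of this section more concrete, the sketch below shows one possible object representation: an entity with identity, role, status, structure and activity lists, and a view defined by a range (root entity) and a level (granularity of the descent into its structure). All class and field names are our own illustrations and do not reproduce the actual uMove classes.

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative object model for KUI entities and views (names are ours, not uMove's).
class Entity {
    enum Role { ACTOR, PLACE }
    enum Status { MOBILE, STATIC }

    String identity;                                   // unique name within the system
    Role role;                                         // actor or place, may change over time
    Status status;                                     // mobile or static
    Set<Entity> contained = new HashSet<Entity>();     // structure: atoms vs. composed entities
    Set<String> whiteList = new HashSet<String>();     // authorised activities of a place
    Set<String> blackList = new HashSet<String>();     // forbidden activities of a place

    boolean isComposed() { return !contained.isEmpty(); }

    // ALLOWED, FORBIDDEN or NEGOTIABLE ("grey" list) for an activity observed in this place.
    String classifyActivity(String activity) {
        if (whiteList.contains(activity)) return "ALLOWED";
        if (blackList.contains(activity)) return "FORBIDDEN";
        return "NEGOTIABLE";
    }
}

// A view filters what an observer may see: the range is the root entity of the
// observation, the level is how many structural levels below the root are visible.
class View {
    private final Entity range;
    private final int level;

    View(Entity range, int level) { this.range = range; this.level = level; }

    List<Entity> visibleEntities() {
        List<Entity> result = new ArrayList<Entity>();
        collect(range, 0, result);
        return result;
    }

    private void collect(Entity e, int depth, List<Entity> out) {
        out.add(e);
        if (depth >= level) return;          // granularity limit reached
        for (Entity child : e.contained) {   // descend into the composed structure
            collect(child, depth + 1, out);
        }
    }
}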
3 A Motion-Based Model for Situation Awareness

In this section, we define how the kinetic information of the different entities is processed and how a situation is derived from simple motion. Context-aware systems often consider the user's external parameters, such as location, time, social information and activity, to characterise a situation. In our model, we bring a new point of view to situation characterisation by separating the activity from the contexts. Indeed, we consider that users' activities should be interpreted in their contexts in order to fully understand their situation. As Fig. 2 shows, the motion-aware model is divided into two levels. At the entity level, we have the activities and the contexts, including motion detection. Situations are analysed at the observer level and are high-level semantic information.
Fig. 2. Activity and contexts are components of a situation
Our situation-aware model is inspired by the Activity Theory presented by B. Nardi and K. Kuutti in [11],[12] and by Y. Li and J. Landay in [13], as well as by the Situation Theory of J. Barwise et al. [14] and the work of S. Loke [15].

Situations. In [13], Y. Li and J. Landay propose a new interaction paradigm for Ubicomp based on activity (activity-based ubiquitous computing). In their model, the relation between activity and situation is defined as follows: an activity evolves every time it is carried out in a particular situation, and a situation is a set of actions or tasks performed under certain circumstances. Circumstances are what we call contexts. According to Loke, the notion of context is linked to the notion of situation [15]. He proposes the aggregation of (possibly various kinds of) contexts in order to determine the situation of entities; in that sense the situation is thought of as being at a higher level than the context. Loke distinguishes between activity and situation and considers an activity as a type of contextual information used to characterise a situation. Our model of situation combines the two visions (contexts and activities, Fig. 3b) and we define it as follows: a situation is any activity performed in contexts.

Context-awareness. The user is often unaware of the surrounding computing systems and does not feel the interaction with them. As mentioned by A. Dix [16], while the main challenge of pervasive computing is where computers are, the challenge of context-aware computing is what it means to interact with computers. Context-aware applications use contextual information to automatically do the right thing at the right time for the user [15]. Dey et al. [8] define context as "any information that can be used to characterize the situation of entities (i.e. whether a person, place or object) that are considered relevant to the interaction between a user and an application, including the user and the application themselves [...]". They consider that the most important contexts are location, identity, activity and time. This definition brings a new fundamental dimension into our model: the activity.

Activity. In [15], activity typically refers to actions or operations (Fig. 3b) undertaken by human beings, such as "cooking", "running", "reading". For Yang Li and James Landay [13], an activity like "running" is considered as an action focused on attaining an immediate goal. They consider, like Kuutti in [12], that an activity is the long-term transformation process of an object (e.g. a user's body) oriented toward a motive (e.g. keeping fit). The notions of "long term" and "immediate" allow the separation of activities from actions. This raises some questions not answered in this paper: what do we consider as long-term and immediate, and when does an action become an activity and vice versa? In our model, we consider that an activity is made of detected motions aggregated into operations and actions, and it is an input for the observers.
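In code terms, this definition can be stated very compactly: the observer level pairs a detected activity with the contexts in which it was performed. The fragment below is only our own illustration of that pairing.

import java.util.Map;

// "A situation is any activity performed in contexts" - illustrative pairing only.
class Situation {
    final String activity;              // e.g. "walking", detected at the entity level
    final Map<String, Object> contexts; // e.g. location, time, role at the moment of detection

    Situation(String activity, Map<String, Object> contexts) {
        this.activity = activity;
        this.contexts = contexts;
    }
}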
4 KUI Development Framework: uMove v2

uMove v2 is the second Java™-based implementation of the KUI concepts [1],[17]. It offers programmers a standard platform for developing KUI-enabled applications. The framework is separated into three layers in order to have a clear separation of concerns (Fig. 3a).
Fig. 3. a) uMove architecture, b) motion-aware model
The sensor layer contains all the widgets representing the logical abstraction of the sensors connected to the system. Above it, the entity layer holds the logical representation of the physical users or objects being observed. The activity manager aggregates the motion events into activities and makes them available to the observers. The context manager gets the sensor information, updates the entities and sends the information to the observation layer, which analyses the current situation of each entity. Observers send events to the application according to the detected situations. This model allows the programmer to concentrate on the specific needs of the application without worrying about the communication between sensors (widgets), users or objects (entities) and their management (creation, removal, modification). The activity classes must be specifically developed and can be combined to enable complex motion pattern recognition. Observer and view classes, like activity classes, are developed for specific situation analyses.
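Read as a processing pipeline, the three layers can be summarized as in the deliberately simplified sketch below; the interfaces and method names are hypothetical and do not correspond to the real uMove v2 API.

import java.util.Map;

// Hypothetical, simplified sketch of the layered flow described above
// (sensor widgets -> entity layer -> observation layer -> application).
interface Widget {                      // sensor layer: logical abstraction of one sensor
    double[] read();
}

interface ContextManager {              // entity layer: keeps entity contexts up to date
    void update(String entityId, double[] sensorData);
    Map<String, Object> contexts(String entityId);
}

interface ActivityManager {             // entity layer: aggregates motion events into activities
    String currentActivity(String entityId);
}

interface Observer {                    // observation layer: analyses the situation, notifies the app
    void evaluate(String entityId, String activity, Map<String, Object> contexts);
}

class KuiPipeline {
    void step(String entityId, Widget widget, ContextManager contexts,
              ActivityManager activities, Observer observer) {
        double[] sample = widget.read();                        // sensor layer
        contexts.update(entityId, sample);                      // entity layer: context update
        String activity = activities.currentActivity(entityId); // entity layer: activity aggregation
        observer.evaluate(entityId, activity, contexts.contexts(entityId)); // observation layer
    }
}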
5 Application Domain

Our KUI model focuses on applications which integrate motion, in a broad sense, as the main input modality. In the ubiGlide project [17], uMove allows the application to track hang-glider or paraglider motions. Based on their contexts, the application makes inferences and informs the pilot (Fig. 4) about potentially dangerous situations such as flying over a no-fly zone or a national park, or near a storm. In the ubiShop project [18], the application is in charge of controlling the inventory of the fridge in a family house or a shared flat and
of requesting one or more house/flat inhabitants to get missing items (milk, juice, eggs) according to their current location and activity. For instance, the father quietly returning on foot from work and passing by a grocery shop will be informed by the system that items are needed; however, the system does not react if it detects that the father is running or walking fast, and instead looks for somebody else. In the Smart Heating System (SHS) project [19], the application must adapt the room temperature according to the profiles of the users in the room and their current activity. The application regulates the thermostatic valves of the radiators, keeping a comfortable temperature in the room and avoiding a waste of energy.
Fig. 4. ubiGlide - pilot’s graphical user interface of FlyNav application
These projects validate uMove in three classes of applications. ubiGlide proposes a model of application for outdoor activity tracking such as trekking, mountain biking or sailing. These applications can prevent accidents by observing the user's behaviour in specific environments and by informing him/her about potential dangers. ubiShop is a validation scenario for applications that need to track motions and locations of entities in urban environments and distribute tasks in an opportunistic manner, for instance for courier or taxi services. Finally, SHS validates uMove in indoor environments and can be a model for applications providing service information on mobile devices or controlling the environment within a building according to the user's location and activities. Table 1 shows which components of our model are used in each of the three projects. Situation analysis is not yet implemented in these projects.

Table 1. Overview of the three projects developed with uMove

ubiGlide [17]: entities: flying objects, zones, mobile zones; activities: flying; contexts: flying objects, zones, mobile zones, location, speed; sensors: GPS; architecture: distributed; environment: outdoor.
ubiShop [18]: entities: people, zones, shops; activities: running, walking, standing; contexts: time, location, speed; sensors: GPS, RFID; architecture: centralised, web based; environment: indoor, outdoor.
SHS [19]: entities: people, rooms; activities: quiet, active, sleeping; contexts: location, time; sensors: RFID, accelerometer; architecture: centralised; environment: indoor.
6 Conclusion

This paper has presented a new human-computer interaction paradigm in which location-awareness and motion tracking are considered as the first input modality; we call it the Kinetic User Interface (KUI). We have presented the semantic model of KUI based on a systemic approach, described the uMove programming framework implementing KUI, and presented three projects using the KUI concepts and uMove. We believe that the KUI concept and its implementation can offer a good tool for developers to rapidly prototype applications that integrate the motions of users and mobile objects as the main implicit input modality. Based on this new semantic model, uMove v2 has been finalised, and as future work the three projects presented in section 5 will be upgraded to the new version; this will include the concepts of observers and views. Two other important challenges are planned for the near future: activity and situation modelling and their implementation. In the Smart Heating System, only three types of activities are taken into consideration, and we will propose standard interaction patterns that can be used by developers in specific applications as well as extended types of activity recognition. We will provide guidelines to help programmers properly define their system, including the entities, observers and views, before using uMove to implement the motion-aware application, activity tracking and situation analysis. User studies must also be conducted in order to verify the concept of unobtrusive interfaces, in particular for user activities that require a high level of attention, like flying, driving or manipulating dangerous equipment.

Acknowledgments. We would like to thank Denis Lalanne and Daniel Ostojic for their feedback and advice on this paper. This work is supported by the Swiss National Fund for Scientific Research, grant no. 116355.
References 1. Pallotta, V., Bruegger, P., Hirsbrunner, B.: Kinetic user interfaces: Physical embodied interaction with mobile pervasive computing systems. In: Kouadri-Mostefaoui, S., Maamar, Z., Giaglis, G. (eds.) Advances in Ubiquitous Computing: Future Paradigms and Directions, ch. 7. IGI Publishing (2008) 2. Brumitt, B., Meyers, B., Krumm, J., Kern, A., Shafer, S.: EasyLiving: Technologies for Intelligent Environments. In: Thomas, P., Gellersen, H.-W. (eds.) HUC 2000. LNCS, vol. 1927, pp. 12–29. Springer, Heidelberg (2000) 3. Cheverst, K., Davies, N., Mitchell, K., Friday, E.A.: Developing a Context-aware Electronic Tourist Guide: Some Issues and Experience. In: Proceedings of CHI 2000, Netherlands (2000)
4. von Bertalanffy, L.: General System Theory. Foundations, Development, applications. George Braziller (1969) 5. Boulding, K.: General systems theory. The Skeleton of Science 2(3), 197–208 (1956) 6. Bouvier, A.: Management et projet. Hachette, Paris (1994) 7. Vallgårda. A: A framework of place as a tool for designing location-based applications. Excercept of Master Thesis (2006) 8. Dey, A., Abowd, E.D., Salber, G.D.: A conceptual framework and a toolkit for supporting the rapid prototyping of context-aware applications. Human Computer Interaction Journal 16, 97–166 (2001) 9. Weiser, M., Brown, J.S.: The coming age of calm technology, http://www.cs.ucsb.edu/ebelding/courses/284/w04/papers/ calm.pdf 10. Calkins, H.W.: Entity-relationship modelling of spatial data for geographic information systems, http://www.geo.unizh.ch/oai/spatialdb/ergis.pdf 11. Nardi, B.A.: Context and Consciousness, vol. 1. MIT Press, Cambridge (1995) 12. Kuutti, K.: Activity Theory as a Potential Framework for Human-Computer Interaction Research. MIT Press, Cambridge (1996) 13. Li, Y., Landay, J.A.: Activity-based prototyping of ubicomp applications for long-lived, everyday human activities. In: CHI 2008: Proceeding of SIGCHI conference on Human factors in computing systems, New York, NY, USA, pp. 1303–1312 (2008) 14. Barwise, J., Gawron, J.M., Plotkin, G., Tutiya, S.: Situation Theory and its Applications. In: Center for the study of language and information - Stanford, vol. 2 (1991) 15. Loke, S.W.: Representing and reasoning with situations for context-aware pervasive computing: a logic programming perspective. The Knowledge Engineering Review, 213–233 (2004) 16. Dix, A., Finlay, J., Abowd, G.D., Beale, R.: Human-Computer Interaction. In: Pearson, 3rd edn. Prentice Hall, Englewood Cliffs (2004) 17. Bruegger, P., Pallotta, V., Hirsbrunner, B.: UbiGlide: a motion-aware personal flight assistant. In: Adjunct Proceedings of the 9th International Conference on Ubiquitous Computing, UBICOMP, Innsbruck, Austria, pp. 155–158 (2007) 18. della Bruna, D.: Ubiweb & Ubishop. Master project, Supervisors V. Pallotta, P. Bruegger, university of Fribourg – CH (2007) 19. Pallotta, V., Bruegger, P., Hirsbrunner, B.: Smart Heating Systems: optimizing heating systems by kinetic-awareness. In: 3rd IEEE International Conference on Digital Information Management (ICDIM 2008), pp. 887–892 (2008) ISBN: 978-1-4244-2917-2
On Efficiency of Adaptation Algorithms for Mobile Interfaces Navigation

Vlado Glavinic (1), Sandi Ljubic (2), and Mihael Kukec (3)

(1) Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, HR-10000 Zagreb, Croatia
[email protected]
(2) Faculty of Engineering, University of Rijeka, Vukovarska 58, HR-51000 Rijeka, Croatia
[email protected]
(3) College of Applied Sciences, Jurja Krizanica 33, HR-42000 Varazdin, Croatia
[email protected] Abstract. Many ubiquitous computing systems and applications, including mobile learning ones, can make use of personalization procedures in order to support and improve universal usability. In our previous work, we have created a GUI menu model for mobile device applications, where personalization capabilities are primarily derived from the use of adaptable and adaptive techniques. In this paper we analyze from a theoretical point of view the efficiency of the two adaptation approaches and related algorithms. A task simulation framework has been developed for comparison of static and automatically adapted menus in the mobile application environment. Algorithm functionality is evaluated according to adaptivity effects provided in various menu configurations and within several classes of randomly generated navigation tasks. Simulation results thus obtained support the usage of adaptivity, which provides a valuable improvement in navigation efficiency within menu-based mobile interfaces. Keywords: personalization, adaptation, algorithmics, m-devices, m-learning.
1 Introduction

Mobile learning (m-Learning), the intersection of online learning and mobile computing, promises access to applications supporting learning anywhere and anytime, implementing the concepts of universal access [10]. Personal mobile devices and wearable gadgets are presently becoming increasingly accessible and pervasive, while their improved capabilities make them ideal clients for the implementation of many different mobile applications [8], among which m-Learning represents one of the most important and attractive ones. However, the acceptance of new m-Learning systems is highly dependent on usability challenges, the most important of them being technology variety, gaps in user knowledge and user diversity [9]. The potential for including the widest possible
parts of the population in the interactive mobile learning process implies particular emphasis on user interface design and the quality of interaction [5]. These HCI issues become even more considerable within present-day mobile device applications (MDAs), which show a firm tendency towards increased complexity, sophisticated interfaces and enriched graphics. Hence, the development process for such MDAs must involve personalization procedures that are essential for tailoring them to individual users' needs and interaction skills. As the general framework of mobile interaction is heavily based on two interaction styles – menu-based interaction and direct manipulation – we have focused our interest on the personalization of MDAs through a transformable and moveable menu component with adaptable and adaptive features, introduced in [6] – see Fig. 1.
Fig. 1. Adaptation algorithm usage in the general menu personalization process
In this paper we analyze from a theoretical point of view the respective menu navigation efficiency, in the case where automatic interaction personalization is provided by the usage of two different adaptation algorithms.
2 Transformable Menu Component

In general, menus represent a core control structure of complex software systems, and therefore provide an interesting object for personalization research [2], especially in the mobile device environment. Here the focus is primarily on the speed of interaction between the user and a menu-based MDA, since this is considered to be one of the main factors in producing a truly usable system. Minimization of the interaction burden for the user, which is an important aspect of speed [1], can be accomplished both by avoiding a multi-screen menu hierarchy and by reducing the number of keystrokes. For that reason, our menu component has the usual well-known form, with size and shape adequately reduced according to mobile device display limitations (Fig. 2). Because of user diversity and the high probability that different users will use different navigation patterns, even when working on very similar tasks, the adaptation algorithms can generate various menu configurations [6]. A particular configuration thus personalized can be retrieved (at MDA startup) from and stored (at MDA shutdown) to the Record Management System (RMS) of the local device, or to a remote server via
Fig. 2. Menu component running on different device emulators
the respective Servlet application. RMS represents both an implementation and an API for persistent storage on Java ME devices: it provides the associated applications (Java MIDlets) the ability to access a non-volatile place for storing object states [7]. Since the RMS implementation is platform-dependent, it makes good sense to guarantee redundancy by additionally storing menu configurations on the remote server and subsequently retrieving them from it. A personalized menu must furthermore provide easy access to all of the existing menu functions, including the adaptable ones. For that reason, our menu component is thoroughly modeled using state diagrams (cf. [11]) where all available state transitions are initiated through exactly one keystroke on the mobile device input, thus providing a platform for optimal interaction efficiency (Fig. 3).
Fig. 3. Menu state diagram. The black node represents the spot where automatic adaptation is performed.
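Persisting a personalized configuration locally, as described at the beginning of this section, amounts to serializing it into a record store. A minimal sketch using the Java ME RMS API could look as follows; the store name and the use of a single record are our own illustrative choices, and the serialization of the configuration itself is left out.

import javax.microedition.rms.RecordStore;
import javax.microedition.rms.RecordStoreException;

// Minimal sketch of storing/retrieving a serialized menu configuration in RMS.
// The store name and the byte[] layout are our own illustrative choices.
public final class MenuConfigStore {

    private static final String STORE_NAME = "menuConfig";

    public static void save(byte[] serializedConfig) throws RecordStoreException {
        RecordStore rs = RecordStore.openRecordStore(STORE_NAME, true);
        try {
            if (rs.getNumRecords() == 0) {
                rs.addRecord(serializedConfig, 0, serializedConfig.length);
            } else {
                int id = rs.enumerateRecords(null, null, false).nextRecordId();
                rs.setRecord(id, serializedConfig, 0, serializedConfig.length);
            }
        } finally {
            rs.closeRecordStore();
        }
    }

    public static byte[] load() throws RecordStoreException {
        RecordStore rs = RecordStore.openRecordStore(STORE_NAME, true);
        try {
            if (rs.getNumRecords() == 0) {
                return null; // nothing stored yet, fall back to the default configuration
            }
            int id = rs.enumerateRecords(null, null, false).nextRecordId();
            return rs.getRecord(id);
        } finally {
            rs.closeRecordStore();
        }
    }
}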
Regarding the implemented adaptable (i.e. user-controlled) options (see Fig. 1), the user is provided with the ability (i) to easily control the visibility mode, (ii) to adjust menu orientation and respective docking position, (iii) to toggle the menu appearance
both between character-oriented and iconic-based styles and (iv) to manually customize both the menu header and item positions within its hierarchical scheme. On the other hand, the use of adaptive techniques means that MDA user interface changes will be partially controlled by the system itself, providing usability enhancement through increased interaction speed while getting m-Learning tasks done.
3 Adaptation Algorithms

First of all, it should be noted that automatic adaptation is based on algorithms that both monitor a user's prior navigation behavior and rearrange menu item positions within a particular popup/pulldown menu frame. In the following, two adaptation approaches are compared with the original (static) menu configuration. While a frequency-based (FB) algorithm simply changes item positions according to their selection frequencies, a frequency-and-recency-based (FRB) algorithm refines the same idea by additionally promoting the most recently selected item [3]. The difference in the related adaptation effects is visualized in Fig. 4.
[Fig. 4 illustrates an example menu frame with selection counts Item X-1: 5, Item X-2: 7, Item X-3: 0, Item X-4: 11, Item X-5: 5 and Item X-6 (hidden subset): 2, with Item X-5 as the current selection, and compares the item orderings produced by frequency-based and frequency-and-recency-based adaptation.]
Fig. 4. The difference between frequency-based and frequency-and-recency-based adaptation: while the former promotes the most frequently used (TMFU) item only, the latter additionally promotes the most recently used (TMRU) one
As shown in the figure above, after a certain period of time and a related menu navigation pattern, the initial item positions are rearranged according to a sorted item frequency list. Denoting this as a current state of the adapted menu configuration, two
outcomes are possible upon the selection of Item X-5. If FB adaptation is used, a new repositioning is expected, based on the updated item frequency list. However, if automatic adaptation is ensured by the FRB algorithm, the currently selected item (Item X-5) will be moved to the TMRU position, updating the frequency list and reordering its items. Using the recency criterion, every menu item has a "fair chance" to quickly appear and be retained in the promoted part of the item set, regardless of its current frequency value. The core of the automatic adaptation algorithm can be specified through the following pseudocode; the inner block handling the TMRU position refers to the case when the recency condition is active and can be omitted if the FB approach is used:

if (keyPressed = FIRE_BUTTON) then
    Update_Frequency_List(selected_item, item_freq++);
    if NOT position(selected_item, TMFU) then
        Update_ItemPositions(itemSet, freqList, noRecency);
        // FRB only: promote the selected item to the TMRU position
        if NOT position(selected_item, TMRU) then
            Move(selected_item, TMRU_position);
            Update_ItemPositions(itemSet, freqList, recency);
        end if
    end if
    Application_Response(selected_header, selected_item);
end if
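One possible Java realization of this pseudocode is sketched below; the data structures are our own and simplified (a single menu frame with labeled items and counters), so it illustrates the reordering logic rather than the authors' actual implementation.

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Illustrative realization of the FB/FRB reordering for a single menu frame.
// Simplified data structures; not the authors' actual implementation.
class AdaptiveMenuFrame {

    static class Item {
        final String label;
        int frequency;
        Item(String label) { this.label = label; }
    }

    private final List<Item> items = new ArrayList<Item>();
    private final boolean useRecency; // true = FRB, false = FB

    AdaptiveMenuFrame(boolean useRecency, String... labels) {
        this.useRecency = useRecency;
        for (String l : labels) items.add(new Item(l));
    }

    /** Called when the FIRE button confirms the selection of an item. */
    void select(String label) {
        Item selected = null;
        for (Item it : items) {
            if (it.label.equals(label)) { selected = it; break; }
        }
        if (selected == null) return;
        selected.frequency++;                      // update the frequency list

        // Sort all items by descending selection frequency (FB behaviour).
        Collections.sort(items, new Comparator<Item>() {
            public int compare(Item a, Item b) { return b.frequency - a.frequency; }
        });

        if (useRecency && items.indexOf(selected) > 1) {
            // FRB refinement: promote the most recently used item to the position
            // directly below the most frequently used one (the TMRU position).
            items.remove(selected);
            items.add(1, selected);
        }
    }

    List<String> order() {
        List<String> labels = new ArrayList<String>();
        for (Item it : items) labels.add(it.label);
        return labels;
    }
}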
The usage of the abovementioned algorithm is inspired by the work carried out in [2], where a similar approach was applied to adaptive split menus. In that particular research, the idea of using frequency and recency characteristics emerged from experiences gained working with the Microsoft Office 2000 suite and its dynamic menus, which adapt to an individual user's behavior. Whilst the related work is based on the desktop application environment (with the mouse as the exclusive input device), we are dealing with an MDA setting and the corresponding mobile device navigation keypad. We believe that the efficiency enhancement in mobile interface navigation provided by automatic adaptation exceeds the debatable benefits reached in desktop menu navigation.
4 Task Simulation Framework

As there is still a lack of evaluation studies capable of distinguishing adaptivity from general usability [4], we have developed a task simulation framework able to compare static and adaptive menu configurations and their respective navigation options. Since the time required for the completion of menu navigation tasks directly depends both on the time to locate a target menu header in the root menu bar and on the time to select an item from a single popup/pulldown menu frame, it is quite straightforward to specify the navigation performance level by determining the exact number of keystrokes needed for task fulfillment, within the input set of four navigation keys and a fire button (Fig. 5).
Fig. 5. If the general navigation keypad is used, it is a simple task to calculate the "keypad distance" from the starting position to destination. If Header 3 is considered as the current menu position, selecting Item 5-4 would require 7 keypad strokes: 2 RIGHT arrows, 4 DOWN arrows, and the FIRE button.
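Under these assumptions the keypad distance reduces to a simple count; the helper below reproduces the Fig. 5 example and is our own illustration (it ignores wrap-around navigation and other shortcuts a concrete phone might offer).

// Keypad distance for selecting an item: LEFT/RIGHT strokes to reach the target
// header, DOWN strokes to reach the item, plus one FIRE stroke. Our own helper;
// wrap-around navigation and other device shortcuts are ignored.
public final class KeypadDistance {

    static int keystrokes(int currentHeader, int targetHeader, int targetItemPosition) {
        int headerMoves = Math.abs(targetHeader - currentHeader); // LEFT or RIGHT arrows
        int itemMoves = targetItemPosition;                       // DOWN arrows from the header
        return headerMoves + itemMoves + 1;                       // + FIRE button
    }

    public static void main(String[] args) {
        // Fig. 5 example: from Header 3 to Item 5-4 -> 2 RIGHT + 4 DOWN + FIRE = 7.
        System.out.println(keystrokes(3, 5, 4)); // prints 7
    }
}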
Various static and adapted menu configurations can be compared, based on the aforesaid calculation method and several simulation parameters which are introduced and thoroughly explained in Table 1.

Table 1. Parameters (and structures) used in the task simulation framework

Menu configuration:
- Headers (user-defined): number of first-order menu options (number of menu headers)
- Items_MIN (user-defined): minimal number of items within each menu frame
- Items_MAX (user-defined): maximal number of items within each menu frame
- Config (random): randomly generated menu configuration with #Headers headers, each of them containing between #Items_MIN and #Items_MAX items

Task configuration:
- Picks (user-defined): number of randomly chosen menu selections
- Repetition (user-defined): percentage of repetitive selections (within the set of #Picks selections)
- Pools (user-defined): number of task subsets (for the distribution of repetitive selections)
- Task (random): randomly generated navigation and selection task, with a given number of randomly chosen menu selections and a defined distribution of repetitive selections
Simulations can be performed with different menu configurations, which are randomly generated according to a given number of menu headers (Headers) and an allowed number of items within each menu frame (with Items_MIN and Items_MAX as the limits). This way we obtain the option to analyze many different configurations, which can afterwards be classified based on menu size. The basis of the simulation process is a randomly generated navigation and selection task, which consists of an explicit number of random selections (Picks), some of
which are repetitive in accordance with a defined percentage (Repetition) and distribution (Pools). It is highly unlikely that the user will make most of her/his repetitive selections at once; therefore these selections are evenly dispersed throughout the whole task, thus generating the desired distribution (Fig. 6).
Fig. 6. Distribution of repetitive selections within a randomly generated task. Parameter values for task configuration: Picks = 50, Repetition = 30%, Pools = 5.
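The even dispersion of repetitive selections can be obtained, for instance, by splitting the task into #Pools consecutive blocks and injecting an equal share of the repeated picks into each block. The generator below is only a sketch of that idea with our own simplifications (a repetition simply re-selects the previous pick within the same pool); it is not the authors' original generator.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch of a task generator: #picks menu selections, a given percentage of which
// repeat an earlier selection, dispersed evenly over #pools consecutive blocks.
// Deliberately simplified; not the authors' original generator.
public final class TaskGenerator {

    static List<int[]> generate(int headers, int maxItems, int picks,
                                double repetition, int pools, Random rnd) {
        List<int[]> task = new ArrayList<int[]>();
        int repetitive = (int) Math.round(picks * repetition);
        int perPool = picks / pools;          // assumes picks is divisible by pools
        int repPerPool = repetitive / pools;  // repetitive picks per pool

        for (int p = 0; p < pools; p++) {
            int[] last = null;
            for (int i = 0; i < perPool; i++) {
                // place the repetitive selections at the end of each pool
                boolean repeat = (i >= perPool - repPerPool) && last != null;
                int[] pick = repeat ? last
                                    : new int[] { rnd.nextInt(headers), rnd.nextInt(maxItems) };
                task.add(pick);
                last = pick;
            }
        }
        return task; // each element: { header index, item index }
    }
}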
Obviously, if items change their initial positions within a particular menu frame (according to algorithm instructions), a variation in keypad distance for selecting a particular item in both the original and the modified menu configuration will result. The overall difference between static and adaptive configuration in the total count of keystrokes (for completing the given task) represents the interaction speed enhancement provided by (automatic) adaptivity. In the task simulation framework, this criterion will be used to evaluate the usefulness of menu adaptation and to quantify efficiency of the used adaptation algorithms.
5 Simulation Results
The measure of interaction speed improvement derived from automatic adaptation is given by (X - Y), where X stands for the number of keystrokes required to complete the generated task using the original menu configuration, while Y stands for the number of keystrokes using the adaptive one. Fig. 7 shows a sample result of an FRB adaptation simulation session. Menu configurations used within the simulation process are categorized according to structural complexity, so we basically distinguish small, medium and large scale menus. Navigating through large scale menus clearly requires increased user attention, because the related headers can extend over several display screens. Because of the random nature of the task generation process, we used exactly 100 different instances of the generated task for every particular set of simulation parameters. Consequently, 100 simulation sessions were performed for each distinct menu configuration and task class, and the final simulation results are presented as mean values of the data thus collected. Altogether 4500 simulation runs have been carried out for 9 menu configurations and 5 classes of randomly generated tasks. The mean values show an observable level of navigation efficiency enhancement for both the FB and the FRB approach. The obtained simulation results are structured and presented in Table 2.
Fig. 7. Sample result derived from the task simulation framework. In this particular case, FRB adaptation decreases the overall keypad distance by 81 keystrokes, thus reducing the input interaction burden by approximately 10%.
Task classes with a small number of Picks represent the user's interaction with the menu-based MDA in short interactive cycles. Conversely, tasks that include a very large number of selections (e.g. Class #5 with 10000 Picks) correspond to longer usage of an application with menu-based navigation. We can see that the adaptivity effects grow considerably with task duration, so the longer users work with the adaptive menu, the more their navigation efficiency improves.

Table 2. Simulation results. Each cell gives the FB / FRB result for the corresponding menu configuration and task class. For every task class, the Repetition parameter was set to 15%.

Menu configuration              Class #1     Class #2     Class #3     Class #4      Class #5
                                100 Picks    200 Picks    500 Picks    1000 Picks    10000 Picks
Scale    Headers  Items          5 Pools      5 Pools     10 Pools     30 Pools      300 Pools
                  [min-max]     FB / FRB     FB / FRB     FB / FRB     FB / FRB      FB / FRB
small     3       2-5            5 / 9       14 / 19      42 / 70      40 / 105      102 / 652
small     4       2-5            5 / 8       17 / 23      43 / 67      71 / 152       98 / 584
small     5       2-5            4 / 6       17 / 19      32 / 41      44 / 79       118 / 770
medium    6       3-8           14 / 18      25 / 31      50 / 71     133 / 255      330 / 1957
medium    7       3-8           13 / 16      37 / 43      99 / 132    116 / 217      257 / 1348
medium    8       3-8           16 / 19      52 / 59     100 / 123    142 / 241      503 / 2240
large     9       4-11          30 / 34      57 / 64     128 / 160    187 / 318      785 / 3027
large    10       4-11          29 / 32      65 / 71     150 / 178    185 / 298      692 / 2767
large    10       8-12          34 / 38      85 / 94     212 / 247    255 / 417     1081 / 4263
On the other hand, according to the simulation results, the benefit of implementing adaptivity is questionable in small scale menu configurations, especially within infrequently used applications. In such menus all popup/pulldown frames can be
expanded on a single display screen, so there is no need to navigate to hidden item subsets, and rearranging items according to the user's prior navigation patterns has no manifest significance. Regarding the recency criterion, in most cases FRB adaptation resulted in a better enhancement than the FB approach, regardless of menu configuration scale. This difference in adaptivity effects becomes more prominent for tasks formed by a larger number of menu selections (e.g. within task class #5, FRB adaptation outperforms the FB approach several times over). Hence, promoting the most recently used items within a particular menu frame is preferable in MDAs that require frequent use of a navigation-and-selection interaction style (as is the case in e.g. m-learning applications). Generally speaking, the simulation outcomes support the concept and confirm the usefulness of implementing adaptive techniques for menu navigation in the mobile application environment. Nevertheless, it should be noted that the above conclusions emerge from theoretical results. It is quite hard to model real application tasks using random generators, because actual navigation patterns contain somewhat more predictable sequences of menu selections than our task simulation framework produces. For this reason, we can expect even better results in real application adaptation scenarios. Users' possible navigation mistakes, impressions and levels of satisfaction while working with adaptive interfaces are excluded from this analysis, as the groundwork for these indicators (e.g. usability testing) is not yet in place.
6 Conclusion and Future Work
M-learning systems, one of our main research interests, will certainly become an additional asset in the wide-ranging process of lifelong learning. When developing m-learning MDAs, there is a strong incentive to fully exploit advances in mobile device technology and to make these applications powerful, graphically rich and usable. Hence, following the concept of universal usability, our efforts are focused on the quality of mobile user interaction. We make use of personalization procedures in order to enable users to work with MDA interfaces that are adjusted to their preferred individual interaction patterns, thus making users faster and more satisfied in performing the assigned (m-learning) tasks. In our previous work, we introduced a transformable menu model for MDAs, with personalization capabilities derived from the use of both adaptable and adaptive techniques. The model is implemented as a Java ME API extension, and can easily be reused in similar applications (not necessarily m-learning ones). In this paper we deal with system-driven personalization of the presented menu model and with the efficiency of adaptation algorithms. Various static and adaptive menu configurations are compared within a task simulation framework, and the results thus obtained confirm that adaptivity makes a difference, providing a valuable improvement in navigation efficiency within menu-based mobile interfaces. Directions for future work include a better understanding of the mutual influence between user diversity and automatic interaction adaptation. We would like to identify the conditions under which the benefit of adaptation is more
valuable than the possible loss of control due to unexpected changes of the menu configuration. Results derived from the described task simulation framework will be substantiated with new research outcomes based on running adequate usability tests. Moreover, for every presented and completed user task, appropriate time measurements will be carried out, within both static and adaptive menu-based applications. With the results thus collected, we expect to gain a better insight into the correlation between theoretical and empirical adaptation effects. Acknowledgments. This paper describes the results of research being carried out within the project 036-0361994-1995 Universal Middleware Platform for e-Learning Systems, as well as within the program 036-1994 Intelligent Support to Omnipresence of e-Learning Systems, both funded by the Ministry of Science, Education and Sports of the Republic of Croatia.
References 1. Anderson, D.J.: Speed is the Essence of Usability (Editorial). UIdesign.net: The Webzine for Interaction Designers (1999), http://www.uidesign.net/1999/imho/sep_imho2.html 2. Findlater, L., McGrenere, J.: A Comparison of Static, Adaptive, and Adaptable Menus. In: Proc. ACM SIGCHI Conf. Human Factors in Computing Systems (CHI 2004), pp. 89–96. ACM, New York (2004) 3. Gajos, K.Z., Czerwinski, M., Tan, D.S., Weld, D.S.: Exploring the Design Space for Adaptive Graphical User Interfaces. In: Proc. 8th Int’l. Working Conf. Advanced Visual Interfaces (AVI 2006), pp. 201–208. ACM, New York (2006) 4. Glavinić, V., Granić, A.: HCI Research for E-Learning: Adaptability and Adaptivity to Support Better User Interaction. In: Holzinger, A. (ed.) USAB 2008. LNCS, vol. 5298, pp. 359–376. Springer, Heidelberg (2008) 5. Glavinic, V., Ljubic, S., Kukec, M.: A Holistic Approach to Enhance Universal Usability in m-Learning. In: Mauri, J.L., Narcis, C., Chen, K.C., Popescu, M. (eds.) Proc. 2nd Int’l. Conf. Mobile Ubiquitous Computing, Systems, Services and Technologies (UBICOMM 2008), pp. 305–310. IEEE Computer Society, Los Alamitos (2008) 6. Glavinic, V., Ljubic, S., Kukec, M.: Transformable Menu Component for Mobile Device Applications: Working with both Adaptive and Adaptable User Interfaces. International Journal of Interactive Mobile Technologies (iJIM) 2(3), 22–27 (2008) 7. Mahmoud, Q.: MIDP Database Programming Using RMS: a Persistent Storage for MIDlets. Sun Developer Network (SDN) - Technical Articles and Tips, http://developers.sun.com/mobility/midp/articles/persist/ 8. Roduner, C.: The Mobile Phone as a Universal Interaction Device – Are There Limits? In: Rukzio, E., Paolucci, M., Finin, T., Wisner, P., Payne, T. (eds.) Proc. of the MobileHCI Workshop on Mobile Interaction with the Real World (MIRW 2006), pp. 30–34 (2006) 9. Shneiderman, B.: Universal Usability: Pushing Human-Computer Interaction Research to Empower Every Citizen. Comm. ACM 43, 85–91 (2000) 10. Stephanidis, C.: Editorial. International Journal - Universal Access in the Information Society 1, 1–3 (2001) 11. Thimbleby, H.: Press On: Principles of Interaction Programming. The MIT Press, Cambridge (2007)
Accessible User Interfaces in a Mobile Logistics System Harald K. Jansson1, Robert Bjærum2, Riitta Hellman3, and Sverre Morka2 1
Norkart Geoservice AS, Løkketangen 20a, 1300 Sandvika, Norway
[email protected] 2 Tellu AS, Hagaløkkveien 13, 1383 Asker, Norway {robert.bjarum,sverre.morka}@tellu.no 3 Karde AS, P.O. Box 69 Tåsen, 0801 Oslo, Norway
[email protected] Abstract. In this paper, we focus on ICTs for young people attending occupational rehabilitation and training. An important goal is to develop ICTs that dramatically decrease the need for reading and writing. The UNIMOD-prototype demonstrates how mobile phones can be used as the main and only ICT-device by truck drivers who deliver mats from the laundry to a large number of companies and public places. The mobile phone can be used in the truck for navigation according to the traffic situation and geography, and for handling the customer and delivery information. The test sessions show that mobile phones offer an excellent point of departure for the development of simple and intuitive services that support users with cognitive declines. Keywords: Accessibility, Cognitive disabilities, GIS, Mobile solutions.
1 Introduction
The influx of people registered as unfit for work is steady, if not increasing in Europe. In particular, incapacity for work amongst young employees increases continuously. There are also large numbers of so-called drop-outs, i.e. young people who drop out of school for various reasons. One of the reasons for not completing basic or occupational education is learning disabilities, such as dyslexia. According to an OECD-study [6], approximately 30 % of adults have difficulties in reading and writing, to such an extent that it is difficult for them to handle daily activities at study or work. Other cognitive declines [14], such as concentration problems are also rather common among young people. In many European countries, different occupational training and rehabilitation policies and programmes have been established to combat unemployment due to occupational disability. In Norway, there is a large number of enterprises that are dedicated to and specialized in occupational rehabilitation [1]. Such enterprises are organized as shareholder companies where the main shareholder usually is the local municipality. The services provided for occupationally disabled persons include assessment of the potential work and educational capacity of the individual and qualification of the
individual through individually adapted job training and guidance. The enterprises qualify occupationally disabled persons in real work environments. In this paper we focus on ICTs for young people attending occupational rehabilitation and training. An important goal is to develop ICTs which decrease the need for reading and writing dramatically. In connection with the R&D-work of the UNIMOD-project, the authors have collaborated with the rehabilitation enterprise ÅstvedtGruppen and their logistics team [2] on specifying and testing the prototype. 1.1 The UNIMOD-Project The main objective of the UNIMOD-project [12] is to develop new knowledge of multimodal, personalized user interfaces, and thus to improve the accessibility and use of electronic services. The UNIMOD-prototypes are based on real cases, and show how to increase the accessibility of the user interface on the mobile phone. The project presented in this paper addresses users with different kinds of cognitive declines in such areas as memory, problem-solving, orientation, attention/concentration, reading/writing, learning and verbal/visual/mathematical comprehension [4]. These areas are crucial to support in order to achieve an inclusive HCI-design for ICTs [1], such as mobile phones [14]. The so-called Åstvedt-prototype of the UNIMOD-project demonstrates how mobile phones can be used as the main and only ICT-device by truck drivers who deliver mats from the laundry to a large number of companies and public places in the Bergen area in Norway. The mobile phone will be used in the truck for navigation according to traffic situation and geography, and for handling the customer and delivery information. The Åstvedt-prototype results from software development collaboration between three UNIMOD-partners: Norkart Geoservice [9] delivers GIS, Tellu [11] develops applications for mobile phones, and ÅstvedtGruppen [2], a large rehabilitation enterprise (cf. Chapter 1). Researchers from Karde [5] and Norwegian Computing Center [8] have performed user requirements analyses and usability tests. 1.2 The UNIMOD-Prototype in a Nutshell Based on field studies, i.e. empirical observations of (a) the collaboration between the truck driver and the co-driver, (b) available documents (driving instructions, delivery information etc.), (c) the variety of different mats and other “deliverables”, (d) concrete traffic situations and actual navigation, (e) the architecture of public places and buildings, and (e) customer behaviour and preferences, the UNIMOD-team has developed a two-dimensional model for the mobile user interaction. The first dimension of the model handles geographic and navigational information in a user-centered way. The solution suggests a route, but the truck driver makes the concrete navigation decisions depending on the work situation. This dimension also handles the delivery information (i.e. delivering clean mats and picking up dirty ones). It is possible to change the order of delivery stops, and to switch between different representations of the route. The second dimension manages the presentation. The presentation is based on a multimodal user interface, and a “minimal information model”. Interactive forms of information input follow design guidelines developed in earlier projects [4]
Multimodality enables alternative presentations of the very same information (e.g. points of delivery on a map instead of a list of addresses). The minimal information model shows just the necessary information, until the user asks for more. The users require that the user interface imposes the lowest possible cognitive load, and that it must be possible to operate the application multi-modally, depending on personal preferences. In all modalities, the progress of the working day is shown to the user. This is considered an important motivating factor for both the truck driver and the co-driver. In the remainder of this paper, the R&D-work concerning the prototype is presented. First, we discuss the opportunities and limitations of GIS on mobile phones. Second, we make some remarks about mobile phone technology and the challenges it poses to developers. Finally, we present the HCI of the prototype.
2 Challenges and Opportunities for Mobile GIS
In general, it is fair to say that navigating with the use of a map is not a trivial task. Even a simple map is saturated with topographical data, road data, buildings and landmarks. When constructing a good map, it is always important to have a good idea of what information it should convey, what it will be used for, and in what context it will be used, and it is certainly important to have a notion about the end user. The mobile digital map differs from its analogue counterpart in some important ways. A traditional printed map is static in nature; it has a set scale and gives no option of filtering the information in the map. It is mobile in the sense that you can bring it with you, but it offers no context-sensitive help or guidance (GPS, nearby points of interest, turn-by-turn navigation etc.). However, it often provides detailed, high-resolution data and a well-defined cartography. Therefore, it is common that printed maps have a specific theme, and if you need other information you go out and buy a separate map. A digital map, on the other hand, has the ability to present dynamic data. Also, depending on the hardware and software platform, it can assist the user in simple tasks like knowing his or her position, calculating the route and distance to a destination, gathering context information (speed, bearing etc.), and communicating with similar devices or other users. This makes the digital map a very flexible tool, which can be tailored to match the user's abilities and the use context.
2.1 Technical Constraints Pertaining to Mobile Maps
While mobile digital maps excel at providing dynamic, context-based information, they are less good at displaying complex cartography. This has a lot to do with the relatively small screen sizes associated with such devices, but also with the resolution of the screens. A typical mobile phone has a resolution of 240x320 pixels on a 2.8 inch display, which gives little room for detail. Moreover, mobile screens often render colours in disparate ways, which constricts the map to a small colour space. Mobile devices have numerous technical constraints in addition to the screen limitations which do not affect the map directly. They often have cumbersome interfaces,
lack the inclusion of a decent “qwerty-keyboard”, they rely on battery power, and they have unreliable and low data connections (at least in comparison to an ordinary PC). Such limitations play an important role when mobile systems are designed for professional use at work. 2.2 Mobile Cartography Mobile cartography is a concept described by Reichenbacher as follows [10]: “Mobile cartography deals with theories and technologies of dynamic cartographic visualization of spatial data and its interactive use on portable devices anywhere and anytime under special consideration of the actual context and user characteristics.” While the prototype application did not use a client side vector map rendering engine, we could make small server-side adjustments to enhance readability on small screens. One of the things we encountered through user feedback during the project was the lack of road names at certain spots on the map. A given map which may work well in a desktop solution will lack essential information because of the limitations connected with the small screen. The map engine uses pre-rendered map tiles from a WMS-source (Web Map Service), so in this case the rules pertaining to road names had to be changed. We changed text to occur more frequently in vector and hybrid maps, resulting in a more informative map for our mobile users (Fig. 1).
Fig. 1. Text changes in maps for mobile users: more frequent information
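For orientation, fetching one pre-rendered tile from a WMS source amounts to issuing a standard GetMap request; the sketch below (plain Java) shows the shape of such a request URL. The server address, layer names and coordinate reference system are placeholders and not those of the actual service behind the prototype:

// Builds a standard WMS 1.1.1 GetMap request for a single pre-rendered map tile.
public final class WmsTileRequest {

    public static String tileUrl(double minX, double minY, double maxX, double maxY,
                                 int widthPx, int heightPx) {
        return "http://example.org/wms?SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap"
             + "&LAYERS=roads,buildings&STYLES="          // placeholder layer names
             + "&SRS=EPSG:4326"                           // placeholder reference system
             + "&BBOX=" + minX + "," + minY + "," + maxX + "," + maxY
             + "&WIDTH=" + widthPx + "&HEIGHT=" + heightPx
             + "&FORMAT=image/png";
    }
}

Because the tiles arrive as finished raster images, cartographic rules such as how often road names are repeated have to be adjusted on the server side, as described above.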
2.3 Aerial Photos, Hybrid Maps and Waypoint-Based Navigation
It is a well-known fact that landmark navigation is a simple and intuitive method of getting from one place to another, given some knowledge about the route. It is, for example, common to use landmarks when explaining a route to another person. On a vector-based map, such objects are often hard to spot, as the features that characterize a building or structure are often lacking from the dataset. In these cases the use of orthophotos, which are terrestrial images that have been geometrically corrected in such a way that they can be used as maps, may be useful. These pictures provide extra details pertaining to the user's surroundings and allow for landmark-based navigation. Hybrid maps, which are orthophotos with vector data layered on top of them, are also a good alternative and can be used as an addition to the vector map. Fig. 2 illustrates the three types of maps. Our user tests showed that a number of users preferred orthophotos to vector maps.
Fig. 2. Vector map, orthophoto and hybrid map of the same point of interest. Different end users may prefer different presentations.
2.4 Delivery List and Its Map Representation The delivery list (Fig. 3) is a central artifact connected to the work process of the truck drivers. It tells them where to drive and what to deliver. One of the early objectives of the UNIMOD-prototype project was to facilitate multiple ways of viewing the delivery list. We wanted a simple model that could be viewed as a geographically ordered list or as waypoints along a route in a map, depending on the user’s cognitive abilities and preferences. One alternative was to create a simplified route-view, which had a loose connection to the actual geographical locations, much like a modern subway map. The map representation of the list has the same interface mechanisms for going back and forth between waypoints, or selecting delivery spots, as the normal list. Either view contains the same amount of information, so the user should not need to change display to access another kind of data. The map view was intended for users with limited or no knowledge of the route, and the list view was intended for users who were familiar with the locations, and only needed to know the name of the next stop. 2.5 Challenges of Mobile Computing The prototype client for the UNIMOD-project was developed for mobile phones. As pervasive computing comes with a lot of additional challenges, this is most notable on cell phones. According to Forman et al. [3], the main challenges in mobile computing can be divided into three fields. These are wireless communication, mobility and portability. The concrete problems, in addition to wireless communication, are related to disconnection, low bandwidth, high bandwidth variability, heterogeneous networks and security risks. For the UNIMOD-prototype, the wirelessness is a real challenge. The problem with wireless communication is that there is no guarantee that the device is connected to the network. As the mobile client is developed for use in a truck delivering goods, there might be environmental issues that block both GPS and GPRS communication, such as tall buildings. Hence the application can not rely on communication at all times. For instance, the mobile client can use GPS-positions to provide relevant user interface and activate the current views. However, it is necessary to allow the user to override application navigation to avoid a deadlock when the GPS-signal is lost.
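A minimal sketch of this override behaviour (plain Java; the class names and the timeout are illustrative assumptions, not values taken from the prototype):

// Activates the view for the nearest delivery stop when a GPS fix arrives, but
// always lets the driver page through the stops manually, so a lost GPS signal
// can never lock the application in one view.
interface Route {
    int indexOfNearestStop(double lat, double lon);
    int size();
}

public final class ViewActivator {

    private int currentStop = 0;
    private long lastFixTime = 0;

    public void onGpsFix(double lat, double lon, long timestamp, Route route) {
        lastFixTime = timestamp;
        currentStop = route.indexOfNearestStop(lat, lon);
        showStop(currentStop);
    }

    // Manual override: works whether or not a GPS fix is available.
    public void onNextPressed(Route route) {
        if (currentStop < route.size() - 1) {
            currentStop++;
        }
        showStop(currentStop);
    }

    public boolean gpsConsideredLost(long now) {
        return now - lastFixTime > 30000L;   // e.g. no fix for 30 seconds (assumed threshold)
    }

    private void showStop(int index) {
        // update the list or map view for this stop
    }
}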
Fig. 3. A paper-based delivery list. Exceptions are written on it etc. At the end of the day, it may not look like this at all, with comments and coffee stains on it. Perhaps a bit of the sheet is torn off, because the co-driver needed something to write a phone number on…
Another reason why mobile computing relies on wireless connection is that it is necessary to reduce the amount of computation and processing executed on the mobile device to maintain battery capacity. Wireless communication allows the client to delegate computational work to a central server, by sending some parameters and receiving the outcome of the computation from the server. In fact, we experienced some problems with the prototype as a consequence of congestion when sending much data during a short time interval. The map service used by the client sends a grid of nine images of 250x250 pixels. This would not be a problem for stationary devices, as TCP has mechanisms that deal with congestion. For the UNIMOD-prototype, this caused the application to crash. We solved this by using a proxy on the application server. This assured that just one image is sent at a time. For the end-users in real working situations, such mechanisms are vital. The next challenge of mobile computing is portability. This challenge includes such aspects as the lack of standardisation on mobile devices with respect to screen sizes, input interface, communication features (such as Bluetooth, GPS, Wi-Fi etc.) and other hardware constraints. Other concrete constraints are battery capacity, small and different user interfaces and relatively small storage capacity. Fortunately, the latter is not a great issue anymore, as new devices ship with at least 40 megabytes of RAM with the possibility to increase this by inserting memory cards. However, the constrained application memory (heap) is still an issue, and we experienced problems with it during the prototype development.
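The serializing proxy can be sketched as follows (plain Java; the prototype's actual proxy lives in the application server and its code is not given in the paper — this only illustrates the idea of letting at most one tile response be in flight at a time):

import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;

// Forwards map tile requests to the map server, but serves at most one tile at
// a time, so the slow GPRS link never receives the whole nine-tile grid at once.
public final class TileProxy {

    private final Object lock = new Object();

    public void forwardTile(String tileUrl, OutputStream toClient) throws Exception {
        synchronized (lock) {                       // one tile in flight at a time
            InputStream fromMapServer = new URL(tileUrl).openStream();
            try {
                byte[] buffer = new byte[4096];
                int read;
                while ((read = fromMapServer.read(buffer)) != -1) {
                    toClient.write(buffer, 0, read);
                }
                toClient.flush();
            } finally {
                fromMapServer.close();
            }
        }
    }
}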
There are three main concerns regarding portability on mobile devices. It is important to reduce the amount of heap to an absolute minimum, to reduce the processing needed in order to preserve battery and to make generic interfaces with respect to both screen and user input. One of the factors that make it difficult to achieve processing and memory heap economy is that the API used on the relevant devices is Java. Java is an intuitive programming language, and it is easy to implement it on all systems. Unfortunately it is not efficient with respect to memory management and economic processing. The reason for this is that Java is an interpreted language, meaning that it is a high level that requires a lot of redundant processing and large structures to fit in the memory. Solutions to reduce processing are distributed processing by external server for heavy computations, use of native functionality to avoid processing overhead whenever possible and keeping data structures at an absolute minimum. The features in the prototype that are the largest threat to heap consumption are the images used by the navigation module and the paths between destinations. To solve this we had to implement mechanisms that assured that only visible images were in the memory, and persistent storage of all the other images to reduce communication latency and potential charge for bandwidth usage. As for paths we have reduced the number of points needed to draw the path. With regard to external processing, this is implemented on the paths between destinations. All paths are calculated on the server. The client requests the path with the start and the destination coordinates as parameters, and the server responds with the shortest path to the destination. 2.6 The Development Framework The UNIMOD-partner Tellu has developed a framework that solves most of the issues connected with wireless communication. This framework is called ActorFrame [7]. It is an open framework that connects devices to a message bus. The framework can operate on most of the protocols and connections used in mobile computing. ActorFrame divides the application into a number of Actors. An actor is a module that serves a responsibility (called “role” in ActorFrame) in the application. One actor may consist of several inner actors. Seen from the outside the application consist of one actor, usually with multiple inner actors. ActorFrame assures that there is a connection between all peers comprising the application at all times, using a message bus. The prototype consists of tree actors on the server side and two actors on the client side. These actors are in addition to standard actors that are part of the framework, such as resource manager, name server etc. The actors used by the server are:
MapServer handles map tile requests and transformation and visual representation of navigation.
UnimodServer handles handshake with the client, and routes the other messages to their respective actors.
UnimodFileedge parses the delivery lists and prepares initial data, such as default spider paths and initial images, persistent delivery objects etc.
The client consists of the following actors, in addition to framework actors:
MapClient handles map requests, path requests and mapping to the client view. This actor handles all interaction with the map view, including the module assuring persistent storing of map images.
UnimodClient handles the more application specific logic, such as sorting delivery lists, handling progress, user interaction etc.
ActorFrame maintains a persistent connection between the client and the server at all times, using a router that allows the message bus to communicate over GPRS; it is therefore adequate for continuous communication. This is convenient for dynamic route updates from base, or for reporting schedule changes back to base. ActorFrame also allows communication between clients. This may be convenient in case of obstacles on the road, or when a truck lacks delivery objects: the client can simply broadcast a message about this, or send it directly to another client, and that truck might make the delivery instead.
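The path service described above can be illustrated by the message pair exchanged between the client and the server; the sketch below uses plain Java value objects with made-up names — ActorFrame's own actor and message classes are not reproduced here:

import java.util.List;

// Sent by the client: only the start and destination coordinates travel over
// the air; all route computation stays on the server.
final class PathRequest {
    final double startLat, startLon;
    final double destLat, destLon;

    PathRequest(double startLat, double startLon, double destLat, double destLon) {
        this.startLat = startLat;
        this.startLon = startLon;
        this.destLat = destLat;
        this.destLon = destLon;
    }
}

// Returned by the server: the shortest path, already thinned to a small number
// of points so that it fits the handset's memory and can be drawn directly.
final class PathResponse {
    final List<double[]> points;   // each element is {lat, lon}

    PathResponse(List<double[]> points) {
        this.points = points;
    }
}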
3 The Prototype Using a hierarchical ordering of screens, and having the intended user in mind (i.e. users with potential cognitive declines), the prototype was kept as simple as possible. The end user should only need to see information pertaining to the task at hand, while also maintaining an overview of the delivery process and the route. The UNIMOD-prototype has six levels of screens. (Fig. 4). The first two levels are used as preliminary steps to the actual delivery. Level one lets the user choose which truck (carrier) should be used. Level two has a list of deliverables pertaining to the chosen truck. When the carrier has finished loading the truck he is presented with one of the level three screens, depending on a user setting. This level holds all supernodes in the route, and the carrier can choose between list mode or map mode. When the carrier has arrived at a supernode, he can expand it to see which subnodes (actual delivery spots) it contains. In this particular case the deliverables were doormats, and for each supernode (building) there could be multiple subnodes (entrances). While at the level four screen, the carrier can choose to recursively check the subnode with its products as delivered. This is to avoid unnecessary interface navigation, but the user can – if necessary – dig one step further, to level five. If the user is not sure about what kind of goods to deliver, or what number of goods to deliver, or if there is a mismatch for some reason, the user is given the option to check off single deliveries at the level five screen. When finished with all deliveries, the user is presented with a message box, indicating a job well done. At all times during the delivery, a status bar is shown in the upper portion of the screen. This is to give the carrier a quick overview of the delivery route progress, even while at low levels in the interface.
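The supernode/subnode structure and the recursive check-off on level four can be captured in a small tree model; the sketch below (plain Java, illustrative names) also shows where the data for the ever-present progress bar comes from:

import java.util.ArrayList;
import java.util.List;

// A node in the delivery hierarchy: a supernode (building) contains subnodes
// (entrances), and a subnode contains the individual deliverables (mats).
public final class DeliveryNode {

    private final String name;
    private final List<DeliveryNode> children = new ArrayList<DeliveryNode>();
    private boolean delivered;

    public DeliveryNode(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    public void add(DeliveryNode child) {
        children.add(child);
    }

    // Level-four shortcut: checking a node recursively checks everything below
    // it, so the carrier only visits the level-five screen when needed.
    public void markDelivered() {
        delivered = true;
        for (DeliveryNode child : children) {
            child.markDelivered();
        }
    }

    // The two counts below feed the progress bar shown at the top of every screen.
    public int totalLeaves() {
        if (children.isEmpty()) {
            return 1;
        }
        int total = 0;
        for (DeliveryNode child : children) {
            total += child.totalLeaves();
        }
        return total;
    }

    public int deliveredLeaves() {
        if (children.isEmpty()) {
            return delivered ? 1 : 0;
        }
        int done = 0;
        for (DeliveryNode child : children) {
            done += child.deliveredLeaves();
        }
        return done;
    }
}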
Fig. 4. The UNIMOD-prototype and the work flow. The bar that indicates the progress is shown at the top of the screens. The blue colour indicates the degree of completion.
4 Conclusion
In this paper we have presented our R&D-work to increase the usability and accessibility of applications on mobile phones. Walkthroughs were applied as the main test methodology, allowing the designers and developers to communicate about the prototype. Feedback from the expert users concerned more intuitive navigation, the need to increase the visual clarity of map symbols, and the possibility to use multimodal input when registering exceptions on the delivery route. The overall impression from the test sessions is that mobile phones offer an excellent point of departure for the development of simple and understandable services which support users with cognitive declines, such as people with dyslexia.
It is, however, necessary to keep addressing the accessibility requirements connected to physically small screens and the interactivity designs that apply to mobile devices. It is also necessary to address the constraints of multimodality. The flexibility afforded by multimodality raises considerable challenges for the users who interact with their systems, services and devices. This concern is connected to the overload that may be generated by the introduction of several modalities, such as combinations of visual and audio information, and by the opportunity to choose. Finally, there is the question of suitable use contexts for the mobile phone. The UNIMOD-prototype clearly shows the potential of mobile phones in professional use contexts. Acknowledgments. The research work has been partially financed by the Norwegian Research Council. Personnel from the ÅstvedtGroup have made the empirical work possible. Special thanks go to the truck drivers and co-drivers for driving the researchers around Bergen and commenting on the prototype.
References 1. Association of Vocational Rehabilitation Enterprises, http://www.attforingsbedriftene.no/uk/home.aspx 2. Åstvedt Logistikk, A.S., http://www.astvedt.no/?aid=9045819 3. Forman, G.H., Zahorjan, J.: The Challenges of Mobile Computing. IEEE Computer, 38–47 (April 1994) 4. Hellman, R.: Universal Design and Mobile Devices. In: Stephanidis, C. (ed.) HCI 2007. LNCS, vol. 4554, pp. 147–156. Springer, Heidelberg (2007) 5. Karde AS, http://www.karde.no/karde-web/Karde_engelsk.html 6. Learning a Living. First Results of the Adult Literacy and Life Skills Survey. Organisation for Economic Co-operation and Development (2005), http://www.oecd.org/dataoecd/44/7/34867438.pdf 7. Melby, G., Husa, k.E.: ActorFrame Developers Guide, Technical Report, Ericsson (2005) 8. Norwegian Computing Center, http://www.nr.no/ 9. Norkart Geoservice AS, http://www.norkart.no/wip4/detail.epl?cat=1077 10. Reichenbacher, T.: The world in your pocket – Towards a mobile cartography. In: Proceedings of the ICC 2001, Beijing, China, pp. 2514–2521 (2001) 11. Tellu AS, http://www.tellu.no/tellu_webpage.html 12. Universal Design in Multi-modal Interfaces, http://www.unimod.no 13. WebAim: Cognitive Disabilities - Design Considerations, http://www.webaim.org/articles/cognitive/design.php 14. WebAim: Cognitive Disabilities - Introduction, http://www.webaim.org/articles/cognitive/
Multimodal Interaction for Mobile Learning Irina Kondratova National Research Council Canada Institute for Information Technology 46 Dineen Drive, Fredericton, NB, Canada E3B 9W4 {Irina.Kondratova}@nrc-cnrc.gc.ca
Abstract. This paper discusses issues associated with improving the usability of user interactions with mobile devices in mobile learning applications. The focus is on using speech recognition and multimodal interaction in order to improve the usability of data entry and information management for mobile learners. To assist users in managing mobile devices, user interface designers are starting to combine the traditional keyboard or pen input with "hands free" speech input, adding other modes of interaction such as speech-based interfaces that are capable of interpreting voice commands. Several research studies on multimodal mobile technology design and evaluation were carried out within our state-of-the-art laboratories. The results demonstrate the feasibility of incorporating speech and multimodal interaction in designing applications for mobile devices. However, there are some important contextual constraints that limit applications with speech-only interfaces in mobile learning, including social and environmental factors, as well as technology limitations. These factors are discussed in detail. Keywords: Mobile usability, multimodal interaction, speech recognition, mobile evaluation.
1 Introduction
Many researchers see great value in mobile learning because of the portability, low cost and communication capabilities of mobile devices [21]. Mobile devices are becoming an increasingly popular choice in university and school classrooms, and are increasingly being adopted by "lifelong learners". Several features of mobile technologies make them attractive in learning environments, among them the relatively low cost of mobile devices [25] and their good fit within the informatics and social layers of classroom communications [20]. Evaluations of mobile technologies within the classroom environment are largely positive [1, 24]. However, widespread use of mobile technology in learning applications is impeded by numerous usability issues with mobile devices. The gravity of mobile usability problems is highlighted by recent surveys of mobile Internet users [22]. They show that usability is by far the biggest source of frustration among the users of mobile technologies. In particular, for learning applications, research shows that the most important constraining factors for widespread mobile learning adoption, along with battery life, are the screen size and user interface of most portable devices [17].
This paper explores possible improvements in the usability of mobile devices that are facilitated by the use of natural user interfaces to enhance interaction. In section two of the paper the author provides background information on speech-based interaction with mobile devices and on the technologies involved. This section also addresses the concept of multimodality and multimodal applications for interaction with mobile devices. The follow-up section discusses several laboratory studies conducted to evaluate the efficacy and feasibility of multimodal interactions with mobile devices and their potential applications to mobile learning. The author concludes with observations on the potential for incorporating speech and multimodal technologies in the mobile learning domain and on some limitations of these technologies.
2 Alternative Interaction Modalities 2.1 Speech as an Interaction Modality In order to assist users in managing mobile devices, user interface designers are starting to combine the traditional keyboard or pen input with “hands free” speech input [28], adding other modes of interaction such as speech-based interfaces that are capable of interpreting voice commands [23]. As a result, speech processing is becoming one of the key technologies for expanding the use of handheld devices by mobile users [18]. In the eLearning technology foresight, technology-based education guru Tony Bates predicted that: “A new computer interface based on speech recognition will have a major impact on the design of e-learning courses” [15]. Currently, automated speech recognition (ASR) technology is being used in desktop e-learning applications for automated content-based video indexing for interactive e-learning [29], audio–clip retrieval based on student questions [30], and, together with speech synthesis, to improve accessibility of e-learning materials for visually impaired learners [3, 4]. Another novel application of mobile technology for experiential learning is being developed for functionally illiterate adults [14]. This application employs speech recognition and text-to-speech to assist adult literacy learners in improving pronunciation of words they learn. 2.2 Multimodal Interaction Speech technology seems to be ideally suited for enhancing usability of mobile learning applications designed for the mobile phone. In this domain speech is a natural way of interaction, especially where a small screen size of a mobile device limits the potential for a meaningful visual display of information [2]. However, speech technology is limited to only one form of input and output - human voice. In contrast to this, voice input combined with the traditional keyboard-based or pen-based input permits multimodal interaction where the user has more than one means of accessing data in his or her device [16]. This type of user interface is called a multimodal interface [5]. Multimodal interfaces allow speedier and more efficient communication with mobile devices, and accommodate different input modalities based on user preferences and the usage context. A field trip learning environment, offers the most comprehensive scenario for using of speech and multimodal interaction with mobile device. For
example, in a field trip scenario for a group of engineering students, a student can request information about the field structure (bridge, building, road, etc.) from the course repository using "hands free" voice input on a "smart phone" (hybrid phone-enabled PDA). The requested information would then be delivered as text, a picture, a CAD drawing, or video, if needed, directly to the PDA screen. The student would be able to enter field notes into forms using a portable keyboard or a pen, if appropriate, or via voice input during field data gathering. In addition to this, free-form verbal field notes could be attached to the collected data as an audio file and later analyzed in class [6].
3 Evaluations of Mobile Speech and Multimodal Technologies This section compares several applications of mobile multimodal technologies (speech-based and keyboard/stylus). In particular, the focus is on user evaluations of these technologies conducted to study the feasibility and efficacy of speech-based and multimodal interactions in different contexts. This comparison will form the basis for author’s estimate for potential of using speech as an interaction technique in various learning contexts. 3.1 Speech vs Stylus Interaction Comparison of efficacy of speech-based and stylus-based interaction with a mobile device was conducted as a part of our research in the area of mobile field data collection that focus on multimodal (including voice) field data collection for industrial applications. We investigated the use of technologies that allow a field-based concrete testing technician to enter quality control information into a concrete quality control database using various interaction modes such as speech and stylus, on a handheld device. A prototype mobile multimodal field data entry (MFDE) application have been developed to run on a Pocket PC that is equipped with a multimodal browser and embedded speech recognition capabilities. The prototype application was developed for the wireless Pocket PC utilizing the multimodal NetFront 3.1 Web browser and a fat wireless client with an embedded IBM ViaVoice speech recognition engine. An embedded relational database (IBM DB2 everyplace) was used for local data storage on the mobile device. A built-in microphone on a Pocket PC was utilized for speech data input [7]. User evaluation was conducted as a lab-based mobile evaluation of the prototype technology we developed. The detailed description of the study design is given in [8]. Our mobile application was designed to allow concrete technicians to record, while in the field (or more specifically, on a construction site), quality control data. The application supported two different modalities of data input – speech-based data entry and stylus-based data entry. The purpose of the evaluation was to (a) determine and compare the effectiveness and usability of the two different input options and (b) to determine which of the two options is preferred by users in relation to the application’s intended context of use. In order to appropriately reflect the anticipated context of use within our study design, we had to consider the key elements of a construction site that would potentially
influence a test technician’s ability to use one or both of the input techniques. We determined these to be: (a) the typical extent of mobility of a technician while using the application; (b) the auditory environmental distractions surrounding a technician – that is, the noise levels inherent on a typical construction site; and (c) the visual or physical environmental distractions surrounding a technician – that is, the need for a technician to be cognizant of his or her physical safety when on-site. A total of eighteen participants participated in the study. The results of the evaluation confirmed, as it was anticipated, that stylus-based input was significantly more accurate than speech under the conditions of use that included construction noise in the range of 60-90 dB (A) [11]. We observed, however, that the stylus-based interaction was, on average, slower than speech-based input and that speech-based input significantly enhanced the participants’ ability to be aware of their physical surroundings. In addition, majority of participants expressed preference for using speech as interaction technique with mobile device. As a result, this research study demonstrated significant preference for using speech as an interaction modality, with some limitations imposed by the lower speech recognition accuracy levels due to environmental noise. These findings led us to investigation of several technology factors that can potentially influence the accuracy of speech recognition, such as the type of the microphone and the type of speech recognition engine used. 3.2 Speech-Based Interactions – Technology Evaluations The choice of microphone technology and speech recognition engine plays an important role in improving quality of speech recognition [19]. Our study described in detail in [12] was designed to evaluate and compare three commercially available microphones – the bone conduction microphone, and the two types of condenser microphones for their effect on accuracy of speech recognition within mobile speech input application. We developed a data input application based on a tablet PC running Windows XP and utilized IBM’s ViaVoice embedded speaker-independent speech recognition engine, the same speech recognition engine that was utilized in our previous study [8]. Twenty four people participated in the laboratory-based study. The participants were mobile while entering information requested. The results of the study helped us to prove that the choice of microphone had significant effect on accuracy of mobile speech recognition; in particular, we found that both condenser microphones (QSHI3 and DSP-500 microphones) performed significantly better than bone conduction microphone (Invisio). In addition, we found that there was no significant effect of a background noise (within our evaluation scenario we incorporated street noise of 70 dB (A) level) on the accuracy of speech recognition, indicating that all microphones under evaluation had sufficient noise-cancelling capabilities. Considering the importance of choosing the best speech recognition engine on the accuracy of results obtained, a complementary laboratory study was conducted to evaluate a number of state-of-the art speech recognition engines as to their effect on the accuracy of speech recognition [13]. This study was based on pre-recorded user speech entries, collected in our previously mentioned study [12]. All speech recognition engines were evaluated in speaker independent mode (e.g. walk-up-and-use). 
Based on the results of this study, we also proved the importance of proper pairing of
microphone systems and speech recognition engines to achieve the best possible accuracy of speech recognition for mobile data entry. 3.3 Feasibility of Using Speech Interaction in Learning Contexts Our previous research demonstrates that it is technically possible to implement speech-based and multimodal interaction with a mobile device and to achieve significant level of user acceptance and satisfaction with technology. However, if we were to consider implementation of speech-based interfaces within mobile learning domain, we have to look at other important considerations, such as appropriateness of speech as an interaction modality within certain contexts of use and social acceptance of speech-based interactions. In a classroom environment, when a number of learners could potentially utilize mobile technology to participate in learning and collaboration process, the appropriateness of speech-based interaction is questionable, since simultaneous use of speech by multiple users will introduce high level of environmental noise that could significantly reduce the accuracy of speech recognition for each individual device. Thus, based on contextual considerations, this application of speech-based interfaces is not appropriate. At the same time, utilization of speech interaction by a single mobile learner is very much appropriate and could significantly improve experience of his/her “learning on the go”. Research has proven that mobile speech-based interaction could be successfully designed for users on the go, such as city tourist guides or in-car speechbased interfaces [10]. Most frequently these types of applications utilize a constrained vocabulary of user commands and a constrained grammar of possible user entries. This functionality enables menu navigation, information retrieval and some basic data entry capabilities. The same principles apply to utilization of speech-based and multimodal interfaces for student field trips, where students are mobile and take notes “on the go” [9]. Another interesting and rapidly developing research area is an application of speech-based and multimodal interfaces within various training scenarios, including industrial and military training. Within these scenarios, when training is conducted in the field or in the simulated field environment, voice command could enable efficient “hands free, eyes free” information retrieval, menu navigation and basic data entry. Another application of speech-based interfaces is within the domain of gaming, including “serious gaming” in education and training domains [27]. A major challenge for speech-based interfaces within “serious gaming” domain is to improve the accuracy of speech recognition within environmentally challenging conditions (high level of noise, people possibly being under stress thus affecting the way they speak and reducing the accuracy of command recognition, etc). Within this usage domain, we see an opportunity to successfully deploy multimodal interaction so that multiple channels of input would assist in improving accuracy and usability of the system [26].
4 Conclusions Our research on speech-based and multimodal user interaction with mobile devices has proven that it is technically feasible to implement speech-based (or multimodal)
interaction with a mobile device and to achieve a significant level of user acceptance and satisfaction with this technology. We also identified some challenges associated with the use of speech-based and multimodal interaction within the learning and training domains. Our future research efforts will be focused on exploring ways to better incorporate multimodal (including speech-based) interfaces within "serious gaming" scenarios, where this technology has the potential to significantly improve the usability of user interactions, especially in cases where "hands-free and eyes-free" interaction is a must, such as military and industrial training applications. Acknowledgements. The author would like to acknowledge the support for this research program provided by the National Research Council Canada.
References 1. Crawford, V., Vahey, P.: Palm Education Pioneers Program: Evaluation Report. SRI International, Menlo Park (2002) 2. de Freitas, S., Levene, M.: Evaluating the Development of Wearable Devices, Personal Data Assistants and the Use of Other Mobile Devices in Further and Higher Education Institutions. JISC Technology and Standards Watch Report: Wearable Technology (2002) 3. Guenaga, M.L., Burger, D., Oliver, J.: Accessibility for e-Learning Environments. In: Miesenberger, K., Klaus, J., Zagler, W.L., Burger, D. (eds.) ICCHP 2004. LNCS, vol. 3118, pp. 157–163. Springer, Heidelberg (2004) 4. Jahankhani, H., Lynch, J.A., Stephenson, J.: The Current Legislation Covering E-learning Provisions for the Visually Impaired in the EU. In: Shafazand, H., Tjoa, A.M. (eds.) EurAsia-ICT 2002. LNCS, vol. 2510, pp. 552–559. Springer, Heidelberg (2002) 5. Jokinen, K., Raike, A.: Multimodality – Technology, Visions and Demands for the Future. In: Proceedings of the 1st Nordic Symposium on Multimodal Interfaces, Copenhagen (2000) 6. Kondratova, I., Goldfarb, I.: M-learning: Overcoming the Usability Challenges of Mobile Devices. In: Proceedings International Conference on Networking, International Conference on Systems and International Conference on Mobile Communications and Learning Technologies (ICNICONSMCL 2006), p. 223. IEEE Computer Society Press, Los Alamitos (2006) 7. Kondratova, I.: Speech-Enabled Handheld Computing for Fieldwork. In: Proceedings of the International Conference on Computing in Civil Engineering 2005, Cancun, Mexico (2005) 8. Kondratova, I., Lumsden, J., Langton, N.: Multimodal Field Data Entry: Performance and Usability Issues. In: Proceedings of the Joint International Conference on Computing and Decision Making in Civil and Building Engineering, Montréal, Québec, Canada, June 1416 (2006) 9. Kravcik, M., Kaibel, A., Specht, M., Terrenghi, L.: Mobile Collector for Field Trips. Educational Technology & Society 7(2), 25–33 (2004) 10. Larsen, L.B., Jensen, K.L., Larsen, S., Rasmussen, M.H.: Affordance in Mobile Speechbased User Interaction. In: Proceedings of the 9th international Conference on Human Computer interaction with Mobile Devices and Services, MobileHCI 2007, pp. 285–288. ACM, New York (2007)
11. Lumsden, J., Kondratova, I., Langton, N.: Bringing A Construction Site Into The Lab: A Context-Relevant Lab-Based Evaluation Of A Multimodal Mobile Application. In: Proceedings of the 1st International Workshop on Multimodal and Pervasive Services (MAPS 2006), Lyon, France (2006) 12. Lumsden, J., Kondratova, I., Durling, S.: Investigating Microphone Efficacy for Facilitation of Mobile Speech-Based Data Entry. In: Proceedings of the British HCI Conference, Lancaster, UK, September 3-7 (2007) 13. Lumsden, J., Durling, S., Kondratova, I.: A Comparison of Microphone and Speech Recognition Engine Efficacy for Mobile Data Entry. In: The International Workshop on MObile and NEtworking Technologies for social applications (MONET 2008), part of the LNCS OnTheMove (OTM) Federated Conferences and Workshops, Monterrey, Mexico, November 9-14 (2008) 14. Lumsden, J., Leung, R., Fritz, J.: Designing a Mobile Transcriber Application for Adult Literacy Education: A Case Study. In: Proceedings of the International Association for Development of the Information Society (IADIS) International Conference Mobile Learning 2005, Qawra, Malta, June 28 – 30 (2005) 15. Neal, L.: Predictions for 2002: e-learning Visionaries Share Their Thoughts. eLearn Magazine 2002(1), 2 (2002) 16. Oviatt, S., Cohen, P.: Multimodal Interfaces that Process What Comes Naturally. Communications of the ACM 43(3) (March 2000) 17. Pham, B., Wong, O.: Handheld Devices for Applications Using Dynamic Multimedia Data, Computer Graphics and Interactive Techniques in Australasia and South East Asia. In: Proceedings of the 2nd international conference on Computer graphics and interactive techniques in Australasia and South East Asia. ACM Press, New York (2004) 18. Picardi, A.C.: IDC Viewpoint. Five Segments Will Lead Software Out of the Complexity Crisis, Doc #VWP000148 (December 2002) 19. Quek, F., MCNeill, D., Bryll, R., Dunkan, S., Ma, X.-F., Kirbas, C., MCCullough, K.E., Ansari, R.: Multimodal Human Discourse: Gesture and Speech. ACM Transactions on Computer-Human Interaction 9(3), 171–193 (2002) 20. Roschelle, J., Pea, R.: A Walk on the WILD Side: How Wireless Handhelds May Change Computer-supported Collaborative Learning. International Journal of Cognition and Technology 1(1), 145–168 (2002) 21. Roschelle, J.: Keynote paper: Unlocking the learning value of wireless mobile devices. J. of Computer Assisted Learning 19, 260–272 (2003) 22. Sadeh, N.: M-Commerce: Technology, Services, and Business Model. John Wiley & Sons, Inc., Chichester (2002) 23. Sawhney, N., Schmandt, C.: Nomadic Radio: Speech and Audio Interaction for Contextual Messaging in Nomadic Environments. ACM Transactions on Computer-Human Interaction 7(3), 353–383 (2000) 24. Smordal, O., Gregory, J.: Personal Digital Assistants in Medical Education and Practice. Journal of Computer Assisted Learning 19(3), 320–329 (2003) 25. Soloway, E., Norris, C., Blumenfeld, P., Fishman, B.J.K., Marx, R.: Devices are Ready-atHand. Communications of the ACM 44(6), 15–20 (2001) 26. Tse, E., Greenberg, S., Shen, C.: Exploring Interaction with Multi User Speech and Whole Handed Gestures on a Digital Table. In: Proceedings of ACM UIST 2006, Montreux, Switzerland, October 15–18 (2006) 27. Wang, X., Yun, R.: Design and Implement of Game Speech Interaction Based on Speech Synthesis Technique. In: Pan, Z., Zhang, X., El Rhalibi, A., Woo, W., Li, Y. (eds.) Edutainment 2008. LNCS, vol. 5093, pp. 371–380. Springer, Heidelberg (2008)
334
I. Kondratova
28. Wilson, L.: Look Ma Bell, No Hands! – VoiceXML, X+V, and the Mobile Device. XML Journal, August 3 (2004) 29. Zhang, D., Nunamaker, J.F.: A Natural Language Approach to Content-Based Video Indexing and Retrieval for Interactive E-Learning. IEEE Transactions on Multimedia 6(3), 450–458 (2004) 30. Zhuang, Y., Liu, X.: Multimedia Knowledge Exploitation for E-Learning: Some Enabling Techniques. In: Fong, J., Cheung, C.T., Leong, H.V., Li, Q. (eds.) ICWL 2002. LNCS, vol. 2436, pp. 411–422. Springer, Heidelberg (2002)
Acceptance of Mobile Entertainment by Chinese Rural People Jun Liu1, Ying Liu2, Hui Li1, Dingjun Li1, and Pei-Luen Patrick Rau1 1
Institute of Human Factors & Ergonomics, Department of Industrial Engineering, Tsinghua University, Beijing 100084, China 2 Nokia Research Center, No. 5, Donghuan Zhonglu, Beijing Economic & Technological Development Area, Beijing 100176, China
[email protected],
[email protected],
[email protected],
[email protected],
[email protected] Abstract. This study explores and analyzes the factors contributing to the acceptance of mobile entertainment by Chinese rural people. First, 27 factors were drawn from the literature. Then a new factor, “cost”, was found through interviews. After that, a survey was built based on the 28 factors. From the data collected in a Chinese rural area, seven factors were extracted through exploratory factor analysis: social influence, technology and service quality, entertainment utility, simpleness and certainty, self-efficacy, perceived novelty, and cost. Finally, a comprehensive model was provided involving the seven factors as well as their importance rank. This research provides a comprehensive approach to technology acceptance theory. It can also help practitioners better understand the rural user group and improve their products accordingly. Keywords: Technology acceptance; mobile entertainment; rural people.
1 Introduction Mobile technologies and applications for entertainment are developing rapidly and widely. However, entertainment-related services are far from fully accepted by mobile phone users, especially in emerging markets. Studying emerging-market users' perception and acceptance of mobile entertainment is therefore in great demand for business, technology and social practice. The objectives of this research are: 1) to build a comprehensive model for the acceptance of mobile phone entertainment by Chinese rural people, considering users, technologies and the environment; and 2) to generate design and ecosystem suggestions for improving the acceptability of mobile entertainment services. The research is novel and significant in two respects: first, its comprehensive modeling paradigm, and second, its special focus on mobile entertainment. Mobile entertainment acceptance is a sub-question of technology acceptance. Since 1975, much has been done to investigate technology acceptance [1]. Several models have been developed to describe variables contributing to technology acceptance. However, there is no comprehensive model considering users,
technologies and the environment. Some of the proposed models seem comprehensive, but when their measurement items are examined, they still focus only on the technology itself or on the users [2]. A comprehensive model that can analyze and predict better is therefore vital to build. On the other hand, although there is a large body of research on technology acceptance, little of it has focused on the specific issue of mobile entertainment acceptance. Since mobile entertainment differs from other technologies in its mobility, legerity, emotionality, and personalization, the variables contributing to mobile entertainment acceptance should behave differently from those of general technology acceptance. Therefore, there is a great demand for research on the factors influencing users' acceptance of this particular mobile service, and for models describing how those influences occur. To achieve those objectives, a four-phase study was conducted. Phase I was background research. As mobile entertainment acceptance is a sub-issue of technology acceptance, research on the latter topic was reviewed, and factors contributing to technology acceptance were compiled from the literature as the basis of the following study. Phase II was a user study based on in-depth phone interviews. A new factor not included in the existing literature was found during the interview analysis. In Phase III, survey and modeling, all the factors were validated and modeled together. Phase IV discussed design and ecosystem suggestions based on the model.
2 Background Research 2.1 Mobile Entertainment Definition According to the Mobile Entertainment Forum 2003, the term “mobile entertainment” refers to “entertainment products that run on wirelessly networked, portable, personal devices, which includes downloadable mobile phone games, images and ring tones, as well as MP3 players and radio receivers built into mobile handsets.” The term excludes mobile communication like person-to-person SMS and voicemail, as well as mobile commerce applications like auctions or ticket purchasing [3]. In this research, we adopt this definition and restrict the device scope to mobile phones. 2.2 Technology Acceptance Theories Information technology acceptance is relatively well studied, with many models and studies. The four most widely used models are the theory of reasoned action (TRA), the technology acceptance model (TAM), the theory of planned behavior (TPB) and the innovation diffusion theory (IDT). The TRA, developed by Fishbein and Ajzen [1], shows that a person's specific behavior is determined by behavioral intention. In turn, behavioral intention is determined by the person's attitude and subjective norm. The TAM, developed by Davis [4], is adapted from the TRA and focuses specifically on the behavior of information system acceptance. In this model, “perceived usefulness” and “perceived ease of use” are primarily relevant to the acceptance behaviors. The TPB was developed by Ajzen, who also developed the TRA [5]; it extends the TRA by adding the variable of “perceived behavior control”. The IDT, developed by Rogers [6], explains the behavior of innovation adoption. There are strong affiliations
among the four models. First of all, the TPB extends the TRA with the additional factor of “perceived behavior control”. Then, the TAM is based on the TRA and focuses on information technology acceptance. For the TAM and the IDT, the core constructs are very similar: “perceived usefulness” resembles “relative advantage” and “perceived ease of use” has a similar meaning to “complexity” [2][7]. Based on these theories and related research, a total of 27 factors contributing to technology acceptance were identified. The related measurement items for each factor were collected as well. For example, to measure “perceived ease of use”, there are items like “learning to operate information technology would be easy for me” [8]. The 27 factors are listed below.
1. Perceived usefulness [7][9]
2. Perceived ease of use [4][7][8][9]
3. Perceived complexity [6][10]
4. Perceived enjoyment/fun [11][12]
5. Output quality [9]
6. Relative advantage [7][6]
7. Compatibility [13][7][6][14]
8. Perceived behavioral control [5][12][14]
9. Subjective norm [5][14]
10. Peer influence [14]
11. Word-of-mouth [15][16]
12. Job relevance [17][18][9]
13. Voluntariness [7][12][9]
14. Innovativeness [19]
15. Self-efficacy [20][14][12]
16. Computer anxiety [12]
17. Computer playfulness [12]
18. Technology facilitating conditions [14]
19. Organizational support [10]
20. Visibility [6]
21. Trialability [6]
22. Being-younger [21][22]
23. Perceived modernness [23]
24. Perceived risk [24]
25. Communication facilitating [21]
26. Perceived novelty
27. Image [7][9]
3 Research Questions The research aimed to build a comprehensive model for the acceptance of mobile entertainment by Chinese rural people. The two research questions are: 1) which variables can affect the acceptance of mobile entertainment by Chinese rural people? and 2) what are the relationships between these variables and rural people's mobile entertainment acceptance intention?
The previous 27 variables extracted from the literature were generated in the context of information technology. In this study, they are assumed to affect the acceptance of mobile entertainment as well. Besides, new variables are also expected to be found in this specific technology area. All these variables could have different impacts on users' acceptance, and we ask which are crucial and which are less related. Therefore, the model not only structures all the variables into a few dimensions, but also describes their weights and importance, so that the model can be more predictive and more practical.
4 Methodology The study has two steps. Firstly, a qualitative user study using in-depth phone interviews was adopted to explore the factors contributing to the acceptance of mobile entertainment by Chinese rural people. The 27 technology acceptance factors extracted from the literature were also assumed to have effects in this study, and the aim of the interviews was to check whether any factors were missing besides these 27 theory-based factors. Secondly, quantitative survey data were collected to model the relationships between these variables and rural people's mobile entertainment acceptance intention. 4.1 In-Depth Phone Interview Interview Questions. To explore factors, open-ended exploratory questions were formulated. The questions were mainly about participants' experiences using mobile services including entertainment services, their entertainment life, and the behaviors of surrounding people. A question example is “what things happened will promote you to use the mobile phone services, or why do you want to use it?” Participants. Three Chinese rural people, two male and one female, with ages ranging from 28 to 50, took part in the interview. They all had mobile phones and had mobile entertainment experience. They were recruited from two provinces of China (Shandong and Shanxi), where the economic and life patterns are different. Procedure. The whole interview was conducted by mobile phone with a loudspeaker, and recorded on a PC. The interview time for each participant was 30-45 minutes. During the interview, the dialect of each district was used. Data analysis. Following the Long Table Approach [25], the transcripts from the phone interviews were printed out, followed by a series of cutting and categorizing of the transcripts. At the end, citations reflecting the same factor were pasted together, and the factor was named and written beside them. 4.2 Survey Questionnaire Construction. Twenty-nine items were designed based on the factors found from the interviews. All the items were attitude statements concerning mobile entertainment acceptance. For example, from the factor of “perceived usefulness”, an item was designed like “I often need to use mobile entertainment in my daily work and life.” Three of the items were reversed to help exclude invalid samples during the analysis phase. Seven-point Likert scales were used with different levels of
agreement with the statements from “1 = totally disagree” to “7 = totally agree”. The definition of “mobile entertainment” was given at the top of every page to help the participants remember it. For further analysis, questions about personal information, mobile phone and related technology experience were included in the questionnaire. Participants. The sample size was designed according to Gorsuch [26], who recommended that the subject-to-item ratio should be larger than 5. Therefore, at least 145 participants should fill in the questionnaire. We finally obtained 150 valid questionnaires in the research. All the participants were from the rural areas of Dezhou City in Shandong province of China. Their ages ranged from 16 to 56, with an average of 33.39. Most (51.3%) participants had an education level of junior high school. All the participants had mobile phones, and 87.3% had used more than one mobile before. The prices of most (51.4%) participants' mobiles were in the range of 500–1000 RMB. Procedure. As most rural people are not familiar with web-based questionnaires, paper-based questionnaires were given to them face to face. In order to keep each questionnaire valid, the entire process of answering the questions was assisted by the survey conductor. After filling in the survey, each participant got a 20 RMB reward. Data Analysis. Among the 29 items of the questionnaire, 3 were reversed items aiming to identify invalid samples. They were excluded from further analysis. Therefore, 26 items were utilized in this phase of data analysis. The data analysis was conducted in three steps. Firstly, the internal consistency of the questionnaire was tested by Cronbach's alpha [27]. Secondly, exploratory factor analysis was used to find the structural characteristics among the items. We used the Kaiser-Meyer-Olkin (KMO) test to analyze whether the items had enough common information. Then the “principal component analysis” method for factor extraction was adopted, and the rotation method “Varimax with Kaiser Normalization” was used to further interpret the extracted factors. After the factor extraction, some of the original items were eliminated or grouped, and each factor was named by its included items. Finally, a visualized model was built to describe the results comprehensively.
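To make the three analysis steps concrete, a rough sketch of such a pipeline in Python is shown below. It uses the pandas and factor_analyzer packages; the file name and the assumption that the 26 retained items are the columns of one DataFrame are illustrative choices, not the authors' actual scripts.

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_kmo

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a set of Likert items (one column per item)."""
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

def exploratory_fa(items: pd.DataFrame, n_factors: int):
    """Step 2: KMO check, then principal-component extraction with varimax rotation."""
    _, kmo_model = calculate_kmo(items)
    print(f"KMO = {kmo_model:.3f}")  # the paper expects values above 0.7
    fa = FactorAnalyzer(n_factors=n_factors, method="principal", rotation="varimax")
    fa.fit(items)
    loadings = pd.DataFrame(fa.loadings_, index=items.columns)
    eigenvalues = fa.get_factor_variance()[0]  # sum of squared loadings per factor
    return loadings, eigenvalues

# Usage sketch (assumed file with the 26 retained items):
# items = pd.read_csv("survey.csv")
# print("alpha =", cronbach_alpha(items))
# loadings, eig = exploratory_fa(items, n_factors=7)
```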
5 Results 5.1 Interview Results Eighteen factors were identified from the interviews. Seventeen of them matched factors concluded from the literature. They are: 1) perceived usefulness; 2) perceived ease of use; 3) perceived complexity; 4) perceived enjoyment/fun; 5) output quality; 6) relative advantage; 7) compatibility; 8) perceived behavioral control; 9) job relevance; 10) voluntariness; 11) innovativeness; 12) technology facilitating conditions; 13) organizational support; 14) visibility; 15) perceived risk; 16) communication facilitating; and 17) perceived novelty. One new factor, which did not match any factor from the literature, was named “cost”. “Cost” means the charges of a particular service, which could influence users' acceptance. Citation examples are: “Is listening to music for free?” and “I hope the function/service is totally for free.” Although the other 10 factors obtained from the literature were not found in the phone interviews, we cannot delete them at this step, given the limited
sample size. The aim of the interviews, namely to check whether any factors were missing besides the 27 theory-based acceptance factors, was thus achieved. Therefore, all 28 factors were tested and analyzed quantitatively in the next phase, the survey. 5.2 Survey Results Firstly, the Cronbach's alpha of the 26 items is 0.812. For each item, this index cannot increase significantly if the item is deleted. In the literature, an alpha (α) value of 0.70 or above is considered to indicate strong internal consistency [28]. For exploratory research, an alpha value of 0.60 or above is also considered significant [29]. This indicates that all 26 items in our study have high internal consistency, so they were all included for further analysis. Secondly, two iterations of exploratory factor analysis were conducted. After the first run, according to criteria from the literature [29][30], we eliminated a single-item factor (item 21) and an item (item 19) with factor loadings significantly less than 0.45. In the second run, seven factors were extracted from the remaining 24 items, explaining 61.79% of the variance. For both iterations, the exploratory factor analysis method is appropriate because the KMO values were 0.783 for the initial 26 items and 0.793 after eliminating the two items, both more than 0.7. The extracted factors were named by the common meanings of the items included (Table 1). The first factor, social influence, shows that others' suggestions and social norms can influence users' acceptance of mobile entertainment. People also like to accept products that promote their social image or enhance social communication. This factor is related to the theoretical factors of being-younger, peer influence, subjective norm, word-of-mouth, and organizational support. The second factor, technology and service quality, is about the quality and convenience of mobile entertainment technology and service. It includes the former theoretical factors of trialability, technology facilitating conditions, output quality, innovativeness, and visibility. The third factor, entertainment utility, means the emotional and entertainment utility of mobile entertainment. It is related to the theoretical factors of perceived enjoyment/fun, voluntariness, perceived usefulness, and perceived modernness. The fourth factor is the users' perceived simpleness and certainty of the interaction with the product. People tend to accept products when the interaction is simple and the consequences are certain. It is formed by the two theoretical factors of perceived ease of use and perceived risk. The fifth factor, self-efficacy, is the self-perception of being able to use the service or product. It is related to the theoretical factors of self-efficacy, perceived complexity, and perceived behavioral control. The sixth factor, perceived novelty, indicates that people are likely to accept novel products and services. And the last factor is the cost of familiar or unfamiliar mobile entertainment services. The reliability and validity of the questionnaire's construction were confirmed. Internal consistency methods were adopted to establish reliability in this study. After eliminating two items during factor analysis, the Cronbach's alpha of this measuring instrument is 0.814, which indicates strong reliability according to the former discussion. The seven factors account for 61.79% of the total variance and factor loadings range from 0.44 to 0.81. So the construct validity of the instrument is acceptable.
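A small sketch of the elimination rule applied between the two runs is given below; it builds on the previous code block (same imports and the loadings DataFrame it returns). The 0.45 cutoff and the single-item-factor rule come from the text, while the function name and the exact selection logic are our own simplification.

```python
def prune_items(loadings, cutoff=0.45):
    """Return items to drop: items whose best loading is below the cutoff,
    and items that are the only ones loading on their factor."""
    best_factor = loadings.abs().idxmax(axis=1)   # factor each item loads on most
    best_value = loadings.abs().max(axis=1)       # size of that loading
    counts = best_factor.value_counts()           # items per factor
    return [item for item in loadings.index
            if best_value[item] < cutoff or counts[best_factor[item]] == 1]

# Second run: refit on the remaining items and re-check the KMO value.
# dropped = prune_items(loadings)
# loadings2, eig2 = exploratory_fa(items.drop(columns=dropped), n_factors=7)
```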
Table 1. Factor naming

Social influence: 12. Mobile entertainment can make me feel younger. 14. If my friends think I should use mobile entertainment, I will try it. 20. I feel that people around (family member, friends, etc.) think I should use some mobile entertainment. 18. If my friends tell me some mobile entertainment service, I will try it. 17. I think mobile entertainment provide me more chances to communicate with others (family member, friends, etc.).

Technology and service quality: 29. I hope I can try a ring tone before download it. 27. If some problem happens during the mobile entertainment process, I hope to get help and instruction easily. 22. If the mobile phone can take clear and good-quality photos and videos, I would like to use the function. 28. I like novel mobile games rather than those people are familiar with. 15. Many people around me use mobile entertainment.

Entertainment utility: 5. I find mobile entertainment enjoyable. 7. I like to take mobile as an entertainment tool, and play with it voluntarily. 1. In daily work and life, I often need to use mobile entertainment. 8. Mobile entertainment can keep me up with the times.

Simpleness and certainty: 2. I will give up a mobile entertainment if it's too hard to use. (negative loading) 25. I concern a lot of mobile monthly tariff. 9. I think there are risks in some mobile entertainment process. (negative loading)

Self-efficacy: 10. I can use mobile entertainment without any help. 16. I don't feel trouble in using mobile entertainment. 6. I'm capable enough to use mobile entertainment.

Perceived novelty: 11. Mobile entertainment is novel to me.

Cost: 26. I decide whether to use a mobile entertainment by it's price. (negative loading) 24. A low price is very important for me to buy a mobile. 23. I think it's more convenient to take photos/videos using a mobile than using a particular camera. (negative loading)
Finally, according to the results of the factor analysis, the mobile entertainment acceptance (MEA) model for Chinese rural people was built (Figure 1). The model has two main aspects. First, seven factors influence Chinese rural people's mobile entertainment acceptance, represented by seven ellipses in the visualized model (Figure 1). Second, the factors are ordered by importance, ranked by their eigenvalues: social influence (3.170), technology and service quality (2.875), entertainment utility (2.524), simpleness and certainty (1.713), self-efficacy (1.644), perceived novelty (1.510), and cost (1.394). The factor eigenvalue is a measure of explained variance: for each factor, the higher the eigenvalue, the more variance it explains, and therefore the more important the factor is in the model. In our visualized model (Figure 1), the area of each ellipse represents the importance grade; for example, the first-ranked factor “social influence” has the largest area.
Fig. 1. Mobile entertainment acceptance (MEA) model for Chinese rural people
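The ranking and the ellipse areas of Figure 1 follow directly from the eigenvalues reported in the text. The snippet below is an illustrative way to derive that ranking and to draw circles with areas proportional to the eigenvalues; the layout is our own choice and is not the paper's actual figure.

```python
import matplotlib.pyplot as plt

# Eigenvalues reported in the text, one per extracted factor.
factors = {
    "social influence": 3.170,
    "technology and service quality": 2.875,
    "entertainment utility": 2.524,
    "simpleness and certainty": 1.713,
    "self-efficacy": 1.644,
    "perceived novelty": 1.510,
    "cost": 1.394,
}

total = sum(factors.values())
for rank, (name, eig) in enumerate(sorted(factors.items(), key=lambda kv: -kv[1]), 1):
    # share = fraction of the variance explained by the seven extracted factors
    print(f"{rank}. {name}: eigenvalue={eig:.3f}, share={eig / total:.1%}")

# Mimic Figure 1: one circle per factor, area proportional to its eigenvalue.
fig, ax = plt.subplots()
for i, (name, eig) in enumerate(factors.items()):
    radius = (eig / total) ** 0.5
    ax.add_patch(plt.Circle((i * 1.2, 0), radius, alpha=0.4))
    ax.annotate(name, (i * 1.2, 0), ha="center", fontsize=7)
ax.set_xlim(-1, len(factors) * 1.2)
ax.set_ylim(-1, 1)
ax.set_aspect("equal")
plt.show()
```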
6 Discussion and Conclusion In this research, the MEA model considers users, technologies and the environment comprehensively. It not only involves most of the theoretical factors that have been proved to influence users' technology acceptance, but also structures them into seven main factors and gives the importance weight of each one. As a result, a whole picture is provided for technology acceptance theory. Additionally, a new factor, “cost”, was found to affect users' acceptance, at least for rural people's mobile entertainment acceptance. It suggests a novel point of view and makes the theory more comprehensive. Based on the comprehensive model, we can easily draw a series of practical suggestions for mobile entertainment service and ecosystem design. From the empirical indices, practitioners can also learn which factors are more important and which are less so. For example, the model shows that Chinese rural people are most influenced by social factors when considering whether to accept a new technology, so designers and marketers can give social strategies first priority. However, the model still needs to be refined to be more predictive. First, all seven factors are compared and weighted by their eigenvalues; this method simply puts them at the same level and ignores the internal relationships among factors. Therefore, further analysis, like path analysis or regression, is needed to explore their real relationships. Second, the sample is from a single rural area of China. However, several different types of rural society exist all around the world, varying in economics, culture, education, weather, and so on. All of these may influence people's technology acceptance patterns. A more systematic sample will be explored to verify the model. In conclusion, through both qualitative and quantitative research, a factor-based model is built to investigate the mobile entertainment acceptance by Chinese rural people.
Seven factors are involved, as well as their importance indices. This research provides a comprehensive approach to technology acceptance theory. It can also help practitioners to better understand the rural user group and improve their products accordingly.
References 1. Fishbein, M., Ajzen, I.: Belief, attitude, intention and behavior: An introduction to theory and research. Addison-Wesley, Reading (1975) 2. Wang, L.: Variables contributing to old adults acceptance of IT in China, Korea and USA. Unpublished PhD proposal (2007) 3. Wiener, S.N.: Terminology of Mobile Entertainment: An Introduction. In: Mobile Entertainment Forum (2003) 4. Davis, F.D.: A technology acceptance model for empirically testing new end-user information systems: Theory and results. Unpublished Doctoral dissertation, MIT Sloan School of Management, Cambridge, MA (1986) 5. Ajzen, I.: The Theory of Planned Behavior. Organizational Behavior and Human Decision Processes 50(2), 179–211 (1991) 6. Rogers, E.M.: Diffusion of innovations, 4th edn. Etats-Unis Free Press, New York (1995) 7. Moore, G.C., Benbasat, I.: Development of an Instrument to Measure the Perceptions of Adopting an Information Technology Innovation. Information Systems Research 2(3) (1991) 8. Davis, F.D.: Perceived usefulness, perceived ease of use and user acceptance of information technology. MIS Quarterly 13(3), 319–340 (1989) 9. Venkatesh, V., Davis, F.D.: A Theoretical Extension of the Technology Acceptance Model: Four Longitudinal Field Studies. Management Science 46(2), 186–204 (2000) 10. Igbaria, M., Parasuraman, S., Baroudi, J.: A motivational model of microcomputer usage. Journal of Management Information Systems 13(1), 127–143 (1996) 11. Hsu, C.-L., Lu, H.-P.: Consumer behavior in online game communities: A motivational factor perspective. Computers in Human Behavior 23, 1642–1659 (2007) 12. Venkatesh, V., Davis, F.D.: A Theoretical Extension of the Technology Acceptance Model: Four Longitudinal Field Studies. Management Science 46(2), 186–204 (2000) 13. Igbaria, M., Schiffman, S.J., Wieckowshi, T.S.: The respective roles of perceived usefulness and perceived fun in the acceptance of microcomputer technology. Behaviour and Information Technology 13(6), 349–361 (1994) 14. Taylor, S., Todd, P.A.: Understanding Information Technology Usage: A Test of Competing Models. Information Systems Research 6(2) (1995) 15. Lee, S.M.: South Korea: From the land of morning calm to ICT hotbed. Academy of Management Executive 17(2) (2003) 16. Webster, C.: Influences upon consumer expectations of services. Journal of Services Marketing 5(1), 5–17 (1991) 17. Black, J.B., Kay, D.S., Soloway, E.M.: Goal and plan knowledge representations: From stories to text editors and programs. In: Carroll, J.M. (ed.) Interfacing Thought, pp. 36–60. The MIT Press, Cambridge (1987) 18. Davis, F.D., Bagozzi, R.P., Warshaw, P.R.: Extrinsic and Intrinsic Motivation to Use Computers in the Workplace. Journal of Applied Social Psychology 22(14), 1111–1132 (1992)
19. Park, C., Jun, J.-K.: A cross-cultural comparison of Internet buying behavior Effects of Internet usage, perceived risks, and innovativeness. International Marketing Review 20(5), 534–553 (2003) 20. Compeau, D.R., Higgins, C.A.: Computer self-efficacy: development of a measure and initial test. MIS Quarterly 19, 189–211 (1995) 21. Boulton-Lewis, G.M., Buys, L., Lovie-Kitchin, J., Barnett, K., David, L.N.: Ageing, Learning, and Computer Technology in Australia. Educational Gerontology 33(3), 253– 270 (2007) 22. Stark-Wroblewski, K., Edelbaum, J.K., Ryan, J.J.: Senior Citizens Who Use E-mail. Educational Gerontology 33(4), 293–307 (2007) 23. White, J., Weatherall, A.: A grounded theory analysis of old adults and information technology. Educational Gerontology 26(4), 371–386 (2000) 24. Dowling, G.R., Staelin, R.: A Model of Perceived Risk and Intended Risk-Handling Activity. The Journal of Consumer Research 21(1), 119–134 (1994) 25. Krueger, R., Casey, M.: Focus Groups: A Practical Guide for Applied Research, 3rd edn. Sage Publications, Inc, Thousand Oaks (2000) 26. Gorsuch, R.L.: Factor analysis, 2nd edn. Lawrence Erlbaum, Hillsdale (1983) 27. Cronbach, L.J.: Coefficient alpha and the internal structure of tests. Psychometrika 16(3), 297–334 (1951) 28. Nunnally, J.C.: Psychometric Theory. McGraw-Hill, New York (1978) 29. Hair, J.F., Anderson Jr., R.E., Tatham, R.L., Black, W.C.: Multivariate Data Analysis. Prentice-Hall International, New Jersey (1995) 30. Stiggelbout, A.M., Molewijk, A.C., Otten, W., Timmermans, D.R.M., van Bockel, J.H., Kievit, J.: Ideals of patient autonomy in clinical decision making: a study on the development of a scale to assess patients’ and physicians’ views. Journal of Medical Ethics 30(3), 268–274 (2004)
Universal Mobile Information Retrieval David Machado, Tiago Barbosa, Sebastião Pais, Bruno Martins, and Gaël Dias Centre of Human Language Technology and Bioinformatics, University of Beira Interior 6201-001, Covilhã, Portugal {david,tiago,sebastiao,brunom,ddg}@hultig.di.ubi.pt
Abstract. The shift in human-computer interaction from desktop computing to mobile interaction strongly influences the need for newly designed interfaces. In this paper, we address the issue of searching for information on mobile devices, an area also known as Mobile Information Retrieval. In particular, we propose to summarize as much as possible the information retrieved by any search engine to allow universal access to information. Keywords: Mobile Information Retrieval, Clustering of Web Page Results, Automatic Summarization.
1 Introduction and Related Work The shift in human-computer interaction from desktop computing to mobile interaction strongly influences the need for newly designed interfaces. In this paper, we address the issue of searching for information on mobile devices, an area also known as Mobile Information Retrieval. Within this scope, two issues must be specifically tackled: web search and web browsing. On the one hand, the small screens of handheld devices are a clear limitation for displaying long lists of relevant documents, which induces repetitive scrolling. On the other hand, as most web pages are designed to be viewed on desktop displays, web browsing can interfere with users' comprehension as repetitive zooming and scrolling are necessary. To overcome the limitations of current search engines in handling information on mobile devices, we propose a global solution to web search and web browsing based on clustering of web page results and web page summarization. Most of the projects on mobile search deal with organizing the information to fit into small screens without benefiting from new trends in Information Retrieval presented in [1] and [2]. Indeed, projects such as Yahoo Mobile (http://mobile.yahoo.com/yahoo), Google Mobile (http://www.google.com/mobile/) or Live Search Mobile (http://www.livesearchmobile.com/) present information in a classic way by listing web page results, as shown in Figure 1. In order to show as many results as possible on the screens of PDAs or smart phones, layout structures are usually redesigned to keep to their basics. In fact, commercial projects have mainly privileged services over location on
Fig. 1. (a) Google Mobile. (b) Yahoo Mobile. (c) Live Search Mobile. (d) Searchme Mobile.
mobile devices, such as news, weather forecasts or maps, rather than providing new ways of searching for information, perhaps with the exception of local search facilities. Other projects have proposed different directions. In particular, Searchme Mobile (http://m.searchme.com/) is certainly one of the first mobile search engines to categorize web page results, as shown in Figure 1d. By doing so, it is clear that web search is made easier for the user. Indeed, the more the information is condensed into chunks of valuable information, the more it is accessible to any user (paired or impaired) in any location (car, home, street, etc.). However, the solution implemented by Searchme is based on a set of predefined categories for each query term. As a consequence, the categorization can only be performed for well-known queries. In case the category is not known, no search results are displayed. This solution is clearly unsatisfactory, as one may want to query any term in any language over the whole web. Within the VipAccess project (funded by the Portuguese Fundação para a Ciência e a Tecnologia under reference PTDC/PLP/72142/2006), we propose to cluster web page results “on the fly” independently of the language, thus allowing search for any query in any language over the entire web and providing a user-friendly interface for mobile devices (Figure 2b). For that purpose, we propose to cluster web page results with a new clustering algorithm, CBL (Clustering by Labels), especially designed for web page results. Compared to Searchme, we propose a more sophisticated way to cluster web page results, which does not depend on pre-defined categories, and as such can be applied “on the fly” as web page results are retrieved from any domain, any language or any search engine. In terms of visualization of web page results, clustering may drastically improve users' satisfaction rates as only a few selection items are presented to the user. However, an extra step in the search process is introduced, which may interfere with the users' habit of scrolling lists of web page results. In this paper, we also propose different visualizations which try to make the most of both techniques, i.e., lists of web page results and lists of clusters of web page results.
Fig. 2. (a) VipAccess Mobile interface. (b) VipAccess Mobile with clusters. (c) VipAccess Mobile for summarization.
In terms of summarization of web contents, accessing summaries of information instead of full information may be a great asset for users of mobile devices. Indeed, most web pages are designed to be viewed on desktop displays. As a consequence, users may find it hard to evaluate the importance of a document, as they have to go through all of it by repetitive zooming and scrolling. Some solutions have been proposed by content providers to overcome these drawbacks. They usually require an alternate trimmed-down version of documents prepared beforehand or the definition of specific formatting styles. However, this situation is undesirable, as it involves an increased effort in creating and maintaining alternate versions of a web site. Within the VipAccess project, we propose to automatically identify the text content of any web page and summarize it in an efficient way so that web browsing is kept to a minimum. For that purpose, we propose a new architecture for summarizing Semantic Textual Units [3] based on efficient algorithms for semantic treatment, such as the SENTA multiword extractor [4], which allows real-time processing and language-independent analysis of web pages, thus providing quality content extraction and visualization (Figure 2c).
2 Clustering Web Page Results The categorization of web page results is obtained by implementing a new clustering algorithm called CBL (Clustering by Label), which is specifically designed to cluster web page results and is inspired by the label-derived approach. In terms of clustering algorithms, two different approaches have been proposed: label-derived clustering [1][5][6][7] and document-derived clustering [2][8][9]. The first approach defines potential labels and agglomerates documents which share common labels, while the second groups similar documents based on text similarities and extracts potential
labels at the end of the process. CBL is a label-derived clustering algorithm and, as such, the first step of the clustering process aims at identifying potential labels. 2.1 Label Identification Most methodologies identify potential labels based on the extraction of frequent itemsets. A frequent itemset is a set of items that appear together in more than a minimum fraction of the whole document set. For that purpose, different language-independent and language-dependent approaches have been proposed. In the first case, [5] implement a suffix tree-like structure and [6] use association rules. In the second case, [1] propose to extract common gapped sentences from linguistically enriched web snippets and [7] extract frequent word sequences based on suffix-arrays which are weighted using the well-known tf.idf score. As one may want to search over the entire web in any language and any domain, it is important that the clustering algorithm only depends on language-independent features. Within this scope, the identification of relevant labels based on frequent itemsets mainly takes frequency of occurrence as a clue for extraction. However, this methodology suffers from the poor quality of web snippets, which mainly contain ill-formed sentences with many repetitions. To overcome this drawback, we propose to weight strings based on three different word distributions and consequently extract potential labels. Internal Value of a String. If a string (in our context, a string is any sequence of characters separated by spaces or other common linguistic delimiters such as dots or commas) appears alone in a chunk of text separated on both sides by any given delimiter (such as an HTML tag or a comma), this string is likely to be meaningful. This characteristic is weighted in Equation (1), where w is any string, A(w) is the number of occurrences where w appears alone in a chunk and F(w) is the total number of occurrences of w.

IV(w) = A(w) / F(w)    (1)

External Value of a String 1. The bigger the number of strings that co-occur with any string w both on its left and right contexts, the less meaningful this string is likely to be. This characteristic is weighted in Equation (2), where w is any string and WIL(w) (resp. WIR(w)) is the number of strings which appear on the immediate left (resp. right) context of string w.

(2)

External Value of a String 2. The bigger the number of different strings that co-occur with any string w both on its left and right contexts, compared to the number of co-occurring strings on both contexts, the less meaningful this string is likely to be. This characteristic is weighted in Equation (3), where w is any string, WDL(w) (resp. WDR(w)) is the number of different strings which appear on the immediate left (resp. right) context of string w and FH(w) is equal to max[F(w)], for all w.
(3)

Based on these three characteristics, we propose to weight all strings from the web snippets as in Equation (4), such that the smaller the W(w) value, the more meaningful the string w.

(4)
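Since the exact formulas of Equations (2)–(4) are not reproduced here (only the 0.5 coefficients of Equation (4) are visible), the Python sketch below is only an illustration of how the three word distributions could be computed and combined into a score where lower values mean more meaningful strings. The normalisations, the (1 − IV) term and all function and variable names are our assumptions, not the authors' formulation.

```python
from collections import defaultdict

def string_statistics(chunks):
    """Collect per-string counts from snippet chunks.

    `chunks` is a list of lists of strings: each inner list is one chunk of
    text already split on delimiters (HTML tags, commas, dots, ...).
    """
    F, A = defaultdict(int), defaultdict(int)         # total / standalone occurrences
    WIL, WIR = defaultdict(int), defaultdict(int)     # left / right co-occurrence counts
    left, right = defaultdict(set), defaultdict(set)  # distinct left / right neighbours
    for chunk in chunks:
        if len(chunk) == 1:
            A[chunk[0]] += 1
        for i, w in enumerate(chunk):
            F[w] += 1
            if i > 0:
                WIL[w] += 1
                left[w].add(chunk[i - 1])
            if i < len(chunk) - 1:
                WIR[w] += 1
                right[w].add(chunk[i + 1])
    WDL = {w: len(s) for w, s in left.items()}
    WDR = {w: len(s) for w, s in right.items()}
    return F, A, WIL, WIR, WDL, WDR

def weight(w, F, A, WIL, WIR, WDL, WDR):
    """Assumed string weight W(w): lower values mean more meaningful strings."""
    fh = max(F.values())                              # FH = max F(w) over all strings
    iv = A[w] / F[w]                                  # internal value, Equation (1)
    ev1 = (WIL[w] + WIR[w]) / (2.0 * fh)              # assumed reading of Equation (2)
    ev2 = (WDL.get(w, 0) + WDR.get(w, 0)) / max(WIL[w] + WIR[w], 1)  # assumed Equation (3)
    return (1.0 - iv) + 0.5 * ev1 + 0.5 * ev2         # assumed combination, Equation (4)
```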
In Table 1, we present the 30 most relevant results of our weighting score W(.) for the query term “programming” searched over the Google (http://www.google.com), Yahoo (http://www.yahoo.com) and MSN (http://www.msn.com) search engines, accessed via their respective web services.

Table 1. The first 30 strings ordered by W(.) for the query “programming”

Strings (1–5): Articles, Wikibooks, Computers, Compilers, Subject
Strings (6–10): Perl, Java, Php, Training, Forums
Strings (11–15): tutorials, c, wiki, security, database
Strings (16–20): Cgi, Category, Knuth, Home, Advanced
Strings (21–25): documentation, News, Net, Unix, Internet
Strings (26–30): tips, science, object-oriented, site, downloads
2.2 Clustering by Labels Once all important words have been identified, they play a crucial role in the clustering process following the label-derived approach. Within this scope, many algorithms have been proposed based on frequent itemsets [1][5][6][7]. In this paper, we propose a new algorithm called Clustering by Label (CBL), whose objective is to group similar documents around meaningful word anchors, i.e., labels. The algorithm is based on three steps: pole creation, unification and absorption, and labeling. Pole Creation. We first need to initialize the algorithm so that we can start from potentially meaningful labels. For that purpose, all words with a weight below a given threshold α (i.e., the most meaningful strings) which cover more than two URLs are proposed as initial cluster centers, i.e., poles. For each start pole, a list of URLs is built. A URL is added to the list if it contains the pole word before position β of its sorted relevant word list. In particular, this allows controlling the number of URLs added to each pole, since low β values will produce smaller clusters and, conversely, high values will join more results.
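A minimal sketch of this pole-creation step is shown below; the data structures and names (url_words, alpha, beta) are our own illustrative assumptions, and the string weights are supposed to come from the W(.) score of Section 2.1.

```python
def create_poles(url_words, weights, alpha, beta):
    """Initial pole creation for CBL (illustrative sketch).

    `url_words` maps each URL to its list of relevant strings, already sorted
    by increasing weight W(.) (most meaningful first); `weights` maps each
    string to its W(.) value. Returns a dict mapping pole words to URL lists.
    """
    # Candidate poles: meaningful words (low W) covering more than two URLs.
    coverage = {}
    for url, words in url_words.items():
        for w in set(words):
            coverage.setdefault(w, set()).add(url)
    candidates = [w for w, urls in coverage.items()
                  if weights.get(w, float("inf")) < alpha and len(urls) > 2]

    poles = {}
    for pole in candidates:
        # A URL joins the pole if the pole word appears before position beta
        # of that URL's sorted relevant word list.
        members = [url for url, words in url_words.items() if pole in words[:beta]]
        if members:
            poles[pole] = members
    return poles
```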
Union and Absorption. The next stage aims at iteratively unifying clusters which contain similar URLs. For that purpose, we define two types of agglomerations: Union, when two clusters contain a significant number of common URLs and are of similar size; Absorption, when they share many common URLs but are dissimilar in size. As a consequence, we define two proportions: P1, the number of common URLs between two clusters divided by the number of URLs in the smaller cluster, and P2, the number of URLs in the smaller cluster divided by the number of URLs in the bigger cluster. The following algorithm is then iterated. For each cluster, P1 is calculated over all other clusters. Then, for each pair of clusters, if P1 is higher than a constant γ, we evaluate P2 between both clusters. If P2 is higher than a constant δ, the pair of clusters is added to the union list; otherwise it is integrated in the absorption list. Once all clusters have been covered, both the union and the absorption lists are processed (both lists are ordered by the W(.) score of the label). The union list is processed first: for each cluster pair in the union list, the two clusters are joined into the original cluster with the highest W(.) score for its label. At each step of this process, cluster indexes are substituted and unified clusters are removed from the union list to keep a list of updated clusters. Then the absorption list is processed: iteratively select the pair of clusters which contains a cluster that cannot be absorbed by any other one in the absorption list; once encountered, this cluster absorbs the cluster which forms the pair with it, cluster indexes are updated and useless clusters are removed. Once both lists have been updated, the initial process iterates, thus enabling flat clustering (first step of the algorithm) or hierarchical clustering (all steps of the algorithm). Moreover, the CBL algorithm allows soft clustering, as URLs may be contained in different clusters. Finally, clusters are labeled. Labeling. By union and absorption, each cluster may contain different candidate labels. However, it may be the case that URLs in the cluster contain more meaningful words (i.e., multiword units) than the highly scored single words. As a consequence, multiword units are extracted from the web snippets agglomerated in the clusters by applying a methodology proposed by [18], implemented with suffix-arrays for real-time processing (this method has proved to be particularly suited for processing web snippets). Then, each multiword unit is compared to the potential labels and, if it contains one of the single words, it is evaluated by frequency to decide whether it should replace the single-word label. Finally, the best scoring labels, with a given threshold, are chosen as final labels. 2.3 Visualization In terms of visualization of web page results, clustering may drastically improve users' satisfaction rates as only a few selection items are presented to the user on the small screens of mobile devices (Figure 2b). However, an extra step in the search process is introduced which may interfere with the users' cognitive process of searching for information. Indeed, the user is used to finding web page results after the first selection. In order to bridge the gap between the classic view (lists of web pages) and the cluster view (list of clusters), we propose to display the most relevant web page result of each encountered cluster in the form of a list, as shown in Figure 3a. As such, the user is offered the best possible coverage of the query with the minimum number of
web page results, thus reducing scrolling while maintaining the cognitive process of information search. If the user wants to keep the classic view, this option is available, but always with an indication of the cluster each result belongs to, so that the user can navigate to any given cluster and visualize only its members (Figure 3b). In order to take into account that users of mobile devices may use their device in different contexts, such as a car, a classroom or the street, we also propose a full-screen visualization (Figure 3c). In this case, the best web page result of the most relevant cluster is presented first to the user. The next result is the best web page result of the second most relevant cluster, and so on.
Fig. 3. (a) Clustering visualization. (b) List visualization. (c) Full-screen visualization.
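Returning to the union and absorption step of Section 2.2, a rough sketch of one merging pass is shown below. The proportions P1 and P2 and the thresholds γ and δ follow the text, while the tie-breaking choices (merging a union pair into the cluster whose label is more meaningful, i.e. has the lower W(.) value, and letting the bigger cluster absorb the smaller one) are our simplified interpretation rather than the authors' exact procedure.

```python
def union_absorption(clusters, label_weight, gamma, delta):
    """One pass of the CBL union/absorption step (illustrative sketch).

    `clusters` maps a label to a set of URLs; `label_weight` maps a label to
    its W(.) score (lower is more meaningful). Returns the merged clusters.
    """
    labels = list(clusters)
    union_pairs, absorb_pairs = [], []
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            small, big = sorted((a, b), key=lambda l: len(clusters[l]))
            common = clusters[a] & clusters[b]
            p1 = len(common) / len(clusters[small])         # overlap w.r.t. smaller cluster
            p2 = len(clusters[small]) / len(clusters[big])  # relative size
            if p1 > gamma:
                (union_pairs if p2 > delta else absorb_pairs).append((a, b))

    # Union: join each pair into the cluster whose label scores better.
    for a, b in sorted(union_pairs, key=lambda p: min(label_weight[p[0]], label_weight[p[1]])):
        if a in clusters and b in clusters:
            keep, drop = sorted((a, b), key=lambda l: label_weight[l])
            clusters[keep] |= clusters.pop(drop)

    # Absorption: the bigger cluster absorbs the smaller one.
    for a, b in absorb_pairs:
        if a in clusters and b in clusters:
            big, small = sorted((a, b), key=lambda l: -len(clusters[l]))
            clusters[big] |= clusters.pop(small)
    return clusters
```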
As far as we know, the visualization of web page results has never been addressed as an issue in itself, although it is at the core of the success or failure of new techniques in Information Retrieval. Indeed, most search engines which propose interfaces with clustering of web page results (for example, http://www.clusty.com or http://www.searchme.com) are not as popular as classic search engines, although they provide a better understanding of the retrieved information. A reason for that may be the lack of newly designed interfaces for the sake of information search.
3 Web Page Summarization After clustering web page results, scrolling and zooming must also be kept to a minimum for web browsing. For this purpose, we propose a new architecture to summarize Semantic Textual Units [3] which embeds an efficient algorithm for multiword extraction [4]. 3.1 Semantic Textual Units and Multiword Units One main problem to tackle is to define what to consider as relevant text in a web page. Indeed, web pages often do not contain a coherent narrative structure. So, the
first step of any system is to identify rules for determining which text should be considered for summarization and which should be discarded. For this purpose, [3] propose to identify Semantic Textual Units (STUs). STUs are page fragments marked with HTML markups which specifically identify pieces of text following the W3 consortium specifications. It is clear that the STU methodology is not as reliable as a language model for content detection [10], but on the other hand it allows fast processing of web pages. Once each STU has been identified in the web page, it is processed with the SENTA software [4] to identify and mark relevant phrases in it. SENTA is statistical, parameter-free software which can be applied to any language without tuning and is, as a consequence, totally portable. Moreover, its efficient implementation shows time complexity Θ(N log N), where N is the number of words to process, which allows the extraction of relevant phrases in real time. 3.2 Extractive Text Summarization Extractive text summarization aims at finding the most significant sentences in a given text. So, a significance score must be assigned to each sentence in an STU. The sentences with higher significance naturally become the summary candidates, and a compression rate defines the number of sentences to extract. For this purpose, we implement the TextRank algorithm [11] combined with an adaptation of the well-known inverse document frequency, the inverse STU frequency (isf), to weight word relevance. The basic idea is that highly ranked words with high isf are more likely to represent relevant words in the text and, as a consequence, provide good clues to extract relevant sentences for the summary. For our purpose, each STU is first represented as an unweighted oriented graph, with each word connected to its successor following the sequential order of the text. Following the TextRank algorithm, the score S(.,.) of any word wi in any stu is defined as in Equation (5), where In(wi) is the set of words that point to wi, Out(wj) is the set of words that the word wj points to, and d is the damping factor set to 0.85.

S(wi, stu) = (1 - d) + d · Σ_{wj ∈ In(wi)} S(wj, stu) / |Out(wj)|    (5)
Then, each word is weighted as in Equation (6), based on its graph-based ranking and on its relevance in the text given by its inverse STU frequency, where N is the number of STUs in the text and stuf(w) is the number of STUs the word w appears in.

weight(wi, stu) = S(wi, stu) · isf(wi),  with  isf(w) = log(N / stuf(w))    (6)
Finally, the sentence significance weight is defined as in [12], thus giving more weight to longer sentences, as shown in Equation (7), where |S| stands for the number of words in sentence S, wi is a word in S, and max(|S|) is the length of the longest sentence in the STU.

weight(S, stu) = (|S| / max(|S|)) · (1 / |S|) · Σ_{wi ∈ S} weight(wi, stu)    (7)
In order to present as much information from the web page as possible, so that its understanding is eased, the best scoring sentences of each STU are retrieved and presented to the user as in Figure 2c (the compression rate is defined by the user in the menu options). As such, the user gets the most important content of the web page in a small text excerpt that is easy to read and scroll.
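A compact sketch of this summarization step is given below, assuming the reconstructed forms of Equations (5)–(7) above; the names, the fixed iteration count and the default compression rate are illustrative choices, not the authors' implementation.

```python
import math
from collections import defaultdict

def textrank_scores(words, d=0.85, iterations=30):
    """Graph-based word scores over one STU (Equation (5)); each word in the
    sequence points to its successor."""
    out_deg, in_links = defaultdict(int), defaultdict(set)
    for a, b in zip(words, words[1:]):
        out_deg[a] += 1
        in_links[b].add(a)
    scores = {w: 1.0 for w in set(words)}
    for _ in range(iterations):
        for w in scores:
            scores[w] = (1 - d) + d * sum(scores[v] / out_deg[v] for v in in_links[w])
    return scores

def summarize_stus(stus, compression=0.3):
    """Pick the best-scoring sentences of each STU.

    `stus` is a list of STUs, each a list of sentences, each sentence a list
    of words. Word weights combine TextRank with the inverse STU frequency
    (Equation (6)); sentence weights follow Equation (7) as reconstructed.
    """
    n_stus = len(stus)
    stuf = defaultdict(int)
    for stu in stus:
        for w in {w for sent in stu for w in sent}:
            stuf[w] += 1

    summary = []
    for stu in stus:
        flat = [w for sent in stu for w in sent]
        s = textrank_scores(flat)
        weight = {w: s[w] * math.log(n_stus / stuf[w]) for w in s}
        longest = max(len(sent) for sent in stu)

        def sentence_weight(sent):
            return (len(sent) / longest) * sum(weight[w] for w in sent) / len(sent)

        ranked = sorted(stu, key=sentence_weight, reverse=True)
        keep = max(1, int(len(stu) * compression))
        summary.append(ranked[:keep])
    return summary
```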
4 Conclusions and Future Work In this paper, we proposed a global solution to web search and web browsing for handheld devices based on web page result clustering, web page summarization and new ideas for visualization. In order to enable full information access to any user (paired or impaired), we also propose a speech-to-speech interface used as the exchange mode, which may allow achieving greater user satisfaction [13] in situations where the hands are not free [14], whenever reading is difficult [15], or in situations of mobility [16]. Moreover, we propose a location search based on the Global Positioning System (GPS), which automatically expands the original query with the name of the city closest to the user's location. In particular, a test of the interface has been conducted in the context of visually impaired people and received positive feedback, although a coherent and exhaustive evaluation is still needed, in the way [17] explain.
References 1. Ferragina, P., Gulli, A.: A Personalized Search Engine Based on Web-Snippet Hierarchical Clustering. Journal of Software: Practice and Experience 38(2), 189–225 (2008) 2. Campos, R., Dias, G., Nunes, C., Nonchev, B.: Clustering of Web Page Search Results: A Full Text Based Approach. International Journal of Computer and Information Science 9(4) (2008) 3. Buyukkokten, O., Garcia-Molina, H., Paepcke, A.: Seeing the Whole in Parts: Text Summarization for Web Browsing on Handheld Devices. In: 10th International World Wide Web Conference (2000) 4. Gil, A., Dias, G.: Using Masks, Suffix Array-based Data Structures and Multidimensional Arrays to Compute Positional Ngram Statistics from Corpora. In: Workshop on Multiword Expressions of the International Conference of the Association for Computational Linguistics (2003) 5. Zamir, O., Etzioni, O.: Web Document Clustering: A Feasibility Demonstration. In: 19th Annual International SIGIR Conference (1998) 6. Fung, P., Wang, K., Ester, M.: Large Hierarchical Document Clustering using Frequent Itemsets. In: SIAM International Conference on Data Mining (2003) 7. Osinski, S., Stefanowski, J., Weiss, D.: Lingo: Search results clustering algorithm based on Singular Value Decomposition. In: Intelligent Information Systems Conference (2004) 8. Jiang, Z., Joshi, A., Krishnapuram, R., Yi, Y.: Retriever Improving Web Search Engine Results using Clustering. Journal of Managing Business with Electronic Commerce (2002) 14
9. Dias, G., Pais, S., Cunha, F., Costa, H., Machado, D., Barbosa, T., Martins, B.: Hierarchical Soft Clustering and Automatic Text Summarization for Accessing the Web on Mobile Devices for Visually Impaired People. In: 22nd International FLAIRS Conference (2009) 10. Dolan, W.B., Quirk, C., Brockett, C.: Unsupervised Construction of Large Paraphrase Corpora: Exploiting Massively Parallel News Sources. In: International Conference on Computational Linguistics (2004) 11. Mihalcea, R., Tarau, P.: TextRank: Bringing Order into Texts. In: Conference on Empirical Methods in Natural Language Processing (2004) 12. Vechtomova, O., Karamuftuoglu, M.: Comparison of Two Interactive Search Refinement Techniques. In: Human Language Technology Conference/North American Chapter of the Association for Computational Linguistics Annual Meeting (2004) 13. Lee, K.W., Lai, J.: Speech versus Touch: A Comparative Study of the Use of Speech and DTMF Keypad for Navigation. International Journal of Human-Computer Interaction 19, 343–360 (2005) 14. Parush, A.: Speech-based Interaction in a Multitask Condition: Impact of Prompt Modality. Human Factors 47, 591–597 (2005) 15. Fang, X., Xu, S., Brzezinski, J., Chan, S.S.: A Study of the Feasibility and Effectiveness of Dual-modal Information Presentations. International Journal of Human-Computer Interaction 20, 3–17 (2006) 16. Oviatt, S.L., Lunsford, R.: Multimodal Interfaces for Cell Phones and Mobile Technology. International Journal of Speech Technology 8, 127–132 (2005) 17. Fallman, D., Waterworth, J.A.: Dealing with User Experience and Affective Evaluation in HCI Design: A Repertory Grid Approach. In: Conference on Human Factors in Computing Systems (2005) 18. Frantzi, K.T., Ananiadou, S.: Retrieving Collocations by Co-occurrences and Word Order Constraint. In: 16th International Conference on Computational Linguistics (1996)
ActionSpaces: Device Independent Places of Thought, Memory and Evolution Rudolf Melcher, Martin Hitz, and Gerhard Leitner University of Klagenfurt, Universitätsstraße 65-67, 9020 Klagenfurt, Austria {rudolf.melcher,martin.hitz,gerd.leitner}@uni-klu.ac.at
Abstract. We propose an inherently three-dimensional interaction paradigm which allows individuals to manage their personal digital artifact collections (PAC) regardless of the specific devices and means they are using. The core of our solution is to provide unified access to all user artifacts normally spread across several repositories and devices. Only then may individuals foster and evolve persistent multi-hierarchical artifact structures (PAS) fitting their cognitive needs. PAS subsets can be arranged and meaningfully related to virtual habitats, or even mapped to the physical contexts and environments they frequent to solve their tasks. Keywords: 3DUI, interaction paradigm, semantic desktop metaphor, ubiquitous computing, distributed computing, distributed cognition, mixed realities, concept maps, virtual file system, post-WIMP, post-desktop, digital artifacts, information space.
1 Introduction In our work we emphasize the need for a paradigm change as a direct consequence of complex personal device infrastructures. We therefore try to determine the minimum characteristics facilitating device-independent interaction in the face of virtual artifacts (i.e., digital entities like files, folders, repositories, emails, contacts, appointments, web pages, database views, service views, rendered objects, etc.). Virtual artifact management is a difficult cognitive task on its own, at least with regard to artifact classification. But now we additionally have to discuss where multi-device usage will finally lead and how its potential may be leveraged. In our opinion, techniques subsumed under cloud computing are not sufficient. We recognize a demand for user-centric, device-independent, and non-hierarchical structures carrying the artifacts of individuals persistently while being ubiquitously accessible. Our approach is to determine an adequate set of concepts and methods for so-called workspace-level integration [1]. The driving vision and assumption is that, in the course of time, ubiquitous augmented realities (UARs) related to users' cognitive panoramas will succeed. In terms of system architecture, we propose two new layers reorganizing artifact management between hardware and operating systems on the one hand and applications on the other hand. For all that, the feature sets and capabilities specific to distinct appliances shall remain unlimited.
Fig. 1. Four layer approach to facilitate device-independent interaction
First, we outline the middleware layer called SubFrame, which masks the specific file handling mechanisms and unifies the artifact references across an individual's heterogeneous device infrastructure. Persistent user-specific structures for virtual artifacts (PAS) are accessed, manipulated, and stored on this basis. Second, subsets of PAS are embedded and presented in 3D workspaces called ActionSpaces. We define ActionSpaces as session-persistent geometrical areas populated by virtual artifacts according to an individual's concept map or mental map. An ActionSpace may be rendered at arbitrary points on Milgram's reality-virtuality continuum [2]. Accordingly, it is an extension and virtualization of the workplaces and habitats normally experienced in the physical world with a view to tool and artifact arrangements. ActionSpaces are more flexible than virtual desktops. Current desktop implementations are bound to screen characteristics and represent only a small fraction of possible manifestations along Milgram's continuum. In this sense, ActionSpaces multiply virtual desktop scenarios on common PC systems as well. There seems to be a firm conviction in the HCI community that 3DUI technology is only of use in complex and expensive visualization scenarios and will probably never penetrate common workplaces, with the exception of gaming. In opposition to that, we are convinced that inherently three-dimensional workspaces represent an important step towards ubiquitous augmented realities (UARs) [3]. We found two intriguing arguments for this. On the one hand, 3DUIs are the most comprehensive virtual hull to arrange and render all kinds of virtual artifacts, such as text, image, video, sound, animated 3D graphics, etc., and to express relations between them. On the other hand, virtual three-dimensional spaces can be mapped to physical locations without the slightest effort, provided appropriate paradigms are applied. This, in turn, is an important foundation for pervasive computing scenarios and location-based services. With the two layers proposed, we follow a top-down approach in order to develop a device-independent thought pattern for interaction.
2 Related Work

The research assignment to specify a device-independent and user-centric interaction paradigm is related to many distinct research areas, which are partially summarized here. This
multi-disciplinary interdependence, the need to develop a basic understanding of the related heterogeneous issues, and the difficulty of specifying formal models make it difficult to prove the assumptions, potentials and consequences of our approach at this early stage. The potentials and consequences have to be discussed separately and investigated issue by issue in view of every distinct area once a complete prototypical framework is available. For now we want to provoke an interdisciplinary discussion about its need.

Cognitive Models and Semantic Desktops

As Winograd and Flores stated, "we recognize that in designing tools we are designing ways of being" [4]. There is a lot of empirical evidence for the defects of the desktop metaphor, discussed closely in [1]. Regarding possible workspace-level implementations for non-hierarchical artifact structures we can build on the semantic desktop paradigm defined by Sauermann et al.: "A Semantic Desktop is a device in which an individual stores all her digital information like documents, multimedia and messages. These are interpreted as Semantic Web resources, each is identified by a Uniform Resource Identifier (URI) and all data is accessible and queryable as RDF graph. Resources from the web can be stored and authored content can be shared with others. Ontologies allow the user to express personal mental models and form the semantic glue interconnecting information and systems. Applications respect this and store, read and communicate via ontologies and Semantic Web protocols. The Semantic Desktop is an enlarged supplement to the user's memory." [5] However, by some means or other these semantic desktop solutions remain application-, platform-, or device-dependent. These dependences are predetermined breaking points for the whole paradigm. The user-centrism aimed at is not consistently portable, and the visualization of ontologies remains problematic.

3DUI, MR, UARs

Kirsh argues that "how we manage the spatial arrangement of items around us is not an afterthought: it is an integral part of the way we think, plan, and behave." [6] Gregory Newby argues that in exosomatic memory systems the information spaces (geometry) of systems will be consistent with the cognitive spaces of their human users [7]. Following this discussion we conclude that 3D user interfaces (3DUIs), summarized by Bowman et al. [8], and AR interfaces, discussed by Haller et al. [9], can fit these requirements. Issues related to tracking and rendering performance on common user platforms were striking problems in past decades. Now, the major hindrances are caused by application, platform and device dependences. As Sandor and Klinker conclude, "Increasingly, an overarching approach towards building what we call ubiquitous augmented reality (UAR) user interfaces that include all of the just mentioned concepts will be required" [3]. At the end of the day, "user interfaces for ubiquitous augmented reality incorporate a wide variety of concepts such as multi-modal, multi-user, multi-device aspects and new input/output devices." [10]
3 SubFrame

Our ultimate goal is to establish true device-independence while supporting consistent artifact management across the whole range of user devices and appliances. We
therefore propose a middleware layer called SubFrame, which we introduced in [11]. We assume the existence of such a layer for the discussion of ActionSpaces; for the sake of completeness, we briefly address it here. The SubFrame holds and provides a complete set of references to the user's artifacts. It is, however, neither responsible for nor involved in the rendering of artifacts. The terms "personal information cloud" and "cloud computing" that immediately come to mind are very popular but fuzzy. While they are clearly related to our work, we need to distinguish our approach from them, as they are discussed with regard to service orientation, whereas we discuss user-centered artifact management here. Throughout this paper we use the term personal artifact collection (PAC), which is more precise regarding artifact sets, and personal artifact structures (PAS), which is more precise regarding their interrelationships. The whole concept is based on the following set of key assumptions:

Table 1. Key assumptions of device-independence
0  Devices are networked (temporarily).
1  PACs don't belong to devices.
2  PACs don't belong to applications or services.
3  PACs are private and non-substitutable.
4  PACs are partially shared.
5  PAS are unique in their characteristics.
6  PAS are ever-changing and evolving.
In [11] we suggest employing a personal proxy server (PPS) for the purpose of sub-framing. The PPS holds a central instance of the user's PAS. Every single user appliance has to request artifacts through the PPS. Even local artifacts, and this is very important, have to be requested through the PPS (loopback) to enable classification and temporal tracing. Despite evident redundancies this is maintainable, as artifacts are handled by reference and not by containment. In a second step this approach may also be used to implement transparent backup mechanisms. A consequence of workspace-level integration is the ground rule not to foster proprietary solutions. For prototyping we use Squid proxy servers, and we investigate how the RDF format may be used in conjunction with them to express and annotate non-hierarchical personal artifact structures. At the end of the day this layer implements the five functional requirements for personal information management defined by Ravasio et al. [12]: unified handling, multiple classification, bi-directional links, tracing of documents' temporal evolution, and transparent physical location of a particular piece of information. Finally, PAC and PAS together represent an instance of the personal information cloud, a kind of exosomatic memory [7] at our disposal.
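The following C++ sketch illustrates the loopback idea: every artifact request, including requests for artifacts stored on the local device, is routed through the personal proxy server so that classification and temporal tracing remain possible. The proxy address, the artifact URI, the use of libcurl and plain HTTP are assumptions made for this illustration; the paper itself only states that Squid proxy servers are used for prototyping.

```cpp
// Sketch: resolve an artifact reference through the personal proxy server (PPS).
// Assumptions (not from the paper): libcurl as HTTP client, a Squid-style proxy
// reachable at pps.example.org:3128, and plain HTTP GET for artifact retrieval.
#include <curl/curl.h>
#include <string>
#include <iostream>

// Collect the response body into a std::string.
static size_t collect(char* data, size_t size, size_t nmemb, void* userp) {
    static_cast<std::string*>(userp)->append(data, size * nmemb);
    return size * nmemb;
}

// Every request, even for artifacts stored on the local device, is sent via the
// PPS (loopback) so that classification and temporal tracing stay possible.
std::string fetch_artifact(const std::string& artifact_uri) {
    std::string body;
    CURL* curl = curl_easy_init();
    if (!curl) return body;
    curl_easy_setopt(curl, CURLOPT_URL, artifact_uri.c_str());
    curl_easy_setopt(curl, CURLOPT_PROXY, "http://pps.example.org:3128");  // hypothetical PPS address
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, collect);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &body);
    curl_easy_perform(curl);
    curl_easy_cleanup(curl);
    return body;
}

int main() {
    // Even a "local" artifact is addressed by reference and looped back through the PPS.
    std::cout << fetch_artifact("http://localhost/artifacts/report.odt").size() << " bytes\n";
}
```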
4 ActionSpaces

Now let us assume individuals have unobstructed access to every single digital artifact in every collection they own or share with others. We have to identify the
fundamental building blocks needed to make them (re-)presentable and manageable on every device, every interface, in every virtual environment we can think of. On typical WIMP (windows, icons, menus, and pointer) platforms, application windows are used to render one or even more documents simultaneously. File manager applications of all kinds and proprietary file dialogs play an outstanding role on these platforms. They provide hierarchical artifact access and classification means. But this powerful technique cannot be implemented consistently and efficiently across the heterogeneous device infrastructure and Milgram's reality-virtuality continuum [2]. Even on the historical target platforms for desktop computing we face a lot of problems in view of cognitive load, efficiency and consistency [1]. Thus we have to find an alternative that is better suited to our needs and consistently implementable. We propose to rely on the principle of abstraction, since it is common sense that a monolithic "universal interface" is neither feasible nor eligible in terms of specification, implementation, and usage. We quest for a user-centric thought pattern into which all possible interactive features and interface implementations fit. This thought pattern will allow for smooth transitions with minimal cognitive load across the user's device infrastructure. In the year 2000 Dachselt suggested a metaphorical approach called action spaces to structure three-dimensional user interfaces. He argues that "new applications will be built in the near future, where the focus does not lie on navigation through more or less realistic worlds, but rather on 3D objects as documents in an interactive three-dimensional user interface." [13] He defines action spaces as task-oriented scenes of actions or, more precisely, as virtual 3D spaces with interface controls serving an associated task. Notably, he points out that action spaces do not have to be rooms in a geometric sense and that "there is a need for the integration of these spaces in a more general visual application framework, in a geometric and metaphorical structure". [13] Based on Dachselt's work we reformulate and generalize ActionSpaces as session-persistent mixed-reality areas (geometry and location) populated by virtual artifacts (PAC) which are rendered according to the individual's mental models (PAS). According to this definition ActionSpaces are a kind of generalized display and interaction area. Traditional screen spaces, now part of a more generic scheme, represent only a small fraction of the possible instances.

4.1 Device Independent Places

Rendered or not, an ActionSpace has a geometry or cubic expansion respectively. The geometric bounding box may be based on the underlying hardware (e.g. screen size), it may be specified explicitly (e.g. fish-tank VR) or its dimension may be implicitly based on a real-world subspace (e.g. desk, wall, room). For now we define the bounding box dimensions of ActionSpaces as:

width · height · depth,  with width > 0, height > 0, depth > 0.    (1)
The point of origin and the orientation may be bound to an absolute or relative location, i.e. a point in physical/virtual space or to a physical/virtual object (e.g. by using GPS coordinates or other tracking techniques). There are several possible bindings listed in Table 2. In accordance with Feiner et al. [14] we have to carefully
distinguish between environment spaces, object spaces, screen spaces, and user spaces, as depicted in Figure 2. Following Bowman et al. [15], we should establish comprehensible mappings between the different spaces in terms of user interaction and the interplay between real and virtual. Therefore we discuss the characteristics of ActionSpaces and possible mappings across the heterogeneous device infrastructure.

Fig. 2. Types of reference-spaces to distinguish carefully (environment, object, screen, and user spaces). Cubes represent the local coordinate systems involved.

Table 2. Binding examples between ActionSpaces and (physical) reference space
Space Binding | Physical Link                     | Parameters used in ActionSpaces
environment   | GPS, Fiduciary Marker, …          | position and orientation
object        | screen, prop, …                   | object dimensions, orientation
screen        | cell phone, monitor, …            | screen dimensions
user          | individual perspective, distance  |
dynamic       | 3DMouse, …                        | position, orientation and scale
unbound       | ---                               | determined programmatically
ActionSpaces themselves may be rendered explicitly as a composition of virtual objects (i.e. all kinds of semi-opaque virtual environments) or not (i.e. as transparent augmented realities). The examples in Table 3 give a rough impression of the interface paradigms that may be consulted to render ActionSpaces. The attribution of entries is ambiguous because the paradigms available today were not designed with the suggested classification in mind. In any mapping case, ActionSpaces serve as reference coordinate systems for the positioning and location of artifacts, which always have to be rendered visually, as they would not be accessible otherwise.

Table 3. Rendering examples and paradigms for ActionSpaces
Space Binding | (Semi-)opaque                                        | Transparent
environment   | Virtual Room (Cube VR)                               | Augmented workbench (AR), Handheld AR
object        | World in Miniature (WiM), Control Panel (Virtual Prop) | Augmented engine manual (AR), Marker-based AR, Tangible Computing
screen        | Desktop, Today Screen                                | TV-Inserts, RT Video-Overlays
user          | Head-Up-Display (HUD) (head-tracked VR)              | AR HUD (head-tracked AR)
dynamic       | Fish-tank VR, Dome VR                                | Pervasive Annotation Layers
unbound       | Animation, Film                                      | RT Video

Together, an ActionSpace and the artifacts populating it are specified in an XML-based file format. Hence, ActionSpaces are artifacts themselves, and an ActionSpace may contain other ActionSpaces. This is an important aspect of our concept, allowing complex (i.e. non-hierarchical) relationships between artifacts and spaces and advanced information and artifact exchange between individuals. The latter two show great promise for future collaboration scenarios, but cannot be discussed here. By design an ActionSpace may be accessed on demand everywhere and at any time (universal accessibility). We fulfill this requirement by requesting it with its URI from a personal proxy server (PPS). For instance, on a desktop machine the process may appear similar to requesting a free-form HTML file in a full-screen browser. The appropriate type of space binding according to Table 3 is subject to the user's location, device and platform (interface paradigm). Even time may be used as a parameter to support pervasive scenarios. The possibilities are manifold. Until the potentials and consequences can be studied in detail, we suggest an obvious design rationale, provided that the actual platform supports it: An ActionSpace shall be mapped to the environment space when its position has to be fixed persistently. It shall be mapped to an object space when it has to be moved frequently or transported. Screen-space mappings are used for currently prevalent location-independent applications and for remote scenarios, e.g. to access virtual artifacts located on one's home-office shelf. User-space mappings capturing the user's attentiveness are used to present artifacts available for interaction, activity guidance, and communication, e.g. note taking. We will now briefly look into ActionSpaces and discuss the abstract characteristics and features supporting activities like thinking, memorizing and evolving.

4.2 Thought, Memory, and Evolution

Thinking involves the activity of classification [16]. Because spatial classification works well for individuals, we need to provide proper means for this task. Depending on the device or platform at hand, artifacts should be rendered directly in full detail, or represented intermediately as 3D objects, 2D icons, or text entries. Each artifact has a well-defined position, orientation and scale relating to the
ActionSpace it is a member of. To support the act of classification, all three parameters are separately adjustable. Wherever possible, direct interaction shall be used to facilitate this parameterization. Automatic grouping, ordering and alignment functions shall be provided so that more complex activities can be performed conveniently. The actual values shall be session-persistent, i.e. they are saved together with the artifact references in the XML file specifying the ActionSpace. As already mentioned, ActionSpaces may contain other ActionSpaces in addition to artifacts, since ActionSpaces are artifacts themselves. The possibilities cannot be exposed in detail here, but related to cognitive activities this feature yields expressions like "change a topic", "step through a process", "go into detail", "get an overview", and many more. That way we now have plenty of means to browse multi-hierarchical structures in relation to physical and virtual contexts. By design an artifact is always created inside an ActionSpace or imported from another one. Interaction techniques akin to drag-and-drop (pointer) and linguistic interaction (speech recognition, command line interface) are used for these tasks, so that position, orientation, and scale are implicitly defined at first. Complex manipulation of "opened" artifacts is, aside from some advanced platforms and paradigms, still the prevalent domain of applications. The application layer is not within the scope of this work.
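To make the nesting idea more tangible, the following C++ sketch models an ActionSpace as a container of placed artifact references and of further ActionSpaces. All field names, the binding enumeration derived from Table 2, and the in-memory representation are illustrative assumptions; the paper specifies only that an XML-based file format is used.

```cpp
// Sketch of the session-persistent ActionSpace structure described above.
// The concrete field names are assumptions; the paper states only that an
// ActionSpace and its artifacts are stored together in an XML-based format.
#include <string>
#include <vector>
#include <memory>

struct Vec3 { double x = 0, y = 0, z = 0; };

// A placed artifact: a reference (URI into the PAS/PAC) plus its spatial
// parameters, which are kept session-persistent together with the space.
struct PlacedArtifact {
    std::string uri;
    Vec3 position;
    Vec3 orientation;   // e.g. Euler angles; the representation is an assumption
    double scale = 1.0;
};

enum class Binding { Environment, Object, Screen, User, Dynamic, Unbound };

// An ActionSpace is itself an artifact, so it may contain other ActionSpaces.
struct ActionSpace {
    std::string uri;                 // requested on demand from the PPS
    Binding binding = Binding::Unbound;
    Vec3 extent;                     // bounding-box dimensions, all > 0 (Eq. 1)
    std::vector<PlacedArtifact> artifacts;
    std::vector<std::shared_ptr<ActionSpace>> nested;  // shared ownership allows non-hierarchical links
};
```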
Fig. 3. At least three ActionSpaces are active, with two of them bound to screen space (LAP and AAP) and one of them bound to the physical environment space as the actual workspace
As depicted in Figure 3, at least three ActionSpaces are active at any time, even if not visible. Two of them are special instances and are always bound to the screen space or the user space respectively. Their position and size depend on user preferences. The first one represents the device-bound local artifact pool (LAP) and makes its content accessible. It may, for example, contain several image files recently shot by the user and not yet integrated (i.e. not classified) into the PAS. LAPs are the counterpart to the prevalent file managers. The second one represents the ambient artifact pool (AAP), i.e. the environmental resources accessible in a given context. It makes their artifacts available (e.g. a thumb drive's content) and allows sending artifacts to them (e.g. a printer queue). In advanced mixed reality scenarios the "real device" and its artifacts may be addressed directly at their physical locations, e.g. by dragging virtual text documents and dropping them onto the printer with gestures. In most paradigms an iconographic representation is still essential to make the handling of AAPs convenient.
Structural persistence is guaranteed across the device infrastructure. Geometric arrangements related to gestalt principles are kept persistent, or they are emulated to support individual recognition. For instance, if an ActionSpace bound to an environment space is accessed remotely, the geometric proportions and distances remain valid, but they will probably be scaled to establish comprehensible viewpoints and perspectives. Even if the artifacts contained in such an ActionSpace are listed in pure textual form, the original positions will be preserved. Hence, the content arrangements in an ActionSpace may be temporarily reorganized using non-persistent automatic modes, or they may be manually and persistently rearranged by the user. The personal proxy server (PPS) allows the implementation of transparent backup strategies and encrypted artifact pools. It will be of great importance for individuals that with our approach every single artifact remains accessible and that, in terms of device dependence, nothing gets lost accidentally. In fact we work towards lifelong solutions for artifact management and thus lifelong exosomatic memories. For instance, the damage of one's laptop yields no loss of artifacts, since all ActionSpaces and PAS arrangements are still available from the PPS. Transparent backups and versioning make up for a potential loss of local copies. If an artifact cannot be found by navigating the PAS, it may still be found with common search techniques applied on the PPS. That way, and not surprisingly, search still complements structural access.
5 Conclusions

Every tool, device or appliance has its own strengths and weaknesses. For that reason users want to interact with more than one. The availability of devices in different contexts (e.g. tasks, times and locations) is another reason. In the quest for device independence we are not well advised to seek the intersection (i.e. the greatest common divisor) of feature sets. With the widely discussed problems of current interaction paradigms and the steadily growing diversity of electronic appliances in mind, we try to determine a minimal set of interaction means unifying artifact handling in a device-independent manner. That way the cognitive load for users can be minimized and the potentials of heterogeneous infrastructures may be leveraged. Both layers proposed, the SubFrame and the ActionSpaces, challenge the hierarchical file systems across the user's device infrastructure and the dated desktop metaphor on PC devices. With both layers consistently realized, we will see new possibilities in artifact handling and sharing which will meet the cognitive conditions and needs of individuals. The concrete representation of artifacts, interrelations, and ActionSpaces depends on the individual's needs and preferences. Therefore it cannot be specified and, in view of design style guides, it never will be. Following the argumentation of Ravasio and Tscherter, every single ActionSpace "should be a place where users, and only users, are able to engrave personal preferences and tastes." [1]

Acknowledgments. The authors would like to thank Prof. R. Mittermeir of the University of Klagenfurt for his constant input and challenging discussions, and our student Bonifaz Kaufmann for the brave development of prototypical solutions.
References
1. Kaptelinin, V., Czerwinski, M.: Beyond the Desktop Metaphor. MIT Press, Cambridge (2007)
2. Milgram, P., Takemura, H., Utsumi, A., Kishino, F.: Augmented reality: A class of displays on the reality-virtuality continuum. In: SPIE, Telemanipulator and Telepresence Technologies, vol. 2351, pp. 282–292 (1994)
3. Sandor, C., Klinker, G.: A rapid prototyping software infrastructure for user interfaces in ubiquitous augmented reality. Personal Ubiquitous Comput. 9(3), 169–185 (2005)
4. Winograd, T., Flores, F. (eds.): Understanding computers and cognition. Ablex Publishing Corp., Norwood (1985)
5. Sauermann, L., Bernardi, A., Dengel, A.: Overview and outlook on the semantic desktop. In: Proceedings of the 1st Workshop on The Semantic Desktop at the ISWC 2005 Conference (2005)
6. Kirsh, D.: The intelligent use of space. Artificial Intelligence 73(1-2), 31–68 (1995)
7. Newby, G.B.: Cognitive space and information space. J. Am. Soc. Inf. Sci. Technol. 52(12), 1026–1048 (2001)
8. Bowman, D.A., Kruijff, E., LaViola, J.J., Poupyrev, I.: 3D User Interfaces: Theory and Practice. Pearson Education, Inc., Boston (2005)
9. Haller, M., Billinghurst, M., Thomas, B. (eds.): Emerging Technologies of Augmented Reality: Interfaces and Design. Idea Group Publishing, USA (2007)
10. Hilliges, O., Sandor, C., Klinker, G.: Interactive prototyping for ubiquitous augmented reality user interfaces. In: IUI 2006: Proceedings of the 11th international conference on Intelligent user interfaces, pp. 285–287. ACM, New York (2006)
11. Melcher, R.: Device-independent handling of personal artifact collections. Submitted to Interact 2009 (2009)
12. Ravasio, P., Vukelja, L., Rivera, G., Norrie, M.C.: Project infospace: From information managing to information representation. In: Human-Computer Interaction INTERACT 2003, pp. 864–867. IOS Press, Amsterdam (2003)
13. Dachselt, R.: Action spaces - a metaphorical concept to support navigation and interaction in 3d interfaces. In: Proceedings of 'Usability Centred Design and Evaluation of Virtual 3D Environments', April 13-14, 2000. Shaker Verlag, Aachen (2000)
14. Feiner, S., MacIntyre, B., Haupt, M., Solomon, E.: Windows on the world: 2d windows for 3d augmented reality. In: UIST 1993: Proceedings of the 6th annual ACM symposium on User interface software and technology, pp. 145–155. ACM, New York (1993)
15. Bowman, D.A., North, C., Chen, J., Polys, N.F., Pyla, P.S., Yilmaz, U.: Information-rich virtual environments: theory, tools, and research agenda. In: VRST 2003: Proceedings of the ACM symposium on Virtual reality software and technology, pp. 81–90. ACM Press, New York (2003)
16. Bowker, G.C., Star, S.L.: Sorting Things Out: Classification and Its Consequences (Inside Technology). The MIT Press, Cambridge (1999)
Face Recognition Technology for Ubiquitous Computing Environment

Kanghun Jeong, Seongrok Hong, Ilyang Joo, Jaehoon Lee, and Hyeonjoon Moon

School of Computer Engineering, Sejong University, Seoul, Korea
[email protected]

Abstract. In this paper, we explore face detection and face recognition algorithms for the ubiquitous computing environment. We develop algorithms for an application programming interface (API) suitable for embedded systems. The basic requirements include an appropriate data format and a collection of feature data to achieve algorithmic efficiency. Our experiment presents a face detection and face recognition algorithm for handheld devices. The essential parts of the proposed system include integer representation of floating-point calculations, optimization of the memory management scheme, and efficient face detection performance on complex background scenes.

Keywords: ubiquitous computing environment, face recognition, face detection, application programming interface, algorithm optimization.
1 Introduction

In recent years, face detection and recognition technology has developed rapidly. The need for biometric security systems has increased in the ubiquitous computing environment. Face recognition technology maintains a high security level while providing convenience for both human and computer. Therefore, it is necessary to optimize the face recognition system by maintaining recognition accuracy while decreasing computational complexity. However, most hand-held devices can hardly satisfy such requirements because of their limited memory and CPU power. The major factors of a face recognition system are the feature size and the processing efficiency. Generally, in the case of mobile devices, processing time grows geometrically with the dimensionality of the feature data. It is therefore essential to optimize the data structure while maintaining reasonable recognition performance. In this paper, we explore various algorithms for face detection and recognition. Experiments include normalization of a face database to increase recognition performance through various pre-processing methods. We propose a novel algorithm for automatic face detection and a post-processing method to further improve face recognition performance.
2 Sejong Face Database (SJDB)

Generally, a face recognition system requires face image data for learning; features are normalized and registered for the face recognition process. These face image data are collected over time intervals under various conditions [1]. We have collected the Sejong face database for this experiment; its images are grouped into 'sessions'. There are several session images in the Sejong face database. One set of images of each person is used for learning and the other sets are used for the face recognition system. A session contains three images for each person (Figure 1).
Fig. 1. Sejong face image database (SJDB)
The session images of each person are collected at a rate of one or two images a day in the same physical setup, i.e. with the same pose and lighting. Session images of N different people are collected without distinction of sex. In this way, we collected 100 different sessions from about 70 males and 30 females. Each session contains three frontal images which were collected under similar illumination conditions (170~220 lux) [2][3].
3 Face Detection and Face Recognition

For any face recognition system, noise data is a primary factor of degradation. Therefore, a process to remove noise is essential for improving recognition accuracy [3][4]. This process is called pre-processing, and it begins with a geometric transformation (rotation, translation, and rescaling). The correction of the geometric transformation uses the center points of the eyes and mouth. We remove the unnecessary data of the outer face region using an oval mask. These processes can be controlled more accurately as shown in Figure 2. The pre-processed image is modified using several feature points (eyes, nose, mouth) to have equal dimensions regardless of the person's face size [5] (Figure 3). The face images for this experiment have a horizontal-to-vertical ratio of 3:4. As a result, face images are rescaled to 1200*1600, 120*160, and 30*40 pixels.
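A minimal sketch of the geometric normalization step described above, assuming OpenCV as the image processing library (the paper does not name one) and assuming canonical target coordinates for the eyes and the mouth:

```cpp
// Sketch: map the two eye centres and the mouth centre to fixed positions in a
// 120x160 image and mask the outer face region with an oval. OpenCV usage and the
// canonical point coordinates are assumptions; only the three-point alignment,
// the oval mask and the histogram equalization are described in the paper.
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

cv::Mat normalize_face(const cv::Mat& gray,   // 8-bit single-channel input image
                       cv::Point2f left_eye, cv::Point2f right_eye, cv::Point2f mouth) {
    const cv::Size out_size(120, 160);        // 3:4 ratio as used in the paper
    cv::Point2f dst[3] = { {40.f, 60.f}, {80.f, 60.f}, {60.f, 120.f} };  // assumed canonical positions
    cv::Point2f src[3] = { left_eye, right_eye, mouth };

    cv::Mat warped;
    cv::Mat M = cv::getAffineTransform(src, dst);        // rotation, translation, rescaling
    cv::warpAffine(gray, warped, M, out_size);

    // Remove data outside the face region with an oval mask.
    cv::Mat mask = cv::Mat::zeros(out_size, CV_8UC1);
    cv::ellipse(mask, cv::Point(60, 80), cv::Size(55, 75), 0, 0, 360, cv::Scalar(255), cv::FILLED);
    cv::Mat face = cv::Mat::zeros(out_size, warped.type());
    warped.copyTo(face, mask);

    cv::equalizeHist(face, face);                        // intensity redistribution, as described below
    return face;
}
```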
Fig. 2. Pre-processing with the mask and the three-point information
Fig. 3. Sejong face database (SJDB) and pre-processed face data
The pre-processing time is shown in Table 1; the measurements were performed on an Intel Pentium 4 computer with a 3.0 GHz CPU and 1 GByte of memory. In this paper the pre-processing was performed in a fully automatic setup with the Adaboost algorithm [6][7].

Table 1. Pre-processing time for one face image
Image size  | Processing time (ms)
1200 * 1600 | 1639.75
120 * 160   | 449.02
30 * 40     | 414.24
As Table 1 shows, the processing time does not decrease proportionally when the image size is reduced. The pre-processing times for the 120 * 160 and the 30 * 40 images differ only slightly, whereas the number of image pixels available for face recognition differs considerably. With a larger number of pixels, more data is collected for recognition and recognition performance improves. With reference to Table 1, the 120 * 160 image size yields the best performance.
A face image is converted into a pre-processed image for face recognition with normalized pixel values as well as a geometric transformation. In this experiment, we have applied a series of algorithms including histogram equalization and filtering. In the resulting image, noise data is removed and the feature data is emphasized. In addition, the intensity information of the image is redistributed through histogram equalization. The structure of our face recognition system is shown in Figure 4.
Fig. 4. Structure of proposed face recognition system
We have used principal component analysis (PCA) [8] and linear discriminant analysis (LDA) for feature extraction [9]. The change of the recognition rate for different cutoff ratios (the percentage of feature vectors used) is shown in Table 2. We set the cutoff ratio to 80%, which we considered the most appropriate trade-off, and designed our face recognition system accordingly. We have measured the distance between feature vectors with the L2 norm (Euclidean distance).

Table 2. Face recognition results for different image sizes and cutoff ratios
Cutoff | 1200 * 1600 | 120 * 160 | 30 * 40
100 %  | 96 %        | 98 %      | 97 %
80 %   | 97 %        | 98 %      | 96 %
70 %   | 94 %        | 95 %      | 91 %
50 %   | 91 %        | 87 %      | 82 %
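The matching step can be summarized in a few lines: project the pre-processed probe image onto the subspace basis kept by the cutoff ratio and select the gallery entry with the smallest Euclidean distance. The following C++ sketch uses illustrative names and layouts that are not taken from the API described later in the paper.

```cpp
// Sketch of feature projection and L2-norm matching over a PCA/LDA subspace.
// Matrix layout and function names are assumptions made for illustration only.
#include <vector>
#include <cmath>
#include <cstddef>
#include <limits>

// Project a normalized image (pixel vector with the mean already subtracted)
// onto the first k basis vectors retained by the cutoff ratio.
std::vector<double> project(const std::vector<double>& image,
                            const std::vector<std::vector<double>>& basis, std::size_t k) {
    std::vector<double> f(k, 0.0);
    for (std::size_t i = 0; i < k; ++i)
        for (std::size_t j = 0; j < image.size(); ++j)
            f[i] += basis[i][j] * image[j];
    return f;
}

// Return the index of the gallery feature vector closest to the probe (L2 norm).
std::size_t match(const std::vector<double>& probe,
                  const std::vector<std::vector<double>>& gallery) {
    std::size_t best = 0;
    double best_d = std::numeric_limits<double>::max();
    for (std::size_t g = 0; g < gallery.size(); ++g) {
        double d = 0.0;
        for (std::size_t i = 0; i < probe.size(); ++i) {
            double diff = probe[i] - gallery[g][i];
            d += diff * diff;
        }
        d = std::sqrt(d);
        if (d < best_d) { best_d = d; best = g; }
    }
    return best;
}
```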
The automatic face detector is based on the Adaboost algorithm, trained using the FERET [10], XM2VTS [11], CMU PIE [12] and Sejong face databases. The training data comprise face (30x30), eye (10x6) and mouth (20x9) pixel patches (Figure 5). 3,513 positive and 10,777 negative image samples were collected and used (Figure 6). As a result, we succeeded in detecting faces in near-frontal face images rotated by 0 to 15 degrees. Figure 7 shows the structure of the Adaboost classifier used in this experiment [7].
Fig. 5. Negative training data
Fig. 6. Positive training data
Fig. 7. Adaboost classifier
For training the face detector with the Adaboost algorithm, negative (wrong) data is just as important as positive (correct) data. After appropriate training, the face detector can locate the feature points (the center of each eye and the center of the mouth) in frontal face images at various rotation angles. After pre-processing of the face image, face recognition runs at about 7 fps (frames per second), which is reasonable for real-time processing. Including face detection, the total processing rate is 1.1 fps. The face detection and face recognition processes achieve rates of 87% detection and 95% recognition, respectively.
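The detector follows the boosted cascade idea of Viola and Jones [6][7]: a candidate window passes through a sequence of stages, each a weighted sum of weak classifiers compared against a stage threshold, and is rejected as soon as a single stage fails. The sketch below shows only this control flow; the weak classifiers (Haar-like features), weights and thresholds produced by training are left abstract.

```cpp
// Sketch of cascade evaluation as used in a Viola-Jones style detector.
// Weak classifiers, features and thresholds are placeholders; only the
// early-reject control flow of the cascade (cf. Fig. 7) is illustrated.
#include <vector>
#include <functional>
#include <cstddef>

struct Stage {
    std::vector<std::function<double(const std::vector<double>&)>> weak;  // weak classifiers
    std::vector<double> alpha;   // weights from AdaBoost training
    double threshold;            // stage threshold
};

// A window is accepted as a face only if it passes every stage.
bool is_face(const std::vector<double>& window, const std::vector<Stage>& cascade) {
    for (const Stage& s : cascade) {
        double score = 0.0;
        for (std::size_t i = 0; i < s.weak.size(); ++i)
            score += s.alpha[i] * s.weak[i](window);
        if (score < s.threshold)
            return false;        // early reject: most non-face windows stop here
    }
    return true;
}
```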
4 Face Recognition Application Programming Interface (API)

We have implemented the four major face recognition functions as an application programming interface (API) in order to perform general face recognition experiments. The API is composed of face detection, face recognition, face similarity, and face evaluation modules designed in C/C++, as shown in Table 3.

Table 3. API for face recognition function modules
Function         | Description
Face Detection   | Face region and feature point detection for the input image
Face Recognition | After pre-processing, face recognition on the detected face image
Face Similarity  | Face similarity calculation between a probe image and a gallery image
Face Evaluation  | Evaluation of face detection and face recognition algorithms
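The paper does not list concrete function signatures, so the following C++ header is purely illustrative of how the four modules of Table 3 might be exposed; every name, type and parameter is an assumption.

```cpp
// Hypothetical C++ header sketching the four API modules of Table 3.
// None of these names or signatures are taken from the paper.
#pragma once
#include <string>
#include <vector>

struct FacePoints { int eye_left_x, eye_left_y, eye_right_x, eye_right_y, mouth_x, mouth_y; };
struct GrayImage  { int width; int height; std::vector<unsigned char> pixels; };

// Face Detection: face region and feature point detection for the input image.
bool detect_face(const GrayImage& input, FacePoints& points);

// Face Recognition: pre-process the detected face and return the best gallery match.
int recognize_face(const GrayImage& input, const FacePoints& points);

// Face Similarity: distance between a probe image and a gallery image (L2 norm).
double face_similarity(const GrayImage& probe, const GrayImage& gallery);

// Face Evaluation: evaluate detection/recognition rates on a labelled image list.
void evaluate(const std::vector<std::string>& labelled_image_list);
```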
Generally, face recognition algorithms operate on normalized numerical values. Such real-number-based computations are typical for graphical algorithms such as image processing, 3D rendering, etc. There is a noticeable performance gap between floating-point and integer-based calculation. Moreover, this difference becomes large on embedded systems without a floating point unit (FPU). Since the face recognition API uses floating-point calculations, the performance degradation can be significant. In order to reduce this gap, we use an integer number type with the fixed-point method, a numerical technique that stores the integer part and the fractional part of a decimal number separately within an integer word. The fixed-point method was applied to the API to make it faster than the existing floating-point implementation; the fixed decimal point algorithm is designed to perform faster than the floating-point algorithm. The runtime measurements obtained with a simple C program are shown in Figure 8. The performance of the individual integral operations can be verified through profiling, and in practice the overall performance can be evaluated after the integral conversion to the fixed-point algorithm. Face image data of 120x160 pixels was used for the performance evaluation, and each module was compared during the process. The total arithmetic speed changes considerably with the integral transform.
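To make the fixed-point idea concrete, the following C++ sketch shows Q16.16 arithmetic, i.e. a 32-bit integer whose lower 16 bits hold the fractional part. The chosen format and the helper names are assumptions; the paper does not state which fixed-point layout was actually used.

```cpp
// Sketch of fixed-point (Q16.16) arithmetic replacing floating-point operations
// on devices without an FPU. The Q16.16 layout is an assumption for illustration.
#include <cstdint>
#include <iostream>

using fixed = std::int32_t;                  // integer part + 16-bit fraction
constexpr int FRAC_BITS = 16;

constexpr fixed  to_fixed(double v)  { return static_cast<fixed>(v * (1 << FRAC_BITS)); }
constexpr double to_double(fixed v)  { return static_cast<double>(v) / (1 << FRAC_BITS); }

// Addition and subtraction are plain integer operations; multiplication and
// division need a 64-bit intermediate and a shift to keep the scale.
inline fixed fx_mul(fixed a, fixed b) {
    return static_cast<fixed>((static_cast<std::int64_t>(a) * b) >> FRAC_BITS);
}
inline fixed fx_div(fixed a, fixed b) {
    return static_cast<fixed>((static_cast<std::int64_t>(a) << FRAC_BITS) / b);
}

int main() {
    fixed x = to_fixed(3.25), y = to_fixed(0.5);
    std::cout << to_double(x + y) << ' ' << to_double(fx_mul(x, y)) << '\n';  // prints 3.75 1.625
}
```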
Fig. 8. Function profiling result (performance increase by the fixed-point use)

Table 4. Profiling results for 7,000,000 repetitions (Intel Core 2 Quad Q9300)
Operation      | Integer number | Floating number | Double number
Addition       | 62.1           | 89.1            | 255.1
Subtraction    | 78.2           | 87.6            | 277.6
Multiplication | 113.3          | 447.2           | 997.1
Division       | 311.2          | 490.5           | 817.7
Table 5. Profiling results for 1,000,000 repetitions (Intel Pentium 233)
Operation      | Integer number | Floating number | Double number
Addition       | 39.246         | 38.661          | 40.044
Subtraction    | 38.242         | 38.506          | 40.400
Multiplication | 39.156         | 528.966         | 528.861
Division       | 197.004        | 579.077         | 574.697
Although the integral transform introduces some additional arithmetic, a performance improvement is guaranteed when it is applied on an embedded system, as shown in Table 4 and Table 5.
5 Conclusion

Face recognition is essential in multi-biometric systems because it provides the only non-contact biometric features, and it yields user-friendly information that can be processed by both humans and computers. This is a very important property in case of communication problems caused by network failures which prohibit database access in border control applications. Recently, face detection and recognition technology can be built into embedded systems for closed-circuit television (CCTV) related applications. However, such a system has to cope with numerous conditions, including various poses and lighting, which is challenging for a face recognition system. Generally, a face recognition system contains two major functions: face detection and face recognition. We have packaged several face detection and recognition algorithms into an application programming interface (API) for various biometric applications. We present a performance evaluation of the integral-transformed calculation, which is optimized for embedded systems and suitable for the ubiquitous computing environment. We have trained the face detection and face recognition modules of the facial recognition API using the Sejong face database. The proposed face detection and face recognition modules are integral transformed so that they can be calculated faster, and they are optimized for embedded systems. Our experimental results show that we can minimize the run-time by decreasing the computational complexity while maintaining reasonable accuracy, which makes the approach applicable to the ubiquitous computing environment.

Acknowledgement. This work was supported by the Seoul R&BD Program (10581).
References
1. Papatheodorou, T., Rueckert, D.: Evaluation of 3D Face Recognition Using Registration and PCA. In: Kanade, T., Jain, A., Ratha, N.K. (eds.) AVBPA 2005. LNCS, vol. 3546, pp. 997–1009. Springer, Heidelberg (2005)
2. Turk, M., Pentland, A.: Eigenfaces for recognition. J. Cognitive Neuroscience 3, 71–86 (1991)
3. Moon, H., Phillips, P.: Computational and Performance Aspects of PCA-Based Face-Recognition Algorithms. Perception 30, 303–321 (2001)
4. Sim, T., Baker, S., Bsat, M.: The CMU Pose, Illumination, and Expression Database. IEEE Transactions on Pattern Analysis and Machine Intelligence 25(12), 1615–1618 (2003)
5. Phillips, P., Moon, H., Rauss, P.: The FERET Evaluation Methodology for Face Recognition Algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(10), 1090–1104 (2000)
6. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: Computer Vision and Pattern Recognition, pp. 511–518 (2001)
7. Viola, P., Jones, M.: Robust Real-time Object Detection. In: Second International Workshop on Statistical and Computational Theories of Vision, July 13 (2001)
8. Turk, M., Pentland, A.: Eigenfaces for recognition. J. Cognitive Neuroscience 3, 71–86 (1991)
9. Yambor, W.S.: Analysis of PCA-based and Fisher Discriminant-based Image Recognition Algorithms. Technical report, CSU (June 2000)
10. FERET Database. NIST (2001), http://www.itl.nist.gov/iad/humanid/feret/
11. XM2VTS Database, http://www.ee.surrey.ac.uk/CVSSP/xm2vtsdb/
12. Sim, T., Baker, S., Bsat, M.: The CMU Pose, Illumination, and Expression Database. IEEE Transactions on Pattern Analysis and Machine Intelligence 25(12), 1615–1618 (2003)
Location-Triggered Code Execution – Dismissing Displays and Keypads for Mobile Interaction

W. Narzt and H. Schmitzberger

Johannes Kepler University Linz, Altenbergerstr. 69, A-4040 Linz, Austria
{wolfgang.narzt,heinrich.schmitzberger}@jku.at
Abstract. With conventional interaction metaphors, spatially controlled electronic actions (e.g. opening gates, buying tickets, starting or stopping engines, etc.) require human attentiveness via a display and/or keystroke at the place of the event. However, the attentiveness for pressing a button or glancing at a display may occasionally be unavailable when the involved person must not be distracted from performing a task or is handicapped through wearable limitations (e.g. gloves, protective clothing) or disability. The main research focus of this paper is to trigger those actions automatically at the mere spatial proximity of a person, i.e. to dismiss displays and keypads for launching the execution of electronic code, in order to ease human-computer interaction through innovative mobile computing paradigms.

Keywords: Location-Triggered Code Execution, Natural Interaction Paradigms.
1 Introduction

Currently available mobile location-based communication services enable their users to consume geographically bound information containing static text, images, sound or videos. Having arrived at previously prepared spots, people are provided with information about the next gas station, hotels or sights of interest. More sophisticated variants of mobile location-based services also include dynamic links to locally available content providers. They additionally reveal the current gas price or vacancy status and reroute tourists to the online ticket service. Recently recognizable trends even consider individual user profiles as contextual constraints for supplying personalized information and as a technique to counteract spam and to selectively address content to specific user groups. However, the potentials of mobile interaction are far from being exploited, considering the limiting factors that prevent users from attending to the information screen of their mobile device, e.g. while driving a car. Active interaction may also be hindered when people are handicapped or requested to wear gloves, safety glasses or protective suits in order to perform a working task. What is the use of perfectly filtered, personalized, dynamic information when the addressee is not able to perceive or react to it? We expect mobile services to support users in their tasks by automatically triggering (personally authorized) electronic actions at the mere spatial proximity of approaching users, without the need to glance at displays, type, click or press buttons.
As a consequence, mobile location-based services are not solely regarded as information providers but also as action performers. The location context, in combination with personalized access privileges and further quantifiable sensory input, is the trigger for opening gates, automatically stopping engines in danger zones or validating tickets at entrance areas. Hence, people are able to continue their natural behavior without being distracted from their focused task and simultaneously execute an (assumedly incidental but necessary) action. The users' mobile devices enabling location-triggered code execution remain in their pockets.
2 Related Work

In this paper, we are primarily concerned with intuitive human-computer interaction in mobile computing scenarios derived from the spatial context of the respective user. In this regard, the notion of "context" has been addressed in numerous publications and is widely thought of as the key to a novel interaction paradigm. In [1] Dourish analyzed the role context plays in computer systems and claimed that future computing scenarios will move away from traditional single-user desktop applications employing mouse, keyboard and computer screen. In [2] the usage of context in mobile interactive systems has been analyzed. Dix et al. determined the relevance of space and location as a fundamental basis for mobile interaction. [3] studies users' needs for location-aware mobile services. The results of the conducted user evaluation highlight the need for comprehensive services as well as for seamless service chains serving throughout a user's mobile activity. To achieve broad user acceptance, mobile computing research is confronted with the issue of seamless transitions between the real and the digital world [4] without distracting the user's attention [5]. Modern solution approaches use Near Field Communication (NFC) for contactlessly initiated actions, following the same objective of dismissing the conventional display- and key-controlled interaction paradigm in order to claim a minimum of attention for performing an action at a place of event (e.g. SkiData: contactless access control in skiing areas through RFID). However, the disadvantage of this solution lies in the fact that every location which is supposed to trigger an electronic action requires mandatory structural measures for engaging the NFC principle. Beyond that, a remaining part of attention is still required, as users are supposed to know the position of the NFC system and bring the RFID tag or reader (depending on which of the components carries the reading unit) close to the system for proper detection. Regarding the structural measures for implementing NFC, this technology is only marginally applicable and causes financial and environmental impairments. In earlier research on context with respect to HCI, the notion of implicit interaction appears [6]. Implicit interaction denotes that the application will adapt to implicitly introduced contextual information learned from perceiving and interpreting the current situation a user is in (i.e. the user's location, surrounding environment, etc.). In [7] this definition has been refined and split up, separating implicit input from implicit output. Implicit input is regarded as an action performed by a human to achieve a specific goal while being secondarily captured and interpreted by a system. Implicit output is described as not directly related to an explicit input but seamlessly integrated with
the environment and the task of the user. The approach presented in this paper proposes a means of combining implicit input and output to achieve a minimum of user distraction. Up to now, HCI research already offers numerous contributions on implicit interaction from an application point of view. In [8] accelerometer data from sensors attached to digital artifacts has been exploited to implicitly grant access to a room. The Range Whiteboard [9] explores implicit interaction on public displays supporting co-located, ad-hoc meetings. Similar to this work, interaction with interactive public ambient displays is explored in [10]. Here, the contextual focus lies on body orientation and position cues to determine interaction modes. In [11] the personalization of GIS area maps is realized by monitoring users' implicit interaction with maps. All these works on implicit interaction strongly focus on the primary role of the user in the interaction process and the modalities of interacting with ubiquitous computers and digital artifacts, respectively. Spontaneous interaction triggered upon physical proximity was further studied in numerous works [12] [13] [14]. These approaches share the aspect that radio sensors are used to determine mutual proximity between smart artifacts and humans. The simplest form of smart artifacts are Smart-Its [15], small computing devices that can be attached unobtrusively to arbitrary physical objects in order to empower these with processing, context-awareness and communication. Smart-Its are designed for ad hoc data exchange among themselves in spatial proximity. In [16] Gellersen et al. underlined the importance of awareness of the environment and of the situation for inferring behavior of mobile entities. Many researchers have focused on identifying smartness in mobile systems. For reasons of usability, this paper focuses on implementing smart environments rather than smart tools or appliances. Key issues of such smart environments have already been discussed recently. The ReMMoC system [17], a web-service-based middleware solution, deals with the problem of heterogeneous discovery and interaction protocols encountered in mobile applications. In [18] interaction with physical objects is supported by a web service backend system providing mobile phone users with access to associated services. Common to most solutions for mobile interaction is the usage of spatial context. Zambonelli et al. presented the spatial computing stack [19], a framework to overcome the inadequacy of traditional paradigms and to accentuate the primary role of space in modern distributed computing. Generally, it represents virtual environments and their physical counterpart as a common information space for creating awareness among the participants.
3 Architecture

Similar to mobile telecommunication services, we propose a distributed provider model as the basis for realizing a common information space enabling worldwide, unbounded mobile location-based communication services. This proven model allows users to join the provider of their choice and guarantees scalability of the service, as each provider only handles a limited number of clients [4]. Every provider stores a set of geographically linked information in appropriate, fast-traversable geo-data structures (e.g. R-trees) containing hierarchically combinable content modules (which we
call gadgets) for text, pictures, videos, sound, etc. The name gadget already refers to a possible activity within a module and is the key to a generic approach of integrating arbitrary electronic actions to be triggered automatically for arriving users. The main focus in designing an architecture for location-triggered code execution is high extensibility to third-party systems, for the number and variety of non-recurring electronic possibilities is unforeseeable and simultaneously enriches the potentials of such a service. Fig. 1 illustrates the common principles of a flexible component architecture which enables fast connections to third-party systems:
Fig. 1. Extensible Component Architecture for Location-Triggered Code Execution
The basic technical approach is a client-server model where clients repetitively transmit their own (commonly GPS-based) position to a server (1), which evaluates the geo-data considering visibility radii and access constraints (2) and transmits the corresponding results back to the clients (3). Generally, when the transmitted information contains conventional gadgets such as text and pictures, it is immediately displayed on the output device of the client (4). The basic idea for executing code is to use the gadget metaphor and store executable code inside instead of text or binary picture data (smart gadgets). Whether this piece of code is executed on the client or the server is primarily irrelevant for the paradigm of automatically triggering actions at certain locations.
However, where to execute the code is crucial for system compatibility and extensibility issues. The client as the executing platform brings up portability issues at every new third-party connection, as there is possibly more than one system implementation for covering multiple mobile platforms. The server, on the other hand, would need an elaborate plug-in mechanism for covering new third-party connections and is then still faced with the problem of integrating code from varying third-party operating systems. In order to solve this conflict, we propose a web-service-based mechanism which is both effective and simple to extend: Smart gadgets do not actually contain executable code but a simple URL to a remote web service which is the actual component to execute the code. Every third-party vendor provides a web service and decides about the URL and its parameters on her own. When a client receives information containing a smart gadget, its URL is resolved in some kind of HTTP request (5), which is handled internally (6) and finally triggers the desired electronic action (7). A response back to the client (8)(9) can additionally be presented as a visual confirmation by the third-party system whether the action has been executed successfully or not (10). This architecture comprises several advantages:
• In order to execute location-based code, clients just have to handle standardized HTTP requests. A majority of currently utilized mobile platforms support these mechanisms.
• The system does not run into compatibility or portability problems, for the executable code is exclusively run on the platform of the corresponding web service.
• Commonly, location-triggered actions are provided by third-party vendors (e.g. opening of garage doors or gates, starting or stopping of machines, etc.). Using a web-based approach, external systems can easily be linked without compiling the core system or adding plug-ins to it.
• Most important for third-party vendors: Their internal data representations, servers and control units are hidden from the publicly accessible location-based service, guaranteeing a maximum of data security for the vendors.
Summarizing all those architectural thoughts, location-triggered code execution is easily achievable by using conventional (GPS- and wireless-enabled) devices and services and adding web services to them via a smart gadget mechanism. Simple standardized HTTP requests enable arbitrary integration of third-party systems without structural measures such as those mandatory for NFC systems.
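As an illustration of steps (5) to (10), the sketch below resolves the URL carried by a smart gadget with a plain HTTP GET, passes the user identity and position as query parameters, and treats the response body as the visual confirmation. The parameter names, the URL format and the use of libcurl are assumptions made for this example; the architecture only prescribes standardized HTTP requests.

```cpp
// Sketch: a client resolving a smart gadget. The gadget carries only a URL to a
// third-party web service; parameter names and libcurl usage are assumptions.
#include <curl/curl.h>
#include <string>

static size_t collect(char* d, size_t s, size_t n, void* out) {
    static_cast<std::string*>(out)->append(d, s * n);
    return s * n;
}

// Trigger the third-party action and return its textual confirmation (if any).
bool trigger_smart_gadget(const std::string& gadget_url, const std::string& user_id,
                          double lat, double lon, std::string& confirmation) {
    std::string url = gadget_url + "?user=" + user_id +
                      "&lat=" + std::to_string(lat) + "&lon=" + std::to_string(lon);
    CURL* curl = curl_easy_init();
    if (!curl) return false;
    curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, collect);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &confirmation);
    CURLcode rc = curl_easy_perform(curl);
    long status = 0;
    curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &status);
    curl_easy_cleanup(curl);
    return rc == CURLE_OK && status == 200;   // success: the vendor's service executed the action
}
```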
4 Implementation

In the course of a research project, the Johannes Kepler University of Linz, Siemens Corporate Technology in Munich and the Ars Electronica Futurelab in Linz have developed a novel location-based information and communication service for mobile devices, facing the challenges of natural interaction triggered by geographical closeness without display and keypad. It enables users to arbitrarily post and consume information at real locations for asynchronous one-to-many or one-to-any communication with time-driven and contextual perceptibility; and it provides a framework interface for extending the functional range of the service, especially for adding new smart elements by third-party vendors.
Fig. 2. Location-Based Client for Cell Phones and PDAs
Fig. 3. Up-to-date Lecture Information at the campus of the University of Linz
The server component of our proposed architecture (as sketched in Fig. 1) has been implemented as a multithreaded C++ application capable of providing different kinds of gadgets depending on the user's current position. Localization is selectively accomplished via GPS, WLAN triangulation, RFID- or Bluetooth-based positioning. In order to guarantee multi-platform compatibility, the client component uses a slim J2ME system core supporting various mobile platforms including cell phones, PDAs and conventional notebooks. Data transmission is implemented for the most
common wireless communication techniques (GPRS, UMTS, WLAN, Bluetooth). Fig. 2 shows snapshots of the client software for mobile phones (left) and PDAs (right). The web-based third-party component can be deployed to external server systems and uses REST technology (Representational State Transfer, a client-server communication architecture) to offer access to and control of its internal set of functions. In order to prevent abuse of the service, the component includes identification and authorization mechanisms, only granting access to registered users. The applicability of this framework architecture is currently being demonstrated in the course of a first reference implementation at the campus of the University of Linz, available for students, academics and maintenance staff. At the moment, the campus-wide deployment of third-party components is restricted to solely embedding dynamic content from several university-related information systems (e.g. event management, study support system, lecture room occupation plan). However, the mechanism already follows the principle of location-triggered code execution as proposed in chapter 3. For instance, students are able to obtain up-to-date lecture information at spatial proximity to the respective auditorium halls (see Fig. 3).
5 Fields of Application

The project described in the previous chapter has already attracted potential customers from industry and the consumer market, who have expressed their interest in adopting our service in their business fields. Based on the manifold fields of the interested industry segments, we could identify four different types of relevant location-triggered actions (see the sketch after this list):

1. Actions that should be executed when users approach a geographical point.
2. Actions whose execution is due to entering an area (e.g. a room).
3. Also the opposite is valid for location-triggered actions (e.g. leaving a room).
4. For certain places, users are supposed to reside for a predefined period of time before actions are executed.
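A minimal sketch of how these four trigger types might be evaluated against an incoming stream of position fixes; the circular zone geometry, the flat-plane distance model and all thresholds are assumptions made for illustration.

```cpp
// Sketch: evaluating the four trigger types against a stream of position fixes.
// Geometry (circular zones, metres on a flat plane) and thresholds are assumptions.
#include <cmath>

struct Fix { double x, y; double t; };            // position in metres, time in seconds

struct Zone {
    double cx, cy, radius;                        // point of interest / circular area
    double dwell_needed;                          // type 4: required residence time
    bool   inside = false;
    double entered_at = 0;
};

// Approaching a point (type 1) is treated here as entering a small circular area (type 2).
enum class Trigger { None, Enter, Leave, Dwell };

Trigger evaluate(Zone& z, const Fix& fix) {
    double dx = fix.x - z.cx, dy = fix.y - z.cy;
    bool now_inside = std::sqrt(dx * dx + dy * dy) <= z.radius;

    if (now_inside && !z.inside) {                // types 1 and 2: approaching / entering
        z.inside = true;
        z.entered_at = fix.t;
        return Trigger::Enter;
    }
    if (!now_inside && z.inside) {                // type 3: leaving the area
        z.inside = false;
        return Trigger::Leave;
    }
    if (now_inside && fix.t - z.entered_at >= z.dwell_needed)
        return Trigger::Dwell;                    // type 4: a real system would fire this only once
    return Trigger::None;
}
```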
All those examples can additionally be enriched by considering the current heading of the user, i.e., from which direction is the user approaching a point or entering or leaving an area? The following use cases demonstrate examples of (already implemented and planned) location-triggered actions validating the functional scope and the extensibility of the system: To start with, a common area of application for triggering actions at points of arrival is derived from logistics requirements: Carriers arriving at their designated destination automatically engage the process of loading or unloading cargo, controlling e.g. local conveyor belts, and affecting storage management software to alter working procedures. Another use case for location-triggered actions on entering an area has already resulted in a real business scenario: A producer of golf carts is equipping its vehicles with a location-based information system displaying overviews of the players' current position on the golf course, revealing distances to holes and obstacles. When players
try to drive on forbidden fairways or greens (marked in light green in Fig. 4), the system automatically alerts the operator or even stops the engines.

Fig. 4. Alerting or stopping Engines when driving on Fairways or Greens

Leaving an area may be interesting for power consumption issues. A household that is aware of the two persons living in it and recognizes when the dwellers' positions leave a selected region around the property triggers electronically controllable units (e.g. lights, central heating, door lock, etc.) to be switched off or locked in order to decrease power consumption. In contrast to existing smart power consumption solutions, location-triggered code execution does not need any additional sensory gadgets for context recognition, nor an electronic backbone to keep them working [20]. The personal mobile device, which is powered on anyway, is the only gadget required to be turned on for enabling such power-saving scenarios. Concerning the application field of industrial security mechanisms, where people must leave contaminated zones after strict time slots, the fourth type of triggering actions can be applied: The maintenance staff, restrained by protective clothing, needs to focus on their primary repair tasks and is unable to monitor security displays. The alarm automatically signals upcoming hazardous situations to each worker individually and to the supervising operators.
6 Conclusion and Future Work

Location-triggered code execution enables a variety of innovative interaction mechanisms, neither distracting users' attention from the tasks they are currently performing nor requiring structural measures for its implementation. For the initiation of electronically controlled actions, users can abstain from conventional interaction techniques using displays and keystrokes; their physical presence alone triggers real events. Users are simply requested to carry the enabling infrastructure, i.e., a mobile, wireless communication device equipped with some kind of tracking technology, in their pockets. Currently available cellular phones and PDAs already fulfill these technical requirements and are suited as client devices for instantiating location-triggered code execution. Generic extension to external systems via web services is the key to implementing a manifold of application scenarios by third-party vendors. Without interfering with the core system, new electronic functions can be adopted using simple, standardized methods, broadening the palette of applications almost without bound.
Our prototype at the University of Linz already shows the applicability of the proposed concept, focusing on eliminating keystrokes for mobile computing interaction. However, some interaction modalities still depend on the use of arbitrary forms of display metaphors (visual, acoustic, haptic). Future work will comprise further studies on coupling location context with triggering actions in order to meet the paradigm of display- and keyless mobile interaction.
References 1. Dourish, P.: What we talk about when we talk about context. Personal Ubiquitous Computing 8(1), 19–30 (2004) 2. Dix, A., Rodden, T., Davies, N., Trevor, J., Friday, A., Palfreyman, K.: Exploiting space and location as a design framework for interactive mobile systems. ACM Trans. Comput.Hum. Interact. 7(3), 285–321 (2000) 3. Kaasinen, E.: User needs for location-aware mobile services. Personal Ubiquitous Comput. 7(1), 70–79 (2003) 4. Narzt, W., Pomberger, G., Ferscha, A., Kolb, D., Müller, R., Wieghardt, J., Hörtner, H., Haring, R., Lindinger, C.: Addressing concepts for mobile location-based information services. In: Proceedings of the 12th International Conference on Human Computer Interaction HCI 2007 (2007) 5. Ishii, H., Ullmer, B.: Tangible bits: towards seamless interfaces between people, bits and atoms. In: CHI 1997: Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 234–241. ACM, New York (1997) 6. Schmidt, A.: Implicit human computer interaction through context. Personal Technologies (2000) 7. Schmidt, A., Kranz, M., Holleis, P.: Interacting with the ubiquitous computer: towards embedding interaction. In: sOc-EUSAI 2005: Proceedings of the 2005 joint conference on Smart objects and ambient intelligence, pp. 147–152. ACM, New York (2005) 8. Antifakos, S., Schiele, B., Holmquist, L.E.: Grouping mechanisms for smart objects based on implicit interaction and context proximity. Interactive Posters at UbiComp 2003 (2003) 9. Ju, W., Lee, B.A., Klemmer, S.R.: Range: Exploring implicit interaction through electronic whiteboard design. Technical report, Stanford University, HCI Group (2006) 10. Vogel, D., Balakrishnan, R.: Interactive public ambient displays: transitioning from implicit to explicit, public to personal, interaction with multiple users. In: UIST 2004: Proceedings of the 17th annual ACM symposium on User interface software and technology, pp. 137–146. ACM Press, New York (2004) 11. Weakliam, J., Bertolotto, M., Wilson, D.: Implicit interaction profiling for recommending spatial content. In: GIS 2005: Proceedings of the 13th annual ACM international workshop on Geographic information systems, pp. 285–294. ACM, New York (2005) 12. Ferscha, A., Mayrhofer, R., Oberhauser, R., dos Santos Rocha, M., Franz, M., Hechinger, M.: Digital aura. In: Advances in Pervasive Computing. A Collection of Contributions Presented at the 2nd International Conference on Pervasive Computing (Pervasive 2004), Austrian Computer Society (OCG), Vienna, Austria, April 2004, vol. 176, pp. 405–410 (2004) 13. Kortuem, G., Schneider, J., Preuitt, D., Thompson, T.G.C., Fickas, S., Segall, Z.: When peer-to-peer comes face-to-face: Collaborative peer-to-peer computing in mobile ad hoc networks. In: Proceedings of the First International Conference on Peer-to-Peer Computing, P2P 2001 (2001)
14. Brunette, W., Hartung, C., Nordstrom, B., Borriello, G.: Proximity interactions between wireless sensors and their application. In: WSNA 2003: Proceedings of the 2nd ACM international conference on Wireless sensor networks and applications, pp. 30–37. ACM Press, New York (2003) 15. Gellersen, H., Kortuem, G., Schmidt, A., Beigl, M.: Physical prototyping with smart-its. IEEE Pervasive Computing 3(3), 74–82 (2004) 16. Gellersen, H.W., Schmidt, A., Beigl, M.: Multi-sensor context-awareness in mobile devices and smart artifacts. Mob. Netw. Appl. 7(5), 341–351 (2002) 17. Grace, P., Blair, G.S., Samuel, S.: A reflective framework for discovery and interaction in heterogeneous mobile environments. SIGMOBILE Mob. Comput. Commun. Rev. 9(1), 2–14 (2005) 18. Broll, G., Siorpaes, S., Rukzio, E., Paolucci, M., Hamard, J., Wagner, M., Schmidt, A.: Supporting mobile service usage through physical mobile interaction. In: Proceedings of the Fifth IEEE international Conference on Pervasive Computing and Communications, PERCOM (2007) 19. Zambonelli, F., Mamei, M.: Spatial computing: an emerging paradigm for autonomic computing and communication. In: 1st International Workshop on Autonomic Communication, Berlin (October 2004) 20. Ferscha, A., Emsenhuber, B., Gusenbauer, S., Wally, B.: PowerSaver: Pocket-Worn Activity Tracker for Energy Management. In: Adjunct Proceedings of the 9th International Conference on Ubiquitous Computing UbiComp 2007, pp. 321–324 (2007)
Mobile Interaction: Automatically Adapting Audio Output to Users and Contexts on Communication and Media Control Scenarios

Tiago Reis, Luís Carriço, and Carlos Duarte

LaSIGE, Faculdade de Ciências, Universidade de Lisboa
[email protected], {lmc,cad}@di.fc.ul.pt
Abstract. This paper presents two prototypes designed to enable the automatic adjustment of audio output on mobile devices. One is directed at communication scenarios and the other at media control scenarios. The user centered methodology employed in the design of these prototypes involved 26 users and is also presented here. Once the prototypes were implemented, a usability study was conducted. This study involved 6 users who included our prototypes in their day-to-day lives during a two-week period. The results of the studies are presented and discussed in this paper, providing guidelines for the development of audio output adjustment algorithms and the future manufacturing of mobile devices. Keywords: Media Control, Communication, Automatic Volume Adjustments, Context Awareness, Hand-held Devices, User Centered Design, Contextual Evaluation.
1 Introduction

Nowadays, mobile devices are strongly integrated into people's lives. The ubiquitous nature of these devices enables humans to use them in an enormous variety of contexts, which are defined by various sets of dynamic characteristics (contextual variables) that heavily affect user interaction. However, the differences amongst users (e.g., preferences, capabilities) and the frequent context mutations that occur during device utilization (e.g., moving from a silent to a noisy environment) usually result in users adapting to the available contexts and interfaces, and not the other way around. Most mobile user interfaces are unable to adapt effectively and automatically to the mutations of their utilization contexts, introducing difficulties in user interaction and, many times, inhibiting it [1, 2, 3]. Accordingly, it is necessary to explore new approaches to user interface design and development, aiming at usability and accessibility improvements in mobile applications. To achieve this, applications must be constantly aware of their utilization contexts and the respective mutations, naturally providing users with adequate interaction modalities, combining and configuring them according to the contexts in which the devices are used. The work presented in this paper addresses the contextual adaptation of audio output on mobile devices, focusing on communication and media control scenarios.
Although there are solutions available for similar purposes, especially regarding communication scenarios, they consider only a few, insufficient dimensions of context: noise levels are used in order to adapt different aspects of audio output (e.g., ringtone volume, earphone volume) [4, 5, 6]. The personal dimension of context, which represents a non-trivial issue in user interface adaptation, is not addressed; people's preferences and capabilities are not considered. Nevertheless, the technology available nowadays enables the transparent gathering of the information needed for this purpose [7, 8], and its correct employment may significantly increase usability and accessibility, consequently improving the user experience by ensuring the interaction's adequateness to both utilization contexts and user needs. This motivated the user centered design, development, and contextual evaluation of two context-aware prototypes. These prototypes gather noise levels from the surrounding environment, adapting audio output according to scenarios, user needs and environmental noise. They present a proof of concept that can be achieved with the technology available on most mobile devices, and that can be considered and improved upon in the manufacturing of new devices in order to overcome the limitations found in the studies conducted and presented in this paper. This paper starts by presenting and discussing the related work developed in this area. Next, it presents a strongly user centered design process that enabled the definition of a set of requirements, guidelines and design decisions that were considered during the prototypes' implementation. Afterwards, it presents the prototypes created and their underlying volume adjustment algorithms. It then details and discusses the contextual evaluation of these prototypes, emphasizing the limitations and advantages of the algorithms created for audio output adaptation, as well as their acceptance by users. Finally, the paper concludes by providing future work directions within this domain.
2 Related Work

From simple headphones and headsets with physical noise canceling [9] to headphones and headsets that include very complex noise elimination algorithms [10, 11], different solutions have been proposed and created in order to reduce the impact of noise on audio interaction. Firstly, it is important to emphasize that there are significant differences between noise canceling and noise-based automatic volume adjustments. While the former is of utmost importance in many scenarios (e.g., communicating inside a plane or close to a helicopter), it demands an aggressive noise elimination that is extremely complex and, consequently, extremely expensive, sometimes making the headphones themselves more expensive than regular mobile devices [9, 10, 11]. Accordingly, the latter presents a reliable and more affordable solution for contexts in which the environmental noise varies significantly, but not tremendously. Moreover, noise canceling solutions can become dangerous in several contexts. For instance, while a user is walking on the street, if he/she is completely inhibited from hearing the cars passing by, he/she can be injured. The work presented in this paper focuses on the context-based adaptation of volume in communication and media control scenarios. Regarding communication scenarios,
Chris Mitchell developed an application that is capable of monitoring noise levels in the surrounding environment and consequently adjusting the ringtone volume of a mobile phone [4]. Regarding media control scenarios, there are several car stereos that adjust the volume according to the estimated noise generated by the car. Regarding both scenarios, Apple recently published a new patent with the US Patent Office [12]. They claim to show a new technology feature that may be included in their future products. Their concept is similar to the ones proposed in this paper: automatic volume adaptation in communication and media control scenarios. However, they envision the inclusion of a sound sensor (an extra microphone) in order to capture noise, while we used only the hardware available on every mobile device that includes communication and media control capabilities. Moreover, this paper focuses on the user centered design and contextual evaluation of such concepts. Finally, all the existing solutions consider noise as the only contextual variable affecting audio output preferences, which is, as we show in the following section, an incomplete assumption, especially in media control scenarios.
3 Early Design

This section is dedicated to the early design stages of the two prototypes created. The design process followed a strongly user centered methodology, involving 26 non-impaired users: 16 male, 10 female, aged between 14 and 45 years old, familiar with mobile devices, especially mobile phones and mobile MP3 players. The users involved answered a questionnaire regarding important aspects related to the contextual adaptation of volume in both communication and media control scenarios. It was important to understand which context variables have an impact on users' voluntary volume modifications, how the automatic volume modifications should be performed, and how useful the users believed the proposed concepts to be. The resulting information was carefully analyzed, culminating in a set of requirements, design decisions and guidelines, which were considered during the implementation of the prototypes.

3.1 Questionnaires

Firstly, concerning users' decisions regarding volume modification in the considered scenarios, it was important to understand which contextual variables have an impact on these decisions. Furthermore, it was necessary to quantify the impact of each variable in the different scenarios, defining a level of importance for each one of them. Accordingly, for the different scenarios, users were asked to rate the impact of a set of contextual variables on a scale from 0 (null impact) to 100 (great impact). The results (Fig. 1) indicate noise, with an impact of 100, as the only contextual variable affecting voluntary volume modifications in communication scenarios. Conversely, in media control scenarios, the variables influencing voluntary volume modifications are considerably more numerous, and their impact considerably different. Noise remains the primary variable; however, its impact is reduced to 82. The task the user is engaged in reveals an impact of 70, and interruptions generated by third parties an impact of 58. The remaining factors only apply to media control scenarios and are all related to specific characteristics of the media file being played (the song itself, album, artist,
Fig. 1. User ratings: the impact of different contextual variables on voluntary volume modifications in both, communication and media control scenarios
and genre of music), presenting impacts between 30 and 45. Most users involved added that their emotional condition also influences voluntary volume modifications significantly, and can even influence the impact of the other context variables. Secondly, it was fundamental to understand users’ preferences regarding the automatic volume adaptation on the different scenarios considered. Two alternatives were implemented on a very simple application: gradual and direct volume adaptation. Fig. 2 a) depicts the differences between these alternatives. The gradual volume adaptation increases volume gradually (in approximately 0.8 seconds) from its current value (50 on the example provided) to the value suggested by the adaptation algorithm (80 on the example provided). On the other hand, the direct volume adaptation performs the same action in a quarter of that time (approximately 0.2 seconds).
Fig. 2. Example of gradual and direct volume adaptations (a) and user preferences (percentage) regarding the volume adaptation alternatives on both scenarios (b)
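A minimal sketch of the two adaptation styles compared in the experiment is given below; the step counts and delays only approximate the 0.8 s and 0.2 s figures mentioned above, and SetDeviceVolume stands in for the platform-specific call that actually sets the output volume.

```csharp
using System;
using System.Threading;

// Sketch of the two adaptation styles evaluated with users: a gradual ramp of
// roughly 0.8 s versus a direct change of roughly 0.2 s. SetDeviceVolume is a
// placeholder for the platform-specific volume call (0-100 scale).
class VolumeRamp
{
    static void SetDeviceVolume(int volume) { /* platform-specific */ }

    public static void Gradual(int from, int to)
    {
        const int steps = 8;
        for (int i = 1; i <= steps; i++)
        {
            SetDeviceVolume(from + (to - from) * i / steps);
            Thread.Sleep(100);                  // 8 x 100 ms ~ 0.8 s in total
        }
    }

    public static void Direct(int from, int to)
    {
        const int steps = 2;
        for (int i = 1; i <= steps; i++)
        {
            SetDeviceVolume(from + (to - from) * i / steps);
            Thread.Sleep(100);                  // 2 x 100 ms ~ 0.2 s in total
        }
    }
}
```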
A small laboratory experiment was conducted with all the users involved, so that they could experience and become aware of the differences between these alternatives. The results of this experiment (Fig. 2 b)) show substantial differences regarding the appropriateness of the implemented alternatives: 96% of the users involved prefer the gradual volume adaptation in both communication and media control scenarios. Finally, it was in the best interest of our team to understand how useful users believed the automatic volume adjustments to be. The results (Fig. 3) indicate a strong user acceptance of the proposed concept in both communication and media control scenarios. In communication scenarios, 50% of the population involved rated the
concept very useful, 30% rated it useful, 20% rated it slightly useful, and none of the persons involved considered the concept useless. In media control scenarios, 58% of the population involved rated the concept very useful, 38% rated it useful, 4% rated it slightly useful, and none of the persons involved considered the concept useless.
Fig. 3. Users’ opinions about the usefulness of the concept proposed on communication and media control scenarios (percentage)
When asked about the discrepancies in the answers for each scenario, users emphasized the fact that they usually avoid noise in communication scenarios by moving to a silent place after answering a call and realizing that the noise is affecting the communication. Moreover, the users who rated the concepts useful and very useful were the ones who used mobile devices with communication and media control capabilities more often, and in contexts with significant environmental noise mutations (e.g., street, subway), while the remaining users used these devices mostly at work and at home.
4 The Prototypes This section explains the noise monitoring process of the prototypes created, details the two automatic volume adjustment algorithms created, and the logging mechanisms implemented in order to ease the contextual evaluation of the prototypes. The mentioned prototypes are available for devices running Windows Mobile and were written in C#, using Microsoft’s .Net Framework. 4.1 Noise Monitoring Both prototypes created use the noise monitor available in [4]. This monitor gathers sample values from the device’s microphone, consequently calculating loudness values in order to adjust the ring tone volume on a mobile phone (on a 0 to 5 scale). The measure used to calculate loudness is root-mean-square (RMS). 4.2 Algorithms for Automatic Volume Adjustments The environmental noise was considered the primary context variable influencing users’ decisions regarding volume modifications on both communication and media control scenarios. However, as described on section 3, there are other context variables that have a significant impact on these decisions. Moreover, users’ hearing capabilities
must also be considered. Accordingly, despite behaving differently, the two algorithms created take all these dimensions into account, sharing some principles. For both algorithms, the noise spectrum varies from 0 to 127.5 [4] and the volume spectrum varies from 0 to 100. Moreover, both algorithms can be configured by defining 4 parameters that are accessible to the users (Fig. 4):

• Minimum: defines the minimum volume that can be set by the algorithm. This boundary is defined in order to avoid adaptations that are inconvenient for the users (e.g., setting the volume too low due to the absence of noise).
• Maximum: defines the maximum volume that can be set by the algorithm. This boundary is defined for the same reason as the one mentioned above (e.g., setting the volume too high due to excessive noise in the environment).
• Sensibility: defines the coefficient dividing the noise spectrum into noise levels. For instance, if the sensibility is set to 3 and the noise spectrum's range is 127.5, the spectrum is divided into 43 noise levels.
• Volume Step: defines the increase or decrease in volume whenever the noise in the surrounding environment goes up or down one level, respectively.
Fig. 4. Configuration screen presenting default values of the algorithms
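The following C# sketch shows one way the four parameters above could drive the shared adaptation logic, including how a voluntary volume change shifts the noise/volume mapping; it is a simplified reading of the algorithms described here, not the authors' implementation.

```csharp
using System;

// Hedged sketch: noise is quantized into levels via Sensibility, each level changes
// the volume by VolumeStep, and the result is clamped to [Minimum, Maximum]. A
// voluntary volume change shifts the mapping so the current noise level yields the
// user's chosen volume (the pair would also be persisted to XML, as in the paper).
class VolumeAdapter
{
    public int Minimum = 20, Maximum = 90;     // user-configurable boundaries
    public double Sensibility = 3.0;           // noise units per level
    public int VolumeStep = 2;                 // volume change per level
    int baseVolume = 50;                       // volume mapped to noise level 0

    int NoiseLevel(double noise) { return (int)(noise / Sensibility); }

    public int SuggestVolume(double noise)
    {
        int v = baseVolume + NoiseLevel(noise) * VolumeStep;
        return Math.Max(Minimum, Math.Min(Maximum, v));
    }

    // Called when the user manually sets the volume: register the noise/volume pair.
    public void RegisterPreference(double noise, int chosenVolume)
    {
        baseVolume = chosenVolume - NoiseLevel(noise) * VolumeStep;
    }
}
```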
The only contextual variable that is directly monitored by our prototypes is the environmental noise. Nevertheless, the remaining context variables are also considered (e.g., preferences, hearing capabilities, third party interruptions, etc.). These are indirectly expressed by the users whenever they perform a voluntary volume modification. For instance, if for any reason a user is not satisfied with the volume automatically set by the algorithm, his/her natural behavior would be to manually set the volume according to his/her preferences. When this happens, the algorithm registers a user preference, which is composed of a noise/volume pair, modifying the adaptation table accordingly and storing it in an XML file for later use. The developed algorithms behave differently in such situations, as explained below. Automatic Volume Adjustments on Communication Scenarios. This functionality is achieved through a non-continuous, preference-based algorithm. The non-continuity derives from the constraints imposed by the scenario considered (the user is talking on the phone) and by the device used to create the prototype, which includes only one microphone. Accordingly, the microphone used to communicate is the same one monitoring the environmental noise. The automatic volume adjustment is direct and performed based on the noise levels gathered immediately before the user answers the phone.
The preference base emerges from the last voluntary volume modification performed during a call. At the end of the call, a preference entry is registered and the adaptation table is modified according to that preference. This only happens at the end of the call because the noise monitoring is stopped in the meantime, while the user might move between contexts with different values of environmental noise. Automatic Volume Adjustments on Media Control Scenarios. This functionality is achieved through a continuous, preference-based algorithm. This algorithm was specifically designed for media control scenarios in which the users are wearing headphones. Accordingly, the sound produced by the media being played does not influence the noise monitoring process, enabling continuous utilization of the device's microphone in order to monitor noise. Therefore, the automatic volume adjustments are applied continuously and gradually while the user is controlling the media. The preference base emerges from the users' voluntary volume modifications. When these take place, the algorithm assumes that the volume set by the user represents his/her preference for the noise captured at that moment, overriding his/her previous preference. Accordingly, the noise/volume table defined by the algorithm's sensibility and volume step is instantaneously modified so that the registered noise matches the volume that was set, and the adaptation continues according to the modified table.

4.3 Logging Mechanisms

The logging mechanisms were implemented in order to ease the evaluation process of the prototypes. These mechanisms enabled medium-term studies to be conducted, removing the need for direct monitoring. Both prototypes include these mechanisms and are able to register the users' and the algorithms' behaviors in an XML file. Every user action is registered and associated with a contextual stamp, which includes time and noise information.
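The noise values used by both algorithms come from the RMS loudness measure mentioned in Section 4.1. A rough sketch of such a computation over a buffer of 16-bit microphone samples is shown below; the exact scaling used by the monitor in [4] may differ.

```csharp
using System;

// Sketch of an RMS loudness measure computed over a buffer of 16-bit microphone
// samples (sample acquisition itself is omitted). The mapping onto the paper's
// 0-127.5 noise scale is an assumption made for illustration.
class NoiseMonitor
{
    public static double RmsLoudness(short[] samples)
    {
        double sumSquares = 0.0;
        for (int i = 0; i < samples.Length; i++)
        {
            double s = samples[i] / 32768.0;      // normalize to [-1, 1]
            sumSquares += s * s;
        }
        double rms = Math.Sqrt(sumSquares / samples.Length);
        return rms * 127.5;                       // map onto the 0-127.5 noise spectrum
    }
}
```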
5 Contextual Evaluation

The prototypes created were evaluated through a strongly user centered procedure. The users selected to participate in this procedure were very familiar with mobile phones and media players. Moreover, our team took particular care to select users who used these devices in a broad variety of contexts (e.g., home, street, bus, subway, gym, etc.). This concern emerged from the necessity of having a basis for comparing our solutions with the existing technology in several real contexts. Six users were involved: 3 male, 3 female, with ages between 18 and 35 years old. They used our prototypes in their day-to-day lives during a two-week period. Accordingly, the logs gathered during the process represent the utilization of the prototypes in real contexts, under real, constantly mutating contextual constraints. At the end of the process, the users returned the utilization logs, which were carefully analyzed. Moreover, these users provided their feedback and opinions about the automatic volume adjustments, emphasizing situations where these were, and were not, satisfactory.
5.1 Contextual Evaluation on Communication Scenarios Considering situations where the environmental noise decreases significantly during a call, all the users were slightly uncomfortable with the algorithm’s behavior, reporting that the volume became too loud and they had to manually set it down. Such situations occurred mostly on the beginning of the study, when the users would still bond to their natural behavior, moving to a more silent place after answering a call. However, as the users continued to use the prototype, they have all changed their behavior regarding this issue, moving to a more silent place only in situations where they didn’t feel comfortable discussing the topic of the call in front of other people. Regarding situations where the environmental noise doesn’t change significantly during a call and the topic of that call is not private, all the users were very satisfied with the algorithm. On situations where the environmental noise increased significantly during a call, users were very uncomfortable with the algorithm’s behavior. Such situations emerged mostly on public transportations (e.g. bus, subway). Users reported that the volume was too low, and that they had to manually modify it. Finally, on situations where the environmental noise was extremely loud, users reported that the maximum volume was not enough to maintain the conversation, suggesting the creation of alerts that advise users not to answer calls in such situations. 5.2 Contextual Evaluation on Media Control Scenarios Users reported several situations where they had to move between contexts with significantly different environmental noise values, without the need to manually modify the volume configuration. Such situations include going from home to the users’ workplaces, having to walk on the street, ride public transportations, get in and out of different buildings, etc. The log analysis corroborates these reports, showing no manual volume adjustments during long periods of time (until 2 hours), characterized by very discrepant environmental noise values. The only situations where users felt uncomfortable with the algorithm’s behavior were situations where they were interrupted by third parties, engaging conversations. They explained that on such situations the volume would start to increase (due to the increasing environmental noise generated by the conversation), culminating on a manual volume modification or on users removing their headphones. These situations are also corroborated by the logs gathered, where in some situations of increasing environmental noise the users manually set the volume to mute, and then back to its previous value. 5.3 Discussion The algorithm directed for communication scenarios was the one raising more questions. This occurred due to the non-continuity of the noise monitoring, implied by the hardware available on the device used, and the constraints of the scenario for which the prototype was developed (the only microphone available was being used to talk). However, the limitations of the algorithm, except the ones regarding the privacy of the topics being discussed during a call, could be overcome with the inclusion of another microphone on the mobile device. This microphone should be used in order to
continuously capture noise, enabling continuous volume adjustments during the calls. Despite the limitations of the prototype created, users considered it better than their personal mobile phones, explaining that in the worst case scenarios their behavior was very similar to the one they had when using their phones: manually modifying the volume configuration. The continuous noise monitoring of the algorithm directed to media control scenarios revealed an excellent user acceptance in most contexts. However, third party interruptions generated noise, consequently leading to an automatic increase of volume, which resulted on uncomfortable situations for the users. This problem can be solved by separating speech from environmental noise as in [10] (not only the speech of the user using the device but also of the third parties engaged in conversations with this user). Nonetheless, such solution would clearly increase the complexity of the algorithms and the amount of hardware used. Overall, the studies conducted revealed that noise monitoring should be performed continuously, in order to enable accurate and continuous volume adjustments. There was a strong user acceptance of both prototypes created, and despite being able to modify all the algorithms’ parameters, the users involved on the evaluation procedures only personalized the maximum and minimum volumes, being very satisfied regarding the default sensibility and volume step of the algorithms. The utilization logs gathered also corroborate these affirmations.
6 Conclusion and Future Work

In this paper we described the user centered design of two context-aware prototypes directed at communication and media control scenarios. These prototypes were built on top of a regular mobile device that includes both communication and media control capabilities. They are capable of adjusting the volume according to different aspects of the context in which they are being used, monitoring noise directly through the microphone and considering user preferences and capabilities, which are expressed indirectly in the users' voluntary volume modifications. Issues regarding the amount of contextual information directly monitored by the algorithms responsible for the volume modifications are left open and will be studied in our future work. The contextual evaluation of the prototypes revealed a strong user acceptance of the proposed concept, especially in media control scenarios. However, the studies conducted also point to some issues that could not be overcome using only the hardware available on most mobile devices nowadays. Accordingly, the study also provides important information to be considered in the manufacturing of future mobile devices. Acknowledgments. This work was supported by LaSIGE and FCT through the Multiannual Funding Programme and individual scholarships SFRH/BD/44433/2008.
References 1. Barnard, L., Yi, J.S., Jacko, J., Sears, A.: Capturing the effects of context on human performance in mobile computing systems. Personal and Ubiquitous Computing 11(2), 81–96 (2007)
2. Reis, T., Sá, M., Carriço, L.: Multimodal Interaction: Real Context Studies on Mobile Digital Artefacts. In: HAID 2008. LNCS, vol. 5270, pp. 60–69. Springer, Heidelberg (2008) 3. Schmidt, A., Aidoo, K.A., Takaluoma, A., Tuomela, U., van Laerhoven, K., Van de Velde, W.: Advanced interaction in context. In: Gellersen, H.-W. (ed.) HUC 1999. LNCS, vol. 1707, pp. 89–101. Springer, Heidelberg (1999) 4. Mitchell, C.: Mobile Apps: Adjust Your Ring Volume For Ambient Noise. MSDN Magazine (2008), http://msdn.microsoft.com/en-us/magazine/cc163341.aspx 5. Kumar, Larsen, Infotelimcithed, T.: Smart Volume Tuner for Cellular Phones. IEEE Wireless Communications (2004) 6. US Patent 7023984 - Automatic volume adjustment of voice transmitted over a communication device, http://www.patentstorm.us/patents/7023984/description.html 7. Baldauf, M., Dustdar, S., Rosenberg, F.: A survey on context-aware systems. International Journal of Ad Hoc and Ubiquitous Computing 2(4), 263–277 (2007) 8. Chen, G., Kotz, D.: A survey of context-aware mobile computing. Technical Report TR2000–381, Dartmouth College, Department of Computer Science (2000) 9. Review on noise canceling headphones, http://www.seatguru.com/articles/noise-canceling_review.php 10. The Jawbone Headset, http://eu.jawbone.com/epages/Jawbone.sf 11. The Boom Headset, http://www.theboom.com/ 12. Patent for automatic noise-based volume adjustments, http://appft1.uspto.gov/netacgi/nph-Parser? Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch bool.html&r=1&f=G&l=50&co1=AND&d=PG01&s1=20090022329.PGNR.&O S=DN/20090022329RS=DN/20090022329
Interactive Photo Viewing on Ubiquitous Displays*

Han-Sol Ryu1, Yeo-Jin Yoon1, Seon-Min Rhee2, and Soo-Mi Choi1

1 Department of Computer Engineering, Sejong University, Seoul, Korea
[email protected]
2 Integrated Media Systems Center, University of Southern California, USA

* This work was supported by the Seoul R&BD program (10581).
Abstract. This paper presents a method of showing photos interactively based on a user’s movements using multiple displays. Each display can identify the user and measure how far away he is using an RFID reader and ultrasonic sensors. When he approaches to within a certain distance from the display, it shows a photo that resides in his photo album and provides quasi-3D navigation using the TIP (tour into the picture) method. In addition, he can manipulate photos directly using a touch-screen or remotely using an air mouse. Moreover, a group of photos can be represented as a 3D cube and can be transferred to PDA for a continuous viewing on other displays. Keywords: photo viewing, distance-based interaction, multiple displays.
1 Introduction As digital cameras are becoming more popular, the demand for digital picture frames or digital photo frames that can store and display many photos is increasing. People want interactive photos and augmented photos to enrich their viewing experience, according to Darbari’s survey [1]. If useful facilities for user interaction are added to the digital photo frames, the displays can do more than simply show pictures; they can interact with users in various ways and can function as ubiquitous displays around the home, placed on convenient surfaces or attached to the wall. Several systems using a digital frame-type display have been studied in projects relating to ubiquitous health care. In the AwareHome project at Georgia Institute of Technology, indirect interactions with remote family members are facilitated using the Digital Family Portraits display [2]. This display looks like a picture, but can provide family members who live at a distance with information about their elderly relative’s everyday life, including their health, environment, relationships and activities. This information is summarized every week and gives the user a feeling that they are talking to distant family members through the frame. The CareNet display [3] developed by the Intel Research Center in Seattle enables users to access information directly by operating the menus of a touch screen, and it also allows images to be edited. These existing picture-like displays are not sufficient to give users enriched picture viewing experiences, because they only show pictures and permit only explicit *
interactions, for instance through the menus of a touch screen [2-4]. In addition, these studies did not consider any continuous viewing across multiple displays. In this paper, we present a smart photo frame that allows a user to tour into a photo based on his movement. The photo interface is automatically changed according to the distance between him and the display. He not only can navigate the photo in 3D space, but he can also change its background space. Moreover, our system allows him to move photos to another display at home using PDA. The rest of this paper is organized as follows: Section 2 describes a hardware configuration of our system; in Section 3, interaction zones and user activities are explained; in Section 4, methods of generating a 3D space from a 2D photo are presented; in Section 5, a photo arrangement in 3D space is described; experimental results are given in Section 6 and lastly we draw some conclusions in Section 7.
2 A Smart Photo Frame The proposed photo frame consists of a touch panel, an LCD (15~32 inches), an RFID reader, ultrasonic sensors (2~4 sensors) and LEDs attached to the rear of the display, as shown in Fig. 1.
Fig. 1. Components of a smart photo frame; (a) a user wearing a RFID tag; (b) a small display (15-inch); (c) a large display (32-inch)
To identify a user, we use a 900MHz RFID reader (Infinity 210UHF: SIRIT) and a tag (Class 1, Gen 2 type). He wears a small tag that looks like a necklace or bracelet as shown in Fig. 1(a). When he approaches to within the reaction zone of the RFID reader, the RFID reader recognizes the tag and identifies him from its ID. A tag can be recognized reliably within 4.0m of the display with an external antenna. The RFID reader is able to identify several users simultaneously. In this case, the user whose tag responds most strongly to the RFID reader will be recognized as the primary user and receive the highest priority. Piezo-type ultrasonic sensors and an ultrasonic sensor board with a transmission and reception module are used to measure the distance between a user and the display. It also measures the direction of the user’s movement by ultrasonic sensors attached on the sides of the photo frame. For a small display, we attach two ultrasonic sensors on the left and right sides of the frame as shown in Fig. 1(b). However, we attach
more sensors on the bottom of the frame for a large display, as shown in Fig. 1(c). The range of these sensors can be adjusted up to 7 m. We also use the touch screen panel to operate icons on the display; some detailed information can be obtained through touch interaction. In the event of an emergency, the LEDs attached around the display call the user's attention by flashing.
3 Interaction Zones and User Activities

The interaction zones corresponding to the proximity of a user to the display are shown in Fig. 2. We treat the space in front of the display as having three zones, as in Vogel [5]. Fig. 3 depicts an activity diagram of our system in UML (Unified Modeling Language), which shows the flow of actions in the zones.
Fig. 2. Interaction zones corresponding to the user’s proximity
Fig. 3. An activity diagram showing the flow of actions
When the user is outside all the zones, the photo frame functions as a simple picture frame and shows black-and-white pictures that demand little attention. In the display zone, the ultrasonic sensors react to his presence and pictures are shown in color. Nevertheless, at this stage the photo frame does not demand excessive attention, although the display shows a high-level menu, with which he can interact using a mouse-like
pointing device if required. In the implicit interaction zone, both the ultrasonic sensors and the RFID reader recognize him, and the image on the display becomes three-dimensional. He can tour into the photo through movement-based interaction. In the touch interaction zone, he can operate menus directly using the touch screen and access detailed information related to the photos. If an urgent message is sent from the server system, it is displayed on the screen and the LEDs light up regardless of his proximity.
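A compact sketch of how the measured distance and RFID identification could be mapped onto these zones is shown below; the threshold values are illustrative assumptions rather than the prototype's actual settings.

```csharp
using System;

// Hedged sketch: mapping the ultrasonic distance measurement and RFID identification
// onto the three interaction zones; thresholds are illustrative only.
enum Zone { Outside, Display, ImplicitInteraction, TouchInteraction }

class ZoneClassifier
{
    public static Zone Classify(double distanceMetres, bool userIdentifiedByRfid)
    {
        if (distanceMetres <= 0.6) return Zone.TouchInteraction;   // within arm's reach
        if (distanceMetres <= 2.5 && userIdentifiedByRfid)
            return Zone.ImplicitInteraction;                       // movement-based 3D tour
        if (distanceMetres <= 5.0) return Zone.Display;            // color photos, high-level menu
        return Zone.Outside;                                       // black-and-white idle mode
    }
}
```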
4 Interactive Photo Viewing Based on the User’s Movements When a user moves into the implicit interaction zone and the selected photo has 3D information such as a vanishing line, a 3D space is simulated using the TIP (tour into the picture) method [6,7]. This requires three input images: an original image, a background image in which foreground objects are removed, and a mask image in which foreground objects are colored white and the background is colored black.
Fig. 4. The 3D reconstruction of the background and the foreground objects using a vanishing line
Fig. 4 shows the 3D reconstruction of a photo using a vanishing line. To render the 3D background, which consists of a ground plane (2', 4', 5', 6') and a back plane (1', 3', 5', 6'), we assume that the camera is positioned at the origin O, the viewing direction is +z, and the vertical direction is +y in 3D space. In Fig. 4, points 5 and 6 are the intersections of the vanishing line with the image boundary. Foreground objects are selected by means of the pre-processed mask image. These objects appear to be placed on a billboard that moves independently of the background, providing the user with a simple illusion of moving in 3D space. To get the 3D coordinates of the foreground objects, we compute the intersection points of the billboard and the ground plane. In Fig. 4, the foreground object is represented as a quadrilateral with four points (p'1, p'2, p'3, p'4), and it stands orthogonally on the ground plane. The coordinates of p'2 and p'3 are computed from the camera position and the points (p2, p3) on the image plane. After the 3D reconstruction of the photo, the position and orientation of the virtual camera can be changed by the user's movements, as shown in Fig. 5. When he approaches the display, the camera moves toward the back plane in 3D space, similar to zooming in; when he moves to the right or left, the camera rotates in the same direction, like panning. In addition, when he moves toward the display, detailed information about the photo, such as date, names and places, appears gradually.
Fig. 5. Camera movements according to the user’s movements; (a) backward and forward movements; (b) left and right rotations
If he moves back, it disappears. This level-of-detail technique for the detailed information can naturally attract his interest to the display without requiring excessive attention.
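Under the assumptions stated in the text (camera at the origin, looking along +z, ground plane below it), the ground-plane point hit by an image pixel below the vanishing line can be recovered as in the following sketch; the camera height and focal length are free parameters of the TIP setup, not values given in the paper.

```csharp
using System;

// Hedged sketch of the ground-plane reconstruction: u is measured from the image
// centre horizontally, v and yVanish on the same image y-axis (increasing upward),
// h is the assumed camera height above the ground and f the focal length in pixels.
// Returns the 3D point (x, -h, z) where the pixel's viewing ray meets the ground.
static class Tip
{
    public static double[] ProjectToGround(double u, double v,
                                           double yVanish, double h, double f)
    {
        double drop = yVanish - v;              // pixel distance below the horizon
        if (drop <= 0)
            throw new ArgumentException("Pixel must lie below the vanishing line.");
        double z = f * h / drop;                // depth along the viewing direction (+z)
        double x = u * h / drop;                // lateral offset on the ground plane
        return new double[] { x, -h, z };
    }
}
```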
5 A Photo Arrangement in 3D Space

When a user is in the touch interaction zone, he can interact explicitly with the display using touch screen menus. He can categorize photos and select one to be shown on the display. Unlike other photo frames, our system displays photos in 3D space. Thus, we can apply 3D effects to photos when they appear or disappear, such as rotation, translation and flipping effects. As a good rule of thumb of object-oriented design, no more than seven entities are used at the same time for easy understanding. Here, we use a cube metaphor. As shown in Fig. 6, a group of photos is represented as a 3D cube, which can be moved freely in 3D space. Each cube contains from one to six photos. When it is touched, the photos it contains are automatically arranged on the screen. Using the menus on the left side, the selected photo can be zoomed in or out, and some cubes can be transferred to a PDA so that the user can continue to view the photos while moving.
Fig. 6. An automatic photo arrangement in 3D space and touch interaction
6 Experimental Results

Several experiments were conducted to investigate the effects of the proposed digital frame in the different interaction zones. Fig. 7 shows how the screen changes when the user approaches the display. As shown in Fig. 7(b), the detailed information gradually appears on the right side of the screen depending on the extent and direction of his
movement. In our study, we assumed that only one person navigates at a time in the implicit interaction zone. After the primary user is recognized by the RFID reader, the ultrasonic sensors attached to the frame react to the user's movements until it is clear that the user has left the implicit interaction zone. Fig. 8 shows the result of 3D navigation of a photo using our sensor-based interaction.

Fig. 7. Backward and forward movements of the user
Fig. 8. 3D Navigation by sensor-based interaction
After reconstructing a 3D space from a photo, we can change its background image. Thus, some virtual photos can be created, as shown in Fig. 9. This can make people feel that they are traveling through some places that they have not visited before. It can add some fun effects.
Fig. 9. Virtual tour; (a) Sejong university in Korea (original photo); (b) Asakusa temple in Japan (virtual photo); (c) Opera house in Australia (virtual photo)
In the touch interaction zone, several photos can be summarized into a 3D cube using a touch menu. If the 3D cube is double-touched, all photos in the cube are automatically arranged depending on the number and size of photos, together with 3D effects (See Fig. 10). In addition, users can flip the photos through a traditional album-like interface.
Fig. 10. The automatic arrangement of photos in 3D space
In order to see some photos while away from the display, or to move them to another display, several 3D cubes can be transferred to a PDA, as shown in Fig. 11(a). The circled area shows the transferred 3D cube. To reduce the transmission time and to display at a low resolution, we decreased the image resolution to 320×240 pixels. To see the photos again at a larger size, the user can approach the previous display or another large display at home, as shown in Fig. 11(b). Information about his current state, such as the names of cubes and photos, is transferred from the PDA to the large display through the server. Thus, he can resume his interaction with the display.
Fig. 11. User interaction within the touch interaction zone
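The state handed from the PDA to a large display through the server could be as simple as the structure sketched below; the field names are assumptions, since the paper only mentions cube and photo names as examples of the transferred information.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch of the migration state exchanged through the server when a
// cube is moved between the PDA and a large display; field names are assumptions.
class MigrationState
{
    public string UserId;
    public string CurrentCube;                 // name of the cube being viewed
    public List<string> PhotoNames = new List<string>();
    public int SelectedPhotoIndex;

    // The receiving display uses this state to resume the interrupted interaction.
    public override string ToString()
    {
        return string.Format("{0}: cube '{1}', {2} photos, photo #{3}",
                             UserId, CurrentCube, PhotoNames.Count, SelectedPhotoIndex);
    }
}
```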
7 Conclusions

We developed a smart photo frame to improve the quality of interaction not only through explicit feedback, but also through implicit feedback from the user. Based on the concept of different interaction zones, the proposed system reacts to the user's movements without requiring excessive attention from the user. Moreover, 3D navigation of a photo in the implicit interaction zone made the photo more memorable and fun. The system
also provided a partial migration service with which the user can change devices and continue the interaction.
References 1. Darbari, A., Agrawal, P.: Enliven Photographs: Bringing Photographs to Life. In: The 4th international symposium on ubiquitous VR (2006) 2. Mynatt, E., Rowan, J., Craighill, S.: Digital Family Portraits: Supporting Peace of Mind for Extended Family Members. In: Proc. of CHI 2001, pp. 333–340 (2001) 3. Consolvo, S., Roessler, P., Shelton, B.E.: The CareNet Display: Lessons Learned from an In Home Evaluation of an Ambient Display. In: Davies, N., Mynatt, E.D., Siio, I. (eds.) UbiComp 2004. LNCS, vol. 3205, pp. 1–17. Springer, Heidelberg (2004) 4. Molyneaux, D., Kortuem, G.: Ubiquitous displays in dynamic environments: Issues and opportunities. In: Proc. of Ubicomp 2004 (2004) 5. Vogel, D., Balakrishnan, R.: Interactive Public Ambient Displays: Transitioning from Implicit to Explicit, Public to Personal, Interaction with Multiple Users. In: Proc. of UIST 2004, pp. 137–146 (2004) 6. Horry, Y., Anjyo, K., Arai, K.: Tour Into the Picture: Using a Spidery Mesh Interface to Make Animation from a Single Image. In: Proc. of ACM SIGGRAPH, pp. 225–232 (1997) 7. Kang, H.W., Pyo, S.H., Anjyo, K., Shin, S.Y.: Tour Into the Picture Using a Vanishing Line and it’s Extension to Panoramic Images. In: Proc. of Eurographics, pp. 132–141 (2001)
Mobile Audio Navigation Interfaces for the Blind

Jaime Sánchez

Department of Computer Science, University of Chile
Blanco Encalada 2120, Santiago, Chile
[email protected]

Abstract. In this paper we present a set of mobile, audio-based applications to assist with the navigation of blind users through real environments. These applications are used with handheld PocketPC devices and are developed for different contexts such as the neighborhood, bus transportation, the Metro network and the school. The interfaces were developed with the use of verbalized and sound-based environments. The usability of the hardware and the software was evaluated, obtaining a high degree of acceptance of the sound and user control, as well as a high level of satisfaction and motivation expressed by the blind users. Keywords: blind navigation, orientation and mobility, mobile audio interfaces.
1 Introduction

The problems faced by blind users in mobile contexts are diverse and nondeterministic. This makes it difficult for such users to make decisions on what routes to follow, resulting in movement with very little autonomy. Furthermore, blind users orient themselves in space by using straight angles, which does not allow them to develop a full representation of the real environment. One way to resolve this problem is by navigating through the use of a clock system [10]. A clock system in combination with mobile technology can be a valuable alternative to help with the mobility and orientation of blind users. Having a mental map of the space in which we travel is essential for the efficient development of orientation and mobility techniques. As is well known, the majority of the information needed for the mental representation of space is obtained through the visual channel [5, 12]. For blind people, key environmental information is received through the spatial relations constructed by the remaining senses. Despite this limitation, the cognitive mapping skills of blind people are flexible enough to adapt to this sensory loss. Even the congenitally blind are able to manage spatial concepts and are competent navigators [3]. Some generic problems blind people have when moving about have to do with localization and their perception of the environment, as well as choosing and maintaining a correct orientation, and detecting possible obstacles [14]. Jacobson & Kitchin [4] point out that the most important problem for the blind has to do with their incapacity for independent navigation and interaction with the rest of the world. Also, exploration can lead to disorientation, which is accompanied by the fear, stress and panic associated with the feeling of being lost. There is also a risk associated with obstacles that cannot be detected by the body or with mobile aids such as the cane [13]. Ran et al. [8] propose that the main difficulty for the blind in the context of orientation and mobility is knowing where they are at any given time and which way they are going, and that reorienting themselves is especially complicated if they get lost.
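As a rough illustration of the clock system mentioned above (and used later by the aGPS interface to verbalize directions), the bearing to a destination can be mapped onto a clock face relative to the user's heading. The sketch below is an assumed formulation for illustration, not the implemented algorithm.

```csharp
using System;

// Hedged sketch: the angle between the user's heading and the destination bearing is
// mapped to one of twelve "o'clock" directions (30 degrees per clock hour) and
// combined with the distance into a phrase suitable for text-to-speech output.
class ClockDirection
{
    public static string Describe(double headingDeg, double bearingToTargetDeg,
                                  double distanceMetres)
    {
        double relative = (bearingToTargetDeg - headingDeg + 360.0) % 360.0;
        int hour = (int)Math.Round(relative / 30.0) % 12;
        if (hour == 0) hour = 12;                          // straight ahead = 12 o'clock
        return string.Format("{0} meters at {1} o'clock",
                             (int)Math.Round(distanceMetres), hour);
    }
}
```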
2 Related Work There are several ways to help blind users achieve autonomous movement with the aid of mobile technology. One way is helping them with in situ assistive technologies in order to provide them with additional contextual information while they are moving; this is known as location technology. Such technology uses a variety of means such as RFID, IrDA, Bluetooth or WIFI, with which several solutions have been designed and developed to assist with the movement of blind users [2,6,7,8,15]. Some studies propose different modes of interaction for blind users who use mobile devices, which implies the implementation of entry modes that use tactile or voice commands, and outputs provided through verbal and/or iconic sounds [10,1]. Loomis et al. [6] has presented a study on the use of a GPS device that can guide a blind user in an outdoor environment. The synthesized voice-based system helped users to be able to identify the shortest route between two points. GPS does not work in indoor spaces, and in such contexts it is necessary to recur to other methods. Gill [2] presents a solution by using infrared technology standards (IrDa) that work as sensors to determine the user’s indoor location. The Drishti system, developed by Ran et al. [8], uses a combination of GPS for outdoor navigation and ultrasonic sensors for indoor navigation. One problem concerning GPS is the error associated with the measurements taken, more so when associated with a cloudy climate or if there are very tall buildings in the area. For the indoor environment, the blind user must carry two ultrasonic sensors that receive signals that are transmitted from different points in the rooms. With these signals it is possible to detect the location of the users by processing and analyzing the data. A grid with RFID technology (Radio Frequency Identification) informs us on the location and proximity of a user in a certain environment [15]. Combined with Bluetooth technology the reading apparatus sends data to the user’s handheld device or a cellular phone, which analyzes the information and indicates the user’s position. Finally, Na [7] proposes BIGS (Blind Interactive Guide System), a grid system that contains a group of RFID tags that are placed on the floor of a room, and an RFID reader carried by the user. In addition, this system is capable of constantly monitoring the movements of and the route taken by the user, thanks to communication via WIFI.
3 Methodology

3.1 Mobile Audio Interfaces

The 4 software programs presented in this paper are oriented towards developing navigation (orientation and mobility) abilities and strategies in blind users through the use of a mobile PocketPC device: 1. aGPS is used to navigate a neighborhood, a space that blind users travel daily, but deficiently and not to its full extent. Also,
throughout their lives they may need to visit and navigate various unknown neighborhoods; 2. AudioTransantiago is a mobile application that provides assistance for using public transportation, particularly a bus; 3. mBN is mobile software that helps to move through and use the Metro network; and 4. MOSS is a system that provides necessary assistance for moving about without problems in an indoor environment such as a school or a specific building. aGPS. In the aGPS software the entry interface consists of 3 buttons on the PocketPC. The first button is used to enter the starting point of a path, which could be the first position entered when starting the software, or a location entered after having changed directions. The second button is used to ask the system for information. When the user presses this button, the Text-to-Speech engine (TTS) replies with the distance to and direction of the destination (using the clock system to express the direction). The third button is used to change the destination point. When the user presses this button, he/she navigates a circular list 11 with different destinations read by the TTS. The output interface is made up mostly of the TTS. As previously mentioned the TTS responds to the user’s requests when he/she presses a certain button. The only output provided to the user consists of the distance to and direction of the destination point, and the names of the destination points. The user is not provided with the routes to be taken. The user must decide the paths to follow in order to arrive at the destination. There is also a visual interface that provides information regarding the destination point, the distance to and the direction of the destination point at any given time. This interface is used to help the facilitators to be able to support blind users in their learning for navigation purposes. AudioTransantiago. AudioTransantiago stores contextual information on each stop of the Transantiago routes, from which the user chooses in order to plan his/her trips in advance. In addition, this software navigates the stops of a previously planned trip in order to strengthen orientation and facilitate the blind user’s journey. AudioTransantiago uses an audio-based interface consisting of a TTS engine and non-verbal sounds that help to identify the navigational flow within the application menus and through which it conveys information to the user. The interface is made even better by a minimalist graphic interface that includes only the name of the selection that is being used and the option that has been selected, including a strong color contrast (yellow letters over a blue background) that is useful for users with partial vision, but legally blind, who can only distinguish shapes when displayed as highly contrasting colors. Navigation through the software application’s different functions is performed through circular menus 11 and the use of the lower buttons on the PocketPC. The advantage of these menus is that they facilitate searches within the lists, which have a large number of different elements. The software application’s two operational functions can be accessed through this structure (planning a trip and making a trip), as well as their respective submenus, which are explained in the next section. mBN. The mBN, or mobile Blind Navigation, is a navigational system for use in a Metro network. The mBN software contents are presented in a hierarchy of menus
Mobile Audio Navigation Interfaces for the Blind
405
displayed on the screen and also as audio cues. A menu has a heading and a set of items; the number of elements in each set has to meet the cognitive usability load restriction of 7 ± 2 chunks of information. Menus can be defined as circular [11] or normal according to the way in which they are explored. When using mBN software, users have to execute commands through the touch screen of a PocketPC. The interface was designed and developed "with" and "for" users with visual impairments. The interaction is achieved with the corners of the PocketPC screen by joining adjacent corners. Thus the software registers, analyzes, and interprets the movements and jumps of the pointer. With this information, the software knows whether a command was executed. A blind user's interaction with the touch screen is performed without the need for the pointer pen (stylus) by using touch to map the relief of the four corners needed to construct and execute a command. The information managed by mBN is represented internally by strings transmitted to the user via spoken audio texts and high contrast color text on the screen. A TTS engine performs the translation of the written information to audio speech messages. These messages are complemented by earcons for a higher degree of attention and motivation when interacting with the software.

MOSS. This interface is mainly audio-based, in which information is provided through sound in two different ways. On one hand, iconic sounds (sound effects) associated with the different actions that the user performs (walk, navigate the menu, etc.) were used, which also provide contextual information (pass through a door, walk down a hall, bump into a wall, etc.). On the other hand, a TTS engine was used to provide information verbally (for the description of an element, the current position, etc.). One of the main actions that the user can perform is SoundTracing (ST). ST follows the metaphor that the individual emits a ray that detects all the objects that are in a certain direction, even if there are solid objects in the way. To generate an ST, the user must make a gesture on the touch screen of the PocketPC device that represents a line in the direction that he/she wants the ray to go.

3.2 Evaluation

For each one of the mobile audio interfaces a usability evaluation was conducted in order to detect the level of acceptance for the applications and their potential for use. This was done to determine whether the users were able to interact with the PocketPC device by using the sound-based interfaces. It was expected that users would be able to recognize both entry and output means of interaction (buttons, screen and audio). Sample. The participants in the usability test for the aGPS software consisted of 4 users (two boys and two girls) with ages ranging from 11 to 13 years old, all of whom attended the Santa Lucia School for the Blind in Santiago, Chile. They had a variety of ophthalmic diagnoses and degrees of vision. The sample for the evaluation of the AudioTransantiago prototype consisted of 6 legally blind participants between the ages of 27 and 50 years old, all of whom were residents of Santiago, Chile. They had a variety of ophthalmic diagnoses; 3 of them had partial vision, and all were men.
406
J. Sánchez
The sample for the usability evaluation of mBN consisted of 5 people, aged 19 to 28 years old, from the Santa Lucia School for the Blind in Santiago, Chile. Four of them had partial vision and one was blind. It is important to mention that none of these users had any previous experience interacting with PDA devices. The sample to evaluate MOSS consisted of five children aged 8 to 11 years old, including three girls and two boys. Two of them attended a segregated school (fifth grade), while the rest attended integrated schools (3rd, 4th and 6th grade) and were held to the same standards as their sighted classmates. Of all the participating users, only one had partial vision (non-functional). The users did not present any additional deficit other than their visual disability. All of the users were legally blind (totally blind or with partial vision). For all the usability evaluation sessions, two special education teachers specializing in visual disabilities and one usability evaluation engineer participated. Instruments. For the usability evaluation of the aGPS, an End-User Usability Questionnaire was used that consisted of two parts: (1) a set of 24 closed questions with a scale of appreciation from 1 to 5 (12 questions regarding the software and 12 on the hardware), and (2) a set of 7 open questions that were extracted from Sánchez's Software Usability Questionnaire [9]. The questionnaires were read and explained by facilitators and answered by users. The usability evaluation of AudioTransantiago was performed by means of a Software Usability Questionnaire [9] adapted for adult users in the context of this study. This questionnaire includes 18 closed questions on specific aspects of the software's interface, together with 5 more general, open-ended questions regarding trust in the system, the way the system is used, and the perceived sense of utilizing these devices as a way to help users travel on a bus system. The results obtained can be grouped into four categories: (1) Satisfaction, (2) Control and Use, (3) Sound, and (4) Image. To evaluate the usability of the mBN software, automatic data recording was used. This consisted of data structured in XML format that is internally stored by the software during the user's interaction, registering data on every key used, the Metro stations taken, and the time used to perform each action. To support the data collection process for usability testing, complementary software was created (AnalisisSesion). This software checks the data recorded during mBN sessions (automatic data recording). The end user usability evaluation of MOSS focused on user acceptance, with questions on whether the user liked the software, which things he/she would change or add to the software, what use the software had for him/her, and other similar questions. These questions were based on Sánchez's Software Usability Questionnaire [9]. Each statement on the software was evaluated with a score from 1 (strongly disagree) to 10 (strongly agree). Procedure. Each usability evaluation was completed during two 60-minute sessions. In each session, the users interacted with the software for 25-30 minutes in order to evaluate the effectiveness of their interaction with the buttons and the PocketPC screen, control and use, and the clarity of the audio support. Each session involved the following steps: (1) Introduction to the software. The functions of the software application and its use through the PocketPC buttons were
Mobile Audio Navigation Interfaces for the Blind
407
explained to the participants. (2) Interaction with the software. The users tried out the software’s functions and the use of its buttons. At this point they also planned a trip as their final task. This trip was arbitrarily defined so as to be used in a later cognitive impact evaluation. (3) Documentation. Sessions were documented in detail through the use of digital photographs. (4) Evaluation. The Software Usability Questionnaire was administered. Based on the comments and recommendations the participants provided, the software was modified and redesigned in order to improve its usability. 3.3 Results Figure 1 shows the average scores obtained for the software and the hardware used. All scores are over 4.2 points, on a range that varies from 1 to 5 points. This means that the users’ evaluation of the software and hardware’s usability was quite satisfactory. These results are the same for the four dimensions of the software that were analyzed, in that all average scores are 4 or above, which indicates a high degree of user acceptance regarding each of the dimensions analyzed.
Fig. 1. Usability results of the software aGPS
The dispersion of the data is similar for the software and hardware variables and for the satisfaction and control & use variables. For software and hardware the standard deviation is between 0.348 (software) and 0.357 (hardware), with a kurtosis of 2.980 (software) and 3.210 (hardware) and a skew of 1.673 (software) and 1.725 (hardware), which means that the evaluations of the software and hardware received very similar opinions, with a slight deviation towards higher scores. For control & use, the standard deviation is slightly lower than that for satisfaction (SD = 0.479 and 0.5 points respectively). The case of the image dimension is distinct, in that the degree of dispersion is far greater than that observed in the other dimensions (SD = 0.816). A kurtosis of -1.289 for control & use and of -3.901 for satisfaction was obtained, with a skew of -0.855 for control & use and -0.37 for satisfaction. Users were able to construct a correct mental map of the software; their mental models easily accommodated the application. The usability data showed that the proposed interface was easy to use and easily understood by blind users. The usability questions for AudioTransantiago were evaluated on a scale ranging from 1 to 10 points, 10 being the highest. On average, the values obtained for all the items were quite satisfactory, obtaining an average of over 9 points for each item. The totally blind users assigned a score of 10 points for all the questions, while those users with partial vision assigned slightly lower scores (average of 9.02 points) (Fig. 2). As
408
J. Sánchez
can be seen in Table 1, users assigned high scores to all 4 dimensions. The scores are higher than 9.2 points for all dimensions. The most highly evaluated dimension is image, although this dimension was only analyzed by the three users who were not totally blind. This dimension also has the lowest degree of dispersion among the answers, with a standard deviation of 0.577. The degree of dispersion increased slightly for control & use (SD = 0.698), satisfaction (SD = 0.816) and sound (SD = 1.123). The control & use and sound dimensions obtained a kurtosis of -0.053 and -1.646 points respectively. Satisfaction obtained a kurtosis of 2.774. The skews for all the dimensions were the following: -1.732 (image); -1.276 (control & use); -1.783 (satisfaction); -1.006 (sound). This means that the highest degree of agreement is reached in the satisfaction dimension, and a comparatively lower degree of agreement is reached for control & use and sound.
Fig. 2. Usability results of AudioTransantiago software
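The dispersion statistics reported above (standard deviation, kurtosis and skew of the questionnaire scores) can be reproduced with standard tools. The following is a minimal sketch in Python using NumPy and SciPy; the score lists are illustrative placeholders, not the study's actual questionnaire data:

```python
# Minimal sketch: descriptive statistics for Likert-style usability scores.
# The score lists below are made-up placeholders, not the data from the study.
import numpy as np
from scipy import stats

scores_by_dimension = {
    "satisfaction": [10, 10, 9, 8, 10, 9],
    "control_use":  [10, 9, 10, 9, 9, 10],
    "sound":        [10, 8, 10, 9, 10, 7],
    "image":        [9, 10, 10],           # only rated by the partially sighted users
}

for dimension, scores in scores_by_dimension.items():
    data = np.asarray(scores, dtype=float)
    print(f"{dimension}: mean={data.mean():.2f}  "
          f"SD={data.std(ddof=1):.3f}  "            # sample standard deviation
          f"kurtosis={stats.kurtosis(data):.3f}  "  # excess kurtosis, as in SPSS-style reports
          f"skew={stats.skew(data):.3f}")
```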
In the case of the mBN software, the usability evaluation sessions provided information that validated the event and sound feedback, the logic of the interface, the design, and the programming strategy. It also informed improvements to the design and coding in subsequent milestones. Information was gathered on the time that a user needed to use the functions through the proposed input interface by dragging the pointer from one corner to another. The average time taken by the users for the different tasks assigned was 0.693 seconds, with a standard deviation of 172.48. The distribution of the users' times shows a kurtosis of 2.358 and a skew of -1.225. With this information, a 2.5-second limit was established for entering a command. After this time, the action is timed out (Table 1).

Table 1. Time spent per action
          Minimum   Average   Maximum   Timed out
Seconds   0.325     0.69625   1.35      2.5
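As a rough illustration of how the corner-to-corner drags described in Section 3.1 could be mapped to commands and cut off at the 2.5-second limit above, the following is a minimal sketch; the specific corner-to-command assignments are hypothetical, since the paper does not list them:

```python
# Sketch of corner-based command entry with a 2.5 s timeout (assumed corner mapping).
TIMEOUT_S = 2.5

# Hypothetical mapping from (start corner, end corner) drags to commands.
COMMANDS = {
    ("top-left", "top-right"):       "next_item",
    ("top-right", "bottom-right"):   "select",
    ("bottom-right", "bottom-left"): "previous_item",
    ("bottom-left", "top-left"):     "back",
}

def interpret_drag(start_corner, end_corner, elapsed_s):
    """Return the command for a corner-to-corner drag, or None if timed out or unrecognized."""
    if elapsed_s > TIMEOUT_S:
        return None                          # entry timed out; the action is discarded
    return COMMANDS.get((start_corner, end_corner))

print(interpret_drag("top-left", "top-right", 0.7))   # -> 'next_item'
print(interpret_drag("top-left", "top-right", 3.0))   # -> None (timed out)
```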
The device's screen can be used as support for the audio interface in the case of users with partial vision and for teachers involved in the testing. The same restrictions as those observed for the mBN software were obtained, together with functions that should be implemented in the menu logic, requirements, organization, and the debugging of
Mobile Audio Navigation Interfaces for the Blind
409
contents presented in the software, such as including a menu with the value of a ticket over time, and including relevant information about the station's surroundings. The proposal to present information on the stations' surroundings is related to the orientation and mobility cues that blind people use when navigating urban environments. These cues are: street numbers, cardinal points regarding traffic direction, cardinal points regarding street curbs, street intersections and other urban landmarks (sidewalks, stairs, rails and traffic lights). Figure 3 displays the users' satisfaction with the MOSS software. This dimension obtained 9.5 points out of a total of 10, and is followed by control & use with 9.17 points, and interface with 8.20 points. The standard deviation for the first two dimensions was 0.16, reaching 0.60 for interface. A kurtosis of 0 and a skew of 0 for all three dimensions show that the distribution for the three dimensions is symmetrical. On average a score of 8.88 points was obtained, a highly relevant result that supports the usability and acceptance of the software. Some of the most highly evaluated statements were: "I like the software's sounds" (9.8 points), "I learned with this software" (9.6 points), and "I like the software" (9.4 points). The lowest scores were obtained for the statements "The software adapts to my rhythm" (8.0 points) and "I felt in control of the software's situations" (8.2 points), which reveals the existence of a certain learning and appropriation curve. In general, however, a high degree of appreciation was obtained from the end users.
Fig. 3. Results of the end user evaluation of the MOSS software
4 Conclusions Four prototype software applications were evaluated that seek to support the navigation of blind users in real environments such as a neighborhood, public bus transportation, the Metro network and the school or a closed building. The interfaces of all prototypes evaluated are adequate for use by blind users. During the interaction it was possible to observe that users easily learned and recognized the audio cues and the functions used in the software, as well as their meaning. Through the evaluation of all the software applications we could determine that the use of a PocketPC was appropriate for the end goals of this study, in that the participants learned to use the device without any major difficulties, demonstrating a high level of skill in using the buttons on the PocketPC. Users were highly receptive to the 4 software applications, and were motivated by their use of the system.
410
J. Sánchez
Also, the use of both the synthesized voice and the non-verbal sounds in the audio system was highly accepted by the users. In this case, the natural sound of the TTS and the clarity of the sounds in general were highlighted. In particular, the clock system that the software used to transmit information regarding directions was easily assimilated by users with visual disabilities. The use of all the software applications allowed for relevant navigation by the users because it provided specific information to guide them during their travel. Because the handheld apparatus was a new device for them, there were some difficulties at the very beginning, but users slowly adjusted to using the device. They discovered solutions such as using it from their pockets with earphones in order to avoid losing the auditory references in their surroundings, and choosing a safe and comfortable place in which to handle it.
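The clock metaphor mentioned above can be pictured with a small sketch that converts a relative bearing into a spoken "o'clock" direction plus distance; the exact wording spoken by the TTS is an assumption for illustration:

```python
# Sketch: express a relative bearing (degrees, 0 = straight ahead) as a clock-face direction.
def clock_direction(bearing_deg: float) -> str:
    hour = round((bearing_deg % 360) / 30) or 12   # 30 degrees per clock hour; 0 maps to 12
    return f"{hour} o'clock"

def destination_message(distance_m: float, bearing_deg: float) -> str:
    # Hypothetical phrasing of the message passed to the TTS engine.
    return f"Destination {int(distance_m)} meters away, at {clock_direction(bearing_deg)}"

print(destination_message(120, 95))   # -> "Destination 120 meters away, at 3 o'clock"
```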
5 Discussion

The interfaces of the software applications developed were evaluated by using a sample made up of participants with different ages and degrees of blindness, verifying in all cases that the users were able to interact with all the mobile software applications independently. At the same time they demonstrated that the handheld device, the interfaces designed and the model of interaction were all appropriate for users with visual disabilities. Although the samples used for this evaluation were limited, the different contexts of use and the various users' backgrounds allowed us to detect several usability problems (real and potential), as well as to measure the level of understanding of the software's objective, its embedded representations, and the ways to navigate and interact with it. During the interaction it was possible to observe that the users quickly learned and recognized the audio cues used in the software and their meanings. They were able to understand the model of interaction and the metaphor used. As for the use of the device, none of the users had a hard time finding and identifying the buttons, the joystick or the screen. Besides these significant usability results, the evaluation became a useful opportunity to detect problems and opportunities to improve the design, as well as to correct the software's programming and modeling errors.

Acknowledgements. This report was funded by the Chilean National Fund of Science and Technology, Fondecyt #1060797, and Project CIE-05, Program Center Education PBCT-Conicyt.
References

[1] Dowling, J., Maeder, A., Boldes, W.: A PDA based artificial human vision simulator. In: Proceedings of WDIC 2005, APRS Workshop on Digital Image Computing, Griffith University, pp. 109–114 (2005)
[2] Gill, J.: An Orientation and Navigation System for Blind Pedestrians (2005), http://isgwww.cs.uni-magdeburg.de/projects/mobic/mobiruk.html (last accessed January 2009)
Mobile Audio Navigation Interfaces for the Blind
411
[3] Jacobson, R.: Navigation for the visually handicapped: Going beyond tactile cartography. Swansea Geographer 31, 53–59 (1994)
[4] Jacobson, R., Kitchin, R.: GIS and people with visual impairments or blindness: Exploring the potential for education, orientation, and navigation. Transactions in Geographic Information Systems 2(4), 315–332 (1997)
[5] Lahav, O., Mioduser, D.: Construction of cognitive maps of unknown spaces using a multi-sensory virtual environment for people who are blind. Computers in Human Behavior 24(3), 1139–1155 (2008)
[6] Loomis, J., Marston, J., Golledge, R., Klatzky, R.: Personal Guidance System for People with Visual Impairment: A Comparison of Spatial Displays for Route Guidance. Journal of Visual Impairment & Blindness 99, 219–232 (2005)
[7] Na, J.: The Blind Interactive Guide System Using RFID-Based Indoor Positioning System. In: Miesenberger, K., Klaus, J., Zagler, W.L., Karshmer, A.I. (eds.) ICCHP 2006. LNCS, vol. 4061, pp. 1298–1305. Springer, Heidelberg (2006)
[8] Ran, L., Helal, A., Moore, S.: Drishti: An Integrated Indoor/Outdoor Blind Navigation System and Service. In: Proceedings of the 2nd IEEE Pervasive Computing Conference, Orlando, Florida, March 2004, pp. 23–30 (2004)
[9] Sánchez, J.: End-user and facilitator questionnaire for software usability. Usability Evaluation Test, University of Chile (2003)
[10] Sánchez, J., Aguayo, F.: Mobile Messenger for the Blind. In: Stephanidis, C., Pieper, M. (eds.) ERCIM Ws UI4ALL 2006. LNCS, vol. 4397, pp. 369–385. Springer, Heidelberg (2007)
[11] Sánchez, J., Maureira, E.: Subway Mobility Assistance Tools for Blind Users. In: Stephanidis, C., Pieper, M. (eds.) ERCIM Ws UI4ALL 2006. LNCS, vol. 4397, pp. 386–404. Springer, Heidelberg (2007)
[12] Sánchez, J., Zúñiga, M.: Evaluating the Interaction of Blind Learners with Audio-Based Virtual Environments. CyberPsychology & Behavior 9(6), 717 (2006)
[13] Sasaki, H., Tateishi, T., Kuroda, T., Manabe, Y., Chihara, K.: Wearable computer for the blind – aiming to a pedestrian's intelligent transport system. In: Proceedings of the 3rd International Conference on Disability, Virtual Reality and Associated Technologies, ICDVRAT 2000, pp. 235–241 (2000)
[14] Virtanen, A., Koskinen, S.: NOPPA: Navigation and Guidance System for the Blind (2004), http://virtual.vtt.fi/noppa/noppa%20eng_long.pdf (last accessed January 2009)
[15] Willis, S., Helal, S.: A Passive RFID Information Grid for Location and Proximity Sensing for the Blind User. University of Florida Technical Report TR04-009 (2005), http://nslab.ee.ntu.edu.tw/iSpace/seminar/papers/2005_percom/passive_RFID_information_grid.pdf (last accessed January 2009)
A Mobile Communication System Designed for the Hearing-Impaired Ji-Won Song and Sung-Ho Yang College of Design, Inje University, 607 Obang-Dong, Kimhae, Kyongnam, Korea {dejsong,deyangsh}@inje.ac.kr
Abstract. This is a case study of the design of a communication system and its interfaces aimed at addressing the communication needs of the hearing-impaired. The design work is based on an in-depth investigation of the problems pertaining to mobile phone usage and general conversation difficulties of Korean deaf people. It was determined from this investigation that the technology-related issues of the hearing-impaired are not limited to usability or accessibility, but arise from hindered executive actions and differing executive behaviors for achieving communication goals at varying levels of ability. Therefore the design study has developed a new approach to the unique communication needs of the hearing-impaired, as well as their behavioral patterns, and presents possible overall improvements in face-to-face and distance communication through mobile technology. Keywords: Hearing-impaired, Communication system, Behavior pattern.
1 Introduction

Despite the increased usability and accessibility of information technology, there are still substantial challenges with regard to technology design for people with disabilities. There is a broad range of activities, occurring after initial perception, that a person must undergo to fully interact with a device and to fulfill his/her interaction goal [1]. Although many technologies provide supplementary "accessible" means through which the disabled can perceive and react to interface information, this by no means renders them "usable", in that the interaction is often unsatisfactory [4]. To provide satisfactory technology usability, for users disabled or not, designers need to focus on users' practical goals and behaviors. In order to design a user-friendly technology for the disabled, we need to expand the design perspective beyond partial accessibility, taking their overall goals and behaviors into consideration. This is a case study of the design of a communication system and its interfaces aimed at addressing the communication needs of the hearing-impaired. The design work is based on an in-depth investigation of the problems pertaining to mobile phone usage and general conversation difficulties of Korean deaf people. The design study has developed a new approach to the unique communication needs of the hearing-impaired, as well as their behavioral patterns, and presents possible overall C. Stephanidis (Ed.): Universal Access in HCI, Part II, HCII 2009, LNCS 5615, pp. 412–421, 2009. © Springer-Verlag Berlin Heidelberg 2009
A Mobile Communication System Designed for the Hearing-Impaired
413
improvements in face-to-face and distance communication through mobile technology. This paper includes the knowledge gained from the investigation and explanation of the methods used to overcome the overall problems.
2 Gulfs in Disabilities' Technology Interaction

According to Donald Norman (1990), in the use of a product, there are gaps, called gulfs of execution and evaluation, between user goals and an artifact (Figure 1) [2]. The gulf of execution lies between our desired achievement (our goals and intentions) and the available means of physical execution. The gulf of evaluation reflects the amount of effort required by the user to interpret the physical state of the system and to determine how well expectations and intentions have been met. When an artifact is used, the gap is bridged through the user's specific actions. To bridge the gulf of execution, users perform actions as follows:
1. The user forms an intention to achieve a goal.
2. The user specifies a sequence of actions required to achieve the goal.
3. The user physically executes the intended actions to achieve the goal.
The evaluation stage can be broken down into three parts also:
1. The user perceives the state of the world after the performance of some actions.
2. The user interprets those perceptions according to the expectations resulting from the actions.
3. The user must examine the current state of the world with respect to both his/her own intermediate expectations and his/her overall goal [1].

(Figure 1 depicts the gulf of execution and the gulf of evaluation spanning between the user's goals and the physical system.)

Fig. 1. Interaction Gulfs (Norman (1986)) [3]
In the HCI area, the gulf framework between user goals and an artifact has helped to discuss usability problems such as the ease with which users can find possible actions and execute them accordingly, and whether they can determine the interaction state without misunderstanding and can respond to it. For instance, Norman suggested visibility, a sound conceptual model, good mappings and feedback as typically good design principles to reduce the distance of the gulfs [2]. However, for the disabled, who have limited sensory or other abilities, solutions to the gulf problem are not limited to a good conceptual model and feedback. From our investigation of mobile phone usage by the hearing-impaired, it was determined that the technology-related issues experienced by these people arise from hindered
414
J. Song and S. Yang
executive actions and differing executive behaviors for achieving communication goals at varying levels of ability. On the basis of Norman's executive stage action framework, to achieve a goal, the disabled need to specify and execute a sequence of actions planned to achieve that goal. However, the limited sensory function of the hearing-impaired interrupts the sequence of communication actions, rendering them unable to achieve the goal through the same means as those with unimpaired hearing. This interruption extends beyond interface perception problems to fundamental communication activities. From the user research, we found that the hearing-impaired give up on communication goals for which they lack executive actions, and that they supplement the broken actions with altered actions that do not rely on hearing, ultimately achieving their intentions, if only primitively. The differing behavioral patterns in these supplementary actions have also led to serious usability problems in their technology usage.
3 User Research on Communication for the Hearing-Impaired

With the aim of designing a mobile communication system for the hearing-impaired, we investigated hearing-impaired mobile phone usage and general conversations. Three sessions of focus-group interviews (FGI) with eight hearing-impaired people and three sign language interpreters were conducted in two Korean cities. The first interview was at the Kimhae sign-language interpretation center, with interpreters whose job is to assist hearing-impaired people in their everyday communication. It was a pilot interview to gather basic knowledge on how the hearing-impaired communicate and to get some advice on interview methods for deaf people. The second and third interviews were conducted with five deaf citizens and three deaf people from the Korea Association of the Deaf, at the Kimhae branch and Chanwon branch, respectively. The participants were aged from their twenties to forties; six people lived with their families and two lived alone. The FGIs were designed to discuss issues relating to hearing-impaired communication, how they use mobile phones, and what particular and practical needs they have for both mobile phone usage and general conversation. Because writing is insufficient for complex interview communication, due to the differing grammatical structure of sign language and written Korean, the second and third FGIs were interpreted by sign-language interpreters between a moderator and participants. The interpreted interviews were voice-recorded with participants' consent. From the interviews, we found hearing-impaired people to have the same objective in general conversation and mobile-phone usage — communication — as those with unimpaired hearing, but with the obvious hindrance of their disability standing in the way of accomplishing that goal. Current technological design trends do not do enough to help bridge the gap. The main means of communication between the hearing-impaired is sign language at close range. They use sign language via video calling technology in distance communication, in addition to the Short Message Service (SMS). In a conversation with a person of unimpaired hearing who is unacquainted with sign language, they use writing or lip-reading at close range, and SMS for distance communication. When a complex conversation is necessary with a person of unimpaired hearing (e.g. at a hospital or government office), either face-to-face or by distance communication, deaf people often need sign-language interpretation
A Mobile Communication System Designed for the Hearing-Impaired
415
because SMS and writing are often awkward due to the differing grammatical structure of sign language and written Korean. However, the current interpretation service for such cases is available only in a face-to-face conversation involving a deaf person, an interpreter, and a conversational partner. On account of this, the hearing-impaired tend to give up most distance communication, and even close-range communication with those of unimpaired hearing (Figure 2).

(Figure 2 contrasts close-range and distance communication, indicating which means — sign language, writing, SMS, video call, voice call and interpretation — are available or unavailable between a hearing-impaired person and partners of impaired or unimpaired hearing.)

Fig. 2. Available executive actions for close-range and distance communication
In addition, limited hearing sensitivity has altered other communication actions carried out on mobile phones, namely SMS and video calls. Because the hearing-impaired accomplish communication through these alternative means to supplement their hindered communication channels, their SMS and video-call behaviors differ, and these differing behaviors lead to serious usability problems. Whereas SMS usage is secondary to voice calls for people with unimpaired hearing, SMS is the primary means of distance communication in Korea's deaf society. Unlike the short information exchanges typical of non-deaf users, the hearing-impaired exchange at least four to ten times as many messages for even a simple but chatty conversation. Because the SMS interface orders messages by the time they were received, the sent or received messages forming a continuous conversation are often mixed with messages from other conversations, interrupting the smooth flow of hearing-impaired communication. Because of these difficulties, sign language conversations between deaf people via video call have recently enjoyed an increase in popularity. However, the video-calling interface is designed to show only the speaker's face, and is generally too small to accommodate the chest and the two hands making signs, or postures in which signs are made using only one hand while the other hand holds the phone. In being alerted to mobile phone signals, the hearing-impaired have shown particular behaviors: because the vibrating signals supplementing auditory signals aren't effective if a device is not in contact with the user's body, the hearing-impaired will hold a mobile
416
J. Song and S. Yang
phone at all times so as to not miss incoming signals. Some participants said they hold the device even when they sleep, or put it under their pillow. Additional details of the problems involved with each communication method and mobile phone usage revealed in the FGI are presented in Table 1.

Table 1. Hearing-impaired people's communication methods and associated problems

Writing
- Not fluent due to the differing grammatical structure of sign language and written Korean
- Writing ability varies depending on educational background
- Writing devices required (pen and paper)

Communication via interpretation
- Available only for face-to-face conversation
- Due to limited interpreter numbers, face-to-face interpretation is not readily available
- Immediate interpretation is required in emergency situations
- The hearing-impaired often find themselves in unfavorable situations without an interpretation service

SMS
- Awkward in writing due to grammar problems
- Only for simple conversation
- At least four to ten times as many messages required
- Mixed messages from other conversations can hinder fluent communication

Video call
- Screen too small for sign language
- One-handed sign language while the other hand holds the phone
- Hard to hold during long conversations (arm aches)
- Invisible in a dark place or at night time
- Service charge is expensive

Others
- Calling signals often missed even if set to vibration signal
- Video phone (local line) and mobile phone video calls are not compatible
- Door bell is not perceived
4 System Design 4.1 Design Process From the user research results, we found that many different problems pertaining to communication and mobile phone usage by the hearing-impaired are primarily caused by the fact that current technology is not designed to support their unique communication needs and behavioral patterns. From this discovery, our design goals developed
A Mobile Communication System Designed for the Hearing-Impaired
417
two major focuses: firstly, providing an appropriate set of communication means to cover all distance and close-range communication, so as to fulfill the communication needs upon which the hearing-impaired have all but given up; secondly, we propose to provide improved interaction and interface for each communication method to better suit the unique communication behavioral patterns of the hearing-impaired. To achieve this design goal, our design approach was as follows: First, we recommend a communication framework to offer sufficient available executive actions to cover the majority of hearing-impaired communication objectives with other deaf people, as well as with people with unimpaired hearing, in close-range or distance communication. Second, on the basis of this framework, we designed a communication system and interfaces. Three personas and six design scenarios, reflecting communicational situations selected from the user research, were employed as the major tools in designing the system interaction, and form factors fitting deaf peoples' unique behavioral patterns. Design scenarios describing user behavior and experience are essential in discussing and analyzing how the technology is (or could be) reshaping their activities [5].
Fig. 3. User test with deaf people (Left) using video prototype (Right)
Third, a user evaluation was conducted, in which six deaf people and two interpreters reviewed the system design via video and model prototypes, and expressed their opinions and identified possible problems (Figure 3). In this way, it was verified that the design concept of the communication system would be effective in addressing the overall communication needs of the hearing-impaired. Currently, the project is undergoing a procedure of design refinement for further development. 4.2 Communication Framework and Interaction Design The communication means of the system are developed from a communication framework as previously mentioned. A deaf person’s communication environments were divided into close range and long distance, and they are distinguished by whether the partner is hearing-impaired or not. In each distinguished communication situation, appropriate communication methods were developed, as indicated in Figure 4. The framework suggests new communication methods such as remote interpretation service or interpretation calls, as well as sign-language video call, SMS, and
418
J. Song and S. Yang
digital memo, which are those currently used by Korea's deaf citizens. A remote sign language interpretation service, connecting the device to the interpreter's video phone, helps face-to-face communication between people of impaired and non-impaired hearing. Similarly, for distance communication in the same situation – which Korea's hearing-impaired have basically given up on – the system is designed to provide an interpretation call service by establishing a connection between the hearing-impaired person, an interpreter, and the non-hearing-impaired person, through video calling.
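The interpretation call described above is essentially a three-party video session. The sketch below shows one plausible way to model its setup; the class and role names are hypothetical illustrations, not the authors' implementation:

```python
# Hypothetical model of setting up a three-party interpretation call.
from dataclasses import dataclass, field

@dataclass
class InterpretationCall:
    deaf_user: str
    hearing_partner: str
    interpreter: str
    participants: list = field(default_factory=list)

    def connect(self) -> str:
        # The interpreter joins first, then both conversation partners are bridged
        # into the same video session so that signing and speech can be relayed.
        self.participants = [self.interpreter, self.deaf_user, self.hearing_partner]
        return "Video call established between " + ", ".join(self.participants)

call = InterpretationCall("deaf user", "government office clerk", "on-duty interpreter")
print(call.connect())
```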
Fig. 4. Proposed framework to develop communication methods of the system
The mobile device is designed to provide various means of communication, developed from the framework, via sign-language video calls, threaded SMS, digital memos, remote interpretation services, and interpretation calls (Figure 5). The interfaces and form factors of the devices are intended to suit the behavioral patterns and needs of the hearing-impaired in sign-language conversations using two hands, SMS conversations with several exchanges, and digital memos to another person. The device has two screens. The screens are slightly bigger than those of normal mobile phones (3.3 inches) to show each person's sign language, including the chest and the hands making signs. It is designed to stand independently so that a user can make signs using two hands. When the device is not in a situation in which it can stand independently, like in a car, it can be worn on the arm to reduce the annoyance of
A Mobile Communication System Designed for the Hearing-Impaired
419
having to hold the phone. Threaded SMS is applied to provide fluent conversation with a particular person without the disturbance of other incoming messages. Digital memo is added to help users with short and instant face-to-face communication, for which the interface is designed to clearly and easily present a memo by rotating the text as per the open angle of the upper screen.

(Figure 5 shows the front and back of the mobile communication device and its interfaces: initial display, remote interpretation service, interpretation call, threaded SMS, digital memo, and sign-language video call.)

Fig. 5. Form factors and interfaces of the mobile communication device
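Threaded SMS, as described above, amounts to grouping messages by conversation partner instead of interleaving them by arrival time. A minimal, hypothetical sketch of such grouping (message format and names are assumptions):

```python
# Sketch: group SMS messages into per-contact threads (assumed message format).
from collections import defaultdict

messages = [
    {"contact": "Min-Su", "time": "09:01", "text": "Are you coming today?"},
    {"contact": "Office", "time": "09:03", "text": "Meeting moved to 2 pm"},
    {"contact": "Min-Su", "time": "09:05", "text": "I will wait at the station"},
]

threads = defaultdict(list)
for msg in messages:
    threads[msg["contact"]].append(msg)        # one thread per conversation partner

for contact, thread in threads.items():
    print(contact)
    for msg in sorted(thread, key=lambda m: m["time"]):   # chronological within a thread
        print(f"  [{msg['time']}] {msg['text']}")
```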
4.3 The Additional Devices

The final proposed system adds a vibrator and a cradle to the mobile communication device that provides the major communication aids explained above (Figure 6). The vibrator and cradle work together to help mitigate additional daily problems of the hearing-impaired. The vibrator is designed to be worn on the wrist so that incoming call and message signals, relayed over a wireless link, are not missed. It can also be attached to the mobile device so that the device does not have to be held during long sign-language conversations. The cradle, in which the device can be docked, supports the needs of the hearing-impaired by allowing widescreen, hands-free viewing at the home or office, as well as charging the device's battery and providing a secure and stable location for device storage. The cradle can also act as a house door bell, addressing another unmet daily need of Korea's hearing-impaired.
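One way to picture the vibrator/cradle pairing described above is as simple event forwarding: events arriving at the device or cradle (calls, messages, the door bell) are relayed wirelessly to the wrist-worn vibrator. The event names and vibration patterns below are assumptions for illustration only:

```python
# Hypothetical event forwarding from the device/cradle to the wrist vibrator.
class WristVibrator:
    PATTERNS = {
        "incoming_call": "long-long",
        "new_sms": "short",
        "door_bell": "short-short-short",
    }

    def notify(self, event: str) -> None:
        pattern = self.PATTERNS.get(event)
        if pattern:
            print(f"Vibrating ({pattern}) for event: {event}")

vibrator = WristVibrator()
for event in ["incoming_call", "new_sms", "door_bell"]:
    vibrator.notify(event)   # forwarded over the wireless link in the real system
```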
420
J. Song and S. Yang
Fig. 6. The proposed communication system modules and functions
5 Conclusion

Computing system design is part of an ongoing cycle in which new technologies create opportunities for humans. Technology design that considers the practical needs of the disabled can provide new opportunities for these people and improve their quality of life. From this design study case, we discovered that limited sensitivity and ability can alter the overall executive actions of the disabled in their interaction with a device. For conspicuous improvement in technology usability for the disabled, serious efforts are vital to provide appropriate executive actions and to better suit disabled citizens' unique behavioral patterns, on the basis of a deepening understanding of the daily problems they face. In this design study, a concrete design approach to improving the overall communication methods and abilities of the hearing-impaired can help them to achieve their communication goals in face-to-face and distance communication. The interaction design, directly addressing each communication method commonly used by the hearing-impaired, is based on their unique behavioral patterns, and promises to improve their overall quality of communication activities and life.
A Mobile Communication System Designed for the Hearing-Impaired
421
References

1. Carey, K., Gracia, R., Power, C., Petrie, H., Carmien, S.: Determining accessibility needs through user goals. In: Proceedings of the 12th International Conference on Human-Computer Interaction, 4th International Conference on Universal Access in Human-Computer Interaction, pp. 28–35. Lawrence Erlbaum Associates, Mahwah (2007)
2. Norman, D.: The Design of Everyday Things. MIT Press, Cambridge (1990)
3. Norman, D.: Cognitive Engineering. In: Norman, D., Draper, S. (eds.) User Centered System Design: New Perspectives on Human-Computer Interaction, pp. 31–61. Lawrence Erlbaum Associates, Inc., New Jersey (1986)
4. Pullin, G., Newell, A.: Focussing on Extra-Ordinary Users. In: Proceedings of the 12th International Conference on Human-Computer Interaction, 4th International Conference on Universal Access in Human-Computer Interaction, Part 1, pp. 253–262. Lawrence Erlbaum Associates, Mahwah (2007)
5. Rosson, M.B., Carroll, J.M.: Usability Engineering: Scenario-Based Development of Human-Computer Interaction. The Morgan Kaufmann Series in Interactive Technologies. Morgan Kaufmann, San Francisco (2001)
A Study on the Icon Feedback Types of Small Touch Screen for the Elderly Wang-Chin Tsai and Chang-Franw Lee Graduate School of Design, National Yunlin University of Science and Technology 123, University Road Section 3, Touliu, Yunlin, 64002, Taiwan, R.O.C. {g9330802,leecf}@yuntech.edu.tw
Abstract. Small touch screens are widely used in applications such as bank ATMs, point-of-sale terminals, ticket vending machines, facsimiles, and home automation in daily life. They are intuition-oriented and easy to operate. Many elements affect small touch screen performance, and one of the essential ones is icon feedback. However, in striving merely for attractive icon feedback appearance and interesting interaction experiences, many interface designers ignore real user needs. It is critical for them to trade off the icon feedback type against the different users' needs in the touch interaction. This is especially important when the user's capability is very limited. This paper describes a pilot study for identifying factors that determine icon feedback usability on small touch screens in four older adult Cognitrone groups, since current research aimed mostly at general icon guidelines and recommendations and failed to consider and define the specific needs of small touch screen interfaces for the elderly. In this paper, we present a concept focused on human needs and use a cognitive assessment tool, the Cognitrone test, to measure older adults' attention and concentration capability and learn more about how to evaluate and design suitable small screen icon feedback types. Forty-five elderly participants took part. Each subject was asked to complete a battery of Cognitrone tests and was assigned to one of 4 groups. Each subject was also requested to perform a set of 'continuous touch' usability tasks on a small touch screen and comment on open-ended questions. Results are discussed with respect to the perceptual and cognitive factors that influence older adults in the use of icon feedback on small touch screens. The study showed significant associations between icon feedback performance and factors of attention and concentration. However, this interrelation was much stronger for Groups 2 and 4, especially for Type B, Type C and Type G. Moreover, consistent with previous research, older participants were less sensitive and required more time to adapt to highly detailed icon feedback. These results are discussed in terms of icon feedback design strategies for interface designers. Keywords: small touch screen, icon feedback, older adults, cognitrone style.
C. Stephanidis (Ed.): Universal Access in HCI, Part II, HCII 2009, LNCS 5615, pp. 422–431, 2009. © Springer-Verlag Berlin Heidelberg 2009
A Study on the Icon Feedback Types of Small Touch Screen for the Elderly
423
1 Introduction

Lately, "touch" has become one of the buzzwords. In fact, for over a decade, touch screen technology and devices have been in widespread use, from public systems such as self-order and information kiosks to personal handheld devices like PDAs (Personal Digital Assistants) or gaming devices. Generally speaking, interaction on touch-sensitive screens is one of the most "direct" application forms of HCI (Human Computer Interaction), with information and control displayed on one surface. The zero displacement between input and output, control and feedback, hand movement and eye gaze makes the touch screen an intuition-oriented tool for users, particularly for novices [6]. Nonetheless, as this touch technology gains sophistication and its teething problems are worked out, small touch screen technology faces two limitations. First, the screen might be obscured by the user's finger, hand, or arm. Second, it is difficult for users to point at targets smaller than their finger width. Recently, some studies on thumb use recommended 9.2 mm as the most appropriate width for on-screen icons [2]. Below 9.2 mm, users' performance tends to degrade when they attempt to correctly select an icon on the screen with their thumb. Though the problem can be solved by applying other aids, such as a stylus or a cursor, the easy-to-operate characteristic of thumb-based screen touch then no longer exists. Moreover, a practical designer may consider icons of 9.2 mm too large and space occupying. Therefore, techniques like Offset Cursor and Shift have been introduced to improve selection accuracy and to help users refine their initial selection position. Originally designed for fingertip operation, Offset Cursor overcame digit occlusion by offsetting the cursor from the selection point, while Shift achieved it by displaying an inset of the selection region. However, both novel designs do little to adapt to the changing needs of elderly users as their abilities decline with aging. Known as the most frequently applied approach for human-computer interface design, Nielsen's outlines of the User-centered Paradigm (1993) were intended for homogeneous groups to test the users regarding design decisions. Yet the current interface development tools and methods neither meet the needs of diverse user groups, nor do they address the dynamic nature of diversity. As a result, there is an urgent need to address these shortcomings of the current approach and to search for new processes and practices. By its literal definition, touch screen operation is different from normal screen operation. Besides visual search, "touch" actions are involved during the interaction as well. That is to say, the main objective of interface designers is to create a highly user-friendly interface while confirming appropriate design concepts. With the view that older adults' attention ability could strongly affect their perception of icon feedback, this study investigated how icon feedback types affect diverse elderly users when they operate a small touch screen. By characterizing older users' perceptions of and performance with varied icon feedback on small touch screens, analyzing their preferences for different icon feedback types, and identifying the relative advantages of each, the findings of this study serve as a guide to icon feedback design for more user-friendly small touch screens.
424
W.-C. Tsai and C.-F. Lee
2 Literature Review

Research on the use of alternative feedback modalities has focused primarily on single feedback, while comparatively few studies have examined and compared different visual icon feedback combinations on small touch screens. As Leonard et al. [4] pointed out in 2006, additional research is needed to examine specific combined icon feedbacks and their usability for the elderly with varied physical and psychological conditions. Although passive touch screens are intuition-oriented and easy to learn, there are several precision-related limitations that users have to overcome in the interaction. First of all, for touch screens, finger-pointing selection of rather small objects and specification of smaller targets may be difficult, which is critical for effective selection. Second, for interaction on small touch screens, complications may occur due to occlusion, imprecision in selection, poor calibration, or parallax errors caused by the offset between the display surface and the overlaid touch surface. Third, touch screen interaction, unlike the use of a mouse, involves no analogue. Unlike mouse users, who can move the mouse pointer over screen elements, get feedback from the selected elements such as highlighting, and confirm their selection by clicking mouse buttons, touch screen users point at screen elements directly and immediately initiate an action which might not be cancelable afterwards. Fourth, touch screen interaction is characterized by the user's habits and characteristics. In other words, it is a procedure requiring crucial cognitive skills such as concentration, coordinated reactions, excellent judgment and decision-making capabilities to avoid mistaken manipulation. Finally, owing to the physical and cognitive declines that accompany aging, older adults face more difficulties with small touch screen operation, which is intended mainly for younger users. For instance, when operating complicated interfaces, it may be hard for the elderly to press minute buttons and detect icon feedback because of their varied attention capabilities and formed habits [3]. Hence, indications of whether an action is possible or not have to accompany the icon feedback. Likewise, static activation takes more of elderly users' attention for clarification and checks that an action was received. To sum up, despite the fact that relatively few attempts have been made, discussions and studies of icon feedback types are of essential importance. Meanwhile, for small touch screen assessments, direct applications of icon feedback are required, especially with regard to the estimated numbers of potentially excluded and potentially challenged users among various target population groups.
3 Methodology 3.1 Participants Forty-five volunteers ranging from the age of fifty-five to seventy-three years old participated in this study. Among them, twenty-eight were female and seventeen were
A Study on the Icon Feedback Types of Small Touch Screen for the Elderly
425
male. The mean age was 67.6 years. Compensation for participants in this study included free comprehensive Cognitrone tests and a souvenir. Participants were randomly selected in Taichung City.

3.2 Cognitrone Test

The Cognitrone (COG) Test is a general performance test for the measurement and analysis of attention and concentration, which rests on two basic concepts. First, its stimulus materials are composed of stick figures and require participants' judgment to decide whether pictures are identical. Second, the Cognitrone test is used for measuring executive functions such as decision- and judgment-making based on a person's perception of minute changes. Moreover, the COG test is applied in predicting concentration levels and attention spans, which are essential to underground work skills. With a set time limit, participants are asked to accomplish tasks which are not intellectually demanding, as quickly and accurately as possible.
Fig. 1. The Cognitrone test introduction
During the test, participants have to compare figures. Altogether five pictures are presented on the screen, four pictures in one line with the fifth picture below them. Participants have to determine whether it is an exact match or not by pressing one of two different colored buttons on the response panel. Green buttons are for exact matches while red buttons are for inexact matches. Then, compiled measurements describing the subject's performance in terms of speed, accuracy and consistency are processed and calculated by a scoring program. With regard to the time limit, the COG test usually allows unlimited completion time. However, it is suggested that the ideal time for completion is between five and ten minutes; any time longer than ten minutes is considered reflective of a concentration deficit. Furthermore, the reliability of the COG test is considerably high, above r = 0.95, with a number of validity tests carried out. Meanwhile, although different versions of the COG test are available, the S11 Version is the most suitable for Taiwanese participants because it was developed with relevant and applicable nation-wide norms. In this session, participants were divided into four COG groups, each with a different analysis result and a different focus on a major trait of attention, as illustrated in Table 1 below.
426
W.-C. Tsai and C.-F. Lee

Table 1. Arrangement of participant groups

Group            n    Age, mean (SD)   Gender       Group characteristic
Group 1          11   62.0 (3.4)       M=2, F=9     Accurate-fast
Group 2          15   67.3 (5.4)       M=10, F=5    Accurate-slow
Group 3          12   67.2 (5.1)       M=5, F=7     Inaccurate-fast
Group 4          7    66.8 (5.5)       M=0, F=7     Inaccurate-slow
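The four groups in Table 1 cross speed with accuracy. The paper does not state the exact cut-off rule, so the sketch below uses hypothetical median splits purely to illustrate how such a classification could be computed:

```python
# Sketch: classify participants into the four Cognitrone groups (assumed median-split rule).
import statistics

def classify(participants):
    """participants: list of dicts with 'time_s' (completion time) and 'accuracy' (0-1)."""
    median_time = statistics.median(p["time_s"] for p in participants)
    median_acc = statistics.median(p["accuracy"] for p in participants)
    for p in participants:
        speed = "fast" if p["time_s"] <= median_time else "slow"
        accuracy = "accurate" if p["accuracy"] >= median_acc else "inaccurate"
        p["group"] = f"{accuracy}-{speed}"
    return participants

sample = [{"time_s": 310, "accuracy": 0.96}, {"time_s": 540, "accuracy": 0.97},
          {"time_s": 300, "accuracy": 0.81}, {"time_s": 560, "accuracy": 0.78}]
for p in classify(sample):
    print(p["group"])   # accurate-fast, accurate-slow, inaccurate-fast, inaccurate-slow
```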
3.3 Materials and Experimental Design

The interface platform used in this study was an ASUS MyPal A730W compatible PDA. Participants were seated approximately thirty centimeters from the screen display. Simulated screen resolution was set at 1024 x 768 pixels, with a 24-bit color setting. To accomplish the continuous-touch tasks, participants were requested to perform a series of random ten-digit telephone number inputs. In the meantime, by using the
Table 2. Icon feedback types (example screen shots and experiment scene)

Type   Feedback form description
A      Movement: the position of the icon will gradually move after the icon is touched
B      Color: the color of the icon will change after the icon is touched
C      Magnify: the shape of the icon will change after the icon is touched
D      Movement + Magnify: the combined feedback of the icon will apply after the icon is touched
E      Movement + Color: the combined feedback of the icon will apply after the icon is touched
F      Color + Magnify: the combined feedback of the icon will apply after the icon is touched
G      Movement + Color + Magnify: the combined feedback of the icon will apply after the icon is touched

(The table's Presentation column, with screen shots of each feedback on the interface platform, is not reproduced here.)
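The seven feedback types in Table 2 are combinations of three elementary cues (movement, color change, magnification). A small data-structure sketch makes the composition explicit; it restates the table and is not the authors' implementation:

```python
# The seven icon feedback types expressed as combinations of three elementary cues.
from typing import NamedTuple

class Feedback(NamedTuple):
    movement: bool
    color: bool
    magnify: bool

FEEDBACK_TYPES = {
    "A": Feedback(True,  False, False),   # Movement
    "B": Feedback(False, True,  False),   # Color
    "C": Feedback(False, False, True),    # Magnify
    "D": Feedback(True,  False, True),    # Movement + Magnify
    "E": Feedback(True,  True,  False),   # Movement + Color
    "F": Feedback(False, True,  True),    # Color + Magnify
    "G": Feedback(True,  True,  True),    # Movement + Color + Magnify
}

def apply_feedback(icon_id: str, fb: Feedback) -> None:
    """Trigger the cues that make up one feedback type when an icon is touched."""
    cues = [name for name, on in zip(("move", "recolor", "magnify"), fb) if on]
    print(f"icon {icon_id}: {' + '.join(cues)}")

apply_feedback("digit-5", FEEDBACK_TYPES["G"])   # -> icon digit-5: move + recolor + magnify
```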
A Study on the Icon Feedback Types of Small Touch Screen for the Elderly
427
Flash X programming language, a group of icon feedback presentations was developed in this study. Also, based on the findings of related research, an average icon size of 6 mm was adopted in this experiment. Finally, this study employed a 7 x 4 factorial design with seven feedback modality conditions among the four groups of participants illustrated in Table 1. In addition, two measurements of efficiency were used to assess participants' performance. One was the total time for completion, measured in seconds, and the other was the frequency of missing or wrong touches. Both measurements focused on interrelated components of the continuous touch task, which were influenced mostly by the user's response to the icon feedback on the small touch screen.

3.4 Procedures

Before the experiment, the participants were briefed about the rules and the purpose of the experiment and were requested to fill in their personal information such as their age, gender, and education. In the test session, in order to get accustomed to the interface, the participants were asked to make a simple trial before the start of each task. As the task began, the participants were asked to touch the icon indicated by the program instruction, which adopted progressive interaction in the experiment interface. To complete the task, the participants had to touch every icon on the touch screen and complete the ten-digit telephone number trials as shown in Table 2. During the trials, the participants experienced all seven icon feedback types (Type A through Type G) and perceived the usability of each icon feedback. At last, after the screen touch tasks were completed, the participants were encouraged to comment on open-ended questions about any aspects needing adjustment, improvement or further explanation, and about any favored or disliked features. All materials were presented to the participants in Chinese; for the purpose of this paper, all items and questions were translated into English.

3.5 Data Analysis

For the analysis of the data, this study applied Analyses of Variance (ANOVA) to examine significant differences in task performance across feedback conditions within each Cognitrone group. A one-way ANOVA was used to analyze the data when each Cognitrone group operated the icon feedback of Type A, Type B, Type C, Type D, Type E, Type F and Type G. In addition, significant differences were analyzed by utilizing the Scheffé method as the post hoc test for multiple comparisons. Significance was accepted at the level of p < 0.05.

(p > 0.05). By Block 10, the significance of the differences in means between the Visuals On/TTS (M = 6546, SD = 3064) and Visuals On/Spearcons + TTS (M = 7061, SD = 3408) conditions was very small. It is also clear from Figure 2 that, even though the differences between the conditions using auditory-only and auditory-with-visual cues in Block 10 are significant, there is much less of a difference between the auditory-only and visual conditions than existed in the first block of the experiment. Figure 3 illustrates the mean time to target for the five conditions in the first and tenth blocks.
[Fig. 2 plots five conditions: Visuals Off/Spearcons + TTS, Visuals Off/TTS, Visuals On/Spearcons + TTS, Visuals On/No Sound and Visuals On/TTS; x-axis: Block Number (1-10); y-axis: mean time to target in milliseconds.]
Fig. 2. Mean time to target in milliseconds for all conditions over all blocks
Fig. 3. Mean time to target in milliseconds for all conditions in Blocks 1 & 10. Error bars are 95% confidence intervals.
Collapsing across audio cue types, conditions with the visuals on were significantly faster than visuals off, in both Block 1, F(1, 2197) = 661.269, p < 0.001, and Block 10, F(1, 2197) = 348.079, p < 0.001. Considering the different audio cue types (TTS vs. spearcon+TTS), the spearcon cues led to slower times in Block 1, F(1, 2197) = 9.539, p = 0.002, but the effect diminished quickly over the first few blocks, and no significant difference was found among the sound conditions for Block 10 (p > .05).

3.2 Subjective Ratings
The participants gave scores on five dimensions (i.e., helpfulness, distraction level, preference over silence, fun and annoyance level) by providing agreement or disagreement responses on a Likert scale. The scores were also aggregated into an overall preference score for each participant. The means across all participants for each condition and audio cue are summarized in Figure 4. Overall, there was no significant difference in preference for spearcons and TTS, F(1, 18) = 3.319, p = 0.071. However, a t-test comparing the visuals on and visuals off conditions demonstrated that both audio cues were rated significantly better when no visuals were provided, t(106) = 6.706, p < 0.001. The TTS sounds were given significantly higher rankings when they were accompanied by spearcons than when they were not, in both the visuals on condition, t(33) = -2.234, p = 0.032, and the visuals off condition, t(33) = -3.181, p = 0.004. That is, simply adding spearcons seemed to lead to higher ratings of the TTS, with no performance difference after a few blocks of practice.
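As an illustration of the kind of analysis reported here, the sketch below shows how a one-way ANOVA over conditions and a paired t-test on ratings can be computed with SciPy. The data are entirely hypothetical placeholders (means and spreads loosely inspired by the values above), not the study's dataset, and the original analysis was not necessarily performed this way.

```python
# Illustrative sketch only: hypothetical data, not the study's dataset.
# Shows how the reported tests (one-way ANOVA over conditions, paired
# t-tests on ratings) can be reproduced with SciPy.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical time-to-target samples (ms) for three conditions in one block.
visuals_on_tts   = rng.normal(6546, 3064, 200)
visuals_on_spear = rng.normal(7061, 3408, 200)
visuals_off_tts  = rng.normal(12000, 4000, 200)

# One-way ANOVA across conditions (analogous to the F-tests reported above).
f_stat, p_val = stats.f_oneway(visuals_on_tts, visuals_on_spear, visuals_off_tts)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_val:.4f}")

# Paired t-test on hypothetical per-participant preference ratings,
# comparing TTS-only menus against spearcon+TTS menus.
ratings_tts       = rng.normal(3.2, 0.8, 34)
ratings_spear_tts = ratings_tts + rng.normal(0.3, 0.5, 34)
t_stat, p_val = stats.ttest_rel(ratings_tts, ratings_spear_tts)
print(f"paired t-test: t = {t_stat:.2f}, p = {p_val:.4f}")
```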
Fig. 4. Mean aggregate subjective preference scores, 5 being the highest possible score. TTS is given higher scores in the presence of spearcons.
4 Discussion The performance results confirm many of the findings in the study by Palladino & Walker [1], allowing us to generalize the utility of spearcons as part of auditory menus from the desktop to the mobile phone. Conditions with visual cues led to faster responses, as compared to conditions with only auditory cues. This is understandable, given that the visual list allows for fast look-ahead. With the visuals on, the type of audio cues did not matter. That is, adding spearcons did not negatively impact performance, even though the spearcons add approximately half a second to each audio cue. In fact, even the silent (visuals only) condition was no different from the TTS and spearcons conditions, when the visual list was presented. It is likely the case that with the visuals on participants are moving through the list about as fast as possible by relying largely on the visual interface. Practice does not have much of an impact, supporting the interpretation that this is a highly practiced task. Adding the audio at least does not slow down performance when the visuals are on. When the visuals are off, overall performance was slower than when visuals were on (see the top lines in Figure 2). However, with a little practice, performance in the audio-only conditions improved, and closed in on the conditions with visuals on (see the narrowing of the gap between the top lines and the bottom three lines, in Figure 2, from Block 1 across to Block 10). This bodes well for the use of auditory menus, even for users with little or no experience with audio-only interfaces. Within the pair of audio-only conditions, it is interesting to note that TTS-alone initially led to faster performance than spearcons+TTS, but this difference went away by Block 10. In Block 1, it is likely the case that because the spearcons were prepended to the TTS for each item, participants took the time to listen to both cues
before making a selection, rather than focusing strictly on the spearcon to take advantage of its cuing capability. From the open-ended comments from participants, it appeared that they would hold down the arrow key to scroll quickly to the necessary item, then listen to the entire auditory cue and make the selection as needed. This showed that they made very little use of the auditory cue and relied mainly on their recollection of the alphabetical list organization. This would explain why a previous study by Palladino & Walker [10] showed a significant difference in the spearcons and TTS conditions while testing shallow two-dimensional menus. In that study, participants needed to listen to each menu item before proceeding to the next, since they could not predict what was coming. It was not beneficial for them to hold down the arrow key each time as they did in the present study with a deeper menu structure, as that would lead them to miss the necessary cue. However, as they became more familiar with both the list and the audio cues, participants here relied on the spearcons more. We know this because the overall performance times were comparable in the spearcons+TTS and TTS-only conditions. That is, if they listened to, say, 1000 ms of audio for each menu item, then in the spearcons case this means about 250 ms of spearcon, 250 ms of silence, and 500 ms of TTS. Without the spearcon, this means 1000 ms of TTS. Thus, with practice, listeners came to make item selection decisions without listening to very much of the TTS phrases. Indeed, spearcons contribute a lot to performance of the navigation task. The preference questionnaire demonstrated the positive reception of auditory cues in the absence of visual cues, as both spearcons and TTS were rated positively in the no-visual condition. This shows that, in a setting where users must rely on sounds to complete a task, they are inclined to feel good toward the sounds given, regardless of format. However, when they can rely on the visual sense to guide them, they prefer not to hear any audio and may even be annoyed by the sound. Given that there were no performance differences in the three visuals on conditions (silent, TTS only, and spearcons+TTS) it is instructive to consider the subjective ratings as well as the performance measures. Taken together, then, it is clear that users must be provided with the option to turn off audio when visuals are available, and turn it on only when it is perceived as desired and/or necessary. One additional caveat is that the audio quality needs to be optimized. Several participants commented on having trouble deciphering the audio cues for both spearcons and TTS. It is important not to discount the interaction modality as a whole, simply due to a less-than-optimal implementation. While we are confident that the sounds here were generally acceptable and intelligible, the TTS could certainly be produced with higher quality algorithms. This would also improve the quality of the spearcons, since they are derived from the TTS sound files. The general receptiveness of listeners to audio cues to aid navigation in a no-visual context supports further research into auditory menu design and deployment. In particular, it would be interesting to test how spearcons are perceived in a two-dimensional menu study, where they have shown improved performance over TTS alone. That is, what happens when both the preference and performance cues support spearcon use?
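The timing structure described above (roughly 250 ms of spearcon, 250 ms of silence, then the full TTS phrase) can be sketched as follows. This is only an illustrative construction, not the authors' implementation: real spearcons are normally produced with pitch-preserving time compression of the TTS audio, whereas the plain resampling used here just keeps the example short, and the waveform, sample rate and compression factor are assumptions.

```python
# Minimal sketch (not the authors' implementation) of building a
# spearcon+TTS cue from an existing mono TTS waveform.
import numpy as np

def make_spearcon(tts: np.ndarray, factor: float = 0.4) -> np.ndarray:
    """Time-compress a mono TTS waveform to `factor` of its length
    (simple resampling; real spearcons preserve pitch)."""
    n_out = max(1, int(len(tts) * factor))
    idx = np.linspace(0, len(tts) - 1, n_out)
    return np.interp(idx, np.arange(len(tts)), tts)

def spearcon_plus_tts(tts: np.ndarray, rate: int = 16000,
                      gap_ms: int = 250) -> np.ndarray:
    """Prepend the spearcon and a short silence to the full TTS phrase,
    mirroring the spearcon / gap / TTS structure discussed above."""
    gap = np.zeros(int(rate * gap_ms / 1000.0))
    return np.concatenate([make_spearcon(tts), gap, tts])

# Usage with a placeholder waveform standing in for synthesized speech.
tts_wave = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
cue = spearcon_plus_tts(tts_wave)
```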
Most interestingly, although preference ratings for TTS were consistently higher than spearcons, the TTS ratings were even higher in the presence of spearcons. That is, adding spearcons to TTS seemed to enhance the ratings of the TTS. It is possible that listeners considered the spearcons+TTS menus to be more sophisticated or perhaps interesting, and this was rated as preferable. This has great implications for
designing with spearcons. While not harming overall user performance, spearcons can provide another layer to the user experience of audio menu navigation, one that encourages positive receptiveness to a new system.
5 Future Work Future studies are focusing on the use of spearcons in audio-dependent contexts, where the participants cannot devote their full attention to the visual cue. In particular, we will be looking at task performance while a participant is simultaneously working on a visually and cognitively distracting task. This will be tested both in a desk setting and in a mobile one, where the user is walking on a designated route. We will be looking for effects on performance as well as subjective preference feedback from those involved. And, of course, we are extending these studies to participants with vision impairments, as they will be the primary users of (non-visual) advanced auditory menus, enhanced with whatever cues make the interfaces more effective and more pleasing to use.
References
1. Palladino, D., Walker, B.N.: Efficiency of spearcon-enhanced navigation of one-dimensional electronic menus. In: Proceedings of the International Conference on Auditory Display (ICAD 2008), Paris, France (2008)
2. Nees, M.A., Walker, B.N.: Auditory Interfaces and Sonification. In: Stephanidis, C. (ed.) The Universal Access Handbook, pp. TBD. Lawrence Erlbaum Associates, New York (in press)
3. Gaver, W.W.: Auditory Icons: Using Sound in Computer Interfaces. Human-Computer Interaction 2, 167–177 (1986)
4. Blattner, M.M., Sumikawa, D.A., Greenberg, R.M.: Earcons and Icons: Their Structure and Common Design Principles. Human-Computer Interaction 4, 11–44 (1989)
5. Walker, B.N., Nance, A., Lindsay, J.: Spearcons: Speech-based Earcons Improve Navigation Performance in Auditory Menus. In: Proceedings of the International Conference on Auditory Display (ICAD 2006), London, England, pp. 63–68 (2006)
6. Palladino, D., Walker, B.N.: Learning rates for auditory menus enhanced with spearcons versus earcons. In: Proceedings of the International Conference on Auditory Display (ICAD 2007), Montreal, Canada, pp. 274–279 (2007)
7. Asakawa, C., Takagi, H., Ino, S., Ifukube, T.: Maximum Listening Speeds for the Blind. In: Proceedings of the International Conference on Auditory Display (ICAD 2003), Boston, MA (2003)
8. Leplatre, G., Brewster, S.: Designing Non-Speech Sounds to Support Navigation in Mobile Phone Menus. In: Proceedings of the International Conference on Auditory Display (ICAD 2000), Atlanta, GA, pp. 190–199 (2000)
9. Hereford, J., Winn, W.: Non-Speech Sound in Human-Computer Interaction: A Review and Design Guidelines. Journal of Educational Computing Research 11, 211–233 (1994)
10. Palladino, D., Walker, B.N.: Navigation efficiency of two-dimensional auditory menus using spearcon enhancements. In: Proceedings of the Annual Meeting of the Human Factors and Ergonomics Society (HFES 2008), New York, NY, September 22-26 (2008)
Design and Evaluation of Innovative Chord Input for Mobile Phones Fong-Gong Wu, Chia-Wei Chang, and Chien-Hsu Chen Department of Industrial Design, National Cheng Kung University, 1 University Rd., Tainan, Taiwan 70101, Taiwan
[email protected],
[email protected],
[email protected] Abstract. Text messaging is one of the most popular functions of mobile phones, apart from voice calls. This study focuses on how chord input can be used on mobile phones and on operating phones with chord input. We propose two new mobile phone designs, the Tri-joint key and the Four-corner key, which combine chord input with natural finger positioning. Fourteen male and six female participants took part in this research and practiced for nine days with material consisting of numerals, English characters and English phrases. The results show that the participants' performance improved, both in the speed of completing tasks and in accuracy. There is no significant difference between the two new phone styles and the ordinary type on the user satisfaction chart, which suggests that users could accept new kinds of input devices. Keywords: mobile phones, keyboard, chord input, input device, innovation.
1 Introduction Mobile phones have been part of our lives for decades, yet the design of their buttons has barely changed. For English, the 26 letters and the numerals have to be fitted onto 12 buttons, so a single button serves for the input of three or four letters by being pressed multiple times. This input method is inefficient and places a burden on the user's muscles and bones. Different languages furthermore use different input methods on the same buttons. People want and need a bigger display screen; if the buttons occupy a large area of the mobile phone, the size of the screen is limited, while minimizing the buttons makes input inconvenient. The need to increase the functions of an electronic product while decreasing its size creates a pain point from the perspective of ergonomics [1]. For electronic products, the user interface can be divided into two types: GUI (Graphic User Interface) and SUI (Solid User Interface). SUI emphasizes the importance of controls, signs, buttons and knobs, and whether they fulfill ergonomic requirements for size, position, sight, hearing and touch. Despite the increased usage of mobile electronic products, we are still using the traditional input
keyboard. Many keyboards are defined this way; their size cannot be decreased, which makes them unsuitable for mobile products. Prolonged two-handed keyboard use has also been shown to cause injury to the fingers, wrist and forearm [2]. Baumann and Thomas also pointed out that most electronic products have to serve multiple functions; the idea of one button for one function is no longer viable [3]. Mapping every function to its own button leads to a large number of buttons occupying a large area of the interface, raising the production cost, increasing the burden on the user and increasing the rate of errors. The buttons of a mobile phone are an important factor influencing the control of the phone; if the phone rests well in the palm of the user, posture and stability are improved. Fitts' Law [4] is also an important reference when designing buttons, and has been used as a standard for measuring accuracy and speed [5]. Many scholars have used this law to investigate such effects [6][7][8][9][10][11]. Their results show that the distance between buttons influences the time it takes to key in, a valuable principle for designing or evaluating keyboards, especially when efficiency is the criterion under investigation. A chord keyboard integrates and reduces the buttons on the keyboard, thereby decreasing the movement of the user's hand; this lessens the burden, improves working posture and ultimately reduces the harm accumulated through operating the keyboard [2]. This study investigates the popular input devices of modern mobile phones and proposes possible innovative designs for text and number input. The design focuses on integrating chord input into mobile phone buttons and combining it with natural finger positioning. The main purposes of the study are as follows:
1. Through literature review and discussions with professionals, propose new input methods for chord input mobile phones.
2. Study current user behavior with mobile phones and use it as the basis for the principles of the new mobile phone.
3. Produce prototypes of the new mobile phone chord input design.
4. Examine the learning curve of users with the new chord input and compare input efficiency.
5. Propose suggestions for mobile phone chord input design.
2 Method and Evaluation This study consists mainly of three stages: observing current user behavior, designing a new conceptual product, and verifying it in a comparison experiment. The process is supported by statistical analysis, user questionnaires and a complete evaluation of the results. In order to control the experiment, the users are required to familiarize themselves with the experimental equipment within a fixed amount of time until they reach the same level of understanding of the equipment. To capture the users' psychological state, a continuous scale is used for the psychological rating task. Following the RPE (Rating of Perceived Exertion) scale proposed by Borg [12], satisfaction estimation can be used to evaluate the burden on the hand: the more users are burdened while operating, the more their discomfort increases.
2.1 Observation Before the formal experiment, observations were made of phones on the market and of the natural holding posture of the hand; these served as references for the new input design. The participants were 17 design-related students of undergraduate level or higher (11 male, 6 female), none of whom had hand impairments or serious injuries. Most of the participants were right-handed and had a long history of using mobile phones. The task was to input a paragraph consisting of 50 English words and 213 characters, in which every letter occurs more than three times. After the input task, each participant filled in a subjective measurement covering performance estimation, comfort level and psychological satisfaction. According to the observation results, most of the 17 users used both hands for support and their thumbs for input; three users used one hand for support and its thumb for input, and one user used one hand for support and the thumb of the other hand for input. This shows that most users prefer to use both hands for support and to input with their thumbs. Furthermore, when both hands are used for support, the four fingers behind the mobile phone fall into three categories: four fingers curled but not crossed, four crossed fingers, and the forefinger locking the upper section of the mobile phone in a "C" form. Looking at the operation of available mobile phones, most users use their thumb for input because it has the highest mobility, while the other four fingers support the phone. This imbalances the load across the fingers and burdens the thumb, causing muscle ache and sore muscles. Chord input, in contrast, can increase efficiency and distribute the load between the fingers. 2.2 Design Development The first phase of the experiment observed existing mobile phone operation behaviors and produced eight different prototypes for different holding postures (Table 1). Table 1. Eight different prototype models
The prototypes are designed for a right-handed person and divide the buttons into groups assigned to the five fingers. The muscular load is thus distributed over all the fingers according to their capabilities, and the burden on the ring finger and pinkie is lessened. This follows a previous study on the most efficient natural gesture and finger positioning [13]. The holding posture and operation of the eight prototypes were evaluated by a total of 10 participants (8 male, 2 female), all with an undergraduate level of education and more than three years of experience in using a mobile phone. The highest-scoring prototype was number 1; a finer model was made of this prototype and the placement of the buttons in the final design is based on the button positions of this prototype. 2.3 Evaluating Chord Input of Mobile Phones Following the design principles of existing buttons, the numeral buttons are grouped on the left-hand side of the mobile phone. These grouped buttons contain sub-buttons and are operated by the four fingers other than the thumb. The function button is placed on the right-hand side of the mobile phone and is operated by the thumb; it has four sub-buttons that are used during input. Pressing different combinations of buttons produces different numerals. Based on the above design, two prototypes were designed: Tri-joint and Four-corner (Table 2). Table 2. Two designs of chord input for a mobile phone
(Table 2 columns: type, three-view drawing and button placement; rows: Tri-Joint and Four-Corner.)
Chord input is not applied to the input of numerals. In the English character input mode, the function button operated by the thumb is combined with the number buttons to select the desired characters through chord input. The letters follow the layout of current mobile phones (Table 3).
Table 3. Corresponding chart of chord input (function sub-buttons A, B, C, D select among the letters on each number key)
2: a b c
3: d e f
4: g h i
5: j k l
6: m n o
7: p q r s
8: t u v
9: w x y z
1, 0: (no letters)
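To make the chording scheme concrete, the following sketch maps a (number key, sub-button) chord to a letter. It assumes, since the paper does not spell this out, that the thumb-operated sub-buttons A-D select the first to fourth letter printed on the pressed number key.

```python
# Illustrative sketch of the chord-to-character mapping in Table 3.
# Assumption (not stated explicitly in the paper): function sub-buttons
# A-D select the 1st-4th letter printed on the pressed number key.
LETTERS_ON_KEY = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
SUB_BUTTONS = "ABCD"

def chord_to_char(number_key: str, sub_button: str) -> str | None:
    """Return the letter selected by pressing `number_key` together with
    the thumb-operated function sub-button, or None for empty cells."""
    letters = LETTERS_ON_KEY.get(number_key, "")
    index = SUB_BUTTONS.index(sub_button)
    return letters[index] if index < len(letters) else None

assert chord_to_char("7", "D") == "s"   # only keys 7 and 9 use sub-button D
assert chord_to_char("2", "C") == "c"
assert chord_to_char("2", "D") is None
```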
2.4 Design Mockup Mockups of the Tri-joint and Four-corner prototypes were built (Table 4); all dimensions follow existing mobile phones, with length x width x height of 100 mm x 45 mm x 16 mm. Table 4. Information about the two prototypes (prototype view and operating posture for the Tri-joint and Four-corner designs)
2.5 Participants Twenty participants were recruited, aged between 20 and 30 years (average 25.05 years), 14 male and 6 female, all with at least an undergraduate degree. All participants had more than 3 years of experience in using mobile phones and sending English text messages, had no hand injury at the time of the experiment, and were willing to participate for the ten-day course of the experiment. 2.6 Learning Sessions The learning phase required the participants to input characters and numbers, including numerals, English words and short sentences. Numbers occurred at the same rate as English characters, and time and error rate were recorded. The learning period lasted nine days. All nine days included numeral input; English
characters were added starting from the fourth day, and English short sentences from the seventh day. 2.7 Final Tasks After all the participants had completed the learning stage, the formal experiment began with two tasks: numeral input and English character input. The questions were displayed on the computer screen in the form of slides, each slide containing 5 questions and each question 5 items; every task included 3 slides. For numerals, each question had five numbers, each slide had 5 questions, and the numbers on a slide were displayed in random order, so each task comprised 15 questions, i.e., 75 numbers. The English character task included 15 English words of five characters each, displayed on three slides, with every letter appearing more than twice. Every mobile phone was tested first on numerals and then on English characters. In the formal experiment, each participant also completed a usability questionnaire, rating the phones on efficiency, physical comfort and psychological comfort; the numeral and English character results are discussed separately.
3 Results The experiment measured the time taken and the number of errors, which were transformed into input speed (characters per minute, CPM) and accuracy (%) for further evaluation and analysis. 3.1 Results of the Numeral Part of the Final Tasks The fastest input rate was achieved with the ordinary mobile phone, 82.07 characters per minute, followed by the Tri-joint with 66.73 characters per minute and the Four-corner with 51.43 characters per minute. The ordinary mobile phone also led the two new phones in correctness: 99.33% compared to the Tri-joint's 98.07% and the Four-corner's 97.33%. Expressed as an achievement rate relative to the current mobile phone, the Tri-joint reaches 81.31% and the Four-corner 62.67% of the current phone's speed; in terms of correctness, the Tri-joint reaches 98.73% and the Four-corner 97.99% (Table 5). Table 5. Formal experiment numerals statistical numbers
Speed (CPM): Tri-joint, mean 66.73, SD 16.00, achievement rate 81.31%; Four-corner, mean 51.43, SD 9.26, achievement rate 62.67%; normal mobile phone, mean 82.07, SD 19.46.
Accuracy (%): Tri-joint, mean 98.07, SD 20.65, achievement rate 98.73%; Four-corner, mean 97.33, SD 23.78, achievement rate 97.99%; normal mobile phone, mean 99.33, SD 12.69.
Homogeneity tests were conducted before the ANOVA analysis; for the numeral data, the p values for input speed and accuracy were 0.080 and 0.059 respectively, both larger than 0.05. An ANOVA was therefore carried out, and the result shows significant differences between the three mobile phones in input speed [F(2, 57) = 19.539, p < 0.001].

Thus if the virtual image is dark itself and the background is bright, ri may no longer be noticeable. To cope with this problem, the operator is rewritten to:
ri = kv * vi − kb * bi    (1)
Here both parameters k are made independent and can also be greater than 1. Adequate values found were 1.15 for kv and 0.25 for kb, yet the optimal values vary with the virtual scene and the environment. The difficulty of this approach lies in finding good parameters k that match not only one single setup of virtual objects and surroundings but many. In the experiments of this work no parameters k were found that are robust for all setups.
Fig. 5. This image shows the weakness of Trivial Compensation: Parts of the face disappear due to the brighter background
The Simple Smooth Compensation arose from the experience with the former methods and addresses several demands on a compensation algorithm: it has to fulfill the four quality criteria and additionally must have smooth transitions between its adjustments to all parameters. Thus a new formula is created as a fraction depending on the background value combined with the maximum intensity value max. This results in a smooth transition factor s which serves as a multiplier for the virtual part:

si = max / (bi + max),  ri = si * vi    (2)

This compensation formula increases linearly with vi and, due to s, decreases smoothly with rising bi. This behavior achieves higher visual comfort for the user. Because of this smoothness the compensation of the background is weaker compared to, e.g., Subtraction Compensation. Virtual luminance and contrast are preserved with reasonable success, and the formula is about as simple as the former algorithms. One disadvantage of the equation is that the pixel value is always decreased. Another point is that s is independent of the virtual part, as vi is not included in the formula; thus there is no additional scaling with respect to vi. The Simple Smooth Compensation method already achieves reasonable results, but its main weakness is that it does not include the virtual part in s. This is changed in the advanced version, where the new formula is a fraction of sums of differently weighted virtual and background values, resulting in a more complex s:

si = (c * (max − vi) + bi) / ((max − vi) + d * bi),  ri = si * vi    (3)
c and d are weights used to adjust the slope of s. In this work c = 1.4 and d = 2.0 were found to be appropriate values. For c > 1 the factor s can exceed 1, which is an intended increase of the virtual part. By inverting the virtual value in s as (max − vi), the slope of the function decreases for higher values of vi, which is a desired effect. This compensation responds smoothly to changes in both v and b, which achieves good visual comfort for the user. The main advantage of this operator is its robustness
as it responds smoothly to the environment luminance and scales with the virtual pixel intensity vi. Nevertheless, to achieve the best results the parameters c and d can be adjusted to a specific setup. For c = 1.4 and d = 2.0, virtual luminance and contrast are preserved well except for extreme values of vi and bi. Despite its complexity the formula does not significantly slow down the system compared to the former algorithms. Weaknesses originating from the formula appear at the minimum and maximum values. For b ≈ 0 the factor s reaches its peak c, which leads to an increase of the virtual part for c > 1. Experiments showed that even with this raised virtual value the falsification goes unnoticed during use and virtual contrast is preserved; nevertheless, for high-luminance virtual images there is the possibility of contrast loss. For b → max, s approaches 1/d, which results in a reduction of r. To avoid exaggerated darkening of the image, d should be chosen between 1 and approximately 4.
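The three per-pixel operators can be summarized in a short sketch. The code below is an illustrative NumPy implementation of equations (1)-(3) for an 8-bit intensity range, using the parameter values reported above (kv = 1.15, kb = 0.25, c = 1.4, d = 2.0); the clipping to the displayable range and the small epsilon in the denominator are additions of this sketch, not taken from the text.

```python
# Sketch of the per-pixel operators from equations (1)-(3), written with
# NumPy arrays for an 8-bit channel (max = 255).
import numpy as np

MAX = 255.0

def trivial_weighted(v, b, k_v=1.15, k_b=0.25):
    """Equation (1): r_i = k_v * v_i - k_b * b_i (clipping added here)."""
    return np.clip(k_v * v - k_b * b, 0, MAX)

def simple_smooth(v, b):
    """Equation (2): s_i = max / (b_i + max), r_i = s_i * v_i."""
    s = MAX / (b + MAX)
    return np.clip(s * v, 0, MAX)

def advanced_smooth(v, b, c=1.4, d=2.0):
    """Equation (3): s_i = (c*(max - v_i) + b_i) / ((max - v_i) + d*b_i)."""
    s = (c * (MAX - v) + b) / ((MAX - v) + d * b + 1e-6)  # epsilon avoids 0/0
    return np.clip(s * v, 0, MAX)

# Example: compensate a virtual frame against the captured background image.
virtual = np.full((480, 640), 180.0)     # hypothetical virtual luminance
background = np.full((480, 640), 120.0)  # hypothetical background luminance
compensated = advanced_smooth(virtual, background)
```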
4 Results The colorimetric compensation system described in this paper adapts the virtual part of an image to the adapted eye and thus improves visual quality for the user. The delay arising from updating the ring buffer affects the time function only on slower machines, as it is linked to the frame rate. Future versions may couple it to the elapsed time instead, making it independent of the kernel size and allowing large kernels to be applied without slowdown. This acceleration method is a major key to real-time operation. The exaggeration of edges in the image coming from the square form of the box filter is barely visible to the user. Irawan's model of adaptation over time [3] could not be reconstructed completely, but a similar model of our own was derived from its form. One disadvantage of all time adaptation models is a small frame-rate dependent error. The photometric compensation operators introduced differ in quality of compensation for different virtual contrast ranges, levels of background luminance, and applicability. Trivial Compensation is a basic approach that can work for video projectors [1], but can cause severe loss of image information for see-through displays; especially for bright backgrounds, parts of the virtual content can vanish. Though it shares the form of its equation with Trivial Compensation, Subtraction Compensation gives the best results and satisfies all quality criteria well. On the other hand, it has to be configured for a specific setup: virtual objects and scene luminance are assumed not to change much, as otherwise quality decreases. Therefore Subtraction Compensation is the best choice for setups with known and limited content. Figure 6 shows the difference between a compensated and an uncompensated image. In the uncompensated image (on the left side) the colored pattern is visible all over the face region and the grid shines through the image. Especially the yellow field behind the mouth and the light red field behind the left cheek appear disturbing. Subtraction Compensation is applied to the same virtual image and shown on the right side. One major compensation effect can be seen in the forehead, eye and cheek regions, where the background grid is no longer apparent. The colored background fields are also barely visible anymore. These regions are all bright, and there the compensation operator functions best. In the mouth region virtual luminance is lower and the background becomes more visible. For the bright yellow field at the mouth, red
Fig. 6. Images captured by a webcam through the see-through display. Left is the uncompensated image, the right one is compensated with Subtraction Compensation
and green (which are composite colors of yellow) are subtracted partially. Together with the strong blue shift of the webcam this causes the field to appear purplish. Another effect is a gain in the virtual image's contrast; especially darker regions like the temples are enhanced. This effect originates in the fact that the compensation operator subtracts the background independently of the virtual pixel value, and thus darker pixels are reduced more. Both Smooth Compensation operators preserve virtual luminance and contrast and compensate the background, but not as strongly as a scene-optimized Subtraction Compensation operator. Their advantage is that they function over a wide range of luminance levels and need no recalibration. In direct comparison the normal version is faster, while the advanced version compensates better due to the inclusion of the virtual value in the equation.
5 Conclusion For colorimetric compensation, more measurements are needed to provide more sample values for interpolation. Using a luminance meter that can be configured for the scotopic, mesopic and photopic ranges would lead to a very exact pixel-to-luminance mapping and would allow independent computation of rod and cone adaptation. Another task is the development of the hardware system. Integration of the camera into the see-through display could be implemented using a beam splitter. This improvement would increase the weight of the system, and thus affect wearing comfort, but it would also solve the problem that the viewing distance has to be known. Adaptation over time is another task for future research. More psychophysical experiments, models and equations for the time course are needed to improve the colorimetric compensation system. A special focus has to be placed on rapid luminance changes and their influence on subsequent eye adaptation effects; this is necessary for quickly changing environments such as may occur in real-time applications with a moving user. An advance is also possible through new time course equations which reduce the per-frame error.
For photometric compensation there are two directions of research. The first is to develop further advanced per-pixel operators; both smooth operators are expected to be good starting points, as they are the most robust operators tested. A second direction is to evaluate the effects and speed of operators that include neighborhood information or even information about the whole image; a strong decrease in speed is expected. Acceleration can be achieved not only with parallel processing, but also on the graphics processor: its parallel shader units and native vector operations have great capability to accelerate the system, and this would also avoid reading and writing the GPU backbuffer. Distribution across several computers is possible as well, though the network speed is assumed to be the bottleneck. Communication has to be bidirectional. For colorimetric compensation, scene information and environment luminance have to be sent to the slave computers and the results read back. For photometric compensation, part of the background has to be sent to each slave, as they need the background information for the compensation operator, and the results read back. The transfer is expected to slow down the system and limit the frame rate.
References
1. Bimber, O., Emmerling, A., Klemmer, T.: Embedded Entertainment with Smart Projectors. Computer 38(1), 48–55 (2005)
2. Ferwerda, J., Pattanaik, S., Shirley, P., Greenberg, D.: A Model of Visual Adaptation for Realistic Image Synthesis. ACM Transactions on Graphics, 249–258 (1996)
3. Irawan, P., Ferwerda, J., Marschner, S.: Perceptually Based Tone Mapping of High Dynamic Range Image Streams. In: Eurographics Symposium on Rendering, The Eurographics Association (2005)
4. Jarosz, W.: Fast Image Convolutions. In: SIGGRAPH Workshop (2001)
5. Kiyokawa, K., Kurata, Y., Ohno, H.: An Optical See-Through Display for Mutual Occlusion of Real and Virtual Environments. In: Proceedings of the IEEE and ACM International Symposium on Augmented Reality 2000 (ISAR 2000) (2000)
6. Ohlenburg, J., Braun, A., Broll, W.: Morgan: A Framework for Realizing Interactive Realtime AR and VR Applications. In: Proceedings of the Workshop on Software Engineering and Architectures for Realtime Interactive Systems (SEARIS) at IEEE Virtual Reality 2008 (VR 2008) (2008)
7. Pattanaik, S., Ferwerda, J., Fairchild, M.: A Multiscale Model of Adaptation and Spatial Vision for Realistic Image Display. In: SIGGRAPH 1998 Conference Proceedings (1998)
8. Stamminger, A., Scheeland, M., Seidel, H.-P.: Tone Reproduction for Interactive Walkthroughs. In: The Eurographics Association 2000 (2000)
A Proposal of New Interface Based on Natural Phenomena and so on (1) Toshiki Yamaoka1, Ichiro Hirata1, Akio Fujiwara1, Sachie Yamamoto1, Daijirou Yamaguchi2, Mayuko Yoshida2, and Rie Tutui1 1
Wakayama University, Faculty of Systems Engineering Sakaedani 930,Wakayama City, Japan
[email protected],
[email protected], {s5048,
[email protected]},
[email protected],
[email protected],
[email protected] Abstract. This study aimed at creating new user-interfaces based on natural phenomena, objects, accustomed manners and so on. First, a literature and script survey was conducted to establish the framework of the study. Next, 33 places in Japan, such as famous gardens, castles and temples, were surveyed for clues for creating new user-interfaces, and new user-interfaces were created based on the large amount of collected data. Three selected user-interface designs were then visualized and evaluated from the viewpoints of usability, emotion and so on; these user-interfaces were rated highly. Keywords: user-interface, natural phenomena, manners, behavior, observation.
1 Introduction User-interface designers have occasionally created new interfaces by imagining or referring to human or animal actions and to natural elements such as trees, flowers and rivers, so such imagery and reference are very important for their design work. For example, the user-interface of the iPod was designed based on human actions used in daily life. However, user-interfaces based on human behavior and natural phenomena have not been examined systematically. In this study, new findings were collected at places throughout Japan, from Hokkaido (the northern island) to Okinawa (the southern island).
2 Literature and Script Survey A large body of literature was collected in order to survey Japanese manners, customs and design in Japanese architecture, gardens and so on in association with user-interfaces. The findings of the survey were classified into the 3 groups below.
1. Familiar accustomed manners: a. dwelling, b. utensil, c. behavior and manners
2. Natural phenomena
3. Movement and behavior of plants and animals
As a next step, many scripts were collected through discussion. A script here means knowledge about a series of behaviors in a given situation. The scripts were classified into 2 groups.
1. Daily scripts: cooking, bathing, taking a train, eating in a restaurant and so on
2. Non-daily scripts: going to an amusement park, staying at a hotel and so on
New user-interfaces can be created from the above findings. A good example is shown below. Example: There are two kinds of slope on a shrine hill in Japan. One is called the Man slope, which is steep, and the other is called the Woman slope, which is gentle. It does not take much time to go over the Man slope; on the other hand, the Woman slope takes much more time. This means the slopes were designed according to human ability. This fact suggests that different kinds of user-interface are needed according to the user's skill.
3 Field Survey and New User Interface Accustomed manners and objects were collected in a field survey. The 33 surveyed places are located as follows.
1. Hokkaido: Asahiyama zoo
2. Tokyo: famous parks, gardens, temples, shrines, department stores and so on
3. Kyoto: famous temples, shrines, castles, markets and so on
4. Kyuusyuu: famous gardens, shrines, museums, castles
5. Okinawa: former Nakijin castle, the Ocean Expo Park, markets and so on
3.1 Exploring User Interfaces Based on Accustomed Manners and Objects The collected data were classified into the 4 groups below.
1. Architecture: entrance, stairs, corridor, wall, window, door, roof and so on. Change, emphasis and flexibility of space were observed.
2. Utensil, object: clothing, sign, vehicle, signboard, display, product and so on. Elements that show the user's situation, such as distance to the goal and direction, were observed.
3. Behavior, manners: while eating, customs, hospitality, wording and so on. Methods or clues for providing information were observed.
4. Environment: garden, tree, sound, smell, water, road, space and so on. Elements that guide the user, such as navigation, emphasis and informing of the present location, were observed.
A lot of user-interfaces were created based on the above 4 groups. Six examples are shown below. The collected data were analyzed from the viewpoint of "scene", "components" and "user-interface".
1. Example 1 a. a scene: there are two observing routes in a hall of the Asahiyama zoo. b. components: two different routes c. a new user-interface: select the route according to the situation.
Fig. 1. Two routes displayed at the entrance
Fig. 2. A new user interface
2. Example 2 a. a scene: the site of former Nakijin castle b. components: a combination of 3 steps + landing + 5 steps + landing + 7 steps at the stairs c. a new user-interface: operate rhythmically without monotony
Fig. 3. The site of former Nakijin castle.
Fig. 4. A new user interface
3. Example 3 a. a scene: a window of a temple b. components: a part of the scenery is cut out of the whole scene c. a new user-interface: focus on the emphasized part
Fig. 5. A window of temple
Fig. 6. A new user-interface
4. Example 4 a. a scene: looking up at a leopard in a cage at the Asahiyama zoo b. components: looking up at animals c. a new user-interface: look at an object from various viewpoints
Fig. 7. Look up animals
Fig. 8. A new user-interface
5. Example 5 a. a scene: going through a shop curtain b. components: being able to see the inside c. a new user-interface: partly see the next page
Fig. 9. Shop curtain
Fig. 11. A picture scroll
Fig. 10. New user interface
Fig. 12. A new user interface
6. Example 6 a. a scene: looking at a picture scroll b. components: a scroll c. a new user-interface: display all information on one page
3.2 Exploring User Interfaces Based on Natural Phenomena A lot of data were collected in the field survey and classified into the 4 groups below.
1. Mountain: marvelous shapes of rock, the surface of a mountain, the summit of a mountain, gently sloping hills and so on. Elements that inform the user of the present location and the change of seasons were observed.
2. Sea: the vast expanse of the sea, the water's edge, the surface of the water, the sound of water, water vapor, steam, sandy beaches, the shore and so on. Movements such as waves were observed.
3. Sky: clouds, rainbows, blue sky, a sea of clouds, the morning sun, the sunset and so on. Elements relating to the lapse of time, direction and weather were observed.
4. The others: weathering, sunlight filtering down through trees, sunny places, smoke and so on.
A lot of user-interfaces were created based on the above 4 groups. Two examples are shown below. The collected data were analyzed from the viewpoint of "scene", "components" and "user-interface".
1. Example 1 a. a scene: the summit of a mountain and a mountain trail b. components: sky, trees and plants, open space c. a new user-interface: a flexible interface
Fig. 13. Mountain trail (labels in the figure: summit of mountain, mountain trail, obscure zone, bellows)
2. Example2 a. a scene: surface of water b. components: surface, light, plants c. a new user-interface: navigated user-interface
Fig. 14. New user interface
Fig. 15. Surface of water
Fig. 16. New user interface
3.3 Exploring User Interfaces Based on the Movement and Behavior of Plants and Animals A lot of data were collected in the field survey and classified into the 4 groups below.
1. Dynamic movement and behavior with locomotion. Elements with speed and movement were observed.
2. Dynamic movement and behavior without locomotion. Elements which signal activity and convey the understanding of a response, etc., were observed.
3. Static movement and behavior. Static situations were observed.
4. Plants. Elements which display a situation and its change were observed.
A lot of user-interfaces were created based on the above 4 groups. One example is shown below. The collected data were analyzed from the viewpoint of "scene", "components" and "user-interface".
1. Example a. a scene: fish gathering to get bait b. components: many fish gathering c. a new user-interface: gather related information.
Fig. 17. Fish gathered to get bait
Fig. 18. New user interface
4 Evaluation of New User Interface Three user-interfaces were selected for evaluation.
Fig. 19. Design 1 A new user-interface with scroll function
Fig. 20. Design2 A new user-interface with direction
Fig. 21. Design3 A new user-interface gathered related information
1. Method. The three user-interfaces were visualized as real product interfaces. Participants: 16 persons (2 male, 14 female), businessmen and housewives. Method: participants answered questions while looking at the user-interface designs. Questionnaire: items regarding layout, color, operation, ease of viewing, usability and the characteristics of a new user-interface.
2. Results and discussion. Design 1: generally, Design 1 obtained good results; especially the emotional aspect was rated highly. Design 2: while the interface aspects were rated highly, the navigation was not; although the navigation is a good idea, its design is not so good. Design 3: the interface of Design 3 was rated highly because of its new function and convenience.
5 Conclusion Screen-focused user-interfaces such as operation panels are becoming very important in all kinds of products. This study aimed at creating new user-interfaces based on natural phenomena, objects, accustomed manners and so on. As we have always lived in nature, it is very reasonable to apply the workings of nature to a user-interface, and by observing manners and objects influenced by the natural climate, we can construct a user-friendly interface more easily. As animals and plants calm us down, the factors behind this calm lead to emotional interfaces which feel familiar to us. A lot of new and original user-interfaces were created in this study based on natural phenomena, objects, accustomed manners and so on.
Managing Intelligent Services for People with Disabilities and Elderly People Julio Abascal1, Borja Bonail2, Luis Gardeazabal1, Alberto Lafuente1, and Zigor Salvador3 1
Laboratory of HCI for Special needs. University of the Basque Country/Euskal Herriko Unibertsitatea. Manuel Lardizabal 1, 20018 Donostia. Spain {julio.abascal,luis.gardeazabal,alberto.lafuente}@ehu.es 2 Fatronik. Paseo Mikeletegi, 7 - Parque Tecnológico. 20009 Donostia. Spain
[email protected] 3 CIC Tourgune. Paseo Mikeletegi, 56. 20009 Donostia. Spain
[email protected] Abstract. Ambient Supported Living systems for people with physical, sensory or cognitive restrictions have to guarantee that the environment is safe, fault tolerant and universally accessible. In addition it is necessary to overcome technological challenges, common to ubiquitous computing, such as the design of a middleware layer that ensures the interoperability of multiple wired and wireless networks and performs discovery actions. On top of that the system has to provide efficient support to the intelligent applications designed to assist people living there. In this paper we present the AmbienNet architecture designed to allow structured context information to be shared among the intelligent applications that support people with disabilities or elderly people living alone. Keywords: Supportive Ambient Intelligence, Users with disabilities, Elderly people, Ambient Assisted Living.
1 Introduction The Ambient Intelligence concept is being successfully applied to the development of supportive environments for people with disabilities and elderly people, under the framework of Ambient Supported Living. These environments have to meet several conditions in order to be truly helpful to and usable by people with physical, sensory or cognitive restrictions. For instance, accessibility barriers to interaction must be avoided. In addition, due to the fact that many users will depend on the system, it must be safe and fault tolerant. From the technological point of view, the system must be able to handle heterogeneous wired and wireless networks, embedded processors and sensors by means of a well-designed middleware layer. Furthermore, the system must allow the efficient processing of the intelligent applications that provide the actual support to users. Therefore special attention must be paid to the design of environments oriented to efficiently support intelligent applications, taking into account that each context-aware C. Stephanidis (Ed.): Universal Access in HCI, Part II, HCII 2009, LNCS 5615, pp. 623–630, 2009. © Springer-Verlag Berlin Heidelberg 2009
application has to process a large part of the huge amount of data collected by a great number of sensors. In addition, the information produced by each application (processing the information coming from sensors) in order to obtain knowledge to be used for its own purposes can be useful for other applications. This is the case, for instance, with adaptive user interfaces for intelligent environments [1] that have to create and maintain models of the user, the task and the environment, which can be shared with supportive intelligent applications that use models of the user, task and environment in order, for instance, to issue alarms or warnings when the user is trying to perform a task in an inappropriate place or at an inappropriate time [2]. Even if processors are currently able to process enormous quantities of data, ubiquitous computers can have a limited quantity of memory and processing capacity, which justifies the effort to minimize inefficient repetitions. In the following sections we will describe the technological AmbienNet approach to support applications and to allow structured information sharing among them.
2 The AmbienNet Environment AmbienNet is a research project that aims to study, design and test a supporting architecture for Ambient Intelligence environments devoted to people with disabilities and elderly people living at home or in institutions. The AmbienNet infrastructure is currently composed of two wireless sensor networks, a middleware layer and a number of smart wheelchairs provided with range sensors [3]. The following subsections briefly describe them. 2.1 Sensor Networks A key aspect of Ambient Intelligence is the deployment of diverse types of sensors throughout the environment. These are able to provide information about different physical parameters, such as temperature, light intensity, distance, presence/absence, acceleration, etc. The progress of sensor technology in recent years allows cheaper and more accurate sensors, enabling the inclusion of a large number of them. For this reason the use of sensor network technology has been proposed in order to handle them efficiently. AmbienNet uses two wireless networks based on Zigbee [4], which is currently the most widespread technology for sensor networks. Zigbee is chosen due to its advantages such as low power consumption (especially important for the mobile devices: wheelchairs, hand-held devices, etc.), the large number of devices allowed, low installation cost, etc. Furthermore, in public buildings with relatively frequent re-partitioning of space (administration or commercial buildings, hospitals, residences) devices can easily be moved and reconfigured. Indoor Location System. The AmbienNet indoor location system is devoted to locating people living in an institution. This system has two main functions: (a) to monitor the current position of fragile elderly people and (b) to locate care personnel in order to call the closest and most appropriate person when urgent help is required. People are located only if they provide their informed consent [5].
The system has two levels of accuracy. In the proximity mode, a rough location (typically the room) of the target is computed. This operating mode requires just one or two beacons in every room from which we want to provide location information. The second one is the multilateration mode where location of the target is computed accurately by measuring the ultrasound times of flight. This mode requires some additional beacons (5 to 8) to enable multilateration and deal with non-line-of-sight errors. The indoor location system is able to locate people wearing a discrete tag, with an accuracy of up to 10 centimetres [6]. Currently studies are being developed in order to include an accelerometer in the tag, which could prove valuable for fall-detection. Image sensor network. The current availability of inexpensive, low power hardware including CMOS cameras makes it possible to deploy a wireless network with nodes equipped with cameras acting as sensors for a variety of applications. AmbienNet includes a wireless sensor network that performs detection and tracking of a mobile object taking into account possible obstacles. A network of small video cameras with simple local image processing allows the detection of static and mobile items such as humans or wheelchairs [7,8]. The images are processed locally in order to reduce the data transmission just to valid data. Ceiling cameras are used to detect objects that can be distinguished from the floor, shadows, etc., using simple and well-known image processing algorithms. The embedded microcontroller of each camera computes a grid map of the room with cells corresponding to small areas of 10-20cm. These areas may have two possible values: “occupied” cells correspond to obstacles as detected by the image processing algorithms; “free” cells represent the space where mobile objects and people can move. 2.2 Middleware The middleware is a key part of the ubiquitous computing environment [9]. After studying the infrastructure requirements of general pervasive health care applications [10] three specific requirements for the applications of AmbienNet have been identified: (1) integration of heterogeneous devices, (2) provision of device discovery mechanisms, and (3) management of context information. Adopting a layered architecture the middleware layer is able to abstract device-dependent features and to provide a homogeneous interface to the upper layers of the system, minimizing software complexity. Context in pervasive applications is related to high-level, user-centred information. Since context information is to be extracted from raw sensor data, an efficient software infrastructure has to process the sensor data and extract higher-level information and make it available to the application level. In addition, a flexible and powerful model is required for context representation. In addition to the role of efficiently integrating and dynamically managing heterogeneous resources and services, a second role is assumed by this layer: providing developers with a framework to build intelligent applications. In this way, the AmbienNet middleware layer offers an object-oriented interface to the applications. Interface functions refer to three different kinds of functional modules, which provide a unified interface to context, control and interaction drivers respectively, as described in [11]. Furthermore, an additional interface is provided to the intelligent context services as explained in the next subsection.
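A minimal sketch of the grid-map computation described above is given below. It assumes a static reference image, a fixed grey-level difference threshold and square cells, none of which are specified in the text; it is meant only to illustrate how a ceiling-camera frame can be reduced to occupied/free cells.

```python
# Minimal sketch (assumptions: a static reference frame is available and a
# fixed difference threshold is adequate) of reducing a ceiling-camera frame
# to the "occupied"/"free" grid map described above.
import numpy as np

CELL_PX = 16         # pixels per grid cell, corresponding to roughly 10-20 cm
DIFF_THRESHOLD = 30  # grey-level difference regarded as an obstacle
MIN_FRACTION = 0.2   # fraction of changed pixels needed to mark a cell occupied

def occupancy_grid(frame: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Return a boolean grid: True = occupied cell, False = free cell."""
    diff = np.abs(frame.astype(np.int16) - reference.astype(np.int16))
    changed = diff > DIFF_THRESHOLD
    h, w = changed.shape
    rows, cols = h // CELL_PX, w // CELL_PX
    # Average the changed-pixel mask inside each cell and threshold it.
    cells = changed[:rows * CELL_PX, :cols * CELL_PX]
    cells = cells.reshape(rows, CELL_PX, cols, CELL_PX).mean(axis=(1, 3))
    return cells > MIN_FRACTION

# Usage with hypothetical 8-bit grey-scale frames from one camera node.
reference = np.zeros((240, 320), dtype=np.uint8)
frame = reference.copy()
frame[100:140, 150:200] = 200          # a wheelchair-sized bright object
grid = occupancy_grid(frame, reference)
```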
Operational interfaces to services and drivers follow the OSGi specification and system components at any layer are defined as OSGi bundles. In order to distribute the OSGi components, we adopt the R-OSGi approach proposed by Rellermeyer et al [12], which preserves the OSGi interface to applications, and hence portability. 2.3 Mobile Items: Smart Wheelchairs Smart wheelchairs are usually standard electrical wheelchairs provided with sonarand infrared-based distance sensors connected to the bus that links the driving device and the power stage, and controlled by an embedded processor [13]. Since they can act as autonomous vehicles, many navigation algorithms are taken from the mobile robotics field [14]. In AmbienNet the distance sensors are used for indoor navigation by the wheelchairs. In addition, this information is shared with the environment in order to provide information about temporary obstacles, wheelchairs traffic, etc. AmbienNet wheelchairs are provided with navigation algorithms and adaptive user interfaces. More information can be found in [15].
3 Sharing Contextual Information in AmbienNet

The AmbienNet approach for sharing knowledge between applications is to create a new level to support intelligent applications, providing them with pre-processed data. In this way, this level acts as a distributed intelligent system where each application processes the raw data coming from the sensors and provides structured, semantically annotated information. In addition, intelligent applications can be consumers of the shared information to complement and enhance their knowledge. In order to share this information, a new level called "Context-Awareness and Location Service" has been designed in the AmbienNet project (see Figure 1). This level receives structured information from all the applications that are able to produce it and serves information to the applications that require it, effectively acting as a context broker.
Fig. 1. AmbienNet infrastructure model
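The broker role just described can be made concrete with a minimal sketch. The following publish/subscribe service is only an illustration of the idea (the actual AmbienNet layer is OSGi-based and far richer); the topic names and data fields are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

class ContextBroker:
    """Toy context broker: producers publish structured context items under a
    topic, and consumer applications subscribe to the topics they need."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Any], None]]] = defaultdict(list)
        self._last_value: Dict[str, Any] = {}

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(callback)
        if topic in self._last_value:           # proactively deliver the current context
            callback(self._last_value[topic])

    def publish(self, topic: str, item: Any) -> None:
        self._last_value[topic] = item
        for callback in self._subscribers[topic]:
            callback(item)

# Hypothetical usage: the location system produces, a navigation aid consumes.
broker = ContextBroker()
broker.subscribe("location/room", lambda item: print("navigation aid received", item))
broker.publish("location/room", {"user": "u01", "room": "kitchen", "accuracy_cm": 10})
```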
Most intelligent applications use models to reason about the current situation and to be able to make decisions. Decision making can involve user modelling to specify the characteristics of the user. These features must be observable and relevant for the application. In the case of people with physical, sensory or cognitive restrictions, user modelling is very useful for tuning the interaction system to the user's functional abilities. In addition, intelligent environments may require other models to describe the tasks that the user can (or cannot) do and the environment where these tasks can (or cannot) be performed. With this information an intelligent application can, for instance, supervise the users' behaviour and help them to perform everyday tasks, plan their time, or prepare exceptional activities such as trips, medical visits, etc. Modelling activity is a time- and resource-consuming task that should not be repeated by each application needing it. The Intelligent Context Service layer processes the information coming from the sensors through the middleware in order to build and maintain the models, and serves this information to all the applications requiring it. These applications can also be producers of information that is shared with the other applications through this layer.
4 Intelligent Applications

The information collected by all the sensors is delivered by the middleware to the Intelligent Context Services level, where it is materialized into the context-awareness and location services. These services process the location and sensor information to extract higher-level data to be used as support for the context applications. Moreover, this structure allows the design of enhanced applications and services, such as recognition of patterns in user behaviour in order to identify or detect certain situations of risk or alarm. In order to demonstrate the validity of the Intelligent Context Services level designed for AmbienNet, three intelligent applications currently share data: Intelligent Adaptive Interfaces, Everyday Tasks Support, and Navigation Support. All of them are both consumers and producers of information.

4.1 Intelligent Adaptive Interface

Ambient Intelligence systems provide support to the user in a proactive way, that is, without waiting for users' explicit requests. Nevertheless, in some cases it is necessary to establish direct communication between the user and the environment. For instance, they explicitly interact when the user issues commands and when the system requests extra information (command confirmation, supplementary data) or produces messages (warnings, reminders) [2]. Since each user may have different capabilities and interests, it is necessary to adapt the interaction to the specific user at hand. In addition, the modality that best fits the type of communication in a specific situation has to be chosen. The models of the user, the task and the context (handled and maintained by the Intelligent Context Support layer) are used to adapt the interaction.
4.2 Everyday Task Support

This application supervises the activity of the users and provides them with recommendations for performing some tasks, and warnings when potentially hazardous situations are detected. The information about the users, their activities and the adequate time and place for them is stored in the previously mentioned models run by the Intelligent Context Support layer. More information about this application can be found in [1].

4.3 Navigation Support

The navigation support is devoted to people with physical restrictions using smart wheelchairs. Even if this type of wheelchair is usually provided with range sensors that allow obstacle avoidance (short-term planning), they may experience difficulties with (long-term) trajectory planning. For this task, the information collected by the intelligent environment can be used. By means of the location sensor network deployed in AmbienNet, the environment is able to locate the user with two precision levels: the room he/she is in (rough location) and his/her absolute position with a precision of centimetres (detailed location). In addition, the video sensor network can place the wheelchair on a grid map (taking into account its position and orientation) together with the detected mobile and static obstacles [16]. This information is served to the wheelchair processor, which uses it to plan its trajectory [17]. Therefore the smart wheelchairs benefit from contextual information coming from the indoor location system and from the image sensor network, which provides the wheelchairs with accurate information about places, distances, and mobile and sporadic obstacles. This information can be used for global navigation; in this way crowded corridors, closed doors, dangerous zones, etc., can be avoided. In addition, the wheelchair's position and approximate speed, as well as obstacles, can be obtained from the ceiling cameras in order to provide a "local" navigation aid (avoiding obstacles and allowing specific goals to be reached). During the autonomous period between two successive updates from the environment, the wheelchair uses the well-known wavefront path planning algorithm [38] to set checkpoints. Due to inherent odometric errors, an estimate of the real position has to be maintained during the whole autonomous period until the next position update. For that, the Adaptive Monte Carlo Localization algorithm [39] was selected because of its good performance. This algorithm estimates the real position by comparing the given map with the one obtained through sensor readings. Additionally, the enhanced Vector Field Histogram algorithm [40] is used for obstacle avoidance. Furthermore, each wheelchair produces spatial information that can be useful for other applications. Therefore, the wheelchair controller has three different inputs: commands issued by the user through the user interface; distance information collected by range sensors installed on the wheelchair itself; and context information coming from the environment. In order to combine them properly, a shared-control paradigm is used.
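A wavefront planner of the kind mentioned above can be sketched in a few lines. The following is only an illustrative, self-contained implementation (it is not the AmbienNet code): a breadth-first "wave" is propagated from the goal cell over the free cells of the occupancy grid supplied by the environment, and checkpoints are obtained by descending the resulting cost field from the wheelchair's cell.

```python
from collections import deque

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def wavefront(grid, goal):
    """grid: 2-D list, 0 = free cell, 1 = occupied cell; goal: (row, col).
    Returns a map with the number of steps to the goal (None = unreachable)."""
    rows, cols = len(grid), len(grid[0])
    cost = [[None] * cols for _ in range(rows)]
    cost[goal[0]][goal[1]] = 0
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 and cost[nr][nc] is None:
                cost[nr][nc] = cost[r][c] + 1
                queue.append((nr, nc))
    return cost

def checkpoints(grid, start, goal):
    """Greedy descent of the wavefront cost field; the visited cells act as checkpoints."""
    cost = wavefront(grid, goal)
    if cost[start[0]][start[1]] is None:
        return []                              # goal not reachable from the start cell
    route, current = [start], start
    while current != goal:
        r, c = current
        candidates = [(r + dr, c + dc) for dr, dc in NEIGHBOURS
                      if 0 <= r + dr < len(grid) and 0 <= c + dc < len(grid[0])
                      and cost[r + dr][c + dc] is not None]
        current = min(candidates, key=lambda cell: cost[cell[0]][cell[1]])
        route.append(current)
    return route

# Tiny grid: 0 = free, 1 = obstacle as reported by the ceiling cameras.
grid = [[0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0]]
print(checkpoints(grid, start=(2, 0), goal=(0, 0)))
```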
5 Conclusions

In addition to guaranteeing universal accessibility, safety and fault tolerance, the design of supportive Ambient Intelligence environments has to ensure efficient processing of concurrent intelligent applications. The AmbienNet project proposed and tested an OSGi-based middleware to provide efficient interoperation among heterogeneous hardware, networks and intelligent applications. In addition, the new "Context-Awareness and Location Service" layer eases the creation of new supportive applications, providing them with pre-processed information in a proactive manner.

Acknowledgment. The AmbienNet project is developed by the Laboratory of HCI for Special Needs of the University of the Basque Country, in collaboration with the Robotics and Computer Technology for Rehabilitation Laboratory of the University of Seville and the Technologies for Disability Group of the University of Zaragoza. This work has been partially funded by the Spanish Ministry of Education and Science as a part of the AmbienNet project TIN2006-15617-C03, and by the Basque Government under grant No. S-PE07IK03.
References 1. Abascal, J., Fernández de Castro, I., Lafuente, A., Cia, J.M.: Adaptive Interfaces for Supportive Ambient Intelligence Environments. In: Miesenberger, K., Klaus, J., Zagler, W.L., Karshmer, A.I. (eds.) ICCHP 2008. LNCS, vol. 5105, pp. 30–37. Springer, Heidelberg (2008) 2. Abascal, J.: Users with Disabilities: Maximun Control with Minimun Effort. In: Perales, F.J., Fisher, R.B. (eds.) AMDO 2008. LNCS, vol. 5098, pp. 449–456. Springer, Heidelberg (2008) 3. Salvador, Z., Jimeno, R., Lafuente, A., Larrea, M., Abascal, J.: Architectures for ubiquitous environments. In: IEEE Int. Conf. on Wireless and Mobile Computing, Networking and Communications. IEEE Press, New York (2005) 4. ZigBee, http://www.zigbee.org/ 5. Casas, R., Marco, A., Falcó, J.L., Artigas, J.I., Abascal, J.: Ethically Aware Design of a Location System for People with Dementia. In: Miesenberger, K., Klaus, J., Zagler, W.L., Karshmer, A.I. (eds.) ICCHP 2006. LNCS, vol. 4061, pp. 777–784. Springer, Heidelberg (2006) 6. Marco, A., Casas, R., Falco, J., Gracia, H., Artigas, J.I., Roy, A.: Location-based services for elderly and disabled people. Computer Communications 31(6), 1055–1066 (2008) 7. Rowe, A., Goel, D., Rajkumar, R.: FireFly Mosaic: A Vision-Enabled Wireless Sensor Networking System. In: RT Systems Symp. 2007, pp. 459–468 (2007) 8. Fernández, I., Mazo, M., Lázaro, J.L., Pizarro, D., Santiso, E., Martín, P., Losada, C.: Guidance of a mobile robot using an array of static cameras located in the environment. Autonomous Robots 23(4), 305–324 (2007) 9. da Costa, C.A., Corrêa Yamin, A., Resin Geyer, C.F.: Toward a General Software Infrastructure for Ubiquitous Computing. IEEE Pervasive Computing 7(1), 64–73 (2008) 10. Salvador, Z., Larrea, M., Lafuente, A.: Infrastructural Software Requirements of Pervasive Health Care. In: Procs. IADIS Int. Conf. on Applied Computing, Salamanca (Spain), pp. 557–562 (2007)
11. Salvador, Z., Larrea, M., Lafuente, A.: Smart Environment Application Architecture. In: Procs. of the 2nd Int. Conf. on Pervasive Computing Technologies for Healthcare, PervasiveHealth 2008, Tampere (Finland), pp. 308–309 (2008) 12. Rellermeyer, J.S., Alonso, G., Roscoe, T.: R-OSGi: Distributed Applications Through Software Modularization. In: Proceedings of the ACM/IFIP/USENIX 8th International Middleware Conference (2007) 13. Ding, D., Cooper, R.A.: Electric-Powered Wheelchairs: A review of current technology and insight into future directions. IEEE Control Systems Magazine, 22–34 (April 2005) 14. Dutta, T., Fernie, G.R.: Utilization of Ultrasound Sensors for Anti-Collision Systems of Powered Wheelchairs. IEEE Transactions on Neural Systems and Rehabilitation Engineering 13(1), 24–32 (2005) 15. Abascal, J., Bonail, B., Cagigas, D., Garay, N., Gardeazabal, L.: Trends in Adaptive Interface Design for Smart Wheelchairs. In: Lumsden, J. (ed.) Handbook of Research on User Interface Design and Evaluation for Mobile Technology, pp. 711–729. Idea Group Reference, Pennsylvania (2008) 16. Takeuchi, E., Tsubouchi, T., Yuta, S.: Integration and Synchronization of External Sensor Data for a Mobile Robot. In: SICE Annual Conference, Fukui, Japan, pp. 332–337 (2003) 17. Jennings, C., Murray, D.: Stereo vision based mapping and navigation for mobile robots. In: IEEE Int. Conf. on Robotics and Automation, New, Mexico, pp. 1694–1699 (1998)
A Parameter-Based Model for Generating Culturally Adaptive Nonverbal Behaviors in Embodied Conversational Agents

Afia Akhter Lipi1, Yukiko Nakano2, and Matthias Rehm3

1 Dept. of Computer and Information Sciences, Tokyo University of Agriculture and Technology, Japan
[email protected]
2 Dept. of Computer and Information Science, Seikei University, Japan
[email protected]
3 Institute of Computer Science, Augsburg University, Germany
[email protected]

Abstract. The goal of this paper is to integrate culture as a computational term in embodied conversational agents by employing an empirical data-driven approach as well as a theoretical model-driven approach. We propose a parameter-based model that predicts nonverbal expressions appropriate for specific cultures. First, we introduce the Hofstede theory to describe socio-cultural characteristics of each country. Then, based on previous studies of cultural differences in nonverbal behaviors, we propose expressive parameters to characterize nonverbal behaviors. Finally, by integrating socio-cultural characteristics and nonverbal expressive characteristics, we establish a Bayesian network model that predicts posture expressiveness from a country name, and vice versa.

Keywords: conversational agents, enculturate, nonverbal behaviors, Bayesian network.
1 Introduction

When we meet someone, one of the first things we do is to classify the person as "in-group" or "out-group". This social categorization is often based on ethnicity [4]. When someone is identified as part of the in-group as opposed to the out-group, she or he is perceived as more trustworthy. In the same way, does the ethnicity of Embodied Conversational Agents (ECAs) also matter? Findings in previous studies support the claim that the ethnicity of embodied conversational agents affects users' attitudes and behaviors. Nass et al. [8] found that users showed more trust and were more willing to take the agent's suggestion if the agent was of the same ethnic group or from the same cultural background. Aiming at generating culture-specific behaviors, specifically postures, in ECAs, this study focuses on modeling cultural differences. Our method enables the user to experience exchanges of culture-specific posture expressions in human-agent interaction. However, defining culture is not an easy task and there are various definitions of this
notion around, and descriptive and explanatory theories are not very useful for computational purposes. Thus, to generate culturally appropriate nonverbal behaviors in ECAs, we propose a parameterized socio-cultural model which characterizes the group or the society using a set of numerical values, and selects agents' nonverbal expressions according to the parameter set using probabilistic reasoning facilitated by a Bayesian network. As a data-driven approach, we have already collected a comparative multimodal corpus for two countries, an Asian country (Japan) and a European country (Germany), and extracted culture-specific posture shapes from the corpus [1]. In this paper, based on the results of our empirical study, we extend our research with a model-driven approach by introducing the Hofstede model [7] as a theoretical basis for describing socio-cultural characteristics. Hofstede's theory is appealing for establishing a computational model because it defines each culture using five dimensions, each of which is quantitative in nature. Integrating the Hofstede theory of culture [7] and the empirical data from our corpus [1], in this paper we implement a parameterized model which generates culture-specific nonverbal expressions. Our final goal is not restricted to building a model for embodied conversational agents, but is to propose a general model which estimates nonverbal parameters for various cultures. In the following sections, we first discuss related work in Section 2 and, in Section 3, explain the approach of this study together with a brief description of the Hofstede model. Section 4 reports the empirical data in our corpus, and Section 5 proposes a Bayesian network which combines the Hofstede theory and the empirical data. Section 6 describes a nonverbal decision module, and Section 7 gives conclusions and future work.
2 Related Work

As research on the ethnicity of ECAs, Nass et al. [8] examined the question: "Does the ethnicity of a computer agent affect users' attitudes and behaviors?" They carried out a study of Korean subjects interacting with an American agent or with a Korean agent, and found that ethnic similarity had a significant effect on users' attitudes and behaviors. When the ethnicity of the subject was the same as that of the agent, the subject took the agent to be more trustworthy and convincing. Nass et al. [8] claimed that users showed more trust and were more willing to take the agent's suggestion or to give their credit card number. These results suggest that culture-adapted agents are more positively accepted, and will provide more successful outcomes in e-commerce. Iacobelli and Cassell [11] assessed ethnicity on the basis of language and non-verbal features, not on physical appearance such as skin color, hair, style or clothing. They found that children interacted longer with a virtual peer whose verbal and nonverbal behaviors matched their own than with an ethnically mismatched virtual peer. Isbister [9] pointed out the importance of non-verbal communicative behaviors, which are largely culture-specific. She reviewed a number of features of nonverbal communication such as eye gaze and gestures. Arabs treat sustained eye contact as a sign of engagement and sincerity, whereas Japanese interpret sparse use of direct eye contact as a sign of politeness. Another example is a simple head nod, which is interpreted as a sign of agreement in Germany but indicates only attention in Japan. The
frequency, the manner, and the number of gestures are also culturally dependent. Mediterranean people gesture far more than North Americans do. Italians tend to use big gestures and gesture more frequently than the English or Japanese. Southern Europeans, Arabs, and Latin Americans use animated hand gestures, whereas Asians and Northern Europeans use quieter gestures [5]. As for studies in learning systems, Johnson et al. [10] described a language tutoring system that also takes cultural differences in gesture usage into account. Maniar and Bennett [12] proposed a mobile learning game to overcome culture shock by making the user aware of cultural differences. The eCIRCUS project (Education through characters with emotional intelligence and role playing capabilities that understand social interaction) [13] aims at developing models and innovative technologies that support social and emotional learning through role-plays. For example, children become aware of socially sensitive issues such as bullying through virtual role-plays with synthetic characters.
3 Describing Socio-cultural Characteristics

As a theoretical approach, we employ Hofstede's theory to describe socio-cultural characteristics. Then, from an empirical approach, we propose several nonverbal expressive parameters to characterize posture expressiveness. These two layers will be integrated into a Bayesian network model to predict either behavioral characteristics or a culture. We start by introducing Hofstede's theory [7]. The theory defines culture as a dimensional concept and consists of the following five dimensions, which are based on a broad empirical survey.

1. Hierarchy/Power Distance Index: This dimension describes the extent to which an unequal distribution of power is accepted by the less powerful members. More coercive and referent power is used in high power distance societies, and more reward, legitimate, and expert power in low power distance societies.

2. Identity: This is the degree to which individuals are integrated into a group. On the individualist side, ties between individuals are loose, and everybody is expected to take care of herself/himself. On the collectivist side, people are integrated into strong and cohesive groups.

3. Gender: The gender dimension describes the distribution of roles between genders. Feminine cultures place more value on relationships and quality of life, whereas in masculine cultures competition is rather accepted and status symbols are of importance.

4. Truth/Uncertainty: The tolerance for uncertainty and ambiguity is defined in this dimension. It indicates to what extent the members of a culture feel either uncomfortable or comfortable in unstructured situations which are novel, unknown, surprising, or different from usual.

5. Virtue/Orientation: This dimension distinguishes long- and short-term orientation. Values associated with long-term orientation are thrift and perseverance, whereas values associated with short-term orientation are respect for tradition, fulfilling social obligations, and saving one's face.
Since cultural characteristics in Hofstede's theory are expressed numerically, a set of parameter values indicates the cultural profile. Table 1 gives Hofstede's ratings for three countries [2]. For example, on the Identity dimension, Germany (67) is a more individualist culture than Japan (46), and the US (91) is the most individualist of the three.

Table 1. Hofstede ratings for three countries

Country    Hierarchy   Identity   Gender   Uncertainty   Orientation
Germany    35          67         66       65            31
Japan      54          46         95       92            80
US         40          91         62       46            29
4 Characterizing Nonverbal Behaviors

4.1 Defining Posture Expressive Parameters

To define parameters that characterize posture expressivity, we reviewed previous studies. To describe cultural differences in gestures, Efron [14] proposed parameters such as spatio-temporal aspects, interlocutional aspects, and co-verbal aspects. Using a factor analysis, Gallaher [15] revealed four dimensions: expressiveness, expansiveness, coordination, and animation. Based on these previous studies, Hartmann et al. [16] defined gestural expressivity using six parameters: repetition, activation, spatial extent, speed, strength, and fluidity. Based on our literature study, we came up with five parameters which define the characteristics of posture: spatial extent, rigidness, mirroring, frequency, and duration. In the next section, the details of deriving values for each behavioral expressive parameter are explained.

4.2 Assigning Values

Since we found that the cultural difference in posture shifts is very clear in arm postures [1], we focus on predicting arm postures. Among the five expressive parameters proposed in Section 4.1, we obtained the values of frequency and duration from our previous empirical study [1]. To find the values for spatial extent and rigidness, we conducted an experiment, and to derive the numerical value for mirroring, we analyzed our video data.

Frequency and duration. Frequency and duration can be assigned by referring to the results of our previous empirical study [1]. The average frequency of arm posture shifts is 40.38 per conversation in the German data and 22.8 in the Japanese data. On the other hand, the average duration of each posture is 7.79 sec in the German data and 14.8 sec in the Japanese data. Thus, Japanese people tend to keep one posture longer than German people.
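As a purely illustrative data structure (not taken from the paper), the five expressive parameters and the empirical values quoted above can be collected per culture as follows; the fields left as None are filled by the experiments described in the remainder of this section.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PostureExpressivity:
    """The five posture expressive parameters proposed in Sect. 4.1."""
    spatial_extent: Optional[float] = None   # rated impression (measured below)
    rigidness: Optional[float] = None        # rated impression (measured below)
    mirroring: Optional[float] = None        # mirroring events per conversation
    frequency: Optional[float] = None        # arm posture shifts per conversation
    duration: Optional[float] = None         # average duration of a posture in seconds

# Values reported above for frequency and duration.
GERMANY = PostureExpressivity(frequency=40.38, duration=7.79)
JAPAN = PostureExpressivity(frequency=22.8, duration=14.8)
```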
Measuring impressions for spatial extent and rigidness. By spatial extent, we mean the amount of physical space required for a certain posture. As the term rigidness seemed harder to grasp, we used the opposite word, relaxed, instead of rigidness to make the term simpler for the subjects.

Study Design: We extracted 15 video clips of postures from the Japanese video data and 15 posture video clips from the German data, and asked 7 Japanese subjects and 10 German subjects to rate each video clip. The rating was made using a questionnaire which asked the subjects to rate their impressions of the shape of the arms, the lower body, and the whole body on 7-point scales, where 1 is the lowest value and 7 the highest. For each video clip, the subjects rated their impression on two dimensions: spatial extent and relaxedness. Before starting the experiment, each subject was handed a form which explained how to rate the video clips.

Result: The rating results are shown below. Table 2 shows that Germans adopt more relaxed postures than Japanese, and Japanese make smaller postures than Germans.

Table 2. Non-verbal expressive parameters from the experimental data
Country           Japan    Germany
Spatial Extent    5.25     7.33
Rigidness         8.58     7.62
Analyzing mirroring. Mirroring refers to an interpersonal phenomenon in which people unknowingly adjust the timing and content of their behavioral movements such that they mirror the behavioral cues exhibited by their social interaction partner. Mirroring has positive effects on interaction and enhances the relationship between the conversants.

Study Design: We analyzed videos of 10 Japanese pairs and 7 German pairs (both speaker and listener) acting out a first-time-meeting scenario [1], in which two people meet for the first time and have a conversation to get to know each other. The dyadic conversation lasted 5 minutes. After annotating the posture shifts of both speaker and listener using Bull's Coding Scheme [3], we counted the number of postures common to both parties, speaker and listener, using the two conditions below (a possible formalisation is sketched after the list):

(1) If person A shifts to a new posture while speaking, and within five seconds person B also changes to the same posture as person A, and vice versa.

(2) If person A shifts to a new posture, and soon afterwards person B adopts the same posture, which overlaps with person A's posture.
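The following sketch is one possible reading of the two rules above, not the annotation tool actually used: each posture shift is represented as a (start, end, posture) triple per person, and an event is counted when the partner adopts the same posture within five seconds of the shift or while the original posture is still held.

```python
def count_mirroring(shifts_a, shifts_b, window=5.0):
    """shifts_a, shifts_b: lists of (start, end, posture) tuples for persons A and B.
    Counts mirroring events per rules (1) and (2): B adopts the same posture as A
    within `window` seconds of A's shift, or while A still holds that posture
    (and symmetrically for A mirroring B)."""
    def one_direction(leader, follower):
        count = 0
        for start, end, posture in leader:
            for f_start, f_end, f_posture in follower:
                same = f_posture == posture
                within_window = start <= f_start <= start + window   # rule (1)
                overlapping = start <= f_start <= end                 # rule (2)
                if same and (within_window or overlapping):
                    count += 1
                    break          # count each leader posture at most once
        return count
    return one_direction(shifts_a, shifts_b) + one_direction(shifts_b, shifts_a)

# Hypothetical annotation fragment: A folds arms, B follows 3 seconds later.
a = [(10.0, 25.0, "fold arms")]
b = [(13.0, 20.0, "fold arms")]
print(count_mirroring(a, b))   # -> 1
```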
Result: The average number of mirroring events for the Japanese pairs is 6.2 per conversation, and for the German pairs it is 0.57. This result suggests that Japanese people are more likely to synchronize with their conversation partner than German people, indicating a preference for being part of a group and a more collective nature.
5 Combining Theoretical and Empirical Approaches to Develop a Parameter-Based Model

Based on Hofstede's theory of culture, we propose a model where culture is connected to the Hofstede dimensions, which are in turn connected to nonverbal expressive parameters for postures.

5.1 Reasoning Using a Bayesian Network

To build this parameter-based model, we employ the Bayesian network technique. Figure 1 shows our Bayesian network, which models the relationship between socio-cultural aspects and behavioral expressiveness. Bayesian networks are acyclic directed graphs in which nodes represent random variables and arcs represent direct probabilistic dependences among them. Bayesian networks [2] handle uncertainty at every state. This is very important for our purpose, as the linkage between culture and nonverbal behavior is a many-to-many mapping. In addition, since the network can be used in both directions, it can infer the user's cultural background as well as simulate the system's (agent's) culture-specific behaviors.

5.2 Parameter-Based Socio-cultural Model

In order to build a Bayesian network for predicting socio-cultural aspects in posture expressiveness, the GeNIe [6] modelling environment was used.
Fig. 1. Bayesian network model predicting Japanese posture expressiveness parameters
First Layer: The first part of the network is quite simple. The entry node of the Bayesian network is a culture node which is connected to Hofstede's dimensions. Currently we have inserted two countries, Germany and Japan.

Middle Layer: The middle layer defines Hofstede's five dimensions. We have integrated all five dimensions: hierarchy, identity, gender, uncertainty, and orientation. The Hofstede ratings for each country shown in Table 1 are used as the probabilities in each node.

Lowest Layer: The lowest layer consists of a number of different behavioral parameters that depend on a culture's position on Hofstede's dimensions. We draw a connection between the cultural dimensions and the nonverbal behaviors. The lowest level consists of five nodes whose values were specified in Section 4.2.

a) Spatial Extent: Spatial extent describes the amount of physical space required for a certain posture. From our experimental data, we found that Germans make bigger postures than Japanese. When we compared the postures of male and female subjects, we found that Japanese female subjects make smaller postures than male subjects, and the difference is bigger than in the German data. So we can say that Japanese society is more masculine than German society. Moreover, hierarchy affects spatial extent: in highly hierarchical societies, people seem to make small postures [5].

b) Rigidness: How stiff the posture is. Our experimental data revealed that Japanese people seem to adopt more rigid postures than Germans, and Germans seem to be more relaxed than Japanese. In highly hierarchical societies, people are stiffer than in less hierarchical ones. Thus, we assume a linkage between hierarchy and rigidness.

c) Mirroring: Since mirroring is copying the conversation partner's postures during a conversation, we assume that the frequency of mirroring correlates with a collective nature. In our corpus study in Section 4.2, Japanese people indeed mirrored more frequently than German people.

d) Frequency: German people change their posture more frequently than Japanese. According to Hofstede's theory, Japanese culture has a long-term orientation; therefore we set links from truth and virtue to frequency.

e) Duration: Japanese people stay in a single posture for a longer period of time than German people. Thus, we assume that both truth and virtue affect duration.

For each node in the Bayesian network, a probability is assigned based on the data reported in Section 4. For example, since the posture shift frequency in the German data (40.38) is 1.77 times that in the Japanese data (22.8), we assigned probability values of 0.66 and 0.34 to the two countries respectively.

Output: When a country is chosen at the top level as evidence, the behavior expressive parameters are estimated. For instance, as shown in Figure 2, when Japan is chosen as evidence, the results of the estimation are: spatial extent is small (51%), rigidness is extreme (51%), mirroring is most (54%), frequency is low (59%), and duration is long (56%). In the same way, when Germany is given as evidence, the estimation results are: spatial extent is big (51%), rigidness is least (53%), mirroring is least (66%), frequency is high (52%), and duration is short (52%).
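The authors built their network in GeNIe and query it through Netica (Sect. 6.1). The plain-Python sketch below only illustrates the propagation idea of the three layers: a culture choice is turned into a probability for a Hofstede dimension, which in turn conditions a behaviour parameter. Rescaling the Hofstede score to a probability and the conditional probability values are illustrative assumptions of this sketch, not the paper's numbers.

```python
# Hofstede ratings from Table 1, rescaled to P(dimension = "high") per country.
HOFSTEDE = {
    "Germany": {"hierarchy": 35, "uncertainty": 65},
    "Japan":   {"hierarchy": 54, "uncertainty": 92},
}
P_DIM_HIGH = {country: {dim: score / 100.0 for dim, score in dims.items()}
              for country, dims in HOFSTEDE.items()}

# Placeholder CPT: P(spatial extent = "small" | hierarchy), reflecting the assumption
# stated in a) above that highly hierarchical societies tend towards smaller postures.
P_SMALL_GIVEN_HIER = {"high": 0.7, "low": 0.3}

def p_small_extent(country):
    """P(spatial extent = small | country), marginalising over the hierarchy node."""
    p_high = P_DIM_HIGH[country]["hierarchy"]
    return p_high * P_SMALL_GIVEN_HIER["high"] + (1 - p_high) * P_SMALL_GIVEN_HIER["low"]

for country in P_DIM_HIGH:
    # With these illustrative numbers: Germany -> 0.44, Japan -> 0.52.
    print(country, "P(spatial extent = small) =", round(p_small_extent(country), 2))
```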
5.3 Evaluation of the Model

As an evaluation of our model, we tested whether it can properly predict the posture expressiveness of other countries. When the Hofstede scores for the US shown in Table 1 are applied, the model predicts that spatial extent for the US is big (51%), rigidness is least (52%), mirroring is least (90%), frequency is high (53%), and duration is short (53%). This prediction suggests that American postures are less rigid (in other words, more relaxed), which supports what Ting-Toomey has reported [5].
6 Posture Selection Mechanism

This section presents our posture selection mechanism, which uses the Bayesian network model as one of its components. A simple architecture is given in Figure 3. Basically it is divided into three main modules. The input to the mechanism is a country name and a text that the agent speaks.

6.1 Probabilistic Inference Module

The Probabilistic Inference Module takes a country name as input and outputs the nonverbal parameters for that country. To generate outputs, the module refers to our Bayesian network model. We used the Java version of the Netica API as an inference engine. The outputs of this module are the values of the nonverbal expressive parameters of each culture: spatial extent, rigidness, duration, and frequency.

6.2 Decision Module

This module is the most important one and has two sub-modules.

b1: Posture computing module: This module takes the estimation results from the Bayesian network as inputs and uses them as weights for the empirical data. Then it calculates the sum of all the weighted values. For example, the score for a posture, PHFe (Put hand to face), which is frequently observed in the Japanese data, is shown below. The weights 0.5183, 0.507, 0.58, and 0.56 correspond to spatial extent, rigidness, frequency, and duration respectively, and are given by the Bayesian network. The values 4.19, 4.4, 2.725, and 1.01 were obtained in the empirical studies in Section 4 (see footnote 1).

PHFe = {(0.5183 * 4.19) + (0.507 * 4.4) + (0.58 * 2.725) + (0.56 * 1.01)} * 10 = 65.49

b2: Posture distinguishing module: This sub-module classifies typical postures of each culture as German-like postures, Japanese-like postures, or common postures (used in both Germany and Japan). This is judged by checking the thresholds for each country. If the text is Japanese, and the posture value falls within the range of Japanese postures, it sends the posture to the generation phase as a posture candidate.
1 Since various kinds of measures were used in the empirical data, they were normalized to a range of 1 to 7.
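The weighted-sum scoring of the posture computing module can be reproduced almost literally. The sketch below uses exactly the weights and normalised empirical values quoted for PHFe above; only the dictionary layout and function name are added for illustration.

```python
# Weights given by the Bayesian network (spatial extent, rigidness, frequency, duration).
WEIGHTS = {"spatial_extent": 0.5183, "rigidness": 0.507, "frequency": 0.58, "duration": 0.56}

# Normalised empirical values (1-7 scale) for the posture "Put hand to face" (PHFe).
PHFE = {"spatial_extent": 4.19, "rigidness": 4.4, "frequency": 2.725, "duration": 1.01}

def posture_score(weights, values, scale=10):
    """Weighted sum of the expressive-parameter values, as in the b1 module."""
    return scale * sum(weights[name] * values[name] for name in weights)

print(round(posture_score(WEIGHTS, PHFE), 2))   # ~65.49, matching the PHFe score above
```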
Fig. 3. A simplified architecture of the system (components: User, Japanese Agent, German Agent, Probabilistic Inference Module, Decision Module, Generation Module, Bayesian model, Empirical data, Animations, Speech synthesizer)
6.3 Generation Module

This module takes the postures recommended by the decision module and looks for the animation file for each posture in the animation database. Then the Horde3D animation engine generates the animation. We use Hitachi HitVoice for TTS, which converts the text into a wav file, so that the agent speaks with appropriate culture-specific postures.
7 Future Work and Conclusions

Employing a Bayesian network, we combined the Hofstede model of socio-cultural characteristics with the posture expressive parameters that we proposed, and found that our model estimates culture-specific posture expressiveness quite well. As future work, we plan to apply our posture generation mechanism to a language exchange application on the web, where two users from different countries log on to the service, teach their own language to the partner, and learn a foreign language from her or his partner. In this application, the system not only helps the user teach a language, but also makes the learner familiar with culture-specific nonverbal behaviors.

Acknowledgment. This work is funded by the German Research Foundation (DFG) under research grant RE 2619/2-1 (CUBE-G) and the Japan Society for the Promotion of Science (JSPS) under a Grant-in-Aid for Scientific Research (C) (19500104).
References 1. Rehm, M., et al.: Creating a Standardized Corpus of Multimodal Interactions for Enculturating Conversational Interfaces. In: Proceedings of Workshop on Enculturating Conversational Interfaces by Socio-cultural Aspects of Communication, 2008 International Conference on Intelligent User Interfaces (IUI 2008) (2008) 2. Rehm, M., et al.: Too close for comfort? Adapting to the user’s cultural background. In: Proceedings of the 2nd International Workshop on Human-Centered Multimedia (HCM), Augsburg (2007) 3. Bull, P.E.: Posture and Gesture. Pergamon Press, Oxford (1987) 4. Nass, C., Isbister, K., Lee, E.: Truth is Beauty Researching Embodied Conversational Agents. In: Cassell, J., et al. (eds.) Embodied Conversational Agents, pp. 374–402. The MIT Press, Cambridge (2000) 5. Ting-Toomey, S.: Communication Across Culture. The Guildford Press, New York (1999)
6. GeeNIe and SMILE, http://genie.sis.pitt.edu/ 7. Hofstede, http://www.geert-hofstede.com/hofstededimensions.php 8. Lee, E.-J., Nass, C.: Does the ethnicity of a computer agent matter? An experimental comparison of human-computer interaction and computer-mediated communication. In: Prevost, S., Churchill, E. (eds.) Proceedings of the workshop on Embodied Conversational characters (1998) 9. Isbister, K.: Building Bridges through the Unspoken: Embodied Agents to facilitate intercultural communication. In: Payr, S., Trappl, R. (eds.) Agent Culture: Human –Agent Interaction in a Multicultural World, pp. 233–244. Lawrence Erlbaum Associates, Mahwah (2004) 10. Johnson, W., et al.: Tactical Language Training System: Supporting the Rapid Acquisition of Foreign Language and Cultural Skills. In: Proc. of InSTIL/ICALL - NLP and Speech Technologies in Advanced Language Learning Systems (2004) 11. Iacobelli, F., Cassell, J.: Ethnic Identity and Engagement in Embodied Conversational agents. In: Pelachaud, C., Martin, J.-C., André, E., Chollet, G., Karpouzis, K., Pelé, D. (eds.) IVA 2007. LNCS, vol. 4722, pp. 57–63. Springer, Heidelberg (2008) 12. Maniar, N., Bennett, E.: Designing a mobile game to reduce cultural shock. In: Proceedings of ACE 2007, pp. 252–253 (2007) 13. http://www.e-circus.org/ 14. Efron, D.: Gesture, Race and Culture. Mouton and Co. (1972) 15. Gallaher, P.E.: Individual Differences in Nonverbal Behavior; Dimension of style. Journal of Personality and Social Psychology 63819, 133–145 (1992) 16. Hartmann, B., Mancini, M., Buisine, S., Pelachaud, C.: Design and evaluation of expressive gesture synthesis for embodied conversational agents. In: Proceedings of the fourth international joint conference on Autonomous agents and multiagent systems, pp. 1095–1096 (2005)
Intelligence on the Web and e-Inclusion

Laura Burzagli and Francesco Gabbanini

Institute of Applied Physics "Nello Carrara" – Italian National Research Council, Via Madonna del Piano 10, 50019 Sesto Fiorentino, Florence, Italy
{L.Burzagli,F.Gabbanini}@ifac.cnr.it
Abstract. Within the context of the Web, the word intelligence is often connected with the visions of the Semantic Web and Web 2.0. One of the main characteristics of the Semantic Web lies in the fact that information is annotated with metadata, and this gives the opportunity of organizing knowledge, extracting new knowledge and performing some basic operations like query answering or inference reasoning. Following this argument, the advent of the Semantic Web is often claimed to bring about substantial progress in Web accessibility (which is part of the e-Inclusion concept). Web 2.0 sites, favoring massive information sharing, could likewise be of great importance for e-Inclusion, enabling new forms of social interaction, collective intelligence and new patterns of interpersonal communication. Benefits could be substantial also for people with activity limitations. The paper tries to highlight the possible roles and convergence of Web 2.0 and the Semantic Web in favoring e-Inclusion. It highlights the fact that examples of applications of these concepts to the e-Inclusion domain are few and limited to the e-Accessibility field.

Keywords: e-Inclusion, Web 2.0, Semantic Web.
1 Introduction

Due to the evolution and increased complexity of the Web, intelligence is becoming a challenging functionality in the Web itself, and a number of forms in which it can manifest itself have been identified, such as the Semantic Web and Web 2.0. The enhancements that these forms of intelligence might bring to access to information and interpersonal communication could have a positive impact in the field of e-Inclusion. The term e-Inclusion is here considered in its widest definition, both as a support to accessibility of Information and Communication Technology and as a support to the daily activities of people, according to the European Union Riga declaration1, point 4: "e-Inclusion" means both inclusive ICT and the use of ICT to achieve wider inclusion objectives. It focuses on participation of all individuals and communities in all aspects of the information society. E-Inclusion policy, therefore, aims at reducing
1 See http://ec.europa.eu/information_society/events/ict_riga_2006/doc/declaration_riga.pdf, last visited on 2/27/2009.
gaps in ICT usage and promoting the use of ICT to overcome exclusion, and improve economic performance, employment opportunities, quality of life, social participation and cohesion. The analysis in this paper introduces Web 2.0 and the Semantic Web, tries to highlight their possible convergence, and summarizes their role in favoring e-Inclusion. The discussion points out the fact that, up to now, few examples of applications of the Semantic Web and Web 2.0 to the e-Inclusion domain exist, and they are mostly limited to the e-Accessibility field. An example of one such application is presented to stimulate the discussion.
2 Aspects of Intelligence on the Web

Web technologies have been constantly evolving, and the way in which the Web is used and perceived by its users is evolving as well. As observed in [1], from the browsing of pages containing text or images, through the linking of pages, the Web has become an interactive ubiquitous information system that leverages the wisdom of many users and makes it possible to reuse data through mashups. From the perspective of users, this structure offers information, services and powerful search engines to find them. Users can take advantage of an environment useful for research, learning, commerce, socialization, communication and entertainment. In order to fully exploit the potential of today's Web, benefits would come from the introduction of intelligence embedded in the system, to handle complex scenarios. Intelligence on the web can be considered from several different perspectives. In the complex and varied Web world, a noteworthy phenomenon is assuming great importance: Web 2.0. According to the opinion of several experts, this phenomenon is not based only on technological innovations, but draws its peculiarity from social aspects, from the cooperation between users and from the wide variety of web contents which are directly generated by users. This combination of technological and human factors could represent a valuable support also in the e-Inclusion field. However, at the scientific level, the more advanced representation of intelligence on the Web seems to be the Semantic Web, the revolution for the web proposed by Tim Berners-Lee almost 10 years ago [2]. When referred to e-Inclusion, the concept seems to offer a number of features that not only can improve web accessibility and overcome several limitations which are present in today's Web, but can also lead to the creation of new services which can be useful for a wide variety of users (see [3], [4]). An interesting aspect to note is that the Semantic Web and Web 2.0 are not in contrast; it has become clear to many experts (see [5], [6]) that they are natural complements of each other, because the Semantic Web can give a formal representation of human meaning, while contents can be built and maintained using the techniques and data generation capabilities that are typical of Web 2.0.

2.1 Web 2.0: Social and Technological Aspects

Although there is not full agreement on a definition, Web 2.0 (also called the wisdom Web, people-centric Web, participative Web, see [7], [8]) is perceived as a second phase in the Web's evolution, in which web-based services have the characteristic of
aiming to facilitate collaboration and sharing between users, letting them engage in an interactive and collaborative manner, and giving more emphasis to social interaction and collective intelligence. It is a fact that the term does not refer to an update to Web technical specifications, but to changes in the way software developers use known web technologies and in the way end-users use the internet as a platform. A number of studies attribute to this phenomenon an element of intelligence (see [8], [9], [7]). Intelligence originates from interaction among users, when this interaction happens by means of the Internet, and differs from intelligence seen as a result of software routines implementing Artificial Intelligence procedures. In the literature, some describe this aspect with the term collected intelligence (i.e. the value of user contributions is in their being collected together), finding it more appropriate than collective intelligence (i.e. characterized by the emergence of truly new levels of understanding) (for example, see [10]).

Accessibility Issues in Web 2.0. From the technological point of view, many (but not all) Web 2.0 applications are supported by a series of new-generation web-based technologies that have existed since the early days of the web, but are now used in such a way as to exploit user-generated content, resource sharing and interactivity in a more sophisticated and powerful way (see [11]), giving rise to the so-called Rich Internet Applications (RIA). Techniques such as AJAX have evolved that have the potential to improve the user experience in browser-based applications. The main impact on accessibility comes from dynamic and incremental updates in web pages. On the one hand, these may come unexpected to users, who may not notice that a part of the page has changed. On the other hand, problems with asynchronous updates may be fatal for users relying on Assistive Technology (AT): in fact, updates can occur in a different area of the page than where the user is currently interacting, and ATs could fail to notify users that something on the page has changed. Issues concerning accessibility in RIAs are being addressed by the WAI-ARIA2 of the W3C, which has formulated a series of best practices for rich internet application design. WAI-ARIA markup presents a solution to making these applications accessible. An interesting analysis of technologies to enable a more accessible Web 2.0, discussed in [12], points out that, basically, ARIA is built upon Semantic Web concepts in that it defines so-called "live regions" that allow semantic annotations to be added to the HTML and XHTML markup in order to better define the role of user interface components. This feature can be used, for example, to enable assistive technologies to give an appropriate representation of user interfaces. For example, a browser can interpret the additional semantic data and provide it to the assistive technology via the accessibility Application Programming Interface of the platform, which already implements mechanisms to describe user interface controls. Thiessen and Chen in [13] present a chat example that shows ARIA live regions in action and demonstrates several of their limitations.

Web 2.0 perspectives for e-Inclusion. As a result of a first survey, it appears that the role of Web 2.0 with respect to e-Inclusion (considered from the perspective of its general definition) has not yet been an object of interest in the scientific community.
2 See http://www.w3.org/WAI/intro/aria.php. Last visited on 2/27/2009.
Up to now, Web 2.0 has almost exclusively been considered useful in particular fields of application such as leisure, travel or e-commerce. The availability of a wide corpus of collective intelligence could give Web 2.0 an important role also in the field of e-Inclusion, where interaction among users has always been considered (see [12] as an example) to offer valuable support in helping people to overcome limitations (either physical or cultural), even if only a limited number of examples following this direction have been presented yet. The new forms of social interaction and collective intelligence brought by Web 2.0 could enable new patterns of interpersonal communication for all users. Benefits could be substantial also for people with activity limitations. For example, through Web 2.0 sites motor-impaired users could share their experiences about accessible accommodations and paths in towns: the value of the application could increase as more users use it, putting their knowledge at the disposal of other users. An example of a service that exploits the capabilities of Web 2.0 and the Semantic Web in a sub-domain of e-Inclusion, that is e-Accessibility, is described in Section 4. It is to be noted, moreover, that the interaction techniques that are typical of Web 2.0 (although a number of problems related to their accessibility are emerging and are being handled by WAI) can represent a useful help for some groups of users, especially for people with cognitive disabilities, because of the possibility of implementing contextual help, tailoring interfaces to meet users' experiences, and because techniques exist to learn users' preferences (see [5]).

2.2 Semantic Web

The World Wide Web is a container of knowledge as well as a means to exchange information between people and let them communicate. As of today, even if pages are generated by CMSs and stored in databases, they are presented in the form of pages written in mark-up languages such as HTML and XHTML. Thus, web contents are mostly structured to be readable by humans and not by machines. The aim of the Semantic Web initiative (originated by Tim Berners-Lee and now being developed within the World Wide Web Consortium3) is to represent web contents in a manner that is more easily processable by machines and to set up tools to take advantage of this representation. The initiative seeks to provide a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. In this way, according to the Semantic Web vision, a person, or a machine, should be capable of starting to browse information on the web and then moving through a set of information sources which are connected not by wires and links, but by being "about the same thing" (see [14] for example). The Semantic Web is based on the concept of ontology: an ontology is used to formally describe a domain of discourse and consists of a list of terms and the relationships between these terms. Metadata, organized with ontologies, are used to identify information in web sources, and logic is used to process the retrieved information and uncover knowledge and relationships found in the ontologies.
3 See http://www.w3.org/2001/sw/. Last visited on 2/27/2009.
It is to be noted that the Semantic Web does not aim to exhibit a human-level intelligence, such as the one envisioned by Artificial Intelligence (AI). Though it builds on the work carried out in AI, the Semantic Web stack (made up of RDF, RDF Schema, ontology vocabularies, rules, logic and proof) aims at building intelligent agents, in the sense that they are capable of uncovering unexpected relationships between concepts.

Semantic Web Perspectives for e-Inclusion. Discussions on the role of the Semantic Web and e-Inclusion are mainly focused on aspects related to e-Accessibility. It is highlighted by several authors that the Semantic Web is not an area that is very well explored for supporting Web accessibility (see [5], [15]). However, it is also generally acknowledged that developments connected to the Semantic Web can provide a valuable contribution to creating accessible content, especially if taken together with Web 2.0. One of the main characteristics of semantic data is that it can be modeless: it is not already deliberately constructed as a page. Following this argument, the advent of the Semantic Web is often claimed to bring about substantial progress in Web accessibility and presentation on mobile devices (which are part of the e-Inclusion concept), as it facilitates providing alternatives for content and form for documents (see, e.g., [16]). Harper and Bechhofer, in [4], observe that semantic information built into general-purpose Web pages could enable substantial improvements in the accessibility of web pages. In fact, information is often rendered on web pages in an order defined by the designer and not in the order required by the user, and web pages may bear implicit meaning that is connected to how the information is presented visually, while, for example, the same meaning cannot be interpreted by visually impaired persons, who are forced to interact with systems in a serial manner. The availability of semantics would be of great value for Assistive Technology, which relies on semantic understanding to provide access to user interfaces. Moreover, this could also provide benefits in that it could enable automatic reformulation and rearrangement of contents, based on metadata, in view of their fruition by people with different preferences and needs, in different contexts of use.
3 Convergence Between Web 2.0 and Semantic Web

So far Web 2.0 and the Semantic Web, the two visions of intelligence on the web considered here, have been regarded as completely different approaches to the future web. However, recently (see [6] for example), a number of authors have started to consider a possible convergence between them, merging their most successful aspects. On the one side, the richness of Web 2.0 lies in its social dimension, and it is characterized by an easy exchange of information in wide communities (social networks) and by the collection of large amounts of information, even if this is often unstructured. On the other side, the strength of the Semantic Web is in its capability of interlinking and reusing structured information, but it needs data to be aggregated and recombined. In other words, there is a need to merge human participation with well-structured information.
Fig. 1. The “collective knowledge system” general scheme
Two main solutions appear in the literature. The first is related ([5]) to the creation of services and tools that are able to automatically analyze web pages and discover semantics, thus allowing user-generated content to be structured. A different solution (see [10]) is represented by systems that are defined by Gruber as "collective knowledge systems", in which existing semantic structures are used to organize large amounts of knowledge generated by users (see Fig. 1). These systems are made up of a social network, supported by computing and communication technology, in which self-service problem-solving discussions take place and where people pose problems and others reply with answers; a search engine to find questions and answers; and users helping the system learn which query/document pairs were effective at addressing their problems. According to Gruber [10], the role of the Semantic Web is firstly seen in adding value to user data by adding structured data, related to the content of the user contributions, in a form that enables more powerful computation. Secondly, Semantic Web technologies can enable data sharing and computation across independent, heterogeneous Social Web applications, whereas up to now these data are confined in a given application.
4 Existing Applications

An existing example of the collective knowledge systems outlined in Section 3 is presented in the next section. The example is the IBM Social Accessibility Project. Though it regards e-Accessibility, it is taken as a model to highlight the potential benefits that could come from exploiting the convergence of the Semantic Web and Web 2.0 in the wider domain of e-Inclusion.
4.1 The IBM Social Accessibility Project

The Social Accessibility Project (see screenshot in Fig. 2) has been set up by IBM to improve Web accessibility by using the power of communities. It is a service whose goal is to make Web pages more accessible to people with disabilities, taking advantage of users' input and leveraging the power of the open community while not changing any existing content. The system allows users encountering Web access problems to immediately report them to the Social Accessibility server. Volunteers (called supporters) can be quickly notified and can easily respond by creating and publishing the requested accessibility metadata, which will help other users who encounter the same problems. Specifically, supporters are able to discuss solutions among themselves through Web applications on the server and create a set of metadata to solve the problem; they then submit it to the server. When the user visits the page again, the page is automatically fixed, and any user who installs a suitable software extension can access the accessible version of the page. This project delineates an interesting possible convergence between Web 2.0 and the Semantic Web because it takes advantage of a social network that discusses problems and tries to provide solutions in a collaborative manner. There is a potentially continuous interaction between users and supporters to discuss solutions and consider comments.
Fig. 2. Screenshot of the guest page of the IBM Social Accessibility Project
Users can also create metadata: for example, when a user finds an important position in a page, that position can be submitted as a "landmark" for other users. This process ends with the creation of metadata that help identify and overcome access problems. More details on the service can be found at http://sa.watson.ibm.com/.
5 Conclusions
The paper has presented the Semantic Web and Web 2.0 as two different approaches to the distribution of intelligence on the web. Their characteristics have been presented, together with a summary of the potential benefits of applying the Web 2.0 and Semantic Web visions to the e-Inclusion field. A class of applications (named "collective knowledge systems"), which in the future could serve as an example of exploiting the convergence of the Semantic Web and Web 2.0 and which could have a positive impact on e-Inclusion, has been presented, and an example concerning the improvement of Web accessibility has been described. It is to be noted that the Web has certainly evolved from a collection of hypertext pages into an interactive, ubiquitous information system that leverages the wisdom of many users to provide knowledge on virtually all fields. Web 2.0 has indeed had a relevant impact on the evolution of the Web, whereas the contribution of the Semantic Web, though presently less visible, is expected to have an influence in the longer term. However, the evolution of the Web and the uptake of Web 2.0 and the Semantic Web seem to have had, up to now, a limited impact on e-Inclusion, and they have mostly been studied in relation to their impact on a sub-domain of e-Inclusion, namely eAccessibility. In fact, in the literature examined during the authors' work, very few specific references relating Web 2.0 to e-Inclusion were found, with the exception of a number of works dealing with problems that Web 2.0 technology can bring to the accessibility of web sites (which, indeed, is an aspect of e-Inclusion) and some minor applications built to provide accessible interfaces to Web 2.0 applications such as YouTube4 and SlideShare5. The IBM Social Accessibility Project, though focused again on accessibility, seems to represent a valid example of a new perspective in that it exemplifies a class of applications that take advantage of the power of Web 2.0 and the Semantic Web and could have a positive impact on e-Inclusion.
Acknowledgements. The contribution of Pier Luigi Emiliani in the development of the ideas presented in the paper is warmly acknowledged.
References
1. Kelly, K.: Four stages in the internet of things (November 2007), http://www.kk.org/thetechnium/archives/2007/11/four_stages_in.php (last visited on 2/27/2009)
4 See http://icant.co.uk/easy-youtube/, last visited on 2/26/2009
5 See http://icant.co.uk/easy-slideshare/about/index.html, last visited on 2/26/2009
2. Berners-Lee, T., Hendler, J., Lassila, O.: The semantic web. Scientific American 284(5), 34–43 (2001) 3. Kouroupetroglou, C., Salampasis, M., Manitsaris, A.: A semantic-web based framework for developing applications to improve accessibility in the WWW. In: W4A: Proceedings of the 2006 international cross-disciplinary workshop on Web accessibility (W4A), pp. 98–108. ACM Press, New York (2006) 4. Harper, S., Bechhofer, S.: Semantic triage for increased web accessibility. IBM Systems Journal 44(3), 637–648 (2005) 5. Cooper, M.: Accessibility of emerging rich web technologies: Web 2.0 and the Semantic Web. In: W4A 2007: Proceedings of the 2007 international cross-disciplinary conference on Web accessibility (W4A), pp. 93–98. ACM Press, New York (2007) 6. Heath, T., Motta, E.: Ease of interaction plus ease of integration: Combining web2.0 and the semantic web in a reviewing site. Web Semantics 6(1), 76–83 (2008) 7. Murugesan, S.: Understanding web 2.0. IT Professional 9(4), 34–41 (2007) 8. O’Reilly, T.: What is web 2.0? Design Patterns and Business Models for the Next Generation of Software (September 2005), http://www.oreilly.com/pub/a/oreilly/tim/news/2005/09/30/ what-is-web-20.html 9. Lin, K.J.: Building Web 2.0. Computer 40(5), 101–102 (2007) 10. Gruber, T.: Collective knowledge systems: Where the Social Web meets the Semantic Web. Journal of Web Semantics 6(1), 4–13 (2008) 11. Knights, M.: Web 2.0. Communications Engineer 5(1), 30–35 (2007) 12. Gibson, B.: Enabling an accessible Web 2.0. In: W4A 2007: Proceedings of the 2007 international cross-disciplinary conference on Web accessibility (W4A), pp. 1–6. ACM Press, New York (2007) 13. Thiessen, P., Chen, C.: Ajax live regions: chat as a case example. In: W4A 2007: Proceedings of the 2007 international cross-disciplinary conference on Web accessibility (W4A), pp. 7–14. ACM Press, New York (2007) 14. Antoniou, G., Van Harmelen, F.: A Semantic Web Primer, 2nd edn. MIT Press, Cambridge (2008) 15. Yesilada, Y., Harper, S.: Web 2.0 and the semantic web: hindrance or opportunity? In: W4a - international cross-disciplinary conference on web accessibility 2007. SIGACCESS Accessibility and Computing, vol. (90), pp. 19–31 (2008) 16. Seeman, L.: The semantic web, web accessibility, and device independence. In: Proceedings of the 2004 international Cross-Disciplinary Workshop on Web Accessibility (W4A), vol. 63, pp. 67–73. ACM Press, New York (2004)
Accelerated Algorithm for Silhouette Fur Generation Based on GPU
Gang Yang 1,2 and Xin-yuan Huang 1
1 Beijing Forestry University, 100083, Beijing, China
2 Institute of Software, Chinese Academy of Sciences, 100080, Beijing, China
{yanggang,hxy}@bjfu.edu.cn
Abstract. In the method that represents fur with multi-layer textured slices, representing silhouette fur is time consuming, as it requires silhouette-edge detection and fin slice generation. In this paper, we present an accelerated method for representing silhouette fur that takes advantage of the programmable ability of Graphics Processing Units (GPU). In the method, by appending edge info to each vertex, silhouette-edge detection can be implemented on the GPU; and by storing fin slice data in video memory during preprocessing, the time spent on fin slice generation and on data transmission from CPU to GPU can be saved. Experimental results show that our method accelerates silhouette fur representation greatly, and hence improves the performance of rendering furry objects. Keywords: fur rendering; GPU; silhouette fur; multi-layer textured slices.
1 Introduction
Representation of realistic fur is a hot research topic in computer graphics, and it is also a challenging problem due to the high complexity of furry surfaces. The most direct method for representing fur is to use geometric primitives to represent each fur fiber, such as the approaches presented in [1][2][3][4]. But it is difficult to achieve high rendering performance with these explicit geometric approaches because the number of fur fibers over an object surface is always very large. In 1989, Kajiya et al. [5] put forward the method of representing fur with volume textures, and achieved excellent rendering results. However, the rendering speed of volume textures is very slow. Meyer et al. [6] presented the idea of using multi-layer two-dimensional textured slices to represent the effect of three-dimensional volume textures, and achieved interactive rendering performance. Based on Meyer's idea, Lengyel et al. [7][8] presented a real-time fur rendering method. In this method, the furry surface is represented as a series of concentric, semi-transparent shells, and alpha-blending these shells together produces the visual furry effect. Lengyel's technique has great application value as it can represent realistic fur in real time. Furthermore, Yang et al. [9] improved Lengyel's method by using non-uniform layered slices to represent fur, which improved efficiency and flexibility. But the multi-layer slices method is not very appropriate for representing the fur near object silhouettes. Near the silhouettes, multi-layer slices are seen at grazing
angles; hence the gaps between layers become evident and the fur appears to be overly transparent. Lengyel added textured fins to overcome this overly transparent appearance near silhouette regions. However, the detection of silhouette edges and the generation of fin geometry often consume a lot of time, affecting the rendering speed appreciably. In this paper, we propose a GPU-based accelerated method for generating silhouette fins. In the method, by designing an appropriate data structure for each vertex, the detection of silhouette edges and the generation of fins can be transferred entirely into the GPU pipeline. Thanks to the strong computational capability of the GPU, silhouette fin generation is accelerated greatly, which improves fur rendering performance. In addition, with this method the fin data transmission from CPU to GPU is avoided, which improves the speed further. In the remainder of the paper, Section 2 first introduces the multi-layer slices method and its limitation; then Section 3 discusses our GPU-based silhouette edge detection and fin generation approach in detail. Experimental results are given in Section 4, and a summary is presented in Section 5.
2 Multi-layer Slices Method and Its Limitation
Lengyel et al. [8] and Yang et al. [9] used multi-layer textured slices to represent fur. In their methods, the furry surface is represented as a series of concentric, textured shells. In preprocessing, they construct a patch of fur geometry through particle systems and sample it into several layers of semi-transparent textures called shell textures. At runtime, multi-layer concentric shells are generated by shifting the original mesh outwards from the model surface, and each shell is mapped with the corresponding layer of shell texture. Alpha-blending these textured shells from innermost to outermost produces the furry effect, as illustrated by the sketch below. Using the multi-layer slices method, furry results can be produced in real time. But the method is effective only when the viewing direction is approximately normal to the surface. Near silhouettes, where the viewing direction is approximately parallel to the surface, the gaps between slice layers become evident and the fur appears to be overly transparent. Lengyel's method adds textured quadrilateral slices called fins on silhouette edges to overcome this problem. These fin slices are normal to the surface and are mapped with a fin texture that is generated by sampling a tuft of fur. By adding these textured fin slices, the layer gaps are covered up, and good rendering results can be achieved. Nevertheless, the cost of generating fins is quite considerable. In order to generate fin slices, we must traverse and test each edge on the object surface, and only those edges detected as lying in the silhouette region are equipped with fin slices. The silhouette edge detection and fin slice generation are processed on the CPU, and the generated fin slice data must be transferred from the CPU to the GPU for rendering. The edge detection, fin generation and data transfer processes often occupy about 30% of the whole fur rendering time, and hence affect the performance evidently.
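The shell construction just described can be summarized in a short Python sketch (this is an illustration added here, not the authors' code; the function name `build_shells` and the uniform layer spacing are assumptions of the sketch):

```python
import numpy as np

def build_shells(vertices, normals, fur_length, num_layers):
    """Generate concentric shell meshes by offsetting every vertex along its
    normal; layer 1 sits just above the surface, the last layer is the fur tip."""
    shells = []
    for k in range(1, num_layers + 1):
        offset = fur_length * k / num_layers        # uniform layer spacing (assumption)
        shells.append(vertices + offset * normals)  # shell k is textured with shell texture k
    return shells

# Toy usage: one triangle whose vertex normals all point along +z.
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
norms = np.array([[0.0, 0.0, 1.0]] * 3)
layers = build_shells(verts, norms, fur_length=0.05, num_layers=8)
# At render time each layer is drawn with its semi-transparent shell texture
# and the layers are alpha-blended from the innermost to the outermost one.
```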
3 GPU-Based Silhouette Fur Representation
An object's silhouette is the line on the object's surface that is tangent to the current viewing direction; it is the division between the object's front face and back face. The silhouette region of the object can be defined as the region around the silhouette line. In order to use fin slices, silhouette edge detection is first performed for each edge, and fin slices are then generated only on those edges detected as lying in the silhouette region. Silhouette edge detection is implemented by computing the dot product of the viewing direction V and the current edge's normal N. If the absolute value of the dot product is less than a certain threshold β, the current edge is considered to lie in the silhouette region. The viewing direction V is the vector pointing from the viewpoint Eye to the current edge's midpoint Epos. Silhouette edge detection can thus be represented by the formula

| V* · N* | < β, where V = Eye − Epos ,    (1)

in which V* and N* are the unit vectors of V and N, respectively. The value of β is set by the user; in our experiments, β is set to 0.2 to achieve satisfactory rendering results. Every edge detected as lying in the silhouette region is equipped with a quadrilateral fin slice. The fin slice is placed vertically on the edge, and the slice height is the fur length on the edge. In Lengyel's and Yang's methods, silhouette edge detection and fin slice generation are all implemented on the CPU. In this paper, by adopting an appropriate data layout, silhouette edge detection and fin slice generation can be implemented in the GPU's vertex shader; in addition, we also utilize the GPU's pixel shader to implement the texture mapping and rendering computation of fin slices. Our method is discussed in detail as follows.
3.1 GPU-Based Silhouette Edge Detection and Fin Generation
In the last ten years, the GPU's programming capability and computing power have improved greatly. The GPU provides programmability in two stages: the vertex shader and the pixel shader. In the graphics processing pipeline, the object's vertex data are first sent to the vertex shader for processing, and programs can be written in the vertex shader to deal with these vertex data flexibly; then the processed vertex data are rasterized to generate pixel data, and the pixel data are sent to the pixel shader for processing. Although the GPU's vertex shader has a powerful ability to process vertex data, we cannot directly use the vertex shader to implement silhouette edge detection and fin generation, for two reasons: (1) in order to generate fin slices, a series of new vertices must be generated to represent the new quadrilateral slices, but the vertex shader does not have the ability to generate new vertices; (2) the processing unit of the vertex shader is the vertex, not the edge, and there is no concept of an "edge" in the vertex shader, which means we cannot implement edge detection directly in the vertex shader. To solve these two problems, we take the following strategy: a fin slice is generated for every edge in preprocessing, and all the quadrilateral fin slices are sent to the vertex shader for processing when rendering. A set of edge info is generated for each vertex of these quadrilateral
slices, and by using the edge info the vertex shader can judge whether the vertex belongs to a silhouette fin slice, that is, a fin slice located in the silhouette region. As shown in Figure 1, the fin slice on edge e is formed by four vertices v0, v1, v2 and v3. For each vertex, besides the basic vertex information such as position, normal and fin texture coordinates, we append two items of edge info: the edge normal Enormal and the edge midpoint Epos. Enormal is calculated as the sum of the normals of the two adjacent polygons. For an object with edgeNum edges, all its fin slices involve edgeNum*4 vertices, and all these vertices' data are sent to the vertex shader for processing when rendering. In the vertex shader, according to Enormal and Epos, we can judge whether the current vertex belongs to a silhouette fin slice by computing formula (1). If the vertex is detected to be in a silhouette fin, it is preserved; if not, the vertex is discarded by moving it out of the viewing frustum. The range of the projected viewing frustum along the z-axis is [-1, 1], so the vertex can be discarded by setting its z coordinate outside [-1, 1], for example by setting the vertex coordinates to (0, 0, -2). In this way, all the fin slices that do not lie in the silhouette region are moved out of the viewing frustum and hence are clipped before rasterization.
Fig. 1. Fin slice and edge info. The quadrilateral v0v1v2v3 is the fin slice on edge v0v1, N1 and N2 are the normals of two adjacent triangles, Enormal = N1 + N2, Epos is the mid-point of edge v0v1. The Enormal and Epos will be included in the vertex data of v0, v1, v2 and v3.
Essentially, in this strategy we simply generate all the fin slices in advance and use the vertex shader to discard all the non-silhouette fins. In this way we sidestep problem (1) mentioned above, and by introducing the edge info into each fin vertex we solve problem (2). As mentioned before, the fin slices generated in advance have edgeNum*4 vertices, and each vertex's data involve some basic vertex information plus the additional edge info. It would be time consuming to transmit all edgeNum*4 vertices' data to the GPU when rendering each frame. Fortunately, these vertex data are unchanged in each frame, so we can transmit them to the GPU just once and store them in video memory for use in every frame; we utilize the GPU's "Vertex Buffer Object" feature to do this. Consequently, fin slice data transmission can be avoided completely during the real-time rendering process. Therefore, in our method the CPU need not do any computation for silhouette edge detection: we just generate all the fin slice data and store these data in video memory in advance, and the GPU completes all the remaining work when rendering, such as silhouette edge detection and vertex discarding.
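The following Python sketch illustrates the per-edge data layout and the silhouette test of formula (1). It is a CPU-side illustration of the logic that the paper moves into the vertex shader; the function names and the explicit 11-float layout are assumptions made here for clarity, not the authors' shader code.

```python
import numpy as np

BETA = 0.2  # silhouette threshold used in the paper's experiments

def make_fin_quad(v0, v1, n1, n2, fur_length):
    """Build the four vertices of the fin slice on edge (v0, v1).
    n1 and n2 are the normals of the two faces adjacent to the edge.
    Each vertex stores position (3) + fin texture coords (2) + edge normal (3)
    + edge midpoint (3), i.e. the 11-float layout discussed in the text."""
    e_normal = n1 + n2
    e_pos = 0.5 * (v0 + v1)
    up = fur_length * e_normal / np.linalg.norm(e_normal)
    corners = [(v0, (0.0, 0.0)), (v1, (1.0, 0.0)),
               (v1 + up, (1.0, 1.0)), (v0 + up, (0.0, 1.0))]
    return [np.concatenate([pos, uv, e_normal, e_pos]) for pos, uv in corners]

def in_silhouette(eye, e_pos, e_normal, beta=BETA):
    """Formula (1): the edge lies in the silhouette region when the unit view
    vector and the unit edge normal are nearly perpendicular."""
    v = eye - e_pos
    v = v / np.linalg.norm(v)
    n = e_normal / np.linalg.norm(e_normal)
    return abs(np.dot(v, n)) < beta

# In the GPU version all fin quads are built once, stored in a vertex buffer
# object, and the vertex shader evaluates the silhouette test per vertex,
# moving non-silhouette vertices outside the clip volume, e.g. to (0, 0, -2).
```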
Compared to the fin representation methods in [8] and [9], our method has higher performance. As shown in Table 1, our method can improve the performance of fin processing by 10–15 times.
3.2 Silhouette Fin Rendering
After the silhouette edge detection and vertex discarding in the vertex shader, all the fin slices that lie in the silhouette region are preserved and are then rendered. We use the pixel shader to implement the rendering computation of the fins. When rendering a fin, besides mapping the fin texture, a tangent texture that records the tangent vectors of the fur also needs to be mapped, in order to compute the lighting of the fur. As shown in Figure 2, the left image is the fin texture and the right image is the corresponding tangent texture, in which each pixel's value is the tangent vector of the fur at that pixel. The fin texture and tangent texture are both generated by sampling a tuft of fur in preprocessing. The fin texture used in our experiments is 512×128. When mapping the fin texture onto a fin slice, we need not map the whole strip of texture; instead, we can randomly select a segment of the fin texture for mapping, with the width of the selected segment relative to the width of the current fin slice. Note that the two ends of the fin texture in Figure 2 are continuous and can be linked together seamlessly; thus the rendering results of silhouette fur have better continuity.
Fig. 2. Fin texture (left) and the corresponding tangent texture (right). In the tangent texture, the R, G and B components of each pixel’s color record the tangent vector of fur on current pixel.
When rendering fin slices in the pixel shader, we first fetch the fur's color from the fin texture and the fur's tangent vector from the tangent texture; then the lighting computation of the fur is implemented according to formula (2), which is adopted from [8]:

FurLighting = ka + kd * Chair * (1 − (T · L)^2)^(pd/2) + ks * (1 − (T · H)^2)^(ps/2) .    (2)

In formula (2), T is the tangent vector of the fur; L is the lighting direction; H is the half-vector between the viewing and lighting directions. ka, kd and ks are the ambient, diffuse and specular colors; pd and ps are the diffuse and specular powers. To maintain the coherence of the rendering effect when representing silhouette fur, the opacity of a fin slice is multiplied by an adjusting factor α = 1 − |V* · N*|, where V* is the viewing direction and N* is the normal of the current edge; both V* and N* are unit vectors. When a fin slice approaches the silhouette line, α approaches 1; conversely, away from the line α decreases. Therefore the fin slices gradually fade out from the silhouette line to the fringe of the silhouette region, which alleviates the color discrepancy between the fin slices and the multi-layer fur slices and produces smoother rendering results. Figure 3 shows some of our rendering results.
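A minimal Python version of the fin shading in formula (2) and of the fade factor α is sketched below; it assumes normalized input vectors and illustrative material constants, and it is an added illustration rather than the authors' pixel shader.

```python
import numpy as np

def fur_lighting(T, L, H, c_hair, ka, kd, ks, pd, ps):
    """Formula (2): Kajiya-Kay style fur shading using the fur tangent T,
    light direction L and half-vector H (all assumed to be unit vectors)."""
    diffuse = kd * c_hair * (1.0 - np.dot(T, L) ** 2) ** (pd / 2.0)
    specular = ks * (1.0 - np.dot(T, H) ** 2) ** (ps / 2.0)
    return ka + diffuse + specular

def fin_alpha(view_dir, edge_normal):
    """Fade factor alpha = 1 - |V* . N*|: close to 1 on the silhouette line,
    smaller towards the fringe of the silhouette region."""
    v = view_dir / np.linalg.norm(view_dir)
    n = edge_normal / np.linalg.norm(edge_normal)
    return 1.0 - abs(np.dot(v, n))
```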
4 Experimental Results
In our experiments, we compare the performance of our GPU-based method and the previous method. Three models are selected for the experiments (the rendering results of the three models are shown in Figure 3). We render each model from several different viewpoints and compute the average performance for comparison. The experiments are run on a PC with a P4 3.0 GHz CPU, 1.0 GB RAM and a GeForce 6800GT graphics card with 256 MB of video memory. The experimental data are listed in Table 1.
Fig. 3. Rendering results. In each row, the left image is the fur rendering result without fins, in which the fur is overly transparent in the silhouette region and the model's silhouette border can even be seen through the fur; the middle image shows only the silhouette fins; the right image is the final result obtained by rendering the fur and fins together.
Table 1. Comparison results

Model                                          Torus        Bunny         Camel
Polygon number / edge number                   576 / 864    3065 / 4599   4072 / 6112
Processing time of fin with our method (ms)    0.083        0.243         0.296
Processing time of fin with previous method    1.204        2.964         3.525
Improvement ratio of rendering fur             17%          22%           24%
Memory size of fin slice data (KB)             148.5        790.5         1050.5
The "processing time of fin" listed in the third and fourth rows of Table 1 is the total time spent processing fins in each frame, including the time for silhouette edge detection, fin data transmission from CPU to GPU, and fin rendering. It can be seen from the data that our method is 10–15 times faster than the previous method in processing fins. The fifth row of Table 1 gives the improvement ratio of the whole fur rendering process obtained by using our method. The improvement ratio is computed as (FPSnew − FPSold) / FPSold, where FPSnew is the rendering rate when using our method to process fins and FPSold is the rendering rate when using the previous method. Compared to the previous method, the only expense of our method is the larger video memory occupation: in our method the data of all the fin slices are stored in video memory, whereas in the previous method only the data of fin slices lying in the silhouette region need to be stored. Suppose the edge number of the model is edgeNum; then all the fin slices have edgeNum*4 vertices. For each vertex, the data that need to be stored in video memory include its position coordinates, fin texture coordinates, edge normal and the coordinates of the edge midpoint. The position, normal and midpoint coordinates each require three floating point values, and the texture coordinates require two. Therefore, the total memory requirement is edgeNum*4*(3*3+2) floating point values = edgeNum*44*4 bytes. The sixth row of Table 1 gives the video memory occupation of the fin data in our method. Relative to current video memory capacities, such a memory occupation is acceptable.
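The memory figures in the last row of Table 1 can be reproduced directly from the per-vertex layout just described; the short check below is added here purely for illustration.

```python
# Each fin vertex stores position (3), edge normal (3) and edge midpoint (3)
# floats plus fin texture coordinates (2 floats): 11 floats = 44 bytes.
BYTES_PER_VERTEX = (3 + 3 + 3 + 2) * 4

def fin_memory_kb(edge_count):
    # Four vertices per fin slice, one slice pre-generated per edge.
    return edge_count * 4 * BYTES_PER_VERTEX / 1024.0

for name, edges in [("Torus", 864), ("Bunny", 4599), ("Camel", 6112)]:
    print(name, round(fin_memory_kb(edges), 1), "KB")
# Prints 148.5, 790.5 and 1050.5 KB, matching the last row of Table 1.
```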
5 Conclusion
This paper presents a GPU-based acceleration method for rendering silhouette fur. In the method, by producing edge info for each vertex, silhouette edge detection can be implemented on the GPU; by generating and storing all the fin slice data in video memory in preprocessing, the problem that the vertex shader cannot generate new vertices is sidestepped, and the time spent on fin data transmission from CPU to GPU is also saved. The method greatly accelerates silhouette fin detection and generation, and improves the overall performance of rendering furry objects. Besides its use in silhouette fur rendering, silhouette edge detection is widely used in many other areas of computer graphics, for example in some NPR (non-photorealistic rendering) methods. Our GPU-based silhouette edge detection approach can also be adopted in these areas for acceleration.
Acknowledgments. This work was supported by the open grant of LCS, ISCAS (no. SYSKF0705), Natural Science Foundation of China (no. 60703006), and National HiTech 863 Program of China (no. 2006AA10Z232).
References 1. Miller, G.: From Wire-Frame to Furry Animals. In: Proceedings of Graphics Interface 1988, pp. 138–146. Lawrence Erlbaum Associates, Mahwah (1988) 2. LeBlanc, A., Turner, A., Thalmann, D.: Rendering hair using pixel blending and shadow buffers. J. Vis. Comput. Animat. 2(1), 92–97 (1991) 3. Watanabe, Y., Suenega, Y.: A trigonal prism-based method for hair image generation. IEEE Computer Graphics and Application 12(1), 47–53 (1992) 4. Berney, J., Redd, J.: Stuart Little: A Tale of Fur, Costumes, Performance, and Integration: Breathing Real Life Into a Digital Character. SIGGRAPH 2000 Course Note #14 (2000) 5. Kajiya, J.T., Kay, T.L.: Rendering Fur with Three Dimensional Textures. In: Computer Graphics Proceedings, ACM SIGGRAPH. Annual Conference Series, pp. 271–280. ACM Press, New York (1989) 6. Meyer, A., Neyret, F.: Interactive Volumetric Textures. In: Proceedings of Eurographics Workshop on Rendering 1998, pp. 157–168. Springer, Heidelberg (1998) 7. Lengyel, J.: Real-time fur. In: Proceedings of Eurographics Workshop on Rendering 2000, pp. 243–256. Springer, Vienna (2000) 8. Lengyel, J., Praun, E., Finkelstein, A., et al.: Real-Time Fur over Arbitrary Surfaces. In: Proceedings of ACM 2001 Symposium on Interactive 3D Graphics, pp. 227–232. ACM Press, New York (2001) 9. Yang, G., Sun, H.Q., Wang, W.C., Wu, E.H.: Interactive Fur Shaping and Rendering Using Non-Uniform Layered Textures. IEEE Computer Graphics and Applications 28(4), 85–93 (2008)
An Ortho-Rectification Method for Space-Borne SAR Image with Imaging Equation Xufei Gao, Xinyu Chen, and Ping Guo Image Processing and Pattern Recognition Laboratory, Beijing Normal University, 100875 Beijing, China
[email protected],
[email protected],
[email protected]
Abstract. An ortho-rectification scheme for space-borne Synthetic Aperture Radar (SAR) images is investigated in this paper. Ortho-rectification is usually achieved by indirect mapping between real SAR image pixels and the Digital Elevation Model (DEM) grid. However, precise orbit data cannot easily be obtained, and using the Newton algorithm requires more calculation. In order to reduce the time consumed during iteration and to further improve the accuracy of the SAR image, we propose a new ortho-rectification method based on the imaging equation. It removes the coordinate conversion by uniformly using the World Geodetic System 1984 (WGS-84). Moreover, the initial time of each DEM grid point can be set according to the iteration result of its adjacent point. Compared to other methods, such as the Collinearity Equation method, it costs less time and makes the SAR image more accurate. It is also much easier to implement in practice. Keywords: Ortho-rectification; Synthetic Aperture Radar; Digital Elevation Model; Imaging Equation.
1 Introduction
The Synthetic Aperture Radar (SAR) system is becoming a more sophisticated and effective tool for continuously monitoring changes in various Earth surface processes with few external dependencies [1][9][11][12]. Compared with airborne SAR, space-borne SAR plays a significant role in remote sensing and microwave imaging, exploring the ground from space, and is mainly used in the military field [2][3][19]. SAR digital image processing consists of two parts, one of which is to rectify distortion caused by altitude variations, velocity variations, attitude errors of the platform, rotation of the Earth, and so on. Many SAR image rectification methods have been proposed, such as ortho-rectification of Radarsat fine mode data using orbital parameters and a DEM, SAR geometry, and practical ortho-rectification of remote sensing images [5][10][13]. Other approaches emphasize interpolation of the satellite orbit vector over the whole arc, calculation of the corresponding pixel position of every three-dimensional DEM point, or both [16][18][20]. In these cases, however, the given orbital data are not always precise, and the SAR image does not need to be rectified through coordinate conversion to the inertial system if the satellite parameters are provided in a Geodetic Coordinate System (GCS).
It is not even appropriate to set the initial iteration time of each grid point to zero, and using the Newton algorithm for DEM points requires more calculation, as described in [20]. In this paper, we briefly investigate an ortho-rectification scheme for space-borne SAR and propose a new method based on the imaging equation. Experiments have been conducted on a SAR image of a mountain area to verify the effectiveness of the proposed method. The paper is organized as follows: Section 2 reviews related work, the proposed method is discussed in Section 3, Section 4 presents experimental results, and Section 5 draws conclusions.
Fig. 1. A simple model of SAR system with side-looking mode (from ref. [8])
2 Related Work
SAR adopts a side-looking imaging mode, and the side-looking angle of a SAR image is much larger than that of an optical image, as shown in Fig. 1. This mode strongly influences the geometric distortion of SAR images. Consequently, it is very important for SAR applications to rectify such distortion and create ortho-images [4]. The Range-Doppler method is commonly used in ortho-rectification; it primarily describes the relationship between image points and target points from the viewpoint of SAR imaging geometry and mostly relies on the accuracy of the fundamental catalogue data. Based upon a polynomial function, polynomial rectification is a comparatively traditional method, which approximates the overall distortion of a remote sensing image as a combination of several basic and higher-order distortions. As a result, it cannot reach equally satisfactory results for different areas of a SAR image.
The collinearity equation method, which considers the change of the sensor's exterior orientation elements first and that of the terrain later, works with the simulation and calculation of the sensor's position and attitude, and does not fully take the side-looking projective characteristics of SAR images into account. Moreover, it needs elevation information for Ground Control Points (GCP). Because of its mathematical properties, this method can be achieved by using the Range-Doppler equations together with high-precision orbit data. In fact, there are still two problems to be dealt with: one is the acquisition of precise orbit data, which has become a bottleneck of SAR image rectification; the other is the time consumption of the algorithm due to the Newton iteration. In order to overcome these two problems, we adopt the WGS-84, as shown in Fig. 2, in combination with the small time difference between two adjacent DEM grid points.
Fig. 2. Conventional terrestrial system (WGS-84) (from ref. [15])
3 Method
3.1 Problem Description
The imaging equation refers to the mathematical relationship between the coordinates of SAR image points (i, j) and ground points (φ, λ, h), on which remote sensing image rectification for any type of sensor is based [17]. On the one hand, the inertial system is a space-fixed coordinate system in which most ephemeris parameters are given. On the other hand, the Earth's rotation is one of the causes of geometric distortion, especially for space-borne SAR images. Ground points are represented in the GCS, and both the position vector RT and the velocity vector VT of a DEM grid point do not change with time. Therefore, the coordinate conversion from the inertial system to the geodetic system is no longer required; instead, more parameters are involved in the ephemeris files, such as the number of orbit state vectors and the vectors themselves, including position RS and velocity VS, which change with time.
The goal of ortho-rectification is to find the azimuth and range direction position in the SAR image corresponding to the latitude and longitude of each grid point, and in turn to assign the power value of that SAR image pixel to the point. The range position is calculated from the distance between the ground point and the SAR; the azimuth position can be obtained from the irradiation time at which the SAR phase center passes the range line of the point [7]. In spite of the difficulty of obtaining the imaging equation between (i, j) and (φ, λ, h) and its explicit solution, a nonlinear equation with regard to the irradiation time can be established between the standard Doppler frequency and that of (i, j). To solve this equation, inspired by [20], we take the iteration result of the adjacent point as the initial time of the next DEM grid point. Details are discussed as follows.
3.2 Procedures
Fig. 3 shows the procedure of the proposed ortho-rectification method, which is summarized in the following steps (a compact code sketch of Steps 3 and 4 is given after the step list).
Step 1: Calculation of RS and VS. A formula describing them over the whole orbit arc in the WGS-84 is formed from a number of discrete points on arc segments, so that their values can be calculated at the time t given later [6][20].
Step 2: Determination of RT and VT. Given a selected ellipsoid, the values of RT and VT of each (φ, λ, h) from the DEM database are determined through the conversion from the reference ellipsoid's geodetic datum to the WGS-84.
Step 3: Evaluation of the relative iteration time, based on the UTC of the image's start line. The iteration starts at t = 0 if the grid point is a starting point; otherwise, t equals the time at which the iteration of its adjacent point ended. That is,

t = 0 if the grid point is a starting point; otherwise t = t + σt .    (1)

The σt in Equation (1) is defined by

σt = − (fd − fde) / fdr ,    (2)

where fd, fde and fdr are the Doppler frequency, the Doppler centroid frequency and the Doppler frequency rate, respectively [14]. The iteration of the current point terminates when |σt| is less than a predefined threshold; otherwise t is updated and used to recalculate RS, VS, RT and VT. The threshold is determined by

|VS| · σt < ρa / 10 ,    (3)

where ρa denotes the azimuth resolution of the satellite, indicating that the azimuth error resulting from σt should be less than one-tenth of ρa.
Step 4: Calculation of the azimuth and range direction of the SAR image. The position (i, j) in azimuth and range direction is calculated by

R = | RS − RT | ,    (4)
and

i = t · PRF / N ,   j = (R − R0) / ρr ,    (5)

where R is the distance from the satellite to the ground point, PRF is the pulse repetition frequency, N is the number of sampling points in the whole image's azimuth direction, R0 is the first slant range, and ρr is the slant range resolution.
Step 5: Traversal of all grid points. The program is not finished until all DEM grid points have been processed.
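The per-grid-point iteration of Steps 3 and 4 can be summarized in the following Python sketch. The functions `satellite_state` and `doppler_terms` stand in for the orbit interpolation of Step 1 and the Doppler model of [14]; they, and the other names, are assumptions of this illustration rather than part of the paper.

```python
import numpy as np

def rectify_point(t0, R_T, satellite_state, doppler_terms,
                  PRF, N, R0, rho_r, rho_a, max_iter=50):
    """Iterate the azimuth time t for one DEM grid point (Steps 3-4).

    satellite_state(t) -> (R_S, V_S): interpolated orbit position/velocity.
    doppler_terms(t, R_S, V_S, R_T) -> (fd, fde, fdr): Doppler frequency,
    Doppler centroid and Doppler rate for the current geometry.
    Iteration starts from t0, i.e. the converged time of the adjacent grid
    point (or 0 for the first point), as in formula (1)."""
    t = t0
    for _ in range(max_iter):
        R_S, V_S = satellite_state(t)
        fd, fde, fdr = doppler_terms(t, R_S, V_S, R_T)
        sigma_t = -(fd - fde) / fdr                              # formula (2)
        t += sigma_t
        if np.linalg.norm(V_S) * abs(sigma_t) < rho_a / 10.0:    # formula (3)
            break
    R = np.linalg.norm(R_S - R_T)                                # formula (4)
    i = t * PRF / N                                              # formula (5), azimuth index
    j = (R - R0) / rho_r                                         # formula (5), range index
    return i, j, t   # t is reused as the start time of the next grid point
```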
Fig. 3. Flowchart of ortho-rectification for space-borne SAR images with the proposed method. The shadowed rectangles represent given data and the dotted lines indicate an iterative process.
Fig. 4. An image used in the experiment and its results: (a) the original space-borne SAR image; (b) the rectified image by the proposed method; (c) the lower left-hand corner of the original image; and (d) the rectified image of (c).
4 Experiments
To verify the feasibility and effectiveness of the proposed method, a space-borne SAR image with a size of 14184×11384 pixels and a resolution of 10 m is tested. It displays a mountain area around longitude 140.12°E and latitude 38.25°N and comes from the descending-pass data of the satellite. The elevation range is from -96 m to 3581 m, and the precision of the data from the DEM database is 90 m × 90 m.
4.1 Performance Analysis
Intuitively, the normal proportion between the front and back slopes in the SAR image is restored, which is particularly obvious in larger regions, as shown in Fig. 4. The calculation is decreased dramatically as a result of removing the coordinate conversion and of the evaluation of the iteration time. By comparison, the proposed method saves about four-fifths of the time, as listed in Table 1.
It is also encouraging that the accuracy of the rectified image is slightly improved when the orbital data computed in Step 1 are used. Given n as the number of GCP, the Root Mean Square (RMS) error over all points is defined as
RMS = √( (1/n) Σ_{i=1}^{n} (XRi^2 + YRi^2) ) ,    (6)
where (X, Y) are the coordinates, and XRi and YRi are the residuals in X and Y, respectively. Studies have demonstrated that the more GCP are selected, the better the result is; however, once the number of GCP reaches a certain value, adding more does not improve the result any further. After 12 GCP and a quadratic polynomial model are employed, the overall RMS error is controlled within 0.6 pixels, which guarantees the image quality.

Table 1. Comparison of time consumed during rectification (the unit of time is minutes)

Round     Range Doppler   Polynomial Function   Collinearity Equation   Proposed Method
1         18.45           18.49                 20.94                   4.87
2         17.69           19.26                 21.07                   4.96
3         19.27           20.11                 20.73                   5.01
4         17.83           20.37                 21.75                   4.73
5         18.15           19.76                 21.36                   4.39
6         16.92           18.94                 22.08                   4.15
Average   18.05           19.49                 21.32                   4.68
4.2 Program Discussion
The program, written in C++, is executed on a PC with a Pentium(R) IV 3.00 GHz CPU and 448 MB of RAM. Due to the large size of the data files, memory usage becomes a very important issue. To keep the computational complexity as low as possible, the whole DEM data set is partitioned, exploiting the relative independence between grid blocks. In addition, multi-threaded techniques are used, and the execution time is somewhat reduced when the program is run on a server. Note that the number of threads can be set freely, with reference to both the PC performance and the number of DEM partitions, from a configuration file. A sketch of this partitioning scheme is given below.
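The following sketch of the partitioning idea is written in Python rather than the authors' C++, with illustrative function names; the per-point worker is passed in as a parameter, and a thread pool is used purely to illustrate the structure (in the C++ program real threads run the blocks in parallel).

```python
from concurrent.futures import ThreadPoolExecutor

def rectify_partition(dem_block, rectify_point):
    """Process one block of DEM grid points; each point starts its iteration
    from the converged time of its predecessor, so blocks stay independent."""
    results, t_prev = [], 0.0
    for grid_point in dem_block:
        i, j, t_prev = rectify_point(grid_point, t_prev)
        results.append((i, j))
    return results

def rectify_all(dem_blocks, rectify_point, num_threads=4):
    # num_threads (and the block layout) would be read from a configuration file.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(rectify_partition, b, rectify_point) for b in dem_blocks]
        return [pt for f in futures for pt in f.result()]
```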
5 Conclusions
In this paper, an ortho-rectification method for space-borne SAR is proposed, which mainly focuses on uniformly using the WGS-84 together with a flexible choice of the starting iteration time. Experiments show that the new method needs less calculation time, results in desirable image accuracy, and is much easier to implement in practice. However, a SAR image with higher quality could be obtained if the inversion or reconstruction of the satellite orbit were carried out precisely. This issue will be studied in the future.
Acknowledgements. The research work described in this paper was fully supported by the grants from the National Natural Science Foundation of China (Project Nos. 60675011, 90820010).
References 1. Dierking, W., Dall, J.: Sea Ice Deformation State from Synthetic Aperture Radar Imagery Part II: Effects of Spatial Resolution and Noise Level. IEEE Trans. on Geoscience and Remote Sensing, 2197–2207 (2008) 2. Evans, D., Apel, J., Arvidson, R., Bindschadler, R., Carsey, F.: Space-borne Synthetic Aperture Radar: Current Status and Future Directions. NASA Technical Memorandum 4679, p. 171 (1995) 3. Fornaro, G., Lombardini, F., Pardini, M., Serafino, F., Soldovieri, F., Costantini, M.: Space-borne Multi-dimensional SAR Imaging: Current Status and Perspectives. In: IEEE International Geoscience and Remote Sensing Symposium, pp. 5277–5280 (2008) 4. Huang, G., Guo, J., Lv, J., Xiao, Z., Zhao, Z., Qiu, C.: Algorithms and Experiment on SAR Image Ortho-rectification Based on Polynomial rectification and Height Displacement Correction. In: Proceedings of the 20th ISPRS Congress, Istanbul, vol. 35, pp. 139–143 (2004) 5. Ibrahim, M., Ahmad, S.: Ortho-rectification of Stereo Spot Panchromatic and Radarsat Fine Mode Data using Orbital Parameters and Digital Elevation Model. In: Proceedings of GISdevelopment, ACRS 2000, Digital Photogrammetry (2000) 6. Li, J.: Orbit Determination of Spacecraft. National Defense Industry Press, Beijing (2003) (in Chinese) 7. Liu, L., Zhou, Y.: A Geometric Correction Technique for Space-borne SAR Image on System Level. J. Radar Science and Technology 2, 20–24 (2004) (in Chinese) 8. Dastgir, N.: Processing SAR Data using Range Doppler and Chirp Scaling Algorithms. Master’s of Science Thesis in Geodesy Report No. 3096, TRITA-GIT 07-005 (2007) 9. Nakamura, K., Wakabayashi, H., Doi, K., Shibuya, K.: Ice Flow Estimation of Shirase Glacier by Using JERS-1/SAR Image Correlation. In: IEEE International Geoscience and Remote Sensing Symposium, pp. 4213–4216 (2007) 10. Pierce, L., Kellndorfer, J., Ulaby, F., Norikane, L.: Practical SAR Ortho-rectification. In: Geoscience and Remote Sensing Symposium, IGARSS 1996, vol. 4, pp. 2329–2331 (1996) 11. Sandia National Laboratories Synthetic Aperture Radar Homepage, http://www.sandia.gov/radar/sar.html 12. Shi, L., Ivanov, A.Y., He, M., Zhao, C.: Oil Spill Mapping in the Western Part of the East China Sea Using Synthetic Aperture Radar Imagery. J. Remote Sensing 29, 6315–6329 (2008) 13. Toutin, T.: Geometric Processing of Remote Sensing Images: Models, Algorithms and Methods. J. Remote Sensing 25, 1893–1924 (2004) 14. Wen, Z., Zhou, Y., Chen, J.: Accurate Method to Calculate Space-borne SAR Doppler Parameter. J. Beijing University of Aeronautics and Astronautics 32, 1418–1421 (2006) (in Chinese) 15. Wikimedia Commons WGS-84.svg file, http://commons.wikimedia.org/wiki/File:WGS-84.svg
16. Xu, L., Yang, W., Pu, G.: Ortho-rectification of Satellite SAR Image in Mountain Area by DEM. J. Computing Techniques for Geophysical and Geochemical Exploration 26, 145– 148 (2004) (in Chinese) 17. Zhang, Y.: Remote Sensing Image Information System. The Science Press, Beijing (2000) (in Chinese) 18. Zhang, P., Huang, J., Guo, C., Xu, J.: The Disposing Method of DEM for the Simulation Imaging of SAR. J. Projectiles, Rockets, Missiles and Guidance 27, 347–350 (2007) (in Chinese) 19. Zhang, S., Long, T., Zeng, T., Ding, Z.: Space-borne Synthetic Aperture Radar Received Data Simulation Based on Airborne SAR Image Data. J. Advances in Space Research 41, 1818–1821 (2008) 20. Zhang, Y., Lin, Z., Zhang, J., Gan, M.: Geometric Rectification of SAR Image. J. Acta Geodaetica et Cartographica Sinica 31, 134–138 (2002) (in Chinese)
Robust Active Appearance Model Based Upon Multi-linear Analysis against Illumination Variation Gyeong-Sic Jo, Hyeon-Joon Moon, and Yong-Guk Kim School of Computer Engineering, Sejong University, Seoul, Korea
[email protected]
Abstract. The Independent Active Appearance Model (AAM) has been widely used in face recognition, facial expression recognition and iris recognition because of its good performance. It can also be used in real-time applications since its fitting speed is very fast. When the difference between the input image and the base appearance of the AAM is small, the fitting is smooth. However, the difference can be large because of illumination and/or pose variation in the input image, and then the fitting result is unsatisfactory. In this paper, we propose a robust AAM using multi-linear analysis, which can produce Eigen-modes within the tensor algebra framework. An Eigen-mode represents the principal axes of variation across one order of the tensor, and it can be applied to the AAM to increase robustness. In order to construct both the original AAM and the present AAM, we employ the YALE database, which consists of 10 subjects, 9 poses and 64 illumination variations. An advantage of the YALE database is that the coordinates of the landmarks marked for the training set can be used as ground truth, because when the subject and the pose are the same, the location of the face is also the same. We describe how we construct the AAM, and the results show that the proposed AAM outperforms the original AAM. Keywords: AAM, YALE database, Multi-linear Analysis, Eigen-mode, Tensor.
1 Introduction
The Active Appearance Model (AAM) is a non-linear, generative, parametric model of a certain visual phenomenon. It is frequently used for face modeling as well as for modeling other objects. The AAM was first proposed in [1] and then improved in [2], which models shape and appearance separately. The AAM is computed from a training set, which consists of pairs of images and landmarks marked manually by hand. Generally, AAM fitting performs successfully when the error between the input image and the base appearance of the AAM is low. However, the error becomes high when illumination and 3D pose change, and then the fitting result is unsatisfactory. In this paper, we propose a new AAM which contains Eigen-modes based upon multi-linear analysis. Multi-linear analysis is an extension of Singular Value Decomposition (SVD) or PCA, and offers a unifying mathematical framework
suitable for addressing a variety of computer vision problems. Multi-linear analysis builds subspaces for the orders of the tensor together with a core tensor. The advantage of multi-linear analysis is that the core tensor can transform a subspace into an Eigen-mode, which represents the principal axes of variation across a particular mode (people, pose, illumination, etc.) [9]. In contrast, PCA basis vectors represent only the principal axes of variation across images. In other words, an Eigen-mode covers the variation of each mode, whereas PCA vectors cover all variations together. We can thus build an AAM which includes not only the principal axes of variation across images but also the variation across the various modes. This paper is organized as follows. Sections 2 and 3 explain the AAM and multi-linear analysis. Then, in Section 4, we describe how to apply multi-linear analysis to the AAM. Finally, we show our experimental results and summarize our work in Section 5.
2 Independent Active Appearance Models
The independent AAM models shape and appearance separately [2]. The shape of the AAM is defined by a mesh, given by the locations of its vertices. Because the AAM allows linear shape variation, the shape s can be written as

s = s0 + Σi pi si ,    (1)

where the coefficients pi are the shape parameters, s0 is the base shape, and the si are the shape vectors. The shape vectors can be obtained by applying PCA to the training set after using Procrustes analysis to normalize the landmarks. The appearance of the AAM is defined within the base shape s0; that is, the appearance is defined over the pixels x that lie inside the base shape s0. The AAM allows linear appearance variation, so the appearance A(x) can be written as

A(x) = A0(x) + Σi λi Ai(x) ,    (2)

where the λi are the appearance parameters, the Ai are the appearance vectors, and A0 is the base appearance. After both the shape parameters and the appearance parameters are found, the AAM instance is generated by warping each pixel of the appearance into the interior of the current shape with a piecewise affine warp W(x; p). The model instance M can thus be expressed as

M(W(x; p)) = A(x) .    (3)

The parameters of both shape and appearance are obtained by a fitting algorithm.
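In code, the linear shape and appearance models of equations (1)-(3) amount to the following Python sketch; the variable names are illustrative, and the piecewise affine warp is left abstract as a function argument.

```python
import numpy as np

def aam_shape(s0, shape_vectors, p):
    """Equation (1): s = s0 + sum_i p_i * s_i."""
    return s0 + np.tensordot(p, shape_vectors, axes=1)

def aam_appearance(A0, appearance_vectors, lam):
    """Equation (2): A(x) = A0(x) + sum_i lambda_i * A_i(x)."""
    return A0 + np.tensordot(lam, appearance_vectors, axes=1)

def aam_instance(s0, shape_vectors, p, A0, appearance_vectors, lam, warp):
    """Equation (3): warp the appearance defined over the base shape s0 onto
    the current shape s using a piecewise affine warp supplied by `warp`."""
    s = aam_shape(s0, shape_vectors, p)
    A = aam_appearance(A0, appearance_vectors, lam)
    return warp(A, s0, s)   # warp(...) is assumed to implement W(x; p)
```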
Fig. 1. Unfolding a 3rd-order tensor of dimension 3×4×5
3 Multi-linear Analysis
3.1 Tensor Algebra
Multi-linear analysis is based on higher-order tensors. A tensor, also known as an n-way array, multidimensional matrix or n-mode matrix, is a higher-order generalization of vectors and matrices: the order of a vector, a matrix and an N-mode tensor is 1st, 2nd and Nth, respectively. In order to manipulate a tensor easily, it is unfolded into a matrix by stacking its mode-n vectors as the columns of that matrix; Figure 1 shows the unfolding process. The mode-n product of a higher-order tensor by a matrix M is again a tensor, and its entries are computed by
…
…
∑
…
…
.
(4)
This mode-n product of tensor and matrix can be represented in terms of unfolded matrices, B
MA
.
(5)
3.2 Tensor Decomposing In order to decompose the tensor, we employee Higher Order Singular Value Decomposition (HOSVD). HOSVD is an extension of SVD that expresses the tensor as the mode-n product of N-orthogonal spaces
670
G.-S. Jo, H.-J. Moon, and Y.-G. Kim
U
…
U …
U .
(6)
In equation (6), U are mode matrix that contains the orthonormal vectors spanning the column space of the matrix D which is result of unfolding tensor . Tensor , called the core tensor, is analogous to the diagonal singular value matrix of conventional matrix SVD, but it is does not have a diagonal structure. The core tensor is in general a full tensor. The core tensor governs the interaction between the mode matrix U , where is 1, 2, …, N. Procedure of tensor decomposition using HOSVD can be expressed as follows • •
and set up matrix U with left
Compute the SVD of unfolded matrix D singular matrix of SVD. Solve for the core tensor as follows U
…
U …
T .
(7)
4 Applying Multi-linear Analysis to AAMs In section 2 and 3, we described AAMs and multi-linear analysis. Now, we explain how to apply multi-linear analysis to AAMs. Since, in independent AAMs, the appearance vectors of AAMs influence the fitting result poorly and the shape of AAMs is not influenced by changing illumination, we consider identify and pose for AAMs. To build AAMs using multi-linear analysis, we construct a third-order tensor to represent identity, poses, and features. Using HOSVD, we can decompose the tensor into three factors as follows U
U
U ,
(8)
where is the core tensor that governs the interaction among the three mode matrices(U , U , and U ). Using core tensor and mode matrix U , we can build eigen-mode as U .
(9)
Since the AAM add Eigen-mode, we rewrite equation (1) as follow: ,
(10)
where the additional coefficients are the parameters of the Eigen-modes. The advantage of the AAM based upon Eigen-modes is that the shape remains stable under a higher error rate between the base appearance and the input image, which can be caused by changing illumination and pose, because each Eigen-mode considers only its own mode rather than the whole training set. Figure 2 compares the fitting results of the present AAM and the traditional AAM.
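A compact numpy sketch of the construction described in Sections 3 and 4 (HOSVD of a landmark tensor, extraction of Eigen-modes, and their use as additional shape basis vectors) is given below. The tensor layout (identity × pose × shape features) and the function names are assumptions of this illustration, not the authors' implementation.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T):
    """Higher Order SVD: one mode matrix per unfolding, plus the core tensor
    obtained by multiplying T with the transposed mode matrices."""
    U = [np.linalg.svd(unfold(T, n), full_matrices=False)[0] for n in range(T.ndim)]
    Z = T.copy()
    for n, Un in enumerate(U):
        # mode-n product of Z with Un^T
        Z = np.moveaxis(np.tensordot(Un.T, np.moveaxis(Z, n, 0), axes=1), 0, n)
    return Z, U

# Landmark tensor: identities x poses x (2 * number of landmarks); random
# data stands in for the training shapes here.
T = np.random.rand(9, 9, 128)
Z, (U_id, U_pose, U_feat) = hosvd(T)

# Project the core tensor back onto the feature mode matrix to obtain
# Eigen-mode vectors; these can then be appended to the shape basis of
# equation (1) as extra basis vectors, giving the augmented shape model.
eigen_modes = np.tensordot(Z, U_feat, axes=([2], [1]))
```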
Fig. 2. The fitting results of the present AAM(bottom) and traditional AAM(top).
In Figure 2, the shape produced by the traditional AAM (top) is not able to cover the darker region of the face. In contrast, the shape produced by the present AAM (bottom) covers the darker region of the face very well.
5 Experiments and Evaluation
We employ the YALE face database B [8], which consists of 10 subjects, 9 poses and 64 illuminations, for AAM training and experiments. In the YALE face database, when the subject and the pose are the same, the location of the face is also the same even though the illumination changes. This property allows us to use the landmarks marked for the training set as ground truth, because the coordinates of the landmarks do not change within a category defined by the same subject and pose. In order to build both AAMs, we constructed a training set consisting of images of 9 subjects, 9 poses and 1 illumination, together with meshes made by marking 64 landmarks on each image. Ground truth was established by the meshes made for the training set, and the test images have different subjects and poses. The experiments were divided into two evaluations: one tested how quickly each AAM ran the fitting algorithm, and the other evaluated how accurately each model fitted the images.
5.1 Efficiency Comparison
The fitting speed of an AAM is important for applying the AAM to real-time systems. We compared the fitting speed of both AAMs on a quad-core computer with a 2.4 GHz CPU and 2 GB of RAM. The fitting algorithm was run for 5 iterations, and we measured the time spent running the fitting algorithm per iteration and over all iterations. The traditional AAM used 11 parameters (4 global transform parameters and 7 local transform parameters), and our AAM used 18 parameters (4 global transform parameters, 7 local transform parameters, and 7 mode transform parameters).
Table 1. The speed of the fitting algorithm for both AAMs

                   1      2      3      4      5      Avg.
Traditional AAM    7ms    6ms    6ms    6ms    7ms    6.4ms
Present AAM        7ms    7ms    7ms    7ms    7ms    7ms
Table 1 illustrates that the elapsed times are similar, although our AAM used more parameters than the traditional AAM.
5.2 Robustness Experiments
We evaluated how accurately our AAM fits images under a higher error rate between the base appearance and the input image. Our evaluation procedure can be expressed as follows (a sketch of the error measure is given after the list):
• Dividing the images into 4 categories, where each category consists of images whose average pixel errors are 40~49, 50~59, 60~69 and 70~79, respectively.
• Fitting the images and measuring, at every iteration, the coordinate error between the ground truth and the fitted shape.
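The per-iteration error measure can be written as a one-line function; the use of the L1 norm is stated in the text below, while the array layout assumed here (an N×2 matrix of landmark coordinates) is an assumption of this sketch.

```python
import numpy as np

def shape_error_l1(fitted_landmarks, ground_truth_landmarks):
    """Sum of absolute coordinate differences (L1 norm) between the fitted
    shape and the ground-truth landmarks, recorded after every iteration."""
    return np.abs(fitted_landmarks - ground_truth_landmarks).sum()
```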
Fig. 3. Fitting error of both of AAMs. Each graph represents shape error per iteration under pixels error 40~49(top left), 50~59(top right), 60~69(bottom left), and 70~79(bottom right).
We employed the L1 norm for measuring coordinate errors. In Figure 3, each graph represents the fitting error per iteration for one category. As the average pixel error increases, the fitting error of the traditional AAM also increases, whereas the fitting error of our AAM does not increase.
6 Conclusion
In this paper, we proposed an AAM based upon Eigen-modes. In order to establish this AAM, we built the Eigen-modes using multi-linear analysis, which employs HOSVD to decompose the tensor. We have shown that the present AAM is able to fit images quickly even though the number of parameters is increased, and that it can fit images under a higher error rate. Since the present AAM is fast in fitting diverse images, it could be applied to real-time systems. We plan to apply our AAM to real-time face recognition and facial expression recognition tasks.
Acknowledgement This work was supported by the Seoul R&BD Program (10581).
References 1. Edwards, G.J., Taylor, C.J., Cootes, T.F.: Interpreting Face Images Using Active Appearance Models. In: Proc. International Conference on Automatic Face and Gesture Recognition, pp. 300–305 (June 1998) 2. Matthews, I., Baker, S.: Active Appearance Models revisited. International Journal of Computer Vision, 135–164 (2004) 3. Cootes, T., Edwards, G., Taylor, C.: Active appearance Models. IEEE Transactions on Pattern Analysis and Machine Intelligence 23(6) (2001) 4. Gross, R., Matthews, I., Baker, S.: Constructing and fitting active appearance models with occlusion. In: Proceedings of the IEEE Workshop on face processing in Video (June 2004) 5. Turk, M., Pentland, A.: Eigenfaces for Recognition. Journal of Congitive Neuroscienc 3(1), 71–86 (1991) 6. De Lathauwer, L., De Moor, B., Vandewalle, J.: A Multilinear Singular Value Decomposition. SIAM Journal of Matrix Analysis and Applications 21(4) (2000) 7. Vasilescu, M.A.O., Terzopoulos, D.: Multilinear analysis of image ensembles: TensorFaces. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2350, pp. 447–460. Springer, Heidelberg (2002) 8. Georghiades, A.S., Belhumeur, P.N., Kriegman, K.J.: From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose. IEEE Trans. Pattern Analysis and Machine Intelligence 23(6), 643–660 (2001)
Modeling and Simulation of Human Interaction Based on Mutual Beliefs Taro Kanno, Atsushi Watanabe, and Kazuo Furuta 7-3-1 Hongo, Bunkyo-ku, Tokyo,113-8656, Japan {kanno,watanabe,furuta}@sys.t.u-tokyo.ac.jp
Abstract. This paper presents the modeling and simulation of human-human interaction based on a concept of mutual beliefs, aiming to describe and investigate the cognitive mechanism behind human interactions that is a crucial factor for system design and assessment. The proposed model captures four important aspects of human interactions: beliefs structure, mental states and cognitive components, cognitive and belief inference processes, and metacognitive manipulations. This model was implemented with a Bayesian belief network and some test simulations were carried out. Results showed that some basic qualitative characteristics of human interactions as well as the effectiveness of mutual beliefs could be well simulated. The paper concludes by discussing the possibility of the application of this model and simulation to universal access and HCI design and assessment. Keywords: Human Modeling, Team Cognition, Interaction, Sharedness, Mutual Beliefs, Agent Simulation, Design and Assessment.
1 Introduction Although it has received relatively little attention, one of the important issues in the study of human-computer interaction and universal access is that of cognitive factors specific to the interaction among several persons through computers and IT systems, as well as that of a team as a whole with computers or IT systems. CSCW is one of the research fields studying such interaction through computer systems. The field, however, has heretofore focused mainly on how collaborative activities can be supported or mediated by means of computer systems [1], while paying less attention to the cognitive mechanism behind cooperative activities. One of the reasons for this human-centered, but not “humans-centered”, approach in HCI or UA studies seems to be the lack of a sound theory and user modeling that describes the mechanism behind human cooperation in terms of cognitive models. With such a cognitive user model for cooperation, various cognitive simulations, similar to those of individual user cognition by architectures such as ACT-R and SOAR, will become possible [2,3], resulting in a more developed understanding of cognitive factors in cooperation. This paper presents the modeling and simulation of human-human interaction based on a concept of mutual belief. In Section 2, the details of the model are introduced, and in Section 3, the model’s implementation with a Bayesian belief network is explained. In Section 4, the simulation architecture and the results of some test simulations are explained. C. Stephanidis (Ed.): Universal Access in HCI, Part II, HCII 2009, LNCS 5615, pp. 674–683, 2009. © Springer-Verlag Berlin Heidelberg 2009
Section 5 concludes the paper by discussing the possible application of this model and simulation to universal access and HCI design and assessment.
2 Model of Team Cognition In our previous studies [4], we proposed a model of team cognition. The model was intended to describe inter-personal and intra-team factors in cognition in terms of beliefs about a partner’s cognition as well as one’s own cognition. The model consists of a set of three layers of mental components (both cognitive processes and beliefs) and their interactions. In a dyadic case (A and B), the model is composed of:
a) Ma = A’s cognition, Mb = B’s cognition (individual cognition except beliefs)
b) Ma’ = A’s belief about Mb, Mb’ = B’s belief about Ma (belief in another member’s cognition)
c) Ma’’ = A’s belief about Mb’, Mb’’ = B’s belief about Ma’ (belief in another member’s belief)
Fig. 1 shows a schematic of this model, depicting three aspects of team cognition: belief structure, mental components, and the inter- and intra-personal interactions of these mental components. Details of each aspect are explained below.
Fig. 1. Team Cognition
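As a reading aid, the three-layer structure for a dyad can be pictured as a small data structure; the sketch below is an illustrative assumption added by the editor, not part of the original implementation.

```python
from dataclasses import dataclass, field

@dataclass
class AgentBeliefs:
    """Three-layer mental model held by one agent in a dyadic interaction."""
    own_cognition: dict = field(default_factory=dict)                 # Ma: the agent's own cognition
    belief_about_partner: dict = field(default_factory=dict)          # Ma': belief about the partner's cognition (Mb)
    belief_about_partners_belief: dict = field(default_factory=dict)  # Ma'': belief about the partner's belief (Mb')

# A dyad (A, B) is then simply two such structures:
agent_a, agent_b = AgentBeliefs(), AgentBeliefs()
```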
2.1 Belief Structure The ability to infer or simulate the minds of others, that is, to obtain the beliefs of others, is believed to be innate and essential to human-beings [5]. It is necessary therefore to consider this aspect when modeling team cognition. We model team cognition with a structure of mutual belief based on the philosophical study of both team and collective intention [6]. Mutual belief is a set of beliefs hierarchically justifiable, such as in the above condition (b) and (c). Although theoretically mutual beliefs
continue infinitely, empirically two or three are sufficient for actual cooperation. Most of the related theories have referred to the importance of the ability to infer and simulate the intentions of others in cooperative activities, while less attention has been paid to the function of beliefs in the third layer (belief in the beliefs of others). There is a high possibility that the third layer has a function in detecting and explaining, as well as recovering from, conflicts among team members. 2.2 Mental Components People can infer, simulate, feel, and share various aspects of mentality, including cognitive processes, mental states, knowledge, attitudes, and emotions. We refer to these as mental components in this paper. The circles of each layer in Fig. 1 represent mental components. If we can infer that we share some mental components or constructs with others, then they can be mapped onto one’s belief structure. For example, when one person gets angry (A’s first layer), then another person can easily understand or feel that anger (B’s second layer), and at the same time the person who has gotten angry can also infer or expect how the other person perceives their emotion (A’s third layer). Recent work has provided a listing of typical mental components identified by qualitative meta-analysis of recent HCI conference papers [7]. The mechanism and relations among the mental components in a single layer correspond to a model of individual cognition. It is therefore possible to incorporate such a model into each layer of the model shown in Fig.2. 2.3 Process and Manipulation The status of team cognition can be determined by the combination of its process and the status of its mental components. This combination is a key issue in understanding how a team member obtains and updates mental status for establishing and maintaining team cooperation. Communication and the observation of the behavior of partners are the main methods of human interaction. Much research on team cognition has analyzed such observable data to evaluate the efficiency and effectiveness of team cooperation. It is, however, obvious that the analysis of such phenotype interactions cannot directly explain or describe the mechanism behind cooperation because such observable behaviors are the results of the process of team cognition and not that of the reasoning involved in such a process. Indeed, there is another type of interaction involved in team cognition: intra-personal manipulation of mental components such as logical inferences, projections, and prototypes, including beliefs, in a single layer or between different layers,. Based on the status of one’s own mental components and the interrelations among the different layers, a person takes action (i.e., observing or communicating with others) to modify their own mental components as well as proactively influence those of their partner. Note that this type of interaction can be the sole reason for proactive interaction in team cooperation, thus providing a genotype of communication and behaviors. 2.4 Interaction Genotype Fig. 2 illustrates the relations between such observable behaviors and the mechanism and process behind them in communication. The upper two levels correspond to what
is talked about and its function, which can be analyzed from verbal protocols. Conventional protocol analysis has dealt with these aspects by analyzing the transcripts of verbal protocols. The lower two levels correspond to the drive or reasoning behind such observable interactions. We call the former type of communication “phenotype” and the latter “genotype”, using an analogy of the categorization of human errors [8].

Table 1. Interaction Genotypes

Category 1 – To drive and modify the process of each single layer (cognition and beliefs). Genotype code (reason/objective): lack of necessary/adequate information or knowledge of mental components; lack of confidence in beliefs. Phenotype (performative): Query, Confirm.
Category 2 – To help the partner drive their process (update their cognition and beliefs). Genotype code (reason/objective): belief in the lack of necessary/adequate information about mental components; to avoid conflicts; look-ahead; just for sharedness. Phenotype (performative): Inform.
Category 3 – To modify the partner’s cognition and beliefs. Genotype code (reason/objective): to avoid and recover from conflict in the status of mental components; to correct misunderstandings. Phenotype (performative): Inform, Query, Confirm.

Fig. 2. Phenotype and Genotype of Interaction
To elicit such interaction genotypes in team cognition, we conducted a qualitative analysis of several kinds of data obtained in team tasks: verbal protocols, post-experiment interviews, and observers’ descriptions of the team behaviors. The inner manipulations of mental components and the interaction genotypes obtained to date are listed in Table 1. The second column shows the codes identified in the data, i.e., the reasons behind verbal communication and observable behavior (the phenotypes), and the left column shows the code categories.
3 Simulation Model This section describes how the conceptual model was converted into a computational model. 3.1 Cognitive and Inference Process and Cognitive Status To simulate the non-monotonic human reasoning/inference process based on an uncertain and limited amount of information, a Bayesian Belief Network (BBN) was adopted for the representation of such a process in each layer of the team cognition model. BBNs are probabilistic graphical models consisting of nodes and links. A node for the team cognition model represents a type of cognitive status, such as situation awareness, and the probability of each node represents the degree of belief in the occurrence of the event. A link represents a causal relationship between two different nodes and a conditional probability is assigned to it. The team cognition model can be implemented with six BBNs (three layers * two persons) and the interactions among them. The cognitive task for the simulation performed in this study was to cooperatively achieve situation awareness. Specifically, a two-person team first obtained information from the environment or a partner and updated the probability of the corresponding nodes, and then all the probabilities of the entire BBN were calculated. In the simulation, conscious awareness of the occurrence of events was defined by Equation 1. U represents a set of the nodes of which the person is aware. Pi represents the occurrence probability of Node i. T is the threshold of the probability of becoming aware of the occurrence of events.
U = { i | Pi ≥ T }    (1)
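A minimal sketch of Equation 1, assuming node probabilities are kept in a dictionary produced by the BBN update; the data layout and the example node names are assumptions, since the paper does not specify its BBN implementation.

```python
def awareness_set(node_probabilities, threshold):
    """Equation 1: the set U of nodes whose occurrence probability Pi reaches the threshold T."""
    return {node for node, p in node_probabilities.items() if p >= threshold}

# Example: with T = 0.7 the agent is aware only of nodes whose probability is at least 0.7.
U = awareness_set({"alarm": 0.9, "pump_failure": 0.4, "leak": 0.75}, threshold=0.7)
# U == {"alarm", "leak"}
```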
3.2 Interaction between Different Layers It is reasonable to suppose that there are unconscious or subconscious interactions between different layers, for example, between one’s own cognitive processes and the processes used in inferring a partner’s cognitive status. In a previous study, some evidence for this interaction was observed [9]; for example, people sometimes tended to believe, without evidence, that a partner might see the same information as they saw. In a computational model, this is represented as the manipulation of the probabilities of the corresponding two nodes between different layers. Two interaction effects, defined by Equations 2 and 3, were implemented in the present study. α in Equation 2 represents the effect of one’s own cognition on the belief layers, while β in Equation 3 represents the effect of the belief about the partner’s cognition on one’s own cognition.

Pi = αP1    (2)
P1 = βP2    (3)
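The two inter-layer effects can be read as simple probability propagation between corresponding nodes of adjacent layers; the sketch below is an illustrative reading of Equations 2 and 3, with the function names and the clamping to 1.0 added as assumptions.

```python
def propagate_own_to_belief(p_layer1, alpha):
    """Equation 2: one's own cognition (layer 1) shapes the corresponding belief-layer node."""
    return min(1.0, alpha * p_layer1)

def propagate_belief_to_own(p_layer2, beta):
    """Equation 3: the belief about the partner (layer 2) feeds back into one's own cognition."""
    return min(1.0, beta * p_layer2)
```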
3.3 Communication Generation As shown in Table 1, three types of interaction genotype have been obtained to date. In the following simulations, only the third one was implemented in the computational model. The rules derived from this genotype are defined by Equations 4 and 5.

If Ua1 ≠ Ua2: if Ua2 is believed to be false, then Inform(Ua1) to Modify(Ub1); if Ua1 is believed to be false, then Correct(Ua1) based on Ua2.    (4)
If Ua1 ≠ Ua3, then Inform(Ua1) to Modify(Ub2).    (5)
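A sketch of how the rules of Equations 4 and 5 could be expressed procedurally for agent A; the message constructors and the way awareness sets are compared are assumptions made for illustration, not the authors' implementation.

```python
def generate_messages(Ua1, Ua2, Ua3, ua2_believed_false, ua1_believed_false):
    """Decide what agent A communicates, given its three awareness sets.

    Ua1: A's own awareness; Ua2: A's belief about B's awareness;
    Ua3: A's belief about B's belief about A's awareness.
    """
    messages = []
    if Ua1 != Ua2:                                  # Rule (4): mismatch with the partner detected
        if ua2_believed_false:
            messages.append(("inform", Ua1))        # try to modify B's first layer (Ub1)
        elif ua1_believed_false:
            Ua1 = set(Ua2)                          # correct one's own cognition based on Ua2
    if Ua1 != Ua3:                                  # Rule (5): B seems to misjudge what A knows
        messages.append(("inform", Ua1))            # try to modify B's belief about A (Ub2)
    return Ua1, messages
```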
4 Simulation The process of obtaining shared situation awareness between agents A and B was simulated. Each agent has its own three layers of BBN. By the combination of these six BBNs, the distribution of knowledge, or heterogeneity of agents, can be represented. An example of the BBNs is shown in Fig. 3. The algorithm of the simulation is illustrated in Fig.4. The left upper nodes are those possessed only by Agent A, while the right-most node is the representative node for the events that Agent A cannot perceive but Agent B can.
Fig. 3. Agent A’s 1st Layer
Fig. 4. Overview of the Simulation
This is a scenario-based simulation in which each agent obtains information from the environment sequentially based on the scenario and in which all occurrence probabilities are updated following the process shown in Fig.4. 4.1 Agent Characteristics The characteristics of an agent can be defined by its tendency in deciding the correct nodes between the 1st and 2nd layer, that is, the extent to which the agent has self confidence on their own cognition. The four characteristics shown in Table 2 were defined and implemented for the following simulation. Table 2. Agent Characteristics
Type 1 – Strong self-confidence: believe one’s own cognition (U1 is correct).
Type 2 – Following blindly: follow one’s partner’s cognition (U2 is correct).
Type 3 – Balanced: decide that the one with more detailed knowledge is correct.
Type 4 – Balanced (2): characteristics of Type 3 without the third layer.
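The four agent types can be read as a conflict-resolution policy between layers 1 and 2; the sketch below encodes Table 2 under the assumption that "more detailed knowledge" is approximated by a numeric detail score, which the paper does not specify.

```python
def resolve_conflict(agent_type, U1, U2, own_detail, partner_detail):
    """Pick which awareness set to trust when U1 (own) and U2 (belief about partner) disagree."""
    if agent_type == 1:          # strong self-confidence
        return U1
    if agent_type == 2:          # following blindly
        return U2
    # Types 3 and 4 are balanced: trust whoever has the more detailed knowledge.
    # (Type 4 behaves like Type 3 but runs without the third belief layer.)
    return U1 if own_detail >= partner_detail else U2
```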
4.2 Evaluation Criteria To assess the performance of the cooperation between the two agents, accuracy and sharedness, defined by Equations 6 and 7, respectively, were introduced. In Equations 6 and 7, U0 refers to the correct set of nodes that actually occurred in the scenario. Accuracy measures how correctly the team of Agent A and B is aware of the events that actually occur. The first term in sharedness is the completeness of the belief in the partner’s cognition (1st layer), while the second term represents the accuracy of the belief in the partner’s cognition (1st layer) [10, 11]. (6)
(7)
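The bodies of Equations 6 and 7 are not reproduced above; the sketch below therefore only illustrates one plausible set-based reading of the verbal definitions (team accuracy against U0, and completeness plus correctness of the belief about the partner's first layer) and should not be taken as the authors' exact formulas.

```python
def accuracy(Ua1, Ub1, U0):
    """How much of the actually occurring event set U0 the team (A or B) is aware of (assumed form)."""
    return len((Ua1 | Ub1) & U0) / len(U0) if U0 else 0.0

def sharedness(Ua2, Ub1):
    """Completeness and correctness of A's belief about B's cognition, averaged (assumed form)."""
    completeness = len(Ua2 & Ub1) / len(Ub1) if Ub1 else 0.0
    correctness = len(Ua2 & Ub1) / len(Ua2) if Ua2 else 0.0
    return (completeness + correctness) / 2
```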
5 Results and Discussion Simulation was conducted with the different agent characteristics combinations. The tested combinations are shown in Table 3, and comparisons of the accuracy and sharedness of each team are shown in Fig.5. 40 trials for each team condition were conducted. The results show that Team A received the lowest score for both accuracy and sharedness. It was observed from the communication log that each agent insisted on their correctness and did not complement their own cognitions with their partner’s. Team B scored the highest for sharedness but not accuracy because the members were strongly mutually dependent on their partners and did not take advantage of the merit of distributed knowledge. Teams C and D exhibited good performance for both accuracy and sharedness. It was also found from the comparison between Teams C and D that activation of the third layer (beliefs about beliefs) was effective on team performance. From the communication log, it was found that feedback (acknowledgement) to the speaker made communication more efficient and effective in Team D. This matches the concept of closed loop communication believed to be one of the important team competencies [12]. Table 3. Combinations of Agent Characteristics
Team A: Agent A = Type 1, Agent B = Type 1
Team B: Agent A = Type 2, Agent B = Type 2
Team C: Agent A = Type 4, Agent B = Type 4
Team D: Agent A = Type 3, Agent B = Type 3

Fig. 5. Accuracy and Sharedness Results
6 Conclusion This paper introduced a model for the simulation of human cooperative activities based on a concept of mutual belief. One of the characteristics of this model is that it captures the mechanism behind cooperation not in terms of team function or macrocognition [13,14] but in terms of a cognitive user model (process and status). Another important characteristic is that the model separates metacognitive processes for cooperation (vertical) from cognitive/inference processes (horizontal). The model can therefore be combined with almost all types of cognitive user models, including Card’s information processing model [15], Norman’s model [16], and Simplex2 [7], when applying them to the cognitive aspects of human cooperation. The simulation results showed that some basic qualitative characteristics of human cooperation were reproduced, suggesting in particular that consideration of what one’s partner is thinking about oneself (activation of the third layer) is effective for good team performance. Although further testing under various conditions is necessary to assess the validity of this model, the current results show the potential of our simulation to provide a testbed environment for human cooperation that would otherwise be difficult to prepare using laboratory experiments or field tests. This type of simulation could also be utilized for the design and assessment of HCI and UA for cooperation and collaboration, such as in the assessment of usability and accessibility through the simulation of the sharing processes of certain mental aspects or components.
References 1. Wilson, P.: Computer Supported Cooperative Work: An Introduction. Kluwer Academic Publishers, Dordrecht (1991) 2. Anderson, J.R., Lebiere, C.: The Atomic Components of Thought. Erlbaum, Mahwah (1998) 3. Newell: Unified Theories of Cognition. Harvard University Press (1990) 4. Kanno, T., Furuta, K.: Sharing Awareness, Intention, and Belief. In: Proc. 2nd Int. Conf. Augmented Cognition, pp. 230–235 (2006) 5. Baron-Cohen, S.: Mindblindnes. The MIT Press, Cambridge (1997) 6. Tuomela, R., Miller, K.: We-intentions. Philosophical Studies 53, 367–389 (1987) 7. Adams, R.: Decision and stress: cognition and e-accessibility in the information workplace. Univ. Access Inf. Soc. 5, 363–379 (2007) 8. Hollnagel, E.: The phenotype of erroneous actions: Implications for HCI design. In: Weir, G., Alty, J. (eds.) Human-Computer Interaction and Complex Systems. Academic Press, London (1990) 9. Kitahara, Y., Hope, T., Kanno, T., Furuta, K.: Developing an understanding of genotypes in studies of shared intention. In: Proc. 2nd Int. Conf. Applied Human Factors and Ergonomics, CD-ROM (2008) 10. Kanno, T.: The Notion of Sharedness based on Mutual Belief. In: Proc. 12th. Int. Conf. Human-Computer Interaction, pp. 1347–1351 (2007) 11. Shu, Y., Furuta, K.: An inference method of team situation awareness based on mutual awareness. Cognition, Technology, and Work 7, 272–287 (2005)
12. Guzzo, R.A., Salas, E. (Associates eds.): Team effectiveness and decision-making in organizations, pp. 333–380. Pfeiffer (1995) 13. Letsky, P.M., Warner, W.N., Fiore, M.S., Smith, C.A.P.: Macrocognition in Teams, Ashgate (2008) 14. Salas, E., Fiore, M.S.: Team Cognition. American Psychology Association (2004) 15. Card, S.K., Moran, T.P., Newell, A.: The Psychology of Human-Computer Interaction. Lawrence Erlbaum Associates, Mahwah (1983) 16. Norman, D.A.: Cognitive Engineering. In: Norman, D.A., Draper, S.W. (eds.) User Centered System Design, ch. 3, pp. 31–61. Erlbaum, Hillsdale (1986)
Development of Open Platform Based Adaptive HCI Concepts for Elderly Users Jan-Paul Leuteritz1, Harald Widlroither1, Alexandros Mourouzis2, Maria Panou2, Margherita Antona3, and Asterios Leonidis3 1
Fraunhofer IAO / University of Stuttgart IAT, Nobelstr. 12, 70569 Stuttgart, Germany {jan-paul.leuteritz,harald.widlroither}@iao.fraunhofer.de 2 Centre for Research and Technology Hellas, Hellenic Institute of Transport, Thessaloniki, Greece {mourouzi,panou}@certh.gr 3 Foundation for Research and Technology - Hellas, Institute of Computer Science, Heraklion, Greece {anona,leonidis}@ics.forth.gr
Abstract. This paper describes the framework and development process of adaptive user interfaces within the OASIS project. After presenting a rationale for user interface adaptation to address the needs and requirements of older users, the paper presents and discusses the architecture and functionality of the OASIS adaptation framework, focussing in particular on an advanced library of adaptive widgets, as well as on the process of elaborating the adaptation rules. The results of the adopted approach are discussed and hints at future developments are provided. Keywords: Automatic user interface adaptation, Unified User Interface Design, adaptive widgets, adaptation decision-making.
1 Introduction Over the last 50 years, the number of older persons worldwide has tripled - and will more than triple again over the next 50-year period as the annual growth of the older population (1.9%) is significantly higher than that of the total population (1.02%). The European Commission has predicted that between 1995 and 2025 the UK alone will see a 44% rise in people over 60, while in the United States the baby-boomer generation which consists of about 76 million people and is the largest group ever in the U.S., is heading towards retirement [7]. This situation asks for new solutions towards improving the independence, the quality of life, and the active ageing of older citizens. Although substantial advances have been made in applying technology for the benefit of older persons, a lot of work remains to be done. Notably, only 13% of people aged over 65 are Internet users, while the average in Europe is 51%. Recent advancements in Information Society research have tremendous potential for meeting the emerging needs of older people and for further improving their quality C. Stephanidis (Ed.): Universal Access in HCI, Part II, HCII 2009, LNCS 5615, pp. 684–693, 2009. © Springer-Verlag Berlin Heidelberg 2009
of life. OASIS is an Integrated Project of the 7th FP of the EC in the area of eInclusion that aims at increasing the quality of life and the autonomy of elderly people by facilitating their access to innovative web-based services. OASIS stands for “Open architecture for Accessible Services Integration and Standardisation”, which hints at the project’s way towards making this vision a reality: OASIS aims at creating an open reference architecture, which allows not only for a seamless interconnection of Web services, but also for plug-and-play of new services. In order to give the OASIS architecture a critical mass for widespread implementation, the project consortium will make the reference architecture in question, and the related tools, available as open source. 12 initial services have been selected for prototype development in the project’s lifetime. They are joined into three main categories considered vital for the quality of life enhancement of the elderly: Independent Living Applications, Autonomous Mobility, and Smart Workplaces Applications [8]. OASIS aims at creating an open system. Not only new Web services will be able to connect via the hyper-ontological framework. New applications, that process information from different Web services in an innovative manner, are expected to emerge frequently. One main advantage of this approach is that it enables developers to make any new Web service or application available to a large community of elderly users through the OASIS platform. The OASIS approach aims at delivering all such services in appropriate forms optimally tailored to diverse interaction needs through the OASIS advanced approach to user interface adaptation. This paper focuses on the R&D approach of the project towards ensuring high quality interaction for older users, building on personalisation and adaptation techniques. The chosen methods for automatic user interface adaptation and rules generation are introduced and discussed here. Their purpose is: • to facilitate the development of interactive applications and services for different platforms; • to develop various accessibility components that can be used across the range of interaction devices supported by the project; • to enable the personalisation of interactions, as well as automatic tailoring-to-device capabilities and characteristics, thus offering an individualised user experience; • to develop components that facilitate the rapid prototyping of accessible and selfadaptive interfaces for the project’s range of supported devices.
2 Background 2.1 Older Users as a Target Group Older people are increasingly becoming the dominant group of customers of a variety of products and services (both in terms of number and buying power) [7]. This user group, large and diverse in its physical, sensory, and cognitive capabilities, can benefit from technological applications which can enable them to retain their independent living, and ultimately reduce health care expenditure. Although older people are not generally considered to have disabilities, the natural ageing process carries some degenerative ability changes, which can include diminished vision, varying degrees of hearing loss, psychomotor impairments, as well as reduced attention, memory and learning abilities. All of these changes affect the way
older people use Information and Communication Technology (ICT), which must be accommodated to ensure that they are not disadvantaged when using ICT. This accommodation can only be realized after a thorough understanding of the changes associated with ageing and of their impact on the needs of older people concerning the interaction with technical systems. 2.2 Rationale for a User Interface Adaptation-Based Approach According to ISO 9241, the usability of a technical system depends inter alia on the user and the context of use. This requirement becomes even more critical when designing for non–traditional and diverse user groups, such as the elderly. Therefore, appropriate, personalised, systematically-applicable and cost-effective interaction solutions need to be elaborated, and proactive approaches towards coping with multiple dimensions of diversity are a prerequisite. The concepts of Universal Access and Design for All [12], [13] have the potential to contribute substantially in this respect, as they cater for diversity in every dimension of human-computer interaction. Recent approaches towards Design for All imply the notion of intelligent user interface run-time adaptation, i.e., the capability of automatically adapting to individual user characteristics and contexts of use through the realization of alternative patterns of interactive behaviour. The Unified User Interface design method has been developed to facilitate the design of user interfaces with automatic adaptation behavior [11] These efforts have also pointed out the compelling need of making available appropriate support tools for the design process of user interfaces capable of automatic adaptation. In a parallel line of work to user-oriented adaptivity, user interface (UI) research has recently addressed the identification of, and adaptation to, the situational and technical context of interaction (see, e.g., [1] for an overview) – although most of the time, user- and context-oriented adaptivity are combined (e.g., see [3]). Adaptivity concerns systems that adapt to the form factor of the user’s device, the actual interaction devices available to the user, the user’s geographical location, etc. In the context outlined above, OASIS aims to provide high-quality, ambient user interfaces by effectively addressing diversity in the following dimensions: (i) target user population and changing abilities due to aging; (ii) categories of delivered services and applications; and (iii) different computing-platforms and devices (i.e., PDA, smartphone, desktops, laptops). In this context, new accessibility components and alternative interfaces are constructed within OASIS that will be used across the range of devices supported by the project, offering personalised, ambient, multimodal, and intuitive interaction. By design, the OASIS user interface will embed adaptations based on user, device and context characteristics. Furthermore, OASIS develops innovative tools to facilitate the rapid prototyping of accessible and self-adaptive interfaces for cutting-edge technologies and devices supported by the project.
3 User Interface Adaptation Methodology The OASIS user interface adaptation methodology is decomposed into two distinct but highly correlated stages: the specification and the alternative design. During the specification stage, the conditionally adjustable UI aspects and the discrete dimensions that are
correlated with the adaptation decisions (user- and context-related parameters) are identified. During the alternative design stage, a set of alternative designs is created for each UI component. These alternatives are defined according to the requirements posed by each adaptation dimension (e.g., visual impairment) and parameter (e.g., red-green colour blindness or glaucoma). These alternatives need to be further encoded into a rule set, loaded by a rule inference engine, evaluated and finally propagated from the concept layer to the actual presentation layer. The OASIS project boosts adaptation by incorporating the above-outlined mechanisms into a complete framework that inherently supports adaptation. The Decision Making Specification Language (DMSL) engine and run-time environment [10] offer a powerful rule definition mechanism and promote scalability by utilizing external rule files while relieving the actual UI implementation code from any adaptation-related conditionality. The Adaptive Widget Library developed in OASIS (see section 3.1) encapsulates all the necessary complexity for supporting adaptation of user interface components (from evaluation request to decision application). The OASIS adaptation platform infrastructure consists of the following components (see Fig. 1): the DMSL Server and the Adaptive Widget Library. The DMSL server is divided into the DMSL Engine Core and the DMSL Proxy. The Core is responsible for loading and evaluating the rules, while the Proxy acts as a mediator between the Core and external “clients”, by monitoring incoming connections, processing the requests and invoking the appropriate core methods.
Fig. 1. The OASIS Adaptation platform infrastructure
3.1 The OASIS Adaptive Widget Library The Adaptive Widget Library is a set of primitive (e.g., buttons or drop-down menus) and complex (e.g. file uploaders, image viewers) UI components that utilizes the DMSL Server facility to support adaptation. The library’s ease of use is ensured by relieving developers of the responsibility of manually adapting any widget attributes by offering a common “adapt” method. Each widget encloses a list of its adaptive attributes and when instructed to adapt itself, evaluates each attribute and applies the corresponding decision. Considering that the DMSL Server is a remote component, network connectivity is an essential precondition for the overall process; thus any lack of it should be handled beforehand. A fail-safe mechanism has been developed to minimize the side effects of potential connectivity loss, where the “last” known
Table 1. Adaptation steps in the OASIS framework
1. At compile time, the developer defines the rule file that the DMSL Server will load for the specific User Interface decision-making process and builds the user interface using the OASIS Adaptive Widget Library.
2. At runtime, the application – when necessary – invokes the adapt method for each contained widget.
3. Each widget asks the DMSL Server to evaluate all the rules related to its subject-to-adaptation attributes.
4. Upon successful evaluation, the widget applies these decisions and updates its appearance to meet user and context needs.
configuration is stored and maintained locally to facilitate “static” user interface generation without supporting on-the-fly adaptation. The adaptation process in the OASIS framework is outlined in Table 1. An example of user interface created using the Adaptive Widget Library is presented in Figure 2.
Fig. 2. An exemplary user interface developed with the Adaptive Widget Toolkit
Fig. 3. A simple example of widget adaptation
The panel, button and image UI components which appear in this interface are available through the library. Figure 3 depicts how this interface is automatically adapted through DMSL rules. In the left part of the figure, the interface displays a color combination, while in the right part a greyscale is used for enhanced contrast. This type of adaptation can be useful in case of visual limitations of older users.
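The following sketch illustrates the adapt-method pattern described above in Python rather than the project's actual implementation language; the class and method names (DMSL client, evaluate, the attribute list) are assumptions, and the fail-safe cache mirrors the "last known configuration" behaviour described in the text.

```python
class AdaptiveButton:
    """Illustrative adaptive widget: asks a DMSL-like rule server for values of its adaptive attributes."""

    ADAPTIVE_ATTRIBUTES = ["font_size", "icon_size", "color_profile"]  # assumed attribute names

    def __init__(self, dmsl_client, cache):
        self.dmsl = dmsl_client      # remote rule-evaluation facility (DMSL Server proxy)
        self.cache = cache           # locally stored last known configuration (fail-safe)
        self.attributes = {}

    def adapt(self, user_profile, context):
        for attr in self.ADAPTIVE_ATTRIBUTES:
            try:
                decision = self.dmsl.evaluate(widget="button", attribute=attr,
                                              user=user_profile, context=context)
                self.cache[attr] = decision              # remember the last successful decision
            except ConnectionError:
                decision = self.cache.get(attr)          # fall back to the last known configuration
            if decision is not None:
                self.attributes[attr] = decision
        self.redraw()

    def redraw(self):
        pass  # apply self.attributes to the on-screen representation
```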
3.2 Interaction Prototyping Tool To further facilitate adaptation design using the Adaptive Widget Library, a tool for rapid prototyping of interactions is being implemented to bind together all the components of the framework. This tool is intended to facilitate the connection of application task models (i.e., services) with accessibility solutions and adaptivity. Specifically, it is going to enable interaction designers to:
− create rough interaction models,
− encapsulate preliminary adaptation logic and effects, and
− specify how adaptations are effected in the interactive front-end.
This tool incorporates the facilities mentioned above in a form similar to reusable software design patterns. The output of the prototyping tool will facilitate further development of the interfaces, while preserving the possibility for full-cycle reengineering of the modified output.
4 Adaptation Rules Elaboration Methodology This section discusses the methodology adopted in the elaboration of the adaptation rules for the OASIS prototyping tool. The major challenge in creating adaptation rules for self-adaptive user interfaces lies within the complexity of the resulting design space. Even a relatively simple adaptation design space including 3 different aspects of the interface, each of which will have three different alternatives, can in theory produce 27 different interface instantiations. Hence, iterative user interface development, involving repetitive user testing in the early phases, is not a very attractive method for creating adaptation rules. Instead, a more cost- and time-efficient solution is a theory based approach. In the OASIS project the first step in the development of the adaptation routine was a review of general interface design guidelines (e.g., ISO 9241). This was primarily meant to ensure that the adaptation rules defined would not contradict existing standards. Afterwards, specific design guidelines for elderly users were examined (e.g., [4]), in order to determine where adaptations would be appropriate, according to the restrictions of the devices to be used. The result of this work is a matrix in which the lines contain the adaptation-trigger parameters (e.g. the user’s age or impairments profile) and the columns show the user interface elements to be adapted (e.g., font size and color profile). Parameters to be linked via an adaptation rule are indicated at the intersection of rows and columns. In the matrix, the trigger parameters are specified in a format that takes into account the exact definitions of these variables. This is meant to facilitate the translation of the rules into DMS. As a subsequent step, the matrix was sent out to the OASIS consortium in order to collect feedback and to ensure that application-dependent issues are appropriately taken into account. The option of also collecting user-based feedback was discarded in this phase, based on the rationale that both the matrix and the underlying concepts would be difficult for the users to fully comprehend and comment upon. Users should rather be confronted with a prototype of the adaptive interface and provide a direct statement of approval or disapproval, e.g., via a validated measurement instrument for user satisfaction [6].
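The trigger-parameter × UI-element matrix described above can be represented, for instance, as a nested mapping from which rules are later derived; the entries below are illustrative placeholders added by the editor, not the project's actual matrix.

```python
# Rows: adaptation-trigger parameters; columns: UI elements to adapt.
# A truthy cell means "a rule linking this trigger to this element is to be defined".
adaptation_matrix = {
    "vision_impairment": {"font_size": True, "color_profile": True, "icon_size": True},
    "hearing_impairment": {"audio_volume": True, "caution_warnings": True},
    "ambient_noise_level": {"audio_volume": True, "text_to_speech": False},
}

linked_pairs = [(trigger, element)
                for trigger, row in adaptation_matrix.items()
                for element, linked in row.items() if linked]
```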
After updating the matrix according to the collected feedback, it was checked for possible conflicts between rules to be created. Finally, the adaptation rules were elaborated. Table 2 below summarizes the resulting adaptation trigger parameters, and Table 3 shows the user interface elements subject to adaptation in the context of OASIS. Table 4 displays two examples of adaptation rules. Table 2. Trigger parameters • • •
End devices: PCs (including laptops • and tablet PCs) PDAs • Symbian mobile phones
• •
Person-related parameters: All users o Language Elderly users: o Age o Occupation / Life situation o Computer literacy o Speech impairment o Vision impairment o Mobility- / Motor impairment o Cognitive impairment o Hearing impairment Caregivers o Profession Others o User subgroup
Context-related parameters: Location o Office o Home o Other points of interest • Ambient parameters o Illuminance o Noise level o Handling conditions • Occupation parameters o At work o Moving o Car o By feet o On bus / train • Device specification o Weight o Robustness •
Table 3. User Interface elements subject to adaptation
Font size; icon size; color profile; brightness; audio volume; cursor; size of edit fields; animation; voice control; text-to-speech; touch screen; on-screen keyboard; touchless interface; caution warnings
Table 4. Adaptation rules – examples
1. If [elderly user’s age = 1 or 2 or 3] or [elderly user’s life situation = 2 or 3] or [elderly user’s computer literacy level = 0] or [vision impairment = 1 or 2 or 3], then resolution 640×480 pixels.
2. If [elderly user’s life situation = 1] or [elderly user’s computer literacy level = 1], then resolution 800×600 pixels.
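A sketch of how the two example rules of Table 4 could be encoded and evaluated against a user profile; the profile keys and coding levels follow the table, but the rule-engine structure itself is an assumption (the project uses DMSL for this purpose).

```python
def pick_resolution(profile):
    """Return a screen resolution from the example rules in Table 4 (illustrative only)."""
    rule1 = (profile.get("age_class") in (1, 2, 3)
             or profile.get("life_situation") in (2, 3)
             or profile.get("computer_literacy") == 0
             or profile.get("vision_impairment") in (1, 2, 3))
    if rule1:
        return (640, 480)
    rule2 = (profile.get("life_situation") == 1
             or profile.get("computer_literacy") == 1)
    if rule2:
        return (800, 600)
    return None  # no rule fired; keep the default resolution

# Example: an elderly user with mild vision impairment gets the lower-resolution rendering.
print(pick_resolution({"age_class": 2, "vision_impairment": 1}))  # (640, 480)
```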
A further step to be accomplished is the final validation of the designed adaptations. Two approaches are under consideration for this purpose:
− Hypothesis-driven: each single adaptation rule could be tested by presenting all possible variations of an interface element to test users and asking them for their preference, or by running a performance test with interface instances that differ in only one variable.
− Comparison to a standard device: one interface instance is considered as “standard” and each user in a user test is presented with both the standard and an adapted instance. This method will not tell the experimenter which adaptation is preferable, but it will show whether all the adaptations together make sense.
4.1 Lessons Learned Significant experience was acquired through the process of creating adaptation rules. For example, it was found that the brief descriptions of some interface characteristics in the matrix could cause misunderstandings. When using such a matrix to collect feedback, each of the dependent parameters should be accompanied by a short description. The use of scenarios and personas can be very helpful in explaining why an additional parameter is needed and how it is supposed to behave. Furthermore, phrasing scenarios can help the developers of the adaptation rules keep a focus on the overall usability of the system and avoid losing orientation among a large number of more or less important and even partially contradicting adaptation indications. Another important aspect of elaborating adaptation rules relates to the interpretation of existing design knowledge and guidelines. For many adaptation parameters, the literature does not provide precise thresholds for the trigger variables or the elements to be adapted. For example, older users are said to prefer bigger font sizes, yet sources often do not give age-related cut-off values, which is presumably due to the fact that the elderly are a very heterogeneous group. On the other hand, adaptation rules must be elaborated using precise thresholds, specifying, for example, at which user age the font size should grow, and to what extent. This issue was addressed by including adaptation rules with arbitrarily set thresholds based on design experience rather than excluding adaptations. It was assumed that the precision of the adaptation rules could still be fine-tuned at a later stage through user testing. This decision was taken in order to allow the prototyping tool to work with a rather large variety of rules. Corrective mechanisms will be included in the OASIS adaptation framework which also support the manual configuration of some interface aspects, such as font size or color profile. This empowers the user to reject any unwanted adaptation, resulting in an optimal personalisation of the UI.
5 Discussion and Conclusions This paper has presented the OASIS approach to user interface adaptation in the context of Web services for older users, addressing in particular the elaborated adaptation framework, the role of the OASIS Adaptive Widget Library and the process of designing the adaptations embodied in such a library.
The design of user interface adaptation is a relatively novel undertaking. Although a general methodology is available, such as Unified User Interface Design, further research is necessary on how to best fine-tune various aspects of this methodology in different design cases. The work presented in this paper may serve as an example in this respect and offers hints for discussion. A first consideration that emerges is that tools are required in order to easily integrate adaptation knowledge into user interface development. The OASIS Widget Library has been developed in order to provide developers with fundamental support in applying adaptation. Through such a library, developers can easily embed user interface components’ adaptations in their user interfaces without having to design them from scratch or to implement the adaptive behaviour. However, it should be considered that adaptation not only affects the physical level of interaction, i.e., the presentation of interactive artifacts in the user interface, but also the interaction dialogue and overall structure. For example, the length of interactions, the number of interaction objects or options, the metaphors, wordings and operators, and the depth of menus are dialogue characteristics potentially subject to adaptation. Furthermore, additional adaptation triggers could also be considered, such as, for example, computer literacy and expertise. These aspects are not explicitly addressed in OASIS at the moment. However, it should be mentioned that the Unified User Interface Methodology provides techniques and tools for gradually expanding the types and levels of adaptation in a user interface, thus offering the opportunity to address increasing and evolving adaptation design requirements [2]. Additionally, the recent uptake of ontology-driven system development calls for general approaches targeted at linking adaptation to ontologies. A potential architecture for exploiting ontologies for adaptation purposes is presented in [9]. One fundamental challenge in creating self-adaptive interfaces lies in the difficulties encountered when translating the scientific state of the art into precise rules, as there are seldom concise thresholds defined for the triggers or even for the adaptive elements. Yet this challenge can be turned into a unique opportunity: adaptation rules are probably the most concise form of shaping theories about user behavior. Once a rule is established, it can be tested with specific user groups in experimental settings. If the results indicate that a rule improves interaction for certain user characteristics, then this rule is consolidated and can be re-used. If the rule turns out not to be generally valid, it can be dropped. Eventually, the development of self-adaptive systems could bring new importance to basic research in human-computer interaction.
References 1. Abowd, G.D., Ebling, M., Gellersen, H., Hunt, G., Lei, H.: Guest Editors’ Introduction: Context-Aware Computing. IEEE Pervasive Computing 1(3), 22–23 (2002), http://dx.doi.org/10.1109/MPRV.2002.1037718 2. Antona, M., Savidis, A., Stephanidis, C.: A Process–Oriented Interactive Design Environment for Automatic User Interface Adaptation. International Journal of Human Computer Interaction 20(2), 79–116 (2006) 3. Doulgeraki, C., Partarakis, N., Mourouzis, A., Stephanidis, C.: Adaptable Web-based user interfaces: methodology and practice. eMinds International Journal of Human Computer Interaction 1(5), 79–110 (2009)
4. Fisk, A., Rogers, W., Charness, N.: Designing for older adults: Principles and creative human factor approaches. Crc. Pr. Inc., London (2004) 5. ISO 9241, Ergonomics of human-system interaction. International Organization for Standardisation (2008) 6. Leuteritz, J., Widlroither, H., Klüh, M.: Multi-level validation of the ISOmetrics Questionnaire during the quantitative and qualitative usability assessment of two prototypes of a wall-mounted touch-screen device. In: Stephanidis, C. (ed.) Proceedings of 13th International Conference on Human-Computer Interaction (HCI International 2009), San Diego, California, USA, July 19–24. Springer, Berlin (2009) 7. Kurniawan, S.: Age Related Differences in the Interface Design Process. In: Stephanidis, C. (ed.) The Universal Access Handbook. Taylor & Francis, Abington (in press, 2009) 8. OASIS Consortium, (OASIS) Grant Agreement no 215754 – Annex I - Description of Work. European Commission, Brussels, Belgium (2007) 9. Partarakis, N., Doulgeraki, C., Leonidis, A., Antona, M., Stephanidis, C.: User Interface Adaptation of Web-based Services on the Semantic Web. In: Stephanidis, C. (ed.) Proceedings of 13th International Conference on Human-Computer Interaction (HCI International 2009), San Diego, California, USA, July 19–24. Springer, Berlin (2009) 10. Savidis, A., Antona, M., Stephanidis, C.: A Decision-Making Specification Language for Verifiable User-Interface Adaptation Logic. International Journal of Software Engineering and Knowledge Engineering 15(6), 1063–1094 (2005) 11. Savidis, A., Stephanidis, C.: Unified User Interface Design: Designing Universally Accessible Interactions. International Journal of Interacting with Computers 16(2), 243–270 (2004) 12. Stephanidis, C. (ed.) Salvendy, G., Akoumianakis, D., Arnold, A., Bevan, N., Dardailler, D., Emiliani, P.L., Iakovidis, I., Jenkins, P., Karshmer, A., Korn, P., Marcus, A., Murphy, H., Oppermann, C., Stary, C., Tamura, H., Tscheligi, M., Ueda, H., Weber, G., Ziegler, J.: Toward an Information Society for All: HCI challenges and R&D recommendations. International Journal of Human-Computer Interaction 11(1), 1–28 (1999) 13. Stephanidis, C. (ed.) Salvendy, G., Akoumianakis, D., Bevan, N., Brewer, J., Emiliani, P.L., Galetsas, A., Haataja, S., Iakovidis, I., Jacko, J., Jenkins, P., Karshmer, A., Korn, P., Marcus, A., Murphy, H., Stary, C., Vanderheiden, G., Weber, G., Ziegler, J.: Toward an Information Society for All: An International R&D Agenda. International Journal of Human-Computer Interaction 10(2), 107–134 (1998)
User Individual Differences in Intelligent Interaction: Do They Matter? Jelena Nakić and Andrina Granić Faculty of Science, University of Split, Nikole Tesle 12, 21000 Split, Croatia {jelena.nakic,andrina.granic}@pmfst.hr
Abstract. Designing an intelligent system, as confirmed by research, must address relevant individual characteristics of users. This paper offers a brief review of individual differences literature in the HCI field in general and e-learning area in particular. Research suggests that using adaptive e-learning systems may improve user learning performance and increase her/his learning outcome. An empirical study presented in this paper encompasses a comprehensive user analysis regarding a web-based learning application. Statistically significant correlations were found between user intelligence, experience and motivation for e-learning with her/his learning outcome accomplished in an e-learning session. These results contribute to the knowledge base of user individual differences and will be considered in an estimation of possible benefits from enabling the system adaptivity. Keywords: individual differences, user analysis, adaptive systems, e-learning, empirical study.
1 Introduction System intelligent/adaptive behavior strongly relies on user individual differences, the claim which is already confirmed and empirically proved by Human-Computer Interaction (HCI) research [6, 12, 13, 15, 22, 25]. Such assumption is in line with related studies completed by the authors; for example [17, 18]. However, developing adaptive systems is the process that includes comprehensive research, in relation to application domain of particular system. Designing intelligent interaction needs to take into account several research questions, including (i) how to identify relevant user characteristics, (ii) how to model the user, (iii) what parts of the adaptive system shall change and in what way and (iv) how to employ user model to implement adaptivity [4]. This paper describes an empirical study considering the first question in context of education. Particularly, the study identifies and appraises user individual differences and their relevance in learning environment. The paper is structured as follows. An introductory section provides a brief review of individual differences literature in the HCI field in general and e-learning area in particular. Literature findings are discussed in context of objectives and motivation for the research. Subsequently, the exploratory study is presented, along with results and discussion. Finally, conclusions are drawn and future research work is identified. C. Stephanidis (Ed.): Universal Access in HCI, Part II, HCII 2009, LNCS 5615, pp. 694–703, 2009. © Springer-Verlag Berlin Heidelberg 2009
1.1 Individual Differences in HCI: A Literature Review and Discussion The first step in enabling a system to adapt to individual use is identifying and acquiring relevant information about users. The initial comprehensive overview of individual differences in the HCI field is Egan’s (1988) report on diversities between users in completing common computing tasks such as programming, text editing and information search. He pointed out that the ambition of adaptivity (e.g. dynamic or real-time adaptation) is that not only “everyone should be computer literate” but also that “computers should be user literate”, suggesting that user differences could be understood and predicted as well as being modified through the system design. Since then, the diffusion of technology brought computers to wide user population with extensive variety of knowledge, experience and skill dimensions in different areas. Accordingly, the identification of individual differences relevant for a system adaptation became a critical issue. In their early consideration of adaptivity, Browne, Norman and Riches (1990) provided one of the first classifications of candidate dimensions of user differences that may impact computer usage. They included diversities in cognitive styles (field dependence/independence, impulsivity/reflectivity, operation learning/comprehension learning), personality factors, psycho-motor skills, experience, goals and requirements, expectations, preferences, cognitive strategies and a number of cognitive abilities. Later on, Dillon and Watson (1996) reviewed a century of individual differences work in psychology stressing the role of differential psychology in the HCI field. They have identified a number of basic cognitive abilities that have reliably influenced the performance of specific tasks in predictable ways. Based on own analyses, they summarized that measures of ability can account for approximately 25% of variance in performance thus being suitable for usage in decision making for most systems, especially in addition to other sources of information (previous work experience, education, domain knowledge, etc.) According to their recommendations, psychological measures of individual differences should be used to increase possibilities for generalization of HCI findings. There is a number of studies confirming these pioneer work suggestions, showing for example that cognitive abilities, such as spatial and verbal ability, do affect the interaction, particularly navigation performance of the user [2, 9, 23, 27, 34]. The influence of user goals, knowledge, preferences and experience on her/his interaction with an intelligent system is unquestionable [4]. Moreover, these characteristics have been successfully employed in many adaptive systems, for example AHA!1 [11], InterBook2 [5], KBS Hyperbook [19], ELM-ART3 [33], INSPIRE [26], AVANTI [30], PALIO [31]. On the other hand, the matter of adaptation to cognitive styles and learning styles has been mainly ignored until last decade. Nevertheless, newer research (e.g. [8, 16]) confirms that navigation preferences of the users reflect their cognitive styles. In educational area many authors concluded that adaptation to learning styles, as defined by Kolb (1984) or Honey and Mumford (1992), could bring substantial benefits to students’ learning activities. This is evident from an increasing number of adaptive 1
http://aha.win.tue.nl/ http://www.contrib.andrew.cmu.edu/~plb/InterBook.html 3 http://apsymac33.uni-trier.de:8080/Lisp-Course 2
696
J. Nakić and A. Granić
educational systems having implemented some kind of adaptation (adaptability or adaptivity) to learning styles, see for example CS388 [7], INSPIRE [26] and AHA [28]. 1.2 Motivation for the Research Evidently, the affect of user individual differences on her/his performance has been the topic of very fruitful research for the last few decades. However, the obtained results are not quite consistent, partially because the user performance while using a particular system depends greatly on the system itself [3]. In addition, the research area of cognitive styles and learning styles in the HCI field is very recent so yet there is no strong evidence of their relevance concerning user’s interaction with an intelligent system (as discussed in [29]). Furthermore, even if these user styles were proved to be relevant, the question of potential benefits from personalized interaction still remains. System adaptation, even when well designed, does not necessarily imply user’s performance improvement [8]. Moreover, it can be disadvantageous to some classes of users [10]. Before including adaptation into a system, it is worthwhile to consider the possible alternatives. One good alternative, as suggested by Benyon and Hook (1997), could be an enlargement of learner’s experience in order to overcome her/his low spatial ability. As a second alternative, an appropriate redesign of a nonadaptive interface can be considered [20]. Based on these reflections, the research presented in this paper encompasses a comprehensive user analysis regarding a web-based learning application. The empirical study reported in the following aims to provide an answer whether it is reasonable or not to implement adaptation into the system.
2 User Analysis in e-Learning Environment: An Empirical Study The methodology for this experiment has been grounded mainly on our previous exploratory study reporting the relevance of user individual characteristics on learning achievements acquired in interaction with an e-learning system [18]. Although we have found some statistically significant correlations of user individual characteristics and learning performance, the results were not suitable for generalization, mainly due to certain limitations of the participants sample and of methodology applied. Encouraged by the results of the pilot experiment but also aware of its limitations, we have redesigned the methodology and conducted the second study elaborated in the following. The main objective of the research remains the same – to estimate potential benefits of engaging adaptation into the system. Clearly, such estimation should be based on in depth user analysis comprising both the analysis of user individual differences and user behavior in e-learning environment. In particular, the presented empirical study identifies and appraises those users' characteristics that produce statistically significant differences in the “amount of knowledge” which students get in learning session (i.e. learning outcomes). These characteristics are candidate variables for steering the adaptation process towards them. It can be assumed that adaptation of the system to those user characteristics that significantly correlate with learning outcomes could bring substantial benefits to students’ learning performance. Such hypothesis still has to be confirmed or rejected experimentally for each one of the candidate variables.
2.1 Participants Student volunteers were recruited from two faculties of the University of Split. The first group of participants was selected among 30 first-year undergraduate students (from two different study programs) attending The Computer Lab 1, a laboratory classes at the Faculty of Science. The second group was chosen from 30 candidates of first-year graduate students who were taking the Human-Computer Interaction course at the Faculty of Electrical Engineering, Mechanical Engineering and Naval Architecture. Overall, fifty-two students agreed to take part in the study and five of them were engaged to assist in carrying out the procedure. The experiment was completed over four weeks in class. Consequently, there was a number of students who did not accomplish all phases of the procedure, partially because of certain technical limitations occurred at the days of the learning sessions. A total of 33 students completed the study. 2.2 Variables and Measuring Instruments User individual differences concerned as predictor variables include: age, personality factors, cognitive abilities, experience, background, motivation and expectations from e-learning. The Eysenck’s Personality Questionnaire (EPQ) was used to measure students’ personality factors. According to Eysenck (1992) one of the two main personality factors is neuroticism or the tendency to experience negative emotions. The second one is extraversion, as the tendency to enjoy positive events, especially social events. General factor of intelligence or “g” factor, as defined by Sternberg (2003), is a cognitive ability measure assessed through M-series tests, consisting of 5 subtests. We have used a Likert-based questionnaire to measure students’ experience, motivation and expectations. There were three dimension of experience assessed: computer experience and Internet experience which refer to time students spend using computer and Internet at the present time, as opposed to prior experience in using computers that refers to their previous education. Motivation for e-learning was the most difficult variable to measure. Although the learning sessions were integrated in the class, the students’ learning performance did not affect their course grades. That was the way to prevent interfering of extrinsic motivation for learning. The motivation assessed through the questionnaire refers only to intrinsic motivation of students, i.e. the level of their interest in the subject matter and in the mode of its presentation as web-based application. Students’ expectations from e-learning are another subjective measure, estimated through their own opinion about the quality and efficiency of e-learning applications in general. Information about students’ background, i.e. previous knowledge was calculated on their grades from previously passed exams (for graduate students) or from entry tests and pre-exams of first-year courses (for undergraduate students) in addition to their high school grades of relevant subjects. Students’ outcome acquired in learning session is expressed as a gain between pretest and post-test scores. The same paper-based 19-item multiple choice test served as pre-test and post-test. A lesson related to a communication and collaboration of Internet users, provided through a learning management system, was selected as a topic of the learning session. Selected lesson has not been thought previously in any university course at both faculties.
2.3 Procedure

The whole experimental procedure was conducted as part of the usual class time, integrated into the courses' curricula. It took four weeks to carry out all phases of the procedure. Through an introductory interview we informed the students about the purpose and nature of the experiment. They were told that participation in the study is on a voluntary basis and that their performance or scores on tests would not affect their course grades in any way. The obtained participants' data were used for the preparation of a finely tuned questionnaire used afterwards to assess their experience, motivation and expectations. In the second week of the experiment the participants took the M-series tests. Testing was conducted under the supervision of a psychologist and took 45 minutes for the completion of the 5 tests, each of them separately time-limited. The following week the students took the EPQ test and filled in the prepared questionnaire which measured the remaining personal characteristics. The last week of the procedure comprised four steps. First, the students were given the pre-test on the subject matter they were expected to learn afterwards using the e-learning system. They were allowed 10 minutes to complete the pre-test. Then the students started the web-based learning application. Time for learning was limited to 30 minutes. The students were permitted to take notes while reading, but not allowed to use any external material on the subject, such as textbooks or other web resources. These notes could serve them only in reviewing the lesson material. After completing the learning session, the students were given the post-test. Again, a maximum of 10 minutes was allowed for completing the test. Use of the notes taken while learning was not permitted. On completion of the post-test, the students were asked to fill in the SUS questionnaire, thus measuring their satisfaction with the system they had just experienced.
3 Results

Data analysis was conducted using SPSS version 16.0 for Windows. Pearson correlations were calculated, with p < 0.05 as the acceptable level of significance for the experiment.

3.1 The Sample

A total of 33 datasets were analyzed. The sample consisted of 12 females (36.4%) and 21 males (63.6%). The age varied from 18 to 24, with a mean of 20.3. The distribution of gender and age is shown in Table 1, broken down by the students' study programs.

Table 1. The distribution of gender and age within the sample
Study program                   Study          Female  Male  Age range  Average age
Computer Science and Technics   undergraduate     2     10     18-21       19.3
Mathematics                     undergraduate     8      0     18-20       19.1
Computer Science                graduate          2     11     21-24       22.0
Total                                            12     21     18-24       20.3
Descriptive statistics for all measured variables are presented in Table 2. The sample is relatively heterogeneous; considerable differences are evident in prior experience as well as in background knowledge. This can be explained by the fact that the participants come from three different study programs at two faculties. Two groups of participants are composed of first-year undergraduate students, while the third group was recruited from first-year graduate students. Regardless of the differences in student experience, none of the participants had previously read any lesson from the learning management system used in the experiment.

Table 2. Descriptive statistics of the sample
Variable               Minimum  Maximum   Mean    Std. Deviation
Age                       18       24     20.30        1.72
Extroversion               3       21     14.18        4.43
Neuroticism                1       18      9.48        4.37
Intelligence              36       60     49.15        6.79
Prior experience           6       54     25.64       15.22
Computer experience        4       16      9.64        3.62
Internet experience        4       16      9.52        3.36
Motivation                 0        8      6.30        1.81
Expectations               0        6      4.12        1.41
Background knowledge       8       56     27.30       12.34
Learning outcome          10       43     30.52        6.90
Satisfaction              47.5     95     74.545      12.63
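The correlation analysis reported in the next subsection was carried out in SPSS. As a purely illustrative sketch, not part of the original study, the fragment below shows how one Pearson coefficient and its two-tailed p-value could be computed with SciPy; the variable names and the example scores are invented placeholders, not the study data.

```python
# Illustrative only: computing one of the reported Pearson correlations outside
# SPSS. The scores below are invented placeholders, not the study data.
from scipy import stats

intelligence = [52, 47, 58, 41, 49, 55, 44, 50]        # hypothetical M-series scores
learning_outcome = [34, 28, 40, 22, 31, 36, 25, 30]    # hypothetical post-pre gains

r, p = stats.pearsonr(intelligence, learning_outcome)
print(f"r = {r:.3f}, two-tailed p = {p:.3f}")

# The study treats p < 0.05 as significant and p < 0.01 as highly significant.
if p < 0.05:
    print("significant at the 0.05 level")
```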
3.2 Results and Interpretation

Data analysis showed a highly significant correlation of the M-series test results with learning outcome (r = 0.47, p < 0.01). Since learning outcome is measured as the gain between pre-test and post-test scores, this result suggests that more intelligent students learned more in the e-learning session than the less intelligent ones. The probability of this occurring by chance is less than 0.01. Another statistically significant correlation was identified between the M-series test results and background knowledge (r = 0.39, p < 0.05), indicating that more intelligent students have also achieved better grades on their previously passed exams and/or pre-exams. In light of these two significant correlations, it seems that more intelligent students have better learning performance in a web-based than in a traditional learning environment. No significant correlations were found between personality factors and learning outcome. Table 3 shows the Pearson correlations of all psychological test scores with background knowledge, learning outcome and satisfaction with the system. In the age and experience analysis we found that Internet experience significantly correlates with learning outcome (r = 0.37, p < 0.05), as shown in Table 4, suggesting that students who spend more time on the Internet use the web-based learning application more successfully than students who spend less time. Intrinsic motivation for e-learning positively correlates with learning outcome (r = 0.36, p < 0.05), suggesting that more motivated students acquired more knowledge in the learning session than less motivated students.
Table 3. Correlations of personality and intelligence with knowledge and satisfaction

                       Extroversion       Neuroticism       Intelligence
Background knowledge   -.182 (p = .311)   .066 (p = .714)   .393* (p = .024)
Learning outcome       -.088 (p = .626)   .184 (p = .305)   .465** (p = .006)
Satisfaction            .043 (p = .810)   .260 (p = .143)   .035 (p = .845)
** Correlation is significant at the 0.01 level (2-tailed).
* Correlation is significant at the 0.05 level (2-tailed).

Table 4. Correlations of age and experience with learning session results and satisfaction

                   Age               Prior experience   Computer experience   Internet experience
Learning outcome   .333 (p = .058)   .284 (p = .109)     .180 (p = .315)       .370* (p = .034)
Satisfaction       .276 (p = .120)   .139 (p = .441)     .116 (p = .521)       .094 (p = .602)

* Correlation is significant at the 0.05 level (2-tailed).
Another statistically significant correlation (r = 0.35, p < 0.05) was found between expectations from e-learning and satisfaction in using the system (SUS questionnaire). Apparently, users with greater expectations from the system experienced higher levels of fulfillment in system usage. These correlations are presented in Table 5, along with the correlations for background knowledge. No significant connections were identified between background knowledge and the other variables presented in this table.

Table 5. Correlations of motivation and expectations from e-learning with background knowledge, learning outcome and satisfaction
                       Motivation         Expectations       Background knowledge
Background knowledge   .082 (p = .648)    .163 (p = .364)
Learning outcome       .357* (p = .041)   -.026 (p = .886)   .314 (p = .075)
Satisfaction           .184 (p = .306)    .346* (p = .049)   .205 (p = .251)

* Correlation is significant at the 0.05 level (2-tailed).
3.3 Discussion

Personality factors, namely extroversion/introversion and the level of neuroticism, seem to have no impact on learning outcome (Table 3), a result which is in line with the related literature, cf. [12, 13].
Considering motivation for e-learning and expectations from it, the obtained results were as expected (Table 5) – while motivation for e-learning is related to learning outcome, expectations from e-learning correlate with user satisfaction. In order to offer a valuable interpretation of the results, it is important to distinguish motivation from satisfaction. Motivation includes the aspiration and effort to achieve a goal, while satisfaction refers to the fulfillment we feel due to achieving a goal. Thus the obtained connection between expectations and satisfaction seems very natural. Apparently, there is no connection between background knowledge and learning outcome. Such a connection was expected for the following reason: there are high correlations of background knowledge with all three dimensions of experience: prior experience (r = 0.79, p < 0.01), computer experience (r = 0.44, p < 0.05) and Internet experience (r = 0.44, p < 0.05). On the other hand, experience significantly correlates with learning outcome (Table 4). Consequently, a correlation between background knowledge and learning outcome was also expected, and such a result would be in line with related studies [4]. The absence of this connection may be explained by the fact that the topic of the learning session was previously unknown to the majority of participants, as confirmed by the pre-test scores.
4 Conclusion

Appraising the user characteristics that produce differences in learning performance plays an important role when considering adaptive educational systems. The conducted empirical study reveals that there are significant connections of user intelligence, experience and motivation with her/his learning outcome in an e-learning environment. These results contribute to the knowledge base of user individual differences and should be taken into account when developing web-based instructional content. Nevertheless, further work is required in order to determine the way in which relevant user characteristics could be exploited in enabling system adaptation. Additional research will be conducted to investigate what affects learning behavior as well as to determine how learning behavior is reflected in learning outcomes. It will be particularly interesting to see whether the predictors of learning behavior could predict learning outcome as well.

Acknowledgments. This work has been carried out within project 177-0361994-1998 Usability and Adaptivity of Interfaces for Intelligent Authoring Shells, funded by the Ministry of Science and Technology of the Republic of Croatia.
References 1. Benyon, D., Höök, K.: Navigation in Information Spaces: Supporting the Individual. In: INTERACT 1997, pp. 39–46 (1997) 2. Benyon, D., Murray, D.: Adaptive systems: From intelligent tutoring to autonomous agents. Knowledge Based Systems 6, 197–219 (1993)
3. Browne, D., Norman, M., Rithes, D.: Why Build Adaptive Systems? In: Browne, D., Totterdell, P., Norman, M. (eds.) Adaptive User Interfaces, pp. 15–59. Academic Press. Inc., London (1990) 4. Brusilovsky, P.: Adaptive Hypermedia. User Modeling and User-Adapted Interaction 11, 87–110 (2001) 5. Brusilovsky, P., Eklund, J.: InterBook: an Adaptive Tutoring System. UniServe Science News 12 (1999) 6. Brusilovsky, P., Kobsa, A., Nejdl, W. (eds.): Adaptive Web 2007. LNCS, vol. 4321. Springer, Heidelberg (2007) 7. Carver, C.A., Howard, R.A., Lavelle, E.: Enhancing student learning by incorporating learning styles into adaptive hypermedia. In: Proc. of 1996 ED-MEDIA World Conf. on Educational Multimedia and Hypermedia, Boston, USA, pp. 118–123 (1996) 8. Chen, S., Macredie, R.: Cognitive styles and hypermedia navigation: development of a learning model. Journal of the American Society for Information Science and Technology 53(1), 3–15 (2002) 9. Chen, C., Czerwinski, M., Macredie, R.: Individual Differences in Virtual Enviroments – Introduction and overview. Journal of the American Society for Information Science 51(6), 499–507 (2000) 10. Chin, D.N.: Empirical Evaluation of User Models and User-Adapted Systems. User Modeling and User Adapted Interaction 11, 181–194 (2001) 11. De Bra, P., Calvi, L.: AHA! An open Adaptive Hypermedia Architecture. The New Review of Hypermedia and Multimedia, 115–139 (1998) 12. Dillon, A., Watson, C.: User Analysis in HCI – The Historical Lessons From Individual Differences Research. International Journal on Human-Computer Studies 45, 619–637 (1996) 13. Egan, D.: Individual Differences in Human-Computer Interaction. In: Helander, M. (ed.) Handbook of Human-Computer Interaction, pp. 543–568. Elsevier Science B.V. Publishers, North-Holland (1988) 14. Eysenck, H.J.: Four ways five factors are not basic. Personality and Individual Differences 13, 667–673 (1992) 15. Ford, N., Chen, S.Y.: Individual Differences, Hypermedia Navigation and Learning: An Empirical Study. Journal of Educational Multimedia and Hypermedia 9(4), 281–311 (2000) 16. Graff, M.G.: Individual differences in hypertext browsing strategies. Behaviour and Information Technology 24(2), 93–100 (2005) 17. Granić, A., Stankov, S., Nakić, J.: Designing Intelligent Tutors to Adapt Individual Interaction. In: Stephanidis, C., Pieper, M. (eds.) ERCIM Ws UI4ALL 2006. LNCS, vol. 4397, pp. 137–153. Springer, Heidelberg (2007) 18. Granić, A., Nakić, J.: Designing intelligent interfaces for e-learning systems: the role of user individual characteristics. In: Stephanidis, C. (ed.) HCI 2007. LNCS, vol. 4556, pp. 627–636. Springer, Heidelberg (2007) 19. Henze, N., Nejdl, W.: Adaptivity in the KBS Hyperbook System. In: 2nd Workshop on Adaptive Systems and User Modeling on the WWW. Toronto, Banff (1999), Held in conjunction with the WorldWideWeb (WWW8) and the International Conference on User Modeling (1999) 20. Hook, K.: Steps to Take Before Intelligent User Interfaces Become Real. In: Interacting with Computers, vol. 12, pp. 409–426. Elsevier Science B.V (2000) 21. Honey, P., Mumford, A.: The Manual of Learning Styles, 3rd edn. Peter Honey, Maidenhead (1992) 22. Jennings, F., Benyon, D., Murray, D.: Adapting systems to differences between individuals. Acta Psychologica 78, 243–256 (1991)
23. Juvina, I., van Oostendorp, H.: Individual Differences and Behavioral Metrics Involved in Modeling web Navigation. Universal Access in the Information Society 4(3), 258–269 (2006) 24. Kolb, D.A.: Experiential Learning: Experience as the Source of Learning and Development. Prentice-Hall, Englewood Cliffs (1984) 25. Magoulas, G., Chen, S. (eds.): Proceedings of the AH 2004 Workshop, Workshop on Individual differences in Adaptive Hypermedia, The 3rd International Conference on Adaptive Hypermedia and Adaptive Web-based Systems, Eindhoven, Netherlands (2004) 26. Papanikolaou, K.A., Grigoriadou, M., Kornilakis, H., Magoulas, G.D.: Personalising the Interaction in a Web-based Educational Hypermedia System: the case of INSPIRE. UserModeling and User-Adapted Interaction 13(3), 213–267 (2003) 27. Stanney, K., Salvendy, G.: Information visualization: Assisting low spatial individuals with information access tasks through the use of visual mediators. Ergonomics 38(6), 1184–1198 (1995) 28. Stash, N., Cristea, A., De Bra, P.: Adaptation to Learning Styles in ELearning: Approach Evaluation. In: Reeves, T., Yamashita, S. (eds.) Proceedings of World Conference on ELearning in Corporate, Government, Healthcare, and Higher Education, pp. 284–291. Chesapeake, VA, AACE (2006) 29. Stash, N., De Bra, P.: Incorporating Cognitive Styles in AHA (The Adaptive Hypermedia Architecture). In: Proceedings of the IASTED International Conference Web-Based Education, pp. 378–383 (2004) 30. Stephanidis, C., Paramythis, A., Karagiannidis, C., Savidis, A.: Supporting Interface Adaptation: the AVANTI Web-Browser. In: 3rd ERCIM Workshop on User Interfaces for All (UI4ALL 1997), Strasbourg, France (1997) 31. Stephanidis, C., Paramythis, A., Zarikas, V., Savidis, A.: The PALIO Framework for Adaptive Information Services. In: Seffah, A., Javahery, H. (eds.) Multiple User Interfaces: Cross-Platform Applications and Context-Aware Interfaces, pp. 69–92. John Wiley & Sons, Ltd., Chichester (2004) 32. Sternberg, R.J.: Cognitive Psychology, Wadsworth, a division of Thompson Learning, Inc., 3rd edn (2003) 33. Weber, G., Brusilovsky, P.: ELM-ART: An Adaptive Versatile System for Web-based Instruction. International Journal of Artificial Intelligence in Education 12, 351–384 (2001) 34. Zhang, H., Salvendy, G.: The implication of visualization ability and structure preview design for web information search tasks. International Journal of Human–Computer Interaction 13(1), 75–95 (2001)
Intelligent Interface for Elderly Games Changhoon Park Dept. of Game Engineering, Hoseo University, 29-1 Sechul-ri Baebang-myun, Asan, Chungnam 336-795, Korea
[email protected]

Abstract. This paper proposes an intelligent interface to improve game accessibility for the elderly, based on a multimodal interface and dynamic game balancing. The approach aims to control the fidelity of feedback and the level of difficulty dynamically when the elderly become bored or frustrated with the game. Applying the proposed intelligent interface, we present the implementation of a rhythm game for the elderly with a specialized game controller in the form of a drum.

Keywords: Game Accessibility, Multimodal Interface, Dynamic Game Balancing, Rhythm Game.
1 Introduction

As the percentage of older persons in the world's population continually increases, the issue of accessibility to and usability of products and services has become more critical. There has been an explosion of interest and involvement in the field of gerontechnology1 for innovative and independent living and the social participation of older adults in good health, comfort and safety [1]. In recent years, we have been studying serious games for the elderly, based on previous research finding positive effects of video game use on the cognitive and neuromotor skills of the elderly [2]. We have started a project, "A research on serious games for the elderly toward human service", supported by Hoseo University, in which a research team from the faculties of game engineering, welfare for the elderly, nursing, and electronic engineering is working in collaboration. The goal of this project is to improve the quality of the elderly's lives by means of game play interaction. Our strategy to achieve this goal includes a change from technology development driven by technical feasibility towards development driven by knowledge about the behavior of the elderly, considered as a special category of users whose particular abilities and needs, at cognitive, social and health levels, have to be taken into account during the research process. This paper presents a way of adapting a game to the elderly dynamically, in order to keep them challenged and interested, based on consideration and understanding of them. In section 2, we introduce our previous experiment and the concept of game accessibility.
1 The term gerontechnology is a composite of two words: "gerontology", the scientific study of aging, and "technology": research, development, and design of new and improved techniques, products, and services.
Section 3 proposes our approach to improving game accessibility for the elderly. Section 4 presents the implementation of a rhythm game with a specialized game controller in the form of a drum. Finally, section 5 concludes the paper.
2 Related Work

This section introduces our previous experiment to identify barriers posed by current video games and to understand the content interests and skill sets of the elderly. We then present the multimodal interface and design for dynamic diversity, which can be applied to improve game accessibility.

2.1 Experiment

The objective of this study was to examine Korean elders' playing of video games. The total number of participants was forty. We recruited participants who were over the age of 65 at the time of the study. We conducted a series of four focus groups with four games selected for the study (2 Taiko master, Wii sports, and WarioWare).
Fig. 1. Participants were encouraged by the researchers to take turns and to play a variety of games within the one-hour session. During game play, participants' comments were noted. In addition, the interviews and the participants' hand and eye movements were recorded by two video cameras in order to evaluate their difficulty or frustration.
Regarding the game controller, the input device used to control a game, the participants demonstrated no difficulty when using the drum-like game pad with sticks for Taiko master, while their remarks indicated difficulty with the Wii remote controller. While the drum-like game pad is simple and easy to use, participants needed more time to familiarize themselves with the Wii remote, especially the use of the buttons. This means that special-purpose devices are more familiar and intuitive to use than general-purpose devices. Regarding the game design, each mini-game of WarioWare lasts only about five seconds or so, which is too short for the elderly to understand and enjoy the challenge of the game. One participant commented after playing Wii sports: "I don't know the rules of bowling. Make games with common activities such as cleaning, dancing and
so on." Taiko master provides weak feedback about the progress and activity of game play in spite of the long playtime. A majority of participants indicated after the game play that they would be interested in playing video games in the future. To appeal to elderly people, existing games need some modifications to the complexity of controls and a simplification of the challenge of the activity. This study demonstrates that interactive games allow the elderly to enjoy new opportunities for leisure and entertainment, while improving their cognitive, functional and social skills.

2.2 Game Accessibility

Game accessibility is defined as the ability to play a game even when functioning under limiting conditions. Limiting conditions can be functional limitations, or disabilities — such as blindness, deafness, or mobility limitations. A multimodal interface provides the user with multiple modes of interfacing with a system beyond the traditional keyboard and mouse. Modality refers to any of the various types of sensation, such as vision or hearing, and a sensory modality is an input channel from the receptive field. A well-designed multimodal application can be used by people with a wide variety of impairments. This means that the weaknesses of one modality or sensory ability can be offset by the strengths of another. For example, visually impaired users rely on the voice modality with some keypad input, while hearing-impaired users rely on the visual modality with some speech input. Among the most important reasons for developing multimodal interfaces is their potential to greatly expand the accessibility of computing for diverse and non-specialist users, and to promote new forms of computing not previously available [3]. A paradigm proposed in [4] to support universal design is called Design for Dynamic Diversity (DDD or D3). Traditional User Centered Design (UCD) does not support this paradigm, as the focus of UCD is placed on the "typical user". As has been described, "the elderly" encompasses a very diverse group of users in which individual requirements change over time, making it a group that UCD has difficulties coping with. That is why a new methodology has been introduced to accommodate Design for Dynamic Diversity. In this paper, we apply the concept of dynamic game balancing (DGB) to game balance. Game balance is a concept in game design describing fairness or balance of power in a game between multiple players or strategic options. We control the level of challenge dynamically and individually in order to support the elderly's game play.
3 Intelligent Interface

In this section, we propose an intelligent interface based on an understanding of the skill sets of this significant population. We aim to keep the elderly in the mental state of operation in which the person is fully immersed in what he or she is doing, with a feeling of energized focus, full involvement, and success in the process of the activity. The intelligent interface provides two methods to keep the elderly from becoming bored or frustrated with the game.
Fig. 2. Overview of intelligent interface
The first method is to select the most appropriate mode of feedback to encourage and assist the elderly. Feedback is an important part of video games for a fulfilling interactive experience, and it can be presented in several types, such as visual, audio, action, NPC and so on. The more alternative types of feedback are used through the multimodal interface, the greater the number of people who will be suited. In terms of function, for elderly people with dexterity and strength impairments this method makes the game accessible through the use of another modality or sensory ability. In addition, each type of feedback is designed to have multiple levels of fidelity. The second method is to change the level of difficulty dynamically in order to keep the elderly away from states where the game is far too challenging, or far too easy. Traditional game design is well suited to covering particular clusters of players, as a developer's perception of what makes a good game is sure to appeal to someone. Designers have therefore relied on the provision of adaptable gaming experiences to achieve better audience coverage; for example, most games come equipped with easy, medium and hard difficulty settings. However, the diversity of the elderly means that some players will inevitably lie outside the scope of predetermined adaptation [5]. Our approach is to keep the elderly interested from the beginning to the end, individually, by changing parameters, scenarios and behaviors in video games. In order to realize the intelligent interface, we need to detect the difficulty the user is facing at a given moment. The challenge function maps a given game state into a value that specifies how easy or difficult the game feels to the user. Depending on this value, the intelligent interface can control the fidelity of feedback and the level of challenge in order to make the game adaptable to different users. The intelligent interface can intervene in a positive or negative and explicit or implicit way to keep the elderly from becoming bored or frustrated with the game.
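To make the mechanism more concrete, the following sketch illustrates the adaptation loop described above. It is only an illustration under stated assumptions, not the authors' implementation: the class, the thresholds and the hit/miss-based challenge estimate are all invented for the example.

```python
# Illustrative sketch of the adaptation loop described above; all names and
# threshold values are invented and are not taken from the original system.
from dataclasses import dataclass

BORED_THRESHOLD = 0.3       # below this the game feels too easy
FRUSTRATED_THRESHOLD = 0.7  # above this the game feels too hard

@dataclass
class GameSettings:
    difficulty: float = 0.5       # 0 = trivial, 1 = very hard
    feedback_fidelity: int = 1    # 1 = subtle, 3 = strong multimodal feedback

def challenge(recent_hits: int, recent_misses: int) -> float:
    """Challenge function: map the observed game state to a value in [0, 1]
    describing how difficult the game currently feels to the player."""
    total = recent_hits + recent_misses
    return recent_misses / total if total else 0.5

def adapt(settings: GameSettings, recent_hits: int, recent_misses: int) -> None:
    c = challenge(recent_hits, recent_misses)
    if c > FRUSTRATED_THRESHOLD:
        # Player is struggling: lower difficulty, make feedback more supportive.
        settings.difficulty = max(0.0, settings.difficulty - 0.1)
        settings.feedback_fidelity = min(3, settings.feedback_fidelity + 1)
    elif c < BORED_THRESHOLD:
        # Player is under-challenged: raise the difficulty slightly.
        settings.difficulty = min(1.0, settings.difficulty + 0.1)
    # Otherwise the player is in the desired band: change nothing.
```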
4 Implementation

We have implemented a rhythm game for the elderly in order to apply the proposed intelligent interface. The game has been developed using Microsoft Visual C++ with DirectX as the graphics API. In rhythm games, the players must match the rhythm of the music by pressing specific buttons, or activating controls on a specialized game controller, in time with the
game's music. This kind of interaction helps the elderly improve perceptual-motor skills and cognitive functioning. Motor skills can be defined as the refined use of the small muscles controlling the hand and fingers, usually in coordination with the eyes. This skill allows one to complete tasks such as writing, drawing, and buttoning. A decline in perceptual-motor functions has serious consequences which affect a range of activities of daily living [6].
Fig. 3. Feedback and Difficulty Control for Rhythm Game
Fig. 4. (a) The buk is a traditional Korean drum with a round wooden body that is covered on both ends with animal skin. Performers usually beat their buk with a bukchae (drumstick) in one hand or with two hands together. (b) We modified the buk to detect when a sensor in the drum's surface is hit. There are also two blue buttons, left and right, which are used to select and confirm in the selection screens.
To implement the intelligent interface in this game, we need to define a challenge function to detect the difficulty of gameplay. The first way is to allow the user to press a button when he or she feels difficulty. The second way is to use an equation dependent on the game score: if the variation of the game score over a given time is positive, then the level of challenge sinks. The intelligent interface aims to keep this value stable to keep the elderly from becoming bored or frustrated with the game. We also need to control the level of difficulty for the intelligent interface in the rhythm game. In rhythm games, the player is required to hit the drum in time as large beats scroll across the screen. So, the difficulty of this game can be controlled by
the speed and number of notes on screen. This is a way to make the game easier or more difficult directly. Regarding the fidelity of feedback, to improve the accessibility of the game we developed a specialized game controller by modifying the buk2, a traditional Korean drum (Fig. 4). Instead of using standard interfaces like keyboard and mouse, the buk is so intuitive and simple that the elderly do not need to spend time learning how to use it. The game is played simply by hitting the drum in time with notes traveling across the screen, and we can design our game as a one-switch game that can be controlled by a single button.
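As a rough illustration of the score-based way of detecting difficulty described above, the fragment below estimates the challenge from the score variation over a time window and then adjusts note speed and note density. It is a hypothetical sketch, not the game's actual Visual C++ implementation, and the window length, target gain and adjustment steps are invented values.

```python
# Hypothetical sketch of the score-based difficulty control described above;
# the window length, target gain and adjustment steps are invented values,
# and this is not the game's actual implementation.

WINDOW_SECONDS = 10.0   # length of the observation window (assumed)
TARGET_GAIN = 100.0     # desired score gain per window (assumed)

def score_variation(score_log, now):
    """Score change over the last WINDOW_SECONDS, given (timestamp, score) pairs."""
    recent = [s for (t, s) in score_log if now - t <= WINDOW_SECONDS]
    return recent[-1] - recent[0] if len(recent) >= 2 else 0.0

def adjust_difficulty(score_log, now, note_speed, notes_per_bar):
    """Raise or lower note speed and note density depending on score variation."""
    gain = score_variation(score_log, now)
    if gain > TARGET_GAIN:
        # Score rises quickly: the perceived challenge sinks, so scroll the
        # notes faster and show slightly more of them.
        note_speed *= 1.05
        notes_per_bar = min(notes_per_bar + 1, 8)
    elif gain < TARGET_GAIN:
        # Score rises too slowly: ease off to avoid frustration.
        note_speed *= 0.95
        notes_per_bar = max(notes_per_bar - 1, 1)
    return note_speed, notes_per_bar

# Example: a player gaining only 60 points in the last 10 s would have the game eased.
speed, density = adjust_difficulty([(0.0, 0), (10.0, 60)], now=10.0,
                                   note_speed=1.0, notes_per_bar=4)
```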
Fig. 5. Introduction and gameplay Screenshot
5 Conclusion

To identify barriers posed by current video games, we examined Korean elders' playing of three popular games. We then presented an intelligent interface, which enables control of the fidelity of feedback and the level of difficulty dynamically. Our approach is based on the multimodal interface and dynamic game balancing for the accessibility of games. Applying the proposed interface, we developed a rhythm game and a specialized controller especially for the elderly. This game can help the elderly improve perceptual-motor skills and cognitive functioning, and an intuitive and simple game controller was also developed by modifying a traditional Korean drum.

Acknowledgments. This research was supported by the Academic Research Fund of Hoseo University in 2008 (20080015).
References 1. Harrington, et al.: Gerontechnology: Why and How. Herman Bouma Foundation for Gerontechnology, 90–423 (2000) 2. Dustman, et al.: Aerobic exercise training and improved neuropsychological function of older individuals. Neurobiology of Aging (1984)
2 The term buk is also used in Korean as a generic term to refer to any type of drum. Buk have been used for Korean music since the period of the Three Kingdoms of Korea (57 BC – 668 AD).
3. Oviatt: Advances in robust multimodal interface design. IEEE Computer Graphics and Applications 272, 03 (2003) 4. Gregor, et al.: Designing for dynamic diversity: making accessible interfaces for older people. In: Proceedings of the 2001 EC/NSF workshop on Universal (2001) 5. Gilleade, et al.: Using frustration in the design of adaptive videogames. In: Proceedings of the 2004 ACM SIGCHI International Conference (2004) 6. Drew, et al.: Video Games: Utilization of a Novel Strategy to Improve Perceptual-Motor Skills in the noninstitutionalised elderly. Cognitive Rehabilitation 4, 26–34 (1985)
User Interface Adaptation of Web-Based Services on the Semantic Web Nikolaos Partarakis1, Constantina Doulgeraki1, Asterios Leonidis1, Margherita Antona1, and Constantine Stephanidis1,2 1
Foundation for Research and Technology – Hellas (FORTH) Institute of Computer Science GR-70013 Heraklion, Crete, Greece 2 University of Crete, Department of Computer Science, Greece
[email protected]

Abstract. The Web is constantly evolving into an unprecedented and continuously growing source of knowledge, information and services, potentially accessed by anyone anytime, and anywhere. Yet, the current uptake rates of the Web have not really reached their full potential, mainly due to the design of modern Web-based interfaces, which fail to satisfy the individual interaction needs of target users with different characteristics. A common practice in contemporary Web development is to deliver a single user interface design that meets the requirements of an "average" user. However, this "average" user is in fact an imaginary user. Often, the profiles of a large portion of the population, and especially of people with disabilities, elderly people, novice users and users on the move, differ radically. Although much work has been done towards providing the means for the development of inclusive Web-based interfaces that are capable of adapting to multiple and significantly different user profiles, the current evolution towards the Semantic Web poses several new requirements and challenges for supporting user and context awareness. Building upon existing research in the field of semantics-based user modeling, this paper aims to offer potential new directions for supporting User Interface Adaptation on the Semantic Web. In this context, the benefits gained from supporting semantically enabled, ontology-based profiling are highlighted, focusing on the potential impact of such an approach on existing UI adaptation frameworks.
1 Introduction

Recently, computer-based products have become associated with a great number of daily user activities, such as work, communication, education, entertainment, etc. Their target population has changed dramatically. Users are no longer only the traditional able-bodied, skilled and computer-literate professionals. Instead, users are potentially all citizens of the emerging Information Society, and demand customised solutions to obtain timely access to any application, irrespective of where and how it runs. At the same time, the type and context of use of interactive applications are radically changing (e.g., personal digital assistants, kiosks, cellular phones and other
network-attachable equipment). This progressively enables nomadic access to information [15]. In computing, the notion and importance of adaptation, as the ability to adapt a system to the user's needs, expertise and requirements, have only recently been recognised. In this context the computationally empowered environment can adapt itself, to various degrees, to its 'inhabitants', thereby drastically reducing the amount of effort required from the users. Methods and techniques for user interface adaptation meet significant success in modern interfaces, but most focus mainly on usability and aesthetics. The Unified User Interfaces methodology for UI adaptation [15] was conceived and validated as a vehicle to efficiently and effectively address, during the interface development process, the accessibility and usability of UIs for users with diverse characteristics, also supporting technological platform independence, metaphor independence and user-profile independence. Web-based user interfaces (WUIs) constitute a particular type of UIs that accept input and provide output by generating web pages that are transported via the Internet and are viewed by the user through a web browser. Adaptive Web-based user interfaces support the delivery of a qualitative user experience for all, regardless of the user's (dis)abilities, skills, preferences, and context of use. In the web context, factors such as visual experience and site attractiveness, quality of navigation organization (especially on large sites), placement of objects [3], colour schema, and page loading time also affect the overall user experience and satisfaction and can be employed by adaptation mechanisms to personalize web user interfaces. On the other hand, the Semantic Web provides valuable means for, and raises great expectations of, WUI adaptation. Research has already employed the features offered by the Semantic Web for generating adaptation recommendations using mining techniques [12]. In the same context, work has been conducted towards providing dynamically generated Web content that better meets user expectations through semantic browsing of information [8]. However, the potential of developing an adaptive web-based environment in the context of the Semantic Web has not yet been fully investigated. In this paper, a potential architecture for a development framework that supports the creation of adaptive Web user interfaces is introduced by extending the architecture of an existing development framework (EAGER [5]). This paper is structured as follows. Section 2 discusses various approaches to User Interface Adaptation. In section 3, a potential architecture for supporting User Interface adaptation on the Semantic Web is presented, based on the experience gained through the development of adaptive applications in various contexts. Section 4 outlines the main potential benefits of employing such a methodology in a semantically enabled environment. Finally, section 5 discusses further research and development steps in this direction.
2 Current Approaches to User Interface Adaptation

2.1 User and Context Profiling

The scope of user profiling is to provide information regarding the user who accesses an interactive application. A user profile contains attributes either specified by the
user prior to the initiation of interaction, or acquired by the system during interaction (through interaction monitoring). On the other hand, context profiling aims at collecting context attribute values (machine and environment) that are either (potentially) invariant, meaning unlikely to change during interaction (e.g., peripheral equipment), or variant, dynamically changing during interaction (e.g., due to environment noise, the failure of particular equipment, etc.).

Static profiling. Static profiling entails the complete specification of attributes prior to the implementation of the reasoning engine of an interactive application. Where static profiling is employed, the process of altering the logic used for generating the adaptable behaviors of the system is semi-automatic and cannot be performed on the fly. More specifically, when such an approach is followed it is not feasible to enrich the decision logic while the system is running in order to perform meta-adaptation. This can only be achieved in the context of adaptations that occur based on collecting and analyzing usage data.

Extensible profiling using special purpose languages and Design Support Tools. A potential solution to the limitations of static profiling is to separate the logic under which adaptation occurs from the system performing the adaptation. This can be achieved, for example, through the creation of special purpose languages for the specification of the decision logic. An example of such a language is the Decision Making Specification Language (DMSL [14]). Special purpose design support tools, such as MENTOR [1], can be used to produce the decision logic of an application orchestrating user interaction.

2.2 User Interface Adaptation Toolkits

Data stemming from user and context profiling are used by adaptation toolkits for dynamically generating the interface instance that is most appropriate for a specific user in a specific context of use. Such toolkits, in their most advanced implementations, consist of collections of alternative interaction elements mapped to specific user and context parameters. The automatic selection of the appropriate elements is the key to supporting a large number of alternative interface instantiations. In the following sections some indicative examples of existing tools that support the development of adaptive user interfaces in various contexts are presented.

The EAGER toolkit. EAGER [5] is a development toolkit that allows Web developers to build adaptive applications using facilities similar to those offered by commonly used frameworks (such as ASP.NET [2] and JavaServer Faces [6]). It is a developer framework built on top of ASP.NET providing adaptation-enabled, ready-to-use dialogs. By means of EAGER, a developer can produce Web portals that have the ability to adapt to the interaction modalities, metaphors and UI elements most appropriate to each individual user, according to profile information containing user- and context-specific parameters.

Advanced toolkit for UI adaptation in mobile services. The main concept of this toolkit [9] is to facilitate the implementation of adaptation-aware user interfaces for mobile services. UI widgets supported by this framework encapsulate all the necessary information and are responsible for requesting and applying the relative
decisions. The toolkit employs DMSL to allow UI developers to turn hard-coded values of lexical attributes into adapted UI parameters specified in an external preference file. As a result, the UI implementation is entirely relieved from adaptation-related conditionality, as the latter is collected in a separate rule file.

2.3 Case Studies

In this section, real-life applications developed utilizing adaptation toolkits are briefly overviewed, focusing on highlighting their ability to cope with the diversity of the target user population and therefore to offer a qualitative user experience for all, regardless of the user's (dis)abilities, skills, preferences, and context of use.

The AVANTI Web Browser. The AVANTI Web Browser [16] facilitates static and dynamic adaptations in order to adapt to the skills, desires and needs of each user, including people with visual and motor disabilities. The AVANTI unified interface can adapt itself to suit the requirements of three user categories: able-bodied, blind and motor-impaired users. Adaptability and adaptivity are used extensively to tailor and enhance the interface respectively, in order to effectively and efficiently meet the target of interface individualisation for end users. Additionally, the unified browser interface implements features which assist and enhance user interaction with the system. Such features include enhanced history control for blind and sighted users, link review and selection acceleration facilities, document review and navigation acceleration facilities, enhanced intra-document searching facilities, etc.

The EDEAN portal. EDEAN is a prototype portal developed, as a proof of concept, following the UWI methodology by means of the EAGER toolkit [5]. In order to elucidate the benefits of EAGER, an already existing portal was selected and redeveloped from scratch. In this way, it was possible to identify and compare the advantages of using EAGER, both on the developer's side, in terms of the developer's performance, and on the end-user side, in terms of the improvement of the user experience.

The ASK-IT interface for mobile transportation services. The Home Automation Application developed in the context of ASK-IT facilitates remote overview and control through the use of a portable device. These facilities provide the ability to adapt themselves according to user needs (vision and motor impairments), context of use (alternative display types and display devices) and the presence of assistive technologies (alternative input devices).
be used towards enriching the adaptive behavior of existing frameworks. A potential implementation architecture for supporting User Interface Adaptation on the Semantic Web will be presented, focusing on the feasibility of such a concept and on its potential advantages.
3 User Interface Adaptation on the Semantic Web

3.1 Requirements for Effective User Modeling

Requirements for creating effective user modeling systems have been documented in [7] and [4], and include:
• Generality, including domain independence. User modeling systems should be usable in as many domains as possible, and within these domains for as many user modeling tasks as possible.
• Expressiveness and strong inferential capabilities. Expressiveness is a key factor in user modeling systems; they are expected to express many different types of assumptions about the users and their context. Such systems are also expected to perform all sorts of reasoning, and to perform conflict resolution when contradictory assumptions are detected.
• Support for quick adaptation. Time is always an important issue when it comes to users; user modeling systems are required to be adaptable to the users' needs. Hence they need to be capable of adjusting to changes quickly.
• Precision of the user profile. The effectiveness of a user profile depends on the information the system delivers to the user. If a large proportion of the information is irrelevant, then the system becomes more of an annoyance than a help. This problem can be seen from another point of view: if the system requires a large degree of customization, then the user will not be willing to use it anymore.
• Extensibility. A user modeling system's success relies on the extensibility it offers. Companies may want to integrate their own applications (or APIs) into the available user models.
• Scalability. User modeling systems are expected to support many users at the same time.
• Import of external user-related information. User models should support a uniform way of describing users' dimensions in order to support the integration of already existing data models.
• Management of distributed information. The ability of a generic user modeling system to manage distributed user models is becoming more and more important. Distributed information facilitates the interoperability and integration of such systems with other user models.
• Support for open standards. Adherence to open standards in the design of generic user modeling systems is decisive since it fosters their interoperability.
• Load balancing. User modeling servers should be able to react to load increases through load distribution and possibly by resorting to less thorough (and thereby less time-consuming) user model analyses.
• Failover strategies. Centralized architectures need to provide fallback mechanisms in case of a breakdown or unexpected situation.
• Fault tolerance. In case a user inserts wrong data in his/her profile by mistake (e.g., a user indicates the opposite gender), the system must prompt the user to adjust the corresponding parameters, rather than reset his/her profile.
• Transactional consistency. Parallel read/write procedures on the user model should lead to the deployment of sufficient mechanisms that preserve consistency and resolve possible inconsistencies.
• Privacy support. Another requirement of user modeling systems is to respect and retain the user's privacy. In order to meet this requirement, such systems must provide a way for the users to express their privacy preferences, as well as the security mechanisms to enforce them.

3.2 User Interface Adaptation on the Semantic Web: Proposed Architecture

Figure 1 presents the proposed implementation architecture for supporting adaptive interfaces on the Semantic Web.
Fig. 1. User Interface Adaptation on the Semantic Web: proposed architecture
Modeling User, Context and Interaction. In the proposed architecture, the Knowledge Base contains the ontology representing the modeled classes and properties for supporting the collection of parameters appropriate for modeling:
• User Profile (Disability, Web Familiarity, Language, etc.)
• Context Profile (Input-Output devices, screen capabilities, etc.)
• User Interaction (monitoring user actions, user navigation paths, etc.)
The Knowledge Base can use web ontology languages such as OWL to store the appropriate information in the form of Semantic Web rules and OWL-DL [11] ontologies. This approach offers enough representational capability to develop a formal context model that can be shared, reused, and extended for the needs of specific domains, but that can also be combined with data originating from other sources, such as the Web or other applications. Moreover, the logic layer of the Semantic Web is currently evolving towards rule languages that enable reasoning about the user's needs and preferences and exploiting available ontology knowledge [10]. An example of how user profile parameters can be modeled in an ontology is presented in Figure 2. User is a superclass that includes the user groups a user may belong to according to his/her functional limitations (NonImpairedUser, HearingImpairedUser, MotorImpairedUser or VisuallyImpairedUser), each of which is further analysed where appropriate.
Fig. 2. An example of an ontology representing user abilities
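To give a flavour of how such a hierarchy could be expressed in practice, the snippet below sketches the user-ability classes of Figure 2 using the owlready2 Python library. It is only an illustration of the modelling idea: the ontology IRI and the finer-grained subclasses are placeholders, not the ontology actually used in the proposed architecture.

```python
# Illustrative only: the user-ability hierarchy of Figure 2 expressed with
# owlready2. The IRI and the finer-grained subclasses are placeholders, not
# the ontology actually used in the proposed architecture.
from owlready2 import Thing, get_ontology

onto = get_ontology("http://example.org/user-profile.owl")

with onto:
    class User(Thing):
        pass

    # User groups according to functional limitations (as in Figure 2)
    class NonImpairedUser(User):
        pass

    class HearingImpairedUser(User):
        pass

    class MotorImpairedUser(User):
        pass

    class VisuallyImpairedUser(User):
        pass

    # Hypothetical finer-grained analysis of one group
    class BlindUser(VisuallyImpairedUser):
        pass

    class LowVisionUser(VisuallyImpairedUser):
        pass

onto.save(file="user-profile.owl")
```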
Designs Repository. The Designs Repository contains abstract dialogues together with their concrete designs. Following the Unified User Interface Design methodology [15], this is achieved through a polymorphic decomposition of tasks that leads from an abstract design pattern to a concrete artifact. Designs Repositories for supporting the adaptation of web-based services can consist of primitive UI elements with enriched attributes (e.g., buttons, links, radios, etc.), structural page elements (e.g., page templates, headers, footers, containers, etc.), and fundamental abstract interaction dialogues in multiple alternative styles (e.g., navigation, file uploaders, paging styles, text entry) [5].

Reasoner and Rule Engine. The Reasoner module, together with the Rule Engine, undertakes the job of classifying instances and performing the overall decision making that is required for selecting the appropriate interaction elements to build the concrete user interface. In this context, the Reasoner classifies instances into classes that have a strict definition, taking into account the Open World Assumption (i.e., if there is a statement for which knowledge is not currently available, it cannot be inferred whether it is true or false). The Rule Engine undertakes the classification into primitive classes and specifies and executes classification rules.
Orchestration (Adaptation Core). The adaptation core undertakes the orchestration of the main modules of the proposed architecture. When a user profile is created, the Reasoner and Rule Engine are invoked for classifying instances under various classes, computing inferred types and reasoning on the available context. The results are stored in the Knowledge Base and are used by the adaptation core for inferring specific actions regarding the activation and deactivation of alternative dialogs. The adaptation core is also responsible for re-invoking the aforementioned services when the data stemming from the user interaction monitoring process lead to the need to reevaluate existing user profile information through a reevaluation of the rules.

3.3 Benefits

Regarding the adaptation process itself, the adoption of a semantically enabled inference mechanism potentially allows the evaluation of more complex rules, thus making reasoning more solid and enriching the application logic. Moreover, an ontology-based specification of user, context and interaction profiles makes the potential extension of the system easier. Another important benefit of a semantically enabled adaptation approach is the increased possibility of learning user preferences. Traditionally, these attributes can be set by the user, but in most cases cannot be inferred from user actions. In the context of the proposed architecture it is possible to dynamically generate social tags that can in turn be used for performing adaptive filtering of information based on user preferences. A similar result can also be obtained by modeling user interaction data and performing batch analysis. This can be supported in the proposed architecture by introducing another layer of modeling beyond the designs repository used for strictly UI purposes (i.e., a content modeling repository).
4 Conclusions and Future Work

This paper has proposed an architecture for supporting the development of adaptive user interfaces on the Semantic Web, based on existing approaches which have been successfully used in the recent past for supporting the adaptation of user interfaces in various contexts. Modifications to the architectural structure used in these adaptation frameworks have been proposed in order to cope with the requirements set in the context of the Semantic Web. Taking into account the enriched modelling and inference capabilities offered, this novel architecture aims at combining the benefits of the Semantic Web (such as extensibility, strong inference capabilities, etc.) with the benefits of existing adaptation frameworks (such as the ability to address accessibility, user preferences, various input/output devices, etc.). In future work, this implementation architecture will be employed in the context of the EAGER development framework. In this context, the Knowledge Base of EAGER together with its inference mechanisms will be replaced by the modules proposed in the extended architecture (Knowledge Base, Rule Engine, Reasoner, etc.), allowing the reuse of facilities common to both architectures, such as the Designs Repository (which has already been put into use in the context of several interactive web-based applications, such as the EDEAN portal, http://www.edean.org).
References 1. Antona, M., Savidis, A., Stephanidis, C.: A Process–Oriented Interactive Design Environment for Automatic User Interface Adaptation. International Journal of Human Computer Interaction 20(2), 79–116 (2006) 2. ASP.NET Web Applications, http://msdn.microsoft.com/en-us/library/ ms644563.aspx 3. Bernard, M.L.: User expectations for the location of web objects. In: Proceedings of CHI 2001 Conference: Human Factors in Computing Systems, pp. 171–172 (2001), http://psychology.wichita.edu/hci/projects/ CHI%20web%20objects.pdf [2007] 4. Çetintemel, U., Franklin, M.J., Giles, C.L.: Self-adaptive user profiles for large-scale data delivery. In: Proceedings of the 16th IEEE International Conference on Data Engineering (ICDE 2000) (2000) 5. Doulgeraki, C., Partarakis, N., Mourouzis, A., Stephanidis, C.: Adaptable Web-based user interfaces: methodology and practice. eMinds 1(5) (2009) http://www.eminds.hci-rg.com/index.php?journal=eminds&page= article&op=download&path=58&path=33 6. JavaServer Faces Technology, http://java.sun.com/j2ee/1.4/docs/tutorial/doc/JSFIntro.html 7. Kobsa, A.: Generic user modeling systems. User Modeling and User-Adapted Interaction 11(1-2), 49–63 (2001) 8. Kostkova, P., Diallo, G., Jawaheer, G.: User Profiling for Semantic Browsing in Medical Digital Libraries. The Semantic Web: Research and Applications, 827–831 (2008) 9. Leuteritz, J.-P., Widlroither, H., Mourouzis, A., Panou, M., Antona, M., Leonidis, A.: Development of Open Platform Based Adaptive HCI Concepts for Elderly Users. In: Stephanidis, C. (ed.) Proceedings of 13th International Conference on Human-Computer Interaction (HCI International 2009, San Diego, California, USA, July 19–24. Springer, Berlin (2009) 10. Michou, M., Bikakis, A., Patkos, T., Antoniou, G., Plexousakis, D.: A Semantics-Based User Model for the Support of Personalized. Context-Aware Navigational Services. In: First International Workshop on Ontologies in Interactive Systems, 2008. ONTORACT 2008, pp. 41–50 (2008) 11. OWL Web Ontology Language Reference. W3C Recommendation, February 10 (2004), http://www.w3.org/TR/owl-ref/ 12. Robal, T., Kalja, A.: Applying User Profile Ontology for Mining Web Site Adaptation Recommendations. In: Ioannidis, Y., Novikov, B., Rachev, B. (eds.) ADBIS 2007. LNCS, vol. 4690, pp. 126–135. Springer, Heidelberg (2007) 13. Savidis, A., Leonidis, A., Lilis, I., Moga, L., Gaeta, E., Villalar, J.L., Fioravanti, A., Fico, G.: Self-configurable User Interfaces. ASK-IT Deliverable - D3.2.1 (2004) 14. Savidis, A., Antona, M., Stephanidis, C.: A Decision-Making Specification Language for Verifiable User-Interface Adaptation Logic. International Journal of Software Engineering and Knowledge Engineering 15(6), 1063–1094 (2005) 15. Stephanidis, C.: The concept of Unified User Interfaces. In: Stephanidis, C. (ed.) User Interfaces for All - Concepts, Methods, and Tools, pp. 371–388. Lawrence Erlbaum Associates, Mahwah (2001) 16. Stephanidis, C., Paramythis, A., Sfyrakis, M., Savidis, A.: A Case Study in Unified User Interface Development: The AVANTI Web Browser. In: Stephanidis, C. (ed.) User Interfaces for All - Concepts, Methods, and Tools, pp. 525–568. Lawrence Erlbaum Associates, Mahwah (2001)
Measuring Psychophysiological Signals in Every-Day Situations Walter Ritter University of Applied Sciences Vorarlberg, Hochschulstraße 1, 6850 Dornbirn, Austria
[email protected] Abstract. Psychophysiological signals enable computer systems to monitor the emotional state of a user. Such a system could adapt its behavior to reduce stress, give assistance, or suggest well-being tips. All of this should lead to a technology that is more user-friendly and more accessible to older people. Measuring physiological signals in research labs has been done for many years. In such a controlled environment the quality of signals is very high because of the optimal placement of electrodes by research staff. Analysis techniques can therefore rely on high-quality data. Measuring physiological signals in real-life settings without the assistance of well-trained staff is much more challenging because of artifacts and signal distortions. In this paper we discuss the approach taken in the Aladin project to cope with the inferior and unreliable quality of physiological signal measurements. We discuss a sensor design intended for every-day use and present the variance of skin conductance we experienced within measurements, between different measurements of the same individual as well as between different persons. Finally, we suggest using trends instead of absolute values as a basis for physiology-enhanced human-computer interaction "in the wild". Keywords: psychophysiology, skin conductance, heart rate, sensor technology, real-life settings, artifacts.
1 Introduction Measuring physiological signals has become widespread in an increasing number of application areas. Whilst it has been used within clinical applications for a long time (the origins of electrocardiogram (ECG) measurement date back as far as 1838, when Carlo Matteucci found that each heart beat was accompanied by an electrical current), recently it has also become popular in human-computer interaction, as it provides clues as to what is going on psychologically inside a user. On the one hand, this provides valuable input in usability evaluations, identifying issues that could not be uncovered by questionnaires [8,10]. On the other hand, physiology has also been proposed as an important input channel for interactive computer applications [13]. Picard suggests the term Affective Computing for computers that also take the emotions of their users into account when adjusting their behavior. Even though this term is mainly used
in the context of wearable computers, it also extends to a broader range of computer-driven applications. Even intelligent building management systems and smart rooms come to mind. The physiological signals measured from human bodies range from electrocardiograms (ECG), electrodermal activity (EDA), and electromyography (EMG) to electroencephalograms (EEG) and more. Each measure has its specific meaning, which is often dependent on other measurements. 1.1 Common Physiological Parameters ECG. Electrocardiograms show the electrical activity of the heart during the various phases of heart beats. The typical heart beat is characterized by the P, Q, R, S, and T waves as well as the resulting segments [12]. The most prominent peak is the R-peak, which is often used to determine the RR-intervals, also referred to as inter-beat intervals, describing the time between two consecutive heart beats. ECGs are applied in many contexts. In the context of human-computer interaction, ECGs are often the basis for calculations derived from heart rate, like heart rate variability. Heart rate variability is often analyzed in the frequency domain using spectral analysis techniques. Here the resulting power spectrum is divided into the frequency bands of very low frequency (VLF, 0.0033 Hz - 0.04 Hz), low frequency (LF, 0.04 Hz - 0.15 Hz), and high frequency (HF, 0.15 Hz - 0.4 Hz). The HF power spectrum is related to parasympathetic tone and variations due to spontaneous respiration, the LF power spectrum indicates parasympathetic as well as sympathetic tone, and the VLF power spectrum, especially in shorter recordings, is considered an indicator of anxiety or negative emotions. The LF/HF ratio is seen as a balance indicator between sympathetic and parasympathetic tone [1]. Whilst heart rate variability is a very promising indicator, variables in its processing, as well as the influence of artifacts or ectopic beats [3], make it very hard to use reliably in uncontrolled environments. EDA. Electrodermal activity is determined by measuring the electrical conductance of the skin. Skin conductance is usually measured with electrodes placed on the index and middle finger or inside the palm [5]. For parameterization, skin conductance is usually divided into the tonic skin conductance level (SCL) and skin conductance responses (SCR). Among the responses, a distinction between non-specific responses (responses without external stimuli) and specific responses to external stimuli is made. Specific SCRs are characterized by the latency time between the stimulus and the beginning of the rise, typically between one and three seconds, the amplitude of the rise (highly varying among test persons, also depending on habituation effects) and its half-value period [5, 16]. SCRs can therefore be used to detect physiological reactions to specific stimuli. If the occurrence of external stimuli is unknown, it is difficult to distinguish between specific and non-specific responses. Skin conductance level is often related to the concept of activation [6]. SCL is especially used as an indicator of the activation level [16], where longer-term measurements allow for observations of the activation progress. EDA is a very popular measure in HCI, not least because it can be easily measured on the fingers of the user, meaning there is no need for attaching electrodes underneath the clothes.
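To make the frequency-domain analysis described above concrete, the following sketch estimates the LF/HF balance indicator from a series of RR intervals, using the band limits given in the text. It is an illustrative example rather than a prescribed procedure: the uniform resampling rate, the Welch window length, and the use of NumPy/SciPy are assumptions made for this sketch.

```python
import numpy as np
from scipy.signal import welch

def lf_hf_ratio(rr_intervals_s, fs=4.0):
    """rr_intervals_s: successive RR intervals in seconds."""
    rr = np.asarray(rr_intervals_s, dtype=float)
    beat_times = np.cumsum(rr)                      # time of each beat (s)
    # Resample the irregular RR series onto a uniform grid so spectral analysis applies
    t_uniform = np.arange(beat_times[0], beat_times[-1], 1.0 / fs)
    rr_uniform = np.interp(t_uniform, beat_times, rr)
    rr_uniform -= rr_uniform.mean()                 # remove the DC component
    freqs, psd = welch(rr_uniform, fs=fs, nperseg=min(256, len(rr_uniform)))

    def band_power(lo, hi):
        mask = (freqs >= lo) & (freqs < hi)
        return np.trapz(psd[mask], freqs[mask])

    lf = band_power(0.04, 0.15)   # low-frequency band
    hf = band_power(0.15, 0.40)   # high-frequency band
    return lf / hf if hf > 0 else float("nan")
```

In practice, the choice of recording length and the handling of artifacts or ectopic beats matter far more than these implementation details, which is precisely why the text cautions against relying on heart rate variability in uncontrolled environments.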
EMG. Electromyography measures the activity of muscular tension. In this way, the action of specific muscles can be identified and, for example, used to count eye blinks or to detect tension of the neck muscles that may suggest frustration. Although the basic principle of measuring EMG is simple (two electrodes placed on the skin over the relevant muscle), the transformation of measured results into psychophysiological concepts is difficult. For example, a certain expression is usually caused by many contractions of multiple muscles [7]. Also, placement of electrodes on the surface often captures the activity of various muscles, due to their relative size. Besides, electrodes may reduce the ability of an individual to move [2], thus also limiting the acceptance of applications in real-life settings. EEG. The electroencephalogram shows electrical activity on the head surface, allowing activity in specific parts of the brain to be inferred. To be able to measure the very low amplitudes of activity, electrodes need to be attached using conductivity gel. The locations of electrode placement have been widely standardized in the so-called International 10-20 system. Basically, EEG measures the potential difference between two electrode sites. A theoretical distinction between active and inactive sites is made, where inactive sites are characterized as not being a source of electrical activity themselves. Among inactive sites are the ear lobes, the vertex, or the nasion [14]. If the measurement is made by comparison of two active sites, the term bipolar recording is used. This is mostly prevalent in medical scenarios, whereas for psychophysiological purposes monopolar recordings are often used, meaning one site is inactive. According to Coles [4], EEG is typically recorded at sampling rates of 100 Hz to 10000 Hz. For the analysis of EEG signals, event-related brain potentials (ERP) play an important role. ERPs reflect brain activity caused by a specific discrete event, finally leading to a response. They are also regarded as indicators of psychological processes [4]. EEG is widely used in the field of brain-computer interfaces. However, its use in real-life settings has for now been constrained by the complicated measurement procedure involved. All of the measures described above work best with exact electrode placement on the human skin, especially EEG and EMG. In a laboratory, clinical, or otherwise controlled environment this is not much of a problem thanks to well-trained staff; measuring these parameters in real-life settings by the users themselves is a completely different story. One problem may be the use of electrodes themselves. Telling a user to attach electrodes using a special gel to support conductance elicits negative associations reminiscent of medical treatments and thus leads to reluctance to use them. But even if users were willing to use electrodes, variations in their placement can lead to completely different measured values. In the following section we give an overview of the Aladin sensor glove, intended for physiology data collection in real-life settings, before we analyze the collected data regarding variances and their implications.
2 The Sensor Design In the Aladin project, psychophysiological measures were used as input for light adaptation algorithms as well as biofeedback applications. The project partners have developed an adaptive lighting system for elderly people with the aim to increase
mental and physical fitness [9]. As the project targets the elderly in particular, it was essential that the sensor solution be accepted by potential customers. In a pre-study we learned that electrodes requiring conductance gel were not accepted by the target group. Therefore we had to find alternative forms of electrodes that would be accepted but in turn may have the downside of not yielding equally reliable results. Wearable shirts would have been a nice solution; however, they only work well on slim people. As regards psychophysiological measures, we opted for heart rate and electrodermal activity, as both relate to the concept of activation and are also the least susceptible to errors due to varying electrode placement, though this is still problematic. On the hardware side we developed a sensor design over two iterations. First, we intended to use a sensor belt inspired by heart rate monitors, also capable of measuring skin conductance at one side of the chest, temperature, and acceleration (see Figure 1). During a pre-test, however, we learned that attaching the sensor belt was a big problem for some persons of the target group due to their reduced mobility [15].
Fig. 1. Aladin sensor belt
Fig. 2. Aladin sensor glove
As a result we opted for a glove-based sensor design. A glove can be attached easily by all persons, and works well regardless of body size or figure. Heart rate in this sensor was measured via a method based on blood-volume pulse (BVP). Skin conductance was measured between the thumb and index finger (see Figure 2). Both sensor designs were based on a special mobile version of the Varioport recorders developed by Becker Meditec². Sensor data is sent wirelessly to the measurement system via a Bluetooth connection. The decision to measure heart rate based on BVP resulted in high levels of artifacts during movements, especially when moving the hand above the head, but in the typical usage scenarios this did not prove to be an issue. In the following section we describe the results we received using the Aladin sensor glove during field tests carried out in twelve test households.
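As an illustration of how heart rate can be derived from a BVP signal such as the one recorded by the glove, the sketch below detects pulse peaks and converts the inter-beat intervals into a beats-per-minute value. This is not the Aladin implementation: the sampling rate, the peak prominence threshold, and the minimum inter-beat distance (0.4 s, i.e. at most 150 bpm) are assumptions for the example.

```python
import numpy as np
from scipy.signal import find_peaks

def heart_rate_from_bvp(bvp, fs=64.0):
    """Return the mean heart rate (beats per minute) for one BVP segment."""
    bvp = np.asarray(bvp, dtype=float)
    # Peaks must stand out from the signal and be at least 0.4 s apart (<= 150 bpm)
    peaks, _ = find_peaks(bvp, distance=int(0.4 * fs), prominence=0.5 * np.std(bvp))
    if len(peaks) < 2:
        return float("nan")     # not enough clean pulses in this segment
    ibi = np.diff(peaks) / fs   # inter-beat intervals in seconds
    return 60.0 / ibi.mean()
```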
3 Results of Psychophysiological Measurements from Field Tests Given the unpredictable usage scenarios during the field tests, the main questions were how reliable the signals would be within sessions of the same person on different occasions, as well as the signal quality among different test persons, all compared to inner-session variance, which is mostly caused by physiological processes. This should give an indication of which parameterization technique might be most suitable for the recorded data. As no information existed about what a person did at a specific time, all data was drawn from biofeedback exercises, where test persons should be reasonably calm and relaxed, thus avoiding results from unexpected actions. As skin conductance uses raw values and therefore is more prone to attachment variations than heart rate derived from BVP, we focus on skin conductance in this analysis. In the context of Aladin it was particularly important to get an idea of a person's maximum span of skin conductance in order to give appropriate feedback on the relaxation progress in a biofeedback session. Table 1. Standard deviations within measurement sessions (Inner-STDEV) and between sessions (Inter-STDEV) for the whole 4-minute periods and the first 30 seconds of the periods
² Becker Meditec, Karlsruhe, Germany.
Fig. 3. Inter-person signal comparison
The following data is drawn from twelve field test persons (ten female and two male), aged between 64 and 82 years. A total of 1402 measurements, each with a duration of four minutes, were analyzed. Data was collected by the sensor glove described above. After initial instructions, the test persons attached the sensor glove by themselves without being observed or coached any further. As depicted in Figure 3, skin conductance differs enormously between individuals, but also within a single person, with the graphs showing the minimum, maximum, average
and median of a measurement session. It can be seen that test person VP4 is much more stable than VP12, but even here variations between individual sessions outweigh the inner-session variations of skin conductance. The graphs also show that average and median values are nearly identical. An analysis of standard deviations within individual measurement sessions and between them supports this impression. Table 1 lists individual standard deviations for the whole 4-minute measurements, as well as for the first 30 seconds of them. Skin conductance values are not converted to µS units, but instead reflect raw device values, as for our purposes only relative changes are of interest. The average and standard deviation of all test persons' combined standard deviations support the hypothesis that these highly varying value ranges between sessions are far more pronounced than changes within a session, and can be observed among all test persons. Figure 4 shows that for both period variants, 4 minutes and only the first 30 seconds, the proportion of changes between sessions and within sessions is about the same. The individual differences between persons are also apparent, although the ratio between inter- and inner-session standard deviations is comparable.
Fig. 4. Standard deviations between measurements and within measurements for the whole 4-minute periods and the first 30 seconds of the periods
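A minimal sketch of the comparison summarized in Table 1 and Figure 4 is given below: for each person, the inner-session variability is taken as the average standard deviation within sessions, and the inter-session variability as the standard deviation of the session means. The data layout (a list of raw skin-conductance arrays per person) and the exact aggregation are assumptions made for illustration.

```python
import numpy as np

def session_variability(sessions):
    """sessions: list of raw skin-conductance arrays recorded by one person."""
    sessions = [np.asarray(s, dtype=float) for s in sessions]
    inner_stdev = float(np.mean([s.std() for s in sessions]))  # avg. spread within sessions
    inter_stdev = float(np.std([s.mean() for s in sessions]))  # spread of the session levels
    return inner_stdev, inter_stdev

# Example with three sessions whose overall levels differ far more than their
# within-session fluctuations, as observed in the field data:
sessions = [np.random.normal(level, 1.0, size=240) for level in (20.0, 45.0, 70.0)]
print(session_variability(sessions))
```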
4 Discussion These results demonstrate that variations between individual measurements clearly outweigh changes within one measurement. In the context of our biofeedback application this means that we cannot use the history of past measurements, for example average or median values, directly to predict the span of values in the current session. As the graphs in Figure 3 illustrate, average and median values are nearly identical. This could be seen as an indicator that artifacts due to badly attached electrodes are minimal. Therefore the electrodes implemented into the sensor glove seem to be appropriate for use in such application scenarios. However, the results also seem to confirm that psychophysiology-based applications in real-life settings should not rely on absolute measures and amplitude-based
parameters, as these vary too much between usage sessions and also among different users. This is especially true for directly measured parameters like skin conductance, as opposed, for example, to heart rate deduced from significant amplitude changes of the R-wave of the ECG or the pulse wave of the BVP. One approach to circumvent the problem of highly varying levels and to predict the value span would be to require a calibration procedure at the beginning of every user session. This would give an idea of which values to expect, so that feedback could be scaled accordingly. This is actually common practice in laboratory situations to accommodate varying levels of skin conductance between individual test persons. From a usability standpoint, however, this would be highly annoying for users who just want to play a game or perform a biofeedback session instead of helping the system guess what to expect. A solution might be to embed the calibration procedure into the application, for example as part of a game or exercise. In Aladin, we applied a different approach. Instead of using amplitude-based measures, we suggest a simple approach based on trends. In the case of skin conductance this could be counting the periods of rising skin conductance as well as the periods of falling skin conductance, and then taking them as indicators of activation or relaxation. Such a trend-based approach already proved to be promising in our study of an adaptive memory game that subliminally changed its appearance based on physiological input in order to increase the memory performance of the user [11].
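The trend-based idea sketched above can be expressed, for example, as a simple indicator that counts windows of rising versus falling skin conductance. The window length and output scaling below are assumptions; the point is that only the direction of change is used, never the absolute level.

```python
import numpy as np

def trend_indicator(sc, fs=16.0, window_s=1.0):
    """Fraction of windows with rising minus falling skin conductance.
    Positive values suggest activation, negative values relaxation."""
    sc = np.asarray(sc, dtype=float)
    win = max(1, int(window_s * fs))
    n_win = len(sc) // win
    # Mean skin conductance per window, then the direction of change between windows
    means = sc[:n_win * win].reshape(n_win, win).mean(axis=1)
    deltas = np.sign(np.diff(means))
    rising = int(np.count_nonzero(deltas > 0))
    falling = int(np.count_nonzero(deltas < 0))
    return (rising - falling) / max(1, len(deltas))
```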
5 Conclusion The lessons learned from the field study and the analysis of the measured signals indicate that for psychophysiological appliances in real-life settings one might have to take a step back from laboratory-proven measurement and parameterization techniques and find new ways to utilize physiological signals in alternative human-computer interaction. Whilst devices will definitely get better over time and deliver more precise measurements of certain parameters, the basic problem is likely to remain: users have to accept these measurement devices, and as long as skin contact is required, simple and comfortable electrodes play an important role. This, however, also means that we have to make compromises regarding signal quality and reliability. In the future we plan to further develop and evaluate alternative parameterization techniques for physiological values that are less demanding regarding signal quality than the standard laboratory routines, and therefore more appropriate as an additional channel in human-computer interaction for AAL applications in the wild. Acknowledgments. The Specific Targeted Research Project (STREP) Ambient Lighting Assistance for an Ageing Population (ALADIN) has been funded under the Information Society Technologies (IST) priority of the Sixth Framework Programme of the European Commission (IST-2006-045148).
References 1. Acharya, R.U., Joseph, P.K., Kannathal, N., Choo Min, L., Suri, J.S.: Heart Rate Variability. In: Acharya, R.U., Suri, J.S., Spaan, J.A.E., Krishnan, S.M. (eds.) Advances in Cardiac Signal Processing, pp. 121–166. Springer, Berlin (2007) 2. Cacioppo, J.T., Tassinary, L.G., Fridlund, A.J.: The Skeletomotor System. In: Cacioppo, J.T., Tassinary, L.G. (eds.) Principles of Psychophysiology, pp. 325–384. Cambridge University Press, Cambridge (1990) 3. Clifford, G.D.: ECG Statistics, Noise, Artifacts, and Missing Data. In: Acharya, R.U., Suri, J.S., Spaan, J.A.E., Krishnan, S.M. (eds.) Advanced Methods and Tools for ECG Data Analysis, pp. 55–100. Artech House, Boston (2007) 4. Coles, M.G.H., Gratton, G., Fabiani, M.: Event-Related Brain Potentials. In: Cacioppo, J.T., Tassinary, L.G. (eds.) Principles of Psychophysiology, pp. 413–455. Cambridge University Press, Cambridge (1990) 5. Dawson, M.E., Schell, A.M., Filion, D.L.: The Electrodermal System. In: Cacioppo, J.T., Tassinary, L.G. (eds.) Principles of Psychophysiology, pp. 295–324. Cambridge University Press, Cambridge (1990) 6. Duffy, E.: Activation. In: Greenfield, N., Sternbach, R. (eds.) Handbook of Psychophysiology, pp. 577–622. Holt, Reinhart and Winston, New York (1972) 7. Eckman, P.: Methods for Measuring Facial Action. In: Scherer, K.R., Ekman, P. (eds.) Handbook of Methods in Nonverbal Behavior Research, pp. 45–90. Cambridge University Press, Cambridge (1982) 8. Izsó, L.: Developing Evaluation Methodologies for Human-Computer Interaction. Delft University Press, Delft (2001) 9. Kempter, G., Maier, E.: Increasing Psycho-physiological Wellbeing by Means of an Adaptive Lighting System. In: Cunningham, P., Cunningham, M. (eds.) Expanding the Knowledge Economy: Issues, Applications, Case studies, pp. 529–536. IOS, Amsterdam (2007) 10. Kempter, G., Ritter, W.: Einsatz von Psychophysiologie in der Mensch-Computer Interaktion. In: Heinecke, A.M., Paul, H. (eds.) Mensch & Computer 2006. Mensch und Computer im Strukturwandel, pp. 165–174. Oldenbourg, München (2006) 11. Kempter, G., Ritter, W., Dontschewa, M.: Evolutionary Feature Detection in Interactive Biofeedback Interfaces. In: Universal Access in HCI: Exploring New Interaction Environments. HCII 2005 Las Vegas Paper Presentation. CD-ROM (2005) 12. Papillo, J.F., Shapiro, D.: The Cardiovascular System. In: Cacioppo, J.T., Tassinary, L.G. (eds.) Principles of Psychophysiology, pp. 456–512. Cambridge University Press, Cambridge (1990) 13. Picard, R.W.: Affective Computing. MIT Press, Cambridge (1997) 14. Ray, W.J.: The Electrocortical System. In: Cacioppo, J.T., Tassinary, L.G. (eds.) Principles of Psychophysiology, pp. 385–455. Cambridge University Press, Cambridge (1990) 15. Ritter, W., Becker, K., Kempter, G.: Mobile Physiology Monitoring Systems. In: Maier, E., Roux, P. (eds.) Seniorengerechte Schnittstellen zur Technik, pp. 78–84. Pabst Science Publishers, Lengerich (2008) 16. Vossel, G., Zimmer, H.: Psychophysiologie. Verlag W. Kohlhammer, Stuttgart (1998)
Why Here and Now Antonio Rizzo1, Elisa Rubegni1,2, and Maurizio Caporali1 1
University of Siena, Computer Science Department, Via Roma 56, 53100 Siena, Italy 2 Università della Svizzera italiana, TEC-Lab, via Buffi 13, 6900 Lugano TI (CH) {rizzo,caporali}@unisi.it,
[email protected] Abstract. The paper presents our vision in the process of creating new objects and things, based on reducing the estrangement of Internet content consumption by conceiving interaction modalities suitable for social activities occurring in the here-and-now, in real-time and real-place. These aspects should be incorporated in interactive artefacts not only for the contents consumption but also for editing and manipulating information. We present some projects and concepts that go in this direction, and among them we show the design solutions developed in our laboratory that aim to enhance the role of the physical location, social and cultural environment in affecting the contents and the way to interact with them. Keywords: Human-Computer Interaction, Interaction design, Situated Editing, Design for all, Tangible User Interface, Ubiquitous computing, Internet of Things.
1 Introduction We share the view that computers, phones and game consoles are no longer the only devices in our environment deemed worthy of being "intelligent" and connected. But within this broad view of the process of creating new objects and things, we want to address two specific issues. First, most current technology solutions take the user out of the physical and social context in which he is actually involved: communication devices and data links are seldom used to empower in-presence social activities. Second, the development of most networked devices presents the irrelevance of physical location as one of the key advantages of being constantly connected. In considering these two issues, we do not dispute the utility of distance communication and any-place-any-time accessibility to information. We seek to complement these two crucial factors of ubiquitous computing with their converse. We propose to respond to these challenges by conceiving interaction modalities suitable for social activities occurring in the here-and-now, in real-time and real-place. Our goal is to develop design solutions that mitigate or even eliminate the almost compulsory estrangement from the physical context when using communication technology. We challenge information and communication technologies that allow anytime-anywhere access to provide content that could be enhanced by the fact of the user being in a given location. Internet resources are barely affected by the context in which the user accesses them: the content and interaction modality remain the same irrespective of physical
location or cultural environment. For most web technologies the specific location from which the user accesses information adds nothing to the user's experience: whether the user is in New York, Hong Kong, or the countryside near Siena doesn't seem to matter. In this paper, concrete examples of products will be provided and the implications of this approach will be discussed. These aspects should be incorporated into interactive artefacts not only for content consumption but also for editing and manipulating information.
2 Background and Motivation Lucy Suchman [20] made a fundamental contribution towards understanding the relationship between "interactive artefacts" and the context of use, emphasizing the role of the environment in the cognitive process. From this standpoint, designing an interactive artefact means not only designing a device, but designing new human activities and behaviour [1]. Since the cognitive process is socially and historically situated, the physical location is important and the focus of designing interactive systems should be on the usage. The situated action theory [20] influenced the research behind the design of interactive systems and contributed to defining a methodological background for the development of interactive systems 0. In this view, the physical/social context, the range of human activities supported and the contents provided are the main factors that should be considered in the definition of the user interface [2]. Internet and mobile ICT promise the redefinition of spatial, temporal and contextual boundaries, promoting "anytime, anywhere" access to information [7] and allowing new interaction modalities among people, objects and machines located in the environment. Mobile interaction in particular implies possibilities for the compression and fluidification of the space, time and context of interaction [6]. The patterns of spatial behaviour and temporal understanding change dramatically with mobile technology [8]. Mobile technology opens new visions for design and creates new challenges, such as the adaptation of the contents and of the interaction patterns to the context of use. Thus, designing user interaction needs to consider the temporal and spatial interdependencies. One of the most current topics of debate regards designing for situated interaction [9], considering context as not merely the location, but also the user activity supported. Recent studies on embedding information devices in the environment pay significant attention to understanding the possible patterns of interaction enabled by technology, the physical/social and cultural environment and the human activities potentially supported [12, 16, 21]. The issue of the compulsory estrangement from the physical context when using communication technology is a key challenge in research, especially in the domains of Ubiquitous Computing [24], Physical Computing [15], Everyware [5], Tangible User Interfaces [22] and the Internet of Things [14]. These are based on the same viewpoint: computers, phones and game consoles are no longer the only devices in our environment deemed worthy to embody computation and be connected. New kinds of devices potentially enable interaction modalities oriented to providing the added value of being in a given location when accessing a given content [e.g. 23]. In this perspective, the design
has to be based on the creation of patterns of interaction in accordance with the specific type of place, contents and activity, which differ according to the technology used. The vision given by the above-mentioned emerging technology domains allows the definition of innovative interaction modalities in terms of content consumption and editing. For example, current research considers possible ways in which content consumption and manipulation can be affected by the location in which the content is delivered. Therefore, even the manipulation of contents has to consider the context in which the activity takes place. In this direction, an interesting concept that emerged from research is Situated Editing (SE). SE was developed within the POGO project as a way to join the "invisible computing" approach [12], that is, to allow a seamless integration of the physical and virtual worlds through intuitive interaction modalities. These concepts are extended from the POGO project research, which designed and tested prototypes for children at elementary school level to support narrative storytelling through interaction with digital devices [17]. SE enables real-time manipulation of educational assets, permitting students and teachers to share the production of course content. The environment developed in POGO has a number of tools that support SE. Raw non-digital media elements (e.g. drawings, sounds) can be converted into digital assets using tools for rich asset creation. These digital assets are stored on physical media carriers (transponders) and can be used in tools that support storytelling. With these tools, assets come alive on a big projection screen, a sound system, paper cardboard, paper sticks, etc. The system offers tools to capture the creative end results and share them with others over the Internet as movies or as digital or paper-based storyboards. This concept orients the design of a few projects at our laboratory and inspired the development of several prototypes. The SE concept as elaborated in our projects has been slightly modified. The interaction modalities designed for the consumption and manipulation of Internet contents are defined for specific devices such as the computer or the mobile phone. Recent elaborations of these modalities go in the direction of adding physical control devices to extend the interaction to the physical world. Though these modalities could enhance the user experience, they still enable the manipulation and consumption of contents only in traditional ways: using a computer or mobile device at a defined place (or on the move) and time. In our perspective, the way people access, consume and manage Internet contents has to be enriched by a given location and context. It means that being in a physical space should affect people's interaction not just in terms of the device used but also concerning the way people interact with it. Thus, the system should be aware of whether the user is alone or with others, at home or in the office, and provide a more suitable interaction. The idea is not to jeopardize the current ways of interacting with media, but to enrich and complement them. Our research activity goes in the direction of extending this research line by offering new patterns and tools that enable an interaction contingent on the user's location (here and now) and activities. Our aim is to reduce the estrangement that currently characterizes the interaction through traditional devices (e.g.
desktop, laptop, mobile phone) by conceiving interaction modalities suitable for social activities occurring in the here-and-now, in real-time and real-place.
In the following we provide some examples of artefacts that embody the concept of situated editing as a metaphor to mediate interaction between tools and human activity. Several projects are oriented towards improving the role of the here and now in the consumption of contents. These projects span many modes of interaction, from gestural (gesture as the main input-based interaction) to visual (manipulation of images) and aural (sound-based interfaces).
3 Situated Interaction and Editing: Some Examples The most consolidated interaction modalities for accessing Internet contents are based on the keyboard and mouse, but recently Ubiquitous Computing, Physical Computing, Everyware, Tangible User Interfaces and the Internet of Things have proposed richer interaction patterns. Nowadays, one of the main problems related to content consumption concerns the time spent accessing and managing information, and the different modes of accessing it in different places. Web 2.0 enables Internet users to access contents in real time (e.g. video streaming). This is based on the idea that the contents and the way information is consumed and edited may change according to the location, timing, channel and technological device. In our perspective, contents may and have to be affected by these dimensions. The same contents can be accessed by users in a huge variety of situations (being in motion or being in a specific physical space) through a specific channel (e.g. audio or video streaming) and using the devices that are most suitable to the context of usage. For example, while the user is driving (s)he can listen to radio information located on his/her computer at the office, or once at home (s)he can read news selected beforehand at the office on the screen of his/her TV. In this case the interaction modalities and the delivery of contents change according to the location and the activities in which the user is involved. The channels that deliver contents can be determined by the user's location and activity. For example, the user can bookmark content and decide how and when to access it according to a specific location, channel or timing. In order to exemplify our vision of situated editing and interaction we use some examples of projects and concepts. There is a huge number of projects that address the issue of manipulating media, from which we selected for illustration just those that best represent our vision. Some of the projects mentioned below concern editing and management of contents, tagging, social bookmarking, and mobile social networking. Recently, the trend of editing and manipulating digital contents has moved interaction from the desktop to the physical world. Aurora¹ is a new concept for a browser that integrates the Web, the desktop and the physical location of the user. The browser is aware of the user's physical context and proposes patterns of interaction suitable to it, merging data and user behaviour. Whenever possible, Aurora leverages natural interactions with objects in space. The interaction with physical objects for the manipulation of media is a key feature of the Bowl project. The Bowl² [10] focuses on the use of tangible interfaces for
¹ http://adaptivepath.com/aurora/
² http://www.thisplacement.com/2007/11/12/bowl-tokene-based-media-for-children-at-dux2007
handling media in the home environment. The bowl, placed on the living room table, can be used for the manipulation of media as it moves from physical towards online, social and time-shifted distribution with services like YouTube, Vimeo, TiVo and Apple TV. The project aims to find solutions for the physical form of the interface, the types of interaction and the kinds of suitable content. In the same direction, Siftables³ [11] enables people to interact with information and media through compact devices with sensing, graphical display, and wireless communication capabilities. These compact devices can be physically manipulated as a group to interact with digital information and media. Items can be sorted into piles or placed next to one another.
Fig. 1. On the left, a picture of the Bowl project; on the right, Siftable items
Several projects in which location affects contents are related to the domain of mobile social networking [19]. Many projects aim to enhance the engagement of people with the local environment. The project Whrrl⁴ allows the real-time personalization of the physical environment based on local discovery and user-generated content through the integration of Web and mobile experiences. Whrrl enables the sharing of local knowledge with other members of the community. All these examples share the same vision of adding value to interaction by enabling the manipulation of physical objects for editing media and affecting contents with the physical location. Though these projects address these issues, they provide a way of interacting with content that is traditionally connected to the use of computers or mobile devices. These projects show that there is still a gap in reducing the estrangement of people when interacting with contents in a given location. Our purpose is to address this challenge by conceiving innovative interaction patterns that consider the given location and context. In the following, we present some examples of projects from our research laboratory. 3.1 Wi-roni and Other Examples The Wi-roni project [18] goes in the direction of enhancing the role of the territory (in both social and physical aspects) in affecting the contents (and the way to interact with them). Wi-roni aims to improve Internet content consumption and
³ http://web.media.mit.edu/~dmerrill/siftables.html
⁴ Launch Pad, O'Reilly Where 2.0 Conference, Burlingame, CA, May 12, 2008.
manipulation by considering the added value of being in a given location when accessing a given content. Wi-roni is an urban architecture project located in the La Gora public park in Monteroni d'Arbia, a small village in the province of Siena (Italy). For this project, we developed two interconnected solutions: Wi-wave, a vertical pillar for accessing web audio content in public spaces, and Wi-swing, a children's swing that tells stories while swinging. Wi-wave facilitates the interaction of people with web content in public spaces. Wi-wave uses ultrasonic sensor technology to capture physical gestures as a navigation interface for three channels of audio playback/streaming. The content offered by Wi-wave is a collection of two audio types: streaming radio and synchronized podcasts. Wi-wave allows everyone to listen to podcasts in a public area and, from a research perspective, allows us to explore issues regarding the design of interaction through patterns of behavior that may have aesthetic and imaginative value. Users can also upload their podcast files through their mobile device and listen to them in the park, sharing the contents with their friends.
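As a purely hypothetical illustration of how the gesture-based navigation of Wi-wave could be realized, the sketch below maps the distance reported by an ultrasonic sensor to one of the three audio channels. The distance thresholds and the read_distance_cm() function are invented for this example and do not describe the actual Wi-wave implementation.

```python
def select_channel(distance_cm):
    """Map the hand distance in front of the pillar to an audio channel index."""
    if distance_cm is None:
        return None        # no echo received: keep the current playback
    if distance_cm < 20:
        return 0           # first channel, e.g. streaming radio
    if distance_cm < 40:
        return 1           # second channel
    if distance_cm < 60:
        return 2           # third channel, e.g. user-uploaded podcasts
    return None            # hand out of range: keep the current playback

# Example of the (assumed) polling loop:
# channel = select_channel(read_distance_cm())
```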
Fig. 2. On the left, the Wi-roni playground system; on the right, the final Wi-wave prototype
Wi-wave is a fully implemented prototype installed in the public park, while Wi-swing is a concept under development. Wi-swing is the tool that, associated with Wi-wave, allows situated editing. Wi-swing is a tool for listening to storytelling and, in general, for broadcasting the output of Wi-wave (the other concept developed in the project). Children can browse and control the speed of the narrative through the movements of the swing and can edit the delivered contents by adding sound to the original story. These modalities are complementary and can be activated according to the context of use: if the child is alone s/he can just listen to the story, but if more children are playing with Wi-swing the author modality is switched on. They can play with sound objects and create the audio for the story.
Another concept developed at our laboratory is the "Parco della Luna" ("Moon Park"). The project aims to support people's activities in a public park, the "Parco della Luna", devoted to delivering astronomical contents. "Parco della Luna" extends the exploration of the sky in the physical place, sitting or lying down in the park, through the Internet. The idea is to use spatial references such as trees or buildings for navigating information on the Internet, creating an overlapping layer between what you are seeing in the physical space and information from the Internet. Currently, there is a gap between the place where you are living the experience and the place where you are interacting with the online world. The prototypes and concepts mentioned above (Wi-roni and "Parco della Luna"), developed at our laboratory, try to reduce this gap.
4 Conclusion Our vision is based on reducing the estrangement of Internet content consumption by conceiving interaction modalities suitable for social activities occurring in the here-and-now, in real-time and real-place. Using the metaphor of water distribution as a stand-in for the Internet: water should be in any location where we want life, but it is a matter of human cultural design whether to have a simple tap or to build a fountain. Our perspective offers room for design in the direction of creating "fountains" with unique interaction modalities, where the emerging behaviour of people becomes a value in itself. This perspective opens a huge space for the design of user experiences that integrate existing modalities of interaction (such as those enabled by desktops and mobile phones) with innovative ones (e.g. Wi-wave) in order to extend the opportunities offered by a given location and context for content consumption and manipulation. Another important aspect regards the trend of merging the digital and physical worlds, which allows the use of pre-existing objects for the editing and consumption of media. For example, the interaction conceived in Wi-swing for allowing children to listen to narratives and to edit sounds is based on the traditional modalities of playing with a swing. In these situations, the user experience is enriched, bringing together familiar modes of interacting with physical objects and the interactive possibilities offered by the use of media. The SE principle moves from the opportunities for action (affordances), which are available to specific people in a given context, towards elaborating interaction modalities that seamlessly integrate the physical and the digital world. This integration is not only a way to empower existing opportunities for action with new goal-oriented effects and transformations in the world, but also a way to transform constraints into resources. This happens because the design of the interaction follows the action pattern that specific users tend to privilege, according to their abilities and competence, in a given context and its associated set of activities. Furthermore, the interaction modalities can overlap with one another, that is, different users could reach the same or similar goals through their own interaction patterns. This seems to us a promising way to approach design considering all the issues we have mentioned [4].
References 1. Bannon, L.J., Bødker, S.: Beyond the interface. Encountering artifacts in use. In: Carroll, J.M. (ed.) Designing interaction: Psychology at the human-computer interface, pp. 227–253. Cambridge University Press, Cambridge (1991) 2. Bødker, S.: A Human Activity Approach to User Interfaces. Human Computer Interaction 4(3), 171–195 (1989) 3. Bødker, S.: Creating conditions for participation: Conflicts and resources in systems design. Human Computer Interaction 11(3), 215–236 (1996) 4. Emiliani, P.L., Stephanidis, C.: Universal access to ambient intelligence environments: opportunities and challenges for people with disabilities. IBM Systems Journal 44(3), 605–619 (2003) 5. Greenfield, A.: Everyware: The Dawning of Ubiquitous Computing. New Riders Publishing (2006) 6. Kakihara, M., Sørensen, C.: Mobility: An Extended Perspective. In: Sprague Jr., R. (ed.) Thirty-Fifth Hawaii International Conference on System Sciences (HICSS-35), Big Island Hawaii. IEEE, Los Alamitos (2002) 7. Kleinrock, L.: Nomadicity: Anytime, Anywhere in a Disconnected World. Mobile Networks and Applications 1, 351–357 (1996) 8. Ling, R.: The Mobile Connection: The Cell Phone’s Impact on Society. Morgan Kaufmann, Amsterdam (2004) 9. Mccullough, M.: Digital Ground: Architecture, Pervasive Computing, and Environmental Knowing. MIT Press, Cambridge (2004) 10. Martinussen, E.S., Knutsen, J., Arnall, T.: Bowl: token-based media for children. In: Proceedings of the 2007 Conference on Designing For User Experiences. DUX 2007, Chicago, Illinois, November 05 - 07, 2007, pp. 3–16. ACM, New York (2007) 11. Merrill, D., Raffle, H., Aimi, R.: The Sound of Touch: Physical Manipulation of Digital Sound. In: The Proceedings the SIGCHI conference on Human factors in computing systems (CHI 2008), Florence, Italy (2008) 12. Norman, D.: The Invisible Computer: Why Good Products Can Fail, the Personal Computer Is So Complex, and Information Appliances Are the Solution. MIT Press, Cambridge (1999) 13. Norman, D., Draper, S.W.: User-Centered System Design: New Perspectives on HumanComputer Interaction. Lawrence Erlbaum Associates, Hillsdale (1986) 14. O’Reilly, T.: Web 2.0 Compact Definition: Trying Again (2007) (retrieved on 2007-01-20) 15. O’Sullivan, D., Igoe, T.: Physical Computing. Thompson, Boston (2004) 16. Redström, J.: Designing Everyday Computational Things. PhD dissertation (Report 20), Göteborg University (2001) 17. Rizzo, A., Marti, P., Decortis, F., Moderini, C., Rutgers, J.: The POGO story world. In: Hollnagen, E. (ed.) Handbook of Cognitive Task Design. Laurence Erlbaum, London (2003) 18. Rizzo, A., Rubegni, E., Grönval, E., Caporali, M., Alessandrini, A.: The Net in the Park. In: Special issue Interaction Design in Italy: where we are, Knowkege, technology and Politics Journal. Springer, Heidelberg (in printing) 19. Strachan, S., Murray-Smith, R.: GeoPoke: rotational mechanical systems metaphor for embodied geosocial interaction. In: Proceedings of the 5th Nordic Conference on HumanComputer interaction: Building Bridges, NordiCHI 2008, Lund, Sweden, October 20 - 22, 2008, vol. 358, pp. 543–546. ACM, New York (2008)
20. Suchman, L.A.: Plans and situated actions. The problem of human-machine communication. Cambridge University Press, Cambridge (1987) 21. Susani, M.: Ambient intelligence: the next generation of user centeredness: Interaction contextualized in space. Interactions 12(4) (2005) 22. Ullmer, B., Ishii, H.: Emerging frameworks for tangible user interfaces. IBM Systems Journal 39, 915–931 (2000) 23. Vazquez, J.I., Lopez-de-Ipina, D.: Social devices: Autonomous artifacts that communicate on the internet. In: Floerkemeier, C., Langheinrich, M., Fleisch, E., Mattern, F., Sarma, S.E. (eds.) IOT 2008. LNCS, vol. 4952, pp. 308–324. Springer, Heidelberg (2008) 24. Weiser, M.: Ubiquitous computing. Computer 26, 71–72 (1993)
A Framework for Service Convergence via Device Cooperation Seungchul Shin1, Do-Yoon Kim2, and Sung-young Yoon3 1
Samsung Electronics 2 LG Electronics 3 Yonsei Univ. 232, Sinchon-Dong, Seodaemun-Gu, Seoul, 120-749, Republic of Korea
[email protected],
[email protected],
[email protected] Abstract. Device convergence is one of the most significant trends in current information technology (IT). It incorporates various kinds of existing services into one device, enabling converged services to be provided to users. However, device convergence is not a complete answer: in some fields, customers prefer divergent, application-specific devices. In addition, services are always hardware dependent, so if a new service appears, device convergence will be helpless in case the hardware does not support the service. In this situation, to use the new service, we have to purchase a whole new device. Therefore, we propose a framework for service convergence via device cooperation, supported by the wireless network, to overcome the constraints of device convergence. Our framework provides a guideline that enables a device to provide a service by cooperating with other devices even though it lacks the hardware support or only provides one or two specialized services. Keywords: service convergence, device cooperation, device convergence, mobile computing.
1 Introduction As hardware resources and mobile computing technology have improved, various fixed and mobile services such as games, digital multimedia broadcasting (DMB), location-based services (LBS), and RFID are now supported on a single handset. This device convergence approach makes the mobile device a phone, an mp3 player, a digital camera, a camcorder, a game player, a PDA and so on, while the device divergence approach focuses on one or two specialized services. Though there are extensive arguments about the future trend of the mobile device, device convergence is still mentioned as a key direction. However, there are some issues with device convergence. First, the support of multiple functions makes the interface generic, because multiple functions have to be combined into one design, which loses its peculiarity [1]. Second, although hardware resources have improved, the cost, battery consumption and required processing power increase. Third, the key factor of success is the consumer's demand, not the
Fig. 1. The concept of the Service Convergence via Device Cooperation
technology [2]. Finally, the boundaries between devices will become ambiguous, so that customers will lose a large part of their opportunity for choice. On the other hand, device divergence has some notable features [1]. First, the interface and the layout of the device are closely related to its services. Second, it has developed a new market of widgets related to the device that enrich it with additional specialized functions. For example, the Apple iPod, iPod nano, and iPod shuffle [3] have limited services but also have a range of accessories such as the iPod Hi-Fi, Armband, Nike+iPod Sport Kit, radio remote and so on to supplement them. Third, as a result, a high degree of differentiation can be accomplished. The studies in [4] and [5] conclude that a divergent, application-specific device is more probable than a perfectly convergent device. However, arguing about which of device convergence and device divergence will be the future trend is of no use when a new service appears but the hardware lacks the ability to support it. This occurs because services are hardware dependent. If the device has an interface to attach supplementary hardware for the new service, there is no problem. But in cases where the device has no interface, no attachable hardware exists, or the attachable hardware is too expensive or too cumbersome due to its size, weight, or design, then we have to buy a whole new device supporting that new service. We propose a framework for service convergence via device cooperation to overcome the constraint of hardware support and to avoid device convergence while still supporting multiple functions in a mobile environment. Service convergence here means incorporating disparate services together into a combined service. By using the capabilities of the wireless network, the framework enables a user to use the services of the surrounding devices on his handset as if he carried a service-convergent device.
The remainder of this paper is organized as follows. In Section 2, we discuss several types of convergence as related work. Section 3 presents the results of an analysis based on a web-based survey related to device convergence and device divergence. In Section 4, the type, classification and flow of the context used in our proposed system are introduced. Section 5 describes the overall architecture of the framework, the service discovery protocol, the context exchange protocol, and privacy and authentication for service convergence via device cooperation. In Section 6, we describe an implementation of an application of the framework; several PDAs and an ultra-mobile PC (UMPC) were used as the devices. Section 7 evaluates the performance of the framework, and finally Section 8 concludes the paper and discusses future work.
2 Framework Design In this section, we introduce the framework for service convergence via device cooperation. The proposed framework is systematically divided into components for the operations and protocols for communication between the mobile devices. The components are classified into three hierarchical layers. Moreover, only the communication protocols between the layers or the devices are defined. This makes the architecture flexible and extensible. The framework is conceptually classified into two parts: one for the service provider and the other for the service consumer. The service consumer denotes the user's device, and the service provider is the cooperative device that provides its service through the wireless network. The roles of the service consumer and the service provider are interchangeable. Both are organized into the three hierarchical layers mentioned above, which are the interface layer, the core layer and the communication layer [19]. The details are described below. The interface layer connects the service application and the framework. In this layer, a device communicates with the service using the service API. A module for managing the service API and for translating the messages received from the service API is loaded. The layer is located between the application and the core layer. The core layer components include the core functions of the device, manage the entire framework and control its operation. Moreover, the system log manager supports the recording of all operations in the framework. The communication layer provides the components for communication between devices. It delivers the transferred messages to the upper layer. 2.1 Service Consumer The service consumer is the device that requests the services supported by other devices. It contains the component that discovers nearby service lists. Figure 2 shows the architecture of the service consumer. It is divided into the interface layer, core layer and communication layer. The interface layer consists of the service manager and the message parser. The service manager manages the service list to support simultaneous services for the framework. In addition, it requests the service from the service API and reports the result. The message parser parses the input API parameters and converts them into a format understandable by the framework. The core layer is composed of the service consumer controller, the database manager, service list manager, task manager and the log
Fig. 2. Service Consumer System Architecture
manager. The service consumer controller supervises all the components in the framework. The database manager stores the discovered service list and the service descriptions, and also retrieves them when requested by the service API. The stored service provider list is continuously supervised. The task manager manages the task queue for pending service requests. The log manager is a component for developers that records all operations executed in the service consumer device. The communication layer is organized into the service discovery protocol component and the service transfer/receiver protocol component, both related to packet transfer. The service discovery protocol component has the duty of collecting service lists through the network and sending the results to the service database manager. The service transfer/receiver component has the responsibility of sending and receiving the requested service. 2.2 Service Provider The service provider is the device that provides its service to other devices. Figure 3 shows its architecture and components. The service manager and the message parser are the components of the interface layer in the service provider. They function in the same way as the ones in the service consumer. The core layer in the service provider consists of the service provider controller, the scheduler, the service database manager and the log manager. The service provider controller controls the overall components and operations. The service database manager contains the service list that the device supports. The context transfer scheduler schedules the sending of a service when an event occurs or on a periodic cycle. The log manager component in the service provider is the same as the one in the service consumer. The service discovery and the service transmitter components belong to the communication layer. They implement the protocols used to access a service. The service discovery component sends the request from the service consumer to the upper layer. The service provider
Fig. 3. Architecture of Service Provider
The service provider controller component then decides whether the service list may be sent. The service transmitter component receives the appropriate service from the service provider controller component; the transmission time is determined by the scheduler.
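As an illustration of this layering, the following sketch shows how a service consumer might route a parsed API call into the core layer and hand discovery responses back from the communication layer. It is a hypothetical outline only: the class and method names (ServiceConsumerController, ServiceDiscoveryProtocol, onServiceDiscovered, and so on) are ours, the bodies are reduced to comments, and the actual prototype described in Section 3 was built on the Microsoft .NET Framework 2.0 rather than in Java.

// Interface layer entry point: a parsed service API call arrives here.
interface ServiceManager {
    void requestService(String serviceName, String parameters);
}

// Core layer: keeps the discovered service list and its descriptions.
class ServiceListManager {
    void store(String providerAddress, String serviceDescription) {
        // Keep the discovered provider and its service description (omitted).
    }
}

// Communication layer: broadcasts discovery requests and reports responses upward.
class ServiceDiscoveryProtocol {
    void discover(ServiceConsumerController controller) {
        // Broadcast a discovery request over the wireless LAN and, for each
        // response, hand the result back to the core layer, e.g.
        // controller.onServiceDiscovered(address, description);
    }
}

// Core layer: supervises all components of the consumer-side framework.
class ServiceConsumerController implements ServiceManager {
    private final ServiceListManager serviceList = new ServiceListManager();
    private final ServiceDiscoveryProtocol discovery = new ServiceDiscoveryProtocol();

    public void requestService(String serviceName, String parameters) {
        // Look the provider up in the stored service list and delegate the
        // transfer to the service transfer/receiver component (omitted).
    }

    // Called by the communication layer when a discovery response arrives.
    void onServiceDiscovered(String providerAddress, String serviceDescription) {
        serviceList.store(providerAddress, serviceDescription);
    }
}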
3 Implementation of Device Share

We have implemented the proposed framework for service convergence via device cooperation. The framework relies on a wireless network, but not all mobile devices support wireless networking; we therefore chose a PDA, a smart phone and an Ultra Mobile PC (UMPC) as the mobile devices and limited the services each device provides for the tests. The framework was implemented on the Microsoft .NET Framework 2.0. The detailed specifications are given in Table 1, and Fig. 4 shows the devices used for the implementation. Each device provides its own unique function, such as RFID, camera, DMB or GPS, and all of them support wireless LAN, through which the services are shared. For the HP RX5965 PDA we used an external GPS receiver for the tests because the built-in GPS had poor receive sensitivity.

Table 1. Detailed specification of the devices used for the implementation

Device Name (Type)        CPU       O/S       Wireless   Bluetooth   Services
Acer n50 (PDA)            520 MHz   WM 2003   O          O           RFID
HP RW6100 (Smart Phone)   520 MHz   WM 2003   O          X           Camera
Samsung Q1 (UMPC)         900 MHz   Win XP    O          O           DMB
HP RX5965 (PDA)           400 MHz   WM 5.0    O          O           GPS
Fig. 4. The snapshot of devices used for the implementation
In the case of the UMPC we intended to share the DMB service, but we could not acquire a development kit for controlling DMB; we therefore used it as a service consumer of the services provided by the other devices. All the mobile devices support Bluetooth and 802.11b, except the RW6100, which lacks Bluetooth. Only 802.11b was used for the evaluation reported here.
Fig. 5. Screenshots of the implementation. (a)–(e), (g) and (h) show the services running on the PDAs; (f) shows the application on the PC; (i) shows images captured using the implementation.
Fig. 5 shows the implementation of the proposed framework. Fig. 5(a) shows the initial main menu on the PDA; since the framework has not yet been executed, no service appears on the display. Fig. 5(b) shows the service list received from peripheral devices after service discovery has been run. As the figure shows, each discovered service is listed by name together with an activated button; pressing this button lets the user use the service through the hardware of the peripheral service provider. Fig. 5(c) shows an RFID tag being read through a peripheral RFID service provider, even though the local device is not equipped with an RFID reader. Fig. 5(d) shows location information being received from a GPS service provider; the location of the service provider is indicated on the screen. Fig. 5(e) shows a picture being taken with the camera service. Fig. 5(f) shows the location information from Fig. 5(d) saved and displayed on Google Maps. Figs. 5(f) and 5(g) show the implementation using an RFID tag with u-TourGuide, one of the results of our previous studies, and the remaining figures show the services in actual use.
4 Experimental Results

In this section we measure the performance of the framework developed in Section 3. The tests measure the service discovery time as a function of the number of services handled by the discovery protocol, and compare the service time obtained through the framework with that of locally provided services. We also report the time as a function of the number of clients attached to a service provider, and the battery consumption when processing locally compared with over the network. The developed system can only measure time in 1 ms units, so we measured time using the number of system ticks provided by the .NET Framework; in our test environment a 100 ms interval corresponded to about 102 ticks, so one tick is approximately 1 ms. All values in these tests were recorded through the file I/O of the log manager, so when the I/O load is heavy there may be a small amount of latency compared with the actual elapsed time. The test results are as follows.

4.1 Service Discovery Time

Fig. 6 shows the service discovery time as a function of the number of devices. For this test, carried out on a small network in a test room, we added one more PC to the devices described in Section 3 and measured the time consumed by the discovery protocol. The PC was connected by wired LAN and the remaining devices were connected wirelessly through an access point. In the figure, the X axis shows the devices used in the test and the Y axis is the tick count taken by the service discovery response; 'Send' denotes the time until the discovery request packet has been transmitted. The results were measured on the Acer n50 Premium PDA, which hosts the RFID service and allows loopback responses to discovery requests sent from the device itself. In the tests, the PC, the GPS device and the RFID device showed constant response times, whereas the RW6100 and the UMPC took relatively long to respond to discovery. The RW6100 is a smart phone that runs many more processes than the other devices, so it takes more time to answer a discovery request.
Fig. 6. Service discovery protocol time depending on the number of service devices
Fig. 7. Data transmission time for each service
In the case of the UMPC, Windows XP is rather heavy for the device, so some delay in answering discovery requests is to be expected.

Fig. 7 summarizes, for each service, the difference between the local service time and the time obtained when the service is used on a remote device through the proposed framework. In this test we report results for three services only: camera, RFID and GPS. The camera uses a dedicated interface specific to the RW6100 and transmits roughly 80,000–110,000 bytes per transfer, whereas GPS and RFID both use a serial interface and transmit approximately 26 bytes at a time. The interface type and data volume for each service are summarized in Table 2. The camera normally also transmits a preview image in real time, but the SDK used in this test could not control the preview screen, so we did not transmit the preview and only measured the time taken to capture the image.
4.2 Service Transmission Time

The time needed to open or initialize the camera is excluded. For RFID, both a trigger operation for selecting a tag and a read operation for reading it are required, but in this test we only measured the service time of the read operation. For GPS, the service provider was set to transmit coordinates only when location information was requested, with the GPS receiver switched on the whole time, so no extra initialization time was measured. The results show that the camera took a relatively long time to transmit its data through the proposed framework. As with discovery, the RW6100 incurred additional latency because, being a smart phone, it also runs CDMA functions; the large amount of data to be transmitted and received is another important factor.

Table 2. Service type, amount of data to transmit and interface

Device Name (Type)        Service Type   Interface   Min Data Size    Max Data Size
Acer n50 (PDA)            RFID           Serial      4 Bytes          64 Bytes
HP RW6100 (Smart Phone)   Camera         Unknown     85,700 Bytes     105,124 Bytes
HP RX5965 (PDA)           GPS            Serial      8 Bytes          8 Bytes
5 Conclusions and Discussion

In this paper we proposed a framework for service convergence via device cooperation in a mobile environment. The framework was designed to cope with the limitations of hardware-dependent services: by combining the capabilities of cooperating devices, a user can use the full set of combined services, which we call service convergence, even though a single device could not support them all. A device therefore does not have to converge all kinds of services itself; it only needs to support one or two specialized, high-quality services. Moreover, the architecture of the framework is organized into three hierarchical layers and is easily extensible, so new services can be accommodated in the future. Through an actual implementation and the test results, we showed that the proposed framework offers reasonable performance in a mobile computing environment.
References 1. Norager, R.: Complex Devices, Widgets and Gadgets.: Product Convergence. Copyright by Rune Norager 2005 (2005) 2. Yoffie, D.: Competing in the Age of Digital Convergence. Harvard Business School Press, Cambridge (1997) 3. iPod. Apple Inc., http://www.apple.com/ipod/ipod.html 4. Kim, Y.-B., Lee, J.-D., Koh, D.-Y.: Effects of Consumer Preference on the Convergence of Mobile Telecommunications Device. Applied Economics 37, 817–826 (2005) 5. Mueller, M.: Digital convergence and its consequences, Working Paper, Syracuse University (1999)
6. Bubley, D.: Device divergence always beats convergence, http://disruptivewireless.blogspot.com/2006/05/ device-divergence-always-beats.html 7. Yasuda, K., Hagino, Y.: Ad-hoc Filesystem: A Novel Network Filesystem for Ad-hoc Wireless Networks. In: Lorenz, P. (ed.) ICN 2001. LNCS, vol. 2094, pp. 177–185. Springer, Heidelberg (2001) 8. Kravets, R., Carter, C., Magalhaes, L.: A Cooperative Approach to User Mobility. ACM Computer Communications Review 31 (2001) 9. Shin., S.-C., Kim, D.-Y., Cheong, C.-H., Han, T.-D.: Mango: The File Sharing System for Device Cooperation in Mobile Computing Environment. In: Proceedings of International Workshop on Context-Awareness for Self-Managing Systems (CASEMANS 2007), Toronto, Canada (May 2007) 10. Sun Microsystems, Jini architecture specification, http://www.javasoft.com/products/jini/specs/jini-spec.pdf 11. UPnP Forum: Understanding Universal Plug and Play White Paper (2000), http://www.upnp.org 12. International Organization for Standardization: Information technology – Automatic identification and data capture techniques – Bar code symbology – PDF-417. QR Code. ISO/IEC 15438 (2001) 13. International Organization for Standardization: Information technology – Automatic identification and data capture techniques – Bar code symbology –QR Code. ISO/IEC 18004 (2000) 14. ColorCode. ColorZip Media Inc (2006), http://www.colorzip.com 15. Kim, D., Seo, J., Chung, C., Han, T.: Tag Interface for Pervasive Computing. In: Proceedings of the international conference on Signal Processing and Multimedia, pp. 356–359 (2006) 16. Azfar, A.: Fixed mobile convergence - some considerations 17. Chung, Y.-W., et al.: A Strategy on the Fixed Mobile Convergence 18. D2., Cowon Corp, http://product.cowon.com/product/product_D2_dic.php 19. Han, T.-D., Yoon, H.-M., Jeong, S.-H., Kang, B.-S.: Implementation of personalized situation-aware service. In: Proc. of Personalized Context Modeling and Management for UbiComp Applications (ubiPCMM 2005), pp. 101–106 (2005)
Enhancements to Online Help: Adaptivity and Embodied Conversational Agents Jérôme Simonin and Noëlle Carbonell INRIA (Nancy Grand-Est Research Center), Henri Poincaré University LORIA, Campus Scientifique BP 70239, Vandoeuvre-lès-Nancy Cedex, France {Jerome.Simonin,Noelle.Carbonell}@loria.fr
Abstract. We present and discuss the results of two empirical studies that aim at assessing the contributions of adaptive-proactive user support (APH), multimodal (speech and graphics) messages (MH), and embodied conversational agents (ECAs) to the effectiveness and efficiency of online help. These three enhancements to online help were implemented using the Wizard of Oz technique. The first study (E1) compares MH with APH, while the second study (E2) compares MH with embodied help (EH). Half of the participants in E1 (8) used MH, and the other half used APH. Most participants who used MH, resp. APH, preferred MH, resp. APH, to standard help systems which implement text and graphics messages (like APH). In particular, proactive assistance was much appreciated. However, higher performances were achieved with MH. A majority of the 22 participants in E2 preferred EH to MH, and were of the opinion that the presence of an ECA, a talking head in this particular case, has the potential to improve help effectiveness and efficiency by increasing novice users' self-confidence. However, performances with the two systems were similar, save for the help consultation rate, which was higher with EH. Longitudinal (usage) studies are needed to confirm the effects of these three enhancements on novice users' judgments and performances. Keywords: Adaptive user interfaces, Embodied conversational agents, Talking heads, Online help, Speech and graphics, Multimodal interaction, Eye tracking.
1 Introduction

The effectiveness of online help for the general public is still unsatisfactory despite continuous efforts from researchers and designers over the last twenty years. Help facilities are still ignored by most "lay users", who prefer consulting experienced users to browsing online manuals. This behavior is best accounted for by the "motivational paradox" [4], that is: users in the general public are reluctant to explore the functionalities of unfamiliar software and learn how to use them efficiently, as their main objective is to carry out the tasks they have in mind. The two studies reported here address the crucial issue of how to design online help that will actually be used by the general public, an essential condition for ensuring its effectiveness.
To achieve this goal, help systems should be capable of providing users with appropriate information right when they need it, in line with the "minimal manual" approach [4]. To meet both requirements, they should be aware of the user's current goal, knowledge and skills; that is, they should have the capability to create and update an adaptive model of the current user's profile from the analysis of interaction logs. Such a model is needed for (i) tailoring help information to the user's current knowledge, and (ii) anticipating their information needs accurately so as to satisfy them through timely initiatives. This help strategy, which is both adaptive and proactive, is likely to improve help effectiveness and alleviate users' cognitive workload.

Using speech for conveying user support information may also contribute to increasing online help usage, hence its effectiveness, by reducing the interference of help consultation with the user's main activity: users have to stop interacting with software applications to read textual help information, which is usually superimposed on the application window. In contrast, oral messages do not use screen space, and users can easily go on interacting with the application while listening to them; in particular, they can carry out a sequence of oral instructions while it is being delivered.

Finally, novice users' motivation and emotional state strongly influence the effectiveness of knowledge or skill acquisition. Embodied online help has the potential to encourage requests for assistance; it may also increase novice users' self-confidence and reduce their stress.

We present two empirical studies, E1 and E2, which attempt to assess the actual contributions of three possible enhancements to online help effectiveness and user-friendliness: adaptivity and proactivity, speech- and graphics-based messages, and embodied conversational agents (ECAs). The aim is to determine whether these enhancements have the potential to win user acceptance and improve novice users' performances, especially memorization of the procedural and semantic knowledge needed to operate standard applications effectively. The method is described in the next section; results are presented and discussed in the third section.
2 Related Work

Adaptive online help has motivated large-scale research efforts, such as the Lumière project [8] or the Berkeley Unix Consultant [23]. According to [5] and [9], the bulk of research on adaptive human-computer interaction has been focused on designing efficient user models. By contrast, assessment of these models and evaluation of the effectiveness and usability of adaptive user interfaces are research areas that still need to be developed further.

Evaluation of the ergonomic quality of speech as an output modality has motivated few studies compared to speech input. Recent research has centered on issues pertaining to the use of speech synthesis in contexts where displays are difficult to use; see, for instance, interaction with in-vehicle systems [24] and mobile devices [19], or interfaces for sight-impaired users [18]. Speech synthesis intelligibility [1] and expressiveness (especially the use of prosody for conveying emotions [22]) have also motivated a number of studies. In contrast, to the best of our knowledge, the use of speech for expressing help information has been investigated by only one research group.
The authors of [10] propose guidelines for the design and test of help information presented via voice synthesis to users of current commercial computer applications.

ECAs with humanlike appearance and advanced verbal and nonverbal communication capabilities have been designed and implemented in many research laboratories. Current research efforts focus on creating ECAs which emulate, as closely as possible, human appearance, facial expressions, gestures and movements, emotions and intelligent behaviors. Modeling the expression of emotions [17], human conversational skills [16] and gaze activity during face-to-face exchanges [7] are very active research areas. Most evaluation studies of ECAs' contributions to human-computer interaction focus on computer-aided learning situations; for a survey, see [14]. For instance, [12] investigates students' motivations for acceptance or rejection of ECAs. In other application domains, the utility and usability of humanlike ECAs have motivated only a few ergonomic studies (for instance, only one session out of eight was devoted to evaluation at IVA'07). Save for the pioneer work reported in [21], most evaluation studies address only a few issues in the design space for humanlike ECAs, such as the influence of the ECA's voice quality (extrovert vs introvert) on children's performances [6], or the affective dimension of interaction with a humanlike ECA (7 out of the 9 studies mentioned in [20]).
3 Method

3.1 Overview

We used the same methodology and setup for E1 and E2, in order to compare the effects of the three proposed enhancements to online help on learning how to operate an unfamiliar software application. We chose Flash, a software tool for creating graphical animations, because computer-aided design of animations involves concepts which differ from those implemented in standard interactive software for the general public. Thus, participants in E1 and E2, who were unfamiliar with animation creation tools, had to acquire both semantic knowledge and procedural know-how in order to carry out the proposed animation creation tasks using Flash.

Undergraduate students (16 for E1, 22 for E2) who had never used Flash or any other animation tool had to create two animations. E1 participants were divided into two gender-balanced groups of 8 participants each; one group could consult an adaptive-proactive help system (APH), and the other group a multimodal system (MH). Two online help systems, the MH system and an embodied help system (EH), were put successively at the disposal of E2 participants, one per animation; the presentation order of MH and EH was counterbalanced among participants. A within-participant design, which reduces the effects of inter-individual differences, was not adopted for E1 in order to ensure that participants would notice message content evolutions. The MH system (used in E1 and E2) delivered oral help information illustrated with copies of Flash displays. The same database of multimodal help messages (over 300 messages) was used for MH and EH. EH was embodied by a female talking head which "spoke" the speech components of messages. Speech was transcribed into printed text for APH so as to avoid implementing two enhancements in the same system.
Once participants had filled out a background information questionnaire (10 min.), they got acquainted with Flash basic concepts (e.g., 'scenario', 'interpolation', etc.) using a short multimedia tutorial they could browse through for as long as they wanted (15-20 min.). Then, they tried to reproduce two predefined animations (1 hour 15 min. or so). During the second study only, their gaze movements were recorded throughout their interactions with Flash and the help systems, so as to obtain objective data on participants' subjective reactions to the ECA's presence. Afterwards, they filled out two questionnaires, a verbal and a nonverbal one [2]; both questionnaires were meant to elicit their subjective judgments on the help system(s) they had experimented with; Likert scales and Osgood's semantic differentiators were mostly used for collecting verbal judgments. Next, their understanding and memorization of Flash basic concepts and procedures were assessed using a written post-test. Finally, they participated in a debriefing interview. In all, individual sessions lasted about two and a half hours.

3.2 Implementation of the Three Help Systems

To reduce interference between help consulting and animation design, the display included two permanent windows (see Figure 1), a sizeable Flash window and a small help window (on the right of the screen). Based on earlier empirical work [3], participants could request four different types of help information using four dedicated buttons: procedural know-how (How?), semantic knowledge (What?), explanations of the application's current state (Why?) and confirmation or invalidation of the most recent Flash commands (Confirm?). Oral help messages were activated using colored buttons placed above the Flash display copies illustrating their content. Speech acts were indicated by colors: warning (e.g., precondition), concept definition or procedure description, and explanations or recommendations. Colors were also used to denote speech acts in APH textual messages. The talking head was placed at the top of the EH window.

To achieve realistic adaptive-proactive assistance, we resorted to the Wizard of Oz paradigm as a rapid prototyping technique. We also used it for EH and MH so as to ensure identical reaction times to participants' requests for the three simulated systems. The Wizard was given software facilities to adapt the information content of help messages to the current participant's actual knowledge and skills as s/he perceived them through observing their interactions with Flash and the help system. Three different versions of each message (M) were available to the Wizard:

− An initial version (V1) including all the semantic and procedural information needed by users unfamiliar with the information in M; the Wizard used it to answer a participant's first request for M.
− To answer further requests for M, s/he had to choose between two other versions: a short reminder of the information in V1, with or without additional technical details, activated for participants who had shown a good understanding of the information in V1 during their interactions with Flash subsequent to their first request for M; or a detailed presentation of the information in V1, including explanations, examples and/or illustrations, intended for participants who had experienced difficulties in understanding and putting to use the information in V1.
Fig. 1. The three help systems: the Flash window on the left, the APH help window in the center, and the MH/EH help window on the right (example: a "How?" message on the creation of a keyframe)
Practically, when a participant sent a request to the APH system, a software assistance platform displayed the name of the appropriate message on the Wizard’s screen. Thus, the Wizard had just to select the version of the message that best matched this participant’s current knowledge and skills, based on the observation of her/his interactions with Flash; the message was automatically displayed on the participant’s screen. As participants’ interactions with Flash and the APH system lasted one hour at the most, three versions of the same message were sufficient to achieve realistic simulations of dynamic adaptation to the evolution of participants’ familiarity with Flash through the session. When simulating MH or EH, the Wizard had just to activate the message selected by the assistance platform. To implement proactive user support, the Wizard was instructed to observe participants’ interactions with Flash, and assist them in creating the two predefined animations by anticipating their information needs and satisfying them on her/his own initiative using appropriate versions of messages in the database.
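The adaptive behaviour simulated by the Wizard can be summarized as a simple selection rule over the three stored versions of each message. The sketch below is only a schematic reconstruction of the procedure described above; the identifiers (MessageVersion, selectVersion, understoodWell) are invented for the illustration and do not come from the assistance platform itself.

// The three versions of each help message (cf. Section 3.2).
enum MessageVersion { INITIAL_V1, SHORT_REMINDER, DETAILED_PRESENTATION }

class MessageVersionSelector {
    // alreadySent: the participant has requested this message before.
    // understoodWell: subsequent interactions with Flash showed a good
    // understanding of the information delivered in V1.
    MessageVersion selectVersion(boolean alreadySent, boolean understoodWell) {
        if (!alreadySent) {
            return MessageVersion.INITIAL_V1;     // first request: full information
        }
        return understoodWell
                ? MessageVersion.SHORT_REMINDER   // short reminder of V1
                : MessageVersion.DETAILED_PRESENTATION; // explanations and examples
    }
}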
3.3 Measures

Post-session questionnaires and debriefings were analyzed in order to gain an insight into participants' subjective judgments on the help system(s) they had experimented with. To assess the influence of each enhancement to online help on participants' performances and behaviors, we used post-test marks and analyses of manually annotated interaction logs. These data provided us with information on participants' assimilation of Flash concepts and operation, help usage and task achievement. Some analyses of annotated logs were restricted to the first animation creation task, which lasted about 40 minutes on average, as participants seldom requested help or needed proactive assistance while creating the second animation, most of the necessary knowledge having been acquired during execution of the first task.

To assess participants' affective responses to the presence of the ECA, behavior-based measures were collected in addition to verbal and nonverbal judgments, as recommended in [15]. E2 participants' gaze movements were recorded (at 60 Hz) throughout the session, using a head-mounted eye tracker (ASL-501) which allows free head and chest movements without loss of precision. As voluntary eye movements express shifts of visual attention, they are valuable indicators of users' engagement and involvement in interaction with an ECA [11]. Physiological measures, such as heart rate or galvanic skin response [13], were not used, as they are more intrusive than eye tracking in standard HCI environments.

3.4 Software Developments

We developed a client-server platform to assist the Wizard in his/her task, and tools for facilitating annotation and analysis of interaction logs. These software developments, in Java, are briefly described in this section.

Software Assistance to the Wizard's Activity. The client-server platform can: display copies of the participant's screen on the Wizard's screen; display help messages activated by the Wizard on the participant's screen; assist the Wizard in the simulation of the three help systems by selecting the message, M, matching the participant's current request; and, for the APH system, provide her/him with a history of all versions of M sent to the participant previously. Messages are stored in a hierarchical database as multimedia Web pages. The platform also records logs of participants' interactions with both Flash and the current help system. Time-stamped logs comprise user and system events, mouse positions and clicks, screen copies, and eye tracking samples; they may also include recordings of the user's speech and gestures.

Software Assistance to Annotation and Analysis of Interaction Logs. Interaction logs saved by the platform can be "replayed" with gaze points or fixations superimposed on the displays. The main annotation facilities include interactive segmentation and labeling of logs and, for eye movement analysis, automatic or manual definition of 'areas of interest' over display sequences. Graphical facilities are also provided for visualizing the results of eye tracking data analyses (e.g., heat maps).
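As a rough indication of the kind of records such time-stamped logs contain, one entry could be modelled along the following lines. The field names are invented for the example; the real platform also stored screen copies and, optionally, recordings of the user's speech and gestures.

import java.time.Instant;

// A simplified, time-stamped entry of an interaction log.
class LogEntry {
    final Instant timestamp = Instant.now();
    final String source;   // e.g. "user", "system", "wizard" or "eye tracker"
    final String event;    // e.g. mouse click, help request, message display
    final int mouseX;
    final int mouseY;      // current mouse position
    final Double gazeX;
    final Double gazeY;    // 60 Hz eye-tracking sample, null if unavailable

    LogEntry(String source, String event, int mouseX, int mouseY,
             Double gazeX, Double gazeY) {
        this.source = source;
        this.event = event;
        this.mouseX = mouseX;
        this.mouseY = mouseY;
        this.gazeX = gazeX;
        this.gazeY = gazeY;
    }
}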
4 Main Results

4.1 Participants' Subjective Judgments

First Study. 6 participants out of 8 preferred the APH system to standard online help; the two remaining participants preferred standard help, due to "the force of habit" according to them. Proactive user support raised enthusiastic comments, while the evolution of message content (adaptivity) went almost unnoticed. 7 participants rated the support provided by APH as very useful and its initiatives as most effective. Similarly, 7 participants rated the MH system higher than standard online help. However, only 5 of them preferred audio to visual presentations of help information, based on the observation that one can carry out oral instructions while listening to them. The 3 other participants, who preferred visual presentations, explained that taking in spoken information is a more demanding cognitive task than assimilating the content of a text: one can read a displayed textual message at one's own pace, and freely select or ignore parts of it, which is impossible with spoken information.

Second Study. According to the verbal questionnaires, 16 participants (out of 22) preferred the EH system to the MH system, 5 participants rated MH higher than EH, and only one participant gave preference to standard help facilities. Most participants were of the opinion that the presence of an ECA had not disrupted their animation creation activity (17 participants), and that it could greatly increase the effectiveness (19) and appeal (21) of online help. Nonverbal judgments on the first line of the SAM are also very positive: 19 participants enjoyed the ECA's presence, the feelings of the 3 remaining participants being neutral. In addition, 14 participants had the impression that the ECA's presence increased their self-confidence and reduced their stress (SAM, third line). Analyses of annotated eye tracking data during the first task (40 min. or so) indicate that, from the beginning to the end of task execution, all participants (11) glanced at the ECA whether it was talking or silent. Each participant looked at it 75 times on average, and fixation durations were longer while the ECA was talking than when it was silent. These objective measures indicate that the ECA actually succeeded in arousing participants' interest and maintaining it throughout the first task, although all of them were primarily intent on achieving the first animation creation task. These results confirm the judgments expressed in the questionnaires.

4.2 Participants' Performances and Behaviors

First Study. The duration of the first task execution varied greatly from one participant to another. The inaction rate, that is, the percentage of time during which the mouse remained still, is noticeably higher for the APH group than for the MH group (62% versus 53%). This difference illustrates the efficiency of spoken compared to textual help messages: one has to stop interacting with the application while reading a textual message, as some participants mentioned. APH participants consulted help as often as MH participants (58 requests vs 60 on the whole), although the APH system displayed 183 messages on its own initiative. Pushing help information to novice users does not seem to reduce the number of help requests. This suggests that proactive assistance is an efficient strategy for increasing help effectiveness, as APH participants read most of the help messages pushed to them, according to the debriefings.
Analysis of post-test marks suggests that MH participants gained a better understanding of Flash concepts than APH participants and better recollected the procedures for activating its functions (average marks: 17.6/31 vs 15.6/31). As for task achievement, the difference between the two groups (MH: 12.6/20 vs APH: 11.1/20 for the first task) may be due to the necessity, for APH participants, to stop interacting with Flash in order to read help messages; these interruptions may have interfered with task achievement.

Second Study. We divided participants into two groups (of 11 participants each), according to the help system they had used first, EH or MH. For most measures, the average values computed over each group are not appreciably different. This is the case for post-test marks and, concerning the first task, for task execution duration (40 min.), the number of interactions with Flash and the help system, and task achievement. The only noticeable difference between the two groups is the average number of help message activations per participant during the first task: 22 for EH vs 16 for MH. This result suggests that the presence of an ECA may encourage help consultation.

4.3 Discussion

According to the E1 results, proactive online assistance is likely to arouse higher subjective satisfaction among novice users than adaptive help or multimodal (speech and graphics) messages. Evolutions of message content may have gone unnoticed because adaptivity is a basic feature of human communication, especially in the context of tutor-novice dialogues. Participants' rather balanced judgments on the usability of speech compared to text should not deter designers from considering speech as a promising alternative modality to text for conveying help information: speech usability can easily be improved by implementing advanced audio browsing facilities. Taking up this research direction might prove more rewarding in the short term than implementing effective proactive help, which still raises unsolved scientific issues. Firstly, guessing novice users' intents accurately from their interactions with an unfamiliar application is a difficult challenge, as these users tend to perform actions unrelated to the achievement of their goals. Secondly, the MH group achieved better performances than the APH group. Efficient proactive help may have induced APH participants to rely on help information to achieve the two animation creation tasks and, hence, to put little effort into learning Flash. Additional empirical evidence is needed to validate this interpretation, as the number of participants in E1 was limited.

The presence of an ECA was well accepted by E2 participants. The EH system was preferred to the MH system by most participants, who stated that the ECA's presence had not interfered with their animation creation activities. The vast majority of them considered that the presence of an ECA had the potential to improve help effectiveness by increasing users' motivation and self-confidence. However, these perceptions are at variance with their actual performances, which were similar for EH and MH. The ECA's presence had no noticeable effect on Flash semantic and procedural knowledge acquisition, task execution time or task achievement; it only encouraged help consultation. Nevertheless, observation of novice users' behaviors and activities over longer time spans may be necessary in order to perceive its possible influence on learning new concepts and skills.
Longitudinal studies are essential to obtain conclusive evidence on the effects of the presence of an ECA on novice users' performances and on its contribution to online help effectiveness and efficiency. Such studies are also needed to determine whether users will get bored with the presence of an ECA in the long run and come to prefer MH to EH, contrary to the E2 results; a few participants commented that the usefulness of the ECA would decrease as the user's knowledge and practice grow. Performing such studies is a long-term objective, as their design and implementation raise many difficult research issues.
5 General Conclusion

We have presented and discussed the results of two empirical studies that aim at assessing the contributions of adaptive-proactive user support (APH), multimodal (speech and graphics) messages (MH), and embodied conversational agents (ECAs) to the effectiveness and efficiency of online help. These three enhancements to online help were implemented using the Wizard of Oz technique. The first study (E1) compares MH with APH, while the second study (E2) compares MH with embodied help (EH). Half of the participants in E1 (8) used MH, and the other half used APH. Most participants who used MH, resp. APH, preferred MH, resp. APH, to standard help systems which implement text and graphics messages. In particular, proactive assistance was much appreciated. However, higher performances were achieved with MH. A majority of the 22 participants in E2 preferred EH to MH, and were of the opinion that the presence of an ECA, a talking head in this particular case, has the potential to improve help effectiveness and efficiency by increasing novice users' self-confidence. However, performances with the two systems were similar, save for the help consultation rate, which was higher with EH.

These results open up several research directions. Improving browsing through oral messages is a promising short-term direction. Longitudinal studies are needed to assess whether APH and EH have the potential to improve online help effectiveness and efficiency, especially semantic and procedural knowledge learning; they are also needed to determine whether users will get bored or irritated in the long run by these two enhancements. The design and development of an adaptive-proactive help system is a long-term research direction, due to the numerous issues that still need to be solved.
References 1. Axmear, E., Reichle, J., Alamsaputra, M., Kohnert, K., Drager, K., Sellnow, K.: Synthesized Speech Intelligibility in Sentences: a Comparison of Monolingual EnglishSpeaking and Bilingual Children. Language, Speech, and Hearing Services in Schools 36, 244–250 (2005) 2. Bradley, M., Lang, P.: Measuring emotion: the self assessment manikin and the semantic differential. Journal of Behavioral Therapy and Experimental Psychiatry 25, 49–59 (1994) 3. Capobianco, A., Carbonell, N.: Contextual online help: elicitation of human experts’ strategies. In: Proc. HCI International 2001. LEA, vol. 2, pp. 824–828 (2001) 4. Carroll, J.M., Smith-Kerber, P.L., Ford, J.R., Mazur-Rimetz, S.A.: The minimal manual. Human-Computer Interaction 3(2), 123–153 (1987)
5. Chin, D.N.: Empirical Evaluation of User Models and User-Adapted Systems. User Modeling and User-Adapted Interaction 11, 181–194 (2001) 6. Darves, C., Oviatt, S.: Talking to digital fish: Designing effective conversational interfaces for educational software. In: Pélahaud, C., Ruttkay, Z. (eds.) From brows to trust: Evaluating embodied conversational agents. Part IV, ch. 10. Kluwer, Dordrecht (2004) 7. Eichner, T., Prendinger, H., André, E., Ishizuka, M.: Attentive presentation agents. In: Pelachaud, C., Martin, J.-C., André, E., Chollet, G., Karpouzis, K., Pelé, D. (eds.) IVA 2007. LNCS, vol. 4722, pp. 283–295. Springer, Heidelberg (2007) 8. Horvitz, E., Breese, J., Heckerman, D., Hovel, D., Rommelse, K.: The Lumière Project: Bayesian User Modeling for Inferring the Goals and Needs of Software Users. In: Proc. UAI 1998, pp. 256–265 (1998) 9. Jameson, A.: Adaptive Interfaces and Agents. In: Jacko, J., Sears, A. (eds.) HumanComputer Interaction Handbook, ch. 15, pp. 305–330. Erlbaum, Mahwah (2003) 10. Kehoe, A., Pitt, I.: Designing help topics for use with text-to-speech. In: Proc. DC 2006, pp. 157–163. ACM Press, New York (2006) 11. Ma, C., Prendinger, H., Ishizuka, M.: Eye Movement as an Indicator of Users’ Involvement with Embodied Interfaces at the Low Level. In: Proc. AISB 2005 Symp. U. of Hartfordshire, pp. 136–143 (2005) 12. Moreno, R., Flowerday, T.: Students’ choice of animated pedagogical agents in science learning: A test of the similarity attraction hypothesis on gender and ethnicity. Contemporary Educational Psychology 31, 186–207 (2006) 13. Mori, J., Prendinger, H., Ishizuka, M.: Evaluation of an Embodied Conversational Agent with Affective Behavior. In: Proc. Workshop on Embodied Conversational Characters as Individuals, at AAMAS 2003, pp. 58–61 (2003) 14. Payr, S.: The university’s faculty: an overview of educational agents. Applied Artificial Intelligence 17(1), 1–19 (2003) 15. Picard, R.W., Daily, S.B.: Evaluating Affective Interactions: Alternatives to Asking What Users Feel. In: Workshop on Evaluating Affective Interfaces: Innovative Approaches, at CHI 2005 (2005), http://affect.media.mit.edu/publications.php 16. Piwek, P., Hernault, H., Prendinger, H., Ishizuka, M.: T2D: Generating dialogues between virtual agents automatically from text. In: Pelachaud, C., Martin, J.-C., André, E., Chollet, G., Karpouzis, K., Pelé, D. (eds.) IVA 2007. LNCS, vol. 4722, pp. 161–174. Springer, Heidelberg (2007) 17. Poggi, I., Pelachaud, C., de Rosis, F.: Eye Communication in a Conversational 3D Synthetic Agent. The European Journal on Artificial Intelligence 13(3), 169–182 (2000) 18. Ran, L., Helal, A., Moore, S.E., Ramachandran, B.: Drishti: An Integrated Indoor/Outdoor Blind Navigation System and Service. In: Proc. IEEE PERCOM 2004, pp. 23–30 (2004) 19. Roden, T.E., Parberry, I., Ducrest, D.: Toward mobile entertainment: A paradigm for narrative-based audio only games. Science of Computer Programming 67(1), 76–90 (2007) 20. Ruttkay, Z., Dormann, C., Noot, H.: Evaluating ECAs. What and How? In: Workshop on Embodied Conversational Agents. Let’s specify and evaluate them, at AMAAS (2002) 21. van Mulken, S., André, E., Müller, J.: An empirical study on the trustworthiness of lifelike interface agents. In: Proc. HCI International 1999, LEA, vol. 2, pp. 152–156 (1999) 22. Ververidis, D., Kotropoulos, C.: Emotional speech recognition: Resources, features, and methods. Speech Communication 48(9), 1162–1181 (2006) 23. 
Wilensky, R., Chin, D.N., Luria, M., Martin, J., Mayfield, J., Wu, D.: The Berkeley UNIX Consultant Project. Artificial Intelligence Review 14(1-2), 43–88 (2000) 24. Zajicek, M., Jonsson, I.-M.: Evaluation and context for in-car speech systems for older adults. In: Proc. ACM Latin American Conf. on HCI, pp. 31–39. ACM Press, New York (2005)
Adaptive User Interfaces: Benefit or Impediment for Lower-Literacy Users? Ivar Solheim Norwegian Computing Center, P.O. Box 114 Blindern, 0314 Oslo, Norway
[email protected] Abstract. This paper addresses web accessibility and usability for lower-literacy users with limited ICT skills. Although adaptive and adaptable user interfaces have been studied and discussed at least since the 80s, the potential of adaptive user interfaces is still far from realization. A main conclusion drawn in this paper is that simple, straightforward and intuitive adaptivity mechanisms may work well, but more complex and pervasive ones don't, and may even be counterproductive. A potential pitfall may be simplistic and "cognitivist" user and task modelling that fails to take the user's experience, competence and socio-psychological context, in short, the user's actual, real perspective and environment, into account. Keywords: adaptive interfaces, personalisation, multimodality, user modelling, universal design.
1 Introduction

This paper addresses web accessibility and usability for lower-literacy (LL) users with limited ICT skills. It is a general problem that the needs of LL users, as well as those of the cognitively disabled, have been ignored in HCI (human-computer interaction) research, in contrast to the needs of the physically disabled, in particular the visually impaired. The paper focuses on research issues pertaining to adaptive and personalized user interfaces for LL users with weak ICT skills. It distinguishes between adaptability and adaptivity. Adaptability offers the user several options for personalization and adjustment according to the user's subjective preferences. Adaptivity means that the user interface can be dynamically and automatically tailored to the needs of the user: the system automatically recognizes the user's behaviour over time and improves the quality of the user interface interaction. The research reported here contributes to existing research in two ways: first, by studying and identifying the specific user interface needs and requirements of LL users, and second, by providing knowledge on the interaction level that can shed light on the merits as well as the pitfalls of adaptive interface approaches. Although adaptive (and adaptable) user interfaces have been studied and discussed at least since the 80s [11], it seems clear that the potential of adaptivity is still far from realization. The merits of adaptivity and in particular adaptive personalization of
web pages have been disputed and are said to be overrated (see e.g. [9]). However, it is still a popular idea that adaptivity can play a significant role in enhancing accessibility and usability, including for elderly and LL users [1], [12]. Lower-literacy users can be defined as people who can read, but who have non-trivial difficulties doing so [12]. This group is a minority of our informants, but it represents a societal challenge that is often underestimated. For Norway it is estimated that at least 30% of the population are low-literate (Statistics Canada and OECD 2005), whereas in other countries the rate is higher; in the USA it is above 40% according to the National Assessment of Adult Literacy [4]. Low literacy can be related to several fundamental factors, diagnoses and causes, but these are not emphasized in this study, since the focus is principally on functions and behaviour as users, not on medical or clinical diagnosis [10].
2 The Research and the Data

The research reported in this paper is based on data from several research projects that the Norwegian Computing Center has been involved in over the last four years. The visions of "universal design" and "design for all" have guided and inspired this research, acknowledging that a substantial proportion of all citizens lack the skills and opportunities required to be fully integrated participants in the modern information society. Personal interviews and observations of more than 60 Norwegian users of electronic forms have been carried out, in particular with groups with visual impairments, elderly people aged 70 and older, and people with cognitive disabilities. This paper does not report and discuss the results for all groups, but focuses on the smaller subgroup of LL users, about 20 users. Some appear in several categories and some are hard to define, as we do not base the categorization on clinical diagnosis, but only on observed functional behaviour and difficulties in the field trials. We have found that, despite differences in age, background and other abilities, those that we observe as LL users share several important characteristics as users of ICT. Importantly, the needs of this type of users help us highlight crucial challenges for the development of design-for-all user interface design. The projects are still ongoing, also involving collaboration with other countries in the EU-funded project DIADEM (Delivering Inclusive Access to the Disabled and Elderly, http://www.projectdiadem.eu/), but the data and the analysis presented here are only from Norway and with Norwegian users, and are not necessarily compliant with findings in other countries.

The projects have two general hypotheses. First, can multimodality make a difference? A significant part of the research has been to study to what extent various multimedia affordances (text, audio, video) can provide improved accessibility for certain types of users, e.g., replacing text with audio and video for LL and dyslexic users. Second, can adaptivity and "intelligent" interface components enhance accessibility and usability? These hypotheses are closely connected in the sense that multimodality is applied as a means for making adaptivity mechanisms user friendly and accessible.
3 Lower-Literacy User Characteristics

Summers [12] and Payne [2] argue that the problems and challenges of lower-literacy ICT users are related to the fact that they do not have an appropriate mental model of this socio-technical environment, in this case electronic forms on PCs. LL users lack appropriate mental models that should have guided them in their work with the electronic forms. However, this is only one part of the story. The users' behaviour is not only a function of missing mental models, but of various sociocultural and psychological contexts and factors. For example, their behaviour is shaped by the fact that they are often less comfortable with this working context, in which their weak skills and (for some) dyslexia may be exposed and challenged. Often these users also have weak ICT skills, which reinforces the challenges and problems for them. Weak literacy and ICT skills also lead to a lack of self-confidence and motivation. Some of the findings from our studies, in particular from the UNIMOD project (Universal Design in Multimodal Interfaces, http://www.unimod.no/unimod-web/Startside_Engelsk.html), shed light on how users act under these circumstances:

• Lower-literacy users are often not able to scan text and must most often read word for word [3], [12]. Typically, they spend much time and effort reading pages with a lot of text.
• Many LL users apply various evasion strategies. For example, rather than reading all the text on a page of a form, they skip this and go directly to the field to be filled out. Sometimes the users seem to guess what the page is about without reading it, by looking at what they think are key words.
• Complex page structure may be confusing. The user focuses on what he/she thinks is essential, not looking at additional information unless forced to.
• Because these users are not familiar with common ICT icons, for example Help buttons, such buttons are not intuitively understood and are seldom used.
• The focus of attention is narrow and word-by-word oriented [3], which means that scrolling is difficult and confusing.
• Navigating between pages is difficult because users are not familiar with the electronic form.
• LL users appreciate predictability, simplicity and clear structure.
• Multimodality (e.g., Help messages as audio) must be used with caution, as it can easily lead to confusion when the user is not prepared for it.
4 Adaptivity Process Attributes and Design Objectives

The research work is about providing interface adaptivity in electronic forms. In the DIADEM project the goal is to provide an adaptive web browser interface. This will be achieved by developing an intelligent interface component which monitors the user, adapting and personalising the computer interface to enable people to interact with web-based forms. Table 1 below summarizes the basic elements in the adaptivity process of the work reported here.
Table 1. Basic elements in the adaptivity process

• Interaction techniques (approach, hypothesis). Objectives and categories: testing multimodality and multimodal interaction mechanisms. Activities in UI design practice: implement feasible multimodal support for users (video, audio, animations and text, alone and in combination).
• User and task model definitions. Objectives and categories: the user model and requirements defined; electronic form task models defined. Activities in UI design practice: key user and task characteristics derived from observations and analysis.
• Goals. Objectives and categories: support for target users; basic interface requirements. Activities in UI design practice: a research-based, prioritized list of requirements for elderly and lower-literacy users of forms.
• Rules. Objectives and categories: a variety of rules, related to various aspects and challenges in the form. Activities in UI design practice: rules implemented, derived from user studies and analysis.
There are basically four adaptivity process attributes [13]: interaction techniques, user and task models, goals and rules. An overall objective has been to test the hypothesis that multimodality can be beneficial for users and, in line with this, multimodal support is implemented. This includes, for example, help messages that are (sometimes, but not always) presented in both text and audio mode. User and task models are defined in order to capture the key characteristics of LL users, and the forms are analyzed in order to identify accessibility challenges. User studies are important for user model definitions and requirements as well as for task modelling. Finally, on the basis of the other attributes, objectives and activities, a number of logical, "expert system" rules are defined and implemented. The system's "intelligence" can provide personalised and tailored help: for example, a user with serious problems in filling out the form is provided with a different type of help than a user with minor difficulties. When the user appears to be stuck, quite pervasive (often multimodal) help is automatically provided, for example by forcing the user to stop working on the form and read short messages that appear on the screen. The type of help provided is designed on the basis of studies of how users work with electronic forms, in particular frequently observed problems as well as the problem-solving strategies and techniques of LL and other types of users.
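As an illustration of the idea, one such rule might look roughly as follows. This is only a sketch, not the DIADEM component itself; the thresholds and names (StuckOnFieldRule, HelpAction) are invented for the example.

// Hypothetical rule: escalate the help level when the user seems to be stuck.
class StuckOnFieldRule {
    private static final long STUCK_THRESHOLD_MS = 30_000; // assumed value

    HelpAction evaluate(long millisSinceLastInput, int errorsOnCurrentField) {
        if (millisSinceLastInput > STUCK_THRESHOLD_MS || errorsOnCurrentField > 2) {
            // Pervasive help: interrupt the work on the form and show a short
            // message, presented both as text and as audio.
            return new HelpAction("BLOCKING_MESSAGE", true);
        }
        if (errorsOnCurrentField > 0) {
            // Minor difficulty: a context-sensitive hint next to the field.
            return new HelpAction("INLINE_HINT", false);
        }
        return HelpAction.none();
    }
}

class HelpAction {
    final String kind;
    final boolean withAudio;
    HelpAction(String kind, boolean withAudio) { this.kind = kind; this.withAudio = withAudio; }
    static HelpAction none() { return new HelpAction("NONE", false); }
}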
5 Adaptivity Mechanisms: An Example

The figure below shows the starting position of the form, before any modifications or adaptivity measures were implemented. This electronic safety alarm application form has been used for several years in many municipalities in Norway.
In our research projects, the form has been modified and redesigned, and a number of new functionalities and affordances have been added in order to provide timely and personalised support for the user. Figure 1 below shows an example that illustrates the task, the interaction design and the adaptivity mechanisms. The design of the form is intended to satisfy universal access and "design for all" requirements. It complies with the Norwegian national guidelines for the design, accessibility and usability of electronic governmental forms (User Interface Guidelines for Governmental Forms on the Internet, http://www.elmer.no/retningslinjer/pdf/elmer2-english.pdf), and accordingly the form is divided into three parts:

• Navigation area on the left
• Work areas in the middle
• Information area on the right

See the screen capture below.
Fig. 1. Task, interaction design and adaptivity mechanisms
The screen capture below shows the modified form with an activated adaptivity rule. First, we can see that the initial page design has been changed: the user can now focus on the field that he or she is supposed to fill out, and the cursor is initially placed in the first field. As fields are filled out they turn grey (not shown here), indicating that the user can proceed, and the cursor moves to the next field. The capture in Figure 2 below shows an error.
Fig. 2. An error
In this case the user has made an error in the first field, which changes the colour of the field and the related text from white to red, signalling that something is wrong. However, a mere change of colour may not be sufficient for the user to understand what is wrong and what he or she should do. Therefore, two additional elements are activated: first, a text message appears in the open field above the red-coloured one; second, the same message is presented in audio mode. The user is told that he or she has to click in the field in question before writing the answer (the Norwegian message translates as "You must click on a field before you can write in it").
Fig. 3. Error response screen
If the user still does not succeed, or simply waits for some time, the whole page turns grey with a message in the middle telling him or her to click on the green OK button at the top of the page in order to proceed. The screen shot in Figure 3 illustrates this.
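Putting the steps of this example together, the escalation could be expressed as a small sequence of actions attached to the field and the page, as sketched below. Again, this is a hypothetical reconstruction of the behaviour shown in Figures 2 and 3, not the project's actual implementation, and all class and method names are ours.

// Hypothetical escalation for the "click before typing" error (Figs. 2 and 3).
class ClickBeforeTypingEscalation {
    void onError(FormField field) {
        field.setColour("red");   // step 1: mark the field and its label
        field.showMessage("You must click on a field before you can write in it");
        field.playAudio("You must click on a field before you can write in it");
    }

    void onStillUnresolvedAfterTimeout(FormPage page) {
        page.greyOut();           // step 2: lock the page until the user reacts
        page.showCentredMessage("Click the green OK button at the top of the page to proceed");
    }
}

// Minimal placeholders so the sketch is self-contained.
class FormField {
    void setColour(String colour) { }
    void showMessage(String text) { }
    void playAudio(String text) { }
}

class FormPage {
    void greyOut() { }
    void showCentredMessage(String text) { }
}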
6 Discussion and Conclusion: When Adaptivity Becomes Counterproductive

The findings from the observations, evaluation and field trials were ambiguous, but a main conclusion to be drawn is that simple, straightforward adaptivity mechanisms may work well, whereas more complex and pervasive ones do not, and may even be counterproductive. The simple example above provides an illustration of why this is so. A basic problem is simplistic and "mentalistic" user and task modelling that fails to take the user's complex experience, competence and socio-psychological context into account.

On the one hand, the users clearly appreciated several simplifying changes in the user interface and functionalities that helped them to accomplish tasks and navigate in the form. For example, the users found that filling out the modified form was easy and simple, as long as they did not make mistakes that activated the adaptivity rules. One successful, simple mechanism was that fields turned grey when they had been filled out, providing intuitive feedback and clues on how to proceed. Furthermore, when asked after the completion of the form, they were also in principle clearly positive towards the adaptivity mechanisms as such. A usability (SUMI) test was carried out and showed that the LL users, too, were positive in general and tended to blame themselves rather than the technology in case of difficulties.

On the other hand, the observations of the actual behaviour of the users showed that most of them had problems with the more complex adaptivity mechanisms. The users in this study clearly favoured as little interaction with the system as possible, were highly sensitive to disturbances and unnecessary interaction, and were easily distracted and led astray when they had to deal with additional error handling. The observations showed that the adaptivity mechanisms illustrated above were often inaccurate and disturbing rather than relevant and helpful. A fundamental problem seems to be that users do not share the implicit mental model implemented and reflected in the adaptivity mechanisms. For example, the user is not always aware that he or she has done something wrong when the "intelligent" help is activated. When a mechanism is activated because the user is, for example, very slow in filling out the form, the user becomes confused and frustrated because he or she does not know why it was activated. Furthermore, the additional audio function, which is meant to be beneficial for LL users, also becomes a source of confusion because the user does not see the relevance of the audio message.

The user modelling must take into account that, for this type of users, issues related to cognitive overload and a reduced cognitive ability to process textual input are important. In line with this, users favoured a simple interface structure; immediate, context-sensitive feedback and help; simple language; concise texts; and minimized use of icons. There are also other critical issues, such as motivation, self-confidence, a feeling of mastery and the overall working context, factors that must also be taken seriously into account in the design.
Acknowledgement. The research reported here was supported by the Norwegian Research Council and the EU ICT FP6 programme.
Adaptative User Interfaces to Promote Independent Ageing

Cecilia Vera-Muñoz1, Mercedes Fernández-Rodríguez1, Patricia Abril-Jiménez1, María Fernanda Cabrera-Umpiérrez1, María Teresa Arredondo1, and Sergio Guillén2

1 Life Supporting Technologies, Technical University of Madrid, Ciudad Universitaria s/n, 28040 Madrid, Spain
{cvera,mfernandez,pabril,chiqui,mta}@lst.tfo.upm.es
2 ITACA Institute, Technical University of Valencia, Camino de Vera s/n, 46022 Valencia, Spain
[email protected]

Abstract. In recent years, the EU population has been undergoing a marked ageing process. This tendency is giving rise to new needs and to diverse services and applications oriented to improving the quality of life of senior citizens. The creation of such services requires technological advances and design techniques specifically focused on addressing the requirements of the elderly. This paper presents the adaptative user interfaces that have been developed in the context of an EU-funded project, PERSONA, which aims to provide different services to promote independent ageing.

Keywords: adaptative user interfaces, ambient assisted living, services for elderly, independent ageing.
1 Introduction

The EU population is becoming increasingly older [1]. As a result of this demographic trend, European countries are expected to experience significant social and economic impacts, with enormous effects on welfare expenditure and, in particular, on employment and labor markets, on pension systems and on healthcare systems. The European social model is based on wellbeing for all citizens, and this wellbeing is frequently perceived in terms of "quality of life". Quality of life is a subjective concept but, from the perspective of an elderly person, it can be analyzed from different viewpoints or domains: physical, psychological, level of independence, social relationships, environment, and spirituality, religion or beliefs.

It is a technological challenge to provide senior citizens with systems that can foster the different facets of their perception of quality of life. Such systems should improve the level of independence, promote social relationships, support immersion in the environment, and strengthen the psychological and physical state of the person.

Ambient Assisted Living (AAL) is a concept that embraces these technological challenges within the Ambient Intelligence (AmI) paradigm in order to face the problem of the ageing population in Europe. AAL aims to create a technological context, transparent to the user, specifically developed to manage the needs of the elderly and increase their independence.
PERSONA, a European funded research project, is based on the conviction that the application of new AAL technologies can improve the quality of life of senior citizens [2]. The project aims at advancing the AmI paradigm through the harmonization of AAL technologies and concepts for the development of sustainable and affordable solutions that promote the social inclusion and independent living of the elderly. PERSONA has developed a common semantic framework that comprises a scalable, open-standard AAL technological platform and a broad range of AAL services. These solutions cover users' needs in the areas of social integration, daily activities, safety and protection, and mobility. The AAL services are offered to the users by means of adaptative interfaces developed as the result of a complex human-computer interaction design that considered several aspects related to users' needs and context information.
2 Methodology

The definition of the adaptative user interfaces developed within the PERSONA project has been based on the "Interaction Framework" design method, described as part of the goal-directed design methodology [3]. The modelling included several tasks, starting with the definition of input methods, in which the various means that a user could employ to enter information into the system were assessed (e.g., keyboard, mouse, tactile screens). Then, the primary screens for presenting information were described, following the "description of views" task. In a third step, the definition of functional and data elements established the concrete representations in the user interface of the functions and needs identified in the requirements phase. Additionally, the allowed operations on the diverse elements of the interface were determined. Finally, a sketch of the basic interaction and key path scenarios was described.

The project has required the consideration of diverse interaction options for providing the developed services. The study started with the designation of interaction channels, which are basic interaction modalities based on the five basic physiological senses (visual, auditory, haptic, olfactory, taste). For each of these channels, the possible interaction modes that the different services could use to interact with the users were analyzed. The alternatives considered comprise icons and graphical elements for visual interaction, voice and sounds for auditory interaction, gesture recognition and tactile displays for haptic interaction, and taste and smell for olfactory interaction. Furthermore, a set of additional options, such as tangible user interfaces, avatar-based interaction, smart objects, multimodality and adaptative graphical user interfaces, has also been studied, all grouped under a so-called spanning channel.

Subsequently, a specific interaction channel/mode and a type of device were selected for each of the user target groups defined in the project: elderly at home, elderly outside, relatives and care staff (Table 1). These groups were identified using the International Classification of Functioning, Disability and Health (ICF), which makes it possible to classify users according to their capabilities [4].
Table 1. Interaction channel/mode and type of device per user's target group

Target group     | User-to-system interaction                | System-to-user interaction
-----------------|-------------------------------------------|---------------------------
Elderly at home  | Voice / Touch screen                      | Text and speech / Screen
Elderly outdoor  | Graphics / Portable Mobile Device (PMD)   | Graphics / PMD
Relatives        | Graphics / Keyboard-PC                    | Graphics / PC
Care staff       | Call / PC or PMD                          | SMS or call / PC or PMD
The classification showed a clear predominance of visual and auditory interaction as the most suitable alternatives in all possible scenarios. Thus, new metaphors and different formats for representing information have been designed for these two options. In this context, special attention has been paid to the design of the graphical user interfaces (GUIs), following Design for All principles and applying accessibility and usability criteria to create easy and intuitive interaction dialogues between the user and the system.
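Purely as an illustration, the channel and device assignments of Table 1 could be encoded as a small lookup structure consulted by an interaction system. The sketch below is our own TypeScript rendering; the type and property names are assumptions rather than part of the PERSONA platform.

```typescript
// Illustrative encoding of Table 1; names are assumptions, not PERSONA APIs.
type TargetGroup = "elderlyAtHome" | "elderlyOutdoor" | "relatives" | "careStaff";

interface InteractionAssignment {
  userToSystem: string; // channel/mode and device for user input
  systemToUser: string; // channel/mode and device for system output
}

const interactionTable: Record<TargetGroup, InteractionAssignment> = {
  elderlyAtHome:  { userToSystem: "voice / touch screen",                    systemToUser: "text and speech / screen" },
  elderlyOutdoor: { userToSystem: "graphics / portable mobile device (PMD)", systemToUser: "graphics / PMD" },
  relatives:      { userToSystem: "graphics / keyboard-PC",                  systemToUser: "graphics / PC" },
  careStaff:      { userToSystem: "call / PC or PMD",                        systemToUser: "SMS or call / PC or PMD" },
};

// Example lookup for a service addressing elderly users at home.
const assignment = interactionTable.elderlyAtHome;
console.log(assignment.systemToUser); // "text and speech / screen"
```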
3 Results

The platform developed within the PERSONA project includes an interaction system designed to provide adaptative user interfaces for the diverse services offered. The adaptation is performed automatically on the basis of different parameters: the information to be presented to the user, as required by each service; the user's profile; and the context information. The user interaction system includes two basic components: the dialog handler and the I/O handlers. These two components are closely related to a context-awareness framework and a profiling component (Fig. 1).
Fig. 1. PERSONA user interaction system architecture
The main purpose of the dialog handler component is to decide the type of device and interaction channel to be used in each service's user interface. This selection is needed whenever a service is invoked, and it is made considering the information to be shown to the user (provided by the service), along with data extracted from the user's profile and the environmental context. The output of the dialog handler is a specific interaction channel and a generic type of device to be used for interacting with the user (e.g., graphical interaction on a PC screen). With this information, an I/O handler selects the specific device that the system will use to interact with the user (e.g., the PC screen in the bedroom) and presents the required information on it.

The I/O handlers are application-independent, pluggable technological solutions, associated with specific interaction characteristics, which manage the respective I/O channels to particular devices. Six I/O handlers have been defined, each associated with a particular interaction mode or device type:

• The "Voice at home I/O handler" is responsible for any voice-based interaction with the user while he or she is at home.
• The "GUI-1 I/O handler" manages all graphics-based interactions of indoor services.
• The "GUI-2 I/O handler" is responsible for content representation on portable mobile devices (PMD).
• The "SMS I/O handler" manages information interchange in SMS format for all services.
• The "Voice-graphical I/O handler" supports a combination of voice and graphical information.
• The "Voice-gesture I/O handler" deals with voice interactions combined with gestures. This option offers users the possibility of interacting with the system using gestures (e.g., pointing) combined with voice in order to emphasize an intended or desired action.

The procedure for adapting a user interface starts with the dialog handler, which selects a specific output mode as the most appropriate one according to the user's profile, the context data and the information to be presented to the user. Once this selection has been made, it sends the information to the corresponding I/O handler, which chooses a specific device based on the context parameters and presents the information required by the service on it. Additionally, the I/O handlers convert the services' output and the user's input into the appropriate format, according to the device characteristics. The final result is a framework that manages user interaction in a service- and device-independent way, while remaining fully adaptative with respect to the user's profile, the context parameters and the type of information to be presented to the user.
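To make the two-stage selection above concrete, the following TypeScript sketch models a dialog handler that derives a channel and generic device type from the service output, user profile and context, and an I/O handler that then resolves a concrete device and renders the output. All interfaces, rules and names here are illustrative assumptions, not the actual PERSONA components.

```typescript
// Illustrative sketch of the two-stage adaptation flow; all names are assumptions.
interface UserProfile {
  targetGroup: "elderlyAtHome" | "elderlyOutdoor" | "relatives" | "careStaff";
  prefersVoice: boolean;
}
interface Context { userLocation: string; availableDevices: string[]; }
interface ServiceOutput { text: string; urgent: boolean; }
interface ChannelSelection { channel: "voice" | "gui" | "sms"; deviceType: "screen" | "pmd" | "pc" | "phone"; }

// Stage 1: the dialog handler decides the interaction channel and a generic device type.
function dialogHandler(out: ServiceOutput, profile: UserProfile, ctx: Context): ChannelSelection {
  if (profile.targetGroup === "careStaff" && out.urgent) {
    return { channel: "sms", deviceType: "phone" };
  }
  if (profile.targetGroup === "elderlyAtHome") {
    return profile.prefersVoice
      ? { channel: "voice", deviceType: "screen" }
      : { channel: "gui", deviceType: "screen" };
  }
  if (profile.targetGroup === "elderlyOutdoor") {
    return { channel: "gui", deviceType: "pmd" };
  }
  return { channel: "gui", deviceType: "pc" };
}

// Stage 2: an I/O handler resolves a concrete device from the context and renders the output.
function ioHandler(selection: ChannelSelection, out: ServiceOutput, ctx: Context): void {
  const device =
    ctx.availableDevices.find((d) => d.includes(selection.deviceType) && d.includes(ctx.userLocation)) ??
    ctx.availableDevices[0];
  // Format conversion according to the device characteristics would also happen here.
  console.log(`[${selection.channel}] on ${device}: ${out.text}`);
}

// Example: a reminder service addressing an elderly user at home.
const profile: UserProfile = { targetGroup: "elderlyAtHome", prefersVoice: true };
const ctx: Context = { userLocation: "bedroom", availableDevices: ["screen-bedroom", "screen-kitchen", "pmd-1"] };
const output: ServiceOutput = { text: "Time to take your medication.", urgent: false };
ioHandler(dialogHandler(output, profile, ctx), output, ctx);
```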
4 Conclusions

The PERSONA project has applied the Ambient Intelligence paradigm to the design and development of the presented interaction system, which provides adaptative user interfaces for AAL services. The developed solution brings systems that better fit users' needs, lifestyles and contexts by further developing multimodal communication and
integrating information acquired from the environment into the process of interaction with the user. The PERSONA project is taking a step forward in the field of services for supporting the elderly by improving social connectedness and participation, and by providing control over the environment, mobility and prevention services. Elderly users will benefit greatly from fully adaptable and personalized AAL services that can significantly improve their quality of life.

Acknowledgments. We would like to thank the PERSONA Project Consortium for their valuable contributions to the realization of this work. This project is partially funded by the European Commission.
References

1. European Commission, Directorate-General for Economic and Financial Affairs: The 2009 Ageing Report: Underlying Assumptions and Projection Methodologies for the EU-27 Member States (2007–2060). European Economy 7|2008 (2009)
2. PERSONA EU funded project (2007–2010), IST-045459, European Commission Sixth Framework Programme, http://www.aal-persona.org
3. Cooper, A., Reimann, R.: About Face 2.0: The Essentials of Interaction Design. John Wiley and Sons, New York (2007)
4. The International Classification of Functioning, Disability and Health (ICF), http://www.who.int/classifications/icf/
5. Aarts, E.H.L., Marzano, S.: The New Everyday: Views on Ambient Intelligence, pp. 78–83 (2003)
6. Akman, V., Surav, M.: The Use of Situation Theory in Context Modeling. Computational Intelligence 13(3), 427–438 (1997)
7. Mayrhofer, R.: Context Prediction Based on Context Histories: Expected Benefits, Issues and Current State-of-the-Art. ECHISE (2005)
Author Index
Abascal, Julio 623 Abril-Jiménez, Patricia 139, 766 Adams, Ray 467 Ahamed, Sheikh I. 189 Akhter Lipi, Afia 631 Albayrak, Sahin 150 Alexandris, Christina 92 Aloise, Fabio 483 Amemiya, Tomohiro 477 Antona, Margherita 684, 711 Ao, Xuefeng 583 Arredondo, María Teresa 49, 75, 139, 248, 766 Azuma, Kousuke 209 Babiloni, Fabio 483 Balandin, Sergey 3 Barbosa, Tiago 345 Basdekis, Ioannis 279 Benedí, Jose Miguel 160 Betke, Margrit 493 Bianchi, Luigi 483 Bieber, Gerald 289 Bjærum, Robert 317 Blumendorf, Marco 150 Bollow, Eckhard C. 258 Bonail, Borja 623 Braun, Anne-Kathrin 603 Bruegger, Pascal 297 Bühler, Christian 143 Burzagli, Laura 641 Cabrera-Umpiérrez, María Fernanda 49, 139, 766 Cansizoglu, Esra 493 Caporali, Maurizio 729 Carbonell, Noëlle 748 Carriço, Luís 384 Chang, Chia-Wei 455 Chen, Chien-Hsu 13, 455 Chen, Xinyu 658 Chien, Szu-Cheng 20 Cho, Hyunjong 57 Choi, Soo-Mi 394 Chong, Anthony 29
Chuang, Su-Chen 82 Cincotti, Febo 483 Comley, Richard 467 Connor, Caitlin 493 Daunys, Gintautas 503 de las Heras, Rafael 75 Delogu, Franco 557 Dias, Gaël 345 Doulgeraki, Constantina 711 Doulgeraki, Voula 279 Duarte, Carlos 384 Epstein, Samuel 493 Ezer, Neta 39
Faasch, Helmut 258 Federici, Stefano 557 Felzer, Torsten 509 Fernández, Carlos 160, 228 Fernández, María 49 Fernández-Rodríguez, Mercedes 766 Fisk, Arthur D. 39 Fujiwara, Akio 528, 613 Furuta, Kazuo 674 Gabbanini, Francesco 641 Gao, Xufei 658 Gardeazabal, Luis 623 Georgalis, Yannis 168 Georgila, Kallirroi 117 Ghoreyshi, Mahbobeh 467 Glavinic, Vlado 307 Grammenos, Dimitris 168 Granić, Andrina 694 Guillén, Sergio 766 Guo, Ping 658 Han, Dongil 57 Hasan, Chowdhury S. 189 Heiden, Wolfgang 603 Hein, Albert 178, 519 Hellenschmidt, Michael 228 Hellman, Riitta 317 Hirata, Ichiro 528, 613
Hirsbrunner, Béat 297 Hitz, Martin 355 Hoffmeyer, André 519 Hong, Seongrok 365 Huang, Xin-yuan 650 Irwin, Curt B. 535 Islam, Rezwan 189 Izsó, Lajos 67 Jansson, Harald K. 317 Jedlitschka, Andreas 199 Jeong, Kanghun 365 Jimenez-Mixco, Viveca 75 Jo, Gyeong-Sic 667 Jokinen, Kristiina 537 Jones, Brian 127 Joo, Ilyang 365 Kanno, Taro 674 Karampelas, Panagiotis 279 Kempter, Guido 218 Kim, Do-Yoon 738 Kim, Yong-Guk 667 Kirste, Thomas 178, 519 Kleinberger, Thomas 199 Kogan, Anya 445 Köhlmann, Wiebke 564 Kondratova, Irina 327 Kukec, Mihael 307 Lafuente, Alberto 623 Lauruska, Vidas 503 Lázaro, Juan-Pablo 160, 238 Lee, Chang-Franw 422 Lee, Jaehoon 365 Lee, Jeong-Eom 209 Lee, Ji-Hyun 29 Lee, Joo-Ho 209 Leitner, Gerhard 355 Leonidis, Asterios 684, 711 Leung, Cherng-Yee 82 Leuteritz, Jan-Paul 684 Li, Dingjun 335 Li, Hui 335 Liu, Jun 335 Liu, Ying 335 Ljubic, Sandi 307 Machado, David 345 Maes, Pattie 547 Magee, John 493
Mahdavi, Ardeshir 20 Maier, Edith 218 Malagardi, Ioanna 92 Martins, Bruno 345 Mattia, Donatella 483 Melcher, Rudolf 355 Mercalli, Franco 228 Mistry, Pranav 547 Miyashita, Satoshi 209 Montalvá, Juan Bautista 49 Moon, Hyeon-Joon 57, 365, 667 Morka, Sverre 317 Mourouzis, Alexandros 684 Müller, Katrin 238 Nadig, Oliver 564 Nakano, Yukiko 631 Nakić, Jelena 694 Naranjo, Juan-Carlos 139, 228, 238 Narzt, Wolfgang 374 Nien, Ken-Hao 13 Nordmann, Rainer 509 Nowakowski, Przemyslaw 100 Olivetti Belardinelli, Marta 557 Pais, Sebastião 345 Palmiero, Massimiliano 557 Panou, Maria 684 Park, Changhoon 704 Park, Gwi-Tae 209 Park, Jieun 29 Partarakis, Nikolaos 711 Peinado, Ignacio 248 Prueckner, Stephan 199 Quitadamo, Lucia Rita 483
Rau, Pei-Luen Patrick 335 Rehm, Matthias 631 Reis, Tiago 384 Renals, Steve 117 Rhee, Seon-Min 394 Rinderknecht, Stephan 509 Ritter, Walter 720 Rizzo, Antonio 729 Rogers, Wendy A. 39 Rubegni, Elisa 729 Ryu, Han-Sol 394
Sala, Pilar 228, 238 Salvador, Zigor 623 Sánchez, Jaime 402 Schiewe, Maria 564 Schmidt, Michael 574 Schmitzberger, Heinrich 374 Serrano, J. Artur 238 Sesto, Mary E. 535 Shin, Seungchul 738 Simonin, Jérôme 748 Solheim, Ivar 758 Song, Jaekwang 57 Song, Ji-Won 412 Song, Rongqing 583 Steinbach-Nordmann, Silke 199 Stephanidis, Constantine 168, 279, 711 Storf, Holger 199 Tanviruzzaman, Mohammad 189 Tjäder, Claes 108 Tsai, Wang-Chin 422 Tutui, Rie 528, 613 Urban, Bodo 289
Vanderheiden, Gregg C. 432, 438 Vera-Muñoz, Cecilia 139, 766 Vilimek, Roman 593 Villalar, Juan-Luis 75 Villalba, Elena 248 Vipperla, Ravichander 117 Voskamp, Jörg 289
Walker, Bruce N. 445 Waris, Heikki 3 Watanabe, Atsushi 674 Weber, Gerhard 564, 574 Weiland, Christian 603 Welge, Ralph 258 Wichert, Reiner 267 Widlroither, Harald 684 Winegarden, Claudia 127 Wolters, Maria 117 Wu, Fong-Gong 13, 455 Wu, Zhongke 583 Yamaguchi, Daijirou 528, 613 Yamamoto, Sachie 528, 613 Yamaoka, Toshiki 528, 613 Yang, Gang 650 Yang, Sung-Ho 412 Yao, Yan-Ting 82 Yoo, Seong Joon 57 Yoon, Sung-young 738 Yoon, Yeo-Jin 394 Yoshida, Mayuko 528, 613 Yu, Emily 493 Zander, Thorsten O. 593 Zhou, Mingquan 583