IFIP Advances in Information and Communication Technology
352
Editor-in-Chief A. Joe Turner, Seneca, SC, USA
Editorial Board
Foundations of Computer Science: Mike Hinchey, Lero, Limerick, Ireland
Software: Theory and Practice: Bertrand Meyer, ETH Zurich, Switzerland
Education: Arthur Tatnall, Victoria University, Melbourne, Australia
Information Technology Applications: Ronald Waxman, EDA Standards Consulting, Beachwood, OH, USA
Communication Systems: Guy Leduc, Université de Liège, Belgium
System Modeling and Optimization: Jacques Henry, Université de Bordeaux, France
Information Systems: Jan Pries-Heje, Roskilde University, Denmark
Relationship between Computers and Society: Jackie Phahlamohlaka, CSIR, Pretoria, South Africa
Computer Systems Technology: Paolo Prinetto, Politecnico di Torino, Italy
Security and Privacy Protection in Information Processing Systems: Kai Rannenberg, Goethe University Frankfurt, Germany
Artificial Intelligence: Tharam Dillon, Curtin University, Bentley, Australia
Human-Computer Interaction: Annelise Mark Pejtersen, Center of Cognitive Systems Engineering, Denmark
Entertainment Computing: Ryohei Nakatsu, National University of Singapore
IFIP – The International Federation for Information Processing

IFIP was founded in 1960 under the auspices of UNESCO, following the First World Computer Congress held in Paris the previous year. An umbrella organization for societies working in information processing, IFIP's aim is two-fold: to support information processing within its member countries and to encourage technology transfer to developing nations. As its mission statement clearly states, IFIP's mission is to be the leading, truly international, apolitical organization which encourages and assists in the development, exploitation and application of information technology for the benefit of all people.

IFIP is a non-profitmaking organization, run almost solely by 2500 volunteers. It operates through a number of technical committees, which organize events and publications. IFIP's events range from an international congress to local seminars, but the most important are:

• The IFIP World Computer Congress, held every second year;
• Open conferences;
• Working conferences.

The flagship event is the IFIP World Computer Congress, at which both invited and contributed papers are presented. Contributed papers are rigorously refereed and the rejection rate is high.

As with the Congress, participation in the open conferences is open to all and papers may be invited or submitted. Again, submitted papers are stringently refereed.

The working conferences are structured differently. They are usually run by a working group and attendance is small and by invitation only. Their purpose is to create an atmosphere conducive to innovation and development. Refereeing is less rigorous and papers are subjected to extensive group discussion.

Publications arising from IFIP events vary. The papers presented at the IFIP World Computer Congress and at open conferences are published as conference proceedings, while the results of the working conferences are often published as collections of selected and edited papers.

Any national society whose primary activity is in information may apply to become a full member of IFIP, although full membership is restricted to one society per country. Full members are entitled to vote at the annual General Assembly. National societies preferring a less committed involvement may apply for associate or corresponding membership. Associate members enjoy the same benefits as full members, but without voting rights. Corresponding members are not represented in IFIP bodies. Affiliated membership is open to non-national societies, and individual and honorary membership schemes are also offered.
Simone Fischer-Hübner Penny Duquenoy Marit Hansen Ronald Leenes Ge Zhang (Eds.)
Privacy and Identity Management for Life 6th IFIP WG 9.2, 9.6/11.7, 11.4, 11.6/PrimeLife International Summer School Helsingborg, Sweden, August 2-6, 2010 Revised Selected Papers
Volume Editors

Simone Fischer-Hübner, Ge Zhang
Karlstad University, Department of Computer Science
Universitetsgatan 2, 65188 Karlstad, Sweden
E-mail: {simone.fischer-huebner, ge.zhang}@kau.se

Penny Duquenoy
Middlesex University, School of Engineering and Information Sciences
The Burroughs, Hendon, London NW4 4BE, UK
E-mail: [email protected]

Marit Hansen
Unabhängiges Landeszentrum für Datenschutz Schleswig-Holstein
Holstenstr. 98, 24103 Kiel, Germany
E-mail: [email protected]

Ronald Leenes
Tilburg University, TILT - Tilburg Institute for Law, Technology, and Society
P.O. Box 90153, 5000 LE Tilburg, The Netherlands
E-mail: [email protected]

ISSN 1868-4238 e-ISSN 1868-422X
ISBN 978-3-642-20768-6 e-ISBN 978-3-642-20769-3
DOI 10.1007/978-3-642-20769-3
Springer Heidelberg Dordrecht London New York
Library of Congress Control Number: 2011926131
CR Subject Classification (1998): C.2, K.6.5, D.4.6, E.3, H.4, J.1
© IFIP International Federation for Information Processing 2011 This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)
Preface
Emerging Internet applications, such as Web 2.0 applications and cloud computing, increasingly pose privacy dilemmas. When they communicate over the Internet, individuals leave trails of personal data which may be stored for many years to come. In recent years, social network sites, where users tend to disclose very intimate details about their personal, social, and professional lives, have caused serious privacy concerns. The collaborative character of the Internet enables anyone to compose services and distribute information. Due to the low costs and technical advances of storage technologies, masses of personal data can easily be stored. Once disclosed, these data may be retained forever and can be removed only with difficulty. It has become hard for individuals to manage and control the release and use of information that concerns them. They may particularly find it difficult to eliminate outdated or unwanted personal information.

These developments raise substantial new challenges for personal privacy at the technical, social, ethical, regulatory, and legal levels: How can privacy be protected in emerging Internet applications such as collaborative scenarios and virtual communities? What frameworks and tools could be used to gain, regain, and maintain informational self-determination and lifelong privacy?

During August 2–6, 2010, IFIP (International Federation for Information Processing) working groups 9.2 (Social Accountability), 9.6/11.7 (IT Misuse and the Law), 11.4 (Network Security) and 11.6 (Identity Management) and the EU FP7 project PrimeLife held their 6th International Summer School in Helsingborg, Sweden, in cooperation with the EU FP7 project ETICA. The focus of the event was on privacy and identity management for emerging Internet applications throughout a person's lifetime.

The aim of the IFIP Summer Schools has been to encourage young academic and industry entrants to share their own ideas about privacy and identity management and to build up collegial relationships with others. As such, the Summer Schools have been introducing participants to the social implications of information technology through the process of informed discussion. Following the holistic approach advocated by the involved IFIP working groups and by the PrimeLife and ETICA projects, a diverse group of participants ranging from young doctoral students to leading researchers in the field from academia, industry and government engaged in discussions, dialogues, and debates in an informal and supportive setting. The interdisciplinary and international emphasis of the Summer School allowed for a broader understanding of the issues in the technical and social spheres.

All topical sessions started with introductory lectures by invited speakers in the mornings, followed by parallel workshops and seminars in the afternoons. The workshops consisted of short presentations based on the contributions submitted by participating PhD students, followed by active discussions.
Contributions combining technical, social, ethical, or legal perspectives were solicited. Keynote speeches provided the focus for the theme of the Summer School – Lifelong Privacy, eGovernment and User Control, Economic Aspects of Identity Management, Privacy in Social Networks, Policies, Privacy by Crypto, Emerging Technologies, Trends and Ethics – and the contributions from participants enhanced the ideas generated by the keynote speeches.

The 2010 Summer School was once again a very successful event. More than 70 delegates from more than 15 countries actively participated. We succeeded in initiating intensive discussions between PhD students and established researchers from different disciplines. A best student paper award donated by IFIP WG 11.4 was given this year to Michael Dowd for his paper entitled "Contextualised Concerns — The Online Privacy Attitudes of Young Adults" and for his paper presentation.

These proceedings include both the keynote papers and submitted papers accepted by the Program Committee, which were presented at the Summer School. The review process consisted of two steps. In the first step, contributions for presentation at the Summer School were selected based on reviews of submitted short papers by the Summer School Program Committee. The second step took place after the Summer School, when the authors had an opportunity to submit their final full papers addressing discussions at the Summer School. The submissions were again reviewed, by three reviewers each, and those included in these proceedings were carefully selected by the International Summer School Program Committee and by additional reviewers according to common quality criteria.

We would like to thank the members of the Program Committee, the additional reviewers, the members of the Organizing Committee as well as all the speakers, especially the keynote speakers, who were: Jan Camenisch, Andreas Pfitzmann, Claire Vishik, Alessandro Acquisti, Herbert Leitold, Bibi van den Berg, Sonja Buchegger, Kai Rannenberg, Timothy Edgar, Gregory Neven, Alma Whitten and Bernd Stahl. Without their contributions and dedication, this Summer School would not have been possible. We also owe special thanks to the PrimeLife project as well as to IFIP for their support.

We dedicate these proceedings to our colleague and friend Prof. Andreas Pfitzmann, who to our deepest sorrow passed away on September 23, 2010. Andreas actively participated during the whole Summer School and provided essential contributions as a keynote speaker, Session Chair, and panellist and in discussions with other participants. We will miss his bright spirit, his valuable experiences and knowledge, his visionary contributions, and his enjoyable company.

February 2011
Simone Fischer-Hübner
Penny Duquenoy
Marit Hansen
Ronald Leenes
Ge Zhang
Organization
The PrimeLife/IFIP Summer School 2010 was organized by the EU FP7 Project PrimeLife and the IFIP Working Groups 9.2, 9.6/11.7, 11.4 and 11.6 in cooperation with the ETICA EU FP7 project.
General Summer School Chair
Simone Fischer-Hübner, Karlstad University / Sweden, IFIP WG 11.6 Chair

Program Committee Co-chairs
Penny Duquenoy, Middlesex University / UK, IFIP WG 9.2 Chair
Marit Hansen, Independent Centre for Privacy Protection Schleswig-Holstein, Kiel / Germany
Ronald Leenes, Tilburg University / The Netherlands, IFIP WG 9.6/11.7 Chair

Local Organizing Committee Chair
Ge Zhang, Karlstad University / Sweden
International Program Committee
Bibi van der Berg, Tilburg University / The Netherlands
Michele Bezzi, SAP Research / France
Jan Camenisch, IBM Research / Switzerland, IFIP WG 11.4 Chair
Lothar Fritsch, Norwegian Computer Center, Norway
Mark Gasson, University of Reading / UK
Juana Sancho Gil, University of Barcelona / Spain
Hans Hedbom, Karlstad University / Sweden
Tom Keenan, University of Calgary / Canada
Dogan Kesdogan, Siegen University / Germany
Kai Kimppa, University of Turku / Finland
Eleni Kosta, KU Leuven / Belgium
Elisabeth de Leeuw, Ordina / The Netherlands
Marc van Lieshout, TNO / The Netherlands
Javier Lopez, University of Malaga / Spain
Leonardo Martucci, CASED / Germany
Vaclav Matyas, Masaryk University, Brno / Czech Republic
Gregory Neven, IBM Research / Switzerland
Stefano Paraboschi, University of Bergamo / Italy
Jean-Christophe Pazzaglia, SAP Research / France
Uli Pinsdorf, Europäisches Microsoft Innovations Center GmbH (EMIC) / Germany
Andreas Pfitzmann, TU Dresden / Germany
Charles Raab, University of Edinburgh / UK
Kai Rannenberg, Goethe University Frankfurt / Germany
Norberto Patrignani, Catholic University of Milan / Italy
Pierangela Samarati, Milan University / Italy
Dieter Sommer, IBM Research / Switzerland
Sandra Steinbrecher, TU Dresden / Germany
Morton Swimmer, Trend Micro / USA
Jozef Vyskoc, VaF / Slovakia
Rigo Wenning, W3C / France
Diane Whitehouse, The Castlegate Consultancy / UK
Rose-Mharie Åhlfeld, Skövde University / Sweden
Program Committee for the ETICA Stream
Veikko Ikonen, VTT / Finland
Jeroen van den Hoven, Technical University of Delft / The Netherlands
Michael Rader, Karlsruhe Institute of Technology / Germany
Philippe Goujon, University of Namur / Belgium
Bernd Stahl, De Montfort University / UK
Roger Dean, eema / UK

Additional Reviewers
Shkodran Gerguri
Sandra Olislagers
Tobias Smolka
Petr Svenda
Table of Contents
Terminology
Privacy: What Are We Actually Talking About? A Multidisciplinary Approach . . . . . . Philip Schütz and Michael Friedewald
1
Implementability of the Identity Management Part in Pfitzmann/Hansen’s Terminology for a Complex Digital World . . . . . . . . Manuela Berg and Katrin Borcea-Pfitzmann
15
Privacy Metrics
Towards a Formal Language for Privacy Options . . . . . . Stefan Berthold
27
Using Game Theory to Analyze Risk to Privacy: An Initial Insight . . . . . Lisa Rajbhandari and Einar Arthur Snekkenes
41
A Taxonomy of Privacy and Security Risks Contributing Factors . . . . . . . Ebenezer Paintsil and Lothar Fritsch
52
Ethical, Social, and Legal Aspects
ETICA Workshop on Computer Ethics: Exploring Normative Issues . . . . . . Bernd Carsten Stahl and Catherine Flick
64
Contextualised Concerns: The Online Privacy Attitudes of Young Adults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Michael Dowd
78
Data Protection, Privacy and Identity: Distinguishing Concepts and Articulating Rights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Norberto Nuno Gomes de Andrade
90
Data Protection and Identity Management
Oops - We Didn't Mean to Do That! – How Unintended Consequences Can Hijack Good Privacy and Security Policies . . . . . . Thomas P. Keenan
108
Supporting Semi-automated Compliance Control by a System Design Based on the Concept of Separation of Concerns . . . . . . Sebastian Haas, Ralph Herkenhöner, Denis Royer, Ammar Alkassar, Hermann de Meer, and Günter Müller
120
Security Levels for Web Authentication Using Mobile Phones . . . . . . Anna Vapen and Nahid Shahmehri
130
eID Cards and eID Interoperability
Challenges of eID Interoperability: The STORK Project (Keynote) . . . . . . Herbert Leitold
144
Necessary Processing of Personal Data: The Need-to-Know Principle and Processing Data from the New German Identity Card . . . . . . . . . . . . Harald Zwingelberg
151
A Smart Card Based Solution for User-Centric Identity Management . . . Jan Vossaert, Pieter Verhaeghe, Bart De Decker, and Vincent Naessens
164
Emerging Technologies
The Uncanny Valley Everywhere? On Privacy Perception and Expectation Management (Keynote) . . . . . . Bibi van den Berg
178
50 Ways to Break RFID Privacy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ton van Deursen
192
Privacy for eGovernment and AAL Applications
The Limits of Control – (Governmental) Identity Management from a Privacy Perspective . . . . . . Stefan Strauß
206
Privacy Concerns in a Remote Monitoring and Social Networking Platform for Assisted Living . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Peter Rothenpieler, Claudia Becker, and Stefan Fischer
219
Social Networks and Privacy
Privacy Settings in Social Networking Sites: Is It Fair? . . . . . . Aleksandra Kuczerawy and Fanny Coudert
231
Privacy Effects of Web Bugs Amplified by Web 2.0 . . . . . . . . . . . . . . . . . . . Jaromir Dobias
244
Privacy Policies
A Conceptual Model for Privacy Policies with Consent and Revocation Requirements . . . . . . Marco Casassa Mont, Siani Pearson, Sadie Creese, Michael Goldsmith, and Nick Papanikolaou
258
Applying Formal Methods to Detect and Resolve Ambiguities in Privacy Requirements . . . . . . Ioannis Agrafiotis, Sadie Creese, Michael Goldsmith, and Nick Papanikolaou
271
A Decision Support System for Design for Privacy . . . . . . . . . . . . . . . . . . . Siani Pearson and Azzedine Benameur
283
A Multi-privacy Policy Enforcement System . . . . . . . . . . . . . . . . . . . . . . . . . Kaniz Fatema, David W. Chadwick, and Stijn Lievens
297
Usable Privacy
Designing Usable Online Privacy Mechanisms: What Can We Learn from Real World Behaviour? . . . . . . Periambal L. Coopamootoo and Debi Ashenden
311
PrimeLife Checkout – A Privacy-Enabling e-Shopping User Interface . . . . . . Ulrich König
325
Towards Displaying Privacy Information with Icons . . . . . . . . . . . . . . . . . . Leif-Erik Holtz, Katharina Nocun, and Marit Hansen
338
Obituary
Andreas Pfitzmann 1958-2010: Pioneer of Technical Privacy Protection in the Information Society . . . . . . Hannes Federrath, Marit Hansen, and Michael Waidner
349
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
353
Privacy: What Are We Actually Talking About? A Multidisciplinary Approach

Philip Schütz and Michael Friedewald
Fraunhofer Institute for Systems and Innovation Research, Karlsruhe, Germany
{philip.schuetz,michael.friedewald}@isi.fraunhofer.de
Abstract. This paper presents a multidisciplinary approach to privacy. The subject is examined from an ethical, social, and economic perspective reflecting the preliminary findings of the EU-funded research project PRESCIENT. The analysis will give a comprehensive illustration of the dimensions’ unique and characteristic features. This will build the basis for identifying overlaps and developing synergetic effects, which should ideally contribute to a better understanding of privacy.
1 Introduction

Privacy is an essential fundamental human right, which preserves individuals from arbitrary intrusion into their personal sphere by the state, corporations or other individuals. In a globalised world with ever new emerging technologies, privacy issues have evolved into one of the most salient, ubiquitous and pressing topics of our society today. The information age, with its new forms of communication, data storage and processing, contests the current concept of privacy to an extent and in ways never seen before. The imperative to deal with these new challenges has become an increasingly prominent feature not only in media coverage but also in private and public policy-making. However, the current media attention to privacy-related topics poses the risk of a shallow and non-scientific debate.

Being a moving target, the concept of privacy is above all elusive. It evolves over time, and people define and value it differently [1]. That is exactly why it remains tremendously important to work at the theoretical base of the concept and to keep on asking: What are we actually talking about?

Recognising that privacy is a multifaceted concept, this paper proposes to examine the subject from different scientific perspectives. Reflecting the preliminary findings of the EU-funded research project PRESCIENT [2], the multidisciplinary approach embraces the ethical, social, and economic dimensions.¹ Whereas the first serves as an umbrella perspective, comprising various dimensions in itself and collectively tracing the origins of the human need for privacy, the second aims to shed light on an often neglected social value of privacy. The economic point of view represents the most elaborate section due to the research focus of the authors within the PRESCIENT project. Finally, it should be noted that this paper reflects a starting point, examining the concept of privacy from rather under-researched perspectives. The following sections are therefore intended to give a comprehensive illustration of the perspectives' unique and characteristic features. This will form the basis for identifying overlaps and developing synergetic effects, which should ideally contribute to a better understanding of privacy.

¹ Although considered in PRESCIENT, the legal perspective has been left aside as a section of its own in this paper in order to avoid redundancies with an already extensive literature on legal aspects of privacy.

S. Fischer-Hübner et al. (Eds.): Privacy and Identity 2010, IFIP AICT 352, pp. 1–14, 2011. © IFIP International Federation for Information Processing 2011
2 The Different Perspectives

2.1 Ethical Perspective

Ethics is often falsely understood as a set of moral rules providing individuals and society with guidance. Yet ethics is not etiquette; it is not a manual to be followed. Rather, it is a philosophical enquiry into the concepts involved in practical reasoning. Ethics as a part of philosophy deals with moral principles such as good and evil, virtues or justice.

This section explores the ethical value of privacy. Why do we consider privacy as something worth protecting? Where does the desire for privacy derive from? Assuming that privacy is part of two constitutive human polarities, mainly the desire to be independent and the need to be part of a community, we will trace the biological, anthropological, psychological and religious antecedents of this polarity.

Research on territoriality and overcrowding shows that virtually all animals seek periods of individual seclusion. Although they are social animals, humans have shown throughout their evolution behaviour of defending their territories and avoiding overcrowding, the latter often functioning as an intensifier of stressful conditions [3]. In frequently seeking small-group intimacy, individuals search for relief from stress by dropping the role they are assigned to play in the community. At the same time, however, humans in particular have developed to the highest degree the ability of empathy, a capacity to reproduce the emotional and mental patterns of others, which makes them dependent on other human beings not only in their rationale to survive but also in their very neurological structure.² That is why isolation can be much more pathological than overcrowding [5].³

² According to the most accredited theories, empathy is generated by "automatically and unconsciously activated neural representations of states in the subject similar to those perceived in the object" [4, p. 282].
³ Research on overcrowding and isolation suggests that the degree of control concerning the ability to choose between solitude and gregariousness is decisive in affecting the physical and psychological state of the individual.

Supported by various anthropological studies, aspects of privacy can be found in modern but also preliterate societies throughout history all over the world [6]. Hence, the individual's need for a certain degree of social distance at some point in time seems to occur in every culture and time period, although social obligations, as a must of any gregariousness, imply some limitations to the choice of seeking this distance. As Westin puts it, the concept of privacy seems to be a "cultural universal", which would militate in favour of an intrinsic value of privacy [7]. However, feminist anthropologists view privacy mainly as a "trap of domesticity" for women [8]. This "trap" would develop through the division between domestic affairs as essentially private and predominantly female, and the exclusively male and public
realm of war and politics.⁴ Picking up the same train of thought, Pateman argues that the private and public spheres are two sides of a social contract, which includes primarily a "sexual contract" that distributes social roles and defines "areas of influence" between men and women [10].

⁴ Post-modern and feminist scholars have devoted influential studies to the sexual origins of privacy in the early Greek civilization. Sophocles' Antigone was used as an example for the introduction of the distinction between the private, familial and womanly (oikia) and the public, political, and masculine (koinos) realm [9].

Arguing furthermore from a psychoanalytic point of view, Freud understood privacy as involving a dichotomy between "civilized society, which demands a good conduct" of its citizens, and the instinctual behaviour of the individual [11]. Since society has allowed itself to be misled into tightening the moral standards to the greatest possible degree, the creation of the private realm serves above all as a safety valve for all of the instinctual constraints weighing heavily upon the individual [11,7]. Even more revealing is his seminal essay on "Das Unheimliche" (The Uncanny) [12]. The German term "unheimlich" derives from the meaning of "not homey" ("unfamiliar" or "outside of the family"), which is "the name for everything that ought to have remained secret and hidden but has come to light", says Schelling's definition, from which Freud starts. He eventually defines the uncanny as "anything we experience that reminds us of earlier psychic stages, of aspects of our unconscious life, or of the primitive experience of the human species". The id, as part of Freud's structural model of the human psyche (id, ego and superego), contains most of these uncanny elements. There is, in other words, a literally dark side to privacy which implies a dimension the individual will never be able to master or understand completely. Freud then notices that the antonym "heimlich" ("homely" or "familiar"), which nowadays actually means "secret", also conveys the meaning of private in the sense of "within the family". Surprisingly, he discovers that familiarity represents an integral part of uncanniness. It consequently seems that the uncanny as well as the familiar (homely) are two sides of the same coin, which both constitute the core sphere of inner personal privacy.

The concepts of modesty, intimacy and shame furthermore reflect emotions, behaviour patterns and externalised decisions which are based on the interplay within one's psyche between the id and the superego. The private realm therefore depends on a process of negotiating the boundary between the inner part of the self and the external world of the other selves. This negotiation is influenced both by personal attitudes (subjectivity) and by social and cultural norms. In a religious context the inner part of the self represents a sacred place where the individual can seek intimacy with God.⁵ Even beyond the Christian or other monotheistic religions, the attempt to establish a bond or a contact with God(s) commonly consists of a meditation phase of an individual or an exclusive group, in which privacy is sought.

⁵ The biblical allegory of Adam and Eve, being happily nude while sharing with God the holy space of paradise, tries to show that perfect intimacy with God. When they taste from the forbidden fruit and consequently break the alliance with God, shame and distrust, represented by the need to hide themselves in clothes, begin to appear.
Finally, political philosophy has had another major influence on today's conception of privacy. Especially in the Western hemisphere, liberalism shaped the idea of a state that ensures its citizens the protection of life, liberty and property, an idea originated by Locke in Two Treatises of Government. The legal notion of privacy as a fundamental right, but also as an instrument to achieve other basic rights, is based on these liberal conceptions, in which privacy is mainly seen as a protective sphere that shields the individual from intrusions of the state.⁶ Complementary to this negative defensive right, Westin suggests a rather positive right conception of privacy which enables the individual to exercise control over his/her information.

⁶ Warren and Brandeis later extend this idea, defining privacy as "the right to be let alone" [13].

2.2 Social Perspective

Privacy is often seen in contrast to the public, although there is a social value to privacy. The perceived duality between private and public value frequently results in a supposed need to balance the two against each other, which is explored in this section.

Humans are individualistic and herd animals at once. They represent Hobbes' homo homini lupus and Aristotle's zoon politicon at the same time. This inner dualism can be best described by Schopenhauer's hedgehog's dilemma, which comprises the idea of identifying the optimal distance between being "alone" and "together".⁷ So Moore is right when he argues that "the need for privacy is a socially created need. Without society there would be no need for privacy" [6, p. 73]. However, this only reflects the personal dilemma with which individuals are regularly confronted. The tension between individual and community remains even if the hedgehog's dilemma is solved, because the optimal distance would differ from situation to situation and would, due to various personal preferences, not be applicable to everyone.

⁷ Though the hedgehog needs the warmth and affection of his fellow hedgehogs in his surroundings, he will expect to inevitably get hurt by their spiny backs [14, p. 396].

That is why balancing individual well-being against the common good frequently runs into severe difficulties. One is that the notion of balancing or making trade-offs literally suggests a zero-sum game where an increase in security, for example, automatically results in a reduction of privacy. However, these kinds of trade-offs, often presented in public as axiomatic, seem not to take the complexity of social values into account. "The value of privacy should be understood in terms of its contributions to society. [...] When privacy protects the individual, it does so because it is in society's interest", argues Solove [15, p. 173]. Theoretically, privacy as well as other values such as national security, transparency, free speech or other human rights generate a complex net of interwoven and consequently interdependent values of society which can flexibly shift in one or another direction. The network structure could function as a model for the interaction of these social values.

In legislative, judicial and administrative practice, there is the inevitable need of balancing different and overlapping values against each other. Therefore it becomes crucial to consider how such balancing processes can be effectively designed and implemented. One of the main features of balancing processes should be the testing of necessity and proportionality. Aharon Barak advocates
"the adoption of a principled balancing approach that translates the basic balancing rule into a series of principled balancing tests, taking into account the importance of the rights and the type of restriction. This approach provides better guidance to the balancer (legislator, administrator, judge), restricts wide discretion in balancing, and makes the act of balancing more transparent, more structured, and more foreseeable." [16, p. 2]

So in the case of balancing national security against privacy, for instance, there are at least four minimum core principles to be considered: 1) the principle of the rule of law, 2) the principle of proportionality, 3) the principle of favouring moderate intrusiveness, as well as 4) the principle of the lesser technological privacy intrusiveness and the principle of directly proportional incremental authority to the privacy intrusiveness of the technology used [17, p. 142].

The pursuit of the greater good has often led to tremendous affliction throughout the world's history. That is why inalienable rights protect the individual from intrusions of the state. These fundamental rights and their core principles must not be subject to negotiation and should include an adequate protection of minorities in order to avoid Tocqueville's tyranny of the majority [18]. However, privacy also has a "dark side" to it. Not only could terrorists take advantage of a constitutionally guaranteed private sphere to organise their attacks, but transparency and accountability in general are also challenged in cases like bank secrecy or discretionary earnings of politicians. From a feminist perspective, as already pointed out, the socially constructed realm of privacy threatens to continue to serve as a shield against the public, covering domestic violence and abusiveness of men against women. Though seemingly in conflict, social values such as privacy and transparency are all equally important. The Gordian challenge, with which society is continuously confronted, consists of striking the right balance between these social values.

2.3 Economic Perspective

This section examines privacy from an economic point of view. Although the notion of privacy in the economic discourse is mainly understood as informational privacy dealing with data protection issues, this paper tries to consider the costs and benefits beyond the mere perfect/imperfect information topos of economic theory.

In order to be able to effectively analyse the economic value of privacy, we have to initially shed light on the main actors protecting and intruding upon privacy. The actor-centred approach hypothesises that the decisions of the data subject as well as those of the data controller are based on the rationale of the homo economicus, carefully balancing the costs and benefits and aiming for the maximum profit.⁸

⁸ The EU data protection directive defines the data subject as "an identified or identifiable natural person", whereas the data controller is referred to as "the natural or legal person, public authority, agency or any other body which alone or jointly with others determines the purposes and means of the processing of personal data." [19]

The decisions of the data subject can be captured in a dual-choice model in which the data subject can opt for "disclosing" or "retaining" personal information. The data controller's
options of action involve the collecting, aggregating, storing and processing of personal information as well as the reactions to privacy breaches.⁹

⁹ Due to the growing quantity of possible data controller actors, the analysis limits itself to private business entities, although public authorities represent one of the most important bodies collecting and controlling personal data. This must be, however, subject to further research.

The data subject: Costs created by disclosing personal data are scientifically extremely hard to grasp, because they lie at the core of exactly that essence and complex value which is a fundamental part of the essentially contested concept of privacy itself. Frequently, individuals value these types of costs differently, and in addition privacy incidents often have indirect and long-term effects on the data subject.¹⁰ Consequences are therefore hard to anticipate, and it seems that individuals perceive long-term impacts as a rather indirect, controllable and less perilous harm to themselves. That is why the data subject often underestimates or does not consider the long-term risks in giving away personal information [21, p. 11].

¹⁰ The data subject's perception of these effects heavily depends on the information he/she receives and on previous experiences with privacy intrusions, the latter being called the "life cycle element" [20, pp. 31].

However, more and more individuals are confronted with privacy problems frequently resulting from their lax attitude towards sharing private information or from being forced to disclose personal data. This can result in social sorting or other discriminatory practices by data controllers. There is furthermore an increasing risk of being the victim of online and offline crime such as burglary,¹¹ identity theft, cyber stalking and bullying, character assassination as well as other forms of harassment.

¹¹ The Dutch website PleaseRobMe.com highlights the dangers of sharing too much information on the Internet about your locations [22].

Another cost factor of voluntarily sharing personal and private data is that peers, colleagues or prospective employers may form an opinion about the data subject based on a one-time, superficial and maybe misleading impression. The consequences can range from mere embarrassment to the failure of a job interview. Feeling annoyed by unsolicited advertisement, but also being uncomfortable with advertisements that reflect too much knowledge about themselves, Internet users suffer more often than expected the aftermath of continuously disclosing personal and private information. In many instances, however, they are actually able to choose between disclosing or retaining personal data. Nonetheless, individuals tend to decide in favour of short-term and tangible benefits although being aware that there is a value to privacy. The research of Acquisti and Berendt deals with exactly this gap between stated preferences, i.e. the (partial) awareness of the consequences of giving away personal information, and actual behaviour [21,23].

Lack of information and transparency about the commercial or governmental usage of personal data often eases the individual's decision to disclose personal data [24]. Convenience aspects are one of the most important drivers for disclosing personal data [25, p. 4]. Data controllers offer a plethora of supposed advantages and seemingly free services to the data subject in order to get hold of personal data. Acquisti characterizes the benefits of disclosing personal information as relatively small and short-term rewards [21].
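To make the present-bias argument above more concrete, the following toy sketch contrasts exponential with hyperbolic discounting of a delayed privacy cost. It is only an illustration under invented parameter values and is not a model taken from Acquisti and Grossklags [21].

```python
# Toy illustration (invented numbers, not from the cited studies): how
# discounting a delayed privacy cost can make a small immediate reward attractive.

def exponential_discount(value: float, years: float, delta: float = 0.95) -> float:
    """Standard exponential discounting: value * delta**years."""
    return value * delta ** years

def hyperbolic_discount(value: float, years: float, k: float = 1.0) -> float:
    """Hyperbolic discounting: value / (1 + k*years); delayed outcomes shrink quickly."""
    return value / (1.0 + k * years)

if __name__ == "__main__":
    immediate_reward = 10.0       # e.g. a small discount for handing over personal data
    future_privacy_cost = 100.0   # hypothetical harm materialising in five years
    delay = 5.0

    print(exponential_discount(future_privacy_cost, delay))        # ~77.4: cost clearly outweighs reward
    print(hyperbolic_discount(future_privacy_cost, delay))         # ~16.7: cost already looks much smaller
    print(hyperbolic_discount(future_privacy_cost, delay, k=2.0))  # ~9.1: perceived cost now below the reward
```

Under a strong enough present bias (a larger k in this sketch), the perceived future cost falls below the immediate reward, which is one way of reading the gap between stated privacy preferences and actual disclosure behaviour described above.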
These rewards include direct and indirect monetary incentives such as little gifts or discounts on products in exchange for the customer's personal data. All of these price deductions, such as student, senior citizen and even volume discounts, are part of a positive price discrimination strategy. But there are also immaterial rewards which can involve social benefits, e.g. when the data subject tries to avoid peer-group pressure (particularly in social networks) by willingly sharing private information [24]. Furthermore, Lenard and Rubin argue that the very existence of the Internet as we know it today, with a myriad of seemingly free services such as search engines, e-mail accounts, social networks, news, etc., heavily depends on consumers' willingness to disclose personal information [26, p. 163]. Taking these offers for granted, users underestimate the cost-benefit rationality that underlies the business models of many providers.

The trade-off between exchanging personal data and services mostly free of charge is based on an asymmetric allocation of information. Not knowing that their personal data is collected and processed, users are often deluded concerning their reasonable expectations. Since knowledge and education about the economic value of personal data play a decisive role, a new form of digital divide, perhaps a "privacy divide", threatens to develop in society, and a long-term need for a privacy e-inclusion of citizens could come into existence [27, p. 16]. Nevertheless, from an economic point of view, the increasing demand for goods like privacy or data protection would foster the supply and development of new technologies, laws and entrepreneurial codes of conduct as well as new business models which will offer new strategies to deal with privacy issues. It must be admitted, however, that there is little empirical evidence for a strong demand response.

In retaining personal information, the data subject bears, of course, the costs of not receiving the benefits of disclosing his/her personal data. In this case he/she is also subject to negative price discrimination, not belonging to the group of preferred customers that enjoys discounts. Since data protection implies holding back certain information, individuals who are reluctant to disclose personal data could furthermore be suspected of being loners, freaks or weirdoes who have something to hide. In fact, the question why people would need privacy if they do not have anything (bizarre or illegal) to hide is one of the classical arguments of data controllers trying to camouflage their gain in power and profit by collecting information.¹² Here the classical but wrong statement "If you have nothing to hide..." becomes relevant [28].

¹² In an interview on the CNBC documentary "Inside the Mind of Google" in December 2009, Eric Schmidt, CEO of Google, was asked: "People are treating Google like their most trusted friend. Should they be?" Hitting the nail on the head, he responded: "I think judgment matters. If you have something that you don't want anyone to know, maybe you shouldn't be doing it in the first place, but if you really need that kind of privacy, the reality is that search engines including Google do retain this information for some time, and it's important, for example, that we are all subject in the United States to the Patriot Act. It is possible that that information could be made available to the authorities." http://www.youtube.com/watch?v=A6e7wfDHzew, last accessed on 17 January 2011.

However, communicating, exchanging opinions or sharing information represents an essential part of human behaviour and an important strategy to succeed in society. If you want to pursue a successful career in any field of work, networking is one of the most relevant activities. That is why holding back information at a certain point
of time could be disadvantageous. In the online world, social networks above all try to meet this demand of being connected easily, everywhere and at all times. Although most social interactions still take place in the off-line world, a trend towards more and more virtual interactions seems to be visible, especially when looking at the younger generation. Not sharing digital information could therefore lead to an isolation problem these days and, even more probably, in the future.

As already pointed out, the relevant literature does not specifically identify the economic advantages of maintaining informational privacy for the individual, because the concept of privacy does not generate easily quantifiable factors. Hence, difficult-to-quantify privacy aspects are often excluded from the analysis [29]. Nonetheless, what can be seen as a benefit is that privacy serves as a defensive right against intrusions by others as well as a positive right enabling the data subject to exercise control over his/her information. Westin names four functions of privacy [7, pp. 32]:

– First of all, there is personal autonomy, providing the individual with a core sphere where he/she is able to retreat without being controlled, manipulated or dominated by others, e.g. of huge relevance concerning the secrecy of the ballot.
– Secondly, privacy serves as a safety valve which allows the individual to let his/her instinctual needs run more freely without having to fear embarrassment.
– Thirdly, self-evaluation and reflection can be carried out undisturbed in the private realm in order to develop one's personality and initiate learning processes. Additionally, innovative and creative thinking is spawned so that societies can continue to advance, allowing their citizens to explore beyond the mainstream.
– Finally, limited and protected communication leads to an unstrained exchange of information supporting the right to free speech.

Again, it is obvious that these highly immaterial and long-term benefits for the individual are difficult to operationalise and quantify. However, they represent a crucial element in our analysis of the costs and benefits of privacy.

The data controller: Data controllers face a complex cost-benefit ratio in gathering, storing and exploiting the collected data as well. Although the boundaries are blurred, we should generally distinguish between sensitive (confidential) data of the corporation and collected personal information of individuals. This paper mainly deals with the latter.

Material and personnel costs of aggregating, storing and processing data represent first of all the most important direct expense factors. Although the software and hardware costs of aggregating, storing and processing data are constantly decreasing due to technological progress, the amount of data that needs to be stored and processed is skyrocketing at the same time, so that data-collecting companies face rapidly rising operating costs (e.g. for electric power supply). For this reason data centres are even built close to power plants or in cooler climates [30]. The energy issue becomes a more and more relevant topic, also because there is an apparent tendency towards retention of data, i.e., to collect more data than actually needed. This additionally increases the risk of overinvestment [31, p. 474].
Especially when you consider private data as a commodity that can be exploited by its owner, property rights should furthermore be taken into account as an indirect cost factor [32]. Confronted, moreover, with a complex body of rules and regulations concerning the collection, storage and usage of personal data, data controllers will try to comply (at least to a certain degree) with these rules to avoid lawsuits and compensation payments. Extra administrative and infrastructural expenses should therefore be considered.

Information security would represent one of these additional infrastructural cost factors. When storing personal data, most companies are obliged by law to protect the data through technical means (e.g. encryption) and access control measures. Moreover, back-ups and log files which show who accessed which data serve as another safeguard. Staff at all levels have to be trained how to use and manage data in a lawful way. If a company wishes to transfer data to a country outside the EU, there are serious regulatory hurdles to cross, not least of which is ensuring that the data will be adequately protected and respected to the same extent as in the European Union. Besides, a company may need to respond to requests for access to their data by customers arguing that the data is not correct. The company will need to verify whether the data is correct or not. And when the data is compromised in some way, either through data breaches caused by a hacker attack, or when data is lost, then the data controller faces a plethora of material and immaterial costs.

Data and privacy breaches can have devastating consequences for data controllers. Immediate costs would include first of all the repair or replacement of the broken system while slowing down or even stopping whole business processes [33, p. 106]. If data subjects have to be notified of the data breach, as may be mandatory, there is negative publicity, which in a long-term perspective can seriously damage the image and reputation of the data controller. Data protection authorities may require an inspection or audits, and eventually legal actions such as fines, compensations, torts or other liabilities could account for severe financial consequences for the data controller. Acquisti, Friedman and Telang have shown in their study that companies that experienced a privacy breach not only have to fear the loss of existing customers, but also suffer a statistically significant negative impact on the firm's stock market value [34, p. 1573]. However, stock prices tend to recover in a rather short period of time. Ultimately, privacy and data breaches can result in long-term damages for enterprises such as higher insurance premiums, severance of contractual relations, and, most importantly, an eventual harm to trust relationships with customers and/or suppliers.

Thus, data controllers need to assess their security investment in relation to the probability of a privacy incident multiplied by the impact the problem will cause. Such a risk assessment is necessary in order to keep the right balance between an adequate level of data protection and an efficient and effective processing of the data [35,36]. When sanctions are unlikely or the costs of compensations do not surpass the financial benefits resulting from the collection and usage of personal data, data controllers will tacitly accept these incidents and prefer to neglect privacy and data protection measures, as is frequently the case these days.
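The expected-loss comparison described above (the probability of a privacy incident multiplied by its impact, weighed against the cost of protective measures) can be illustrated with a minimal, hypothetical calculation. All figures and function names below are invented for illustration and are not taken from the cited studies [35,36].

```python
# Hypothetical sketch of the expected-loss comparison described in the text:
# invest in data protection only if it reduces expected breach losses by more
# than it costs. All figures are illustrative assumptions.

def expected_loss(breach_probability: float, impact_cost: float) -> float:
    """Expected loss from a privacy incident: probability multiplied by impact."""
    return breach_probability * impact_cost

def investment_pays_off(investment: float, prob_without: float,
                        prob_with: float, impact_cost: float) -> bool:
    """True if the reduction in expected loss exceeds the cost of the measures."""
    reduction = expected_loss(prob_without, impact_cost) - expected_loss(prob_with, impact_cost)
    return reduction > investment

if __name__ == "__main__":
    # Assumed figures: a 5% annual breach probability and a EUR 2M impact
    # (system repair, notification, fines, lost customers), reduced to 2% by
    # EUR 40,000 spent on encryption, access control and staff training.
    print(investment_pays_off(investment=40_000,
                              prob_without=0.05, prob_with=0.02,
                              impact_cost=2_000_000))  # True: expected loss drops by EUR 60,000
```

In practice such estimates are highly uncertain, which is one reason why, as noted above, data controllers may simply accept incidents when sanctions are unlikely or small relative to the benefits of collecting the data.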
Trying to exploit personal data commercially, companies aim to understand the mechanisms behind individual purchase behaviour in order to increase their profits from
better market opportunities. To sell products and services, suppliers need to comprehend what their customers want or need, to stimulate the buyer's interest in their products or services and to be reasonably sure what a consumer (or different groups of customers) is willing to pay for the product or service. For this purpose many market players have been aggregating data, regardless of whether personal or non-personal, for a long time. Moreover, enterprises have collected even more data in the field of production and logistics, succeeding in making the supply chain more efficient. This general aim prevails in an age where the collection of more and more data becomes feasible and affordable due to the ever-decreasing costs for sensors, storage and computing power. The data comes from traditional sources such as loyalty cards and data trails in the Internet [37], but increasingly also from other sources such as RFID-tagged products or deep-packet inspection. In selling personal data to third parties, companies run, of course, the risk of losing money if the added sales revenue is smaller than the benefits of providing services based on processing the personal data on their own. There are numerous companies which base their business model on the processing of personal data, creating consumer profiles and exploiting the results of their data analyses in order to make a huge profit [38]. Offering seemingly free services such as Internet searches, emails, news, games or social interaction, many Internet enterprises are part of an already huge and still rapidly growing online advertising industry [39].
3 Conclusion

This paper has presented three different approaches to privacy, attempting to answer central questions such as: Why do we consider privacy as something worth protecting? Is there a social value to privacy? Cui bono from privacy?

Although privacy is extremely context-dependent and often valued differently, the ethical perspective suggests that privacy represents an essential component of human identity. Trying to figure out where the supposed human desire for privacy comes from, this section analysed a variety of antecedents, tracing biological, anthropological, psychological and religious origins. Territoriality and noxious reactions to overcrowding, for example, can be seen as biological factors sometimes causing a need for a certain degree of social distance. However, research on empathy and isolation also shows that human beings are above all social creatures. Furthermore, notions of privacy can be found in most preliterate and modern societies, which indicates a universal value of privacy. However, feminist anthropologists assume that privacy constitutes primarily a realm created by man in order to exert power over women. From a psychoanalytic point of view, Freud helps us to realise that there is an unconscious dimension to privacy which lies beyond one's awareness. Finally, the influence of the early liberal movement on the notion of privacy has to be considered, paving the way for today's conception of privacy as both a negative and a positive right.

The social perspective aims to depict the tension between private and public. Being Hobbes' homo homini lupus and Aristotle's zoon politicon at once, humans face the hedgehog's dilemma, craving solitude and gregariousness at the same time. However, privacy is not only challenged by the personal dilemma but also by conflicting social values
such as national security, transparency or free speech. That is why a transparent and effectively designed balancing process must be followed in cases of curtailing privacy; otherwise, as for example Thomas Jefferson stated, there is a risk that “he who trades liberty for security [...] loses both.” The economic perspective resorts to an actor-centred approach distinguishing between data subject and data controller. In the case of the first a dual choice model of “disclosing” or “retaining” personal information presents the options of action, whereas the latter has to consider the costs and benefits of collecting, aggregating, storing and processing data as well as of potential privacy breaches. In disclosing personal information, the data subject is often confronted with costs that are neither easy to identify nor simple to operationalise and quantify. Nonetheless, more and more individuals are facing privacy problems resulting from their lax attitude towards sharing private information or being forced to disclose personal data. These problems include the risk of being a subject to social sorting or other discriminatory practices. Giving away personal information increases the threat for the data subject of becoming a victim of online and offline crime. Other cost factors involve embarrassment, discomfort, annoyance, etc. But there are also a variety of benefits for the data subject resulting from the disclosure of personal data. One of the most important advantages is an increased level of convenience meaning relatively small rewards such as discounts, free Internet services, etc. In retaining personal information, the data subject bears, of course, the costs of not-receiving the benefits for disclosing his/her personal data. Since data protection implies to hold back certain information, individuals who are reluctant to disclose personal data could furthermore be suspected of being loners who want to hide something from the public. In today’s digital society the refusal of sharing personal information could therefore easily lead to an isolation problem. Nonetheless, there are important benefits of retaining informational privacy. First of all, privacy serves in general as a defensive right against intrusions of others creating a protective sphere around the individual. Privacy as a positive right enables ideally the data subject to exercise control over his/her information. Westin’s four functions of privacy include personal autonomy, emotional release, self-evaluation and limited as well as protected communication.[7, pp. 32] Though mostly immaterial, these benefits are much more relevant, profound and complex than economic theory is being able to grasp. The data controller on the other hand faces various material and personnel expenses of aggregating, storing and processing personal data such as costs for property rights (if considered), compliance with state regulations and information security. But in fact, the lucrative benefits outweigh the costs by far. The maxim scientia potentia est (“for also knowledge itself is power”) could not be more appropriate explaining the strategy behind the data controller’s rampant collection behaviour of personal data. In the information society data itself has become one of the most valuable commodities. In analysing data of (potential) customers, companies, for instance, are far better off calculating and minimising risks. 
Aiming to understand the mechanisms behind individual purchase behaviour, commercial data controllers are able to reduce transaction costs immensely. The rapidly growing online advertising industry is just one example of a business that profits remarkably from digitally collecting consumer information.
There is, however, a major downside to collecting personal data. Privacy and data breaches can have devastating consequences for data controllers, such as legal action, the slowing down or even halting of whole business processes, and, in the long term, damage to the data controller’s image and to trust relationships with customers as well as suppliers. In conclusion, by providing a comprehensive overview of three different perspectives, this paper attempts to contribute to a better understanding of privacy. The multidisciplinary analysis of privacy is a difficult task, since the different approaches overlap heavily and are at least in part difficult to distinguish from each other. Nonetheless, the interacting scientific points of view uncover neglected aspects and give revealing insights into the multifaceted concept of privacy. Summing up the most important findings of the different approaches, it seems that privacy has an intrinsic universal value that does not, however, stand by itself. Privacy is part of a complex social value system whose components need to be kept in balance. Since economic theory reaches its limits surprisingly quickly, ethical and social aspects need to be integrated into the analysis. The economic discourse also shows that privacy implies control and therefore power, something that data subjects, citizens as well as consumers, should be made aware of.
Acknowledgements This work was carried out in the context of the EU-funded FP7 project PRESCIENT: Privacy and Emerging Sciences and Technologies (SIS-CT-2009-244779). Important input came from David Wright (Trilateral Research & Consulting), Emilio Mordini and Silvia Venier (Centre for Science, Society and Citizenship).13
References 1. Nissenbaum, H.: Privacy as contextual integrity. Washington Law Review 79, 101–139 (2004) 2. Friedewald, M., Wright, D., Gutwirth, S., Mordini, E.: Privacy, data protection and emerging sciences and technologies: Towards a common framework. Innovation: The European Journal of Social Science Research 23, 63–69 (2010) 3. Ford, R.G., Krumme, D.W.: The analysis of space use patterns. Journal of Theoretical Biology 76, 125–155 (1979) 4. de Waal, F.B.: Putting the altruism back into altruism: The evolution of empathy. Annual Review of Psychology 59, 279–300 (2008) 5. Haney, C.: Mental health issues in long term solitary and supermax confinement. Crime and Delinquency 49, 124–156 (2003) 6. Moore Jr., B.: Privacy: Studies in social and cultural history. M.E. Sharpe, Armonk (1984) 7. Westin, A.F.: Privacy and freedom. Atheneum, New York (1967) 8. Friedan, B.: Feminist Mystique. W. W. Norton & Co., New York (1963) 9. Elshtain, J.B.: Public Man, Private Woman. Princeton University Press, Princeton (1981) 10. Pateman, C.: The Sexual Contract. Stanford University Press, Stanford (1988) 13
For more information see: http://www.prescient-project.eu
11. Freud, S.: Zeitgemässes über Krieg und Tod. Imago: Zeitschrift für Anwendung der Psychoanalyse auf die Geisteswissenschaften 4, 1–21 (1915) 12. Freud, S.: Das Unheimliche. Imago: Zeitschrift für Anwendung der Psychoanalyse auf die Geisteswissenschaften 5, 297–324 (1919) 13. Warren, S.D., Brandeis, L.D.: The right to privacy. Harvard Law Review 4, 193–220 (1890) 14. Schopenhauer, A.: Parerga und Paralipomena: Kleine philosophische Schriften. Erster Band. Verlag A. W. Hahn, Berlin (1851) 15. Solove, D.J.: Understanding privacy. Harvard University Press, Cambridge (2008) 16. Barak, A.: Proportionality and principled balancing. Law and Ethics of Human Rights 4, Article 1 (2010), http://www.bepress.com/cgi/viewcontent.cgi? article=1041&context=lehr 17. Aquilina, K.: Public security versus privacy in technology law: A balancing act? Computer Law and Security Review 26, 130–143 (2010) 18. de Tocqueville, A.: Democracy in America, vol. 1. George Adlard, New York (1839) 19. Directive 95/46/EC of the European Parliament and of the Council of October 24, on the protection of individuals with regard to the processing of personal data on the free movement of such data. Official Journal of the European Communities L 281, 31–50 (1995) 20. Laufer, R.S., Wolfe, M.: Privacy as a concept and a social issue-multidimensional developmental theory. Journal of Social Issues 33, 22–42 (1997) 21. Acquisti, A., Grossklags, J.: Privacy attitudes and privacy behavior: Losses, gains, and hyperbolic discounting. In: Camp, L.J., Lewis, S. (eds.) The Economics of Information Security, pp. 165–178. Kluwer, Dordrecht (2004) 22. Harvey, M.: Please RobMe website highlights dangers of telling world your location. The Times, February 19 (2010), http://technology.timesonline.co.uk/tol/ news/tech_and_web/the_web/article7032820.ece 23. Berendt, B., Günther, O., Spiekermann, S.: Privacy in e-commerce: Stated preferences vs. actual behavior. Communication of the ACM 48, 101–106 (2005) 24. Grimmelmann, J.: Privacy as product safety. Widener Law Journal 19, 793–827 (2010), http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1560243 25. Grossklags, J., Acquisti, A.: When 25 cents is too much: An experiment on willingness-tosell and willingness-to-protect personal information. In: Proceedings of the Sixth Workshop on the Economics of Information Security (WEIS 2007), Pittsburgh, PA (2007) 26. Lenard, T.M., Rubin, P.H.: In defense of data: Information and the costs of privacy. Policy and Internet 2, Article 7 (2010), http://www.psocommons.org/cgi/viewcontent. cgi?article=1035&context=policyandinternet 27. Roussopoulos, M., Beslay, L., Bowden, C., Finocchiaro, G., Hansen, M., Langheinrich, M., Le Grand, G., Tsakona, K.: Technology-induced challenges in privacy and data protection in europe. A report by the ENISA ad hoc working group on privacy and technology. European Network and Information Security Agency, Heraklion (2008) 28. Solove, D.J.: “I’ve got nothing to hide” and other misunderstandings of privacy. St. Diego Law Review 44, 745–772 (2008), http://papers.ssrn.com/sol3/papers.cfm? abstract_id=998565 29. Swire, P.P.: Efficient confidentiality for privacy, security, and confidential business information. Brookings-Wharton Papers on Financial Services, 273–310 (2003) 30. Harizopoulos, S., Shah, M.A., Meza, J., Ranganathan, P.: Energy efficiency: The new holy grail of data management systems research. In: 4th Biennial Conference on Innovative Data Systems Research (CIDR), Asilomar, California, USA, January 4-7 (2009) 31. 
Hui, K.L., Png, I.: The economics of privacy. In: Hendershott, T. (ed.) Economics and Information Systems. Handbooks in Information Systems, vol. 1, pp. 471–498. Elsevier Science, Amsterdam (2006)
32. Volkman, R.: Privacy as life, liberty, property. Ethics and Information Technology 5, 199– 210 (2003) 33. Tsiakis, T., Stephanides, G.: The economic approach of information security. Computers & Security 24, 105–108 (2005) 34. Acquisti, A., Friedman, A., Telang, R.: Is there a cost to privacy breaches? An event story. In: The Fifth Workshop on the Economics of Information Security (WEIS 2006), Cambridge, UK (2006) 35. Sonnenreich, W., Albanese, J., Stout, B.: Return on security investment (rosi) - a practical quantitative model. Journal of Research and Practice in Information Technology 38, 45–56 (2006) 36. Anderson, R.J., Moore, T.: The economics of information security. Science 314, 610–613 (2006) 37. Graeff, T.R., Harmon, S.: Collecting and using personal data: consumers’ awareness and concerns. Journal of Consumer Marketing 19 (2002) 38. Hildebrandt, M., Gutwirth, S.: Profiling the European Citizen: Cross-Disciplinary Perspectives. Springer, Dordrecht (2008) 39. Evans, D.S.: The online advertising industry: Economics, evolution, and privacy. Journal of Economic Perspectives 23, 37–60 (2009)
Implementability of the Identity Management Part in Pfitzmann/Hansen’s Terminology for a Complex Digital World Manuela Berg and Katrin Borcea-Pfitzmann Technische Universität Dresden, Faculty of Computer Science D-01062 Dresden, Germany {manuela.berg,katrin.borcea}@tu-dresden.de http://dud.inf.tu-dresden.de
In memory of Andreas Pfitzmann Abstract. Based on a widely cited terminology, this paper provides different interpretations of concepts introduced in the terminology asking for an implementable privacy model for computer-mediated interactions between individuals. A separation of the digital world and the physical world is proposed, as well as a linkage of the two worlds. The digital world contains digital representations of individuals and it consists of pure data. The physical world contains individuals and it consists of information (produced by individuals) and data. Moreover, a refined definition of privacy is being elaborated that serves as justification for identity management of individuals interested in a sophisticated perspective of privacy.
1
Introduction
Since time immemorial, relationships between individuals have been a need of mankind. Interaction, meaning mutual action and communication, between individuals is the basis for establishing a social system [1]. During the last decades, technology has developed rapidly. Computer-mediated interaction became important and reached its current high point with the Web 2.0 movement. In contrast to early computer-mediated interaction, users are now more and more active in content and application production. User profiles and the content of web applications are sources of personal data. Apart from explicitly publishing personal data, users also implicitly disclose personal attitudes, opinions, or even personal statements within bulletin boards, blogs, etc. Moreover, users publish data not related to themselves. In many cases, users are not aware of the risks connected with the possibilities of linking different sources of information. Applications are usually built to achieve a certain functionality (such as communication). Privacy is usually considered a secondary (or even tertiary) functionality, which is valuable for individuals if the primary functionality is fulfilled. Consequently, privacy aspects are not addressed in many applications. An
identity management system is an application which fulfils privacy as its primary functionality. This could be a reason why identity management systems are not widely accepted by the public. Yet whatever importance is attached to the functionality of privacy, it is of importance. We elaborate a definition of privacy based on existing approaches. We consider computer-mediated interaction triggered by individuals1. On the one hand, these interactions generate networks of individuals. We refer to the set of individuals as the physical world. On the other hand, computer-mediated interactions generate networks of digital representations of the individuals, which we refer to as the digital world.2 Of course, individuals and their digital representations are linked, so they are not independent from each other. The digital world in all its complexity and its linkage to the individuals is what we refer to when talking about a complex digital world. The authors of [2] introduce the concept of privacy-enhancing identity management using partial identities, i.e., subsets of an individual’s digital identity, to tackle potential privacy problems caused by linking information from different sources to the individual’s identity. Related to this, Pfitzmann and Hansen have continuously worked on a comprehensive terminology framing the area of privacy-related terms [3]. Our objective is to start an analysis of that widely accepted and cited terminology from the perspective of the complex digital world. In this paper, we reason that two worlds have to be distinguished, the physical world and the digital world. The two worlds are linked by two functions, Ego( ) and Oge( ). While the latter maps from the physical world to the digital world and delegates the individual’s communication to the digital representation, the former maps from the digital world to the physical world and carries information about the communication of digital representations back to the individuals. Moreover, we propose an extended definition of privacy, which is described as a state attained by a desire of an individual. Given computer-mediated communication, this state influences the communication of an individual and its digital representations with others in the complex digital world. Considering the building of applications and the separation of the digital and the physical world, we give a definition of privacy. We also discuss this definition with regard to the concepts of privacy given in [4] and [5]. Accordingly, the following section frames the problem space by briefly introducing the kind of complexity shaping the complex digital world from an analytical point of view and by stating the research questions that serve as the road map for the following sections. In Sect. 3, we discuss the Pfitzmann/Hansen terminology by confronting it with the setting of the complex digital world. Finally, Sect. 4 takes up the issues identified in the previous section and discusses the separation of the physical as well as the digital worlds. Further, this section proposes an
Organizations and computers can be imagined to trigger computer-mediated interaction, too, but we do not consider this case here. Besides digital representation, there might be other items in the digital world that we do not consider in this paper.
enhanced definition of privacy applied on personal interaction as well as on computer-mediated interaction, which is discussed in terms of known concepts of privacy.
2
Framing the Problem Space
Regarding the complex digital world, individuals are considered as actors in the physical world, and digital representations of the individuals are considered as actors in the digital world. We call an actor an entity. Further, we call connections between entities resulting from possible or occurred communication a relationship. With respect to an established relationship, we call an entity that participates in this relationship an involved entity. Of course, relationships might be created even if there has not been any direct communication between the involved parties yet. Consequently, those relationships can be manifested by the involved entities or by non-involved entities. Traditionally, communication is modelled using the bilateral sender-receiver model, and privacy has been studied mainly in scenarios involving service providers and service consumers. We argue that communication, and thus interaction, is more complex: (a) Interaction does not only take place between one particular user and one or more service provider(s). In this paper, we focus on interactions between individuals3. Their implications for privacy protection have not yet been studied in detail. (b) Interaction does not only occur bilaterally. Given that at one particular point in time there exists at most one sender for a message, there might, nevertheless, be several recipients (or zero or only one) for this message. Interaction constitutes a dynamic system, which develops over time. An interaction is usually describable by a set of messages that are sent by different individuals. In the sender-receiver model, more than one message can be regarded only if the messages are considered as a totally ordered sequence, while in the complex digital world, messages can be transmitted concurrently. However, there is a certain order of the messages (for example, ordered by the time of sending). We call a set of messages an interaction if one recipient of a former message becomes a sender and the sender of a former message becomes a receiver. Consequently, there are usually several senders in an interaction. (c) Between two entities, there might exist relationships of different quality and quantity. We defined a relationship between entities as a connection resulting from communication, independent of any semantics and quantity. Further research should extend the understanding of relationships by more differentiations, i.e., by considering relationship semantics such as “is friend of” or “is colleague of” and by taking into account frequencies and durations of communications as well as preferred means of communication.
Usually, entities comprise organizations and technical devices, apart from individuals.
(d) to be mentioned here is that entities might act differently in different situations. The authors in [6] address the above mentioned issues. They introduce the concepts of entity, view, relationship, and context as parts of a model that touches various aspects of a complex digital world. Regarding the analysis of the Pfitzmann/Hansen terminology [3], this paper particularly focuses on entities and relationships. In the complex digital world as outlined above, we raise the following research questions: 1. Are there concepts in [3] that make it necessary to separate individuals and their digital representation? 2. What precisely are the entities, and how are entities characterized when transcending the borders of the physical and digital world? 3. What are the interests of entities regarding the disclosure of personal information? 4. Having learned about entities and their interests, does this help us to develop an application that supports entities pursuing their interests?
3
Analyzing the Notion of Identities in the Terminology of Pfitzmann/Hansen
In 2000, Pfitzmann and Köhntopp4 published a terminology describing the concepts of anonymity, unobservability, and pseudonymity [7]. In June 2004, they extended their terminology collection with definitions of terms related to identity management with regard to data minimization and user control. Since then, the authors have continuously extended and refined that terminology. The most recent version [3] serves as the basis for our analysis regarding the questions raised in Sect. 2. In [3], the traditional sender-receiver model is assumed, where usually one message is considered and a subject is defined as a human being, a legal person or a computer. Different kinds of relationships and different situations are not considered. Table 1 compares the setting of the Pfitzmann/Hansen terminology with the setting of the complex digital world. We analyze the questions raised in Sect. 2: 1. Anonymity in [3] is defined as follows. Definition 1. “Anonymity of a subject from an attacker’s perspective means that the attacker cannot sufficiently identify the subject within a set of subjects, the anonymity set.” Subject is defined in [3] and covers the physical body with its feelings and needs as well as its digital representation. The German law for data protection
Taking identity management seriously, Hansen and Köhntopp are names for the same person.
Table 1. The comparison of the complex digital world and the model of Pfitzmann/Hansen

Comparison Property                 | Pfitzmann/Hansen’s Terminology                     | Complex Digital World
a) service providers                | considered                                         | not considered yet
   companies and organizations      | considered                                         | not considered yet
b) number of receivers              | one                                                | arbitrary
   number of senders                | one or as a sequence of arbitrary message sendings | arbitrary number
   number of messages               | one or many in a sequence                          | arbitrary number
   concurrence of messages          | only sequential messages                           | arbitrary concurrence
c) different kinds of relationships | not considered                                     | considered
d) different situations             | not considered                                     | considered
[8, §3 (6)] defines anonymization as the act of transforming personal data in a way that makes it difficult or impossible to link the personal data to the individual. Since an attacker can be situated either in the physical world or in the digital world (depending on what the attacker wants to find out), and since personal data is part of the individual’s digital representation, we can distinguish three types of anonymity: (a) An individual is anonymous in a set of individuals, which means that an attacker in the physical world wants to identify an entity in the physical world. (b) Anonymity is regarded as unlinkability between an individual and his digital representations, which either means that an attacker in the physical world (which has control over his digital representations) wants to find out which digital representations belong to a physical entity (or which physical entity belongs to a digital representation), or it means that the attacker in the physical world wants to link some digital representations belonging to the same entity or the same group, e.g., the group of employees of one company. (c) A digital representation is anonymous in a set of digital representations, which means that an attacker in the digital world wants to identify an entity in the digital world. Depending on the respective situation, an individual has different preferences regarding which kind of anonymity is to be protected most; a small sketch at the end of this section phrases these three notions as linkability questions. The three types of anonymity are not independent from each other. If, for example, a digital representation is not anonymous and it is also linkable to its individual, then the corresponding individual cannot be anonymous either. Further, we have to admit that representing an individual just by data might not be enough to characterize this individual (see Sect. 4).
2. Referring to individual persons, the authors of [3] define identity as follows: Definition 2. (Identity) “An identity is any subset of attribute values of an individual person which sufficiently identifies this individual person within any set of persons.” The definition implies that every individual may have more than one identity. We will reveal different interpretations of this concept. (a) We can consider identity as a socio-centric concept. This means that identities are omnipresent. For example, an all-knowing entity would consider the information he knows about an individual as an individual’s identity. This means that information gained from perceptions of this individual about his interaction partners might be included in his identity. Such an approach of understanding the concept of identity might be interesting for service providers, but it is not reasonable for individuals caring for their privacy because individuals should have control of their data. (b) The term identity can also be considered as an ego-centric concept. For an individual that wants to show himself differently to different people, this is the more appropriate approach. However, this implies even more questions: If we assume that an individual establishes an identity of himself, then this is a digital representation of himself. This approach of defining identity, however, misses the aspect of perceptions. We will refer to the individual’s perception of an interaction partner as view of an individual on an interaction partner. The concept of a view can be realized as a set5 in an application. On the one hand, an individual’s digital representation has views on its interaction partners. So the question arises, whether these views are part of the identity established by the individual. On the other hand, the interaction partner has a particular view of the individual. This leads to the question whether the view can be compared with the identity and if they can be compared, then one may ask when do the identity and the view coincide. 3. Every individual has his own attitudes regarding his privacy. When discussing privacy aspects, Pfitzmann and Hansen refer to the definition given in [9]: Definition 3. (Privacy) “Privacy is the claim of individuals, groups, or institutions to determine for themselves when, how, and to what extent information about them is communicated to others.” In our view, this definition misses some aspects of variety. Nevertheless, the sentences following the cited definition in [9] contain some of our intended aspects. For example, negotiation between functionality and non-disclosure of data as well as the dependency of situations are two aspects that are mentioned and shall be included in an extended definition (see Sect. 4). 5
A View might also be an empty set.
4. Pfitzmann and Hansen embed the definitions of their terminology in an environment aiming at achieving privacy by data minimization through usercontrolled identity management. However, the terminology provides little information of how to implement the concepts of privacy protection when designing applications for the digital world. The above-mentioned issues describe first problems that can be solved by separating the physical and the digital world. Consequently, the following section points out a first approach to model privacy and identity-related concepts in a complex digital world.
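The three kinds of anonymity identified in item 1 can be read as linkability questions over separated physical and digital entities. The following Haskell sketch is only an illustration of that reading; the types, the reduction of the attacker to two relations, and the “more than one candidate” criterion are assumptions made for this example, not definitions taken from the terminology.

-- Illustrative sketch (not from the paper or the Pfitzmann/Hansen terminology):
-- the three kinds of anonymity as linkability questions, assuming a strict
-- separation of physical and digital entities.

newtype PhysicalEntity = PhysicalEntity String deriving (Eq, Show)
newtype DigitalEntity  = DigitalEntity  String deriving (Eq, Show)

-- An attacker's evidence, reduced to two relations (an assumption of this sketch):
-- which entities it cannot tell apart, and which digital/physical pairs it can link.
type Indistinguishable a = a -> a -> Bool
type LinksTo             = DigitalEntity -> PhysicalEntity -> Bool

-- (a) An individual is anonymous within a set of individuals if the attacker in
--     the physical world is left with more than one indistinguishable candidate.
anonymousPhysical :: Indistinguishable PhysicalEntity -> PhysicalEntity -> [PhysicalEntity] -> Bool
anonymousPhysical same p anonymitySet = length (filter (same p) anonymitySet) > 1

-- (b) Anonymity as unlinkability between an individual and a digital representation.
unlinkable :: LinksTo -> DigitalEntity -> PhysicalEntity -> Bool
unlinkable links d p = not (links d p)

-- (c) A digital representation is anonymous within a set of digital representations.
anonymousDigital :: Indistinguishable DigitalEntity -> DigitalEntity -> [DigitalEntity] -> Bool
anonymousDigital same d anonymitySet = length (filter (same d) anonymitySet) > 1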
4
Approach to Model a Complex Digital World with Respect to Privacy and Identity Management
We propose a model for a complex digital world referring to the problems raised in Sect. 3. The three types of anonymity elaborated there require a separation of the digital representation from the individual of the physical world. We call the digital representation digital entity and the individual physical entity. Similarly, Semanˇc´ık distinguishes the notions of entities in the digital world and in the physical world as follows: native digital entities (e.g., software components), physical entities (persons of the physical world), and digital proxies (digital representations of a physical entity) [10]. Thereby, digital proxies (data structures characterizing the entity) are representations of the physical entities. Also, a physical entity may have several native digital entities and several digital proxies in “the same or in a different computer system.” We neglect the notion of native digital entities here because we are interested in the (inter)action of the entities. Moreover, we think that native digital entities and digital proxies can be subsumed under the set of digital entities. Data and information were discussed by several authors as parts of the knowledge hierarchy for information systems and knowledge management. Two possible differentiators between data and information have been identified in [11]: meaning and structure. So, information is defined either as data that have been given meaning (or a value) or as data that has been processed or organized. The choice of the differentiator implies consequences where information is embedded: either in systems or in individuals’ minds or in both. Usually, information defined by meaning either is said to be produced by individuals or nothing is said about the producer of information. We refer to meaning as the differentiator and assume that information can only be produced by individuals6 . So, the digital entity consists of pure data while the physical entity can produce information. More precisely, the physical entity has feelings and needs (e.g., regarding communication and privacy) and he is able to interpret data, i.e., produce information. 6
If one considers organizations then they produce information only by individuals. Computers can not produce information at all. But individuals can use data out of computers to get information.
In [12], a standardization of data and information was given which has been rejected in the meanwhile. The definitions of data and information are broadly discussed as a part of the wisdom hierarchy (also known as DIKW hierarchy), e.g., in [13], [14] and [15]. We refer to the definitions of [13] as given in [11]: Data are defined as symbols that represent properties of objects, events and their environment. They are the products of observation. But are of no use until they are in a useable (i.e. relevant) form. The difference between data and information is functional, not structural. Information is contained in descriptions, answers to questions that begin with such words as who, what, when and how many. Information systems generate store, retrieve, and process data. Information is inferred from data.
Fig. 1. The world of data and the world of information are connected by the functions Oge( ) and Ego( )
The digital entities exist in the digital world, which is a world of pure data, and the physical entity is part of the physical world (cf. Fig. 1), where information is part of. The digital world and the physical world are not independent from each other, as it will be explained in the following. Considering an individual A that interacts with an individual B, we assume that, first, A sends a message to B. A decides which message(s) he wants to send and how. He also decides which of his personal data he wants to reveal. But, there might be details about himself that he reveals implicitly or not intentionally. All the matters related to the action of sending can be explicitly fixed in the digital world as data. Those matters are details A reveals about himself consciously and items that A knows (maybe or for sure) or only believes to know about B.
Modeling how A delegates the communication from the physical world to the digital world, we introduce the function Oge( ). It is a function which transforms information into data by creating according data structures. If A receives a message from B, then A probably receives new data apart from data that A already stored. Depending on historical knowledge and other parameters such as communication time, place, or the mood of A, the function Ego( ) models how A transforms data into information7 . In general, for different physical entities, the functions Oge( ) and Ego( ) are not the same. Depending on the derivations gotten in Sect. 3 regarding the term identity, one could consider identity as being part of the concept of entity. That means that we have physical identities and digital identities. The existence of physical identities of an individual can be justified by sociological approaches as elaborated in [16]. A digital identity can be described as subset of a digital entity which sufficiently identifies the corresponding entity. From a social point of view, a physical entity is usually interested in whether its digital entities established in the digital world can be linked to the physical entity and whether other physical entities can conclude anything about this particular entity and what. In this respect, we propose a definition of privacy in a complex digital world: Definition 4. (Privacy) Privacy of a physical entity is the result of negotiating and enforcing when, how, to what extent, and in which context which of its data is disclosed to whom. This definition is not limited to considerations in a complex digital world and is achieved considering the following aspects: 1. The understanding of privacy in [9] includes the following sentence: “Thus each individual is continually engaged in a personal adjustment process in which he balances the desire for privacy with the desire of disclosure and communication of himself to others, in light of the environmental conditions and social norms set by the society in which he lives.” It consists of the idea of negotiating the desires for privacy and communication by the physical entity with himself. We believe that negotiation also takes place between the physical entity and other physical entities. We subsume these aspects by the term “negotiating” in Definition 4. 2. Further, we add the term “enforcing” to the privacy definition because privacy is not only a desire but it is a state resulting from the negotiation process. Violation of privacy can, therefore, be explained as a lacking ability of enforcement. 7
The functions are labelled Oge( ) and Ego( ), because “Ego” shall indicate that the way how an individual derives information from data is an essential part the (identity of the) individual.
3. We usually differentiate to whom exactly we disclose which information. We do not see all the other physical entities as a mass. Hence, we replace others by “whom” in Definition 4.8 4. The involvement of situations is already mentioned by [9] (see above). The concept of “context” is an appropriate concept for modeling situations. Context as a concept required in information technology research is typically described with respect to locations [17]. Regarding privacy, the concept needs to be elaborated with respect to user-control (cf. [18]) or integrity (cf. [19]). The definition of privacy emphasizes the property of being a state as well as the importance of user control. Especially in respect to the latter, our definition is similar to the definition given by [9]. In [4] and [5], several concepts of privacy are described. In [4], Solove found the following conceptions: 1. 2. 3. 4. 5. 6.
The Right to Be Left Alone, Limited Access to the Self, Secrecy, Control Over Personal Information, Personhood, and Intimacy,
where 3. and 4. are subsets of 2. in [4]. DeCew distinguishes informational privacy, accessibility privacy and expressive privacy [5]. These concepts of privacy are subsumed under 2., 4. and 5. by Solove. Definition 4 matches the concepts of DeCew since informational privacy is covered by the data that is processed and accessibility matches the user control in this definition. Automatic or semiautomatic decision making would not omit this point unless the user keeps control. Since this definition is valid for computer-mediated communication, the body can not be accessed, only the mental properties are concerns of accessibility in the sense of DeCew. The expressive privacy is covered due to possible modification in context, the data itself, and the negotiation. The Right to Be Left Alone is difficult to cover, since not every intrusion is an invasion of privacy. With the differentiation of an individual’s privacy concerns, one can see the separation into identities as an implication of the privacy definition. Similarly, the separation of identities (resulting in partial identities) has been identified as one of the important paradigms of privacy (apart from privacy as confidentiality and privacy as self-determination) in [20].
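As a minimal illustration of the two-worlds model and its linking functions (a sketch with assumed concrete types, not the authors’ formalisation), Oge( ) and Ego( ) can be typed as follows; the Context record and the string representations are placeholders chosen for the example.

-- Illustrative sketch of the linkage between the physical world (information)
-- and the digital world (data). All concrete representations are assumptions.

newtype Data        = Data String            -- pure symbols without attached meaning
newtype Information = Information String     -- meaning, produced only by individuals

-- Parameters on which an individual's interpretation may depend (Sect. 4).
data Context = Context { time :: Integer, place :: String, mood :: String }

-- Oge( ): delegation from the physical to the digital world; the individual's
-- information is turned into data structures that the digital entity can send.
oge :: Information -> Data
oge (Information s) = Data s

-- Ego( ): interpretation of received data back into information, depending on
-- historical knowledge (earlier data) and the current context; different
-- individuals generally have different Ego( ) and Oge( ) functions.
ego :: Context -> [Data] -> Data -> Information
ego ctx history (Data s) =
  Information (s ++ " (interpreted at " ++ place ctx
                 ++ " against " ++ show (length history) ++ " stored data items)")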
5
Conclusion
In this paper, we proposed the separation of the physical world and the digital world as well as a linkage between the two worlds by introducing the functions Oge( ) and Ego( ). An enhanced definition of privacy is given for physical entities. 8
If one assumes that organizations have privacy, then the definition should use the word “whom” not only for individuals but also for organizations or computers. Another, even more flexible possibility is the replacement of “whom” by “which entity”.
At this point of our work, we have not yet elaborated on the notion of partial identities, which is an essential part of the identity-related terminology by Pfitzmann/Hansen. This needs to be done in further research. Also, we did not characterize what exactly entities in the physical world could be. At the current point of research, we restrict our considerations to individuals only and did not elaborate on organizations or legal persons. This is what privacy considerations have mainly addressed until now. Moreover, groups of entities represent a further challenge if they are physical entities themselves and control their digital (group) entities. Further research will address the relationships between entities as well as dynamics based on time. In our paper we apply the definition of Westin to yield a refined definition of privacy. However, the scientific literature covers more definitions of privacy. Depending on the context or domain, those definitions of privacy could also serve as starting points for discussion with respect to the digital and physical worlds. Regarding the definitions of entities (digital and physical), formal methods could be applied to allow a more detailed analysis of privacy in a complex digital world. For example, social network analysis using graph theory and matrix representations is an approach to model entities and their relationships, including relationships of different kinds. Game theory would be another modelling approach. We would like to thank Professor Andreas Pfitzmann, Stefan Köpsell and the three anonymous reviewers for many discussions and comments.
References 1. Luhmann, N.: Soziale Systeme: Grundriss einer allgemeinen Theorie. Suhrkamp Verlag, Frankfurt am Main (1987) 2. Clauß, S., K¨ ohntopp, M.: Identity management and its support of multilateral security. Computer Networks, Special Issue on Electronic Business Systems 37, 205–219 (2001) 3. Pfitzmann, A., Hansen, M.: A terminology for talking about privacy by data minimization: Anonymity, Unlinkability, Undetectability, Unobservability, Pseudonymity, and Identity Management (2010) http://dud.inf.tu-dresden.de/Anon_Terminology.shtml (Version v0.34 of August 10, 2010) 4. Solove, D.J.: Conceptualizing Privacy. California Law Review 90, 1087–1155 (2002) 5. DeCew, J.W.: In Pursuit of Privacy: Law, Ethics and the Rise of Technology. Cornell University Press, Ithica (1997) 6. Borcea-Pfitzmann, K., Pekarek, M., Poetzsch, S.: Model of Multilateral Interactions. Technical report, EU Project PrimeLife Heartbeat (2009) 7. Pfitzmann, A., K¨ ohntopp, M.: Anonymity, Unobservability, and Pseudonymity A Proposal for Terminology. In: Federrath, H. (ed.) Designing Privacy Enhancing Technologies. LNCS, vol. 2009, pp. 1–9. Springer, Heidelberg (2001) 8. Bundesdatenschutzgesetz (1990) (version 14.08.2009) 9. Westin, A.F.: Privacy and Freedom. Atheneum, New York (1967) 10. Semanˇc´ık, R.: Basic Properties of the Persona Model. Computing and Informatics 26, 105–121 (2007)
11. Rowley, J.: The wisdom hierarchy: representations of the DIKW hierarchy. Journal of Information Science 33(2), 163–180 (2007), doi:10.1177/0165551506070706 12. DIN ISO/IEC 2382-1 Information technology – Vocabulary – Part 1: Fundamental Terms (1993) 13. Ackoff, R.L.: From data to wisdom. Journal of Applied Systems Analysis 16, 3–9 (1989) 14. Davenport, T.H., Prusak, L.: Working Knowledge. Harvard Business School Press, Boston (1998) 15. Bellinger, G., Castro, D., Mills, A.: Data, Information, Knowledge, and Wisdom (2004), http://www.systems-thinking.org/dikw/dikw.htm (accessed at July 1, 2010) 16. Goffman, E.: The Presentation of Self in Everyday Life. Anchor Books, New York (1959) 17. Schilit, B., Adams, R., Want, R.: Context-Aware Computing Applications. In: First Workshop on Mobile Computing Systems and Applications, p. 8590. IEEE, Los Alamitos (1994) 18. Nissenbaum, H.: Privacy as Contextual Privacy. Washington Law Review 79, 119– 158 (2004) 19. Pfitzmann, A., Borcea-Pfitzmann, K., Berg, M.: Privacy 3.0:= Data Minimization + User-Control of Data Disclosure + Contextual Integrity. it – Information Technology 53(1), 34–40 (2011) 20. G¨ urses, S.: Multilateral Privacy Requirements Analysis in Online Social Network Services. Dissertation, Katholieke Universiteit Leuven (2010)
Towards a Formal Language for Privacy Options Stefan Berthold Karlstad University, 651 88 Karlstad, Sweden
[email protected] Abstract. Describing complex ideas requires clear and concise languages. Many domains have developed their specific languages for describing problem instances independently from solutions and thus making a reference model of the domain available to solution developers. We contribute to the zoo of domain-specific languages within the privacy area with a language for describing data disclosure and usage contracts. Our Privacy Options Language is defined by a small number of primitives which can be composed to describe complex contracts. Our major contribution is the notion of contract rights which is based on the notion of obligations and therefore establishes both concepts as first-class language citizens in a new coherent model for privacy policy languages. Our model overcomes the traditional separation of the right and obligation notions known from access control based policy language approaches. We compare our language to the PrimeLife Policy Language and provide rules for the translation from our language to PrimeLife’s language. Then, we present a canonical form of our contracts. It is used to ensure that contracts with equal semantics have the same syntax, thus eliminating the possibility of a covert channel in the syntax revealing information about the originator. Finally, we show different ways of how to extend our language.
1
Introduction
Informational self-determination means that individuals have control over their personal data. Control in this context particularly means that individuals can access, correct, and possibly delete their personal data stored in, e.g., customer databases of companies or institutions, and object to disclosure of their data to third parties. Control, however, can also mean the possibility to agree on data disclosure in exchange for a reasonable compensation. An interesting idea has been proposed by Laudon [1]: in his scenario, a regulated “national information market” is the only place where personal information is traded between institutions, their customers, and third parties. The market thus becomes a single point of control where individuals and institutions can exercise control and claim their rights on an equal level. This approach is particularly interesting since it uses the strengths of three different fields in order to obtain privacy: regulation for creating a safe environment where crimes can be prosecuted after the fact, technology for authorisation of data usage, and a market for determining fair prices for the data. Laudon assumes that customer data is
stored by companies and institutions which may wish to use it for different purposes later. Taylor [2] names individual pricing and targeted advertising, among others, as reasons for letting stored customer (and consumer) data become highly valuable for companies and institutions. In order to keep in line with the legislation, institutions that intend to re-use data for another purpose have to return to the individual who will have to give consent and will claim compensation for the new usage of his data. Again, an ideal market would determine a fair price for the data. These scenarios share the notion that personal data may be used long after its disclosure. Taking the view of Laudon [1], individuals should receive a compensation depending on the benefit an institution gains by using the individual’s personal data. Determining the value of a fair compensation is, however, not necessarily easy [3]. In particular, if we cannot assume that individuals can directly control the use of their data after the disclosure, they have to anticipate the consequences of the data disclosure at the time of the disclosure. Part of this problem has been discussed by Berthold and B¨ohme in [4]. An important means for anticipating the consequences of data disclosure is an unambiguous language for describing all rights and obligations connected to the data disclosure. Such a language can be used by both, the individual that discloses data and the institution that receives the data. The individual will use the language for determining clear conditions of the data usage, e. g., limiting it to a specific purpose and possibly to a time frame. The institution can use the language as a management reference that determines under which conditions the data may be used for specific purposes and when data may not be used anymore. Given an appropriate legal framework (e. g., the one suggested in [1]), statements of this language would even form contracts with legal rights and obligations for the individual and the institution. This idea has been extensively explored by Laudon [1] and backed up by the results of Hann et al. [5]. In more recent work, Berthold and B¨ ohme [4] elaborate on the similarities of contracts and data disclosure. The concrete specification of a suitable contract language for this purpose, however, is to the best of our knowledge still an open research question. In order to fill this gap, we will turn our attention to specify a formal language for data disclosure contracts. The following requirements shall be met by our language (hence referred to as POL): – expressive power to capture the notion of Privacy Options [4] – expressive power to capture existing approaches such as “Sticky Policies” as used in the PrimeLife Policy Language (PPL) [6] – easy extensibility and scalability (of syntax and semantics) Our approach is driven by the privacy measurement perspective which is mainly presented in [4] while the notion of Sticky Policies in PrimeLife’s PPL [6] has mainly developed with the (identity) management perspective in mind. Both approaches thus target similar goals. The major difference between both approaches is that PPL is an access control language, thus specifying which accesses are
allowed and which not, and POL is a contract language, thus specifying commitments to data disclosure and usage. Moreover, the strict separation on a semantical as well as a syntactical level between rights and obligations known from PPL does not exist in our language. POL is rather building the notion of rights on the notion of obligations. The rest of this paper is structured as follows. Section 2 defines the language primitives and demonstrates how these primitives can be combined to contracts. In Section 3, we define an operational semantics for POL which describes how the language can be translated into specific data management actions in order to satisfy the contract. In Section 4, we discuss how and to which extent it is possible to translate POL to PPL. A dialect of POL, POL− , is defined and translation rules are given. In Section 5, we present a canonical form of POL contracts. An abstract rewriting system is defined which translates any POL contract to its canonical form. In Section 6, we show how to extend POL in different ways. Section 7 concludes this paper.
2
Privacy Option Language
We are indeed not the first specifying a formal language. Tool support for language specification has been grown large in recent years and so has the number of domain-specific languages. In order to avoid redundancy with previous work, we build upon an existing language for describing financial contracts, proposed by Peyton Jones and Eber [7], and adapt it for our purposes. The domain of financial contracts turns out to be quite similar to the one of Privacy Options [4]. Thus, we refer to our language as Privacy Option Language (POL). Like the language in [7], our language consists of a small number of primitives1 with basic semantics. For instance, a contract c1 that settles the immediate usage of personal data a1 for purpose p1 can be written as c1 = data a1 p1 . This describes the rights and obligations of one contract party and let us assume that this is the institution that receives the data. The contract c2 written by the individual who discloses the data is the ‘negation’ of c1 ,2 thus, we can write it as a function application with another primitive, give, c2 = give c1 . We define give such that c2 itself is a contract as well as c1 , thus, we can understand give as a primitive that transforms one contract to another one. The other primitives of POL are syntactically equivalent to the language defined by Peyton Jones and Eber, i. e., – (c3 = c1 ‘and‘ c2 ) and (c4 = c1 ‘or‘ c2 )3 each transforms two contracts into a single one, and requires that both contracts are executed and or requires that, no matter which of the two contracts will be executed, the other one may not be executed, 1 2 3
Like Peyton Jones and Eber, we use Haskell syntax [8] to specify our POL. This allows to translate the ideas of this paper directly into program code. ‘Negation’ here means the exchange of obligation and rights, i. e., the rights of the one contract party become the obligations of the other party and vice versa. The backticks around the functions and and or are Haskell syntax and let us use and and or in infix notation.
– c5 = if b c1 is equivalent to c1 if the condition b is satisfied at the time of evaluation and equivalent to zero otherwise (the primitive zero is introduced in c10 ), – c6 = ifnot b c1 is equivalent to c1 if the condition b is not satisfied at the time of evaluation and equivalent to zero otherwise, – c7 = when b c1 postpones c1 to the first time when the condition b becomes true, – c8 = anytime b c1 gives the holder of the contract the right to acquire c1 once (but not necessarily the first time) when the condition b is satisfied, – c9 = until b c1 gives the holder the right to acquire c1 once (but not necessarily the first time) until the condition b is satisfied the first time, and – c10 = zero is a contract without rights and obligations. All transformed contracts ci with i = 2, . . . , 9 are contracts like c1 and c10 and can thus be part of new transformations. A simple Privacy Option in which an institution acquires the right (but not the obligation) to use the data a for purpose p at time t can thus be stated in POL as c11 = when (at t) (data a p ‘or‘ zero).4 zero :: Contract
Fig. 1. POL language primitives in type graph notation (left) and the contract c11 as function evaluation tree (right). The inductive definition of POL contracts is a major strength of the language.
A strict time constraint as in c11 is relevant when an institution that obtained data from an individual cannot use the data without interacting with the individual. Consider, for instance, a scenario where a shop can only use the purchase history of a customer when the customer returns to the shop [9,10]. The right to choose the zero contract gives the shop the option not to use the data, e. g., if the costs connected to the usage are higher than the expected benefits. Other contract combinations are conceivable. For instance, c12 4
We assume that the function at transforms a time t into a condition suitable for the evaluation in contracts.
= until (at t) (data a p ‘or‘ zero) models the obligation to delete data after a deadline t.5
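Since the type graph of Fig. 1 does not survive text extraction, the following sketch restates the primitives as a Haskell data type, in line with the paper’s use of Haskell syntax. It is assembled from the descriptions above and the combinator style of [7], not taken from the authors’ code; the concrete stand-ins for PersonalData, Purpose, Date and Obs, and the example values in c11 and c12, are assumptions.

-- A minimal sketch (not the authors' implementation) of the POL primitives.
-- Constructor names are capitalised because data, if, and, or are Haskell
-- keywords or Prelude names; Date and Obs are assumed placeholder types.

type PersonalData = String
type Purpose      = String
type Date         = Integer            -- e.g. days since some agreed epoch

newtype Obs a = Obs (Date -> a)        -- an observable: a time-dependent value

data Contract
  = Zero                               -- no rights, no obligations
  | Data PersonalData Purpose          -- immediate use of a data item for a purpose
  | Give Contract                      -- the other party's side of the contract
  | And Contract Contract              -- both sub-contracts are executed
  | Or Contract Contract               -- exactly one of the two is executed
  | If (Obs Bool) Contract             -- sub-contract only if the condition holds
  | IfNot (Obs Bool) Contract          -- sub-contract only if the condition fails
  | When (Obs Bool) Contract           -- postponed to the first time the condition holds
  | Anytime (Obs Bool) Contract        -- acquirable once while the condition holds
  | Until (Obs Bool) Contract          -- acquirable once before the condition first holds

at :: Date -> Obs Bool                 -- the condition "the current time is t"
at t = Obs (== t)

-- The examples from the text: a right (but no obligation) to use a data item at
-- time t, and an obligation to delete (stop using) it after the deadline t.
c11, c12 :: Contract
c11 = When  (at 42) (Data "purchase history" "marketing" `Or` Zero)
c12 = Until (at 42) (Data "purchase history" "marketing" `Or` Zero)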
3
POL Contract Management
Peyton Jones and Eber demonstrate how various semantics can be defined upon their language syntax. They elaborate on a valuation process E, a denotational semantics which assigns a value to each contract. Valuation is better known as privacy measurement in the privacy domain. For many of the language primitives in POL, defining the valuation process would be a straight application of [7] and [4]. Due to that and space constraints, we focus here on an operational semantics for managing Privacy Options specified in POL. In this context, management means what is usually referred to as back-office management of contracts, i. e., timely execution, simplification, and possibly even deletion of outdated contracts. Definition 1 (Contract Management). A contract management function is a (total) function from time to a sequence of I/O operations and a possibly simplified contract, thus satisfying the (informal) type definition

ACTION = Date → IO Contract .    (1)
Note that the result type of contract management functions, IO Contract, is a monad [11], which allows returning a sequence of I/O operations as well as a contract as the function result. The management semantics defined in Figure 2 describes which operations take place depending on a given contract in POL and relies on the definition of a couple of auxiliary functions:
– return : Contract → ACTION. The function return(c) propagates c without any further actions as a (possibly simplified) replacement of the original contract.
– use : PersonalData × Purpose → ACTION. The function use(a, p) lets the data controller immediately use the data item a for purpose p and returns zero, i.e., nullifies the original contract.
– send : Contract → ACTION. The function send(c) transmits attribute values contained in c and stores the contract c (on the individual’s side) for later inspection, e.g., recording a Data Track as in PrimeLife.
– ∥ : ACTION × ACTION → ACTION. The function a1 ∥ a2 executes the actions a1 and a2 in parallel. After fetching the respective return values c1 and c2, the contract (c1 ‘and‘ c2) will be returned.
Here we assume that not using data after the deadline and deleting the data when the deadline occurs has the same consequences.
OE[·] : Contract → ACTION

OE[zero]         = return(zero)                              (O1)
OE[data a p]     = use(a, p)                                 (O2)
OE[give c]       = send(c)                                   (O3)
OE[c1 ‘and‘ c2]  = OE[c1] ∥ OE[c2]                           (O4)
OE[c1 ‘or‘ c2]   = greedyE(OE[c2], OE[c1])                   (O5)
OE[if o c]       = ifthenelse(V[o], OE[c], return(zero))     (O6)
OE[ifnot o c]    = ifthenelse(V[o], return(zero), OE[c])     (O7)
OE[when o c]     = whenE(V[o], OE[c])                        (O8)
OE[anytime o c]  = stoppingE(V[o], OE[c])                    (O9)
OE[until o c]    = absorbE(V[o], OE[c])                      (O10)

Fig. 2. Management of Privacy Options as operational semantics of POL
– greedyE : ACTION × ACTION → ACTION. The function greedyE(a1, a2) executes either a1 or a2, depending on which action is most beneficial. The (prospective) benefit is determined by E.
– ifthenelse : PR Bool × ACTION × ACTION → ACTION. The function ifthenelse(o, a1, a2) executes a1 if the observable o is true and a2 otherwise.
– whenE : PR Bool × ACTION → ACTION. The function whenE(o, a) executes a if the observable o is true, or returns the original contract otherwise.
– stoppingE : PR Bool × ACTION → ACTION. The function stoppingE(o, a) solves the optimal stopping problem with regard to o, executes the action a if the optimal stopping point is reached, or returns the original contract otherwise.
– absorbE : PR Bool × ACTION → ACTION. The function absorbE(o, a) returns zero if the condition o is true, else it executes a if the optimal stopping point is reached, or otherwise returns the original contract.
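To make the rule structure of Fig. 2 concrete, here is a simplified interpreter sketch. It repeats the assumed Contract type from the Sect. 2 sketch, replaces the valuation-dependent combinators (greedyE, stoppingE, absorbE) with trivial placeholder strategies, and runs the parallel composition sequentially, so it illustrates the shape of the semantics rather than the economics behind it.

-- Sketch of the contract-management semantics, under the same assumptions as
-- the data-type sketch in Sect. 2; valuation and optimal stopping are omitted.

type PersonalData = String
type Purpose      = String
type Date         = Integer
newtype Obs a     = Obs (Date -> a)

data Contract
  = Zero | Data PersonalData Purpose | Give Contract
  | And Contract Contract | Or Contract Contract
  | If (Obs Bool) Contract | IfNot (Obs Bool) Contract
  | When (Obs Bool) Contract | Anytime (Obs Bool) Contract | Until (Obs Bool) Contract

type Action = Date -> IO Contract          -- Definition 1: ACTION = Date -> IO Contract

observe :: Obs a -> Date -> a
observe (Obs f) = f

manage :: Contract -> Action
manage Zero          _ = return Zero                                                     -- (O1)
manage (Data a p)    _ = putStrLn ("use " ++ a ++ " for purpose " ++ p) >> return Zero   -- (O2)
manage (Give c)      _ = putStrLn "send contract to the other party" >> return (Give c)  -- (O3)
manage (And c1 c2)   t = And <$> manage c1 t <*> manage c2 t        -- (O4), run sequentially here
manage (Or c1 _c2)   t = manage c1 t                                -- (O5), placeholder for greedyE
manage (If o c)      t = if observe o t then manage c t else return Zero            -- (O6)
manage (IfNot o c)   t = if observe o t then return Zero else manage c t            -- (O7)
manage (When o c)    t = if observe o t then manage c t else return (When o c)      -- (O8)
manage (Anytime o c) t = if observe o t then manage c t else return (Anytime o c)   -- (O9), no optimal stopping
manage (Until o c)   t = if observe o t then return Zero else manage c t            -- (O10), simplified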
4
Converting POL Contracts to PPL Sticky Policies
POL and PPL are both expressive languages, but of different nature, i. e., PPL is an access control language, whereas POL is a contract language, and none of
them subsumes the semantics of the other one to its full extent. We therefore refrain from translating the full-featured POL to PPL, but instead define a subset of our language, POL− , and provide a mapping from this subset to PPL. PPL Sticky Policies are an extension of XACML 3.0 [12], a powerful XML dialect for defining access control policies. While the complete XML Schema for
PPL is not yet settled, the general language model is specified in [6]. Figure 3 displays the PPL elements that are relevant in this section.

Fig. 3. Model of the Sticky Policy and the obligation language as defined in PrimeLife [6, Figure 12]. Only elements relevant for the conversion from POL to PPL are displayed.

In POL, we know three primitives that change the availability of data over time. PPL can simulate one of them, until, i.e., it is possible to delete data
by means of triggering an obligation after a certain time. The other two time-dependent POL primitives, when and anytime, require previously inaccessible data to be made available. In the current version of PPL [6], this can neither be achieved by triggering (a sequence of) obligations nor by defining Sticky Policies. The POL primitives when and anytime will therefore not be part of POL−. Three further assumptions are made for POL− and its translation to PPL:
A1. The conditions of the POL− primitives until, if, and ifnot are limited to points in time and can therefore be used as triggers of PPL Obligations (Trigger at Time [6]).
A2. We define a new Action, immediate usage, for PPL Obligations which causes the data to be used instantly.
A3. PPL Actions triggered by Obligations are always permitted.
Let us start with three simple examples:
– The POL contract c10 = zero is a contract without rights or obligations. It therefore translates to an empty PolicySet in PPL.
– The POL contract c1 = data a1 p1 requires the immediate use of the data item a1 for purpose p1. After that usage, the data may not be used again. After the translation, the purpose p1 will be enforced by a StickyPolicy (nested in a Policy within a PolicySet). The immediate usage of the data can only be enforced by an Obligation attached to the StickyPolicy. The Obligation will be triggered when data is stored⁶ and cause the Action immediate usage. Another Obligation will make sure that the data is used only once. It will be triggered on data usage⁷ and will cause the data to be deleted instantly.

⁶ This trigger can be realised by Trigger at Time or Trigger Personal Data Sent [6].
⁷ In [6], this trigger is called Trigger Personal Data Accessed for Purpose.
– For translating a complex POL contract, e. g., c13 = data a1 p1 ‘and‘ data a2 p2 , we first translate the data statements to PolicySets, wrap them in a new PolicySet and add two Obligations, both triggering on data usage of either a1 or a2 .8 The idea is to cause the Action immediate usage of one data item as soon as the other one is used. While the first two examples already provide general translation rules, the third is a special case of and in which the left and the right hand side are data statements. A general translation rule can, however, easily be derived by replacing the two Obligations in our example, which were tailored to the data primitive, with obligations that trigger other primitives accordingly. The key idea outlined in these three examples is that every primitive of POL− can be translated to a PolicySet in PPL which may contain nested PolicySets. A translation of POL− to PPL will therefore form a tree of PolicySets which reflects the POL language structure as outlined in Figure 1, with one exception: the primitive give simply vanishes in the translation to PPL, since policies in PPL have the same appearance, no matter whether stored on the data subject’s side or on the institution’s side. The translation of the remaining POL− primitives is straight-forward. Just like and, c4 = c1 ‘or‘ c2 can be translated by first translating c1 and c2 to PolicySets, placing them into a new PolicySet, and complementing them with an Obligation which in case of or will delete the data in c1 and c2 as soon as an on data usage event occurs with regard to data of either c1 or c2 . The translation rule for a contract c9 = until b c1 similarly first translates c1 to a PolicySet, wraps it in a new PolicySet, and complements it with an Obligation which triggers when the condition b becomes true. Due to our assumption A1, this can be realised by the PPL Trigger at Time. As soon as the Obligation is triggered, it will delete the data. Finally, translating c5 = if b c1 or c6 = ifnot b c1 respectively requires to translate c1 to its corresponding PolicySet, placing it into a new one, and complementing it with Obligations such that the data items in c1 are deleted if b is false (if) or true (ifnot), respectively. We used data deletion in order to enforce the semantics of POL primitives in their PPL translation. This works well as long as the data items are not equal in different branches of the PolicySet tree. If this is not the case, PPL needs a method to (de)activate Policies depending on a (global) state. Note that if we use anonymous credentials [13,14] for representing data item, we do not need any of these assumptions. Then even credentials representing the same data would not be linkable and deletion would only affect one occurrence of the data item.
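The shape of this translation can be sketched in Haskell, POL's host language. The following is our own schematic illustration rather than the paper's implementation or PPL's actual XML vocabulary: the Contract type is a stripped-down POL− AST, the Trigger and Action constructors only loosely mirror the PPL triggers of [6] (NotAtTime, in particular, is an assumed placeholder for a negated time condition), and StickyPolicy targeting and purposes are omitted. It merely shows how every POL− primitive maps to a PolicySet that nests further PolicySets and carries Obligations.

    data Contract                        -- simplified POL− (no when/anytime)
      = Zero
      | Data String String               -- data item and purpose
      | Give Contract
      | And Contract Contract
      | Or Contract Contract
      | Until String Contract            -- condition restricted to a point in time (A1)
      | If String Contract
      | IfNot String Contract

    data Trigger    = AtTime String      -- Trigger at Time [6]
                    | NotAtTime String   -- assumed placeholder for a negated time condition
                    | OnDataStored       -- cf. Trigger Personal Data Sent [6]
                    | OnDataUsed         -- cf. Trigger Personal Data Accessed for Purpose [6]
    data Action     = ImmediateUsage     -- the new Action of assumption A2
                    | DeleteData
    data Obligation = Obligation Trigger Action
    data PolicySet  = PolicySet [PolicySet] [Obligation]

    toPPL :: Contract -> PolicySet
    toPPL Zero        = PolicySet [] []                       -- empty PolicySet
    toPPL (Data _ _)  = PolicySet []
        [ Obligation OnDataStored ImmediateUsage              -- use the data instantly
        , Obligation OnDataUsed   DeleteData ]                -- and only once
    toPPL (Give c)    = toPPL c                               -- give vanishes in the translation
    toPPL (And c1 c2) = PolicySet [toPPL c1, toPPL c2]
        [ Obligation OnDataUsed ImmediateUsage ]              -- use of one item triggers use of the other
    toPPL (Or c1 c2)  = PolicySet [toPPL c1, toPPL c2]
        [ Obligation OnDataUsed DeleteData ]                  -- use of one item deletes the other
    toPPL (Until t c) = PolicySet [toPPL c]
        [ Obligation (AtTime t) DeleteData ]                  -- delete once the time condition holds
    toPPL (If t c)    = PolicySet [toPPL c]
        [ Obligation (NotAtTime t) DeleteData ]               -- delete if the condition is false
    toPPL (IfNot t c) = PolicySet [toPPL c]
        [ Obligation (AtTime t) DeleteData ]                  -- delete if the condition is true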
⁸ To be precise, each of these Obligations is embedded in a StickyPolicy which needs to be targeted to data a1 and purpose p1 or a2 and p2, respectively.

5 Canonical Form Contracts in POL

POL is not only an effort in language design, but primarily one in the privacy research domain, and it is important that we do not create new problems in one
domain when solving problems in the other. A problem that therefore deserves our attention is that of (unintentionally) creating covert channels by using freedom in the POL syntax for expressing semantically equivalent contracts. Such a covert channel may be used to transport any information, but even if not used intentionally it will most likely convey information about the originator of a contract. In this section, we show that POL has freedom in its syntax and how to eliminate it. We show that each POL contract can be transformed to a canonical form by using the abstract rewriting system we define. An example illustrates the syntactical freedom which exists in POL. The contracts on the left and right sides of the Equations (2) and (3) are semantically equivalent (we use the symbol ≡POL), however the syntax is different,

    give zero ≡POL zero ,                         (2)
    give (give (data a p)) ≡POL data a p .        (3)
The degree of freedom increases when contracts become more advanced. We see that transformations to equivalent contracts follow generic transformation rules, e. g.,

    (if o1 (ifnot o1 (data a1 p1))) ‘or‘ (until o2 (when o2 (data a2 p2)))
      ≡POL  zero ‘or‘ (until o2 (when o2 (data a2 p2)))                       by (R20)
      ≡POL  zero ‘or‘ zero                                                    by (R45)
      ≡POL  zero                                                              by (R7)
      ≡POL  give zero                                                         by (R1)
      ≡POL  (give zero) ‘and‘ zero                                            by (R3)
      ≡POL  (give zero) ‘and‘ (give zero)                                     by (R1)
      ≡POL  give (zero ‘and‘ zero)                                            by (R5)
      ≡POL  give (zero ‘and‘ (anytime o3 (ifnot o3 (data a3 p3)))) .          by (R37)    (4)
In fact, the contract c10 = zero has an infinite number of incarnations, thus giving each contract originator the option to use a unique version of c10 . For our canonical form it would be best, if all incarnations of c10 could be reduced to the simplest version, zero. A standard method for reducing a language term to its canonical form is applying an abstract rewriting system (ARS). An ARS consists of a number of reduction rules which can be applied in any order to a given language term. Terms that cannot be reduced any further are in canonical form. For POL, a suitable ARS is defined by the reduction rules in Figure 4. In Equation (4), we use the same set of rules to transform the initial contract to its equivalents. Two properties of ARS’ are interesting for our canonical form of POL and we will look at them in the rest of this section:
    give zero → zero   (R1)
    give (give c) → c   (R2)
    c ‘and‘ zero → c   (R3)
    zero ‘and‘ c → c   (R4)
    (give c1) ‘and‘ (give c2) → give (c1 ‘and‘ c2)   (R5)
    (if o c) ‘and‘ (ifnot o c) → c   (R6)
    c ‘or‘ c → c   (R7)
    (give c1) ‘or‘ (give c2) → give (c1 ‘or‘ c2)   (R8)
    (c1 ‘and‘ c2) ‘or‘ (c1 ‘and‘ c3) → c1 ‘and‘ (c2 ‘or‘ c3)   (R9)
    (c1 ‘and‘ c2) ‘or‘ (c3 ‘and‘ c1) → c1 ‘and‘ (c2 ‘or‘ c3)   (R10)
    (c1 ‘and‘ c2) ‘or‘ (c3 ‘and‘ c2) → (c1 ‘or‘ c3) ‘and‘ c2   (R11)
    (c1 ‘and‘ c2) ‘or‘ (c2 ‘and‘ c3) → (c1 ‘or‘ c3) ‘and‘ c2   (R12)
    if o (give c) → give (if o c)   (R13)
    ifnot o (give c) → give (ifnot o c)   (R14)
    if o (c1 ‘and‘ c2) → (if o c1) ‘and‘ (if o c2)   (R15)
    ifnot o (c1 ‘and‘ c2) → (ifnot o c1) ‘and‘ (ifnot o c2)   (R16)
    if o (c1 ‘or‘ c2) → (if o c1) ‘or‘ (if o c2)   (R17)
    ifnot o (c1 ‘or‘ c2) → (ifnot o c1) ‘or‘ (ifnot o c2)   (R18)
    if o (if o c) → if o c   (R19)
    if o (ifnot o c) → zero   (R20)
    ifnot o (if o c) → zero   (R21)
    ifnot o (ifnot o c) → ifnot o c   (R22)
    if o (when o c) → if o c   (R23)
    if o (until o c) → zero   (R24)
    ifnot o (until o c) → until o c   (R25)
    when o zero → zero   (R26)
    when o (give c) → give (when o c)   (R27)
    when o (c1 ‘and‘ c2) → (when o c1) ‘and‘ (when o c2)   (R28)
    when o (if o c) → when o c   (R29)
    when o (ifnot o c) → zero   (R30)
    when o (when o c) → when o c   (R31)
    when o (anytime o c) → when o c   (R32)
    when o (until o c) → zero   (R33)
    anytime o zero → zero   (R34)
    anytime o (give c) → give (anytime o c)   (R35)
    anytime o (if o c) → anytime o c   (R36)
    anytime o (ifnot o c) → zero   (R37)
    anytime o (when o c) → anytime o c   (R38)
    anytime o (anytime o c) → anytime o c   (R39)
    anytime o (until o c) → zero   (R40)
    until o zero → zero   (R41)
    until o (give c) → give (until o c)   (R42)
    until o (if o c) → zero   (R43)
    until o (ifnot o c) → until o c   (R44)
    until o (when o c) → zero   (R45)
    until o (anytime o c) → zero   (R46)
    until o (until o c) → until o c   (R47)

Fig. 4. Abstract term rewriting system for POL. The application of the rewriting rules terminates in a canonical form of the language.
– Termination: Given a POL contract, all applicable reduction rule sequences are finite, i. e., they finish after a finite number of reduction steps with a canonical form contract which cannot be reduced any further.
– Confluence: Given a POL contract, all applicable reduction rule sequences produce the same canonical form contract.

It is easy to see that for termination, an ARS may not contain reduction rules supporting commutativity. Let us assume that we have a rule c1 ‘and‘ c2 → c2 ‘and‘ c1 and c matches the rule’s left hand side. Then c can be reduced to the right hand side c′, thus c → c′, but then we see that c′ matches the left hand side of the same rule again and we can create an infinite sequence of reductions c → c′ → c′′ → c′′′ → ···. This would violate the termination property. The non-commutativity of and and or, however, may interfere with the application of the Rules (R6) and (R7). We learn, for instance, from Rule (R7) that a contract (c1 ‘or‘ c1 ‘or‘ c2) can be reduced to (c1 ‘or‘ c2), whereas the contract (c1 ‘or‘ c2 ‘or‘ c1) is not reducible in the same way, though being equivalent except for the commutation of the contracts c1 and c2. This dilemma can be solved by assuming an appropriate order of contracts in the function arguments of and and or such that Rules (R6) and (R7) match if matching is possible with any permutation of the function arguments.
Termination can be proved by showing that a well-founded order ⪰POL exists such that ℓ ⪰POL r holds for all reductions ℓ → r, where ℓ denotes the left hand side of the rule and r denotes the right hand side. For POL, this order relation can be defined by means of the order ≥ on the natural numbers,

    c ⪰POL c′  ⟺  m(c) ≥ m(c′)    (∀ c, c′ ∈ POL),    (5)
with the function m : POL → N defined as in Figure 5. It is easy to see that for any contract c it holds that m(c) > 1, and for any instantiation of a reduction rule in Figure 4 it holds that m(ℓ) > m(r), with ℓ being the left hand side of the reduction and r being the right hand side.

    m(zero)          = 2
    m(data a p)      = 2
    m(give c)        = m(c) + 1
    m(c ‘and‘ c′)    = m(c) · m(c′)
    m(c ‘or‘ c′)     = m(c) · m(c′)
    m(if o c)        = m(c)³
    m(ifnot o c)     = m(c)³
    m(when o c)      = m(c)³
    m(anytime o c)   = m(c)³
    m(until o c)     = m(c)³

Fig. 5. The ‘mass’ function m : POL → N. It holds that for any reduction c1 → c2 with regard to a rule in Figure 4, m(c1) > m(c2) (c1, c2 ∈ POL).

Thus a reduction sequence for a POL
contract c finishes with a canonical form at the latest after m(c) − 2 reduction steps, which proves that our ARS has the termination property. Confluence of a terminating ARS can be proved by showing local confluence for all contracts, i. e., showing that any two different (one-step) reductions of one contract always terminate in the same canonical form. We will not explicitly prove confluence in this paper and postpone this task to future work. In careful checks, however, we were not able to produce a contract which violates the conditions for confluence.
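The reduction process can be made concrete with a small Haskell sketch. This is our own illustration, not the paper's implementation: the Contract type and the observable representation (plain strings, so that syntactic equality of conditions can be checked) are assumptions, the step function encodes only a handful of the 47 rules of Figure 4, and mass transcribes the function m of Figure 5.

    data Contract
      = Zero | Data String String | Give Contract
      | And Contract Contract | Or Contract Contract
      | If String Contract | IfNot String Contract
      | When String Contract | Anytime String Contract | Until String Contract
      deriving (Eq, Show)

    -- one rewriting step: a few rules of Figure 4, otherwise descend into subterms
    step :: Contract -> Contract
    step (Give Zero)                      = Zero                 -- (R1)
    step (Give (Give c))                  = c                    -- (R2)
    step (And c Zero)                     = c                    -- (R3)
    step (And Zero c)                     = c                    -- (R4)
    step (Or c1 c2) | c1 == c2            = c1                   -- (R7)
    step (If o (IfNot o' _)) | o == o'    = Zero                 -- (R20)
    step (Until o (When o' _)) | o == o'  = Zero                 -- (R45)
    step (Give c)                         = Give (step c)
    step (And c1 c2)                      = And (step c1) (step c2)
    step (Or c1 c2)                       = Or (step c1) (step c2)
    step (If o c)                         = If o (step c)
    step (IfNot o c)                      = IfNot o (step c)
    step (When o c)                       = When o (step c)
    step (Anytime o c)                    = Anytime o (step c)
    step (Until o c)                      = Until o (step c)
    step c                                = c

    -- canonical form: apply steps until nothing changes
    -- (termination is guaranteed by the mass argument above)
    normalize :: Contract -> Contract
    normalize c = let c' = step c in if c' == c then c else normalize c'

    -- the 'mass' function of Figure 5
    mass :: Contract -> Integer
    mass Zero          = 2
    mass (Data _ _)    = 2
    mass (Give c)      = mass c + 1
    mass (And c1 c2)   = mass c1 * mass c2
    mass (Or c1 c2)    = mass c1 * mass c2
    mass (If _ c)      = mass c ^ (3 :: Int)
    mass (IfNot _ c)   = mass c ^ (3 :: Int)
    mass (When _ c)    = mass c ^ (3 :: Int)
    mass (Anytime _ c) = mass c ^ (3 :: Int)
    mass (Until _ c)   = mass c ^ (3 :: Int)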
6 Extensions
There are two possible ways of extending the language: the first option is adding new primitives to the language and defining their semantics. While this is an obvious way of extending a language, it should be reserved for experts only. Adding primitives bears the risk of creating redundancy in the semantics, which can lead to ambiguous contracts. The other way of extending the language is to define combinators. A combinator creates a contract by applying primitives rather than defining new primitives. In particular, if POL is implemented as an embedded domain-specific language within a more expressive host language, creating a combinator library becomes easy even for end-users. Consider, for instance, a combinator that allows a given contract to be executed twice. We can define this combinator as follows:

    twice f c = (f c) ‘and‘ (f c) .
(C1)
This definition allows us to note a new contract, c14 = until (at t) ‘twice‘ c1 , which appears like native English and provides intuitive semantics without further definitions. Assuming the full expressiveness of a multi-purpose language, here Haskell, we can even generalise twice to times, times n c = foldr1 (and) (replicate n c) ,
(C2)
which immediately allows us to note c15 = 5 ‘times‘ c14 as a new contract and again with intuitive meaning. Another easy extension is the cond combinator which was a primitive in an earlier version of POL. It combines the functionality of the new primitives if and ifnot, but was discarded as a first class citizen of the language in order to keep the abstract rewriting system (Figure 4) small, cond o c1 c2 = (if o c1 ) ‘and‘ (ifnot o c2 ) .
(C3)
While these three combinators make the language handy, they do not reflect the specific vocabulary used in the privacy research domain. One of these concepts which is easily translated to POL is the definition of a retention period for data usage. We assume that the host language provides a function now which returns the current time and define, retain t c = until (at (now + t)) (c ‘or‘ zero) ,
(C4)
a contract combinator which allows the holder of the contract to use the data in c (once) within the time frame t, e. g., c16 = retain ”6 months” c1 . Even applications of our language may look like extensions at a first glance. While POL provides, for instance, syntax for rights and obligations of contract partners, we have intentionally excluded the specification of contract partners from the scope of POL. This allowed us to focus on rights and obligations as one subject and postpone the specification of contract partners as a subject for another language which could wrap round POL. Approaching the specification of contract partners in an independent language is particularly interesting when accounting for complex contract partner relations, e. g., when two parties write a contract about the data of a third party.
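For readers who want to see the combinators of this section collected in one place, the following is a hedged sketch of (C1)–(C4) as an ordinary Haskell module. The module POL and everything imported from it (the Contract type, the observable type PR, Time, and the primitives) are assumptions standing in for the paper's embedding; if_ replaces the POL primitive `if`, which clashes with a Haskell keyword, and Time is assumed to have a Num instance so that now + t type-checks.

    module Combinators where

    import Prelude hiding (and, or, until)
    import POL (Contract, PR, Time, zero, and, or, if_, ifnot, until, at, now)

    twice :: (Contract -> Contract) -> Contract -> Contract
    twice f c = (f c) `and` (f c)                              -- (C1)

    times :: Int -> Contract -> Contract
    times n c = foldr1 (and) (replicate n c)                   -- (C2)

    cond :: PR Bool -> Contract -> Contract -> Contract
    cond o c1 c2 = (if_ o c1) `and` (ifnot o c2)               -- (C3)

    retain :: Time -> Contract -> Contract
    retain t c = until (at (now + t)) (c `or` zero)            -- (C4)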
7 Conclusions
We have specified a formal language POL for Privacy Options and added a semantics for managing Privacy Options. The management semantics allows contracts to be executed and simplified. A simple modification of this semantics would switch off the simplification and retain the original contracts, e. g., for PrimeLife’s Data Track. We compare POL with PPL, a language with a similar purpose, by elaborating on translation rules from POL to PPL. The translation shows that POL has its strengths in defining contracts that evolve over time. While in PPL data, once it has become inaccessible, stays inaccessible forever (data deletion), we can define contracts in POL which flexibly allow data usage depending on time or events. POL is, however, not meant to be a drop-in replacement for PPL and rather focuses on a fraction of the functionality provided by PPL, i. e., Sticky Policies. In contrast to most other approaches in the privacy policy domain, POL contracts can be transformed to a canonical form. This makes it possible to eliminate information about the contract originator hidden in the freedom of the POL syntax. We deem that this is a particularly interesting feature in privacy negotiations when the contract proposals are evaluated by negotiation partners and none of them wants to reveal more information than the terms and conditions under which they could accept a contract. Moreover, we have outlined how POL can be extended. Experts may add new language primitives and benefit from the inductive structure of the language, i. e., in most cases it will be sufficient to define the semantics of the added primitive in order to provide an extended language with all semantics. End-users can extract standard contract patterns and make them available to a larger audience by defining combinators, i. e., functions that combine primitives in a standardised way. Combinators may be defined in a high-level programming language or even in a visual environment, depending on the capabilities of the end-user. Neither the choice of a specific programming language, nor the choice of a suitable visual environment is determined by the work in this paper.
Acknowledgements
The author was partially funded by the Research Council of Norway through the PETweb II project. Thanks to Rainer Böhme, Simone Fischer-Hübner, Martin Günther, Stefan Lindskog, Tobias Pulls, and the anonymous reviewers of the PrimeLife Summer School 2010 for useful comments and suggestions.
References
1. Laudon, K.C.: Markets and privacy. Commun. ACM 39(9), 92–104 (1996)
2. Taylor, C.R.: Consumer privacy and the market for customer information. The RAND Journal of Economics 35(4), 631–650 (2004)
3. Acquisti, A.: Protecting privacy with economics: Economic incentives for preventive technologies in ubiquitous computing environments. In: Workshop on Socially-informed Design of Privacy-enhancing Solutions in Ubiquitous Computing, pp. 1–7 (2002)
4. Berthold, S., Böhme, R.: Valuating privacy with option pricing theory. In: Workshop on the Economics of Information Security (WEIS). University College London, UK (2009)
5. Hann, I.H., Hui, K.L., Lee, T.S., Png, I.P.L.: Online information privacy: Measuring the cost-benefit trade-off. In: Proceedings of the 23rd International Conference on Information Systems, ICIS 2002 (2002)
6. Raggett, D.: Draft 2nd design for policy languages and protocols. Technical Report H 5.3.2, PrimeLife project (2009)
7. Peyton Jones, S., Eber, J.M.: How to write a financial contract. In: Gibbons, J., de Moor, O. (eds.) The Fun of Programming. Palgrave Macmillan, Oxford (2003)
8. Peyton Jones, S. (ed.): Haskell 98 Language and Libraries – The Revised Report. Cambridge University Press, Cambridge (2003)
9. Acquisti, A., Varian, H.R.: Conditioning prices on purchase history. Marketing Science 24(3), 1–15 (2005)
10. Böhme, R., Koble, S.: Pricing strategies in electronic marketplaces with privacy-enhancing technologies. Wirtschaftsinformatik 49, 16–25 (2007)
11. Wadler, P.: The essence of functional programming. In: Proceedings of the 19th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 1992, pp. 1–14. ACM, New York (1992)
12. Rissanen, E.: OASIS extensible access control language (XACML) version 3.0. OASIS working draft 10, OASIS (March 2009)
13. Chaum, D.: Security without identification: Transaction systems to make big brother obsolete. Communications of the ACM 28(10), 1030–1044 (1985)
14. Camenisch, J., Lysyanskaya, A.: An efficient system for non-transferable anonymous credentials with optional anonymity revocation. In: Pfitzmann, B. (ed.) EUROCRYPT 2001. LNCS, vol. 2045, pp. 93–118. Springer, Heidelberg (2001)
Using Game Theory to Analyze Risk to Privacy: An Initial Insight Lisa Rajbhandari and Einar Arthur Snekkenes Norwegian Information Security Lab, Gjøvik University College, Gjøvik, Norway {lisa.rajbhandari,einar.snekkenes}@hig.no
Abstract. Today, with the advancement of information technology, there is a growing risk to privacy as identity information is being used widely. This paper discusses some of the key issues related to the use of game theory in privacy risk analysis. Using game theory, risk analysis can be based on preferences or values of benefit which the subjects can provide rather than subjective probability. In addition, it can also be used in settings where no actuarial data is available. This may increase the quality and appropriateness of the overall risk analysis process. A simple privacy scenario between a user and an online bookstore is presented to provide an initial understanding of the concept. Keywords: game theory, privacy, risk analysis.
1 Introduction
Every individual has a right to the privacy of their personal information. People are dependent on information technology in their daily lives, which includes the risk of their personal information being misused, stolen or lost. The personal information of an individual might be collected and stored by government agencies, businesses and other individuals. These organizations and individuals might have the incentive to misuse such gained information, at least from the perspective of the individual. In [1], Anderson stated that individuals produce as well as consume commodity information. There are growing problems of identity theft, tracking of an individual, personal information being used as a commodity and so on. Thus, there is a necessity to protect the privacy of information, perform risk analysis and evaluation for proper protection of the entire identity management infrastructure. According to the guidelines of ISO/IEC 27005, for information security risk assessment, risk identification, estimation and evaluation are necessary tasks [2]. In this paper, we suggest that instead of a classical risk analysis like Probabilistic Risk Analysis (PRA), we use a game theory based approach. The distinction between using PRA and game theory for general risk analysis is shown in Table 1. In PRA, the risk level is estimated by studying the likelihood and consequences of an event and assigning the probabilities in a quantitative or qualitative scale. Moreover, it can be considered as a single person game because the
Table 1. Comparison of general Risk Analysis steps: Using PRA and Game theory

  Risk Analysis    Classical Risk Analysis (PRA)                        Our proposal (Game theory)
  Collect data     Ask for subjective probability or historical data    Ask for preferences or benefits
  Compute risk     Compute risk (e.g. expected value)                   Compute probability and expected outcome
                                                                        (e.g. mixed strategy Nash equilibrium)
  Evaluate         Decide what to do                                    Decide what to do
strategies followed by the opposing player or adversary are not considered. In [3], Bier has stated the challenges to PRA as subjective judgment and human error and performance. With game theory we can consider settings where no actuarial data is available. Moreover, we do not have to rely on subjective probabilities. By obtaining the preferences or benefits from the subjects, we can compute the probabilities and outcomes to determine the risk. We propose that it can be used in studying and evaluating the behavior of the players in privacy scenarios. It also allows for better audit as the outcomes can be verified at each incident. In this paper, we will focus on two important issues - ‘suitability of game theory for privacy risk analyses’ and ‘how the payoffs of the players are calculated’.
2 Overview of Game Theory
Game theory is a branch of applied mathematics proposed by John von Neumann and Oskar Morgenstern in paper [4]. It has been used in many fields [5] like economics, political science, evolutionary biology, information security and artificial intelligence. It is the study of the strategic interactions among rational players and their behavior [6], [7], which can be modeled in the form of a game. For a game, the required four components are: the players, their strategies, payoffs and the information they have [8]. The players are the ones whose actions affect each other’s payoffs. Whereas, a strategy is a plan of action that the player can take in response to the opponent’s move. It is impossible to act against all the defensive attacks at all times [9]. Thus, it is important to find out the preferred strategies of the players. The payoff of a particular player is affected by both the actions taken by him and the other player. The thing that matters is that the value of the payoff should be consistent throughout the game. According to Auda, besides ordering the preferences of the payoffs, the players can ‘also express the ratio of the differences of the preferences’ on an interval scale called utility [10]. Players make decisions based on the gained information. According to the information they have, the game can be categorized into a perfect/imperfect game and a complete/incomplete game. A complete information game is one in which the players know about the strategies and payoffs of one another (vice versa for the incomplete information game). While, a game where at least a player has no
knowledge about the previous actions of at least one other player is called an imperfect information game (vice versa for the perfect information game). After determining the components, the game can be represented in the normal (strategic or matrix) form or extensive (tree) form. The normal form is usually used to represent static situations in which the players choose their actions simultaneously and independently. On the other hand, the extensive form is usually used to represent dynamic situations in which the players have some information about the choices of other players. In a game, each player chooses among a set of strategies to maximize their utilities or payoffs based on the information they have. The choice of strategies can be determined by equilibrium solutions such as the pure strategy Nash equilibrium [11] and the mixed strategy Nash equilibrium, both named after John Forbes Nash. A strategy profile is a Nash equilibrium if each player’s chosen strategy is the best response to the strategies of the other [6] and the players have no incentive to unilaterally deviate. A mixed strategy is a probability distribution (mixing or randomization) of the players’ pure strategies so that it makes the other player indifferent between their pure strategies [6], [12]. The equilibrium gives the outcome of the game [8].
3 Why Game Theory?
Today, whenever we have to provide our personal information for instance, while purchasing a ticket online, most of us wonder and are concerned about our information being collected. Some questions that usually pop into our minds are - ‘Is our information being stored and if so, to what extent? Who gets access to the stored information? Are all the insiders having access to the stored information ‘good’ ?’ If we ask people how often they face risks by providing their personal information, like a credit card number, the answer would be in terms of probability which would be rather vague. However, if we ask them how much they would benefit by providing it, we can have an appropriate answer, for example, in terms of monetary values or time. Thus, with game theory, we can ask expressive questions that the people can answer. Based on these data, risk analysis can be carried out. In addition, we can perform risk analysis more accurately if we place the situation in the form of a ‘game’. If we consider a game of poker, the players are rational. They not only think about their action but also what the other players will do in return to their own particular move. Kardes and Hall state that Probabilistic Risk Analysis (PRA) does not consider the strategies of the adversary and thus, suggest using the game theoretic approach [13]. In the real world, we have to plan our moves considering the moves of the others, especially if the opponent is an adversary. By using game theory, we can find out how the players choose their strategies in different situations of interdependence. For instance, let us consider zero-sum and cooperative games. In a zero-sum game, gain to one player is a loss to another. Thus, each player takes individual actions
to maximize their own outcome. In cooperative games, players cooperate or negotiate their actions for the benefit of each other. Moreover, the adversary usually does not give up when his attempts have been defended; he rather uses different strategies. In a game theoretic setting, the benefits are based on outcomes and the incentives of the players are taken into account. Thus, game theory helps to explore the behavior of real world adversaries [9].
4 Scenario and Game Formulation
In this section, we will look at the scenario between a user and an online bookstore and the steps used to formulate the scenario in a game theoretic setting.

4.1 Scenario
The user subscribes to a service from an online bookstore. The online bookstore collects and stores additional private information such as book and magazine preferences. These preferences can then be used according to the privacy policy of the online bookstore to provide customized purchase recommendations. When these recommendations are selected, they generate additional sales for the online bookstore. Also, these recommendations are beneficial to the user, as they save him valuable time. However, it is somewhat tempting for the online bookstore to breach the agreed privacy policy by providing these additional preferences of the user to third parties to be utilized for marketing. This third party marketing incurs additional costs for the user, mostly in terms of time wasting activities like advertisements. However, at the initial stage, the online bookstore cannot determine whether the given information is genuine or fake.

4.2 Game Formulation
For formulating the game, we take into account of the following assumptionsit is a complete information game but of imperfect information. The game is of complete information as we assume both the user and the online bookstore know about the strategies and outcomes of one other. It is of imperfect information as we stipulate that they have no information about the previous action taken by the other player when they have to make their decision. Moreover, it is a one shot game between the user and the online bookstore as a single interaction between them is considered and their actions are taken to be simultaneous and independent. We now explain the strategies of the players and how the data are collected to estimate the payoffs. We then represent the game in the normal form. Players. It is a two player game, between the user and the online bookstore. We assume that both the players are intelligent and rational. They have the incentive to win or to optimize their payoffs.
Strategies. We will use a set of simple strategies for this two player noncooperative game between a user and an online bookstore. The user has the choice to either provide his genuine or fake personal information, knowing the possibility of his information being sold. The strategies of the user are given by {GiveGenuineInfo, GiveFakeInfo}. The online bookstore either exploits the personal information of the user by selling it to third parties or does not exploit and uses it for its own internal purpose, given by {Exploit, NotExploit}. Payoff. For obtaining the payoffs of the players, we collect the data and then estimate it. 1. Data collection: We have assumed the values for the user and the online bookstore as shown in Table 2. However, it can be collected by conducting experiments and surveys. The values are in hours, reflecting saved or lost time. A positive value represents the hours saved while a negative value represents the hours lost. The profit corresponds to the hours of work saved. The variables ‘a’ to ‘h’ are used to represent the cells. The values of the user and the online bookstore are explained below. Table 2. Assumed saved or lost hours for the user and online bookstore For user Information provided by the user The online bookstore usage of information for its internal purpose The online bookstore usage of information by selling it to third parties
For online bookstore
Genuine
Fake
Genuine
Fake
(a) 1
(b) 0.1
(c) 1
(d) -0.01
(e) -1
(f) -0.01
(g) 0.5
(h) -0.2
For the user: If the online bookstore uses the user’s preferences and personal information for its own internal purpose according to the policy, we assume that the user saves an equivalent of an hour, if he had provided genuine personal information, and 0.1 hours if he had given fake information. However, if the online bookstore sells the information to third parties, the user wastes time dealing with the sale attempts. Thus, we assume that the user loses an hour if he had provided genuine personal information, and 0.01 hours if he had provided fake information.
For the online bookstore: Similarly, when the bookstore uses the user’s personal information for its own internal purpose according to the policy, we assume it saves an hour if the user had provided genuine information whereas, it loses 0.01 hours dealing with the fake information of the user. However, when it violates the privacy policy and sells the information to third parties, it saves 0.5 hours in case the user had provided genuine information and loses 0.2 hours in case of fake information. We stipulate that it is not possible to assess if private information is fake or not before the information sale is finalized. 2. Estimation: We can represent the game in a two players normal form as shown in Fig. 1. The first value of each cell given by xij is the payoff of the user while the second value given by yij is the payoff of the online bookstore. Here, i = 1 to n, j = 1 to n and n = number of players. Each value of the cell is explained and estimated below, along with stating how much each of the players influences the outcome. The payoffs are in utility, estimated using the assumed values of hours from Table 2. We have to keep in mind that when the online bookstore exploits the information, it uses the information for its own internal purpose as well as to gain profit by selling it to third parties.
                              Online bookstore
  User                        Exploit          NotExploit
  GiveGenuineInfo             x11 , y11        x12 , y12
  GiveFakeInfo                x21 , y21        x22 , y22

Fig. 1. Normal form representation of the scenario
The first strategy profile (GiveGenuineInfo, Exploit) states that the user provides the personal information genuinely, while the online bookstore exploits it by selling it to third parties. Now, we will calculate the values of the payoff for this particular strategy profile. x11 : Even though the user will benefit from the service, he will have to waste time dealing with advertisements and sale attempts by third parties, incurred from the exploitation by the online bookstore. The user payoff is obtained by adding the cell values of online bookstore usage of information for its internal purpose (a) and usage by selling it to third parties (e). As mentioned earlier, when the bookstore exploits the information, it uses the data for its own purpose and also sells it to the third parties. Thus, the user’s payoff is given by: x11 = a + e = 0 . y11 : The online bookstore will be able to utilize and exploit the user’s personal information both for legitimate and unauthorized usage. However, it
will not lose time. Thus, y11 is obtained by summing the cell values of online bookstore usage of information for its internal purpose (c) and usage by selling it to third parties (g) i.e. y11 = c + g = 1.5 . The payoffs for the strategy profile (GiveGenuineInfo, NotExploit) is estimated as given belowx12 : As the user provides his genuine information, he will receive a customized service in accordance with the agreed privacy policy and save time by utilizing the recommendations. Thus, x12 equals the cell value ‘a’, which is obtained as the user provides genuine information and the online bookstore uses the information for its internal purpose i.e. x12 = a = 1 . y12 : The online bookstore will be able to utilize personal data to offer an improved service in accordance with the agreed privacy policy and will not lose time. Thus, y12 equals the cell value ‘c’, which is obtained as the user provides genuine information and the online bookstore uses the information for its internal purpose i.e. y12 = c = 1 . The payoffs for the strategy profile (GiveFakeInfo, Exploit) is estimated as given belowx21 : The online bookstore will try to exploit the data, but later on, will discover that the data was incorrect. The user will only have some limited benefits from the service but will not lose time dealing with the sale attempts from third parties. Thus, x21 is obtained by summing the cell values of the online bookstore usage of information for its internal purpose (b) and usage by selling it to third parties (f) i.e. x21 = b + f = 0.09 . y21 : The fake data provided by the user may only be discovered by the online bookstore at a later stage, for example, at the time when the fake information is to be used to generate profit. The online bookstore will receive limited benefits from the interaction and lose time dealing with the fake data. Thus, the online bookstore’s payoff is obtained by summing the cell values of the online bookstore usage of information for its internal purpose (d) and usage by selling it to third parties (h) i.e. y21 = d + h = −0.21 . The payoffs for the strategy profile (GiveFakeInfo, NotExploit) is estimated as given belowx22 : The online bookstore will try to use this fake information given by the user to provide a customized service. However, the user will not receive any benefits and saves less time, as the improved service generated from fake data will be irrelevant. Thus, the user’s payoff equals the cell value ‘b’, which is obtained as the user provides the fake information and the online bookstore uses the information for its internal purpose i.e. x22 = b = 0.1 . y22 : As the user provides fake information, the online bookstore will not be able to provide a customized service, resulting in reduced future sales. Moreover, it will lose time dealing with the fake data. Thus, the online bookstore’s payoff equals the cell value ‘d’, which is obtained as the user
provides the fake information and the online bookstore uses the information for its internal purpose i.e. y22 = d = −0.01 . The normal form representation with the estimated payoffs is given in Fig. 2.
                              Online bookstore
  User                        Exploit (q)        NotExploit (1 − q)
  GiveGenuineInfo (p)         0 , 1.5            1 , 1
  GiveFakeInfo (1 − p)        0.09 , -0.21       0.1 , -0.01

Fig. 2. Normal form representation of the scenario with estimated payoffs
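The addition of table cells described above can be condensed into a short sketch (our own illustration; names are assumed), which reproduces the payoff matrix of Fig. 2 from the hour values a–h of Table 2:

    -- cells of Table 2 in hours; exploiting means internal use plus sale,
    -- so the corresponding cells are added
    payoffs :: ((Double, Double), (Double, Double), (Double, Double), (Double, Double))
    payoffs = ( (a + e, c + g)    -- (GiveGenuineInfo, Exploit)
              , (a,     c)        -- (GiveGenuineInfo, NotExploit)
              , (b + f, d + h)    -- (GiveFakeInfo,    Exploit)
              , (b,     d) )      -- (GiveFakeInfo,    NotExploit)
      where
        (a, b, c, d, e, f, g, h) = (1, 0.1, 1, -0.01, -1, -0.01, 0.5, -0.2)
    -- evaluates (up to floating-point rounding) to
    -- ((0, 1.5), (1, 1), (0.09, -0.21), (0.1, -0.01)), i.e. the cells of Fig. 2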
5 Game Solution

5.1 Pure/Mixed Strategy Nash Equilibrium
Using the above payoffs, we found that the game has no pure strategy Nash equilibrium as the players do not agree on a particular strategy profile. However, we can always find the mixed strategy Nash equilibrium. For obtaining the mixed strategy Nash equilibrium, we will use the calculation as explained in [6] (p. 123). We assume that the user plays the strategies GiveGenuineInfo and GiveFakeInfo with probabilities p and 1 − p respectively, for (0 ≤ p ≤ 1). After the calculation, we get p = 0.29. Thus, the user plays with the mixed strategy (0.29, 0.71). This means that the user provides genuine information with a 0.29 probability and fake information with a 0.71 probability when playing this game. Similarly, assume that the online bookstore plays the strategies Exploit and NotExploit with probabilities q and 1 − q respectively, for (0 ≤ q ≤ 1). After the calculation, we obtain the mixed strategy as (0.91, 0.09) for the strategy profile (Exploit, NotExploit). Hence, with this mixed strategy we can know the probabilities with which each of the players will choose a particular strategy.
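As a cross-check, these probabilities can be reproduced from the standard indifference conditions. The following is a minimal sketch (not the authors' code; type and function names are illustrative), using the payoffs of Fig. 2: p is chosen so that the online bookstore is indifferent between Exploit and NotExploit, and q so that the user is indifferent between GiveGenuineInfo and GiveFakeInfo.

    type Cell = (Double, Double)   -- (user payoff, bookstore payoff)

    -- arguments: cells (x11,y11) (x12,y12) (x21,y21) (x22,y22) of the 2x2 game
    mixedEquilibrium :: Cell -> Cell -> Cell -> Cell -> (Double, Double)
    mixedEquilibrium (x11, y11) (x12, y12) (x21, y21) (x22, y22) = (p, q)
      where
        -- bookstore indifference: p*y11 + (1-p)*y21 = p*y12 + (1-p)*y22
        p = (y22 - y21) / ((y11 - y21) - (y12 - y22))
        -- user indifference:      q*x11 + (1-q)*x12 = q*x21 + (1-q)*x22
        q = (x22 - x12) / ((x11 - x12) - (x21 - x22))

    -- mixedEquilibrium (0, 1.5) (1, 1) (0.09, -0.21) (0.1, -0.01)
    --   evaluates to approximately (0.29, 0.91), matching the values above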
5.2 Expected Outcome
We can represent the game in the normal form with the matrix A. Then, aij represents each cell of the matrix. In case of a two player game, the expected outcome of the game using mixed strategy to each player is given by

    Σi=1..k Σj=1..l  pi qj aij .    (1)
where k and l are the numbers of strategies of player 1 (the user) and player 2 (the online bookstore), with 1 ≤ i ≤ k and 1 ≤ j ≤ l; pi are the probabilities with which player 1 plays each of his strategies (0 ≤ pi ≤ 1, Σi pi = 1); and qj are the probabilities with which player 2 plays each of his strategies (0 ≤ qj ≤ 1, Σj qj = 1). By using (1) and substituting the values of p and q, the expected outcome of the game for the user and the online bookstore is 0.09 and 0.28, respectively. We can conclude that by playing this game, the online bookstore benefits more than the user. The overall values of the expected outcome that the players get by playing each of their strategies can also be estimated; they are given in Fig. 3. The benefit to each of the players can be based on these outcomes.
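Equation (1) can likewise be checked with a short sketch (our illustration; the list-based matrix representation is an assumption), for one player's payoff matrix a with rows indexed by the user's strategies and columns by the online bookstore's strategies:

    expectedOutcome :: [Double] -> [Double] -> [[Double]] -> Double
    expectedOutcome ps qs a =
      sum [ p * q * aij | (p, row) <- zip ps a, (q, aij) <- zip qs row ]

    -- user:      expectedOutcome [0.29, 0.71] [0.91, 0.09] [[0, 1], [0.09, 0.1]]      ≈ 0.09
    -- bookstore: expectedOutcome [0.29, 0.71] [0.91, 0.09] [[1.5, 1], [-0.21, -0.01]] ≈ 0.28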
[Fig. 3. Normal form representation along with the probabilities and expected outcomes. The payoff cells are as in Fig. 2; the online bookstore plays Exploit with q = 0.91 and NotExploit with 1 − q = 0.09 (expected outcomes 0.25 and 0.03, summing to 0.28), and the user plays GiveGenuineInfo with p = 0.29 and GiveFakeInfo with 1 − p = 0.71 (expected outcomes 0.03 and 0.06, summing to 0.09).]
6 Discussion
We formulated the scenario in the form of a strategic game. We used the concept of mixed strategy Nash equilibrium to compute the probabilities with which the players play each of their strategies, the expected outcome the players gain by playing each of the strategies and also the expected outcome of the game for each player. Risk analysis can then be based on these computed probabilities and outcomes. However, the following two issues need to be considered. The first issue is the preference of the players. It is important to understand the uncertainties related to the preferences of the players in any game. The players might think differently, which may lead them to choose a different strategy than the equilibrium. Some of the questions that need to be taken into account are -
1. Does the user know what the online bookstore prefers and how he orders the preferences and vice versa? 2. What are the consequences if the two players play ‘different games’ i.e. it differs in the perception of outcome? The second is obtaining appropriate data by conducting experiments, interviews and surveys. In addition, this scenario in a real world situation is usually of partial information. The user knows the exact value of his own ‘saved/lost’ time. The online bookstore knows the distribution of saved/lost time of the user from the population of all users. However, the online bookstore cannot guess the exact value because, at a given instant, it does not know with which user it is playing the game while the saved/lost time of the online bookstore is known by all users.
7 Conclusion
We can conclude that, with game theory, risk analysis can be based on the computed expected outcomes and probabilities rather than relying on subjective probability. For demonstrating this, we have considered a simple scenario between the online bookstore and its user. Moreover, we have explained how the data can be collected for estimating the payoffs. The present study provides a starting point for further research. We will conduct a survey for gathering data as the next step. Further, the main objective of the research will be to incorporate the use of game theory in real world privacy scenarios besides the theoretical details. Acknowledgment. The work reported in this paper is part of the PETweb II project sponsored by The Research Council of Norway under grant 193030/S10.
References
[1] Anderson, H.: The privacy gambit: Toward a game theoretic approach to international data protection. Vanderbilt Journal of Entertainment and Technology Law 9(1) (2006)
[2] ISO/IEC 27005: Information technology – security techniques – information security risk management (2008)
[3] Bier, V.: Challenges to the acceptance of probabilistic risk analysis. Risk Analysis 19, 703–710 (1999)
[4] von Neumann, J., Morgenstern, O.: Theory of Games and Economic Behavior. Princeton University Press, Princeton (1944)
[5] Shoham, Y.: Computer science and game theory. Commun. ACM 51, 74–79 (2008)
[6] Watson, J.: Strategy: An Introduction to Game Theory, 2nd edn. W. W. Norton & Company, New York (2008)
[7] Ross, D.: Game Theory. The Stanford Encyclopedia of Philosophy. In: Zalta, E.N. (ed.) (2010), http://plato.stanford.edu/archives/fall2010/entries/game-theory/
[8] Rasmusen, E.: Games and Information: An Introduction to Game Theory, 4th edn. Wiley-Blackwell, Chichester (2006); Indiana University
[9] Fricker, J.R.: Game theory in an age of terrorism: How can statisticians contribute? Springer, Heidelberg (2006)
[10] Auda, D.: Game theory in strategy development of reliability and risk management. In: Annual Reliability and Maintainability Symposium, RAMS 2007, pp. 467–472 (2007)
[11] Nash, J.: Equilibrium points in n-person games. Proceedings of the National Academy of Sciences of the United States of America 36, 48–49 (1950)
[12] Fudenberg, D., Tirole, J.: Game theory. MIT Press, Cambridge (1991)
[13] Kardes, E., Hall, R.: Survey of literature on strategic decision making in the presence of adversaries. CREATE Report (2005)
A Taxonomy of Privacy and Security Risks Contributing Factors Ebenezer Paintsil and Lothar Fritsch Department of Applied Research in ICT, Norwegian Computing Center, Oslo, Norway {Paintsil,lothar.fritsch}@nr.no
Abstract. Identity management system(s) (IDMS) do rely on tokens in order to function. Tokens can contribute to privacy or security risk in IDMS. Specifically, the characteristics of tokens contribute greatly to security and privacy risks in IDMS. Our understanding of how the characteristics of token contribute to privacy and security risks will help us manage the privacy and security risks in IDMS. In this article, we introduce a taxonomy of privacy and security risks contributing factors to improve our understanding of how tokens affect privacy and security in IDMS. The taxonomy is based on a survey of IDMS articles. We observed that our taxonomy can form the basis for a risk assessment model.
1 Introduction
A token is a technical artifact providing assurance about an identity. Tokens help authenticate and establish the identity of the end-users. They also help the end-user to remember their identifiers and facilitate information flow in identity management system(s) (IDMS). Figure 1 depicts an example of a simple identity management system. The identity provider (IdP) is an organization that collects the personal information of the end-user and creates a digital identity or identities for it. An IdP issues or helps the end-user to choose an identifier representing the digital identity. The end-user can then use the identifier(s) to identify or authenticate herself to a service provider (SP) in order to access an online service or resource. Each SP may require a different identifier or has a peculiar identification or authentication scheme. The number of identifiers may grow depending on the number of SPs the end-user interacts with and the kind of services or resources the end-user subscribes. The growth may reach a point where the end-user could no longer remember all the numerous identifiers and their corresponding SPs. To solve this identifier management challenge, we either employ a software agent to store and select the correct identifier for a SP as in [1] or a hardware device such as a smart card to store and facilitate the selection of the correct identifier for a SP. In the physical world, identity tokens consist of identifying information or identifiers stored in a physical device such as credit card, passports, a silicon chip and a magnetic stripe [2]. We also have virtual or software identity tokens
Fig. 1. A Simple Identity Management System, SP1 means service provider 1, SP2 means service provider 2 and IdP means identity provider
such as the Microsoft information card (InfoCard or CardSpace) technology. The InfoCard technology consists of identity metadata stored in a visual icon. The identity metadata point to or associate with a digital identity. The digital identity in this case represents the identity of the end-user. We can also find other kinds of tokens such as user name tokens, binary tokens, nonce, XML based tokens and custom tokens1 such as Keynote [3], [4]. A token can consist of a piece of data, a mechanism, an algorithm, an assertion or a credential. The construction and different uses of tokens contribute to privacy and security risks in IDMS [3]. Camenisch and others have suggested anonymous credential 1
¹ A user name token consists of a user name and, optionally, password information for basic authentication. A nonce is a randomly generated token used to protect against replay attacks. Binary tokens are non-XML based security tokens represented by binary octet streams. Examples of binary tokens are X.509 certificates, Kerberos tickets and Lightweight Third Party Authentication (LTPA) tokens. XML based tokens are represented in the Extensible Markup Language. Examples of XML based tokens are Security Assertion Markup Language (SAML), Services Provisioning Markup Language (SPML) and Extensible rights Markup Language (XrML) tokens.
systems as a means of enhancing privacy in IDMS. In such systems, different uses of identity tokens by the same user are unlinkable. However, apart from unlinkability, tokens contribute to security and privacy risks in diverse ways. For example, function creep is as a result of using a token for unintended purpose. Anonymous credential systems also lack practical use and may not be compatible with already deployed IDMS [5], [6]. They may rely on a master secret to protect all the tokens, however accidental disclosure of the master secret could lead to identity theft, linkability and eventual privacy or security risk. Thus, the characteristics of tokens can contribute to privacy and security risks even in anonymous credential systems. In this article, we introduce a taxonomy of privacy and security risks contributing factors in order to understand the impact of the characteristics of tokens on privacy and security in IDMS. In addition, we introduce the applications of our taxonomy. We organize this article as follows. In Section 2, we introduce existing works on the effect of tokens on IDMS. Section 3 describes our taxonomy of privacy and security risks contributing factors in detail. We introduce the applications of our taxonomy in section 4. Section 5 states the conclusion and future work.
2 Related Work
There is a large body of literature on identity tokens and how they may contribute to security and privacy risks. However, our work organizes tokens according to their contribution to privacy and security risks. In [7] D.J. Lutz and R. del Campo used a custom identity token to bridge the gap between privacy and security by providing a high level of privacy without anonymity. They designed a scheme to prevent replay attack and ensure that personal data is not sent to a foreign domain. They did not focus on the effect of the characteristics of tokens on privacy and security risks within a domain. Furthermore, tokens are used to facilitate single sign-on authentication in federated IDMS [6]. However, this work focuses on a better way of constructing a token but not on the effect of the characteristics of tokens on privacy and security risks. The identity mix (idemix) scheme proposed in [3], employs anonymous tokens to enhance privacy. The idemix scheme has three main parties. The end-user obtains a pseudonym in a form of an anonymous token from the identity issuer. The verifier verifies the credentials. The end-users can authenticate with a verifier without revealing their pseudonyms. The characteristics of tokens such as token secrecy, can affect this scheme, as one can misuse a pseudonym if the token’s secret is inferred or revealed. Peterson introduces factors for asset value computation and stresses the importance of asset in risk calculation [8]. We can use such factors to quantify the contributions of tokens to privacy and security risks. Peterson derived the asset value of tokens from their loss, misuse, disclosure, disruption, replacement value, or theft. However, this is just an aspect of the risk contributing factors. Solove introduces a taxonomy of privacy [9]. He introduces a high-level privacy risk contributing factors. They include data processing, data collection, data
transfer and invasion. Nevertheless, the taxonomy is a high-level explanation of privacy principles without paying particular attention to the contributions of tokens to privacy risk. Privacy Impact Assessment (PIA), a framework for assessing the impact of personal data processing on privacy before an information system is implemented, is introduced in [10]. The PIA is a compliance check of personal data processing to privacy laws, policies, or regulations. It specifies the requirement for privacy risk assessment without any explicit assessment technique or method. Our work differs from PIA in our introduction of an explicit privacy and security risks assessment technique. The characteristics of tokens affect privacy and security in IDMS. A token characteristic include but not limited to its usages, how it is built, designed or constructed and how it is chosen. Security protocols are often concerned with how to build or construct a security token that can enhance privacy and security in IDMS [3], [7] , [11], [6]. The emphasis may be placed on formal verification of privacy and security risks caused by how tokens are built, designed or constructed. Moreover, tokens contribute in many different ways to security and privacy risks in IDMS. For instance in analogy to [8] tokens can be a good source of security and privacy risk metrics since they can consist of metadata about identity definitions.
3 The Taxonomy
This section introduces our taxonomy of privacy and security risks contributing factors. We based our factors on a literature survey of scientific articles in IDMS such as [8], [12], [3], [13], [14], [15] and many more. Our taxonomy, for the time being, avoids the Pfitzmann-Hansen terminology [16] with its definitions of e.g. linkability and observability, as the terms therein are defined on the background of anonymous communication and information sparsity. We feel that these terms need to be analyzed from our perspective on handling electronic identifiers and their risks. Our future work aims at aligning or re-defining the Pfitzmann-Hansen terminology to be meaningful in our context. Following this, we depict our taxonomy in Table 1. The taxonomy is non-exhaustive list of the characterization of tokens according to the manner in which they contribute to privacy and security risks. We explain the meaning of each contributing factor as listed in Table 1. Token Mobility: This factor indicates the degree of mobility of a token. The degree of mobility refers to how easy to copy the token or its content, the physical constraints regarding the movement of the token, among others. For example the content of a low cost RFID tag with no additional security could easily be read as compared to the high cost RFID tags that come with additional security. The risks created by various forms of mobility directly relate to identity theft, linkability and the risk impacts. We assess the contributions of token mobility to privacy and security risks according to the following:
Table 1. Taxonomy

  Risk Contributing Factors           Parameters
  Token Mobility                      copyable, remotely usable, concurrently usable, immobile
  Token Value at Risk                 loss, misuse, disclosure, disruption, theft, replacement value
  Token Provisioning                  creation, editing, deletion
  Token Frequency & Duration of Use   uses per year, life-time, multiple times, one-time
  Token Use & Purpose                 original, unintended
  Token Assignment & Relationship     forced, chosen, jointly-established, role, pseudonymity
  Token Obligation & Policy           absence, present, functionality
  Token Claim Type                    single, multiple
  Token Secrecy                       public, inferable, secret
  Token Security                      origination, identification, validation, authentication, authorization
1. Copyable: the token can be copied with limited effort. 2. Remotely usable: the token can be used for remote identity management. 3. Concurrently usable: the token can be used concurrently in many parallel sessions, transactions, or applications. 4. Immobile: the token is not mobile, as it either must be physically present in a form of a user, or even presented to the system that is supposed to accept it. Token Value at Risk: Finding assets and the value of the asset at risk is an important part of risk assessment (see [17]). Similarly, tokens are assets of IDMS and their value at risk can contribute to privacy and security risks. Thus we can quantify the risk of using tokens by assessing the significance of the token or the value of the token to the operation and security of the IDMS and the privacy of the end-users. In [8], six risk factors for IDMS have been found in analogy to classic risk assessment focusing on asset value at risk. We classify token’s value at risk in a similar manner as follows: 1. Loss: value at risk when token is lost. 2. Misuse: value at risk when token is used in wrongful ways. 3. Disclosure: value at risk when token or token-related information gets known to someone else. 4. Disruption: value at risk when token doesn’t function. 5. Theft : value at risk when token gets into someone else’s possession without authorization. 6. Replacement value: cost (effort, resources, time) to replace a token. Token Provisioning: We create a token from personal attributes. The amount of personal information collected during the creation phase could contribute to privacy risk. Data minimality principle prohibits excessive data collection of personal information [18]. In addition, tokens that store excessive personal information may be used for other unintended purposes (function creep) or enable
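As a simple illustration of how such factors could feed a machine-readable risk assessment model (our own sketch, not part of the taxonomy itself), the mobility parameters above and the value-at-risk classes of Table 1 could be encoded as plain Haskell data types:

    -- hypothetical encoding; constructor names follow Table 1
    data TokenMobility
      = Copyable | RemotelyUsable | ConcurrentlyUsable | Immobile
      deriving (Eq, Show)

    data TokenValueAtRisk
      = Loss | Misuse | Disclosure | Disruption | Theft | ReplacementValue
      deriving (Eq, Show)

    -- a token profile could then collect one assessment per contributing factor
    data TokenProfile = TokenProfile
      { mobility    :: [TokenMobility]
      , valueAtRisk :: [(TokenValueAtRisk, Double)]  -- e.g. estimated cost at risk
      } deriving (Show)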
profiling and linkability. There should be a means of updating the token in order to ensure data integrity. Further, there should be a means of deleting or destroying the token when the purpose for its creation is no longer in the interest of the end-user or when the end-user decides to do so. This would ensure the privacy of the end-user. Therefore, we distinguish the following phases of the token provisioning life cycle: creation, editing and deletion. Each phase can be assumed to have different privacy or security risk impacts.
Token Frequency and Duration of Use: The underlying IDMS information flow protocol could determine whether multiple use of the same token is susceptible to privacy and security risks [3]. Different uses of even specifically constructed tokens remain linkable if the underlying IDMS information flow protocol is not designed to prevent such privacy risks. A token's frequency and duration of use are decisive for risks concerning exposure or long-term risk. In an IDMS that allows a SP and an IdP to share information, a frequently used token allows for detailed profiling. A token used in association with life-time identification or used multiple times causes high risks of secondary use and of profiling the person's life [19]. We classify a token chosen or assigned for life as a lifetime token, a token that can be used multiple times but not for life as a multiple-times token, and a token that can be used only once as a one-time token.
Token Use and Purpose: Purpose specification is an important privacy principle. It requires that personal information be collected for a specific purpose and used in an open and transparent manner [20]. Personal information should not be used for purposes other than the original purpose for which it was collected without informed consent. Since tokens may carry personal information or contain identifiers that link to personal information, any form of function creep or misuse could contribute to privacy and security risks. Function creep occurs when a token is used for an unintended purpose. For example, in the United States (US), driver licenses have become the de facto general-purpose identity tokens [21]; however, in some cases driver licenses disclose more personal information than is actually needed. For example, using a driver license to prove one's age discloses to the SP additional personal information meant for secondary purposes [22]. Such function creep and lack of selective disclosure pose privacy and security risks. We classify the purpose of a token as original or unintended. In general the intended purpose of a token is identification, authentication or authorization. The abuse of the purpose of a token, or how the purpose of a token is achieved, may contribute to privacy and security risks.
Token Assignment and Relationship: The need for user-centric IDMS clearly emphasizes the importance of how tokens are assigned or chosen. User-centric IDMS offer the end-users the ability to control their personal information and enforce their consent by determining the choice of their identity tokens [23]. With user-centric IDMS we can determine the amount of personal information to disclose or attach to our identity token (see [24], [25]). The origin of and control over tokens contribute to privacy or security risk. A token can be chosen by a person, jointly established, or forced upon a person by an
authority. Tokens can relate to a role rather than a person, and they could relate to a pseudonym [3]. A role is some sort of authorization, while a pseudonym is a mask with properties and usage patterns that might only occur for a particular purpose. The token assignment and relationship risk-contributing factor assesses the risk impact if a token is forced on an end-user, chosen by an end-user as in user-centric IDMS, jointly established with the end-user, or chosen according to the end-user's role. It also assesses the risk impact if a token is chosen under a pseudonym (pseudonymous tokens). Choosing a token under a pseudonym can enhance privacy, especially if its various usages cannot be profiled [3].
Token Obligation and Policy: We may protect a token with a data minimization technique such as idemix [3], or attach to the token a policy expressing the end-user's consent in an IDMS. For policy-based protection, a major factor for the introduction of risks is the question of privacy policy enforcement and the possibilities for auditing and investigation of suspicious system behavior. Recent work suggests combining services' privacy policies [12] with an obligation [26], a policy expressing the data processing consent that was given by a person. In addition, to ensure auditing of processing, an audit trail [13] has been suggested. The absence or presence of a privacy policy, and the particular functionality (or enforcement mechanism) of the policy-based concepts, express a potential privacy risk. In the absence of a policy, or for non-policy-based IDMS, a particular data minimization technique or functionality can also enhance privacy or contribute to privacy risk (see [3]).
Token Claim Type: We can enforce the security of identifiers stored in a token, or protect against the misuse of identifiers, by attaching a secret or a claim to the token. Claim types are generally classified as "something we know" (a secret, e.g. a password), "something we are" (something we cannot share with others, e.g. a fingerprint or iris), and "something we have" (e.g. possession of a smart card, key card or token) [21]. Each claim type is a factor. Token claim type considers the impact of the number of factors used to secure a token in an IDMS. Some tokens may require no additional secret in order to protect their content. We refer to such tokens as single-claim tokens, since the possession of the token itself is a factor. A token that requires an additional factor, such as a personal identification number (PIN), in order to access its content is referred to as a two-factor or multiple-claim token. We refer to a token requiring several additional factors, such as a PIN and a fingerprint, as a multiple-claim or three-factor token. For example, an X509 token may require the possession of a physical smart card and a PIN [1]. The smart card is inserted into a card reader and the PIN is used to access the certificate's private key. The possession of the card is a factor representing what the end-user has or possesses, and the PIN is a second factor representing what the end-user knows. This is an example of a two-factor authentication token. The number of authentication factors may determine the complexity and security of the token. For example, current master cards may require no
additional secret PIN when used for online transactions. We regard such a token as a single-factor authentication token. A single-factor authentication token would be less secure but easier to use than a three-factor authentication token such as an X509 token that requires a PIN and a biometric factor. We therefore classify tokens into single-claim and multiple-claim tokens.
Token Secrecy: Tokens have a claim type, as discussed above. The secrecy of authentication factors, or constraints on them such as required physical presence, can contribute to security or privacy risk in IDMS. If a token's additional authentication factor is public, then the claim type of such a token is effectively single. The secrecy of the additional authentication factor and the constraints attached to it may determine the claim type. Furthermore, secrecy is essential for maintaining a unique mapping between identifiers and persons [8]. The degree of secrecy a token possesses can facilitate identification. A token with an additional authentication factor such as the "mother's maiden name", used in the past for telephone authentication with credit cards [27], is easy to guess or infer. We classify such tokens' secrecy as inferable. We consider the secrecy of a token as public if the additional authentication factors or claims are known by a number of people, groups and organizations, or exist in many databases at the disposal of an unknown number of persons and organizations. We classify the secrecy of a token, or of an additional authentication factor, as "secret" if it is private and not shared with others because it is linked to a valuable resource such as a bank account (see [3]). Possible examples of private "secrets" are a private cryptographic key and a PIN.
Token Security: The security of a token can contribute to the security of the IDMS. If a token used in an IDMS is a credential, then the system should have a mechanism for checking the authority that issued the token. If the token is a mere assertion, then the IDMS should provide a different mechanism to ensure the authenticity of the assertion. In order to ensure token security, there should be a means of ensuring the validity, identity and legitimacy of the token. Similar to [21], we describe token security as follows:
1. Origination: the token is issued by the indicated legitimate authority.
2. Identification: the token identifies the subject, i.e. the entity possessing the token.
3. Validation: the token has not expired, its lifespan is within the validity period or it has passed the validity test.
4. Authentication: the token belongs to the entity presenting it to the IDMS.
5. Authorization: the token grants relevant permissions to the entity possessing it.
The IDMS should provide functionality for checking the above security properties of the token.
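As an illustration of how the taxonomy could support tool-based assessment, the sketch below encodes the factors and parameters of Table 1 as a simple data structure and derives a token's claim type from the number of authentication factors it requires. This is a minimal, hypothetical sketch added for illustration only; it is not part of the original taxonomy, and all identifiers and the helper function are freely chosen assumptions.

```python
# Minimal sketch (assumption, not part of the original work): the taxonomy of
# Table 1 encoded as a mapping from risk contributing factors to parameters.
TAXONOMY = {
    "token_mobility": {"copyable", "remotely usable", "concurrently usable", "immobile"},
    "token_value_at_risk": {"loss", "misuse", "disclosure", "disruption", "theft", "replacement value"},
    "token_provisioning": {"creation", "editing", "deletion"},
    "token_frequency_and_duration_of_use": {"uses per year", "life-time", "multiple times", "one-time"},
    "token_use_and_purpose": {"original", "unintended"},
    "token_assignment_and_relationship": {"forced", "chosen", "jointly-established", "role", "pseudonymity"},
    "token_obligation_and_policy": {"absence", "present", "functionality"},
    "token_claim_type": {"single", "multiple"},
    "token_secrecy": {"public", "inferable", "secret"},
    "token_security": {"origination", "identification", "validation", "authentication", "authorization"},
}

def claim_type(authentication_factors: int) -> str:
    """Classify a token by the number of factors needed to use it.

    Possession of the token itself counts as one factor, so a token with no
    additional secret is a single-claim token; a token plus a PIN is a
    two-factor (multiple-claim) token; token plus PIN plus biometric is a
    three-factor (multiple-claim) token.
    """
    return "single" if authentication_factors <= 1 else "multiple"

# Example: an X509-style token (smart card possession + PIN) is a two-factor,
# i.e. multiple-claim, token.
print(claim_type(2))  # -> "multiple"
```

A real assessment tool would of course need a richer token model and organization-specific judgments for each parameter; the encoding above only fixes the vocabulary.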
4 Applications of Taxonomy
We can fairly estimate the privacy and security risks of an IDMS based on the characteristics of its tokens, as explained in our taxonomy. This section explains the
possible application of our taxonomy for privacy and security risk assessment. We state the possibilities for conducting privacy and security risk assessment based on the characteristics of the tokens used in the IDMS. We proceed in the following steps:
Known Security Risks in IDMS: We examine the IDMS in order to determine all the possible system tokens. We assess how the characteristics of the system tokens may affect the achievement of the information security objectives of the IDMS. That is, we determine which of the token characteristics listed in Table 1 contribute to breaches of information security such as unavailability, loss of confidentiality, repudiation and lack of integrity [28], and in what way they contribute to security risk. For example, we may assess the impact of a multi-factor authentication scheme on the security of the IDMS. We may assess whether the IDMS employs two-factor or three-factor authentication tokens. A two-factor authentication token is simpler but may provide less secure authentication [21]. We also assess how the security of the token contributes to the security risk of the IDMS. In other words, we examine whether the IDMS has a method for checking the security of the tokens. That is, we assess whether the IDMS has a mechanism to check the legitimacy of the token, the validity of the token, the identity of the holder of the token, the authenticity of the token and the level of authorization the token has. Finally, we assign or compute the risk impact of token security, token secrecy, etc. for the IDMS.
Known Privacy Risks in IDMS: Similar to security risk assessment, we examine the IDMS in order to understand the information flow of the system. We examine how the IDMS tokens are designed and used. We assess whether the information flow in the IDMS can contribute to the misuse of personal data, accidental disclosure of personal data, etc. We determine whether these privacy risks are caused by the construction of the IDMS tokens, using the risk contributing factors listed in Table 1. We then determine the impact and the likelihood of the known risk occurring. For example, how will token mobility contribute to accidental disclosure of personal data, and what will be the impact of accidental disclosure on the IDMS?
Stakeholders' Risk in IDMS: Usually tokens are created to reflect the needs of the system stakeholders. However, the lack of an extensive taxonomy of token characteristics to guide this process may contribute to future privacy and security risks of the IDMS. This taxonomy will aid the stakeholders of an IDMS in asking the necessary questions before designing or consenting to use any identity token. Furthermore, the stakeholders can now, before they take an IDMS that was designed for a different purpose into a new application context, thoroughly analyze its properties and the possible risks and consequences. For example, the stakeholders can now assess how an IDMS with high token mobility contributes to function creep. How does function creep affect their interests, such as reputation? Furthermore, how does a multiple-claim authentication token inconvenience the end-user, and how does it protect against identity theft in an IDMS? We can
also examine how token assignment contributes to privacy and security risks from the stakeholders' point of view. For example, user-centric IDMS may allow the end-users to exercise selective disclosure by choosing tokens on their own. This enhances the privacy of the end-user, but the other stakeholders may be deprived of information for auditing and accounting [24], [1].
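To make the assessment steps above concrete, the following sketch shows one possible, deliberately simplistic way of scoring a token profile against a subset of the taxonomy. The chosen factors, the numeric weights and the example token are assumptions made purely for illustration; the taxonomy itself does not prescribe numeric scores or a summation rule.

```python
# Illustrative sketch only: hypothetical per-parameter scores, where a higher
# value means a larger assumed contribution to privacy and security risk.
PARAMETER_SCORES = {
    "token_mobility": {"copyable": 3, "remotely usable": 2, "concurrently usable": 2, "immobile": 0},
    "token_secrecy": {"public": 3, "inferable": 2, "secret": 0},
    "token_claim_type": {"single": 2, "multiple": 0},
    "token_use_and_purpose": {"original": 0, "unintended": 3},
}

def risk_score(token_profile: dict) -> int:
    """Sum the scores of the parameters observed for a token in an IDMS."""
    total = 0
    for factor, parameters in token_profile.items():
        for parameter in parameters:
            total += PARAMETER_SCORES.get(factor, {}).get(parameter, 0)
    return total

# Example: a copyable, publicly known, single-claim token that has drifted
# into unintended use (function creep) accumulates a high score.
example_token = {
    "token_mobility": ["copyable"],
    "token_secrecy": ["public"],
    "token_claim_type": ["single"],
    "token_use_and_purpose": ["unintended"],
}
print(risk_score(example_token))  # -> 11
```

In practice, risk impact and likelihood would be assessed per stakeholder and per information security objective, as described above, rather than collapsed into a single number.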
5 Conclusion
Identity tokens are constructed without extensive assessment of their effect on privacy and security risks in identity management systems (IDMS). Understanding the contribution of tokens to privacy and security risks can aid in managing privacy and security risks in IDMS. This article defines a token and describes how tokens affect privacy and security in IDMS. We introduced a taxonomy of risk contributing factors for IDMS based on the characteristics of tokens, and the application of the taxonomy for privacy and security risk assessment. We explained how the taxonomy can be applied to privacy, security and stakeholders' risks in IDMS. Finally, we showed that tokens are a rich source of privacy and security risk metrics for IDMS and can serve as the basis for a privacy and security risk assessment model. We intend to investigate the development of such a privacy and security risk assessment model based on our taxonomy in future work.
Acknowledgment The work reported in this paper is part of the PETweb II project sponsored by The Research Council of Norway under grant 193030/S10.
References
[1] MICROSOFT CORPORATION: The identity metasystem: Towards a privacy-compliant solution to the challenges of digital identity. White paper, MICROSOFT CORPORATION (2006) [2] Clarke, R.: A sufficiently rich model of identity, authentication and authorisation (2010), http://www.rogerclarke.com/ID/IdModel-1002.html [3] Camenisch, J., Herreweghen, E.V.: Design and implementation of the idemix anonymous credential system (2002) [4] IBM, C.: Overview of token types. Framework document, IBM (2010), http://publib.boulder.ibm.com/infocenter/wasinfo/v6r1/index.jsp?topic=/com.ibm.websphere.express.doc/info/exp/ae/cwbs_tokentype.html [5] Camenisch, J., Lysyanskaya, A.: An efficient system for non-transferable anonymous credentials with optional anonymity revocation. In: Pfitzmann, B. (ed.) EUROCRYPT 2001. LNCS, vol. 2045, pp. 93–118. Springer, Heidelberg (2001) [6] WP3: D3.1: Structured overview on prototypes and concepts of identity management systems. Deliverable 1.1, Future of Identity in the Information Society (2005)
[7] Lutz, D.J., del Campo, R.: Bridging the gap between privacy and security in multi-domain federations with identity tokens. In: 2006 Third Annual International Conference on Mobile and Ubiquitous Systems, pp. 1–3 (2006) [8] Peterson, G.: Introduction to Identity Management Risk Metrics. IEEE Security & Privacy 4(4), 88–91 (2006) [9] Solove, D.: A taxonomy of privacy - GWU Law School Public Law Research Paper No.129. University of Pennsylvania Law Review 154(3), 477 (2006) [10] Office, I.C.: Privacy impact assessment handbook - version 2. Technical report, ICO, London, UK (2009) [11] Lutz, D.J.: Secure aaa by means of identity tokens in next generation mobile environments. In: ICWMC 2007: Proceedings of the Third International Conference on Wireless and Mobile Communications, p. 57. IEEE Computer Society, Washington, DC (2007) [12] Ardagna, C., Bussard, L., De Capitani di Vimercati, S., Neven, G., Paraboschi, S., Pedrini: PrimeLife Policy Language. In: W3C Workshop on Access Control Application Scenarios, Luxembourg (2009) [13] Hansen, M.: Concepts of Privacy-Enhancing Identity Management for Privacy-Enhancing Security Technologies. In: Cas, J. (ed.) PRISE Conference Proceedings: Towards Privacy Enhancing Security Technologies - the Next Steps, Wien, pp. 91–103 (2009) [14] Iwaihara, M., Murakami, K., Ahn, G.-J., Yoshikawa, M.: Risk Evaluation for Personal Identity Management Based on Privacy Attribute Ontology. In: Li, Q., Spaccapietra, S., Yu, E., Olivé, A. (eds.) ER 2008. LNCS, vol. 5231, pp. 183–198. Springer, Heidelberg (2008) [15] WP2: D 2.1: Inventory of topics and clusters. Deliverable 2.0, Future of Identity in the Information Society (2005) [16] Pfitzmann, A., Hansen, M.: Anonymity, Unlinkability, Undetectability, Unobservability, Pseudonymity, and Identity Management, A Consolidated Proposal for Terminology - v0.29 (2007), http://dud.inf.tu-dresden.de/Anon_Terminology.shtml [17] ISACA: The Risk IT Practitioner Guide. ISACA, 3701 Algonquin Road, Suite 1010 Rolling Meadows, IL 60008 USA (2009) ISBN: 978-1-60420-116-1 [18] Bygrave, L.A.: Data Protection Law Approaching Its Rationale, Logic and Limits. Kluwer Law International, Dordrecht (2002) [19] Fritsch, L.: Profiling and Location-Based Services. In: Hildebrandt, M., Gutwirth, S. (eds.) Profiling the European Citizen - Cross-Disciplinary Perspectives, pp. 147–160. Springer, Netherlands (2008) [20] Hansen, M., Schwartz, A., Cooper, A.: Privacy and identity management. IEEE Security and Privacy 6(2), 38–45 (2008) [21] Mac Gregor, W., Dutcher, W., Khan, J.: An Ontology of Identity Credentials - Part 1: Background and Formulation. Technical report, National Institute of Standards and Technology, Gaithersburg, MD, USA (2006) [22] Camenisch, J., Shelat, A., Sommer, D., Fischer-Hübner, S., Hansen, M., Krasemann, H., Lacoste, G., Leenes, R., Tseng, J.: Privacy and identity management for everyone. In: DIM 2005: Proceedings of the 2005 Workshop on Digital Identity Management, pp. 20–27. ACM, New York (2005) [23] Bramhall, P., Hansen, M., Rannenberg, K., Roessler, T.: User-centric identity management: New trends in standardization and regulation. IEEE Security and Privacy 5, 84–87 (2007) [24] Bar-or, O., Thomas, B.: Openid explained (2010), http://openidexplained.com/ (Online; accessed August 18, 2010)
[25] Kruk, S.R., Grzonkowski, S., Gzella, A., Woroniecki, T., Choi, H.C.: D-foaf: Distributed identity management with access rights delegation. In: Mizoguchi, R., Shi, Z.-Z., Giunchiglia, F. (eds.) ASWC 2006. LNCS, vol. 4185, pp. 140–154. Springer, Heidelberg (2006) [26] Mont, M.C., Beato, F.: On parametric obligation policies: Enabling privacy-aware information lifecycle management in enterprises. In: POLICY 2007: Proceedings of the Eighth IEEE International Workshop on Policies for Distributed Systems and Networks, pp. 51–55. IEEE Computer Society, Washington, DC (2007) [27] Anderson, R.J.: Security Engineering: A Guide to Building Dependable Distributed Systems. John Wiley & Sons, Inc., New York (2001) [28] Siponen, M.T., Oinas-Kukkonen, H.: A review of information security issues and respective research contributions. SIGMIS Database 38(1), 60–80 (2007)
ETICA Workshop on Computer Ethics: Exploring Normative Issues Bernd Carsten Stahl and Catherine Flick De Montfort University, U.K., Middlesex University, U.K.
[email protected],
[email protected] Abstract. The ETICA project aims to identify emerging information and communication technologies. These technologies are then analysed and evaluated from an ethical perspective. The aim of this analysis is to suggest possible governance arrangements that will allow paying proactive attention to such ethical issues. During the ETICA workshop at the summer school, participants were asked to choose one of the 11 technologies that ETICA had identified. For each of these technologies there was a detailed description developed by work package 1 of the project. Workshop participants were asked to reflect on the ethical issues they saw as relevant and likely to arise from the technology. This paper discusses the ethical views of the workshop participants and contrasts them with the findings of the ethical analysis within the ETICA project. Keywords: ethics, emerging technologies, privacy, evaluation, norms.
1 Introduction One purpose of combining the ETICA summer school with the PrimeLife and IFIP summer school was to engage with more technically oriented communities and to expose the ETICA findings to an external audience of individuals who had expertise in areas similar to that of ETICA. The main purpose was to understand which ethical issues such experts would identify from the description of the technologies and from their own experience. This paper briefly outlines how the descriptions of the technologies were created and how the ethical analysis within the ETICA project was undertaken. On this basis, the ethical issues of the individual technologies, as developed by workshop participants, were contrasted with those of the ETICA experts in ICT ethics. The paper concludes by discussing the substantial differences between the two sources and reflects on the value of the similarities and differences to the ETICA project.
2 Technologies and Their Ethical Issues ETICA, an EU FP7 research project, funded under the Science in Society funding stream, aims to identify emerging ICTs with a view to identifying and evaluating ethical issues. The results of these investigations will then be used to review and
recommend governance arrangements that will be conducive to proactively addressing such issues. The technologies considered emerging are likely to be developed within the next 10-15 years. To identify the technologies, political, scientific, and commercial reports of research and development of cutting edge technologies were analysed and the key technologies determined. Ethical analysis of these technologies then took place, involving meta-analysis of existing critical ethical analysis of the technologies as well as other analytical techniques, which are discussed briefly below, and in more detail in the ETICA project deliverables. The ETICA workshop described in this paper drew from the descriptions of technologies that were identified. These are available individually on the ETICA website (www.etica-project.eu) and collected as deliverable D.1.2 "Emerging Information and Communication Technologies" in the publication/deliverables section of the website. Future-oriented work always has to contend with a number of conceptual, methodological and epistemological problems (1). In the case of ETICA, conceptual issues were raised concerning the meaning of "emerging", "information", "technology" and most other central terms. Since the future is fundamentally unknown and unknowable, the claims of the ETICA project needed to be carefully scrutinized. Very briefly, the ETICA approach views technologies as at least partially socially constructed and therefore subject to interpretive flexibility. This view of technologies corresponds with those of the Social Study of Technology (SST) or the Social Construction of Technology (2-4). It is also compatible with views of technology in related fields such as Actor Network Theory (5,6). Finer points of the debate, such as the distinction between interpretive and interpretative flexibility, are of less importance here (7). A further problem of prospective studies is that the possible future consequences of any occurrence, action or technology are infinite. The ability to adequately predict the future shrinks with the temporal horizon. ETICA thus had to make a reasonable compromise with regard to the temporal reach of its investigation. The temporal reach of truth claims about emerging technologies is around 15 years. The justification of this temporal horizon is that technologies that will be relevant within this time span are likely to be currently under development. An investigation of current research and development activities should thus give an indication of such technologies. It is important to point out that ETICA positions itself within the field of foresight activities. This means that it does not claim to know the future, but that the findings of the project have the character of outlining possible futures with a view to identifying futures that are desirable and that can be influenced by present action. The aim of foresight activities is not to describe one true future but some or all of the following (8):
• To enlarge the choice of opportunities, to set priorities and to assess impacts and chances
• To prospect for the impacts of current research and technology policy
• To ascertain new needs, new demands and new possibilities as well as new ideas
• To focus selectively on economic, technological, social and ecological areas as well as to start monitoring and detailed research in these fields
• To define desirable and undesirable futures and
• To start and stimulate continuous discussion processes.
This understanding of the aims of foresight fits well with the ETICA project. It renders the entire project feasible because it underlines that there is no claim to a correct description of the future but only one to a plausible investigation of possible options and outcomes. This raises the question of how such a claim to a plausible description of possible futures can be validated: the question of methodology. The identification of emerging ICTs was done by undertaking a discourse analysis of discourses on emerging ICTs. Sources of this discourse analysis were on the one hand governmental and funding publications and on the other hand publications by research institutions. The rationale for the choice of these sources was that they cover the visions and intentions of influential technology R&D funding and at the same time include views on what is happening in R&D organizations. Taken together they should thus give a reasonable view of which technology developments are expected. The analysis of these texts was undertaken by using the following analytical grid, which allowed for the distinction of technologies, application areas and artefacts. In addition it allowed for the early identification of ethical, social, or legal questions as well as technical constraints or capabilities.
Fig. 1. Categories of data analysis (analytical grid). The grid distinguishes Technologies (e.g. future internet, quantum informatics, embedded systems, autonomous ICT), Application Examples (e.g. internet of things, trustworthy ICT, adaptive systems for learning, service robots for elderly, quantum cryptography) and Artefacts (e.g. concurrent tera-device computing, wireless sensor networks, intelligent vehicles, open systems reference architectures, self-aware systems, self-powered autonomous nano-scale devices); all entries (technology, application example, artefact) are defined by Social Impacts, Critical Issues, Capabilities and Constraints.
During the data analysis it became clear that there was going to be a large number of technologies, application examples and artefacts that could be identified. In order to render these manageable and to facilitate ethical analysis and subsequent evaluation, it was decided to group the findings into general high-level technologies. The description of such technologies was meant to illustrate their essence, i.e. the way the technology affects the way humans interact with the world. The analytical grid proved helpful in indicating which of the analysed items were related and thereby pointing to the most pertinent ones. It provided the basis for the identification of top-level technologies for which detailed descriptions were developed. For each of these top-level technologies, the description was constructed on the basis of the data derived from the analysis of discourses but also drawing on additional data. The structure of the technology descriptions was:
• Technology Name
• History and Definitions (from discourse analysis and other sources)
• Defining Features ("essence" of technology, how does it change the way we interact with the world)
• Application Areas / Examples
• Relation to other Technologies
• Critical Issues (ethical, social, legal and related issues as described in the discourse)
• References
The method just described provides a transparent and justifiable way of identifying emerging ICTs for the purpose of foresight, as described earlier. It can nevertheless have blind spots because it relies on interrelated discourses by governments and research institutions. It was therefore decided to use several methods to ensure that the list of technologies was reasonable. These consisted of a set of focus groups with technology users, a survey of technology development project leaders, and a crosscheck with an amalgamated list of technologies from current futures research. The full list of technologies which survived several rounds of review, amalgamation and delimitations is as follows:
Table 1. List of emerging ICTs
• Affective Computing
• Ambient Intelligence
• Artificial Intelligence
• Bioelectronics
• Cloud Computing
• Future Internet
• Human-machine symbiosis
• Neuroelectronics
• Quantum Computing
• Robotics
• Virtual / Augmented Reality
Before coming to the views that workshop participants had on these technologies, it is necessary to say a few words on the way in which they were ethically analysed within ETICA. Very briefly, the ETICA work package 2, which was tasked with undertaking the ethical analysis, chose a descriptive and pluralist approach. This means that, instead of relying on a particular ethical theory, such as Kantian deontology, utilitarianism, virtue ethics or more current approaches such as discourse ethics, the analysis reviewed the literature on ICT ethics and accepted as ethical issues what was presented as such. The term ICT ethics was chosen because it reflects the EU FP7 attention to ICT. The review of ICT ethics was based on the more established fields of computer and information ethics. Computer and information ethics can be understood as that branch of applied ethics which studies and analyzes social and ethical impacts of ICT (9). The more specific term ‘computer ethics’, coined by Walter Maner in the 1970s, refers to applying normative theories such as utilitarianism, Kantianism, or virtue ethics to particular ethical cases that involve computer systems or networks. Computer ethics is also used to refer to professional ethics for computer professionals, such as codes of conduct that can be used as guidelines for an ethical case. In 1985, Jim Moor (10) and Deborah Johnson (11) published seminal papers that helped define the field. From then on, it has been recognized as an established field in applied ethics, with its own journals, conferences, research centres and professional organisations. More recently, computer ethics has been related to information ethics, a more general field which includes computer ethics, media ethics, library ethics, and bioinformation ethics (12). For contemporary overviews of the field, see Floridi (13); Tavani & Himma (14); and Van den Hoven & Weckert (15). The field of computer ethics is also institutionalised inside applied ethics, e.g. in the International Society for Ethics and Information Technology (INSEIT), and outside applied ethics, for example in several working groups of organisations for IT professionals such as the Association for Computing Machinery (ACM), the International Federation for Information Processing (IFIP) and national professional organisations across the globe. The approach taken was to mine the literature on ICT ethics, initially using a bibliometric tool that showed proximity between terms in the literature. Using this tool as a starting point, a detailed review of all technologies was undertaken. Where the ICT ethics literature did not show any relevant issues, literature from adjacent fields was used. The primary ethical analysis was based on the defining features of the technology. Where these did not lend themselves to ethical analysis, the application examples were considered. Overall, the aim was to identify a broad spectrum of ethical concerns that can be used by researchers, funders or policy makers to ensure that ethical issues are addressed early and appropriately. For a more detailed description of the identification of technologies and their ethical consequences see (16). Having now outlined the way in which the technologies were identified and their ethical analysis was undertaken, the next step is to compare these ethical analyses with the findings and intuitions of the summer school workshop.
3 Findings This section contrasts the views on the ethics of emerging ICTs of participants in the workshop with the findings of the ETICA project. For each of the technologies
discussed, the paper first summarizes the main points made by the workshop participants. These were captured and transcribed after the workshop. In a second step we then outline the ETICA findings. These are summarized from the ethical analysis that has been published as deliverable D.2.2, Normative Issues Report, and is available from the "deliverables" section under "publications" of the ETICA website (www.etica-project.eu). The technology descriptions that underlie the following discussions are also available from the same source in deliverable D.1.2, Emerging Information and Communication Technologies Report.
3.1 Neuroelectronics The participants in the workshop were mainly concerned by the effect that neuroelectronics would have on human identity and dignity. Issues such as the potential for invasion of psychological processes or accidental (or intentional) changes of personality were raised, with some concern about neuroelectronics being used as “advanced lie detectors”, bringing with it problems associated with what makes something the truth, and causing power imbalances. Participants were also concerned with the idea of the “reification” of human beings, i.e. the reduction of humanity to its brain functions. The most pressing issues for the workshop, however, were the issues of informed consent and whether one can fully understand what’s happening with neuroelectronics, and the misuse of such technology in areas such as intentional changes of personality or advanced torture methods. These concerns are well in line with ETICA’s analysis of neuroelectronics, which raises issues of responsibility and liability for the potential harm caused through neuroelectronics. The analysis also touches on the issues of agency and autonomy, which, it argues, could be altered. Not only might cognitive enhancements amplify autonomous capacity, but this new-found ability to know one’s self may promote further human autonomy through further control over one’s mind. However, this could also lead to a “God complex” which could end up being detrimental to agency and autonomy. Informed consent is also considered a major issue, with consent for the mentally ill or convicted criminals a particularly difficult problem. Although the idea of a “lie detector” is not mentioned in the ETICA analysis, there is some analysis of the issues of brain image processing, which could be used for such a purpose. Of particular note is the idea of “potential crimes”, in which this sort of technology could be used for identifying “criminal thoughts”. The question of whether one could be held accountable for such thoughts as if they were a real crime is another difficult issue. Similarly difficult is the interpretation of data retrieved from neuroimaging. The ETICA analysis does not include the issues of torture or personality change specifically, although it does suggest that there could be significant impacts of peer pressure, advertising, and other social effects (enhancement of cognitive skills, overcoming disabilities, use as a “technological fix” to deeper social or personal problems) that could be considered factors that might contribute to personality change.
3.2 Affective Computing Affective computing is technology that aims to achieve emotional cognition through simulation, recognition, and/or realization of emotions. In the workshop, the participants were concerned about the accuracy of behaviour interpretation. What sort of information is being used to determine the response of technology such as a robot that “learns” your emotions? Participants were also concerned that a society in which robots “learn” how best to deal with people could become accustomed to obsequiousness. Additionally, they were worried about who might have access to these patterns of behaviour, and whether they might be useful for a third party, such as business, law enforcement, etc. Other concerns were voiced about the social side of this sort of computing: whether it was appropriate for, say, a computer gaming community to know what emotions a player is experiencing, and whether it might affect the play of other members. The issue of allowing gaming companies to know what emotions were being felt during play of their games arose as well: this could be used to identify the most addictive parts of games, or the sorts of people who might become addicted to games, which could seriously affect a person’s autonomy. Finally, the participants raised the question of who ultimately benefits from this sort of information: although there can be benefits to many people, is it worth the trade-off in terms of privacy, identity, and autonomy? The ETICA analysis reflects the problems of persuasion and coercion, taking them further than the problem of addiction identification and into the realm of manipulation, which can cause people to change their behaviour significantly. It also confronts the issue of identity in a realm of emotive robots, especially the fact that technology that is convincing in its ability to interpret emotions could cause the user to have unrealistic expectations of it. Privacy was also mentioned in the ETICA analysis, echoing the sentiment of the workshop participants by discussing issues such as mass databases of affective information, and the relaying of personal affective information over the internet to third parties. There was no discussion about benefits or trade-offs specifically, nor about cheating in online games, although the former is mentioned within the specific examples, and the latter is a more general issue that is already found in present gaming situations.
3.3 Bioelectronics The description of bioelectronics raised some more existential questions amongst participants. One of the biggest issues was the question of what it is to be human. Since bioelectronics can, like neuroelectronics, enhance natural human abilities, there was a concern that it could slowly shift the idea of what a “normal human” could be. Equally concerning was the dehumanizing potential of the technology and the possibility of immortality through bioelectronic techniques (bypassing the normal aging process). There was also serious potential for misuse of the technology, through things like surveillance and remote control of human actions. These raise the issues of privacy and security of people and their data, and the informed consent that might need to be given to allow this sort of technology to be “installed”. Finally there were concerns
about how this technology could increase the digital divide: bioelectronics that enhance human functions would probably only be available at first to the rich, and could then give those with access to it an unfair advantage in life. ETICA found that many of the issues related to bioelectronics are similar to those of neuroelectronics. Furthermore, there is relatively little literature specifically on bioelectronics. For an ethical analysis, there is also overlap with human machine symbiosis. The main issues identified by ETICA are safety, risk, and informed consent; anxieties about the psychological impacts of enhancing human nature; worries about possible usage in children; the digital divide, privacy and autonomy.
3.4 Robotics In the discussion about robotics, the distinction between social and ethical issues was raised. The particular examples (human performance, military, companion robots) were discussed separately, with different issues considered important for each application. The first application (human performance improvement) raised the issue of the digital divide as a social issue, considering that, as with bioelectronics, the sort of technology that would be developed would be quite expensive to make, and so only be open to the rich (at first). The ethical issues that were raised were the questions of why this sort of technology might be developed at all, and what kinds of performance might be improved. The concern here seemed to be about equality of people, and what this entails in terms of human enhancement through robotics. As for the military application, the issue of “swarms” of autonomous robots was discussed. Issues considered social by the participants included the potential for environmental damage by these robots, and how much control was had over the robots. The issue of control led to the ethical issues of responsibility for the robots’ actions and what sorts of decisions the robots could potentially make: could they be trained to kill? And if so, how reliable would they be at identifying the correct targets? Another issue raised was that of “remote controlled killing”, where the real people involved in the war could just “show up at the office” potentially anywhere in the world and control the outcomes on the battlefield. However, one more positive suggestion was the idea of remote medical administration, but even that raised issues: do we use the robots just to save “our side”, or civilians, or who? And then, who is responsible for these decisions, and how detached are they from what’s actually happening? Finally, the companion robot raised several issues, mainly the social issues of the digital divide, the sorts of interactions one could have with the robots, and the robots’ rights, particularly when it comes to applications such as “sex bots”. The main ethical issue identified was responsibility, particularly as it concerns the use of these sorts of robots, the programming that goes into them to “learn”, and what happens when something goes wrong. Robotics, as a research field that has been established for several decades, has created a significant amount of literature on ethical aspects. In the ETICA analysis, a first point raised was that of privacy, which can be threatened by the mobility of robots, which gives them new capabilities of collecting data. Robots can contribute to tele-presence, which can be morally desirable, for example where people are removed
from danger, but can also lead to social exclusion. A core issue is that of robot autonomy. While this is linked to the difficult philosophical question of what constitutes autonomy, it also has knock-on ethical consequences such as the responsibility or liability of robots in case of problematic behaviour, or the question of whether robots can or should be made to behave in an ethically acceptable manner. This in turn is directly linked with the question of whether, and at what stage, robots should be the subject of ethical or legal rights. A further ethically challenging development is that of a possible competition between humans and robots and robots possibly overtaking humans. This is linked to the question of the social consequences of large-scale use of robots, which promises to raise particular issues if robots become very similar to humans. The competition between humans and robots is, however, not limited to futuristic and autonomous human-like robots but can be observed at present, for example when robots take over human work and thereby cause unemployment.
3.5 Quantum Computing The quantum computing discussion was also quite existential, looking back to some of the more fundamental questions of humanity and technology. Questions like “what is reality?”, “What is it to be human?”, “What is space?” and “What is consciousness?” were brought up as a response to the technical potential of this technology. The discussion also touched on what sorts of decisions computers should be making, and what sorts of applications quantum computing might have in reality. However, the issues of motivation for the development of quantum computing were also examined: could it be used for the common good, dealing with large amounts of data for public health threats, pandemics, or mass movement of people? Could it be used to “save humanity” by allowing for space travel (tele-transportation)? Or does it only have an economic drive behind it? One participant suggested that because quantum computing is a scientific instrument, there could be no real social or ethical issues, although some of the potential applications, it was argued, could have some issues. Such applications, like quantum memory, also caused the participants to question their ideas of what memory is, as well as how this sort of memory could be used. There were concerns about the possibility of quantum computing circumventing the security and encryption mechanisms in contemporary use. This was considered a social issue, in that it would mean that people would be concerned about the security of data and whether encryption is working. A quantum information network, with data “appearing in the moment”, was considered a double-edged sword, since it could be used for the common good (such as in pandemics, etc.) but also had a strong economic imperative for data-mining purposes. The ETICA ethical analysis of quantum computing suffered from the fact that little is known about quantum computing at the moment and practical applications are difficult to discern. Due to the sub-atomic scale of aspects of quantum computing, it may raise concerns that are similar to those of nanotechnology, which, however, is outside of the ICT-related scope of ETICA. Otherwise quantum computing is often portrayed as a qualitative and quantitative improvement of current computing. It can thus contribute to the exacerbation of established ethical issues of computing. One of the
few applications that are discussed in the literature on quantum computing is that of encryption. It is sometimes speculated that quantum computers could render current encryption methods redundant and would require new methods of encryption. Corresponding and resulting issues would then be those of security and information assurance, but possibly also questions of freedom of speech or censorship. In addition to such applied questions, the ontological nature of quantum computing and its ability to link matter-like and idea-like things may change our perceptions of reality and also of ethics.
3.6 Future Internet In the discussion on the future internet (which included things like the Internet of Things), only a few issues were raised as being emergent, particularly the logistics of and responsibility for monitoring of (for example) health conditions (especially rapid response), which would be possible with the technology, and the delegation of decision making, that is, whether it is a computer or a person who makes decisions about the sorts of responses necessary. Also identified was the issue of user autonomy: with so many decisions being made for you by machines or externally, it could take away the autonomy of users who would otherwise need to make the decisions themselves. Finally, the participants identified a need for balancing these pros and cons of future internet technology, since there could be many on both sides: machines making decisions are cheaper, faster, and could be more effective, but without the proper checks and balances they could allow people to “slip through the gaps”. The Future Internet, with its components of the internet of things, the semantic web and cognitive networks, was seen by ETICA as raising a number of potential issues. Incorporating meaning into internet structures by adding to available data and meta data can raise issues of privacy and data protection. Resulting questions can arise concerning trust and acceptance of technologies. New capabilities can lead to further problems of digital divides and intellectual property. This includes the question of openness and regulations of new networks and infrastructures. A final important thought was that of the sustainability implications of the future internet and its projected increase in energy consumption.
3.7 Cloud Computing The description of cloud computing was considered not entirely accurate, but even so some issues were raised about the information within the technology description. For example, participants discussed the problem of providers not telling users what their data is being used for: email providers might be reading email (through sophisticated programs), and this might be used to profile the user so that ads can target them better. However, the mechanisms and use of the data are not usually explained very well to the users. Profiling, it was noted, is a widespread general problem with cloud computing. Since the providers have access to a lot of data from lots of individual sources, it becomes very easy for an economic incentive to inspire use of that data, even if it becomes anonymised. Everything from location information (which could be sensitive) to what is written in a document or photographs taken
could be used to discover the habits and profile of the user. Even anonymisation is not foolproof, since much of the data within these sorts of documents could be identifying in its content. One of the other big issues of cloud computing is that of provider lock-in. Since providers provide a service, users have a significant incentive to continue to pay for the use of that service, particularly if the service makes it difficult for them to export their data in easy-to-read or standard open formats. In some cases it could be difficult to “quit” a service contract, since you run the risk of losing data. Also, in a related issue, if you wanted to actually delete the data from your cloud computing account, it could be very difficult to ensure it is fully gone. Computer service companies make extensive backups, for example, and it could be impossible to remove data permanently from the cloud. Some other issues involve security of data: large quantities of data and possibly personal information are particularly interesting to black-hat hackers who might want to sell this information for nefarious purposes. Also, the lack of physical access to and control of the machines on which the data resides was brought up as a potential issue. Finally, applications and software being outsourced to the cloud raise some potentially serious legal issues regarding jurisdiction and intellectual property ownership of data. The ETICA analysis of cloud computing, which to some degree is an existing technology, covered similar aspects to those raised by the workshop participants. A core issue is that of control and responsibility. If data or processes are moved to the cloud, then who has control over them and who is responsible for what happens to them? This is related to the “problem of the many hands”, which stands for the difficulty of attributing causal relationships and thus responsibility to individuals. This lack of control means that users of cloud services may lose their autonomy, or at least their ability to make informed choices. Such problems cover the issue of ownership, which is often difficult to clearly delimit in cloud applications. Cloud computing thus raises the spectre of monopolies and user lock-in. Once data is part of a cloud it is hard to avoid its use for different purposes, as function creep is difficult to predict or control in clouds. This adds to the concerns about privacy. Such concerns are particularly virulent in clouds, which, as global technical systems, will find it difficult to address culturally or locally specific concerns.
3.8 Virtual and Augmented Reality The last technology discussed in the workshop was that of virtual and augmented reality. The participants were particularly concerned about crime in virtual reality settings. For example, they wondered whether avatars could commit crimes. One participant noted that avatars can make legal contracts in current VR settings (Second Life, Massively Multiplayer Online Games, etc.). The question of whether sex in virtual worlds would be considered adultery was also raised, along with other legal and illegal sexual activity occurring online, the famous “rape in cyberspace” being mentioned as an example of real-world sexual occurrences that can also happen within virtual reality systems. The workshop participants concluded that perhaps a redefinition of crime could be required within virtual reality settings.
However, more advanced and more immersive virtual reality technology caused the participants to wonder whether it would be ethical to raise a child entirely within a virtual reality world, or whether people should be allowed to spend most of their time in virtual reality. The question of whether prisoners could be put into VR was also brought up, given the potential effect on the person within the virtual world. Another problem was that of the digital divide. Like so much technology, much of it is expensive and thus more likely to be used by the rich before the poor. The participants were also concerned about “Matrix”-like scenarios, where the population could “live” unknowingly in a virtual reality while being “farmed” for their resources in reality. This dehumanization and loss of dignity seemed to be important issues for the participants. The ETICA analysis highlighted similar issues. An initial concern was the relationship between VR / AR and well-being. This touches on the question of whether positive experiences that are conducive to a good life can be had in a non-real environment. Virtual or augmented realities may tempt individuals to escape from real challenges or responsibilities. Moreover, such artificial environments may have harmful consequences for users. Such harm may be psychological (e.g. addiction) or physical (e.g. motion sickness). Users may furthermore find it difficult to distinguish between “real” and “virtual”, raising practical problems but also philosophical questions about the nature of this division. Many of the VR / AR applications are in the area of gaming, which raises ethical concerns about the violence that can be found in many such games. A corollary to this problem is the question of the relevance and ethical evaluation of virtual harm or virtual immorality (e.g. virtual murder, virtual child sexual abuse). Digital divides may again arise as ethical problems due to the inequity of access to VR / AR technologies. Immersion in virtual environments can lead to the question of the autonomy of the user and their ability to control their environment. The numerous issues surrounding VR / AR raise difficult questions concerning the responsibility of designers and producers of such devices.
4 Conclusion This comparison of workshop participants' perceptions and ETICA analysis shows that there is a significant amount of overlap. In some cases ethical issues of emerging ICTs are already widely discussed and people are aware of different positions. Some ethical issues are recurrent and already subject to regulation. Notable examples of this include privacy/data protection and intellectual property. Other problems are less obvious or less widely discussed. Many of them raise fundamental philosophical issues concerning the question of what we believe to be real and good, how we come to such judgements and how societies as a whole develop their views on this. Among the interesting differences between the summer school workshop participants and the ETICA analysis is the amount of attention paid to particular application examples. The technology descriptions used as primers for the exercise all contained approximately five different application examples used to allow the development or deduction of core features, which are also then listed in the descriptions. The workshop
participants were mostly drawn to these applications and based much of their discussion on them. The ETICA analysis, on the other hand, aimed to base the ethical analysis on the more general defining features. As a result many of the issues discussed in the workshop referred to specific applications, e.g. robots in healthcare or in the military. Because the overarching theme of the summer school was privacy, this topic was much on the participants’ minds; this is reflected in the discussion of the technologies. Since data collection and sharing are often the focus or a side effect of many emerging technologies, this is unsurprising, but it also reflects the particular concern for privacy that the participants had. From the perspective of ETICA the exercise showed that the ethical analyses are compatible with what a set of educated lay people would see. It also raises the question of how more detailed and application-oriented analyses could be introduced into ethical discussions. The workshop can therefore be seen as a success. Acknowledgments. The ETICA project (http://www.etica-project.eu) is funded by the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreement #230318.
Contextualised Concerns: The Online Privacy Attitudes of Young Adults
Michael Dowd
School of ESPaCH, The University of Salford, Crescent House, Salford, Greater Manchester, M5 4WT
[email protected] Abstract. Existing research into online privacy attitudes, whilst useful, remains insufficient. This paper begins by outlining the shortcomings of this existing research before offering a fresh approach which is inspired by Solove’s notion of “situated and dynamic” privacy. With reference to ongoing PhD research it is argued that the generation of rich, situated data can help us to understand privacy attitudes in context. In this research semi-structured interviews are being used in order to grasp how young adults understand, manage, and negotiate their privacy across online settings. The paper concludes with a call for further qualitative research into online privacy attitudes and suggests focusing on more niche online settings than Facebook. Keywords: Online privacy, Young adults, Qualitative.
1 Introduction As technological advancements are made, new means of both violating and protecting privacy may emerge, and this is why technological developments are often met with concerns and questions relating to privacy. Indeed, the most widely cited text in privacy literature, Warren and Brandeis’ landmark 1890 publication “The Right to Privacy” [1], was concerned with the “recent inventions” of instantaneous photographs and advanced printing technologies, which were described as threatening to “…make good the prediction that ‘what is whispered in the closet shall be proclaimed from the house-tops.’” [1]. It should come as little surprise then that the rapid evolution of the internet has been accompanied by increasing interest in its impact on privacy, as is illustrated both by the current focus of media attention on the topic [2, 3, 4] and also the growing body of academic research into a wide range of issues relating to online privacy. The existing research into online privacy emanates from a broad range of disciplines including law, computer science, business, media and various others. It is important for any work in this area to explicate its disciplinary background because research from these diverse disciplines of course focuses on varying aspects of online privacy, e.g. computer scientists tend to be concerned with the technical means of protecting privacy online, whilst business researchers are usually interested in the difficulties online privacy concerns can pose for e-marketing. This paper is written
from a sociological perspective and is concerned with people’s attitudes, understandings, behaviours and concerns relating to online privacy. There is a useful body of existing research into online privacy attitudes but, as will be explained, it remains insufficient and so a fresh research approach will be offered here. This paper will present ongoing PhD research into the online privacy attitudes of young adults in support of the view that context is vitally important in understanding privacy attitudes. It will be shown that because of this it can be very fruitful for researchers to take an inductive research approach, generating situated data which can help us to understand privacy attitudes in particular contexts. It will be argued that generating data of this nature, and so offering insights beyond those offered by previous survey research, is the real value of social science in researching online privacy. In order to achieve these aims the paper will be structured in the following way: firstly, the existing online privacy literature which has informed this research will be summarised and its influence clarified; secondly, the research methods being used here will be described and the rationale for their selection justified; and thirdly, interim findings from the PhD research will be cautiously presented in order to demonstrate both the significance of the research, and also the nature of the data being generated.
2 Existing Research The existing literature relating to online privacy attitudes can be divided largely into two distinct categories with contrasting research approaches. Firstly, there is a significant body of quantitative, survey based research which comes generally from a business and marketing perspective. In the past this research had tended to focus on issues of consumer privacy [5, 6, 7, 8, 9] and so was concerned with commercial, rather than social, activities, the “instrumental”, rather than the “expressive”, internet [10]. This trend, however, has faded away in recent years with the publication of quantitative research into social networking sites [11]. Secondly, there is a growing amount of relevant qualitative research exploring the use of social networking sites. This research has developed in response to the rapid growth in popularity of such sites [12] and, whilst not always focused specifically on privacy, it does make important contributions to understanding the online privacy attitudes of social networking site users. Survey questionnaires have been by far the most popular method used for researching online privacy attitudes and so we will begin by summarising the key survey findings which have informed this research. 2.1 Survey Research There are two prominent themes which have emerged from the existing survey literature and inform the PhD research presented here: the “privacy paradox”, and “determinant factors”. The “privacy paradox” is the apparent disconnect, or even contradiction, between reported privacy attitudes and actual behaviours. Surveys have found that users, perhaps particularly young adults [13], report themselves to be very concerned about their online privacy and the flow of their personal information, yet
upon examining their behaviour it seems they freely share personal information and either do not engage, or do not engage effectively, with privacy settings on social networking sites [14]. The “privacy paradox” has been evident in surveys of both social [13, 14] and commercial [15] aspects of internet use. A number of possible explanations for the “privacy paradox” have been suggested but none have been widely accepted. Stenger and Coutant have contended that users may indeed be concerned about their online privacy but lack the technical skills and understanding required to protect it [16]. Albrechtslund has outlined the “moral panic” perspective: that users actually have a complete lack of interest in personal privacy [17]. It has also been suggested that perhaps “optimistic bias” is at play, whereby users are concerned about privacy on a societal level but do not consider themselves to be vulnerable and so do not feel the need to actively protect their privacy [18]. “Determinant factors” are demographic variables which have been found to apparently influence online privacy attitudes. Survey research has identified a number of apparent “determinant factors” including: gender, age and level of education. Gender is perhaps the most prominent of these as women have repeatedly and consistently been found to be more concerned about their online privacy than men [6, 19, 20, 21, 22, 23]. These gendered findings have also been supported by studies focused on adolescents [7, 24]. Age too has been repeatedly identified as a significant factor, with older people found to be generally more concerned about their online privacy than younger people [5, 9, 20], although it is important to recognise that, despite these apparent patterns, levels of privacy concern vary considerably within age cohorts as well as across them [25]. Numerous studies have found that higher levels of education correspond with higher levels of online privacy concern [8, 26] and this is supported by research which has found that those with higher levels of education are more likely to adopt privacy protective behaviours [9]. The robustness of these findings has been demonstrated by the consistency with which they have been replicated; however there remains a need to further explore these “determinant factors” in order to understand how and why they have an apparent influence. This is particularly true in the case of gender because, despite the consistent survey findings, there has been no sustained exploration of why women may be more concerned about their online privacy than men. The explanations suggested in the literature have merely been weakly extrapolated from research in other areas. For instance, Moscardelli and Divine [7] have argued that women are more concerned about their online privacy than men because they have a greater fear of danger citing research into the fear of crime. Garbarino and Strahilevitz [6] have argued that women are more concerned because they are more risk averse generally, citing research from a variety of other domains to support this contention. These explanations do not have any empirical link to the research conducted and are completely unsatisfactory. What is problematic about using survey research to further explore the “privacy paradox” and “determinant factors” is that the data generated is abstract and not rooted in the lived experiences of the research participants. 
Quantitative research is useful for identifying patterns but in order to insightfully explore these patterns, to understand how and why “determinant factors” such as gender might have an influence on privacy attitudes, and to better grasp the “privacy paradox”, we contend that a qualitative approach is required. By adopting a qualitative approach the
research can be inductive: a weakness of this existing body of research is its ‘top-down’ approach, relying on predetermined definitions of privacy built into pre-coded surveys. This denies participants the opportunity to express their own understandings of privacy and so ignores individuals’ ability to make, modify and define privacy in their own terms, in different ways across contexts. By working inductively, understandings can be allowed to emerge through the generation of situated, contextualised data so that credible explanations, rooted in the lived experiences of participants, can be offered. 2.2 Qualitative Research The emerging body of qualitative research into social networking sites makes use of ethnographic methods [27, 28, 29, 30, 31] and, although the specific research focus is not necessarily on privacy issues, the rich and contextualised data generated in these studies offers valuable insights into how users understand and manage their privacy in these online spaces. Perhaps the central finding of the existing qualitative literature is that for young people online: “…“privacy” is not a singular variable” [25]; instead, different types of information are considered more or less private depending on who may have access to that information [30]. For instance, individuals may be content to share certain information with their friends but not with their parents, with potential romantic partners but not with employers and so on. The work of both Livingstone [29, 30] and boyd [27, 32] makes clear that online settings, such as social networking sites, are not conceptualised by young people as straightforwardly “public” or “private” in a binary way, but instead distinctions are made based upon their ‘imagined audience’ [27] and the affordances of the particular online setting [30]. What “privacy” means and what is “private” online is context dependent and so understandings and negotiations of privacy can vary across online settings. These findings demonstrate the need for research into online privacy attitudes to generate data which is situated rather than abstract because privacy attitudes are contextual. These ethnographic studies have also illuminated a variety of the ways in which users manage their privacy in these online spaces. boyd describes a variety of privacy protective behaviours which users sometimes adopt in order to retain control of who has access to their social networking profiles, in an attempt to manage ‘social convergence’ [32]: falsifying identifying information such as age and location; engaging with privacy settings; and creating “mirror profiles” for instance [27]. More recent research into social networking sites has uncovered further means of managing privacy in these settings such as “wall cleaning” and creating aliases [28]. The point here is that the nature of the data generated by these qualitative studies enables researchers to develop more granular and sophisticated understandings of how privacy is conceptualised and managed by users in the social networking sites studied. The shortcoming of the existing qualitative research into social networking sites is that thus far it has remained focused on specific online activities and sites as chosen by the researcher, largely Facebook. This means that the ways in which understandings of privacy may alter, or remain constant, across online settings have not been explored and also that more niche online settings are being neglected.
Nonetheless, the illuminating findings of this body of qualitative research and the nature of the data generated have, to some extent, inspired the methods employed in the PhD research presented here.
3 Conceptualising Privacy Research into privacy attitudes is often presented with a lengthy discussion of the specific privacy definition being utilised in that particular paper. This research, however, does not rely on any predetermined, overarching conception of privacy but instead takes an approach which builds on the work of Solove [33] who proposed attempting to understand privacy contextually rather than in abstract. Privacy is both historically and culturally contingent [34], and has been recognised as being an especially ‘elastic’ concept [35], as is apparent from the vast number of ways in which it has been conceptualised. Solove has argued that the bewildering array of existing privacy conceptions can be dealt with in six categories: the right to be let alone; limited access to the self; secrecy; control over personal information; personhood; and intimacy [33]. These categories are not mutually exclusive, as there is overlap between conceptions, but Solove uses them as a means of critically examining the overall discourse on conceptualising privacy. Solove concludes that existing conceptions, whilst illuminating in relation to certain aspects of privacy, are either too narrow or too broad due to their attempts to isolate the “core” characteristics, or abstract “essence” of privacy [33]. Solove advocates a new and pragmatic approach to conceptualising privacy inspired by Wittgenstein’s notion of ‘family resemblances’. Solove contends that it would be more useful to attempt to understand privacy as an aspect of everyday practices, by focusing on specific privacy problems, rather than attempting to isolate the “core” characteristics of privacy in order to understand it as an abstract, overarching concept. In doing this, privacy can be conceptualised: “…from the bottom up rather than the top down, from particular contexts rather than in the abstract” [33], so recognising the: “…contextual and dynamic nature of privacy” [33]. This conceptual approach is consistent with our contention that the data generated by research into privacy attitudes must be situated rather than abstract and underpins the methodological decisions outlined in the following section.
4 Method 4.1 Research Sample The decision to focus on the online privacy attitudes of young adults was made for a number of reasons. Firstly, online privacy is considered to be an especially important issue for young people as they will be: “…the first to experience the aggregated effect of living a digital mediated life.” [25]. Secondly, today’s young adults are widely recognised as being part of the first generation to grow up immersed in digital technology [36, 37, 38] and so may be expected to have online experiences distinct from other demographic groups. Thirdly, much of the existing research into online privacy is focused on children [29, 30, 39] and is often motivated by the public’s desire to protect the young and ‘vulnerable’ [39]. This research aims to recognise its participants not as “youth”, “children” or “adolescents” in need of protection, but as young adults, responsible for themselves both online and offline. “Young adults” have been defined as those born between 1990 and 1994 because the formative years of these people have coincided with the widespread integration of
online communication and social media into everyday life: these young adults would have been aged between 5 and 9 when the instant messaging service “MSN” was launched in 1999; and between 8 and 12 when Friendster and MySpace heralded the arrival of mainstream social networking sites in 2002 [12]. In an English context this means that even before these people began studying at secondary school, aged 11, instant messaging had become everyday and widespread [40], with social networking sites not far behind. The experiences of these young adults then may be distinct even from those only slightly older. The sampling is purposive: participants are being recruited in an attempt to access a diversity of experience and attitudes [41]. Sampling has been informed by the existing literature which suggests gender and levels of education influence online privacy attitudes. For this reason, both men and women from a variety of educational backgrounds are being recruited e.g. college students; university students; vocational apprentices; those in employment; those not in education, employment or training. An aim underpinning the work is to explore understandings of privacy across online settings, including those which have previously been under researched, and this is why participants are being recruited from diverse online settings such as Last.fm and Foursquare. Offline recruitment of research participants has taken place via contacts in educational institutions and community groups. 4.2 Research Method Inspired by the rich, insightful data generated in ethnographic studies of social networking sites [27, 28] this research is using a qualitative approach to explore how young adults understand, manage and negotiate their privacy online. This approach is suitable because the interest is in understanding the participants, their practices, and attitudes rather than in measuring or quantifying them [42]. Adopting a qualitative approach also allows us to work inductively: rather than limiting the research by imposing predefined, overarching definitions of privacy we instead allow understandings to emerge from the participants’ own accounts. This enables us to conceptualise privacy contextually, as Solove has argued in favour of. Semi-structured interviews are being conducted in order to generate situated and contextualised data: rather than asking all participants the same questions, participants are deliberately enabled to help shape discussions so ensuring that interviews focus on activities and issues which are familiar and relevant to them. This means that we do not limit the online settings which can be discussed: we focus on whichever online activities are most important to the participant. In exploring a variety of online settings the aim is to be able to make cross-contextual comparisons. This is important because as Jennifer Mason, a leading authority on qualitative research, has argued: “…instead of asking abstract questions, or taking a ‘one-size-fits-all’ structured approach, you may want to give maximum opportunity for the construction of contextual knowledge by focusing on relevant specifics in each interview…The point really is that if what you are interested in, ontologically and epistemologically speaking, is for example a social process which operates situationally, then you will need to ask situational rather than abstract questions.” [43]
An interview guide is composed prior to interviews but its structure is not rigid: interviews are interpersonal events [44] in which both the interviewer and interviewee are active, co-constructing knowledge [41], and so each interview is inevitably and desirably different. Field notes are made after each interview noting both theoretical and methodological reflections. These field notes are later read in conjunction with the interview transcriptions in order to retain as much of the interview context as possible when coding and analysing the data. The environments in which interviews are conducted vary but it is always ensured that a laptop with internet access is available so that participants are able to illustrate points with online examples if they wish to do so. At this point interviews have been conducted with 15 participants. Interviews with further participants are currently being scheduled with recruitment continuing both online and offline. There are also plans for follow up interviews with some of the existing participants in order to further pursue specific issues raised in their initial interviews.
5 Interim Findings It must be stressed that the findings cautiously presented here are only preliminary: research is continuing and data analysis is iterative and ongoing. It is intended that these preliminary findings begin to demonstrate the value of taking an inductive approach and generating rich, situated data when researching online privacy attitudes. The names of all participants are pseudonyms for reasons of confidentiality. A broad range of online activities have been discussed in the interviews conducted so far, including: social networking, instant messaging, online gaming, blogging, online shopping, online banking, file sharing, and participation in interest driven online communities. The specific online settings which have been discussed in-depth have so far included: Last.fm, Facebook, MySpace, Bebo, Chat Roulette, “MSN”, Mousebreaker, Amazon, eBay, YouTube, various dating websites and a number of chat forums. Initial findings have indicated that all participants so far consider themselves to be techno-savvy internet users who are confident in their own abilities to manage their privacy online. Often participants have explicitly linked this confidence to technical skills, for example ‘Frank’ (Gender: male. Age: 17. Occupation: toolmaker) explained that: “I got an ‘A’ in ICT so I know most stuff about computers and the internet”. Of course, how well placed this self-confidence actually is remains open to question. Related to this self-confidence has been an advocacy for personal responsibility in managing privacy online: often breaches of privacy, and general negative online experiences, have been characterised as being at least partially the fault of the victim. Breaches of online privacy have been described as resulting from the victim’s own reckless behaviour, immaturity or lack of technical knowledge, as ‘Luke’ (Gender: male. Age: 17. Occupation: apprentice maintenance technician) phrased it: “It’s just what you get yourself into”. The most prominent concerns expressed have been related to being deceived online, essentially the concern being that: “People online are not who they present themselves to be”. Interestingly though this has not been in relation to e-commerce:
concerns that online vendors are not reputable or genuine for instance, but instead it has been on a more social level. The kinds of deception which concern participants occur not during online transactions but through online interactions, whether that be via instant messaging, social networking sites or any other form of online communication. There has been a perceived distinction between being deceived by strangers, and being deceived by known parties. Being deceived by strangers has been associated with a physical threat from the “paedos”, “creeps” and “weirdoes” ubiquitous in media reports [45, 46]. Whilst being deceived by known parties has been associated with the emotional threat of being embarrassed or humiliated, be it playfully by friends or maliciously by former partners seeking revenge for instance. Some participants explained that they were wary of deception as they themselves had deceived people online. For example, ‘Kirsten’ (Gender: female. Age: 16. Occupation: engineering student) explained how she had duped a female friend into believing an enamoured male had made contact with her: “I said, ‘This lad fancies you’, so I made an email address”. Participants have also recognised ‘identity theft’ as a concern, but not in the sense of financial fraud as the term is commonly used. Instead, the participants have expressed concerns about being impersonated online, perhaps via a convincing social networking profile or instant messaging account, and having their social identity stolen and, subsequently, their reputation damaged. A number of participants explained that this was the main reason they engaged with the privacy settings on social networking sites: to ensure that only those they trusted had access to photographs which could be used to convincingly impersonate them. Some intriguing gender issues have also started to emerge. Male participants have expressed great enthusiasm for approaching “new girls” online, ‘Frank’ for instance explained that: “Lads, I know for definite, go on with their mates to Facebook profiles and they go through their friends. If they see a good looking girl they’re obviously going to add them and try to get to know them and see if they get anywhere, if you know what I mean…I mean I’ve done it before.” In stark contrast, female participants have consistently complained of the irritation of being approached by unfamiliar men. For example, ‘Julie’ (Gender: female. Age: 16. Occupation: engineering student) explained how she had grown tired of being approached by men through Bebo and so decided to move on to Facebook, although this migration from one social networking site to another did not actually solve the problem: “Ah, all the men and stuff adding me all the time [laughs]…and in the end I got sick of it that much I just deleted my account…I get it sometimes on Facebook”. It has also been interesting to note the widespread use of gender stereotypes in perceptions of other internet users with older men frequently cast as “dirty old men” and female users widely considered vulnerable targets. Other themes which have emerged and which will be explored further as the research progresses include interesting perceptions of age and the media. Participants commonly expressed concern for the privacy of younger users and also voiced the view that older people may lack the technical knowledge to understand online activities and interactions. 
Participants have been inconsistent in relation to the media: they have criticised media portrayals of online privacy and yet have also cited the media in support of their own opinions. It could be interesting to explore further how participants connect media reports with their own attitudes, experiences and practices.
5.1 Significance of Findings Whilst these findings are, as previously stressed, only preliminary they do begin to demonstrate the usefulness of the research approach adopted and the nature of the data being generated. By working inductively, understandings are allowed to emerge from participants’ own accounts and this can lead to the revelation of previously hidden meanings. In this case, a fresh perspective was gained on the taken for granted term “identity theft”. “Identity theft” commonly refers to a form of financial fraud but for the participants in this research the term had a quite different meaning related to their social identity. This insight could not have been gained from, for instance, a survey which posed the question: “How concerned are you about identity theft?” This potential for revealing previously hidden meanings and understandings is a key advantage of working inductively. Issues of online privacy are embedded in everyday life and so it is important to try and understand where these issues fit into the lives of the participants. This is why semi-structured interviews are useful: they enable us to focus on “relevant specifics” [43] in each interview in order to generate rich and contextualised data. The short data extracts presented here should have imparted a small flavour of the data and briefly illustrated its richness. The importance of data being rich and contextualised is that theories generated from such data will be rooted in the lived experiences of the participants. This is a departure from previous theorising on the influence of “determinant factors” which has relied on abstract data and, at times, weak extrapolation [6, 7]. The fact that previous research has consistently identified women as more concerned about their online privacy than men without any sustained exploration of why this is the case means that the gender issues emerging from the data are theoretically significant. It is significant that many of the diverse online settings discussed in interviews so far are under-researched. Websites such as Last.fm, Chat Roulette and LookBook have attracted only small amounts of interest from academic researchers despite having considerable numbers of users engaging in activities with significant privacy implications. This is, of course, partly as a result of internet research having to pursue a “moving target” [47], meaning that researchers may not always be able to keep pace with developments, but it is probably also to some extent due to the intense focus on Facebook. This indicates a need for research to focus on more diverse online settings.
6 Conclusion This paper has emphasised the importance of context in understanding online privacy attitudes by providing an outline of ongoing PhD research and explaining its relationship with existing literature in the area. It has been contended that social scientists can make a significant contribution to research into online privacy attitudes by generating rich, situated data which can help us to understand privacy attitudes in context. To conclude this paper it is appropriate to call for further qualitative research into online privacy attitudes. It could be especially fruitful for researchers to move their focus away from Facebook and to explore more niche online settings. The current
trend of Facebook-centred research is understandable, given its rapid growth in membership [48] and the media attention on the site; however, this does not justify neglecting other significant online settings. Last.fm, for instance, has a reported 30 million active members [49] engaging in a variety of social activities based around music. This considerable membership only seems diminutive in comparison with the phenomenon that is Facebook. If privacy concerns are influenced by the affordances and audiences of particular online settings [27, 29, 30], then settings such as Last.fm are important for research into online privacy attitudes, not only because of their substantial membership, but also because of their varying technical affordances and differing social contexts. Acknowledgements. This work forms part of the Visualisation and Other Methods of Expression (VOME) research project and was supported by the Technology Strategy Board; the Engineering and Physical Sciences Research Council and the Economic and Social Research Council [grant number EP/G00255/X].
References 1. Warren, S.D., Brandeis, L.D.: The Right to Privacy. Harvard Law Review 4, 193–220 (1890) 2. Aaronovitch, D.: Online truth is more valuable than privacy. Times Online (2010), http://www.timesonline.co.uk/tol/comment/columnists/david_aa ronovitch/article7045915.ece 3. Chakrabortty, A.: Facebook, Google and Twitter: custodians of our most intimate secrets. Guardian.co.uk (2010), http://www.guardian.co.uk/commentisfree/2010/ May/25/personal-secrets-to-internet-companies 4. Keegan, V.: Where does privacy fit in the online video revolution? Guardian.co.uk (2010), http://www.guardian.co.uk/technology/2010/mar/19/streamingvideo-online-privacy 5. Bellman, S., Johnson, E.J., Kobrin, S.J., Lohse, G.L.: International Differences in Information Privacy Concerns: a Global Survey of Consumers. The Information Society 20, 313–324 (2004) 6. Garbarino, E., Strahilevitz, M.: Gender differences in the perceived risk of buying online and the effects of receiving a site recommendation. Journal of Business Research 57, 768– 775 (2004) 7. Moscardelli, D.M., Divine, R.: Adolescents’ Concern for Privacy When Using the Internet: An Empirical Analysis of Predictors and Relationships With Privacy-Protecting Behaviors. Family and Consumer Sciences Research Journal 35, 232–252 (2007) 8. Milne, G.R., Gordon, M.E.: A Segmentation Study of Consumers’ Attitudes Toward Direct Mail. Journal of Direct Marketing 8, 45–52 (1994) 9. Nowak, G.J., Phelps, J.: Understanding Privacy Concerns: An Assessment of Consumers’ Information-Related Knowledge and Beliefs. Journal of Direct Marketing 6, 28–39 (1992) 10. Tufekci, Z.: Can you see me now? Audience and disclosure regulation in online social network sites. Bulletin of Science, Technology & Society 28, 20–36 (2008) 11. Hoy, M.G., Milne, G.: Gender Differences in Privacy-Related Measures for Young Adult Facebook Users. Journal of Interactive Advertising 10, 28–45 (2010) 12. Boyd, D., Ellison, N.: Social Network Sites: Definition, History, and Scholarship. Journal of Computer-Mediated Communication 13, 210–230 (2007)
13. Barnes, S.B.: A privacy paradox: Social networking in the United States. First Monday 11 (2006), http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/ fm/article/viewArticle/1394/1312 14. Gross, R., Acquisti, A.: Information Revelation and Privacy in Online Social Networks. In: WPES 2005: Proceedings of the 2005 ACM Workshop on Privacy in the Electronic Society, pp. 71–80. ACM, New York (2005) 15. Norberg, P.A., Horne, D.R., Horne, D.A.: The privacy paradox: Personal information disclosure intentions versus behaviours. Journal of Consumer Affairs 41, 100–126 (2007) 16. Stenger, T., Coutant, A.: How Teenagers Deal with their Privacy on Social Network Sites? Results from a National Survey in France. In: 2010 AAI Spring Symposium Series (2010) 17. Albrechtslund, A.: Online Social Networking as Participatory Surveillance. First Monday 13 (2008), http://firstmonday.org/htbin/cgiwrap/bin/ojs/index. php/fm/article/viewArticle/2142/1949 18. Cho, H., Lee, J.S., Chung, S.: Optimistic bias about online privacy risks: Testing the moderating effects of perceived controllability and prior experience. Computers in Human Behavior 26, 987–995 (2010) 19. Coles-Kemp, L., Lai, Y.L., Ford, M.: Privacy on the Internet: Attitudes and Behaviours. VOME (2010), http://www.vome.org.uk/wp-content/uploads/2010/03/ VOME-exploratorium-survey-summary-results.pdf 20. Cho, H., Rivera-Sanchez, M., Lim, S.S.: A multinational study on online privacy: global concerns and local responses. New Media Society 11, 395–416 (2009) 21. Fogel, J., Nehmad, E.: Internet social network communities: Risk taking, trust, and privacy concerns. Computers in Human Behavior 25, 153–160 (2009) 22. O’Neil, D.: Analysis of Internet Users’ Level of Online Privacy Concerns. Social Science Computer Review 19, 17–31 (2001) 23. Sheehan, K.B.: An investigation of gender differences in online privacy concerns and resultant behaviors. Journal of Interactive Marketing 13, 24–38 (1999) 24. Youn, S., Hall, K.: Gender and Online Privacy among Teens: Risk Perception, Privacy Concerns, and Protection Behaviors. CyberPsychology & Behavior 11, 763–765 (2008) 25. Marwick, A.E., Diaz, D.M., Palfrey, J.: Youth, Privacy and Reputation: Literature Review. The Berkman Center for Internet & Society at Harvard University (2010), http://cyber.law.harvard.edu/publications 26. Wang, P., Petrison, L.A.: Direct Marketing Activities and Personal Privacy: A Consumer Survey. Journal of Direct Marketing 7, 7–19 (1993) 27. Boyd, D.: Why Youth (Heart) Social Network Sites: The Role of Networked Publics. In: Buckingham, D. (ed.) Youth, Identity and Digital Media, pp. 119–142. MIT Press, Cambridge (2007) 28. Raynes-Goldie, K.: Aliases, creeping, and wall cleaning: Understanding privacy in the age of Facebook. First Monday 15 (2010), http://firstmonday.org/htbin/ cgiwrap/bin/ojs/index.php/fm/article/viewArticle/2775/2432 29. Livingstone, S.: Mediating the public/private boundary at home. Journal of Media Practice 6, 41–51 (2005) 30. Livingstone, S.: Taking risky opportunities in youthful content creation: teenagers’ use of social networking sites for intimacy, privacy and self-expression. New Media Society 10, 393–411 (2008) 31. West, A., Lewis, J., Currie, C.: Students’ Facebook ‘friends’: public and private spheres. Journal of Youth Studies 12, 615–627 (2009)
32. Boyd, D.: Facebook’s Privacy Trainwreck: Exposure, Invasion, and Social Convergence. Convergence: The International Journal of Research into New Media Technologies 14, 13–20 (2008) 33. Solove, D.J.: Conceptualizing Privacy. California Law Review 90, 1087–1155 (2002) 34. Bezanson, R.P.: The Right to Privacy Revisited: Privacy, News, and Social Change, 18901990. California Law Review 80, 1133–1175 (1992) 35. Allen, A.L.: Uneasy Access: Privacy for Women in a Free Society. Rowman and Littlefield, New Jersey (1988) 36. Tapscott, D.: Growing Up Digital: The Rise of the Net Generation. McGraw-Hill, New York (1998) 37. Howe, N., Strauss, W.: Millennials Rising: The Next Great Generation. Vintage, New York (2000) 38. Prenksy, M.: Digital natives, digital immigrants. On the Horizon 9, 1–6 (2001) 39. Byron, T.: Safer Children in a Digital World: The Report of the Byron Review. DCSF (2008), http://www.dcsf.gov.uk/byronreview/pdfs/Final%20Report% 20Bookmarked.pdf 40. Lenhart, A., Rainie, L., Lewis, O.: Teenage Life Online: The Rise of the Instant Message Generation and the Internet’s Impact on Friendships and Family Relationships (2001), http://www.pewinternet.org/~/media//Files/Reports/2001/PIP_ Teens_Report.pdf.pdf 41. Holstein, J.A., Gubrium, J.F.: The Active Interview. Sage, Newbury Park (1995) 42. Gilbert, N.: Research, Theory and Method. In: Gilbert, N. (ed.) Researching Social Life, pp. 21–40. Sage, London (2008) 43. Mason, J.: Qualitative Researching. Sage, London (2002) 44. Kvale, S.: Interviews: An Introduction to Qualitative Interviewing. Sage, London (1996) 45. Minchin, R.: Postman jailed for child sex abuse. The Independent (2010), http://www.independent.co.uk/news/uk/crime/postman-jailedfor-child-sex-abuse-2089104.html 46. Savill, R.: Father frog-marches internet paedophile to police. Telegraph.co.uk (2010), http://www.telegraph.co.uk/technology/news/8013649/Fatherfrog-marches-internet-paedophile-to-police.html 47. Livingstone, S.: Critical debates in internet studies: reflections on an emerging field. LSE Research Online (2005), http://eprints.lse.ac.uk/1011 48. Facebook Statistics, http://www.facebook.com/press/info.php?statistics 49. Jones, R.: Last.fm Radio Announcement. Last.fm Blog (2009), http://blog.last.fm/2009/03/24/lastfm-radio-announcement
Data Protection, Privacy and Identity: Distinguishing Concepts and Articulating Rights
Norberto Nuno Gomes de Andrade
European University Institute, Law Department, Florence - Italy
Abstract. The purpose of this article is to provide a sound and coherent articulation of the rights to data protection, privacy and identity within the EU legal framework. For this purpose, the paper provides a number of important criteria through which the three different rights in question can be clearly defined, distinguished and articulated. Although the three are intrinsically interrelated, the article draws attention to the importance of keeping the rights and concepts of data protection, privacy and identity explicitly defined and separated. Based on two proposed dichotomies (procedural/substantive and alethic/non-alethic), the paper makes three fundamental arguments: first, there are crucial and underlying distinctions between data protection, privacy and identity that have been overlooked in EU legislation (as well as by the legal doctrine that has analyzed this topic); second, the current data protection legal framework (and its articulation with the concepts of privacy and identity) presents serious lacunae in the fulfilment of its ultimate goal: the protection of the autonomy, dignity and self-determination of the human person; and, third, the right to identity should be explicitly mentioned in the EU Data Protection Directive. Profiling is taken as a case study technology to assert the importance of incorporating the right to identity in the EU data protection framework as well as to document its current shortcomings. Keywords: privacy, identity, data protection, EU law, profiling.
1 Introduction The article1 begins by illustrating the apparently harmonious and coherent manner in which the concepts and rights of data protection, privacy and identity have been enshrined and implemented in the European Data Protection Legal Framework, namely in its main instrument: the Directive 95/46/EC of the European Parliament and the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data2 (hereafter: “data protection directive”, “DPD” or simply “directive”). The concordant nexus between the concepts of privacy and identity has,
1 This paper contains parts of (Andrade, 2011).
2 Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data: see http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:31995L0046:EN:NOT
moreover, been supported by the legal doctrine that has examined the relationship between these concepts.3 In this respect, many scholars have pursued a line of reasoning that establishes a harmonious relationship between privacy and identity. After depicting the current state-of-art of the legislative and doctrinal frameworks concerning the relationship between these three elements, the article then proceeds to its deconstruction and criticism. Therefore, the paper distinguishes the concepts and rights to data protection, privacy and identity. And it does so in two different steps. Firstly, the paper distinguishes data protection, on the one hand, from privacy and identity, on the other. Such distinction underlines the procedural nature of the former in contrast to the substantive character of the latter. Secondly, and relying upon work previously developed by the author, the paper distinguishes privacy from identity based upon the notion of information and through the so-called alethic criteria. Taking into account the fundamental differences between data protection, privacy and identity, the paper then elaborates on the repercussions of such distinctions. For this purpose, the article looks at the use of profiling technologies, focussing in particular on the case of non-distributive group profiling. By taking into account the regulatory challenges that automated profiling processes pose to the directive, the paper puts forward two main arguments: the failure of the EU data protection legal framework to protect the dignity, autonomy and self-determination of the human person (which are at the base of both the right to privacy and the right to identity) and the need to incorporate the right to identity into the data protection directive.
2 The Data Protection, Privacy and Identity “Triangle” According to the Current EU Legislation The European Union Data Protection legal framework, which is rooted in the data protection directive, presents an apparently harmonious and coherent articulation of the concepts of data protection, privacy and identity. As such, the data protection directive protects the right to privacy by relying upon the notion of identity. In other words, the DPD seeks to achieve privacy protection by regulating the processing of personal data, which is then defined by recourse to the notion of (personal) identity. In what follows, I shall look at how the existing legislation and the legal doctrine currently connect and articulate these three concepts. I will begin by exploring the relationship between privacy and data protection, adding afterwards the identity element. Privacy and data protection are intimately related. The first data protection legislations of the early 1970s, as well as their subsequent developments, were and have been aimed at tackling problems generated by new technologies. Within the broad spectrum of problems to be resolved, the application of those data protection regulatory schemes was – to a great extent – motivated by privacy concerns. In fact, one can say that the incessant development and sophistication of data protection legal frameworks across the last decades has taken place because individuals’ privacy is continuously under threat via increasingly novel means. Poullet describes this phenomenon by distinguishing a series of different
3 (Agre & Rotenberg, 1997), (Hildebrandt, 2006), (Rouvroy, 2008).
generations of data protection legislations,4 characterizing them as progressive extensions of the legal protection of privacy. Given this historical background, it comes as no surprise that the underpinning principle of Directive 95/46/EC is the protection of privacy. In fact, the protection of the right to privacy is expressly stated by the EU Data Protection Directive as its main goal. In article 1, the directive states that its objective is: to protect the fundamental rights and freedoms of natural persons and in particular the right to privacy, with regard to the processing of personal data (Emphasis added) In this manner, the directive, without ever defining the term privacy, seeks to protect it by regulating the processing of personal data.5 Thus, the interest and the value of privacy are deemed to be protected and sustained through the underlying mechanical procedures of data protection. Therefore, the directive protects privacy by regulating in detail the conditions under which personal data can be collected, processed, accessed, retained and erased. It is within the definition of personal data, a key concept of the data protection legal framework, that the notion of (personal) identity makes its appearance. Personal data is defined in article 2 of the DPD as: Any information relating to an identified or identifiable natural person; an identifiable person is one who can be identified directly or indirectly, in particular by reference to an identification number or to one or more factors specific to his physical, physiological, mental, economic, cultural or social identity. (Emphasis added) In brief, data protection protects privacy by regulating the processing of personal data. The concept of personal data is defined by recourse to the criterion of identifiability, which is then asserted by reference to factors specific to one’s identity. Two important conclusions emerge from this brief analysis. First, the data protection directive seems to be overly oriented to the protection of the right to privacy, neglecting (at least in its wording) other important rights and interests. Second, the notion of identity assumes only a marginal role in this triangle. Identity, in fact, is enshrined in the directive as a secondary notion, placed in the DPD only to facilitate the definition of the concept of personal data and, as such, to ascertain the applicability of the data protection legal framework. In this way, identity is not seen as a right, interest or value to be protected per se through data protection, as privacy
4 While the first generation of legislation encompassed a negative conception of privacy, defined as a right to opacity or to seclusion, protecting one’s intimacy and linked to specific data, places and exchanges, the second generation, which came into being as a result of the disequilibrium of the balance of informational powers between individuals/citizens and administrations/companies, substituted such a negative approach with a more positive one (Poullet, 2010). This new approach is grounded on a set of new principles that correspond to today’s data protection principles (such as transparency, legitimacy and the proportionality of the processing of personal data) or, in the United States, to the so-called “fair uses of personal information”.
5 (McCullagh, 2009).
is, but as a technical criterion that helps to define the concept of personal data. Identity makes its way into the data protection legal framework through the backdoor, as part of the procedural definition of personal data, which – in its turn – is oriented to protect the privacy interests of the data subject. Identity is thus dissolved within the relationship between, and articulation of, privacy and data protection. As a result, the triangle “data protection – privacy – identity” portrayed in the EU legislation is not only a rather static one, but also a profoundly unbalanced one. Broadly, the right to identity can be defined as the right to have the attributes or the facets of personality which are characteristic of or unique to a particular person (such as appearance, name, character, voice, life history, etc) recognized and respected by others. In other words, the right to identity is the right to be different, that is, the right to be unique.6 Returning to the analysis of the relationship between privacy and identity, one should note that the marginal role played by the notion of identity and the harmonious relationship between the concepts and rights of privacy and identity do not only emerge from the legislation, but have also been sustained by the legal doctrine.7 Agre and Rotenberg, in this respect, define the right to privacy as “the freedom from unreasonable constraints on the construction of one’s identity.”8 Such a perspective is linked to the rationale of data protection and to the idea that the “control over personal information is control over an aspect of the identity one projects in the world.”9 The link between privacy and the absence of restraints in developing one’s identity has been pursued and reconfirmed by other scholars, such as Rouvroy10 and Hildebrandt.11/12 As we shall see in the following section, this assumed harmonious connection between “data protection – privacy – identity” is, in reality, deeply flawed and problematic. This triangular relationship is, in fact, much more complex and dynamic than the static and straightforward picture that has been depicted by the current legislation and doctrine. In what follows, I shall deconstruct the accepted position that there is a harmonious web that connects these three elements, proposing new ways and criteria through which to distinguish and articulate such concepts. Firstly, I shall tackle the distinction between data protection, on the one hand, and privacy and identity, on the other. For that purpose, a procedural/substantive dichotomy will be used. Secondly, I will distinguish the scope of the right to privacy and the right to identity through an alethic criterion.
6 As we shall see in section 2.2, the right to identity reflects a person's definite and inalienable "interest in the uniqueness of his being" (J. Neethling, Potgieter, & Visser, 1996, p. 39).
7 (Agre & Rotenberg, 1997), (Hildebrandt, 2006), (Rouvroy, 2008).
8 (Agre & Rotenberg, 1997, p. 6).
9 (Agre & Rotenberg, 1997, p. 7).
10 (Rouvroy, 2008).
11 (Hildebrandt, 2006).
12 For a criticism of this conceptualization of privacy, see (Andrade, 2011). In my own view, and as comprehensively developed in the mentioned article, the idea of a right to privacy as "the freedom from unreasonable constraints in developing one's identity" is reductive and one-sided, capturing only one dimension among the many others that compose the spectrum of the intricate relationships between privacy and identity. Furthermore, such a proposed concept of (a right to) privacy blurs itself with that of identity, assuming an overly broad character, claiming some of the definitional and constitutive characteristics that, in truth, pertain to the concept and to the right to identity (Andrade, 2011).
2.1 Data Protection vs. Privacy and Identity

A number of studies have been devoted to clarifying the underlying differences between the rights to data protection and privacy; this paper focuses on those authored by De Hert and Gutwirth.13 These scholars propose an ingenious way in which to illustrate the differences in scope, rationale and logic between these two rights. They characterize privacy as a “tool of opacity” and data protection as a “tool of transparency.” Connecting the invention and elaboration of these legal tools to the development of the democratic constitutional state and its principles, the authors state that:

“the development of the democratic constitutional state has led to the invention and elaboration of two complementary sorts of legal tools which both aim at the same end, namely the control and limitation of power. We make a distinction between on the one hand tools that tend to guarantee non-interference in individual matters or the opacity of the individual, and on the other, tools that tend to guarantee the transparency/accountability of the powerful”14

In developing the fundamental differences between these tools, the authors explain that:

“The tools of opacity are quite different in nature from the tools of transparency. Opacity tools embody normative choices about the limits of power; transparency tools come into play after these normative choices have been made in order still to channel the normatively accepted exercise of power. While the latter are thus directed towards the control and channelling of legitimate uses of power, the former are protecting the citizens against illegitimate and excessive uses of power.”15

In this way, privacy, as an opacity tool, is designed to ensure non-interference in individual matters, creating a personal zone of non-intrusion. Along these lines, privacy is defined in negative terms,16 protecting individuals against interference in their autonomy by governments and by private actors. Such protection is enacted through prohibition rules, delimiting the personal and private sphere that is to be excluded from those actors’ scope and range of intervention.
13 (De Hert & Gutwirth, 2003); (De Hert & Gutwirth, 2006).
14 (De Hert & Gutwirth, 2006, pp. 66-67).
15 (De Hert & Gutwirth, 2006, p. 66).
16 Although De Hert and Gutwirth define privacy as an opacity tool in negative terms, i.e. as rules which prohibit certain acts, the authors also allude to the positive roles of privacy. Regarding the latter, the authors state that “[p]rivacy protects the fundamental political value of a democratic constitutional state as it guarantees individuals their freedom of self-determination, their right to be different and their autonomy to engage in relationships, their freedom of choice, their autonomy as regards – for example – their sexuality, health, personality building, social appearance and behaviour, and so on. It guarantees each person’s uniqueness...” (De Hert & Gutwirth, 2006, p. 72). Nonetheless, and in my view, this positive function of privacy renders the definition of the term too large and overstretched, invading the domains of other specific rights, namely the right to identity.
Further to this prohibitive feature, the authors also characterize opacity tools as collective and normative in nature. The implementation of these tools, as such, requires a delicate balance of interests with other rights, whose application may supersede the need for individual consent when important societal interests are at stake. In this respect, privacy (as well as identity, I would add) is “a relational, contextual and per se social notion which only acquires substance when it clashes with other private or public interests.”17

Data protection, on the other hand, is defined as a “tool of transparency.” In this way, data protection is described as “a catch-all term for a series of ideas with regard to the processing of personal data. Through the application of these ideas, governments try to reconcile fundamental but conflicting values such as privacy, free flow of information, governmental need for surveillance and taxation, etc.”18

In addition, data protection, contrary to privacy, has a different rationale. It is not prohibitive by nature. Instead, it operates under the natural presumption that personal information is, in principle, allowed to be processed and used. In this respect, data protection is pragmatic in nature, recognizing that – under democratic principles and for societal reasons – both private and public actors need to be able to process personal information. In this sense, the right to data protection could also be called the right to data processing, as it enables public and private entities to collect and use personal information. Such collection and use are, nonetheless, subject to conditions, procedures, limitations and exceptions. Accordingly, and as De Hert and Gutwirth put it, “[d]ata protection laws were precisely enacted not to prohibit, but to channel power, viz. to promote meaningful public accountability, and provide data subjects with an opportunity to contest inaccurate or abusive record holding practices.”19

Bearing in mind the societal need to collect, store and process data, along with the relative ease with which entities collecting such data can abuse power and infringe privacy, data protection seems to assume an administrative role. In fact, and as Blume notes, this is one of the functions of traditional administrative law that has been extended to data protection law.20 Similarly to administrative law, data protection also regulates the activities of other institutions and entities. In the case of data protection, the focus is not only on administrative agencies of government, but on any "natural or legal person, public authority, agency or any other body which alone or jointly with others determines the purposes and means of the processing of personal data."21
17 (De Hert & Gutwirth, 2006, p. 75). The relational and contextual character of the right to privacy, which has been derived from article 8 of the European Convention on Human Rights (ECHR), the right to respect for private and family life, is evident in the wording of article 8.2. Such article, in this respect, is an excellent example of how the respect for privacy is not absolute and can be restricted by other interests, namely by “the interests of national security, public safety or the economic well-being of the country, for the prevention of disorder or crime, for the protection of health or morals, or for the protection of the rights and freedoms of others”.
18 (De Hert & Gutwirth, 2006, p. 77).
19 (De Hert & Gutwirth, 2006, p. 77).
20 (Blume, 1998).
21 Directive 95/46/EC, article 2(d).
Despite their differences, the tools of opacity and transparency do not exclude each other. On the contrary, “[e]ach tool supplements and pre-supposes the other.”22 The quality of a legal framework depends on the adequate blending of the two approaches, that is, on the balance between a privacy-opacity approach (prohibitive rules that limit power) and a data protection-transparency approach (regulations that channel power).23 In this way, “[a] blend of the two approaches will generally be preferable, since a solid legal framework should be both flexible (transparency) and firmly anchored in intelligible normative approaches (opacity).”24

As a result of these observations, a crucial distinction can be made between data protection, on the one hand, and privacy and identity on the other. Data protection is procedural, while privacy and identity are substantive rights. While substantive rights are created in order to ensure the protection and promotion of interests that the human individual and society consider important to defend and uphold, procedural rights operate at a different level, setting the rules, methods and conditions through which those substantive rights are effectively enforced and protected.

Privacy and identity, as substantive rights, represent specific interests of the human personality and presuppose the making of normative choices. Those rights and interests (such as, among others, freedom of expression or security) are often in conflict, a fact which requires them to be balanced and measured against each other. It is through the weighing and balancing of these (conflicting) interests and rights that, in the case of privacy, certain intrusions into one’s private sphere are deemed to be necessary and acceptable, while others are not. In the case of privacy as an opacity tool, its substantive character is reflected in the normative choice and interpretation required to determine what is to be deemed so essentially individual that it must be shielded against public and private interference. Such normative choices, interpretative exercises and balancing processes are exclusive to substantive rights.

Procedural rights, on the other hand, only appear at a later stage. It is only after the weighing and balancing of the substantive interests and rights in question that procedural rights come into play, laying out the legal conditions and procedures through which those substantive rights are to be effectively enforced. In other words, procedural rights lay out the conditions through which substantive rights are to be articulated. Procedural conditions, such as the ones concerning transparency, accessibility and
22 (De Hert & Gutwirth, 2006, p. 94).
23 In spite of such necessary and welcoming ‘mix’ of approaches, these tools should not be blurred. In fact, De Hert and Gutwirth call the attention to the importance of not blurring this distinction, as each tool has its proper logic. As an example of the perils that such blurring may cause, the scholars turn their attention to European human rights law and to what they call the “danger of proceduralization”, focussing on article 8 of the European Convention of Human Rights (ECHR). This legal disposition, due to the interpretation made by the European Court of Human Rights in Strasbourg, is shifting from a prohibitive and opacity logic to a channelling one, becoming a transparency-promoting vehicle (De Hert & Gutwirth, 2006, p. 87). The problem we have here is, in brief, the construction of substantive norms through elements of procedural rights. As a result, “[t]he transformation of Article 8 into a source of procedural rights and procedural conditions takes it away from the job it was designed for, viz. to prohibit unreasonable exercises of power and to create zones of opacity” (De Hert & Gutwirth, 2006, p. 91).
24 (De Hert & Gutwirth, 2006, p. 95).
proportionality, function as indispensable conditions for the articulation and coordination between different interests and rights. The data protection directive is an excellent example of such a procedural exercise. In order to reconcile the right to privacy, on the one hand, and the free flow of information within the internal market, on the other, the directive furnishes a number of procedural guidelines and principles through which to attain such a balance.25 Such procedural conditions also operate as legitimate restrictions on the substantive rights enshrined in the directive.

As a result, and contrary to privacy, data protection is inherently formal and procedural. It is structured and shaped according to the interests and values of other substantive rights and legitimate interests, emerging as a result of the clashes between such different rights and interests. Thereby, “[t]he main aims of data protection consist in providing various specific procedural safeguards to protect individual’s privacy and in promoting accountability by government and private record holders”26 (emphasis added). As such, the goal of data protection is to ensure that personal data is processed in ways that respect or, at least, do not infringe other rights. Or, to put it in a positive way, data protection only exists to serve and pursue the interests and values of other rights. In other words, data protection does not directly represent any value or interest per se; it prescribes the procedures27 and methods for pursuing the respect for values embodied in other rights (such as the right to privacy, identity, freedom of expression, freedom and free flow of information, etc.), ensuring their articulation and enforcement. As Poullet clearly states, “[d]ata protection is only a tool at the service of our dignity and liberties and not a value as such.”28

One of the important conclusions to derive from this analysis is that data protection, on the one hand, and privacy, on the other, do not fit perfectly into each other. There are important mismatches that need to be acknowledged and underlined.
25 These basic principles are summarized in article 6 of the Directive, and include the requirements that personal data must be: (a) processed fairly and lawfully; (b) collected for specified, explicit and legitimate purposes and not further processed in a way incompatible with those purposes. Further processing of data for historical, statistical or scientific purposes shall not be considered as incompatible provided that Member States provide appropriate safeguards; (c) adequate, relevant and not excessive in relation to the purposes for which they are collected and/or further processed; (d) accurate and, where necessary, kept up to date; every reasonable step must be taken to ensure that data which are inaccurate or incomplete, having regard to the purposes for which they were collected or for which they are further processed, are erased or rectified; (e) kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the data were collected or for which they are further processed. Member States shall lay down appropriate safeguards for personal data stored for longer periods for historical, statistical or scientific use.
26 (De Hert & Gutwirth, 2006, p. 77).
27 On this point, and as De Hert and Gutwirth observe, “[t]he sheer wordings of the data protection principles (the fairness principle, the openness principle and the accountability principle, the individual participation principle, …) already suggest heavy reliance on notions of procedural justice rather than normative (or substantive) justice” (De Hert & Gutwirth, 2006, p. 78).
28 (Poullet, 2010, p. 9).
“Data protection explicitly protects values that are not at the core of privacy.”29 This is the case with the requirements of fair processing, consent and legitimacy, which pertain to the specific procedural nature and justice associated with data protection. It is also the case with the protection of rights and liberties such as freedom of religion, freedom of conscience and the political freedoms. Such rights and liberties are, in effect, protected by the directive through the special regime for “sensitive data,” which prohibits the processing of data relating to racial or ethnic origin, political opinions, religious or philosophical beliefs, etc. Data protection protects the value and interest of privacy just as it protects the values and interests of identity, security and freedom of information, among others. They do not always overlap. In this respect, data protection is both larger and more restricted than privacy (and vice versa).

The autonomy and difference between data protection and privacy has, moreover, been acknowledged by the Charter of Fundamental Rights of the European Union which, with the entry into force of the Lisbon Treaty, was given legally binding effect equal to that of the Treaties. In this way, article 8 of the EU Charter30 now establishes data protection as a separate and autonomous right, distinct from the right to privacy (which is enshrined in article 7).

Furthermore, looking not at the EU level but at the individual member states, the intricate link between data protection and privacy is not always a given. While Belgium, for instance, has always linked data protection to privacy, France and Germany have based their rights to data protection on the right to liberty and on the right to dignity, respectively. The constitutions of those countries, presenting no explicit right to privacy, have nonetheless provided consolidated legal grounds from which to derive and recognize their data protection rights.31 In the same way, the United States has not followed the right to privacy as the legal anchor for its data protection regulation, but has based the latter on public law, namely through the so-called fair information practices.32 All of these facts clearly show that there are several clearly distinct bases upon which to ground the right to data protection, rendering erroneous the reduction of the latter to a single dimension of privacy. This demonstrates, in addition, that data protection is an instrument protecting several different values and interests, and that no specific advantage is gained by linking it solely to privacy.

Returning now to the triangle, it is important to remember at this point that data protection can and should be clearly distinguished from privacy and identity. The former is a procedural right while the latter are substantive ones. The next section delves into the substantive nature of the rights to identity and privacy, proposing a criterion through which to distinguish them.
29 (De Hert & Gutwirth, 2006, p. 81).
30 The article, entitled "Protection of personal data", states that "Everyone has the right to the protection of personal data concerning him or her".
31 The diversity of approaches followed by different EU member states in the legal anchoring of their respective data protection regulations also constitutes a strong reason supporting the recognition of a constitutional right to data protection in the EU Charter, different and separate from the one of privacy. Such recognition is, in fact, “more respectful of the different European constitutional traditions” (De Hert & Gutwirth, 2006, p. 81).
32 (De Hert & Gutwirth, 2006, p. 82).
2.2 Privacy vs. Identity33

The right to privacy and the right to identity share the same DNA. They are both part of a larger set of rights called personality rights34 and, as such, they both derive from the fundamental rights to dignity and self-determination. Hence, they both reflect the dignity interest that all of us possess. Contrary to the right to data protection, the rights to privacy and identity are not procedural but substantive rights. They embed particular values and, as such, protect specific interests of the human personality.

Regarding their distinction, only a small number of works have touched upon the underlying differences between these two interests and rights.35 The tendency, as mentioned before, has been to associate the right to privacy with the value and interest of identity. In addition, the assumption that the right to privacy equates to “the freedom from unreasonable constraints on the construction of one’s identity”36 has remained unquestioned and undisputed. It is as if privacy were the presupposition of identity and identity the consequence of privacy. From an informational perspective, these two concepts also seem to be tied together. The fact that privacy tends to encompass information intimately connected to one’s identity has led to the idea that “privacy protects the right of an individual to control information that is intrinsically linked to his or her identity.”37 Following this perspective, privacy and identity seem to act as collaborating partners, defining and contextualizing the type of information that is closely attached to a given person, endowing him or her with the right to exert control over such information. While this view is not incorrect per se, it is limited and short-sighted.

Despite their common history and background, identity and privacy - as rights - protect different interests. Identity, as an interest of personality, can be defined as a “person's uniqueness or individuality which defines or individualises him as a particular person and thus distinguishes him from others.”38 In this account, identity is manifested in various indicia by which that particular person can be recognized. Such
33 This section includes parts of (Andrade, 2011), where it is argued that the overly broad definition of privacy has undermined the concept of, and therefore the right to, identity. The relentless inflationary trend in the conceptualization of the right to privacy is presented as the main reason behind the need to articulate in a coherent manner the rights to privacy and identity. The article, moreover, specifies the main (and often overlooked) differences between the rights to privacy and identity, describing in detail how each of them relates to a different interest of the right to personality.
34 Following Neethling’s study of this particular category of rights: “[t]here is general consensus that personality rights are private law (subjective) rights which are by nature non-patrimonial and highly personal in the sense that they cannot exist independently of a person since they are inseparably bound up with his personality. From the highly personal and patrimonial nature of personality rights it is possible to deduce their juridical characteristics: they are non-transferable; unhereditable; incapable of being relinquished or attached; they cannot prescribe; and they come into existence with the birth and are terminated by death of a human being.” (Johann Neethling, 2005, p. 223).
35 See (J. Neethling, et al., 1996); (Sullivan, 2008); (Pino, 2000).
36 (Agre & Rotenberg, 1997, p. 6).
37 (Boussard, 2009, p. 252).
38 (Johann Neethling, 2005, p. 234).
indicia, in other words, amount to the facets of a person’s personality which are characteristic of or unique to him or her, such as their life history, character, name, creditworthiness, voice, handwriting, appearance (physical image), and so on.39 As a result, the right to identity reflects a person’s definite and inalienable “interest in the uniqueness of his being.”40 According to such a conceptualization, a person's identity is infringed if any of these indicia are used without authorization in ways which cannot be reconciled with the identity one wishes to convey.

In order to clearly differentiate the right to privacy from the right to identity, I defend a more delimited conceptualization of the former.41 I thus argue against the trend of over-stretching the definition and scope of the right to privacy.42 Following this delimited conceptualization, the right to privacy protects an interest that has been defined as a “personal condition of life characterised by seclusion from, and therefore absence of acquaintance by, the public.”43 In these terms, privacy can only be breached when third parties become acquainted with one’s true private facts or affairs without authorization.

As we shall see, this distinction bears important consequences once transposed to an informational dimension and applied to the current EU data protection legal framework. Such a distinction helps to clarify the articulation between the rights to privacy and identity within the data protection legal framework, as well as to better understand and interpret the concept of personal data.

2.2.1 Identity and Privacy Distinguished through an Alethic Criterion

Based upon the different interests of personality pursued by the rights to privacy and identity, and bearing in mind their distinction in terms of harmful breach, the following part of this article seeks to provide a new angle through which to distinguish these two rights. Such a distinction is based on the different types of information that each of these rights protects.

As briefly mentioned in the previous section, the right to identity is infringed if person A makes use of person B’s identity indicia in a way contrary to how person B perceives his or her identity. This will happen, for instance, when person B’s
39 (Johann Neethling, 2005, p. 234).
40 (J. Neethling, et al., 1996, p. 39).
41 This paper, following the same line of thought I have developed in previous works, advocates a more restricted conceptualization of the term privacy. Hence, I lean towards an understanding of privacy along the lines of the classical definition given by Warren and Brandeis, that is, as a "right to be let alone" (Warren & Brandeis, 1890). In this way, I envisage a more negative configuration of privacy, conceptualizing the latter as a right to seclusion. Thereby, and as I shall develop in the following sections, I associate privacy with the control over truthful information regarding oneself, and not with generalist and overstretched understandings of privacy as freedom, self-determination and personality building.
42 Among the many meanings and purposes that have been attached to the term, the right to privacy has been understood, for example, as providing the conditions to plan and make choices concerning one’s private life, as well as forbidding the distortion of one’s image. The concept of privacy has also encompassed the freedom of thought, the control over one’s body, the misapprehension of one’s identity and the protection of one’s reputation (among other aspects).
43 (Johann Neethling, 2005, p. 233).
identity is falsified or when an erroneous image of his or her personality is conveyed. The right to privacy, on the contrary, is only infringed if true private facts related to a person are revealed to the public.44 Neethling summarizes the distinction between identity and privacy in the following manner: “[i]n contrast to identity, privacy is not infringed by the untrue or false use of the indicia of identity, but through an acquaintance with (true) personal facts regarding the holder of the right contrary to his determination and will.”45 In this regard, it is important to stress that while the right to identity concerns all of those personal facts - regardless of being truthful or not - which are capable of falsifying or transmitting a wrong image of one’s identity, the right to privacy comprises only those true personal facts that are part of one’s private sphere and which, for one reason or another, spill over into the public sphere.46

Applying these findings to the notion of information and within the context of data protection, I propose a criterion that distinguishes two different kinds of personal information: one concerning privacy interests, and the other related to identity interests. This criterion, which I have termed the ‘alethic criterion’ (from αλήθεια [aletheia]: the Greek word for truth), differentiates personal information that is truthful and objective from that which is not (or, at least, not necessarily so). According to such a criterion, only personal information that qualifies alethically (in which there is a correspondence between the concept of personal data and the set of true and objective facts or acts related to the data subject) shall be protected under the right to privacy, whereas personal information that is not necessarily truthful (or that is false or de-contextualized) shall be covered by the right to identity. In other words, it is on the basis of whether personal information represents or conveys a truth or a non-truth (that is, whether it has an alethic value or not) that the processing of personal data will be deemed relevant to identity or to privacy.

It is on the basis of this proposed distinction that I shall develop, in the following section, two important arguments. First, I shall maintain that the current data protection legal framework (and its articulation with the concepts of privacy and identity) presents serious lacunae in the fulfilment of its ultimate goal: the protection of the autonomy, dignity and self-determination of the human person. Second, I shall argue that the right to identity should be explicitly mentioned in the EU Data Protection Directive.
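To make the criterion concrete in information-processing terms, the sketch below shows how a data controller might route records to a privacy regime or an identity regime depending on whether the information has been verified as true. This is only an illustration of the criterion as proposed here: the field names, the 'verified' flag and the two regimes are my own assumptions and are not drawn from the Directive or from any existing system.

```python
# Illustrative only: hypothetical records and routing rules for the alethic criterion.
from dataclasses import dataclass

@dataclass
class PersonalDatum:
    subject: str          # the data subject the statement relates to
    statement: str        # the content of the record
    verified_true: bool   # does the record state a verified, objective fact?

def relevant_right(datum: PersonalDatum) -> str:
    """Route a record according to the alethic criterion: verified true facts
    engage privacy interests; unverified, false or de-contextualized statements
    engage identity interests (risk of misrepresentation)."""
    return "privacy" if datum.verified_true else "identity"

records = [
    PersonalDatum("B", "blood test shows substance X", verified_true=True),
    PersonalDatum("B", "is likely to default on loans", verified_true=False),
]
for r in records:
    print(r.statement, "->", relevant_right(r))
# blood test shows substance X -> privacy
# is likely to default on loans -> identity
```

The point of the sketch is simply that the routing decision turns on the truth status of the information, not on whether the data subject can be identified.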
3 The Lacunae of the Current Data Protection Legal Framework

As I have observed earlier on, the data protection directive is (at least in its wording) overly oriented to the protection of privacy, downgrading identity to a technical
44 In this respect, Pino affirms that: “[t]he first feature of the right to personal identity is that its protection can be invoked only if a false representation of the personality has been offered to the public eye. This feature makes it possible to distinguish the right to personal identity from both reputation and privacy” (Pino, 2000, p. 11).
45 (Johann Neethling, 2005).
46 This particular conceptualization, furthermore, corresponds to the notion of privacy advocated by writers such as Archard, who defines privacy as “limited access to personal information”, that is, “the set of true facts that uniquely defines each and every individual” (Archard, 2006, p. 16).
component of the definition of personal data. As a consequence, and looking at the principles that are at the core of both the right to privacy and the right to identity, it can be observed that the directive tends to protect the individual’s dignity and self-determination almost exclusively from a privacy point of view.

The data protection directive protects privacy through the regulation of the processing of personal data, operating on the basis of an identification procedure. Hence, the rules of data protection will only be applicable if the processing of data allows for the data subject to be identified. Such a construction, as I attempt to demonstrate in the following sections, is deficient and inadequate, failing to protect the individual’s autonomy and self-determination when other important personality interests are at stake (namely his or her identity interests). Such failure is particularly evident in the case of profiling technologies.

3.1 Profiling Technologies

In terms of definition, “the term profiling is used to refer to a set of technologies that share at least one common characteristic: the use of algorithms or other mathematical (computer) techniques to create, discover or construct knowledge out of huge sets of data.”47 In a more technical fashion, profiling can be defined as:

“the process of ‘discovering’ patterns in databases that can be used to identify or represent a human or nonhuman subject (individual or group) and / or the application of profiles (sets of correlated data) to individuate and represent an individual subject or to identify a subject as a member of a group”48

Profiles of human subjects can be defined as digital representations49 that refer to unknown or potential individuals rather than to a known individual. As such, the concerned individuals are not identified in those profiling practices.50 Taking into account the several distinctions and categorizations that can be made within the general process of profiling (individual or group, direct or indirect, distributive or non-distributive),51 we will use as a case study the most problematic one, that is, group profiling of a non-distributive type. This type of profiling is particularly challenging, as a non-distributive profile identifies a group of which not all members share the same characteristics.52 As such, the link between non-distributive group profiles and the persons to whom they may be applied is opaque.53 In other words, this specific type of profiling represents a group and reveals attributes that may (or may not)
47 (Hildebrandt, 2009, p. 275).
48 (Hildebrandt & Gutwirth, 2008, p. 19).
49 They are not the only modalities of digital representations. For an analysis of the commonalities and differences between profiles and digital personae as both forms of digital representations, as well as the implications of such distinction to the current data protection directive, see (Roosendaal, 2010).
50 (Roosendaal, 2010, p. 235).
51 For a concise explanation of such distinctions, see (Hildebrandt, 2009, pp. 275-278).
52 (Hildebrandt, 2009).
53 (Leenes, 2008).
be applicable to the individuals in such a group. Accordingly, the profile is not inferred from the personal data of the categorized person, but from a large amount of often anonymized data relating to many other people. In this way, one of the major risks linked to these profiling practices lies in the fact that “the process results in attributing certain characteristics to an individual derived from the probability (dogma of statistical truth) that he or she belongs to a group and not from data communicated or collected about him or her.”54

This is problematic in the sense that the processing of this data is not covered by data protection regulations. Besides the fact that these profiles are built without the awareness of the subject, who has no means to influence how the data set is used to make decisions that will affect him or her, there is no direct connection between the profile and the individual. This means, consequently, that the data corresponding to such a profile does not qualify as personal data. In short, data protection is not applicable as a tool of protection in these cases, and non-distributive group profiles are excluded from the scope of the data protection legal framework. Therefore, individuals, in this case, are not only influenced by decisions taken on the basis of such profiles, but are also prevented from making use of the rights given by the data protection directive to protect them.

This is the type of situation where there are serious lacunae in the legal framework of data protection55 and where the right to identity may prove to be very useful. In my view, the lessening of one’s autonomy and self-determination caused by such profiles (namely by the way they influence how one’s identity is represented and projected) cannot be tackled by combining provisions regarding data protection, privacy and personal data. This is, in my opinion, a case to be solved by the right to identity and by the application of the data protection directive to non-personal data. The problem we have here is that some types of digital representations cannot be connected to a specific individual person. This fact renders the information in question non-personal data, which – consequently – precludes the applicability of the DPD. Nevertheless, the data sets constituting such profiles are used to make decisions that affect the individual person. In order to close such a legal gap, which cannot be resolved through the privacy-identification paradigm of the current data protection directive, we need to turn our attention to the right to identity. In what follows, I thus defend the explicit recognition of the right to identity in the data protection framework.
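Before turning to that argument, the mechanics of a non-distributive group profile can be illustrated schematically. The sketch below is purely hypothetical: the records, attributes, threshold and credit decision are invented for illustration and do not come from any real profiling system.

```python
# Illustrative sketch only: invented data and thresholds, no real profiling system.
from statistics import mean

# Anonymized records about *other* people: no names, no identifiers.
anonymized_records = [
    {"postcode_area": "X1", "night_owl": True,  "defaulted": True},
    {"postcode_area": "X1", "night_owl": True,  "defaulted": True},
    {"postcode_area": "X1", "night_owl": True,  "defaulted": False},
    {"postcode_area": "X1", "night_owl": False, "defaulted": False},
]

def build_group_profile(records):
    """Derive a non-distributive group profile: an attribute that holds
    for the group only as a probability, not for every member."""
    group = [r for r in records if r["postcode_area"] == "X1" and r["night_owl"]]
    default_rate = mean(1.0 if r["defaulted"] else 0.0 for r in group)
    return {"postcode_area": "X1", "night_owl": True, "p_default": default_rate}

def apply_profile(profile, visitor, threshold=0.5):
    """Apply the profile to a new, unidentified visitor.  The decision is
    inferred from the group's statistics, not from data about this person."""
    matches = (visitor["postcode_area"] == profile["postcode_area"]
               and visitor["night_owl"] == profile["night_owl"])
    if matches and profile["p_default"] >= threshold:
        return "refuse instant credit"   # decision that affects the visitor
    return "offer instant credit"

profile = build_group_profile(anonymized_records)
visitor = {"postcode_area": "X1", "night_owl": True}   # no name, no identifier
print(profile)                           # p_default is about 0.67
print(apply_profile(profile, visitor))   # 'refuse instant credit'
```

Nothing in the sketch is personal data in the sense of the DPD: the visitor is never identified, and the decision rests on a group statistic that may simply be false for this particular person. That possible misrepresentation is precisely the identity interest at stake.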
4 Inserting the Right to Identity in the EU Data Protection Directive

The use of profiling technologies seen in the previous section does not raise a question of privacy, but of identity. In fact, the application of such technology does not involve
54 (Poullet, 2010, p. 16).
55 In this regard, I am not defending an overall application of the DPD to all data processing, regardless of whether the data are personal or not (as that, as Roosendaal observes, “might have major, probably undesirable, consequences for the way industry and commerce are organized”).
the disclosure of true facts regarding the data subject. That technology, on the contrary, involves the processing of information which may not necessarily be truthful (or which may even be false). Despite not being retraceable to the individual in the “shape” of personal information, the processing of such information still affects the targeted person, infringing her right not to be misrepresented, that is, her right to identity. In the sense that group profiles are used to infer preferences, habits or other characteristics that the profiled person may be found to have (or not), they do not convey a necessarily true condition, presenting instead the possibility of misrepresenting the profiled individual. Thereby, they should be covered by the right to identity.

Taking into account the rationale of the right to identity, I argue for its explicit inclusion in the data protection framework.56 In this way, the DPD could be interpreted in the light of the right to identity and therefore also regulate the processing of the types of non-personal data57 involved in the construction of non-distributive group profiles.58

In the present state of affairs, the data protection directive is based upon the concept of privacy and constructed under a logic of identification. As such, the directive is only applicable if the data processed allow for a specific person to be identified. In so doing, the DPD neglects the concept of identity and the logic of representation. According to the latter, what is becoming increasingly important is how data and information are being used to represent someone, and not merely to identify him or her. In other words, the issues raised by the processing of personal information are not only about disclosing information involving someone's privacy, but also about using such information to construct and represent someone else's identity.

In addition, it is also important to note that the enshrinement of the right to identity in the data protection directive is not only justified in light of the need to cover the processing of non-personal data in the case of group profiles, but also (and primarily) in light of the need to process personal data in accordance with its most recent understanding. Taking into account the proposed alethic criterion, according to which privacy concerns true facts while identity deals with not-necessarily true data (false or de-contextualized), and bearing in mind that the concept of personal data, enshrined in the DPD, is currently understood as encompassing any information relating to a
56 The Data Protection Directive mentions the terms "privacy" and "right to privacy" thirteen times, while the term "identity" is mentioned only three times (either as part of the technical definition of personal data, or as part of the information that data controllers are obliged to provide to the data subject [namely the identity of the controller]). The term "right to identity" is never mentioned in the directive.
57 The protection of privacy regardless of whether personal data are processed is, in fact, a trend that can already be observed in EU legislation. This is the case with the E-Privacy Directive and its recent revision. Poullet, in this respect, cites recital 24 (which suggests a comparison between the terminal equipment of a user and a private sphere similar to the domicile), qualifying it as a provision that “clearly focuses on protection against intrusion mechanisms irrespective of the fact that personal data are processed or not through these mechanisms” (Poullet, 2010, p. 25).
58 Despite not supporting it through a right to identity justification, Roosendaal hints at the same solution: "The key issue is that individuals are affected, even when their names are not known. Because the decisions are applied to individuals, perhaps even without processing personal data in a strict sense, the DPD should apply" (Roosendaal, 2010, p. 234).
person, regardless of being objective or subjective, true or false,59 it is possible to arrive at the following conclusion: a large portion of the personal data currently being processed concerns a person’s identity, and not necessarily his or her privacy. Moreover, this means that the rules on the protection of personal data (defined as any information, truthful or not, relating to an identified or identifiable person) go clearly beyond the protection of privacy, covering also the protection (and promotion) of one’s identity. Therefore, and taking into account the need to protect the individual human person from a representational perspective, and not only from an identifiability standpoint, I argue that it is through the enforcement of both the right to identity and the right to privacy, via data protection rules, that a more solid and complete protection of an individual’s autonomy, dignity and self-determination can be achieved.
5 Conclusion

In this article I have distinguished and articulated the rights to data protection, privacy and identity. In this respect, I have deconstructed and criticized the allegedly harmonious approach through which this triangle of concepts has been depicted in EU legislation and by the legal literature. According to such a perspective, data protection seeks to achieve privacy protection by regulating the processing of personal data, which is then defined by recourse to the notion of (personal) identity. Privacy, in this line of reasoning, is then defined as the absence of constraints in constructing one's identity.

One of the main conclusions that can be extracted from this brief analysis is that the concepts of privacy, identity and data protection are clearly undermined if understood as simple, straightforward and harmonious. In this respect, data protection and privacy do not equate to one another. Data protection does not confine itself solely to the purposes of privacy, and the value of privacy is far broader than the mere control of personal data.60 As Poullet remarks, “[p]rivacy is an issue which goes well beyond data protection.”61 Furthermore, the conceptualization of privacy as the absence of obstacles in constructing identity conflates and blurs the concepts of privacy and identity, overstretching the former and understating the latter.
59 In Opinion 4/2007, the Article 29 Data Protection Working Party (Art. 29 WP) advanced the following broad definition of personal data: “From the point of view of the nature of information, the concept of personal data includes any sort of statements about a person. It covers “objective” information, such as the presence of a certain substance in one’s blood. It also includes “subjective” information, opinion or assessments” (Art. 29 Data Protection WP, 2007, p. 6). Furthermore, Art. 29 WP stated explicitly that “[f]or information to be ‘personal data’, it is not necessary that it be true or proven” (Art. 29 Data Protection WP, 2007, p. 6).
60 For a criticism of the idea of privacy protection through control of personal data processing, see (McCullagh, 2009).
61 (Poullet, 2010, p. 17). Poullet, in fact, presents the E-Privacy Directive (Directive 2002/58/EC concerning the processing of personal data and the protection of privacy in the electronic communications sector), and its recent revision, as an example of how privacy can go beyond the parameters of data protection and favour the emergence of new principles that do not find a correspondence in EU Directive 95/46/EC.
In order to grasp their underlying differences in scope, nature and rationale, I distinguished data protection, on one side, from privacy and identity, on the other, qualifying the former as procedural and the latter as substantive. In addition, I differentiated privacy and identity through an alethic criterion, allocating the protection of truthful information to the right to privacy and of not-necessarily truthful information to the right to identity. Such a criterion was then tested through the analysis of profiling technologies. I thus examined the impact upon individuals of automated decisions made on the basis of non-distributive group profiles.

By acknowledging these crucial differences between data protection, privacy and identity, and by considering how individuals can be affected by decisions taken on the basis of such profiling practices, I then formulated two important conclusions: (1) the data protection framework presents serious lacunae and (2) it therefore needs to explicitly recognise the right to identity. Regarding the lacunae, I stressed that the current data protection directive should also operate with the concept of identity and under a logic of representation, and not only with the notion of privacy and through an identification rationale. Consequently, in order to enlarge its modus operandi, it is absolutely vital that the data protection directive recognizes identity not as a technical term which is part of the definition of personal data, but as a value and interest per se, that is, as an explicit and independent right. The recognition of identity as an interest and right is of utmost importance in order to attain and consolidate a complete and flawless protection of individual human autonomy and self-determination.

Thereby, I argue that a clear and sound distinction between the rights to data protection, privacy and identity is absolutely crucial. And that is so not only for the sake of the coherence and operability of the legal system, but especially for the sake of attaining a comprehensive and solid protection of all the different aspects related to an individual’s personality.
References

Agre, P., Rotenberg, M.: Technology and privacy: the new landscape. MIT Press, Cambridge (1997)
Andrade, N.N.G.d.: The Right to Privacy and the Right to Identity in the Age of Ubiquitous Computing: Friends or Foes? A Proposal towards a Legal Articulation. In: Akrivopoulou, C., Psygkas, A. (eds.) Personal Data Privacy and Protection in a Surveillance Era: Technologies and Practices. Information Science Publishing (2011)
Archard, D.: The Value of Privacy. In: Claes, E., Duff, A., Gutwirth, S. (eds.) Privacy and the Criminal Law, pp. 13–31 (2006)
Art. 29 Data Protection WP: Opinion 4/2007 on the concept of personal data, pp. 1–26 (2007)
Blume, P.: The Citizens’ Data Protection. The Journal of Information, Law and Technology (1) (1998)
Boussard, H.: Individual Human Rights in Genetic Research: Blurring the Line between Collective and Individual Interests. In: Murphy, T. (ed.) New Technologies and Human Rights. Oxford University Press, New York (2009)
De Hert, P., Gutwirth, S.: Making sense of privacy and data protection. A prospective overview in the light of the future of identity, location-based services and the virtual residence. In: Institute for Prospective Technological Studies - Joint Research Centre, Security and Privacy for the Citizen in the Post-September 11 Digital Age. A prospective overview, Report to the European Parliament Committee on Citizens’ Freedoms and Rights, Justice and Home Affairs (LIBE), IPTS Technical Report Series, EUR 20823 EN, pp. 111–162 (2003)
De Hert, P., Gutwirth, S.: Privacy, Data Protection and Law Enforcement. Opacity of the Individual and Transparency of Power. In: Claes, E., Duff, A., Gutwirth, S. (eds.) Privacy and the Criminal Law, pp. 61–104. Intersentia, Antwerp (2006)
Hildebrandt, M.: Privacy and Identity. In: Claes, E., Duff, A., Gutwirth, S. (eds.) Privacy and the Criminal Law, p. 199. Intersentia, Hart Pub., Antwerpen, Oxford (2006)
Hildebrandt, M.: Profiling and AmI. In: Rannenberg, K., Royer, D., Deuker, A. (eds.) The Future of Identity in the Information Society: Challenges and Opportunities, pp. 273–313. Springer, Berlin (2009)
Hildebrandt, M., Gutwirth, S.: Profiling the European citizen: cross-disciplinary perspectives. Springer, New York (2008)
Leenes, R.E.: Regulating Profiling in a Democratic Constitutional State. Reply: Addressing the Obscurity of Data Clouds. In: Hildebrandt, M., Gutwirth, S. (eds.) Profiling the European Citizen: Cross-disciplinary Perspectives, pp. 293–300. Springer, New York (2008)
McCullagh, K.: Protecting ’privacy’ through control of ’personal’ data processing: A flawed approach. International Review of Law, Computers & Technology 23(1), 13–24 (2009)
Neethling, J.: Personality rights: a comparative overview. Comparative and International Law Journal of Southern Africa 38(2), 210–245 (2005)
Neethling, J., Potgieter, J.M., Visser, P.J.: Neethling’s law of personality. Butterworths, Durban (1996)
Pino, G.: The Right to Personal Identity in Italian Private Law: Constitutional Interpretation and Judge-Made Rights. In: Van Hoecke, M., Ost, F. (eds.) The Harmonization of Private Law in Europe, pp. 225–237. Hart Publishing, Oxford (2000)
Poullet, Y.: About the E-Privacy Directive: Towards a Third Generation of Data Protection Legislation? In: Gutwirth, S., Poullet, Y., de Hert, P. (eds.) Data Protection in a Profiled World, pp. 3–30. Springer Science+Business Media B.V. (2010)
Roosendaal, A.: Digital Personae and Profiles as Representations of Individuals. In: Bezzi, M., Duquenoy, P., Fischer-Hübner, S., Hansen, M., Zhang, G. (eds.) Privacy and Identity Management for Life. IFIP AICT, vol. 320, pp. 226–236. Springer, Heidelberg (2010)
Rouvroy, A.: Privacy, Data Protection, and the Unprecedented Challenges of Ambient Intelligence. Studies in Ethics, Law, and Technology 2(1), 51 (2008)
Sullivan, C.: Privacy or Identity? Int. J. Intellectual Property Management 2(3), 289–324 (2008)
Warren, S.D., Brandeis, L.D.: The Right to Privacy. Harvard Law Review 4(5), 193–220 (1890)
Oops - We Didn’t Mean to Do That! -- How Unintended Consequences Can Hijack Good Privacy and Security Policies

Thomas P. Keenan

Faculty of Environmental Design and Department of Computer Science, University of Calgary
[email protected]

Abstract. All privacy laws, security policies, and even individual actions are subject to an often-forgotten factor – the “Law of Unintended Consequences” (LUC). Yet LUC is not a “law” in the sense of appearing in the Criminal Code, nor is it a Law of Nature like gravity. It is actually a manifestation of our inadequate efforts at foresight, and there are things we can do to counteract it. This paper identifies classes of factors which have led to unintended consequences in the privacy and computer security domains, though the list is by no means exhaustive. It is primarily intended to inspire further thinking and research. We clearly need to make a stronger effort to “foresee the unforeseeable” or at least “expect the unexpected” to maintain public confidence in technological systems. The disciplines of strategic foresight and automated policy analysis may prove useful in attaining this goal.

Keywords: Unintended consequences, technology policy, privacy, security, hacking, strategic foresight, policy analysis.
1 Introduction: The Unintended Consequences Problem

Consider these fictitious news headlines, which are based on real-life cases:

• Canada’s Levy on Blank Media Triggers Surge in Cross-Border Shopping
• US Patient Privacy Law Makes it Impossible for Hospitals to Defend Themselves
• Web Browser Flaw Allows Sneaky People to Guess Your Identity
• New Gambling Law Allows Americans to Be Cheated
• Friendly Credit Card Company Sends Barack Obama Card to Someone Else
Each of these cases demonstrates one or more classes of unintended consequences. Some are technical, like the web browser exploit; others are corporate and government policy decisions that were not thoroughly considered.

There is a rich body of theoretical literature on the LUC as it applies to technology, much of it from an engineering perspective. Healy discusses unanticipated
consequences, i.e. “consequences which are not foreseen and dealt with in advance of their appearance.” [1] He cautions us not to confuse unanticipated consequences with undesirable or improbable ones. In an example about a nuclear power plant near an ocean, he notes that “the anticipated and intended goal or consequence is the production of electric power. The undesired but common and expected consequence is the heating of the ocean water near the plant. An undesired and improbable consequence would be a major explosion…” Examples from Chernobyl to Three Mile Island tell us that the improbable consequences should indeed be considered in a risk analysis of such a project.

In a book on this subject [2], Perrow also discusses LUC at nuclear power plants, noting that “typical precautions, by adding to complexity may help create new categories of accidents.” So, ironically, “at Chernobyl tests of a new safety system helped produce the meltdown and the subsequent fire.” This type of occurrence is what Tenner calls a “revenge effect.” In his book on the subject [3] he uses the example of car alarms, which are intended to protect vehicles from theft and vandalism. Of course they sometimes malfunction, triggering annoying false alarms with flashing lights and blaring horns. “In cities where alarms are most needed,” Tenner writes, “neighbors silence malfunctioning systems by trashing cars.” In other words, the technology intended to prevent car vandalism can lead to precisely that.

In the updated edition of his book on highly improbable events, Taleb [4] develops the concept of Black Swans, asserting that all really important discoveries and advances (e.g. the Internet) have come from events that eluded the normal prognostication techniques. So, he writes, “Black Swans being unpredictable, we need to adjust to their existence (rather than naively trying to predict them). There are so many things that we can do if we focus on antiknowledge, or what we do not know.”

Dörner [5] provides a helpful framework for understanding why outcomes are often difficult and sometimes impossible to foresee. He notes that complexity, (system) dynamics, intransparence, ignorance and mistaken hypotheses can all play a role in preventing us from correctly foreseeing consequences. A system may simply be too complex or opaque for us to really understand it; it may be changing on its own; or we may simply have inaccurate mental models of reality which make our predictions incorrect.

It is clear that anticipating consequences is hard work and sometimes even impossible. Yet it is important work, because we are seeing more and more examples of negative outcomes of bad design at both the technical and policy levels. Without falling into the logical trap of trying to predict what is truly unpredictable, it does appear that there are some common factors that lead to unintended consequences in the domains of privacy, identity and security. It also seems to be helpful to draw on concepts from other disciplines to better understand these issues.
2 Factors That Can Lead to Unintended Consequences

One tool for understanding a complex phenomenon is to identify classes of factors which tend to contribute to it. A related technique is to reason by analogy with other concepts in other fields of human endeavor. The factors listed below, while certainly not an exhaustive list, are intended to shed some light on this type of analysis and to inspire further thinking. They draw on fields as diverse as accounting, information
science, economics, and even human psychology to generate models that can be helpful in this research.

2.1 Materiality – Does This Matter to Me?

In accounting, a sum of money is considered “material” if it could, by its omission or mis-statement, cause the reader of financial statements to form incorrect conclusions about the financial health of the entity. So, a few coins stuck under the cash register drawer are not material, but an unreported commitment to make a large purchase might well be. For the purposes of this paper, policies can be considered material if they have sufficient positive or negative consequences to affect the behavior of a reasonable person.

For example, in the Canadian blank media levy example noted above, the government’s stated goal was to collect revenue to compensate musicians for music that was being copied onto blank media. Some argue that reducing media piracy, thereby pleasing the US government, and punishing the Canadian public for audacious copying were secondary goals. There appears to have been no serious contemplation of cross-border shopping in the legislative debate that led to the 1997 changes to Canada’s Copyright Act. After the fact, the Retail Council of Canada did indeed note the damage to Canadian retailers and is now calling for the abolition of the blank media levy [6]. A similar levy in Australia was struck down by its highest court. Some European countries have blank media levies, as permitted under the EU Copyright Directive of 2001 [7].

Canada’s blank media levy is currently 29 cents Canadian per unit for recordable media such as CD-R, CD-RW, MiniDisc, etc. This money is placed into a fund to compensate copyright owners for their losses due to private copying of digital media such as music CDs. The levy often exceeds the purchase price of the blank CD. Comparing US and Canadian vendors for blank Memorex CD-Rs, a US firm, Best Buy, offers a 50 CD spindle for $16.99 US. The Canadian branch of the same store sells a comparable 50 pack for $39.99. These are regular prices, not temporary sale ones. Since the currencies are currently close to par, the difference in unit price (34 cents vs. 80 cents) is largely due to the blank media levy. Because most Canadians live close to the US border, the temptation to cross-border shop is high and the risk of getting caught is very low. So consumer behavior has changed and the blank media levy is indeed material, in a way that it would not be if it were set at, say, two cents per unit.

2.2 Technology Substitution – Is There Another Way to Do This?

Consumers usually have choices, and will take their business to the vendor who gives them what they perceive as the best deal. For example, many drivers will avoid toll roads, seeking toll-free alternatives, unless the perceived value in terms of time and fuel savings exceeds the toll cost. In a similar fashion, Canadian consumers have “voted with their pocketbooks” by largely shunning the overpriced CDs in favor of other ways of storing digital content. In fact, it is difficult to find blank CDs in many general retail stores. Where did that business go?
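Before answering that question, the materiality arithmetic behind the CD example can be made explicit. A minimal back-of-the-envelope sketch, using only the prices and levy figure quoted above (the percentage split is my own calculation, not a figure taken from the paper's sources):

```python
# Back-of-the-envelope check of the unit prices quoted above (figures as cited in the text).
us_unit = 16.99 / 50      # ~$0.34 per blank CD at the US retailer
ca_unit = 39.99 / 50      # ~$0.80 per blank CD at the Canadian branch
gap = ca_unit - us_unit   # ~$0.46 per-unit price gap (currencies near par)
levy = 0.29               # Canadian private copying levy per unit

print(f"US unit: ${us_unit:.2f}  Canada unit: ${ca_unit:.2f}  gap: ${gap:.2f}")
print(f"Levy accounts for roughly {levy / gap:.0%} of the gap")
```

On these assumed figures the levy alone explains well over half of the 46-cent gap, which is why a reasonable shopper's behavior changes: the levy is material in exactly the accounting sense described above.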
DVDs are not subject to the Canadian blank media levy, illustrating the fact that legislation almost always lags behind technology. A spindle of 50 DVD-Rs is available in Canada for $9.99, pushing the unit cost down to 20 cents each, and that’s for media with almost six times the capacity of a CD! While it may be annoying to store your music on a disc that will not play in your car’s CD player, people are coping by simply transferring their files from DVD to their mp3 player, or even directly to the mp3 player, thereby finding a way around the artificially high cost of CDs in Canada. The advent of cheap flash memory has altered the landscape here, with the very real possibility of storing significant amounts of content in semipermanent solid state memory. Of course, this has led the recording industry (the main force behind the blank media tax) to demand a new “iPod tax” to recoup their perceived losses, and there is some legislative support for this [8]. This illustrates the iterative nature of policy, in which a cycle (some would say an endless and futile one) of measures is proposed, each trying to “plug the leak” discovered in the previous “solution.” The ability to substitute a functionally equivalent technology is certainly an important consideration in searching for unintended consequences, especially in the information technology domain where “a bit is a bit.”

2.3 Scope Conflict – My Law Is Better Than Your Law

An interesting situation arose after the passage of the US Health Insurance Portability and Accountability Act of 1996 (HIPAA). It illustrates a problem when laws (or policies) are created in a vacuum, without paying proper attention to the other laws and policies that will be affected. HIPAA provides much-needed safeguards to give patients control over the use and disclosure of their medical information. However, as a US Federal law, it takes precedence over state laws, including the ones that govern lawsuits brought by patients against health care providers [9]. Since the definition of a health care provider in HIPAA is very broad, the net effect of its passage was to allow patients to sue their health care provider and also demand that their information be withheld from the opposing side. The lawyers for the health care providers were thereby denied access to the very information they required to properly build their cases. For a period of time, this caused difficulty in the legal process and had to be addressed by further legislative changes [10]. In a similar fashion, online gambling is currently illegal in the United States by virtue of the enactment of the Unlawful Internet Gambling Enforcement Act (UIGEA) in 2006. Banks and credit card issuers are banned from paying money to online gambling sites. Despite this, offshore online casinos are thriving, and have created an indirect technique called e-wallets to take money from U.S. gamblers. However, if a gambling site fails to pay a winning gambler, there is no way to sue the operators under US law since this activity is explicitly illegal under UIGEA. As Vogel points out [11], if a person wanted to track down their winnings at an offshore casino, “the winner could go to the UK, Aruba or Bermuda, or to the locale stated in the terms of service.
It may sound romantic, but it could cost more than it's worth to make the trip -- not to mention payment of litigation fees.” Effectively, US-based gamblers using foreign gaming sites are dependent on the honesty and integrity of the site operators as they lack legal protection under the laws of their own country.
2.4 Combinations of Information – The Devil Is in the (Very Minor) Details

Increasingly, the concept of a stateless, memoryless interaction between computers on the Internet is being replaced by a complex and somewhat opaque web of identifications and quasi-identifications. Websites leave cookies, collect personal data, and services like Spokeo (www.spokeo.com) provide easy cross-correlation based on something as simple as an email address. The new web standard, HTML 5, will increase the number of places where cookies can be stored, making them even more persistent and harder to eradicate. Even apparently innocuous leftovers on your computer, like the bits that govern what color a link is displayed in, can provide clues as to where you’ve been online. This information is made available to websites via the a:visited pseudo-class, part of the CSS style sheet language that controls how text is presented. Gilbert Wondracek and Thorsten Holz of the International Secure Systems Lab at the Technical University of Vienna demonstrated a way to steal the history of a user by exploiting the information that browsers use to render sites that have been previously visited in a different color than other sites [12, 13]. They then combined this with the social network sites visited, on the theory that very few people belong to exactly the same combination of social networks. Another researcher, Joerg Resch of analyst company Kuppinger Cole, used this technique to uniquely identify himself on the social network Xing and has provided an online experiment to allow others to try to do this [14]. Because of a fix applied to Xing, this technique is no longer functional, but it certainly made a point. One might also argue that if a small set of social networks becomes dominant, this type of uniqueness will decrease. However, there will always be some class of sites visited that is fairly unique to a specific user and could potentially be exploited. In a similar spirit, Peter Eckersley of the Electronic Frontier Foundation (EFF) has developed this concept, using the precise versions of software installed on a computer (browser, Flash version, plugins etc.) to create a “browser fingerprint.” All of the information necessary to create this fingerprint is available upon request to websites visited from the browser [15]. He ran a fingerprinting algorithm on 470,161 informed participants visiting a particular EFF website. Eckersley concluded that, for this sample, “the distribution of our fingerprint contains at least 18.1 bits of entropy, meaning that if we pick a browser at random, at best we expect that only one in 286,777 other browsers will share its fingerprint.” He concludes that “there are implications both for privacy policy and technical design.” With an estimated 1.5 billion people using the Internet worldwide, browser fingerprinting may not identify a specific person. However, it can certainly be combined with other information to narrow the search. Clearly the designers of browsers never intended to allow this type of deanonymization, which is why information such as full browsing history is not passed to websites. However, they failed to realize that through the style information and the a:visited list they were essentially giving out a similar capability to anyone who was clever and devious enough to use it.
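As a back-of-the-envelope check (this small calculation is an illustration and is not taken from [15] itself), the two figures quoted are two ways of expressing the same degree of uniqueness, since a distribution with H bits of entropy corresponds to an expected match rate of roughly one in 2^H browsers:

\[ \log_2(286{,}777) \approx 18.1 \text{ bits.} \]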
It is important to note that combinations of unintended consequence factors may come into play in a specific situation. In the browser history exploit just described, for example, materiality is also relevant, since it would only be worthwhile to track a user
if there was a purpose, such as sending targeted advertising or something more nefarious, like identity theft or blackmail.

2.5 Pricing Failure – What Is the Price of a Human Life?

It is a basic tenet of economics that “everything has a cost, everything has a price.” Yet, the actual implementation of that principle is often flawed. As Schumacher points out, “to press non-economic values into the framework of the economic calculus, economists use the method of cost/benefit analysis…it is a procedure by which the higher is reduced to the level of the lower and the priceless is given a price. It can therefore never serve to clarify the situation and lead to an enlightened decision. All it can do is lead to self-deception or the deception of others; all one has to do to obtain the desired results is to impute suitable values to the immeasurable costs and benefits. The logical absurdity, however, is not the greatest fault of the undertaking: what is worse, and destructive of civilisation, is the pretence that everything has a price or, in other words, that money is the highest of all values.” [16]. The 2010 oil spill off the US Gulf Coast is a stark example of the difficulty of mixing monetary values (getting oil efficiently) with non-monetary ones. Long term ecological damage, injury to British Petroleum’s reputation, and loss of human life clearly have symbolic and absolute values that defy translation into dollars and cents. While many observers argue that the spill was predictable, and perhaps even inevitable, few high level decision makers had considered consequences such as possible health effects from the dispersants being used, as well as the psychological trauma to area residents. In the realm of privacy and security policies, a dollar cost is often attributed to the compromise of private information, either by citing the amount spent by the party committing the breach to “remedy” it (e.g. through paying for customers’ credit insurance), or the cost (in time and out-of-pocket expense) to the victim to re-establish their identity credentials. As one example, TJX Companies, Inc., the US-based parent of retail stores such as TJ Maxx and Winners, estimated the cost of its 18-month-long privacy breach at $17 million US [17]. This line of thinking makes the implicit assumption that a victim can be “made whole” by financial compensation. Disturbing examples are arising that demonstrate that this is not always possible. According to media reports, a 24-year-old teacher named Emma Jones committed suicide in February 2010 “after (her) ex-boyfriend posted naked pictures of her on Facebook” [18]. While the man involved denies this action, relatives and acquaintances have stuck to the story that her embarrassment, combined with the fact that she feared legal action as she was living in Abu Dhabi, drove her to take her life. Clearly, for Jones and her family, this privacy breach falls into the realm of “priceless.” Further confirmation that some things are considered “beyond price” comes from the outcry over online tributes to UK gunman Raoul Moat, who attacked a police officer. A Member of the UK Parliament called for the page to be taken down and “took the step of ordering Downing Street officials to contact Facebook, which has allowed 30,000 people to join a bizarre 'tribute group' glorifying Moat's crimes, to lodge a formal protest” [19]. In cases like this, financial compensation seems, as Schumacher argues, to be at best irrelevant and even crass.
2.6 Malicious and Semi-malicious Actors – An Unlocked Door Invites Intruders

No discussion of unintended consequences would be complete without acknowledging the important role of human beings in the ultimate outcome. The term “hacking” has come to refer to the malicious and often criminal activity of breaking into computer systems. However, in its original sense, hacking was a noble calling that saw highly talented people, often at universities, exploring the corners of technology for the sheer joy of it. Levy documented this very well 25 years ago, and his book on the subject has recently been updated and re-released [20]. Proof that this playfulness is still with us comes from the case of prankster/journalist John Hargrave [21], who successfully obtained a credit card in the name “Barack Obama” simply by phoning American Express and asking for a “supplementary card” in that name. It was duly issued and arrived in the mail. Another example relates to the way in which airlines alter fare prices, often in the middle of the night. Clever travel agents wrote automated scripts that queried the airline computers on a continuous basis, asking for specific fares on behalf of clients. This put such an unexpected load on the airline systems that they had to put limits on the number of queries made by travel agents. Many jurisdictions have online systems to allow citizens to check their own, and their neighbors’, property assessments for tax fairness purposes. Of course, these are often used for totally unrelated purposes (how much is my boss’ house worth?), and this was enough of an issue that the City of Calgary had to take both legal and technical countermeasures to control it [22]. While most people who misused this system were, at best, “semi-malicious” in that they were driven by curiosity, at least one person objected when a newspaper published a photograph of his house as one of the most expensive in the city, citing privacy concerns and fears that he would be burglarized. The best policy in terms of human behavior is to simply assume that if a feature exists in a system, someone will try to figure out a way to exploit it, and will often succeed.

2.7 Future Technologies – Expect the Unexpected

Well beyond the scope of this paper, but impossible to ignore, is the ongoing impact of “not yet invented” future technologies. As one example, petabytes of data are being self-generated by users on Facebook, MySpace, Twitter, etc. and, we can assume, are being archived somewhere. Many people even go to the trouble of identifying faces on Facebook through “tagging.” Then of course there is the growing presence of surveillance cameras, biometric identification, etc. Improved facial recognition and data-matching technologies, already under development, will mean that a curious (or malicious) person (or government) in the future may well be able to retrospectively track our movements and perhaps, even, accuse us of committing crimes that aren’t even crimes yet! Sitting between current reality and things not yet invented, there is a whole class of technologies that already exist, or are known to be possible, but which have not yet emerged on a mass commercial basis. Some can be discovered by looking at the patent filings of high tech companies. For example, Apple was assigned a broad US
patent, 7814163, issued Oct. 12, 2010 [23], under the title "Text-based communication control for personal communication device." It provides an automated mechanism for controlling objectionable language in text messages, and also for enforcing certain educational goals. As noted in the patent document, “These embodiments might, for example, require that a certain number of Spanish words per day be included in e-mails for a child learning Spanish.” Pundits are already speculating that teenagers will find new euphemisms for the dirty words that are banned, and that pre-written Spanish emails will be readily downloadable to meet the quota requirements. Those introducing new technologies, whether they are technology providers, network operators, companies, or governments, would be well advised to stop and think about possible misuse before releasing new capabilities into the world. Simply stating that “we didn’t think of that” will become less acceptable as disgruntled users demand compensation like that exacted from TJX and other firms that have breached consumer privacy.
3 Some Techniques for Anticipating Consequences

Whole organizations have been created to deal with the field of “strategic foresight,” an attempt to gain insight into the future without making the assumption that it can be predicted by extrapolation of current situations. While still bedeviled by Taleb’s Black Swans, which seemingly come out of nowhere, these efforts have provided useful insights for many organizations. Then again, William Gibson, author of numerous science fiction novels and the man who coined the phrase “cyberspace,” has acknowledged “most futurists are charlatans…when I made up that word (in 1982) I had no idea what it meant but it seems to have stuck.” [24] A specific technique within strategic foresight is the use of “scenario planning.” Indeed this is the basis of a popular executive seminar taught at many business schools. The one taught at the University of Oxford holds out the promise of “using the future to improve our understanding of today” [25] and traces scenario planning back to the ancient Greeks. Since scenarios are, in essence, stories about the future, it may well be productive for those faced with privacy and security decisions to adapt sagas like those identified here (and the future will provide a continuous supply). To illustrate this, consider the case, investigated by the author, of a large US university that compromised the personal data of a large group of students. It happened because a student employee, wanting to test out cloud computing based file storage, needed a large file. He thoughtlessly chose a file containing personal data on all students currently living in the university’s residence halls, and posted it on what turned out to be a publicly accessible server at Google. The unencrypted file contained the names, addresses, and social security numbers of a large number of students who had trusted the university to keep them confidential. The case was further complicated by the fact that the person responsible was a casual student employee and had left the job by the time the matter was discovered. It wound up costing the university a substantial amount of money (to pay for credit insurance for students, etc.) and created a public relations nightmare that involved writing apologetic letters to affected students.
While the perpetrator’s superiors could not have anticipated that he would try to do ill-advised cloud computing experiments, system designers should craft systems, especially those used by casual employees, to restrict file access and sharing. As another example, also investigated by the author, a major Canadian hospital suffered breaches in patient confidentiality because a curious part-time nurse made queries on patients all over the hospital with no proper controls in place.
4 Automated Policy Analysis – A Potentially Useful Tool to Minimize Unintended Consequences

Dan Lin et al. of Purdue University and Jorge Lobo of IBM T.J. Watson Research Center have developed a system called “EXAM – a Comprehensive Environment for the Analysis of Access Control Policies” [26]. It handles policies which can be expressed in XACML, a standard access control language that supports reasonably complex rules. They have recently started applying EXAM to privacy policies and are able to provide quantitative measures of the similarity of two privacy policies, as well as identifying contradictions between them and flagging areas that are covered by one but not by the other. They have also developed graphical interfaces to display this information. Damianou et al. [27] have created a policy analysis tool called PONDER which provides a declarative language for specifying both security and management policies, as well as a hyperbolic tree viewer, a policy compiler and a policy editor. Acknowledging that “the refinement process is not expected to be fully automated,” the authors make provisions for interactive editing and graphical display. In the future, they hope to add animation. In the specific domain of privacy, Sirin has applied an OWL-based policy analysis approach to the US HIPAA legislation, which governs the use of medical information. As noted in a presentation at the APQC conference in Houston [28], doing a deep analysis of policies does indeed turn up “holes in our policy” (in this case a nurse having “read and re-write” access to medical history files). As policies become more complex, tools like this will become even more necessary, and may help to identify contradictions, inconsistencies, and omissions. While EXAM, PONDER, OWL-based analysis, and similar approaches provide excellent technical vehicles for exploring policies, they may be of limited use in finding truly novel unanticipated consequences, simply because the “rule-makers” do not even think about them. How, for example, would browser designers have thought to worry about the color coding of already visited websites? Bertino has expressed enthusiasm for broadening the application of EXAM to other situations such as cloud computing policies, and work on that is ongoing [29].
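To give a flavor of the kind of check such tools perform, the following sketch compares two toy privacy policies for similarity and for contradictions of the "read and re-write access" sort mentioned above. The rule format, the Jaccard similarity measure, and all names are illustrative assumptions; this is not the actual EXAM algorithm or XACML syntax.

```python
# Toy policy comparison in the spirit of EXAM: quantify similarity and flag
# contradictions between two simplified policies. Illustrative only.
from itertools import product

# A rule is (subject, action, resource, effect).
policy_a = {("nurse", "read", "medical_history", "permit"),
            ("nurse", "write", "medical_history", "deny")}
policy_b = {("nurse", "read", "medical_history", "permit"),
            ("nurse", "write", "medical_history", "permit")}

def similarity(p, q):
    """Jaccard similarity over the rule sets (a crude quantitative measure)."""
    return len(p & q) / len(p | q)

def contradictions(p, q):
    """Pairs of rules that target the same request but differ in effect."""
    return [(r, s) for r, s in product(p, q)
            if r[:3] == s[:3] and r[3] != s[3]]

print(similarity(policy_a, policy_b))      # 0.33...
print(contradictions(policy_a, policy_b))  # the write rule: deny vs. permit
```

In this toy example the two policies agree on read access but contradict each other on write access to medical history files, which is exactly the kind of "hole" a human reviewer might overlook.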
5 Conclusions

Collecting stories about unintended consequences is certainly entertaining and instructive, and there is every indication that an endless supply of them is forthcoming. Indeed, Peter Neumann of SRI has been hard at work documenting the foibles of technology in the Risks Forum [30] since 1985, and that list of techno-human mishaps shows no signs of slowing down. The hard, but really important, work is to derive from
these anecdotes some principles that can help technology developers and policy makers to at least minimize the consequences of the LUC on their work. Applying concepts from other fields (e.g. materiality from accounting) seems to be a useful tool, and worthy of further exploration. Scenario planning also has a place, especially since it forces people to think about unlikely events and sensitizes them to the inherent unpredictability of the future and the inaccuracy of predictions about it. Black Swans come into play too, but mainly as something to be enjoyed (or not) in retrospect. Taleb makes it clear that attempting to predict them is futile, though we often try to convince ourselves when looking back that developments like penicillin and the Internet were actually logical outgrowths of previous events. Automated policy analysis is another tool that may be useful. However, it is mainly confined to clarifying the underlying logic of policies. EXAM-like approaches may have some utility in finding subtle contradictions in privacy and security policies, but the really big issues will still require human thought and creativity. Most of the examples in this paper were gleaned from publicly available sources such as newspapers and Internet postings. There are undoubtedly other rich sources of information about unintended consequences, such as the files of national and local privacy commissioners, accident reports filed with government agencies, and internal corporate documents. The advent of “whistle-blowing sites” such as the now-famous WikiLeaks can also provide a treasure trove of inspiration, especially as diplomatic cables which were intended for a very restricted audience are read and interpreted by a much broader community. The real challenge is to understand the underlying patterns of thought and action that lead to unintended consequences, and, where possible, anticipate and prepare for them. This paper has identified seven proposed classes of factors driving LUC. These classes were developed by considering real-life instances of unintended consequences and seeking similarities. Ideas from other disciplines such as economics, computer science and accounting were incorporated. The naming of the classes was an important part of the exercise, accomplished iteratively in conjunction with the input of participants in the PrimeLife/IFIP Summer School in August 2010. Armed with this model, researchers can take further examples of LUC and decide which category, or categories, they fall into, what new categories should be defined, and how the existing ones should be refined. It is also worth noting that the focus of these examples was on undesirable unintended consequences. There is certainly a whole universe of unintended but positive “silver linings” to be explored. As technology becomes more complex, and makes an even larger impact on our lives, it will be increasingly important to track unintended consequences and to learn from them. Unintended consequences will always be with us – they will just get more subtle and harder to predict as we think harder about them. Still, every instance of an undesirable consequence that is anticipated and avoided is a small victory that should improve our interaction with technology.

Acknowledgements

This work was funded in part by a Research and Study Travel Grant from the University of Calgary. The helpful comments of PrimeLife project reviewers and other participants at the August 2010 Summer School in Helsingborg, Sweden are also gratefully acknowledged.
References

1. Healy, T.: The Unanticipated Consequences of Technology. Article posted at the Santa Clara University website, http://www.scu.edu/ethics/publications/submitted/healy/consequences.html (accessed February 10, 2011)
2. Perrow, C.: Normal Accidents: Living with High Risk Technologies. Princeton University Press, Princeton (1999)
3. Tenner, E.: Why Things Bite Back: Technology and the Revenge of Unintended Consequences. Vintage Books, New York (1996)
4. Taleb, N.: The Black Swan: The Impact of the Highly Improbable. Random House, New York (2007)
5. Dörner, D.: The Logic of Failure: Why Things Go Wrong and What We Can Do To Make Them Right. Metropolitan Books, New York (1989) (English translation, 1996)
6. Retail Council of Canada Submission – On Copyrights, September 11 (2009), http://www.ic.gc.ca/eic/site/008.nsf/eng/02560.html (accessed February 10, 2011)
7. http://eur-lex.europa.eu (accessed February 10, 2011)
8. Flatley, J.L.: Is Canada’s iPod Tax Back? Posted March 17 (2010), http://www.engadget.com (accessed February 10, 2011)
9. Antognini, R.L.: The law of unintended consequences: HIPAA and liability insurers; at first glance, the Privacy Regulations appear to be adverse to insurers and defense counsel, but McCarran-Ferguson and exceptions may save the day. Defense Counsel Journal 69(3), 296–305 (2002)
10. Kapushian, M.: Hungry, Hungry HIPPA: When Privacy Regulations Go Too Far. Fordham Urban Law Journal 31(6), 1483–1506 (2004)
11. Vogel, P.: US Law Against Online Gambling Makes it the Biggest Loser. E-Commerce Times, September 9 (2010), http://www.ecommercetimes.com/rsstory/70775.html?wlc=1287123815 (accessed February 10, 2011)
12. Cameron, K.: More Unintended Consequences of Browser Leakage, http://www.identityblog.com/?p=1088 (accessed February 10, 2011)
13. Wondracek, G., Holz, T., et al.: A Practical Attack to De-Anonymize Social Network Users. Technical Report TR-iSecLab-0110-001, http://www.iseclab.org/papers/sonda-TR.pdf (accessed February 11, 2011)
14. http://www.iseclab.org/people/gilbert/experiment (accessed February 10, 2011)
15. Eckersley, P.: How Unique is Your Web Browser?, https://panopticlick.eff.org/browser-uniqueness.pdf (accessed February 11, 2011)
16. Schumacher, E.F.: Small is Beautiful – Economics as if People Mattered. Harper & Row, New York (1975)
17. Gaudin, S.: TJ Maxx Breach Costs Hit $17 Million, http://www.informationweek.com/news/security/showArticle.jhtml?articleID=199601551 (accessed February 10, 2011)
18. http://www.huffingtonpost.com/2010/02/26/emma-jones-britishteache_n_477337.html (accessed February 10, 2011)
19. http://www.dailymail.co.uk/news/article-1294700/FacebooksRaoul-Moat-tribute-page-breached-termsconditions.html#ixzz0v4NoxM50 (accessed February 10, 2010)
20. Levy, S.J.: Hackers: Heroes of the Computer Revolution – 25th Anniversary Edition. O’Reilly, Sebastopol (2010)
21. Metzger, T.: Prank Uses Obama in Attempt to Obtain Centurion Bling, http://blogs.creditcards.com/2008/10/the-amex-centurioncard.php (accessed October 15, 2010)
22. As explained at https://assessmentsearch.calgary.ca (accessed February 10, 2011)
23. US Patent Office, http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=7814163.PN.&OS=PN/7814163&RS=PN/7814163 (accessed February 10, 2011)
24. Gibson, W.: Wordfest speech at the University of Calgary (October 13, 2010)
25. http://www.sbs.ox.ac.uk/execed/strategy/scenarios/Pages/default.aspx (accessed February 10, 2011)
26. Lin, D., et al.: EXAM – a Comprehensive Environment for the Analysis of Access Control Policies. CERIAS Tech Report 2008-13, http://www.cerias.purdue.edu/assets/pdf/bibtex_archive/2008-13.pdf (accessed February 10, 2011)
27. Damianou, N., et al.: Tools for Domain-Based Management of Distributed Systems. In: IEEE/IFIP Network Operations and Management Symposium (NOMS 2002), Florence, Italy, April 15-19, pp. 213–218 (2002)
28. Sirin, E.: Automated Policy Analysis: HIPAA, XACML and OWL, http://weblog.clarkparsia.com/2008/12/10/automated-policyanalysis-hipaa-xacml-and-owl/ (accessed February 10, 2011)
29. Bertino, E.: Private communication, May 7 (2010)
30. http://www.sri.com/risks (accessed February 10, 2011)
Supporting Semi-automated Compliance Control by a System Design Based on the Concept of Separation of Concerns

Sebastian Haas1, Ralph Herkenhöner2, Denis Royer3, Ammar Alkassar3, Hermann de Meer2, and Günter Müller1

1 Universität Freiburg, Institut für Informatik und Gesellschaft, Abteilung Telematik, [email protected]
2 Universität Passau, Lehrstuhl für Rechnernetze und Rechnerkommunikation, [email protected]
3 Sirrix AG, Bochum, [email protected]

Abstract. Manual compliance audits of information systems tend to be time-consuming. This leads to the problem that actual systems are not audited properly and do not comply with data protection laws or cannot be proven to comply. As a result, personal data of the data subject are potentially threatened with loss and misuse. Automatic compliance control is able to reduce the effort of compliance checks. However, current approaches are facing several drawbacks, e.g. the effort of employing cryptographic hardware on every single subsystem. In this paper a system design is presented that is able to circumvent several drawbacks of existing solutions, thereby supporting and going beyond existing mechanisms for automated compliance control.
1 Introduction

With respect to privacy of data subjects, several countries already have established laws and regulations that obligate companies to have extensive measures in place to ensure proper and secure handling of personal data (PD). However, recent incidents of data loss1 raise the question why these companies have failed in fulfilling this obligation. The given incidents indicate that weak or ineffective data protection is employed by those companies and that law enforcement is deficient in detecting the lack of data protection. The latter points show that the current procedures of inspecting compliance with data protection laws seem to be ineffective and improper. This implication is supported by statistics that cast serious doubt on the effectiveness of governmental inspections. For example, in 2009, 2.2 governmental inspectors were responsible for every 100,000 companies in Germany, resulting in an average data protection audit every 39,400
1 See e.g. http://voices.washingtonpost.com/securityfix/2009/05/hackers_break_into_virginia_he.html (last visit: 25.01.2011) or http://www.wired.com/threatlevel/2009/11/healthnet/ (last visit: 25.01.2011).
years per company.2 With regard to the increasing automated processing of personal data, by storing data sets in databases and making these accessible via the internet, such as in online social networks or in government-driven databases (e.g. central data storages of health cards), this issue poses a significant threat to information privacy. Privacy can also be considered as an economic problem [Va09]. Thus, it can be assumed that an economically driven company weighs the costs of losing PD (personal data) against the costs of implementing effective measures against data loss. Only an increase in the value and a reduction in the cost of effective data protection can help to improve this situation. The effect of introducing adequate sanctions for not complying with data protection laws is limited by the effectiveness of law enforcement (i.e. inspection). Hence, increasing the efficiency of compliance audits is a crucial task. Our solution advocates a system design supporting automatic compliance checks, offering companies a tool to maintain and prove their compliance. Employing the design science paradigm [HMP04] as research framework, this paper presents a schema for data protection supporting automated compliance control that transfers Dijkstra’s concept of separation of concerns [Di82] into the domain of data protection (data protection schema and implementation as primary artifacts). The approach has been successfully instantiated as a prototype for mobile health-care services, in which sensitive PD of patients, such as names, addresses, and medical information, are processed. In this paper, the considered scenario is situated in the eHealth domain of home care services. Figure 1a depicts a usual data flow for a home care service. Here, nurses transmit PD d to the storage service from which a nursing service is able to retrieve the data. The storage service has the following functions: receiving data with associated policies and storing both, and embedding data in answer documents and transmitting them to legitimate receivers. As there is no further computation, there is no need for the storage service to access any of the unencrypted PD. The storage service is composed of several complex sub-systems (e.g. load balancers, web servers, databases etc.) or could be a cloud storage database. The problem addressed in this paper is that current auditing mechanisms for data protection (manual or automated) fail to achieve their goals for complex systems like the described storage service. There is a need for reducing the effort of compliance audits, without decreasing their coverage.
Fig. 1. The scenario and the location of the solution inSel
2 http://www.xamit-leistungen.de/downloads/XamitDatenschutzbarometer2009.pdf (in German; last visit: 25.01.2011).
This paper is structured as follows: Following the introduction, Section 2 presents the related work in the domain. Next, Section 3 depicts the developed system for supporting automated compliance control, whose implementation is presented in Section 4. Section 5 discusses the resulting system, including potential attack scenarios and its capabilities. Section 6 summarizes the findings and gives an outlook on future research opportunities.
2 Related Work

By employing automated compliance checks, e.g. by using compliance patterns [Gho007], necessary and verified business process models can be achieved [Lui07], which addresses the issue of compliance on the organizational level. Compliance on the technical level can be achieved by two complementary approaches: by employing a-priori mechanisms that enforce the compliance of a system before or at runtime, or by posterior controlling that checks compliance after runtime. However, it can be argued that a-priori policy enforcement (e.g., usage control [PS04] and compliance engineering [HJ08]) is too inflexible in some scenarios [Po99, EW07]. Furthermore, posterior controlling is currently an active area of research (e.g. [Ce07, EW07, Ac08]). Approaches in this area have in common that log files of a system are used to create a formal model of the usage of PD. Afterwards, the model is used to check whether the usage was legitimate in relation to a set policy (a toy illustration of this posterior check is sketched at the end of this section). This approach can be combined with a rollback mechanism for the enforcement of data integrity [Po99]. While these approaches are well suited in theory, there are several drawbacks when applying them to real-world systems: a) The reconstructed model is an abstraction of a complex system which might be inaccurate to a certain degree. This is due to, e.g., limited reconstruction capabilities or log inconsistencies. b) A model can only be considered complete if logs from every component and every layer (i.e. OS, middleware, application, etc.) are used for reconstruction. Otherwise vulnerabilities in the system design or the implementation might lead to unspecified and unlogged behavior of the system. c) Furthermore, in order to ensure authentic logs, secure logging has to be employed on every single subsystem. Any of these drawbacks might render the result of an automated compliance check unusable, as there is no proof that the reconstruction reflects the actual processing of PD. Applicable measuring points for compliance checks regarding data protection are points of enforcement of data access and transfer. Internal data processing is usually regulated by access control mechanisms [SS94], while data transfer is secured by security gateways [GH06], application-layer firewalls [Sc02] and semantic firewalls [As04]. All these approaches are similar in that they enforce security policies that must be compliant with the respective laws. Thus, even with a compliant policy, a non-compliant transmission might be allowed and will stay undetected if the decision on or enforcement of the policy fails. If a security system has complete and audit proof logging, it can be extended with compliance checks [Us04]. For retrieving data, these approaches are very similar to our approach. As an innovation, our approach assumes that the provider of the storage service is not a trusted party for storing data. Thus, personal data is protected by using encryption mechanisms against access by the provider of the storage service. Such features are not supported by current approaches.
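The posterior check mentioned above can be pictured very roughly as follows: usage events reconstructed from log files are compared with a set policy after the fact. The event and policy formats in this sketch are illustrative assumptions, not taken from any of the cited approaches.

```python
# Toy posterior compliance check: reconstructed usage events are compared with
# a set policy after the fact. Event and policy formats are illustrative only.
POLICY = {  # which data types each (role, purpose) pair may access
    ("nurse", "treatment"): {"name", "address", "medication"},
    ("billing", "invoicing"): {"name", "address"},
}

def check_usage(events):
    """Return the reconstructed events that the policy does not cover."""
    return [e for e in events
            if e["data"] not in POLICY.get((e["role"], e["purpose"]), set())]

# This event would be flagged: marketing is not a permitted purpose at all.
print(check_usage([{"role": "billing", "purpose": "marketing", "data": "address"}]))
```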
3 inSel: An Approach Supporting Automated Compliance Control

The essence of our work is the reduction of the number of components which must be audited for a meaningful compliance check, and the creation of a trustworthy logging environment. This is achieved by applying Dijkstra’s concept of separation of concerns to separate security functions from data processing and storage functions. This is done by introducing our solution inSel (informational self-determination), implementing the security functions in a dedicated security gateway acting as a barrier for PD (cf. Figure 1b). inSel’s purpose is that unencrypted PD is kept away from large parts of the system and that access to it is logged in a secure environment. The main benefit of using this approach is that compliance checks are reduced to checks of the proxy and its secure logs.

3.1 Compliance Control Schema

The schema underlying inSel uses three functions: identify, which separates PD from other data; substitute, which encrypts PD and leaves other data as it is; and re-substitute, which recovers the original form of PD. substitute and re-substitute log access to data (a small code sketch of these three functions is given at the end of Section 3). Limitations and applicability are discussed in Section 5. When the schema is correctly applied and identify is correctly adjusted, no unencrypted PD is passed to the storage service and every access to PD is logged. A compliance check therefore is reduced to a) a proof of the correct functioning of the proxy as well as b) a check of the respective access logs. Assuming that the correct functioning is correctly described in a model, proof a) is divided into two parts. The first part consists of checking that the schema has been employed; this can be done, for example, by using a trusted computing platform [TCG05] and a certificate of a trusted third party (TTP) showing that the implementation matches the schema. This only has to be done once, as long as the schema and the employed software do not change. Secondly, the correct adjustment of identify must be proven. This is done manually by a TTP and has to be repeated for every transmission protocol used. Proof b) can be provided by either using manual methods or the automatic approaches mentioned in the related work section. Both approaches can be reduced to a check on the log files of the proxy, as proof a) states that there is only encrypted PD in the background system.

3.2 Three Use Cases for Compliance

To show the ability to support compliance control, three real-world use cases were selected. First, the most general and common case is performance reliability. The purpose of this case is to prove normal operation and, in any case of potential data misuse, to provide evidence on data access and performed operations. To achieve this, there has to be a consistent and audit proof documentation of the data access by subject, object, time and purpose. For reasons of data protection, there has to be access control and consistent and audit proof access documentation. The second use case is the inspection of commissioned data processing. In this case, the data is processed by a third party on behalf of the data controller. To take the German law as an example, the data controller (principal) has to inspect the data processor (agent) for divergence from normal operation. For the reason of data
minimization, these inspections have to be random (not continuous) but still on a regular basis. A good strategy for triggering the inspection is to do it from both the data controller’s and the data processor’s side. This minimizes the risk that one single party is able to prevent the inspection of specific operations. Again, a consistent and audit proof documentation of the data access, including denied access requests, by subject, object, time and purpose is required. Also, there has to be access control and a consistent and audit proof access documentation. For the reason of data minimization, the access should be limited to the documentation of the inspected operation(s). The third use case is supporting the right of access. In Europe the right of access enables data subjects to get information about the processing of their personal data by the data controller. There are specific requirements to support the right of access in an automated manner [HJ10]. In particular, the legitimate interest has to be checked and the information has to be provided confidentially. Again, a consistent and audit proof documentation of the data access by subject, object, time and purpose is required. Also, there has to be access control and a consistent and audit proof access documentation. For privacy reasons, the access has to be limited to the documentation related to the processing of personal data of the data subject. In particular, personal data of a third person must not be disclosed.
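To make the schema of Section 3.1 concrete, the following toy implementation shows identify, substitute and re-substitute acting on a small XML document. The XML layout, the per-subject keys, the use of the Python cryptography library's Fernet construction and all names are illustrative assumptions; the actual inSel implementation described in Section 4 works on RelaxNG-validated documents and uses XML-Encryption with AES behind a security gateway.

```python
# Toy sketch of the inSel schema: identify / substitute / re-substitute.
# Everything here (tags, keys, Fernet) is illustrative, not the real inSel code.
import time
import xml.etree.ElementTree as ET
from cryptography.fernet import Fernet

PD_TAGS = {"name", "address"}                 # identify: which elements are PD
KEYS = {"subject42": Fernet.generate_key()}   # per-data-subject keys
SECURE_LOG = []                               # stands in for the Secure Log

def write_log(operation, subject, pd_types):
    SECURE_LOG.append({"time": time.time(), "op": operation,
                       "subject": subject, "types": sorted(pd_types)})

def substitute(xml_text, subject):
    """Encrypt identified PD before the document reaches the storage service."""
    root, cipher, types = ET.fromstring(xml_text), Fernet(KEYS[subject]), set()
    for element in root.iter():
        if element.tag in PD_TAGS and element.text:       # identify
            token = cipher.encrypt(f"{element.tag}|{element.text}".encode()).decode()
            element.text = f"{subject}|{token}"            # "ID|Base64-string" pattern
            types.add(element.tag)
    write_log("storage", subject, types)
    return ET.tostring(root, encoding="unicode")

def re_substitute(xml_text):
    """Recover the original PD for a legitimate receiver."""
    root, types, subject = ET.fromstring(xml_text), set(), None
    for element in root.iter():
        if element.tag in PD_TAGS and element.text and "|" in element.text:
            subject, token = element.text.split("|", 1)
            tag, value = Fernet(KEYS[subject]).decrypt(token.encode()).decode().split("|", 1)
            element.text = value
            types.add(tag)
    write_log("retrieval", subject, types)
    return ET.tostring(root, encoding="unicode")

stored = substitute("<report><name>Alice</name><pulse>72</pulse></report>", "subject42")
print(stored)               # the name is encrypted, the pulse value is left as it is
print(re_substitute(stored))
```

Note how non-PD content passes through untouched, so the storage service never needs access to unencrypted PD, while every substitution and re-substitution leaves a log entry.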
4 Implementation

The schema depicted in Section 3 is implemented by the inSel system, whose architecture is visualized in Figure 2. The system itself is comprised of the following three main components:

• The Secure Hypervisor, representing the basis of the architecture.
• The inSel Core, serving as the security gateway/proxy, realizing the schema presented in Section 3 and containing the rights management and a user interface which is beyond the scope of this article.
• The Secure Log, offering audit proof log facilities for the substitute and the re-substitute functions.
Fig. 2. Resulting architecture of inSel
4.1 The inSel Architecture

Operating as a security gateway/proxy, inSel represents the measuring point for compliance checks. This is based on a secure hypervisor, which is the architectural basis of inSel. It comprises a TURAYA-based [EP10] security kernel and a TPM (Trusted Platform Module) as hardware-based security anchor for performing measurements of the system, and the isolated virtual machines (VMs) for the inSel Core and Secure Log. This way, a complete information flow control can be realized. Furthermore, the functionality of the security gateway (inSel Core) and the logging and audit mechanisms (Secure Log) are logically separated. For every storage and retrieval transaction going through the security gateway/proxy, the logging mechanism stores the following data: identifier of the data subject, identifier of the involved user, type of operation (i.e. storage or retrieval), transferred data types (e.g. name, address), the result of identify, a timestamp, and a session identifier (for linking log entries of the same user’s session). For compliance checks, authorized users can send requests to the Secure Audit-Log via the user interface, which only supports pre-defined use cases (e.g. inspection of suspected data misuse or controlling the storage service). This limits the possibility of misusing retrieved logging data and allows compliance checks themselves to be audited. Additionally, this can help to provide information to data subjects practicing their right of access by evaluating all transactions for a specific data subject’s identifier. For privacy reasons, results of such evaluations should only be provided to the data subject and must not be visible to any other user. This is ensured by the access control mechanism in the compliance interface that will allow requests on information for a given data subject only to the data subject himself.

4.2 Application of the inSel Schema

Before transmitting PD to the storage service, the nurse must authenticate her/himself to inSel. The data is transmitted via HTTP over SSL/TLS in standardized XML documents that are created by the nurse’s systems, which are assumed to be trustworthy. Transmitted documents are checked for validity against the RelaxNG schema which is also provided by the nursing service and which is identified via the address of the request. If the document matches the schema, then PD is marked as such by identify (cf. Figure 3). Hence, identify is implemented as a search on defined XML trees. Substitute then extracts the data subjects as well as the types of data (e.g. name,
Fig. 3. Exemplified XML-Transformations implementing the inSel schema
address, etc.) from the document according to the schema and encrypts PD and its type with the key of the data subject. After encryption the original PD is replaced by the cipher text and the subject’s ID, which is extracted from the XML document. This procedure is repeated for all PD and a log entry is written into the Secure Log, containing data subjects and types. If writing the log is successful, the document containing the encrypted PD is passed to the storage system. These log entries form a protocol showing what data has been collected. Again, before retrieving data from the storage system, the (employee of the) nursing service must authenticate himself to inSel. If the nursing service is not allowed to access the requested URL, the request is discarded and a log entry is written stating the prohibited access. Otherwise inSel retrieves the requested document from the storage service. Relevant data is identified by re-substitute searching for a predefined pattern (“ID|Base64-string” in the example in Figure 3). The matching string is removed and ID and PD are separated. Decrypted PD is inserted where the pattern has been found. After all PD has been processed, a log entry is written. The entry contains time, data subject and type of PD accessed, as well as an indicator of the success of the decryption. Again, if writing the log is successful, the document is passed to the nursing service.

4.3 User Management

In order to manage the users of the inSel system, an integrated rights management component is provided, having the functionality to identify the users on the inSel system (authentication), to manage their roles and access control permissions (administration), and to store and manage the encryption keys for the individual users. Configuration of the rights management component will be dedicated to the responsible data controller (in the given scenario, the nursing service). This is done exclusively and, for a given instance of inSel, in an irreversible manner. Neither the data processor (here the storage service), nor any other data controller, nor the inSel provider himself has the permission to interfere.
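The Secure Log side can be sketched in the same spirit. The record below mirrors the fields listed in Section 4.1, and the two queries correspond to the right-of-access and commissioned-processing use cases of Section 3.2; the concrete field and function names are illustrative assumptions rather than the actual Secure Log interface.

```python
# Illustrative Secure Log record (fields as listed in Section 4.1) and two
# predefined compliance queries; names and types are assumptions, not inSel code.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class LogEntry:
    data_subject: str          # identifier of the data subject
    user: str                  # identifier of the involved user
    operation: str             # "storage" or "retrieval"
    pd_types: Tuple[str, ...]  # transferred data types, e.g. ("name", "address")
    identify_ok: bool          # result of identify
    timestamp: float
    session_id: str            # links log entries of the same user session

def right_of_access(log: List[LogEntry], data_subject: str) -> List[LogEntry]:
    # The compliance interface answers a data subject only about their own data.
    return [e for e in log if e.data_subject == data_subject]

def inspect_commissioned_processing(log: List[LogEntry], session_id: str) -> List[LogEntry]:
    # Inspection of commissioned processing is limited to one session's operations.
    return [e for e in log if e.session_id == session_id]
```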
5 Discussion

In a productive environment it needs to be assured that the inSel system’s availability is guaranteed and that the system itself is capable of scaling to the actual demand. While the prototype implementation does not yet provide scaling and availability mechanisms (e.g. replicating or synchronizing VMs), future extensions need to include appropriate mechanisms for load balancing and cross-machine handling of VMs. The configuration of the function identify is a vital part of our system. In its current implementation the system takes the RelaxNG schemas (stating which elements are valid) and the categorization of PD (stating which of the valid elements are PD) from the nursing service, which is assumed to be trustworthy. However, there is no guarantee that the content of a text element actually matches its assumed type. Malfunctions or misconfigurations of the nursing service’s systems might lead to unwanted disclosure of PD. The configuration of the rights management is done via the user interface, which is designed under the principles of usable security. It allows data subjects to set access
rights to their PD according to their preferences (within legal boundaries). Misconfiguration, leading to unwanted disclosures, is minimized by applying usable security. The rights management, though, is only capable of controlling access to data stored at the service provider and does not control the usage of PD after it has been transmitted to legitimate data consumers. Usage control principles could be applied at the data consumers to counter this problem, which is beyond the scope of this paper.

5.1 Security Discussion

Our approach involves three parties: the provider of the storage service, the provider of the inSel system, and authorized service users (including data subjects and data consumers). Assuming that there are also non-authorized externals, we have four parties to look at with regard to threats against the data subject’s privacy. In the following, the threats originating from each of these parties will be discussed. The provider of the storage service might get non-authorized insights into the stored data; in particular, he might do data mining to increase his knowledge on the data subject beyond the stored encrypted data. Since he stores the data, this threat is plausible. In our approach, the personal data that is not required for the storage service is encrypted by the inSel service. This limits the insight into the actual stored data and thus limits the gain of data mining. The provider of the inSel service might get non-authorized access by sniffing through-going traffic. In our approach, de- and encryption is done inside the inSel Core, and incoming and outgoing connections are encrypted via HTTPS. Thus, direct access to the VM is required to gain knowledge of PD. The Trusted Server itself does not allow direct access to the VMs, as this functionality is centrally deactivated on the system. The only access is via defined interfaces, allowing the remote management of VMs, but not access to them. Thus, there is no feasible way to read data passing through the inSel system. It might be possible that an authorized user gets non-authorized access. In our approach, we are using a strict regulation of access control to limit access to only authorized data. Further, the safe use of secure credentials like the new German electronic ID card is supported. Threats originating from non-authorized externals are countered by a very restrictive design of the access points. The inSel service uses proven security mechanisms, such as HTTPS and XML-Encryption with AES, to protect the data, and combines them with secure credentials and TPM-based virtual channels. In our approach, we introduced security mechanisms so that the effort of getting non-authorized access to the data outweighs the value of the data. As with any other security gateway, our approach is limited by the capabilities of current cryptographic algorithms and security protocols. For supporting compliance checks, our approach comes with a secured and predefined interface. Access to the logging data is clearly controlled and regulated. Thus, it is resilient to data mining attacks and non-authorized access. The predefined interface also supports the principle of data avoidance. E.g. for the use case of the inspection of commissioned data processing, the access is limited to the documented operations of a specific session ID and can only be done by specific authorized persons. The provided information is: timestamp, ID (linkable pseudonym) of the subject, message type (providing object and purpose of the access), provided data types (not the data itself),
ID (linkable pseudonym) of the data subject. However, for every new compliance check the interface has to be extended. This may be a drawback for the maintainability and configurability of our approach, but it is reasonable to expect that new compliance checks are not introduced very often, since the underlying regulations and laws do not change that often. The possibility of side channel attacks resulting from the altered architecture or from necessary network metadata has not yet been discussed in a systematic way and is a topic for future work.
6 Conclusion

In this paper, we presented a schema called inSel that combines the abilities of a security gateway and an automated compliance checking system in a single system. By optimizing the solution for protecting personal data in storage services with multiple accessing parties, the complexity of data protection and compliance controlling could be reduced to a simplified but powerful gateway system. It allows the separation of security functions from data processing and storage functions. Using encryption to protect data from unauthorized usage, inSel operates as a gatekeeper for storing and retrieving data from the data storage. It combines strong access control mechanisms with consistent and audit proof logging. inSel itself is protected from external manipulation by using a TURAYA-based security kernel and a TPM (Trusted Platform Module) as hardware-based security anchor. Further, access to the logging can be restricted to predefined and use-case related interfaces, introducing robust and simplified semi-automated compliance checks. Based on the given scenario, three use cases for compliance control were implemented, proving inSel as a real-world solution. While our approach clearly supports data subjects and controlling agencies, there are even benefits for companies complying with data protection laws, as they are able to prove their compliance to e.g. independent certification authorities and thus obtain certificates, which in turn can be used as a selling argument or to gain customers’ trust. inSel is able to reduce the effort of the compliance check of a complex system by reducing the number of relevant components that have to be audited. Its applicability has been shown successfully by a prototypic implementation in the scenario of mobile time and service recording for nursing services. In this case, the processing system only stores and prepares personal data for retrieval, allowing for a higher efficiency when conducting compliance checks. For future uses, the system could be adapted and extended so that it can be used in other application domains, such as eMail or many eHealth applications.
Acknowledgements

This work was part of the inSel-project “Informationelle Selbstbestimmung in Dienstnetzen”, funded by the German Federal Ministry of Education and Research (BMBF) within the support code 01IS08016(A,B,C). The authors are responsible for the content of this article.
References

[Ac08] Accorsi, R.: Automated Privacy Audits to Complement the Notion of Control for Identity Management. In: Policies and Research in Identity Management, vol. 261, pp. 39–48. Springer, Boston (2008)
[As04] Ashri, R., Payne, T., Marvin, D., Surridge, M., Taylor, S.: Towards a Semantic Web Security Infrastructure. In: AAAI Spring Symposium on Semantic Web Services. Stanford Univ., Stanford (2004)
[Ce07] Cederquist, J.G., et al.: Audit-based compliance control. International Journal of Information Security 6, 133–151 (2007)
[Di82] Dijkstra, E.W.: On the role of scientific thought. In: Selected Writings on Computing: A Personal Perspective, pp. 60–66. Springer, New York (1982)
[EP10] emSCB Project: Towards Trustworthy Systems with Open Standards and Trusted Computing, http://www.emscb.de (accessed 01.07.2010)
[EW07] Etalle, S., Winsborough, W.H.: A posteriori compliance control. In: SACMAT 2007: Proceedings of the 12th ACM Symposium on Access Control Models and Technologies, pp. 11–20. ACM, New York (2007)
[GH06] Gruschka, N., Herkenhöner, R., Luttenberger, N.: WS-SecurityPolicy Decision and Enforcement for Web Service Firewalls. In: Proceedings of the IEEE/IST Workshop on Monitoring, Attack Detection and Mitigation, Tübingen, Germany, pp. 19–25 (2006)
[HJ08] Höhn, S., Jürjens, J.: Rubacon: automated support for model-based compliance engineering. In: ICSE 2008: Proceedings of the 30th International Conference on Software Engineering, vol. 878. ACM, New York (2008)
[HJ10] Herkenhöner, R., Jensen, M., Pöhls, H., De Meer, H.: Towards Automated Processing of the Right of Access in Inter-Organizational Web Service Compositions. In: Proc. of the IEEE 2010 Int’l Workshop on Web Service and Business Process Security, WSBPS 2010 (2010)
[HMP04] Hevner, A.R., March, S.T., Park, J.: Design Science in Information Systems Research. MIS Quarterly 28(1), 75–105 (2004)
[Sc02] Scott, D., Sharp, R.: Abstracting application-level web security. In: Proceedings of the 11th International Conference on World Wide Web, Honolulu, Hawaii, USA, pp. 396–407 (2002)
[SS94] Sandhu, R., Samarati, P.: Access control: Principles and practice. IEEE Communications Magazine 32(9), 40–48 (1994)
[PS04] Park, J., Sandhu, R.: The UCONABC usage control model. ACM Transactions on Information and System Security 7, 128–174 (2004)
[Po99] Povey, D.: Optimistic security: a new access control paradigm. In: NSPW 1999: Proceedings of the 1999 Workshop on New Security Paradigms, pp. 40–45. ACM, New York (2000)
[TCG05] Trusted Computing Group: TPM main specification. Main Specification Version 1.2 rev. 85. Trusted Computing Group (2005)
[Us04] Uszok, A., Bradshaw, J.M., Jeffers, R., Tate, A., Dalton, J.: Applying KAoS Services to Ensure Policy Compliance for Semantic Web Services Workflow Composition and Enactment. In: McIlraith, S.A., Plexousakis, D., van Harmelen, F. (eds.) ISWC 2004. LNCS, vol. 3298, pp. 425–440. Springer, Heidelberg (2004)
[Va09] Varian, H.R.: Economic Aspects of Personal Privacy. Internet Policy and Economics, Part 4, 101–109 (2009)
Security Levels for Web Authentication Using Mobile Phones Anna Vapen and Nahid Shahmehri Department of Computer and Information Science, Linköping University, SE-58183 Linköping, Sweden {anna.vapen,nahid.shahmehri}@liu.se
Abstract. Mobile phones offer unique advantages for secure authentication: they are small and portable, provide multiple data transfer channels, and are nearly ubiquitous. While phones provide a flexible and capable platform, phone designs vary, and the security level of an authentication solution is influenced by the choice of channels and authentication methods. It can be a challenge to get a consistent overview of the strengths and weaknesses of the available alternatives. Existing guidelines for authentication usually do not consider the specific problems in mobile phone authentication. We provide a method for evaluating and designing authentication solutions using mobile phones, using an augmented version of the Electronic Authentication Guideline. Keywords: Authentication, information security, mobile phone, security levels, evaluation method.
1 Introduction
Most people today have multiple accounts and identities on the Internet, which they use in everyday life for a variety of purposes, from the inconsequential to the vital. To access these accounts, digital identities consisting of username/password pairs are commonly used [7]. Since users usually have many identities to remember, there is a risk that they will write the passwords down or choose the same password for several sites, which increases the risk of identity theft [12]. Users also tend to choose simple passwords that can be revealed to an attacker in an online guessing attack [3]. A hardware device can be used to ensure strong authentication by providing a tamper-resistant environment in which an authentication algorithm can run. Examples of hardware devices for authentication are smart cards, USB sticks, and devices with a display and a keypad [4]. Hardware devices for authentication are mainly used in online banking and other security-critical applications. The hardware is issued by the service provider and dedicated to a specific application [6]. Dedicated hardware can require additional equipment, such as cables or card readers. It may be inconvenient for the user to carry the device and other
equipment at all times, especially for mobile users who use multiple computers in different locations, for example at work, at home and at an Internet kiosk. Another option is to use a device that a user will already be carrying for a different reason. A mobile phone is an example of a device always available to the user. Furthermore, it does not require distribution to users, since most users already have access to a mobile phone [2]. Examples of authentication solutions where a mobile phone is used are: – 2-clickAuth [10], an optical authentication solution for identity providers; – Strong Authentication with Mobile Phones [9], an authentication solution using SIM cards [8]; – SWA [11], authentication adapted specifically for untrusted computers. Since authentication solutions for mobile phones differ significantly from each other, and since there are many choices with regard to data transfer and communication, it can be difficult to determine how secure a solution is. It is also difficult to design a new authentication solution with a specified security level in mind, since the choices of input and communication depend on the specific situation. We propose a method for the evaluation and design of authentication solutions that use mobile phones as secure hardware devices. Our method focuses on mobile phones and also considers usability, availability and economic aspects, as well as security. The method uses the security level concept from the Electronic Authentication Guideline from NIST [4]. We also propose supplements to the guideline to include secure hardware devices that can communicate with a local computer or a remote authentication server. The outline of this paper is as follows: section 2 describes aspects of the use of mobile phones in web authentication, section 3 describes the security level concept and shows our proposed supplements to the NIST Electronic Authentication Guidelines and section 4 explains our evaluation and design method. Section 5 describes case studies, section 6 describes related work, section 7 describes future work and section 8 concludes the paper.
2 Mobile Phones in Web Authentication
The variety of communication and input channels differentiates mobile phones from other hardware devices for authentication. The channels allow transfers of large amounts of data without time-consuming typing. The phone can also connect directly to a remote server via a long-range channel [9]. However, the location of the user affects the availability of the channel, e.g. whether a long-range channel can be reached and if it is costly to use. The availability of an authentication solution depends on whether the communication channels in the solution can be used without extra equipment and costs. Since users are mobile and authenticate from different places, the hardware they use will not be consistent. An authentication solution that is available to the user and reaches the required security level should also be easy to use, regardless of the user's skill
level. Usability in a broader sense should also be considered, e.g. reduction of user actions needed to perform authentication. There are also economic aspects to authentication, such as costs incurred by the user to access the long range communication channels and costs for purchase and distribution of equipment. We discuss these factors in our design and evaluation method in section four. However, first we describe how the existing security levels can be adapted to mobile phones as authentication devices.
3 Proposal for New Security Levels
The Electronic Authentication Guideline from NIST [4] defines four security levels (1–4, where 4 is the highest) that can be used for evaluating the security of authentication solutions in general. Below is a short overview of the guidelines.
Level 1: Single- or multi-factor authentication with no identity proofing. Protection against online guessing and replay attacks.
Level 2: Single- or multi-factor authentication. Protection against eavesdropping and the attacks from level 1.
Level 3: Multi-factor authentication with protection against verifier impersonation, MitM attacks and the attacks from level 2. Level 3 requires a token used for authentication to be unlocked by the user using a password or biometrics. Only non-reusable data (e.g. one-time passwords) may be used.
Level 4: Multi-factor authentication with FIPS-140-2 certified tamper-resistant hardware [1]. Protects against session hijacking and the attacks from level 3.
Since there are specific concerns related to phones, we suggest complementing the Electronic Authentication Guideline [4] so that it can handle phone-specific issues, such as eavesdropping on short-range (i.e. between a phone and a nearby computer) communication channels. We propose two new levels between the existing security levels, to make the Electronic Authentication Guideline better suited for evaluating web authentication solutions, especially with mobile phones. We make the original levels more fine-grained by dividing levels 2 and 3 into two levels each, adding levels 2.5 and 3.5. The goal is to be better able to compare authentication solutions to each other. With the current levels, two solutions with very different security may end up on the same level, which makes comparisons more difficult. When designing new solutions with a specific level in mind, a less secure design may be chosen because it is located at the same level as one that is more secure.
Since the Electronic Authentication Guideline is mainly intended for solutions where individuals are authenticated, identity proofing is required for all levels above 1. Identity proofing means that at registration the user must prove their physical identity, e.g. by providing their passport number and credit card number. Most web applications authenticate digital identities but are not concerned with physical ones. In such applications, security may be relevant even if identity proofing is not. Furthermore, our interest lies in the technological aspects of mobile authentication, whereas identity proofing is an administrative issue. For these reasons, identity proofing is not relevant to web authentication. Requiring identity proofing would make most web authentication solutions end up on level 1, independent of the overall security of the solutions.
Level 2.5: The same requirements as for level 3, but with only one of phone locking or MitM protection.
Level 3.5: The same requirements as for level 4, but with a SIM or USIM card (or a similar tamper-resistant module) that is not FIPS-140-2 certified.
When using a mobile phone for password storage, passwords may be strong and have high entropy without any extra inconvenience for the user. Because it is possible to store a password securely on the phone and transfer it to a local computer without a risk of keylogging, such solutions meet the requirements of level 2, except for the identity proofing aspect. Since we consider identity proofing a separate issue, we consider these solutions to be at level 2.
For level 2.5, one of phone locking or MitM protection may be left out. An MitM attack is difficult to protect against, since such protection requires a side channel or the exchange of several rounds of data. Phone locking may be time-consuming for a user who authenticates often. Verifier impersonation lies in level 2.5 since it is a protocol issue that is unrelated to the properties of the phone.
Very few phones today can reach level 4, because of the hardware requirements. Level 3.5 requires both protection against verifier impersonation and authentication of sensitive data transactions. However, a SIM or USIM card can be used in authentication, either in an EAP-SIM protocol [9] or for running authentication algorithms, e.g. calculating responses to a challenge. The SIM card may not be used in such a way that secret keys or other secrets are revealed to an attacker.
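To make the augmented scheme easier to apply, the sketch below maps the properties of a solution to the highest level it can claim, including the proposed intermediate levels 2.5 and 3.5. It is a simplification we add for illustration only: the property names and the exact ordering of checks are our own reading of the guideline and the proposal, not part of either document.

```python
from dataclasses import dataclass

@dataclass
class Solution:
    # Property names are our own simplification of the guideline's requirements.
    multi_factor: bool
    non_reusable_data: bool                  # one-time passwords, challenge-response, ...
    guessing_and_replay_protection: bool     # required already for level 1
    eavesdropping_protection: bool           # required for level 2
    verifier_impersonation_protection: bool  # required for level 3
    phone_locking: bool                      # PIN, password or biometric unlock
    mitm_protection: bool
    session_hijacking_protection: bool       # required for level 4 (and thus for 3.5)
    fips_140_2_hardware: bool                # FIPS-140-2 certified tamper-resistant hardware
    sim_based_hardware: bool                 # (U)SIM or similar module, not FIPS certified

def max_security_level(s: Solution) -> float:
    """Highest level (1, 2, 2.5, 3, 3.5 or 4) a solution can claim under the augmented scheme."""
    if not s.guessing_and_replay_protection:
        return 0.0                           # does not even reach level 1
    if not s.eavesdropping_protection:
        return 1.0
    level = 2.0
    level3_core = s.multi_factor and s.non_reusable_data and s.verifier_impersonation_protection
    if level3_core and (s.phone_locking or s.mitm_protection):
        level = 2.5                          # proposed intermediate level
    if level3_core and s.phone_locking and s.mitm_protection:
        level = 3.0
    if level == 3.0 and s.session_hijacking_protection:
        if s.fips_140_2_hardware:
            level = 4.0
        elif s.sim_based_hardware:
            level = 3.5                      # proposed intermediate level
    return level
```

Under this reading, an otherwise level-3 solution running on a phone whose only tamper-resistant module is the SIM card evaluates to 3.5 rather than being pushed down to 3 or up to 4.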
4 Evaluation Method for Mobile Phone Authentication
We provide a list of steps, outlined in Figure 1, to follow for evaluation and design of authentication solutions using any type of mobile phone, including smartphones. The list is used in conjunction with Table 1 to determine the highest security level that a solution can achieve or to suggest solutions for a chosen security level. Table 1 describes features present in the communication
Fig. 1. Description of the evaluation and design method
Table 1. Features of mobile phone communication channels. The channels compared are Bluetooth, IR, NFC, cable, audio, optical and manual transfer; each feature is classified as a security (S), usability (U), availability (A) or economic (E) factor.
Keylogger resistant (S): manual input may be vulnerable to keyloggers when using passwords; non-HID Bluetooth devices are not vulnerable to keyloggers.
Cannot spread malware (S).
For private environments (S): Bluetooth can be eavesdropped from outside a building.
For public environments (S): some channels can be eavesdropped and replayed by a nearby attacker if the data is used several times.
For phone unlocking (S): in specific cases a touch screen or fingerprint reader could be used for biometric unlocking.
For noisy environments (U).
For users with poor eyes (U).
For users with shaky hands (U).
No extra equipment (A, E): an optical channel from the phone to the computer requires a web camera; audio channels require speakers and a microphone.
channels and makes it easy to compare the variety of communication channels for mobile phones, based on their features. The set of features is initial and will be extended. The features in the current set are the most important ones and can affect the security level of a solution. 1. Authentication methods: There are methods with reusable data (e.g. passwords) and with one-time data (e.g. one-time passwords). Evaluation: Identify the authentication method used in the solution. If a method with reusable data is used, level 2 is the highest level possible. For
level 1, passwords must have reasonable security (see online guessing below). To reach level 2, passwords with at least 10 bit entropy are required [4]. Biometrics may not be used, according to the Electronic Authentication Guideline. Design: Choose the authentication methods that are feasible. For level 3 and higher, only methods with one-time data may be used. For level 2, passwords with 10 bit entropy are needed. Using a phone for password storage makes it possible to use a strong password that is difficult to remember. 2. Data protection (a) Locking methods: To protect the phone when it is not in use, it may be locked using biometrics (e.g. voice or face recognition) or a manual method (e.g. password or PIN). Evaluation: If the phone can be locked, the solution can reach level 3 or higher. Otherwise the solution can at most reach level 2. Design: For level 3 and higher, choose a locking method. This requires manual, optical or audio input on the phone, depending on the locking method. (b) Secure hardware: Tamper-resistant hardware in the phone can be used for data protection. Level 3 requires multi-factor authentication with a secure hardware device [1], a software token (e.g. software used for calculating non-reusable authentication data), or a one-time password device. Phones can be considered hardware devices, without the FIPS certification needed for level 4, but can also be seen as devices containing software tokens. Evaluation: If the phone uses secure hardware certified by the FIPS-1402 standard (FIPS-140-2 level 2 overall security and FIPS-140-2 level 3 physical security) [4], the solution can reach level 4. If a SIM card is used as secure hardware, level 3.5 is possible. Otherwise level 3 is the highest level. Design: See evaluation. 3. Protocols: Depending on which protocol is used in authentication, the security of the solution may vary. The following authentication protocols may be used: proof-of-possession-protocols with either private or symmetric keys, tunneled or zero-knowledge password protocols and challenge-response password protocols. Evaluation: Challenge-response passwords can at most reach level 1. Tunneled or zero-knowledge password protocols, which are protocols that cannot be easily eavesdropped, reach level 2 at most. Proof-of-possession protocols can reach level 4. Design: See evaluation. On a mobile phone, proof-of-possession protocols require keys to be securely loaded into and stored in the phone. 4. Attack resistance (a) Online guessing: Passwords must be strong enough not to be guessed. Evaluation: For level 1 and 2, passwords must resist online guessing. This is achieved by using strong passwords and not sending them in the clear. The maximum chance of success of a password guessing attack must be
1/1024 for level 1 and 1/16384 for level 2, over the complete lifetime of the password [4]. Passwords are not used in levels 3 and 4. Design: See evaluation. (b) Replay attacks: In a replay attack, authentication data is reused by the attacker. Evaluation: Resistance against replay attacks is required for levels 1–4. If passwords are sent in the clear, the solution does not reach level 1. Design: To reach level 1, passwords must be tunneled or salted while sent over a network or phone channel. (c) Eavesdropping: Table 1 shows which channels can be used in a private environment, i.e. a room without untrusted people present, and which channels can be used in a public environment, e.g. an open area with untrusted people present, without being eavesdropped by an attacker. Evaluation: For non-reusable authentication data, the solution can reach level 4. For reusable data (e.g. passwords), use Table 1 to see if the channels are vulnerable to eavesdropping. If not, the solution can reach level 2. Otherwise the solution can reach level 1 at most. Level 2 also requires reusable data to be tunneled (e.g. using SSL) or otherwise protected, when sent from the local computer to a remote server. Design: For reusable data, choose channels not vulnerable to eavesdropping and tunnel the communication to the remote server to reach level 2. For non-reusable data, any channel may be used and tunneling is not necessary. (d) Man-in-the-Middle-attacks (MitM): If there are long range channels available, such as a phone network or Wi-Fi, they can be used as a secure side channel to protect against MitM attacks. Without long range channels, mutual authentication can be used to prevent MitM attacks. Evaluation: With MitM mitigation the solution can reach level 3 and higher. For level 2.5, either MitM mitigation or phone locking must be present. Design: See evaluation. (e) Verifier impersonation: In this type of attack, an attacker claims to be the verifier in order to learn passwords and secret keys. Evaluation: Resistance against verifier impersonation is required for level 3 and higher. The data sent should not give any clues on the secret key used in authentication. Design: See evaluation. (f) Session hijacking: An attacker modifies or reroutes parts of a session. Evaluation: If there is a shared secret per session, this may be used to protect against hijacking. Such protection is required for level 4. Design: See evaluation. 5. Other factors: Evaluation: Use Table 1 to learn about features not discussed in the Electronic Authentication Guideline. These features do not affect the security level, but may affect other aspects of the solution. For solutions in which
third parties are involved, shared secrets must not be revealed to the third party in order for the solution to reach level 2 or higher. Design: Choose channels based on the user's equipment, if known. Manual input is the default for both computers and phones. If challenge-response is used as the authentication protocol, data transfer in both directions between the computer and the phone is needed. If there is a risk of malware, or for users with poor eyes etc., check Table 1 for solutions that may be used in these specific cases. For level 2 and higher, secrets must not be revealed to third parties. 6. Conclusions: Evaluation: Applying these steps will allow identification of the solution's maximum security level. It will also provide information about the properties that prevent the solution from reaching a higher level. Design: Given the preferred security level, this process will provide recommendations about possible channels, authentication methods and other features. An example would be one-time passwords or challenge-response with manual input and Wi-Fi as a side channel.
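The six steps can be read as a checklist that caps the achievable level whenever a requirement is missed. The sketch below illustrates how such a checklist might be scripted. Only the 1/1024 and 1/16384 guessing bounds are taken from the Electronic Authentication Guideline [4]; the field names, the simplified rules and the example channel flags are our own illustrative choices, not a transcription of the method or of Table 1.

```python
# Illustrative checklist runner for steps 1-4 above (step 5 does not change the level).

GUESSING_BOUND = {1: 1 / 1024, 2: 1 / 16384}   # maximum guessing success probability [4]

def resists_online_guessing(target_level: int, entropy_bits: float, guesses_over_lifetime: int) -> bool:
    """Expected success of an online guessing attack must stay below the bound for the level."""
    return guesses_over_lifetime / 2 ** entropy_bits <= GUESSING_BOUND[target_level]

# Example channel properties (illustrative values only, not a transcription of Table 1).
CHANNELS = {
    "cable":     {"eavesdrop_safe_in_public": True},
    "nfc":       {"eavesdrop_safe_in_public": True},
    "bluetooth": {"eavesdrop_safe_in_public": False},
    "optical":   {"eavesdrop_safe_in_public": False},
}

def max_level(sol: dict) -> float:
    level = 4.0
    # Step 1: authentication methods
    if sol["reusable_data"]:
        level = min(level, 2.0)
    # Step 2a: locking methods; 2b: secure hardware
    if not sol["phone_locking"]:
        level = min(level, 2.5 if sol["mitm_protection"] else 2.0)
    if not sol["fips_hardware"]:
        level = min(level, 3.5 if sol["sim_hardware"] else 3.0)
    # Step 3: protocols
    if sol["protocol"] == "challenge-response password":
        level = min(level, 1.0)
    elif sol["protocol"] in ("tunneled password", "zero-knowledge password"):
        level = min(level, 2.0)
    # Step 4 (subset): eavesdropping, MitM, verifier impersonation, session hijacking
    if sol["reusable_data"] and not CHANNELS[sol["channel"]]["eavesdrop_safe_in_public"]:
        level = min(level, 1.0)
    if not sol["mitm_protection"]:
        level = min(level, 2.5 if sol["phone_locking"] else 2.0)
    if not sol["verifier_impersonation_protection"]:
        level = min(level, 2.0)
    if not sol["session_hijacking_protection"]:
        level = min(level, 3.0)
    return level

# A 2-clickAuth-like solution: challenge-response, PIN locking, no MitM protection.
print(max_level({
    "reusable_data": False, "phone_locking": True, "mitm_protection": False,
    "fips_hardware": False, "sim_hardware": False,
    "protocol": "proof-of-possession", "channel": "optical",
    "verifier_impersonation_protection": True, "session_hijacking_protection": False,
}))  # -> 2.5
```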
5 Case Studies
We now present four case studies that apply our method from section four and show design and evaluation.
5.1 Design Case Study: Password Storage
Consider a simple application in which the user stores their passwords on a mobile phone, in order to have strong passwords without having to remember them and type them on the keyboard. The goal of this password storage is to make the use of passwords as secure as possible, i.e. level 2. It should be possible to use it both in private and public environments. 1. Authentication methods: The default method is passwords, which may reach level 2 if the passwords are tunneled or used with a zero-knowledge protocol. 2. Data protection (a) Locking methods: Phone locking is not needed for level 2. (b) Secure hardware: For level 2, no secure hardware is needed. 3. Protocols: All protocols except challenge-response passwords may be used. 4. Attack resistance (a) Online guessing: Passwords must be strong enough to resist password guessing and dictionary attacks. They must also have at least 10-bit entropy. (b) Replay attacks: Tunneling can be used for mitigating replay attacks over a remote network. For transmission between the phone and a local computer, choose channels from Table 1 that are not open to attacks in public environments. (c) Eavesdropping: See replay attacks.
(d) MitM: Not needed for level 2. (e) Verifier impersonation: Not needed for level 2. (f) Session hijacking: Not needed for level 2. 5. Other factors: For transferring passwords from the phone to a local computer, manual input should be avoided in order to protect against keyloggers. In a public environment, NFC and a cable connection may be used. In a private environment, IR, audio transfer and optical transfer may also be used. 6. Conclusions: One possible solution for this application would be to use strong, high-entropy passwords (possibly automatically generated) stored on a phone and transferred to a nearby computer using a cable or NFC. Channels such as IR or Bluetooth are not appropriate since they are vulnerable to eavesdropping in a public environment. Finally, the password must be tunneled when sent over a remote network, or the solution would fail to reach even level 1.
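As a small illustration of the kind of stored password this case study has in mind, the snippet below generates a random password and estimates its entropy. The alphabet and length are arbitrary choices made for the example, not values prescribed by the guideline.

```python
import math
import secrets
import string

ALPHABET = string.ascii_letters + string.digits   # 62 symbols (illustrative choice)

def generate_password(length: int = 12) -> str:
    """Generate a random password; it is stored on the phone, never memorised or typed."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def entropy_bits(length: int, alphabet_size: int = len(ALPHABET)) -> float:
    """Entropy of a uniformly random password: length * log2(alphabet size)."""
    return length * math.log2(alphabet_size)

# A 12-character password over 62 symbols has roughly 71 bits of entropy,
# far above the 10-bit minimum quoted for level 2.
print(generate_password(), round(entropy_bits(12), 1))
```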
5.2 Design Case Study: Online Banking
Consider a general online banking solution in which the user can perform several different tasks which require different levels of security. Examples of these tasks are: A) check the account balance, B) withdraw money from the account and move it to another account, owned by another person, and C) close the account. For A we assume that the security of a password is sufficient, given that there is additional protection against keyloggers and eavesdropping. Therefore, level 2 is chosen for A. For B, we aim for a higher security level than for A. C is considered the most security-critical case and requires as high a level of security as possible. We will now use our evaluation method to find suitable authentication solutions for A and B. For this scenario we assume Bluetooth access, a phone network, a phone camera and manual input. We assume that the solutions are to be used in public environments. 1. Authentication methods: A: All methods can be used. B: Only methods with non-reusable authentication data can be used. 2. Data protection (a) Locking methods: A: No locking needed. B, C: Manual or optical (biometric) input can be used for locking the phone. (b) Secure hardware: A and B: Secure hardware is not needed. If a SIM card is used for secure storage, B may reach level 3.5. C: Secure hardware must be used, but is not present in phones. 3. Protocols: A: A strong password protocol such as a tunneled or zero-knowledge password protocol may be used. Proof-of-possession protocols may also be used. B, C: A proof-of-possession protocol must be used. 4. Attack resistance (a) Online guessing: A: Passwords must be strong and have at least 10-bit entropy. B, C: Passwords are not used. (b) Replay attacks: A: Bluetooth and optical data transfer should not be used for sending reusable data. Tunneling must be used for reusable data. All channels can be used for one-time data. B, C: All channels can be used.
(c) Eavesdropping: A, B and C: The same considerations as for replay attacks apply. (d) MitM: A: MitM protection is not needed. B, C: The phone network can be used as a side channel. Other mitigation methods can also be used. (e) Verifier impersonation: A: Not needed for level 2. B, C: If an attack occurs, the attacker must not get access to sensitive data. Eavesdropping resistance will suffice in this case. (f) Session hijacking: A and B: Protection against this type of attack is not needed for levels 2–3. C: Session secrets can be used for protection against hijacking. 5. Other factors: A: Only manual input is available for reusable data, but it is not suitable for passwords (malware risk). B: No other factors apply. C: Data transactions must be authenticated. 6. Conclusions: For checking the account balance (A), one-time passwords with manual input may be used. An alternative is to transfer data via Bluetooth instead of manual input. A challenge-response protocol can be used instead of one-time passwords. In that case the response may be sent optically. Otherwise, manual input or Bluetooth may be used. For withdrawing money and moving it to another person's account (B), the requirements for A apply, but with locking using manual or optical input and with additional MitM mitigation. For closing the account (C), the requirements for B apply, but with authenticated data transfer and session secrets. Since secure hardware is not available, sufficient security for C cannot be achieved with a mobile phone alone. The user can initiate the closing of the account with the solution from B, but then must add another method such as physically signing a form from the bank and sending it by mail.
5.3 Evaluation Case Study: 2-clickAuth
2-clickAuth [10] is an optical challenge-response solution that uses a camera-equipped mobile phone as a secure hardware token together with a web camera to provide fast, simple, highly available and secure authentication. Data is transferred both to and from the phone using two-dimensional barcodes. 1. Authentication methods: 2-clickAuth uses challenge-response with non-reusable data. Max level: 4 2. Data protection (a) Locking methods: 2-clickAuth can be used with a PIN code to lock the phone. Max level: 4 (b) Secure hardware: No secure hardware is used. Max level: 3 3. Protocols: A proof-of-possession protocol with shared keys is used. Max level: 4 4. Attack resistance (a) Online guessing: Passwords are not used. Max level: 4 (b) Replay attacks: Non-reusable data is used. Max level: 4 (c) Eavesdropping: Since 2-clickAuth is intended for use by mobile users in diverse locations there is a risk of eavesdropping, but the data cannot be used to gain knowledge about secret keys. Max level: 4
(d) MitM: MitM protection is not used, for availability reasons. Max level: 2.5 (e) Verifier impersonation: There is a risk of verifier impersonation, but since the authentication data transferred does not reveal any sensitive information, the impersonating attacker does not gain access to secrets. Max level: 4 (f) Session hijacking: No session secrets are used. Max level: 3 5. Other factors: It should be possible to use 2-clickAuth in noisy environments such as public places, because it uses optical data transfer. Optical channels are also malware resistant, since data can only be sent as a direct result of user action. No secure hardware can be assumed. Secrets are not revealed to third parties. 6. Conclusions: 2-clickAuth may reach level 2.5 if used with a PIN code. To reach level 3, some kind of MitM mitigation (e.g. using SMS as a side channel) must be used.
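For readers unfamiliar with this style of protocol, the sketch below shows one plausible shared-key challenge-response exchange of the kind 2-clickAuth performs over the optical channel. The use of HMAC-SHA256 and the framing of the exchange are our assumptions for illustration; they are not taken from the 2-clickAuth specification.

```python
import hashlib
import hmac
import secrets

def new_challenge() -> bytes:
    """Identity-provider side: a fresh, non-reusable challenge (shown to the phone, e.g. as a barcode)."""
    return secrets.token_bytes(16)

def phone_response(shared_key: bytes, challenge: bytes) -> bytes:
    """Phone side: prove possession of the shared key without revealing it."""
    return hmac.new(shared_key, challenge, hashlib.sha256).digest()

def verify(shared_key: bytes, challenge: bytes, response: bytes) -> bool:
    """Identity-provider side: recompute the response and compare in constant time."""
    expected = hmac.new(shared_key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)

# Example run of one authentication round.
key = secrets.token_bytes(32)   # provisioned once and stored on the phone
c = new_challenge()             # displayed to the phone camera
r = phone_response(key, c)      # displayed back to the web camera
assert verify(key, c, r)
```

Because the response is bound to a fresh challenge, an eavesdropper who records the optical exchange cannot replay it or learn the shared key, which matches the eavesdropping and replay assessments above.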
5.4 Evaluation Case Study: Strong Authentication (SA)
The Strong Authentication (SA) solution consists of several variants wherein a phone operator is considered a trusted third party. A secret identifier stored on the user’s SIM card and listed in the phone operator’s user database is used to calculate session IDs and challenges. The user authenticates by using one of the following alternatives: A: The user sends an SMS to the SA server acknowledging that a session ID shown in the computer’s browser and one sent to the user’s phone are the same. B: The user sends an SMS to the SA server, containing a response calculated by the phone to a challenge shown in the computer’s browser and typed into the phone by the user (or sent via Bluetooth). A variant is that the user receives the challenge via SMS and sends the response via the computer. C: The EAP-SIM protocol is used for strong authentication via Bluetooth or SMS [9]. 1. Authentication methods: A: Identifier based on a static value. Max level: 2. B, C: Non-reusable data, comparable to one-time passwords. Max level: 4 2. Data protection (a) Locking methods: Since the SIM card is used, a PIN code can be assumed for phone locking. Max level: 4 (b) Secure hardware: SIM cards are used in all SA variants. Max level: 4 3. Protocols: A: Similar to using passwords. Max level: 2. B, C: Proof-ofpossession. Max level: 4 4. Attack resistance (a) Online guessing: Non-reusable data is used. Max level: 4 (b) Replay attacks: Authentication data can only be used during the session. Max level: 4 (c) Eavesdropping: A: The session ID is based on a static identifier and may be eavesdropped. Max level: 2. B, C: Not vulnerable to eavesdropping. Max level: 4
(d) MitM: A, B: Do not protect against MitM attacks, even if SMS is used for B. This is because the SMS channel is not used as a side channel, but as an alternative to short-range channels. Max level: 2. C: MitM protection. Max level: 4 (e) Verifier impersonation: A, B: Do not protect against verifier impersonation. Max level: 2. C: Verifier impersonation protection. Max level: 4 (f) Session hijacking: A, B: Do not protect against session hijacking. Max level: 2. C: Protection against session hijacking. Max level: 4 5. Other factors: SA requires the participation of a mobile phone operator. Shared secrets are not revealed to third parties in any of the SA variants. In A there are no shared secrets. Only C has authenticated data transfer. It is stated that SA needs to be usable. Bluetooth can spread malware between the phone and the computer. When there is a choice between SMS and Bluetooth, SMS is the better choice if it is feasible to use the phone network. 6. Conclusions: The session ID solution (A) is a simple solution that reaches level 1. The challenge-response solutions (B) reach level 2 and the EAP-SIM solutions (C) reach level 3.5.
6 Related Work
The NIST guidelines for authentication [4] cover different areas of authentication and discuss technical details and formal security requirements. However, in their guidelines, security is the only factor. In this paper, the aim is to combine the well known and accepted security levels from NIST with factors such as availability, usability and economic factors. This should help developers and evaluators make the best choice among several solutions that meet the same security requirements. There is no comparison of authentication channels and methods made specifically for mobile phones. However, for the authentication system Strong Authentication with Mobile Phones, which uses a phone’s SIM card in authentication, there is a comparison between the different modes of the system in which different channels are used. The comparison shows how the different modes compare to each other when it comes to factors such as cost, infrastructure, security and usability [9]. Cost and infrastructure, e.g. which equipment and networks are needed, are not factors that are explicitly discussed in this paper. There is also work in progress on evaluating authentication solutions in the area of IMS (IP Multimedia Subsystem). The IMS evaluation method considers several different factors such as security, simplicity and userfriendliness [5].
7 Future Work
Our evaluation and design method can be extended to include cryptographic methods as well as examples of trusted hardware modules and their usage. We will introduce new factors, such as infrastructure and learnability to make the method more detailed.
We have already taken into account the cost issues regarding the use of phone networks and equipment that the user may need for short range communication via specific channels. We intend to investigate other cost issues, such as factors related to deployment and running of authentication systems. We will integrate these factors into our method to help developers create economically feasible solutions. Our method will also be adapted for different types of applications and user groups as well as for services and parts of services. We will also investigate the possibility of designing authentication solutions in which the user can actively change the authentication level, based on other requirements such as the current situation and application.
8 Summary and Conclusions
Hardware devices can help increase the security of authentication solutions. The mobile phone is a flexible and capable device with several channels for transferring authentication data. Different channels and authentication methods influence the level of security of authentication solutions, but it may be difficult to get an overview of all combinations of channels and methods. We provide a method for evaluating authentication solutions where mobile phones are used as hardware devices. This is different from the case where a user authenticates to a web site using the phone’s browser (i.e. the phone is used as a handheld computer). In that case the phone itself becomes untrusted. Our method is related to the Electronic Authentication Guideline, but has been adapted to apply to the specific problems of mobile phone authentication, especially considering communication channels. We have also introduced intermediary security levels to improve the granularity at which authentication methods can be compared. To the best of our knowledge, there are currently no other evaluation methods for phone authentication. The method is to be extended and can be used both for evaluating existing authentication systems and for designing new solutions. The goal is to help developers create secure authentication, taking availability, usability and economic factors into account.
References 1. Security requirements for cryptographic modules. Technical Report 140-2, National Institute of Standards and Technology (2001), http://csrc.nist.gov/publications/fips/fips140-2/fips1402.pdf 2. Aloul, F., Zahidi, S., El-Hajj, W.: Two factor authentication using mobile phones. In: AICCSA 2009, pp. 641–644 (May 2009) 3. Bonneau, J., Preibusch, S.: The password thicket: technical and market failures in human authentication on the web, pp. 1–10 (2010) 4. Burr, W.E., Dodson, D.F., Polk, W.T.: Electronic authentication guideline. Technical Report 800-63, National Institute of Standards and Technology (2008), http://csrc.nist.gov/publications/nistpubs/800-63/SP800-63V1_0_2.pdf
5. Eliasson, C., Fiedler, M., Jorstad, I.: A criteria-based evaluation framework for authentication schemes in IMS. In: Proceedings of the 4th International Conference on Availability, Reliability and Security, pp. 865–869. IEEE Computer Society, Los Alamitos (2009) 6. Hiltgen, A., Kramp, T., Weigold, T.: Secure internet banking authentication. IEEE Security & Privacy 4(2), 21–29 (2006) 7. Mannan, M.S., van Oorschot, P.C.: Using a personal device to strengthen password authentication from an untrusted computer. In: Dietrich, S., Dhamija, R. (eds.) FC 2007 and USEC 2007. LNCS, vol. 4886, pp. 88–103. Springer, Heidelberg (2007) 8. Rannenberg, K.: Identity management in mobile cellular networks and related applications. Information Security Technical Report 9(1), 77–85 (2004) 9. van Thanh, D., Jorstad, I., Jonvik, T., van Thuan, D.: Strong authentication with mobile phone as security token. In: IEEE 6th International Conference on Mobile Adhoc and Sensor Systems, MASS 2009, pp. 777–782 (October 12-15, 2009) 10. Vapen, A., Byers, D., Shahmehri, N.: 2-clickAuth - optical challenge-response authentication. In: Proceedings of the 5th International Conference on Availability, Reliability and Security, pp. 79–86. IEEE Computer Society, Los Alamitos (2010) 11. Wu, M., Garfinkel, S., Miller, R.: Secure web authentication with mobile phones. In: Proceedings of DIMACS Workshop on Usable Privacy and Security Software (2004) 12. Yan, J., Blackwell, A., Anderson, R., Grant, A.: Password memorability and security: empirical results. IEEE Security & Privacy 2(5), 25–31 (2004)
Challenges of eID Interoperability: The STORK Project Herbert Leitold A-SIT, Secure Information Technology Center - Austria, Inffeldgasse 16a, 8010 Graz, Austria
[email protected] Abstract. Secure means of identification and authentication are key to many services, such as e-government or e-commerce. Several countries have issued national electronic identity (eID) infrastructure to support such services. These initiatives, however, have often emerged as national islands; using eID across borders has not been on the agenda in most cases. This creates electronic barriers. The Large Scale Pilot STORK aims at taking down such barriers by developing an interoperability framework between national eID solutions. The framework is tested in six concrete cross-border applications. In this paper, an overview of the STORK architecture and the pilot applications is given. Keywords: electronic identity, eID, STORK.
1 Introduction
Who one is on the Internet is becoming an issue once sensitive or valuable data is being processed. Services as diverse as filing a tax declaration online, inspecting one's electronic health record, or accessing a bank account online need secure means of unique identification and authentication. To address the need for secure electronic identities (eID) in e-government, many states launched national initiatives. A study carried out by the European Commission in 2007 and updated in 2009 showed that 13 out of 32 surveyed countries issued government-supported eID cards, eight countries have mobile phone eID solutions, and 22 also have username/password approaches [1]. The national eID infrastructure often emerged in isolation, developed only to meet sectorial, regional, or national requirements. Using the eID tokens – whatever the technological implementation is (card, mobile phone, etc.) – across borders was not a priority for most states. In the Internal Market this can be a significant barrier to the prospering of electronic services. The European Union recognized this deficiency early: at the Ministerial Conference in Manchester in 2005, a political agreement was reached that European citizens shall benefit from "secure means of electronic identification that maximise user convenience while respecting data protection regulations" within a five-year timeframe [2]. The ministerial declaration also addressed the national responsibility and competences: "Such means shall be made available under the responsibility of the Member States but recognised across the EU." Concrete action plans followed the declaration [3], and the discussion between Member States and the European Commission on eID interoperability intensified.
A project as complex as making heterogeneous national eID infrastructures interoperable, however, cannot be carried out from desk research alone. Getting concrete experience by piloting the approaches in real-world applications is advisable before making policy decisions. Such piloting shall make legal, operational or technical hurdles visible when deploying the concepts in practice. The project STORK1 is such an attempt at piloting the approaches. The project and its findings are described in this short paper.
2 Project Overview
STORK was launched when fourteen EU and EEA Member States gathered to form a consortium to bid for a Large Scale Pilot (LSP) grant under the European Commission's Competitiveness and Innovation Framework Programme (CIP), in the Information and Communication Technology Policy Support Programme (ICT-PSP) stream [4]. The STORK consortium was extended to 17 partners in 2010.2 The STORK LSP started as a so-called "type A LSP" in June 2008 and lasts for three years, until May 2011. The idea of type A LSPs is to advance European key ICT policy areas by large-scale projects driven by the Member States themselves and co-funded by the European Commission. Four such key areas have been defined in the CIP ICT-PSP Programme, eID being one key area that finally resulted in STORK.3 The STORK project structure can be divided into three successive phases:
1. First came a taking-stock exercise. The purpose was to get in-depth insight into the national eID systems that need to be incorporated into the STORK eID interoperability framework. Legal, operational, and technical aspects have been investigated.
2. The second phase was to develop and implement common technical specifications. This phase resulted in the STORK eID interoperability framework. Its results are described in section 3.
3. As the proof of the pudding is in its eating, the interoperability framework has been deployed in several national production applications. Six such pilots have been defined. The pilots are sketched in section 4.
At the time of writing this short paper, the first two phases have been completed and the third phase, "piloting", has been launched. Deliverables – both the specifications and their reference implementations – are in the public domain and can be accessed via the project web site.1
3 Architecture and Interoperability Models
The first issue the STORK project had to address was the heterogeneous nature of eID in Europe. One might think of eID being an ID card amended by smartcard or
1 STORK: Secure Identities Across Borders Linked, project co-funded by the European Commission under contract INFSO-ICT-PSP-224993; https://www.eid-stork.eu
2 The project started in 2008 with Austria, Belgium, Estonia, France, Germany, Iceland, Italy, Luxemburg, Portugal, Slovenia, Spain, Sweden, The Netherlands, and the United Kingdom. The 2010 extension included Finland, Lithuania, Norway, and the Slovak Republic.
3 The three other key policy areas under ICT-PSP and STORK's sibling LSPs are eHealth (epSOS), electronic procurement (PEPPOL), and the Services Directive (SPOCS).
signature functions, an online banking access certificate, or non-PKI solutions such as SMS transaction numbers or simply a username/password combination. All these eID technologies are deployed to access national or regional e-government services in Europe. Given the various technical implementations, the question of how to trust the other's systems comes upfront. The road followed by STORK was to categorize the various eIDs into quality classes. This resulted in a Quality Authentication Assurance (QAA) scheme to map the Member States' eIDs to a common scheme. The QAA scheme is based on an IDABC proposal [5] and is also compatible with the Liberty Identity Assurance Framework [6]. The STORK QAA model defines four levels ranging from "QAA-1: low assurance" to "QAA-4: high assurance". Assessing the QAA level takes both the technical security of the eID token and the technical and organizational security of the issuance process into consideration. The model was developed so that secure signature-creation devices (SSCD) used to create qualified electronic signatures and the issuance of qualified certificates compliant with the EU Signature Directive [7] meet the highest level, QAA-4. A rationale is that legal recognition of qualified certificates is already given throughout the EU. It was felt that no higher requirements than for qualified signatures are needed for eID. The QAA model is, however, defined so that qualified certificates or SSCDs are not a necessary condition to reach QAA-4. Recognition of eID across borders and data protection are the two main legal issues to be addressed. The former – legal recognition of eID – is the rare exception in Europe. While recognition of qualified certificates is given with the Signature Directive, only a few Member States, such as Austria, currently recognize foreign identifiers and eID tokens. On the second issue – complying with data protection requirements – the project came to the conclusion that explicit user consent is the proper basis for the legitimacy of cross-border eID processing. The user giving consent does not rule out all legal obstacles, as, e.g., some states defined the protection of national identifiers in a way that their cross-border use is prohibited.4 The STORK project is, however, not meant to solve complex legal issues. Identifying and documenting legal issues is nevertheless a main purpose of the LSPs, in order to prepare for sound policy actions beyond the LSP lifetime. The main goal of the project STORK has been to develop common technical specifications for a cross-border interoperability framework. Two conceptual models, "middleware" and "proxy", are covered and piloted:
1. In the middleware model a citizen directly authenticates at the service provider.
2. In the proxy model authentication is delegated to a separate entity.
In the middleware model the service provider (SP) remains responsible both from a data protection perspective (as the data controller) and from an official liability perspective (no responsibility and thus no liability is shifted to a third party). The citizen-to-SP relation is just extended to foreign citizens. Each SP needs components (middleware) that can handle foreign eID tokens (we call this service-provider-side middleware "SPware"). The user experience is as if she were accessing an SP in her home country, as the components to recognize her eID are integrated into the SP. The proxy model centralizes the integration of eID tokens by carrying out the authentication for the SP. This relieves the SP of any integration of foreign eID tokens, but introduces an intermediary – the proxy – in data protection aspects (being
4 This can e.g. be overcome in STORK by cryptographically deriving other identifiers.
a data controller or processor) and a liability shift, at least for the authentication process. A single supranational proxy instance was out of the question; STORK decided in favor of one proxy service per Member State that handles its own eIDs and SPs. We refer to this proxy as the Pan-European Proxy Service (PEPS). Two components are needed: the S-PEPS is located in the country of the SP, and the C-PEPS is located in the citizen's country. The process is as follows: if a foreign citizen accesses an SP, the SP delegates authentication to the S-PEPS. The S-PEPS redirects the citizen to her home C-PEPS, which carries out the authentication of its citizens. Successful authentication is asserted back to the S-PEPS, which in turn asserts it back to the SP. A remaining component in the architecture is referred to as the virtual identity provider (V-IDP). It has been introduced to bridge between the two conceptual models, middleware and proxy. In a "middleware country" no central infrastructure (PEPS) is installed, for privacy reasons. Thus the V-IDP has been introduced as a decentralized bridge. The V-IDP is installed either at the S-PEPS (for citizens from middleware countries accessing an SP in a PEPS country) or at the SP in a middleware country. The overall scenario is illustrated in figure 1. The STORK common specifications define the cross-border protocols between the C-PEPS, S-PEPS and V-IDP (indicated as black boxes in figure 1). Established national protocols with the SP or identity service providers may remain, as STORK does not impose changes on the established national infrastructure.
Fig. 1. STORK overall architecture showing both the “PEPS model” and the “middleware model”
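To make the delegation chain easier to follow, the sketch below walks through the PEPS flow as a sequence of direct calls. All names are our own, and the real deployment exchanges signed SAML messages carried over browser redirects rather than function calls; this is only a minimal sketch of the roles involved.

```python
# Simplified PEPS flow: SP -> S-PEPS -> C-PEPS -> citizen and back.
# Illustrative only; real STORK traffic consists of signed SAML requests and
# assertions transported through the citizen's browser.

class CPeps:
    """C-PEPS: authenticates citizens of its own country with the national eID."""
    def __init__(self, country: str):
        self.country = country

    def authenticate(self, min_qaa: int) -> dict:
        achieved_qaa = 3          # stub: here the national eID token would be used
        if achieved_qaa < min_qaa:
            raise PermissionError("required QAA level not met")
        return {"country": self.country, "qaa": achieved_qaa, "subject": "citizen@home"}

class SPeps:
    """S-PEPS: the gateway in the service provider's country."""
    def __init__(self, c_peps_registry: dict):
        self.c_peps_registry = c_peps_registry        # one C-PEPS per Member State

    def forward(self, citizen_country: str, min_qaa: int) -> dict:
        c_peps = self.c_peps_registry[citizen_country]   # citizen is redirected home
        assertion = c_peps.authenticate(min_qaa)
        return assertion                                  # asserted back towards the SP

def sp_request_authentication(s_peps: SPeps, citizen_country: str, min_qaa: int) -> dict:
    """The SP delegates authentication entirely to the S-PEPS in its own country."""
    return s_peps.forward(citizen_country, min_qaa)

# Example: an SP in a PEPS country authenticating an Austrian citizen at QAA level 3 or better.
s_peps = SPeps({"AT": CPeps("AT")})
print(sp_request_authentication(s_peps, "AT", min_qaa=3))
```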
For the common specifications STORK relied on existing standards. The Security Assertion Markup Language (SAML) version 2 has been chosen, with a browser single sign-on profile [8] using an HTTP POST binding. Amendments to the existing standards have been kept to a minimum. Such amendments are, e.g., needed to communicate the minimum QAA level an SP requires in order to provide its service.
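As an impression of what such an amendment looks like in practice, the snippet below builds a minimal SAML 2.0 authentication request and attaches a QAA-level element in its extensions. The SAML namespaces are the standard ones, but the `stork` namespace URI and the element name used here are placeholders chosen for illustration, not the normative names from the STORK specifications, and a real request would carry further mandatory fields and a signature.

```python
import uuid
from datetime import datetime, timezone
from xml.etree import ElementTree as ET

SAMLP = "urn:oasis:names:tc:SAML:2.0:protocol"
SAML = "urn:oasis:names:tc:SAML:2.0:assertion"
STORK = "urn:example:stork:extensions"   # placeholder namespace, not the official one

def build_authn_request(sp_entity_id: str, destination: str, min_qaa: int) -> bytes:
    """Build a minimal AuthnRequest asking the S-PEPS for at least the given QAA level."""
    req = ET.Element(f"{{{SAMLP}}}AuthnRequest", {
        "ID": "_" + uuid.uuid4().hex,
        "Version": "2.0",
        "IssueInstant": datetime.now(timezone.utc).isoformat(),
        "Destination": destination,
        "ProtocolBinding": "urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST",
    })
    issuer = ET.SubElement(req, f"{{{SAML}}}Issuer")
    issuer.text = sp_entity_id
    ext = ET.SubElement(req, f"{{{SAMLP}}}Extensions")
    qaa = ET.SubElement(ext, f"{{{STORK}}}QualityAuthenticationAssuranceLevel")
    qaa.text = str(min_qaa)              # minimum QAA level the SP requires
    return ET.tostring(req)

print(build_authn_request("https://sp.example.org", "https://s-peps.example.org/auth", 3).decode())
```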
4 Pilots
The project originally defined five pilots, each addressing a specific challenge related to eID. A sixth pilot has been defined in the course of the project. The six pilots are:
1. Cross-border Authentication Platform for Electronic Services
2. Safer Chat
3. Student Mobility
4. Electronic Delivery
5. Change of Address
6. A2A Services and ECAS Integration
The main function needed in each pilot is authentication of the user. The first pilot, Cross-border Authentication Platform for Electronic Services, aims at integrating the STORK framework into e-government portals, thus allowing citizens to authenticate using their eID. The portals piloting in STORK range from sector-specific portals, such as the Belgian Limosa application for migrant workers, to regional portals serving various sectors, such as the Baden-Württemberg service-bw portal, and national portals, such as the Austrian myhelp.gv for personalized e-government services. A specific challenge when deploying eID widely is that the strong identification usually needed in e-government is not necessary in all cases and is sometimes even undesirable. Think, e.g., of users who want to communicate pseudonymously on the Web. In the Safer Chat pilot, juveniles shall communicate among themselves safely. The pilot is being carried out between several schools. The specific requirement is that in the authentication process the age group delivered by the eID is evaluated to grant access. Unique identification, which is the basis of the other pilots, is less important here. Student Mobility supports the exchange of university students, e.g. under the Erasmus student exchange program. As most universities nowadays have electronic campus management systems giving services to their students, STORK shall be used to allow foreign students to enroll from abroad using their eID and to access the campus management system's services during their stay. The prime requirement is authentication, as in the first pilot on cross-border authentication. Enrolment in a university, however, usually needs accompanying documentation in addition to eID, such as transcripts of records. These are often not available electronically. A pre-enrolment process is piloted where the student is preliminarily granted access to the campus management system until the accompanying evidence is presented. The objective of the fourth pilot, Electronic Delivery, is cross-border qualified delivery, replacing registered letters. On the one hand, delivering cross-border requires protocol conversions between the national delivery standards. On the other hand, qualified delivery usually asks for signed proofs of receipt. The latter – signed proofs of receipt – are the specific requirement in this pilot. This enables cross-border tests of the qualified signature functions that most existing smart-card-based eIDs have. To facilitate moving house across borders, the pilot Change of Address has been defined. In addition to authentication, this pilot has the transfer of attributes, i.e. the address, as a specific requirement. This extends the other pilots by addressing attribute providers, which may be different from the identity providers.
The European Commission Authentication Service (ECAS) is an authentication platform that serves an ecosystem of applications operated by the European Commission. Member States use these to communicate among themselves and with the Commission. Piloting administration-to-administration (A2A) services with national eIDs is a STORK objective. The pilot A2A Services and ECAS Integration serves this objective by linking STORK up to ECAS.
5 Conclusions
STORK has brought seventeen EU and EEA Member States together to define an electronic identity (eID) framework to support seamless eID use across borders. The idea was to make use of the existing national eID programmes and to build an interoperability layer on top of them. Two models have been investigated – the Pan-European Proxy Service (PEPS) model and the middleware model. The PEPS model establishes central national authentication gateways, thus aiming at interoperability through dedicated services installed for the cross-border case. The middleware model integrates the various eID tokens technically into common modules deployed at the service provider (SP). Both models take explicit user consent as the basis for the legitimacy of data processing and transfer, thus – aside from technical measures – establishing consent as the root of data protection compliance. Six pilots have been defined to test the interoperability framework in real-world environments. At the time of writing this paper, all six pilots have been launched and are operational. This gives confidence in the technical results. The piloting period of a few months is, however, too short to give sound results on user satisfaction, pilot stability, or protocol robustness. Such results are expected at the end of the piloting phase in mid-2011. The Large Scale Pilots are expected to give valuable input into related policy actions. A major one in the eID field is advancing the legal recognition of eID across borders. This is expected from the EU Digital Agenda, which in its key action 16 sets out to "Propose by 2012 a Council and Parliament Decision to ensure mutual recognition of e-identification and e-authentication across the EU based on online 'authentication services' to be offered in all Member States (which may use the most appropriate official citizen documents – issued by the public or the private sector);" [9]. Achieving such legal recognition, together with the technical infrastructure developed by STORK, is expected to be a major leap towards seamless eID use in Europe.
References 1. IDABC: Study on eID Interoperability for PEGS, European Commission (December 2009), http://ec.europa.eu/idabc/en/document/6484/5938/ 2. Ministerial Declaration approved unanimously on November 24, Manchester, United Kingdom (2005) 3. European Commission: i2010 eGovernment Action Plan: Accelerating eGovernment in Europe for the Benefit of All, COM(2006) 173 (2006) 4. Project STORK: Secure Identities Across Borders Linked, INFSO-ICT-PSP-224993, https://www.eid-stork.eu
5. IDABC: eID Interoperability for PEGS - Proposal for a multi-level authentication mechanism and a mapping of existing authentication mechanisms (2007) 6. Liberty Alliance Project: Liberty Identity Assurance Framework (2007) 7. Directive 1999/93/EC of the European Parliament and of the Council of December 13, 1999 on a Community framework for electronic signatures 8. OASIS: Bindings for the OASIS Security Assertion Markup Language (SAML) V2.0 (2005) 9. European Commission: A Digital Agenda for Europe, COM(2010) 245 (2010)
Necessary Processing of Personal Data: The Need-to-Know Principle and Processing Data from the New German Identity Card Harald Zwingelberg Unabhängiges Landeszentrum für Datenschutz Schleswig-Holstein, Holstenstr. 98, 24103 Kiel, Germany
[email protected] Abstract. The new German electronic identity card will allow service providers to access personal data stored on the card. This introduces a new quality of data processing, as these data have been governmentally verified. According to European privacy legislation, any data processing must be justified in the sense that the personal data are necessary for the stipulated purpose. This need-to-know principle is a legal requirement for accessing the data stored on the eID card. This text suggests a model as a basis for deriving general guidelines and aids further discussion of the question whether collecting personal data is necessary for certain business cases. Beyond the scope of the German eID card, the extent and boundaries of what can be accepted as necessary data processing pose questions on a European level as well.1 Keywords: Necessary data processing, identity management, anonymous authentication, German electronic identity card, neuer Personalausweis.
1 Introduction
Since November 2010 a new German identity card, called "neuer Personalausweis" (nPA), is being rolled out. The nPA promises new functionalities allowing the holder to securely and trustworthily identify herself online. Security here refers to the underlying technology; trustworthiness refers to the identifying information on the nPA, which is verified by governmental bodies. The legal regulation on which the nPA is based took into account the protection of the holder's privacy and enacted the requirements following from the European legal framework. The purpose-binding principle is reflected by the law, which requires that data collected from the nPA must be necessary for a previously specified and legally allowed purpose, § 21 sec. 2 PAuswG.2 Generally, only the minimum data required for the stated purpose may be collected [1, p. 14]. As further details on which personal
1 The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 216483 for the project PrimeLife.
2 Gesetz über Personalausweise und den elektronischen Identitätsnachweis (PAuswG = German Law on Identity Cards), available online: http://www.bmi.bund.de/SharedDocs/Downloads/DE/Gesetzestexte/eperso.pdf?__blob=publicationFile
data may be deemed necessary are missing in the law, this assessment is left to (1) legal practitioners at the Bundesverwaltungsamt (BVA), a German federal authority issuing the permit to access the data, (2) the data protection authorities supervising the use of access credentials and, eventually, (3) courts in case they become involved. The law provides for a final control by the holder prior to a transmission of any data from the nPA to data controllers. Contrary to other European eID solutions, the German solution employs a double-sided authentication [2, p. 4] requiring data controllers to transparently display their identity prior to any collection of personal data from the nPA. This requirement is enforced as access to data on the nPA is possible for third parties only when they present a valid access certificate containing their own identity to the holder of the nPA [3, p. 43; below section 2.1]. Governmental eIDs such as the German nPA also introduce a new level of privacy-related issues: the personal data stored on the card have been proven and confirmed by a governmental authority. This provides a higher level of certainty for parties relying on the information and a safeguard against identity fraud. However, the use of such accurate data may have privacy-infringing aspects as well, because it may become harder or even impossible for users to employ services anonymously or pseudonymously. It thereby contravenes basic concepts of modern privacy-enhancing identity management systems [4, p. 16]. While integrity of identifying information is valued and necessary in many business transactions, it would highly compromise privacy if personal information were always provably correct. In such a scenario humans would be incapable of plausibly denying anything logged by a machine. In consequence, information technology could take control over personal privacy [cf. 5]. Therefore a balance between integrity of information required for trusted transactions and deniability that preserves privacy was sought by the legislator and found by allowing the use of verified information only where necessary for the legitimate purpose at hand. Additional policy-related requirements in this respect are the principles of data minimization and purpose limitation, both of which may be derived from existing European and Member States’ privacy legislation [1, p. 14][6 at para. 2.30, 2.89]. Applying these principles to the nPA, the German law requires a governmental proof that the personal data transferred to a third party are actually necessary for the transaction in question. This proof must be provided in the form of an access certificate. Necessity for processing of personal data / necessary data processing: According to German law, accessing personal data on the nPA is only allowed if it is necessary for a legally allowed purpose. This paper analyzes common use cases for the deployment of the nPA. It aims to identify general guidelines for the assessment of whether the data processing is necessary.3 Such an evaluation of the necessity will be done by the German authorities prior to allowing access to data stored on the nPA.
3 To the best knowledge of the author there has been no generic analysis of the requirement of necessity beyond the scope of individual cases [e.g. 10 at § 28 BDSG para. 14 et seq. with further references]. However, requirements for a web shop scenario had been analyzed within the Project PrimeLife in the context of developing a privacy-compliant web shop frontend for collecting user data [9]. In regard to the use of personal data retrieved from the nPA, only the explanatory statements from the legislative process [3] are currently available. Based on the research leading to this paper, an ad hoc working party of the German data protection authorities published guidelines for the access to data on the nPA [8].
The holder of an nPA is the customer or citizen involved in a transaction with the need to identify or authenticate himself towards a service provider or, in the case of citizens, towards a public authority. The service provider offers any kind of service or goods. For the nPA, the German government acts as identity provider administering the personal data stored on the nPA.
2 Regulatory Framework The German regulatory framework for processing personal data in connection with the nPA is strongly influenced by the European Privacy Directives, namely the Data Protection Directive 95/46/EC (DPD) and the Directive on Privacy and Electronic Communications 2002/58/EC (e-Privacy Directive). The Directive 2009/136/EC amending directive 2002/58/EC has not been transposed into German law yet. 2.1 The German Personalausweisgesetz The legal basis for data processing with the nPA is the German Personalausweisgesetz (PAuswG) passed in June 2009. The nPA contains a chip which stores certain personal data on its holder. The data available for the identification service comprise: lock flag, expiry date, family name, given name, doctoral degree, date of birth, place of birth, address, type of document, service-specific pseudonym, abbreviation “D” for Germany, information whether the holder is older or younger than a given age, information whether the place of residence is identical to a given place, and a stage name for artists. This exclusive list is provided in § 18 sec. 3 PAuswG. Additional personal data such as the holder’s picture or fingerprints are reserved for use by certain authorities with the right to identify persons (police, customs, tax fraud investigation and border police) [7, p. 672 et seq.]. These biometric data will not be available as part of the eID function and are therefore not part of this analysis. The personal data will be transmitted only when the service provider has shown the holder of the nPA an access certificate indicating the service provider’s identity, the categories of personal data to be transmitted, the purpose of the planned data processing, the address of the data protection authority in charge and the expiry date of the certificate, § 18 sec. 4 PAuswG. Prior to the actual transmission the holder has to give consent to the request, which is displayed together with the access certificate in the software to be used with the nPA (called AusweisApp). At this stage she is enabled to deselect some of the information, which will then not be transmitted to the service provider. This provides a certain extent of user control over the data transmission. For enhanced transparency, every data transmission could be stored in a log (protocol) file also including information about the receiving service provider. This log should be stored on the local machine under the user’s control. Whether such a feature will be part of the official software is unknown to the author, but it is certainly advisable to keep such a log. Attaining an access certificate is a two-step process.4 To attain an access certificate, a service provider must first apply for an authorization issued by the BVA.
4 Data controllers will also have to prove secure data processing in a separate process. However, here only the legal requirements are mentioned, as requirements in respect to technical and organizational measures are beyond the scope of this contribution.
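As an aside, the content of such an access certificate and the holder's deselection step can be pictured with the following minimal sketch (Java 16+ for the record syntax). It is only an illustration by this editor, not part of the PAuswG or of the official AusweisApp; all class, field and attribute names are hypothetical.
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the items shown to the holder (§ 18 sec. 4 PAuswG) and of
// the holder's option to deselect attributes before anything is transmitted.
public class AccessCertificateSketch {

    record AccessCertificate(String serviceProviderIdentity,
                             List<String> requestedDataCategories,
                             String purpose,
                             String dataProtectionAuthorityAddress,
                             LocalDate expiryDate) {}

    // Only the remaining, still-requested categories are transmitted after PIN entry.
    static List<String> attributesToTransmit(AccessCertificate cert,
                                             List<String> deselectedByHolder,
                                             boolean pinEntered) {
        if (!pinEntered || cert.expiryDate().isBefore(LocalDate.now())) {
            return List.of(); // no consent given or certificate expired: transmit nothing
        }
        List<String> released = new ArrayList<>(cert.requestedDataCategories());
        released.removeAll(deselectedByHolder);
        return released;
    }

    public static void main(String[] args) {
        AccessCertificate cert = new AccessCertificate(
                "Example Shop GmbH",
                List.of("family name", "given name", "address", "date of birth"),
                "conclusion and performance of a sales contract",
                "DPA responsible for the provider",
                LocalDate.now().plusYears(1));
        // The holder decides not to release the date of birth.
        System.out.println(attributesToTransmit(cert, List.of("date of birth"), true));
    }
}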
At this step the BVA checks whether the purpose of the data processing does not violate the law and that the personal data is actually required for the stipulated purpose, § 21 sec. 2 no. 1 and no. 2 PAuswG. Authorizations are valid for a period of up to three years. Access certificates issued on the basis of the authorizations by privately owned trust centers allow multiple accesses for the purposes fixed in the authorization and named in the certificate. This paper will identify patterns and develop guidelines for the question whether certain personal data are necessary for a given type of use case, e.g., a certain business or governmental process. 2.2 Staged Approach The system set forth by the PAuswG practically allows for three legal consequences: denial of access to the personal data stored on the nPA, disclosure only of the pseudonym or derived information, e.g., being over 18 years of age, or allowing access to those personal data necessary for the purpose at hand. The analysis done on the basis of use cases provided a bigger picture showing that the amount of necessary data processing correlates with stages of typical contract negotiations. Another result of the analysis was that, besides the stage of the negotiations, other factors such as the chosen payment method are also of major relevance. As the model helps to identify such issues, this approach has been suggested for the assessment of the necessity of data requested by service providers when applying for an authorization at the BVA [8]. The staged model could also provide a basis for future discussion on the assessment of which processing of personal data can be deemed necessary. A better understanding of what processing is necessary will be useful for the development of privacy-enhancing technologies such as automated matching and negotiation of a user’s privacy preferences with a service provider’s privacy preferences [cf. 9]. In Stage 1 the holder only seeks information about a service. At this stage processing of personal data is usually not necessary. Rather, the services should allow the holder to access the information anonymously or under a pseudonym. This is also in conformity with European privacy legislation [cf. recital 9 of the e-Privacy Directive]. Thus, at the stage where an interested person contacts a service provider to get further information on a service or goods, the communication does not require personal information, in particular if the inquiry is made online and thus no mailing address is needed. The retrieval of answers should be possible anonymously as well. Stage 2 refers to all cases where proof is necessary that the same person is acting at a later point of time. This may be accomplished by a means to recognize the person, which is currently done with cookies or a pseudonymous username and password combination. The nPA provides for a specific pseudonym function allowing the proof that the same person is acting. Such a proof may be required for early contract negotiations where an interested person asks for a personalized service such as the conditions for a personalized insurance contract. Here the data controller does not need to know the identity of the interested person until the interested person decides to conclude the contract under the conditions offered.
At Stage 3 some kind of authentication of the holder is required, e.g., being of a certain age for viewing age-restricted previews on a video portal, or proof is needed of being domiciled in a certain municipality for accessing services reserved for its citizens. At this stage the holder may still remain anonymous, as only the requested information is necessary but not an identification of the person acting.
Stage 4 requires an identification of the holder, which may be the case when the service provider bears a financial risk due to delivering the service prior to receiving full consideration in return, or where identification is required by law. In relation to public authorities, a clear identification is always necessary when an administrative act addresses just a single person. The described pattern has been derived from typical negotiation processes where the stages often follow each other in a chronological order and seem to correlate with specific needs to gain information about the other party. But the stages may also appear in parallel and will then relate to different purposes such as an age verification required by law or the need to collect the address to ensure payment. Summarizing the conclusions from the model, it can be said that when assessing the necessity for a certain data processing, due regard must also be given to the stages of the underlying negotiation process. Often it may not be necessary to collect personal data immediately but rather at a later point. For the German nPA the conclusion has to be drawn that some data controllers may need more than a single certificate for certain business cases. For example, allowing access to age-restricted video previews on a website only requires anonymous age verification, but another certificate is needed to identify the holder when she eventually subscribes to the service. 2.3 Disclosures to Third Parties An imminent danger, also seen by the legislator, is that the verified information may be disclosed to third parties. In consequence the service provider may even become an identity provider herself, thus undermining the security measures of the nPA. There are business models imaginable which are well justifiable and probably even beneficial to the holders, such as having verified identities on online auction platforms for increased trust. However, a transfer of the collected data to third persons would usually constitute a change of purposes and thus lie beyond the purposes for which the authorization has been issued, and is not permissible, § 21 sec. 2 no. 1 PAuswG. An interesting legal question in this context is whether holders may allow such a change of purposes by means of informed consent. Consent in the sense of the DPD is system-immanent for the nPA, as it is required for every transaction and ensured by the requirement to show the service provider’s access certificate before the holder may even enter her PIN to release the data. It is not, however, stated that consent may replace the proof of the necessity for clearly specified purposes. While the German legislator blocked processing for purposes of business-like transfer of data to third parties (e.g., address brokers or credit rating agencies) with the provision in § 21 sec. 2 no. 2 PAuswG [3, p. 43], it is not so clear whether uses such as having verified identities on social networks or online auction platforms may constitute an acceptable purpose. On the one hand, one should bear in mind the danger that third-party identity providers will undermine the security measures provided with the nPA and its underlying infrastructure. On the other hand, introducing eIDs in Germany and other Member States of the European Union aims at making identification of oneself in electronic interactions easier, so these opportunities for a voluntary identification should not be limited in an undue manner.
In particular, having a verified identity in communities where participants other than the service provider itself rely on such
information should be possible. But the holder must still retain power over the provided information, and the data must only be passed on to third parties in certain predefined cases such as breach of contract. The holder must be able to revoke the consent for the future or for future transactions respectively. Besides this, she should still be able to act under a pseudonym. 2.4 Enforcement of Limitations To be effective, the limitations on the use of personal data attained from an nPA require enforcement and continuous control by the responsible public authorities. For this, the German law provides for possibilities to revoke authorizations, and the BVA and the federal and state data protection authorities (DPAs) should stay in contact to ensure an interpretation of the law consistent with the DPAs’ view on processing of personal data outside of the limited scope of application of the nPA. In respect to the enforcement of the limitations, the German law provides that the authorization, and in consequence any certificate issued on basis of the respective authorization, must be revoked if the service provider received it by providing false information or if it should not have been issued according to the law in the first place, § 21 sec. 5 PAuswG. Besides these cases, the authorization should be revoked when the DPA for the specific service provider demands this because facts indicate that the service provider processes the data in an illegal manner, § 21 sec. 5 PAuswG. While all other legal instruments provided to the DPAs to react to illegal processing of personal data, such as administrative fines, remain unaffected and should be deployed as appropriate in a given case, the specific measure of asking the BVA to revoke authorizations should always be duly considered by DPAs whenever data retrieved from the nPA are the subject matter of the illegal processing. Even though § 21 sec. 5 PAuswG provides for a measure to correct individual decisions by the BVA in case of illegal processing, it seems preferable to maintain an ongoing information exchange between the BVA and the DPAs. For individual cases such a collaboration is already stipulated in § 29 sec. 3 of the German Personalausweisverordnung (PAuswV, German Regulation on Identity Cards). This allows the BVA to obtain a comment from the competent DPA on whether a specific service provider applying for an authorization is known for practices of illegal processing. Further, periodic meetings between the BVA and representatives of the DPAs to exchange experiences and opinions are highly recommendable.
3 Necessary Data Processing – Standard Use Case Some standard business applications and use cases have been analyzed and evaluated in [8]. In the following, the most central use cases will be briefly introduced, showing the practicability of the staged model described above. The use cases have been drafted with the nPA in mind, but they may also aid the understanding of what constitutes necessary data processing in general. So far the necessity of data processing has been analyzed in Germany for a set of individual cases, none of which, however, directly related to applications of the nPA. The general rule used so far in data protection law is rather vague and states that processing is not necessary if a purpose may be reached without the deployment of personal data, in particular when an
adequate and reasonable alternative is equally or even better able to serve the desired purpose [10 at § 28 BDSG para. 14 et seq. with further references and historic overview][11 at § 28 BDSG para. 48]. 3.1 Cases Where the Law Requires an Identification Processing of personal data is generally allowed where specifically required by legal provisions.5 Some legal regulations require that the service provider identifies or authenticates6 its customers. This is for example the case for telecommunication service providers in Germany that are required by law to collect name and address for potential inquiries by public authorities.7 Also German banks are required to identify customers opening an account due to regulations against tax fraud8 and money laundering9. According to the principle of purpose limitation set forth in Art. 6 sec. 1 b DPD, the data collected for such purposes must be stored separately from other personal data and must not be used for other than the legally required purposes which made collecting the data necessary [cf. 6 sec. 2.89 et seq.][12, p. 49]. 3.2 Exchange of Goods and Services For what is probably the most frequent transaction in practice, the exchange of goods or services for payment, an identification of the customer is necessary only when one party bears the risk of not getting the agreed benefit in return. This is particularly the case when one party is obliged to perform the contract before the other party. Filing a lawsuit against the customer may then become necessary. For this, the civil procedural codes of the Member States presently require that name and address of the debtor must be indicated.10 Also within European civil procedural law it is accepted doctrine that the defendant must be served with the document which institutes the proceedings in sufficient time and in such a way as to enable him to arrange for his defense, or otherwise recognition and enforcement of a judgment may be declined.11 In the absence of other reliable means to serve such documents, knowledge of the name and address of a potential defendant can be deemed necessary. For the mentioned cases of pre-performance by the selling party, the necessity of accessing data stored in the buyer’s nPA must be accepted, while in cases where a risk of loss of payment is absent it may not be necessary to process personal data at all. In particular, such a necessity cannot be derived solely from the fact of having concluded a contract with the data subject. This assumption is also backed by German and European data protection law which allows processing of personal data only to the extent necessary for the performance of a contract to which the data subject is party, see § 28 sec. 1 no. 1 BDSG and Art. 7 lit. b) DPD.
5 Cf. Art. 7 c) DPD.
6 E.g., age verification in case of content or goods reserved for adults.
7 See for Germany § 111 TKG (Law on Telecommunication).
8 See § 154 AO (Abgabenordnung = German General Tax Code).
9 See § 3 sec. 1 GWG (Geldwäschegesetz = German Prevention of Money Laundering Act).
10 See for Germany § 253 sec. 1 ZPO (Civil Procedural Code).
11 Art. 34 no. 2 of Council Regulation (EC) No 44/2001 of 22 December 2000 on jurisdiction and the recognition and enforcement of judgments in civil and commercial matters (Brussels I Regulation); see also the almost identical rules in the so-called Lugano Convention of 1988, applicable in relation to Iceland, Norway and Switzerland.
The flowchart in Figure 1 below shows how to assess whether a risk of a loss of payment may be assumed for the service provider, making the processing of personal data from the nPA necessary. Note that in the opposite direction, customers of services or goods are usually allowed to identify the service provider, as the law grants the customer guarantees and other rights and remedies in case the service or goods lack conformity12, while the rights of the service provider usually extend only to full and timely payment.
Fig. 1. Necessity to process personal data for contracts with non-recurring obligations
As any processing that is required by law (see 3.1) to identify or authenticate customers is necessary, this question must be answered first and independently of the risk of financial losses. If no duty to identify the customer exists and the service provider has fully received the payment, e.g., in cases of prepayment via bank
12 In relation to consumers within the European Union see Art. 3 et seq. of Directive 1999/44/EC and other directives strengthening the rights of consumers.
transfer, processing of any personal data cannot be considered necessary. The same is true when a safe payment method is chosen that ensures that the service provider receives the payment, e.g., if the credit card company bindingly acknowledged the transfer or if the payment is done in direct exchange for the goods. If, however, a financial risk remains for the service provider, the identifying information required to raise a civil lawsuit (name, address, as well as the date of birth) is necessary and may be processed. The necessity to collect name and address of contractual partners due to the potential need to file an action in court may fall away once alternatives for the reliable service of documents become available. In Germany a draft Act for a verified form of e-mail is currently in the parliamentary process.13 The so-called De-Mail service will provide a proof valid in court proceedings that a document has been received by the addressee. In consequence, Art. 2 of the current draft of the De-Mail Act provides an amendment to § 174 sec. 3 German Code of Civil Procedure (ZPO) allowing service of documents (law: the formal delivery of a document such as a writ of summons) via the De-Mail service. Whether this will lead to the consequence that users of the De-Mail service may provide the De-Mail address instead of the name and address, and which consequences follow for existing authorizations, needs to be evaluated in the future. 3.3 Contracts with Recurring Obligations Similarly, in contracts with recurring obligations a need for identifying information may be given, particularly if one party is obliged to advance performance and thus has a risk of not receiving what has been promised in return. However, some particularities of recurring obligations should be taken into account. The necessary decisions to assess whether processing of personal data is necessary are displayed in Figure 2. Again, a legal duty to identify or authenticate the customer always allows the collection and processing of personal data, while the use of this information is limited to fulfilling the said legal requirements. For contracts with recurring obligations the risk for the service provider differs from the contracts described in section 3.2 above when she is legally allowed and in fact able to terminate the service immediately and unilaterally (e.g., pay-per-view services, prepaid mobile phone cards). A substantially higher risk for the service provider must be accepted where she is unable to withhold the service unilaterally, as for example when goods need to be returned (e.g., a rented car). A financial risk may even exist when the provider is able to terminate the performance unilaterally but the customer is bound for a fixed duration and obliged to pay for the whole period of the contract irrespective of quitting the use of the service (e.g., membership in fitness clubs or newspaper subscriptions with a fixed duration). In this case the service provider runs the risk of not receiving the whole payment unless the customer has paid for the whole period in advance.
13 Entwurf eines Gesetzes zur Regelung von De-Mail-Diensten und zur Änderung weiterer Vorschriften as of October 13th, 2010, available online: http://www.bmi.bund.de/SharedDocs/Gesetzestexte/_Gesetzesentwuerfe/Entwurf_Demail.html
Fig. 2. Necessity to process personal data for contracts with recurring obligations
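To make the two decision flows easier to follow, the following minimal sketch encodes the questions of Figs. 1 and 2 as this editor reads them; the enum values and method names are hypothetical, and the sketch is an illustration only, not part of the paper or of the guidelines in [8].
// Illustrative encoding of the decision flows of Figs. 1 and 2; all names are hypothetical.
public class NecessityAssessment {

    enum Necessity { NONE, NAME_AND_ADDRESS, LIMITED_TO_LEGAL_PURPOSE }

    // Fig. 1: contracts with non-recurring obligations.
    static Necessity nonRecurring(boolean legalDutyToIdentify,
                                  boolean paymentFullyReceived,
                                  boolean safePaymentMethod) {
        if (legalDutyToIdentify) return Necessity.LIMITED_TO_LEGAL_PURPOSE;
        if (paymentFullyReceived) return Necessity.NONE;   // e.g., prepayment by customer
        if (safePaymentMethod) return Necessity.NONE;      // e.g., concurrent exchange of goods
        return Necessity.NAME_AND_ADDRESS;                 // e.g., payment on invoice
    }

    // Fig. 2: contracts with recurring obligations.
    static Necessity recurring(boolean legalDutyToIdentify,
                               boolean unilateralTerminationPossible,
                               boolean recurringPaymentObligation,
                               boolean prepaidForEntireDuration) {
        if (legalDutyToIdentify) return Necessity.LIMITED_TO_LEGAL_PURPOSE;
        if (!unilateralTerminationPossible) return Necessity.NAME_AND_ADDRESS; // e.g., rental car
        if (!recurringPaymentObligation) return Necessity.NONE;                // e.g., prepaid pay-per-view
        if (prepaidForEntireDuration) return Necessity.NONE;
        return Necessity.NAME_AND_ADDRESS;                                     // e.g., club membership
    }

    public static void main(String[] args) {
        System.out.println(nonRecurring(false, false, false)); // invoice: NAME_AND_ADDRESS
        System.out.println(recurring(false, true, true, true)); // fully prepaid: NONE
    }
}
A request for more data than the applicable branch yields would then fail the necessity test described above.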
3.4 Pseudonymous Access Besides providing the classical identifying personal data, the nPA also offers the possibility to use pseudonyms. The pseudonyms generated from the certificates held by the service provider and stored on the user’s nPA allow a secure re-identification of a holder using the nPA by a service provider. Whenever the primary use of the nPA is to securely re-identify a person, the pseudonym should be used instead of other personal data from the nPA, thus making it unnecessary to collect data such as the (verified) name and address. The pseudonyms will be created for every relation of an nPA and a service provider, thus avoiding any risk of linking a holder’s activities across websites of different providers. With the pseudonym function the nPA offers a secure token-based method for authentication. But the effective necessity for such functionality will have to be seen in practice. At present, cookies provide a similar functionality of re-identifying a user, at least wherever the sensitivity of a service does not require a more secure token-based authentication. While non-sensitive services such as virtual shopping carts for online stores may be implemented with cookies, this evaluation will change for more personalized services. A potential use case for the pseudonym function of the nPA
may be the request for individual insurance conditions by providing data such as age, weight or previous diseases without disclosing one’s identity. The secure re-identification will then allow retrieval of the calculated individual insurance rates and the conclusion of the contract under the previously stipulated conditions. The current version of the nPA does not allow disclosing the identity of the holder using a pseudonym in case this may become necessary. This may be the case if disputes arise requiring a civil lawsuit. Lacking the capability to reveal the true identity of the holder under certain predefined conditions, the pseudonym function will not be able to replace the verified name and address in the abovementioned contractual relationships. If future versions of the German nPA or other European eID schemes eventually offer such recoverable pseudonyms, the necessity to process personal data must be reconsidered. In this case it would be very desirable from a privacy perspective if the recoverable pseudonym did not substitute the existing pseudonym but rather complemented it for the use cases mentioned above. 3.5 Claiming the Right of Access The nPA will also be useful for verifying that only the authorized person claims the right of access to her personal data as granted by Art. 12 DPD and § 34 German Federal Data Protection Act (BDSG). But identifying oneself in general or even using the nPA must not become a prerequisite for claiming the right of access granted by the DPD. Rather, the data controller has to send the requested information to the address stored in his system unless there is doubt regarding the identity of the requesting data subject, e.g., when persons with identical names and birthdates appear in the database [11 at § 34 BDSG para. 26]. Identification with the nPA will be necessary if the information is asked to be transferred elsewhere than to the address registered with the data controller. The nPA may in particular promote the right of access insofar as it enables data controllers to offer easy access to data subjects in online environments, possibly even in real time. But again, data controllers must not use the data collected for the identification of the requesting person for other purposes than complying with the right of access. In particular, the data provided must not be used to update or complement existing customer data.
4 Conclusion and Outlook The model described in section 2.2 above allows the assessment of the necessity to process personal data in a variety of use cases. It provides a first refinement and aid for legal practitioners that need to evaluate the question of necessity in a given case. In particular, the model allows checking whether less information may suffice. A central finding that needs to be communicated to legal practice is that the amount of data required depends heavily on the stage within the process of contractual negotiation. As there is no need to collect information about customers that merely seek initial information on goods or services, it is consequently not allowed to force users to disclose personal data unless a contract is concluded (see above sections 3.2 and 3.3). For the nPA this will mean that business cases in which earlier stages of the negotiation require a safe identification will need more than one certificate to collect the data from the nPA.
The anonymous proof of the holder’s age and place of residence provided by the nPA is a major advancement for the user’s privacy as it makes more detailed identifications unnecessary when actually only an age verification is required. From the privacy viewpoint, a desirable improvement of the nPA would be support for anonymous credential technology as currently provided by IBM’s Identity Mixer14 and Microsoft’s U-Prove15 system. This would allow for anonymous authentication of claims such as being a student of a certain university or an employee of a certain entity. Here the nPA might be of value in the process of attaining these certificates online from the issuer (a trusted party, e.g., the university).16 Once such a technology is widely available, a re-evaluation of the necessity of data processing will presumably show that in many cases processing of personal data may become dispensable, as anonymous credentials issued by a trusted party can replace the identification while still providing the needed security and trust.
References 1. Naumann, I. (ed.): Privacy and Security Risks when Authenticating on the Internet with European eID Cards. ENISA Risk Assessment Report (2009), http://www.enisa.europa.eu/act/it/eid/eid-online-banking 2. Kubicek, H., Noack, T.: The path dependency of national electronic identities. In: Identity in the Information Society (IDIS), pp. 111–153 (2010), http://www.springerlink.com/content/17t6467515511359/ fulltext.pdf 3. Bundesregierung (German Federal Government): Entwurf eines Gesetzes über Personalausweise und den elektronischen Identitätsnachweis sowie zur Änderung weiterer Vorschriften (reasoning for German Law on Identity Cards). In: Bundestagsdrucksache (BT-Ducks.) 16/10489 (2008), http://dipbt.bundestag.de/dip21/btd/16/104/1610489.pdf 4. Leenes, R., Schallaböck, J., Hansen, M. (eds.): PRIME White Paper. Deliverable of the Project PRIME – Privacy and Identity Management for Europe (2008), https://www.prime-project.eu/prime_products/whitepaper/PRIMEWhitepaper-V3.pdf 5. Rost, M., Pfitzmann, A.: Datenschutz-Schutzziele – Revisited. Datenschutz und Datensicherheit (DuD) 33(6), 353–358 (2009), http://www.maroki.de/pub/privacy/DuD0906_Schutzziele.pdf
14 Binaries, source code and documentation are available online at: http://www.primelife.eu/results/opensource/55-identity-mixer
15 Documentation available online: http://connect.microsoft.com/site642/Downloads/DownloadDetails.aspx?DownloadID=26953
16 A project providing a connection between the nPA and U-Prove has recently received the “Tele TrusT Innovation Award”, http://www.heise.de/security/meldung/ISSE-2010-Innovationspreis-fuer-Fraunhofer-Projekt-zum-neuen-PersonalausweisUpdate-1104004.html. The European project ABC4Trust which was launched in November 2010 will take up PrimeLife results on anonymous credentials and address the challenge of interoperability between different credential systems, see www.abc4trust.eu
6. Kuner, C.: European Data Protection Law: Corporate Compliance and Regulation, 2nd edn. Oxford University Press, USA (2007) 7. Polenz, S.: Der neue elektronische Personalausweis. E-Government im Scheckkartenformat. Multimedia und Recht (MMR) (10), 671–676 (2010), http://beck-online.beck.de/ 8. Ad-hoc working party of the German data protection authorities: Datenschutzrechtliche Leitlinien für die Erteilung von Berechtigungen nach § 21 Abs. 2 PAuswG aus Sicht der Ad-hoc-Arbeitsgruppe nPA der Datenschutzbeauftragten des Bundes und der Länder. Final version as of September 10th (2010), http://www.datenschutzzentrum.de/neuer-personalausweis/ 9. Fischer-Hübner, S., Zwingelberg, H. (eds.): UI Prototypes: Policy Administration and Presentation – Version 2. PrimeLife Deliverable D4.3.2 (2010), http://www.primelife.eu/results/documents 10. Gola, P., Klug, C., Körffer, B., Schomerus, R.: BDSG Bundesdatenschutzgesetz – Kommentar, 10th edn., Beck, Munich, Germany (2010) 11. Däubler, W., Klebe, T., Wedde, P., Weichert, T.: Bundesdatenschutzgesetz – Kompaktkommentar zum BDSG, 3rd edn., Bund, Frankfurt, Germany (2010) 12. Carey, P.: Data Protection – A Practical Guide to UK and EU Law, Oxford (2009)
A Smart Card Based Solution for User-Centric Identity Management Jan Vossaert1, Pieter Verhaeghe2, Bart De Decker2, and Vincent Naessens1
1 Katholieke Hogeschool Sint-Lieven, Department of Industrial Engineering, Gebroeders Desmetstraat 1, 9000 Ghent, Belgium
[email protected] 2 K.U. Leuven, Department of Computer Science, DistriNet, Celestijnenlaan 200A, 3001 Heverlee, Belgium
[email protected] Abstract. This paper presents a prototype of a previously proposed user-centric identity management system using trusted modules. The trusted module, implemented using a smart card, can retrieve user attributes from identity providers and offer them to service providers, after authentication. This paper allows an evaluation of the practical feasibility of the identity management architecture and provides insight into several design decisions made during the prototype implementation. Also, the cryptographic protocols implemented in the prototype are discussed. Keywords: user-centric identity management, privacy, security.
1 Introduction
Many service providers want to control access to their services and critical resources. To address accountability and support personalized services, many services require the user to disclose personal information during a registration phase. Existing federated identity management systems (FIMs) offer a straightforward solution, in which a trusted party manages and releases attributes of an individual. Unfortunately, the privacy of a user is often neglected or not dealt with appropriately. Growing concerns about the privacy of individuals require new solutions that address these concerns. On the other hand, electronic identity solutions are being rolled out in many countries. Many approaches are based on smart cards. An electronic identity card typically stores a set of immutable attributes that can be released during authentication. However, electronic identity cards are combined with FIMs if mutable attributes are requested. Moreover, the user has no or limited control over the attributes that are released. This paper presents an implementation of a user-centric identity management system based on trusted modules, presented in [15]. The prototype is implemented using a TOP IM GX4 smart card as trusted module. The contribution of this paper is threefold. First, the paper allows for an evaluation of the practical feasibility of the proposed architecture. Second, this paper presents concrete protocols that realize the security and privacy requirements stated in [15]. Third,
this paper provides insight into the important design decisions made during the prototype implementation. The rest of this paper is structured as follows. Section 2 discusses related work. Section 3 recapitulates the general approach presented in [15]. Section 4 presents the roles, key infrastructure, requirements and notations. The major protocols are discussed in section 5. Section 6 focuses on implementation details. A prototype evaluation is presented in section 7. This paper ends with general conclusions and points to future research.
2 Related Work
In the federated identity management model [6,1], of which Shibboleth [9], CardSpace [4] and OpenID [13] are common examples, a user is known to at least one organization (i.e., the identity provider) in the federation (i.e., a group of organizations with mutual trust agreements). If a user contacts a service provider, authentication is delegated to the identity provider of the user. The identity provider releases the personal data that are requested by the service provider. Therefore, user impersonation by identity providers is inherent to the FIM model. Moreover, attribute aggregation [7] is often not supported. Further, many identity providers still use password-based authentication since often no infrastructure for strong authentication is available. Several European countries are issuing governmental eID cards [10] to tackle these shortcomings. However, many designs are not flexible as service providers can only request attributes that are stored in the card itself. Attributes are typically stored in the card during the whole card’s lifetime. This implies that only immutable attributes can be stored in the card. Our approach aims at eliminating the drawbacks of existing federated identity management systems and current eID initiatives. Strong authentication is also realized by Jøsang and Pope [8], who present a user-centric identity management model using a personal authentication device (PAD). Each service provider can store an authentication token on the PAD of the user. Our work generalizes the PAD concept to a personal identification device with extended functionality (e.g., support for multiple identity providers, deanonymization) and a concrete implementation is presented. A similar approach is taken in PorKI [12] where users can delegate temporary proxy credentials to workstations using their mobile devices. Suriadi et al. [14] propose a user-centric federated single sign-on system based on the private credential system proposed in Bangerter et al. [2]. However, the computationally expensive nature of the credential scheme limits the feasibility in mobile solutions. Similar concerns apply to the identity management system [5] proposed in the PRIME project. Moreover, this system is less flexible since attributes are embedded in immutable credentials. Multiple credentials need to be used if not all the requested attributes are contained in a single credential.
3 General Approach
This paper proposes an implementation of a privacy-friendly user-centric federated identity management approach based on a trusted smart card. The smart card is the mediator between identity providers and service providers. More precisely, an identity provider can store some of the user’s personal attributes (or properties thereof) in her smart card. Information that is endorsed by the identity provider can then be disclosed to service providers. The latter use the information to provide fine-grained access control and offer personalized services. The smart card controls access to identity information. Before the user is authenticated, the service provider first has to authenticate to the smart card and prove that it is authorized to access certain personal attributes. The smart card verifies the acceptability of the service provider’s information request. This verification ensures that only information is queried for which the identity providers (or their representative) gave their consent. The authorization information is included in the certificate (or credential) of the service provider. Additionally, the user may further restrict access to personal information through a policy or an explicit consent. If the query is acceptable, the smart card forwards this request to the identity provider(s) that can provide the information.
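As a rough illustration of this acceptability check, the card-side decision can be reduced to a set comparison. This is a sketch only; the prototype's actual data structures are described in section 6, and the names used here are hypothetical.
import java.util.List;
import java.util.Set;

// Sketch: a request is only forwarded if every requested attribute is covered by the
// service provider's authorization and not blocked by the user's own policy.
public class QueryAcceptabilityDemo {

    static boolean isAcceptable(List<String> requestedAttributes,
                                Set<String> authorizedBySpCertificate,
                                Set<String> blockedByUserPolicy) {
        for (String attribute : requestedAttributes) {
            if (!authorizedBySpCertificate.contains(attribute)) return false; // not authorized
            if (blockedByUserPolicy.contains(attribute)) return false;        // user refuses
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> authorized = Set.of("age>18", "address");
        Set<String> blocked = Set.of("address");
        System.out.println(isAcceptable(List.of("age>18"), authorized, blocked));            // true
        System.out.println(isAcceptable(List.of("age>18", "address"), authorized, blocked)); // false
    }
}
In the actual design the user can also deselect individual attributes instead of having the whole query rejected, so a real implementation would filter the query rather than simply refuse it.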
4 Roles, Key Infrastructure, Requirements and Notation
Roles. Nine different roles are distinguished in our solution. The user (U) is the owner of a smart card (SC). The card is considered a trusted computing platform by the different actors in the system. This means – amongst other things – that the user trusts that the card will never release more than the requested attributes (for which the service provider has been authorized) and that the service provider trusts the card to release genuine attributes. Further, the card also provides a safe environment for storing private information such as keys (i.e., no data can be directly extracted from the card’s memory). A service provider (SP) can implement attribute-based access control and/or offer customized services by requiring user authentication with a valid smart card. An identity provider (IP) manages user attributes that can be retrieved by identity cards during authentication with service providers. Deanonymization providers (DP) are responsible for deanonymizing users in case of abuse. A provider (P) can either be an identity, service or deanonymization provider. Certification authorities (CA) issue card and server certificates. These certificates are used during mutual authentication between card and provider. The card issuer (CI) issues identity cards to users. The revalidation center (R) (re)validates or blocks cards at regular intervals. The middleware (M) is a software layer that enables communication between card and provider(s). The middleware also allows the user to monitor and influence the authentication process (e.g., view the service provider authentication certificate, view and modify the attribute query, enter PIN). The card also stores an internal private variable lastValTime which represents the time R last verified the revocation status of the card (see section 5). This variable serves two goals.
First, it is used by the card during authentication with a provider for verifying that the revocation status of the card was checked by R at a time after the last accepted validation time posed by the provider. Second, since smart cards do not have an embedded real-time clock, the lastValTime is used to verify the validity period of certificates. Key Infrastructure. Each card holds a unique master secret, KU, that is used to generate service-specific pseudonyms. KU can either be user-specific or card-specific. The former strategy allows individuals to reuse KU in multiple smart cards (e.g., when the previous card is defective or lost). If KU is card-specific, it would be useful to have a mechanism that allows proving a link between (old and new) pseudonyms generated by two different cards of the same user. A revalidation key pair (SKR, PKR) is used to set up a secure authenticated communication channel between a smart card and the card revalidation authority. PKR is stored in each smart card during initialization; the corresponding private key is only known by the revalidation center. A common key pair (SKCo, PKCo) is used by a large set of smart cards. PKCo is embedded in a certificate CertCo. SKCo and CertCo are stored on the card during initialization. This allows the identity, service and revalidation providers to verify that a genuine smart card is used (without revealing unique identifiers). Each service, identity and deanonymization provider has an asymmetric key pair (SKP, PKP). SKP and CertP are used to authenticate to smart cards. SKP is certified by a CA, resulting in CertP. Section 6 elaborates on additional data that is kept in CertP. The public keys of (root) certification authorities in the system are placed on the card during initialization. This allows smart cards to verify the validity of certificates. A newly generated session key Ks is used to securely transmit data between a card and providers. Requirements. The requirements that the prototype system must satisfy are listed below:
– Functional requirements:
• F1: Service providers can retrieve personal attributes either stored in the identity card and/or managed by an identity provider.
• F2: Cards can be personalized (e.g., through privacy policies and preferences).
• F3: Adding new providers is straightforward.
• F4: Cards can be used online and offline.
– Security and privacy requirements:
• S1: Strong mutual authentication and secure communication between users and providers (including revalidation authorities).
• S2: Controlled access to user attributes (i.e., based on rights/privileges and personal preferences).
• P1: Service-specific pseudonyms of a user are unlinkable (even by the card issuer). These pseudonyms also encompass pseudonyms of users towards deanonymization and identity providers. One service provider can offer multiple services for which different pseudonyms are generated.
• P2: Support for conditional anonymity during authentication.
• P3: Support for anonymous subscriptions.
– Performance and scalability:
• O1: The prototype system should have straightforward and easy management functions.
• O2: Economic use of computationally expensive operations should result in acceptable performance of the prototype system.
Notation. During the protocol listings, authenticated encryption is assumed. Authenticated encryption can be realized using several block cipher modes [11] or by explicitly adding a MAC to the message [3] (requiring an extra integrity key). If message integrity verification fails, an exception is thrown and retransmission of the previous message is requested. Arrows (→ or ←) represent the direction of communication. We assume that during a protocol run, the same connection is used. Dashed arrows represent communication over an anonymous channel. Variables of the card are shown in teletype font; if the variable is underlined, it is stored in temporary memory.
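As an illustration of the authenticated-encryption assumption, the following desktop-Java sketch uses AES-GCM from the standard JCA. The prototype itself runs on Java Card 2.2.1 and uses the constructions referenced in [11] and [3], so the mode chosen here is only an example picked by this editor.
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.security.SecureRandom;

// Authenticated encryption with AES-GCM: a failed tag check surfaces as an exception,
// i.e., the integrity failure after which retransmission would be requested.
public class AuthenticatedEncryptionDemo {

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey ks = kg.generateKey(); // plays the role of the session key Ks

        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);

        Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, ks, new GCMParameterSpec(128, iv));
        byte[] ciphertext = enc.doFinal("CertCo || sig || chip number".getBytes());

        Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, ks, new GCMParameterSpec(128, iv));
        System.out.println(new String(dec.doFinal(ciphertext))); // throws if tampered with
    }
}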
5 Card Operations
This section discusses the most important operations performed by the card and gives the cryptographic protocols implemented in the prototype to achieve the security and privacy requirements. Card Validation. During card validation (Table 1), the lastValTime is updated with the current time. Prior to updating, the revocation status of the card is checked. This requires unique identification of the card. This is realized by setting up a secure authenticated channel between the card revalidation center and the card using CertR and CertCo. Over this channel, the card releases its (unique) serial number, which allows the revalidation authority to block the card (if it has been reported lost or stolen) or update its lastValTime. The lastValTime will be used during authentication with providers, allowing them to trust the card’s current revocation status without the card releasing any uniquely identifying information. A location-hiding communication channel can be set up between the middleware and R in order to hide the whereabouts of the card holder. Mutual Authentication between Card and Providers. During mutual authentication (Table 2), the provider first authenticates to the card. This is done using CertP and the public key of the CA stored on the card. Next, the card authenticates to the provider using the common certificate CertCo and the lastValTime. CertCo enables the verification of genuine smart cards (i.e., no uniquely identifying information is released) and the card checks its revocation status using lastValTime and a reference time from the service provider. During authentication, a session key is generated. This phase results in a secure anonymous mutually authenticated communication channel1. Authentication of
1 We assume that several other requirements are met e.g., the identity set is sufficiently large, an anonymous communication channel is used.
Table 1. The card is periodically revalidated by the revalidation center
revalidateCard():
(1) M → R : "RevalidationRequest"
(2) SC ← M ← R : c := genRandom()
(3) SC : sig := sign(c, SKCo)
(4) SC : Ks := genRandom()
(5) SC : Ekey := asymEncrypt(Ks, PKR)
(6) SC : Emsg := symEncrypt([CertCo, sig, chip number], Ks)
(7) SC → M → R : Emsg, Ekey
(8) R : Ks := asymDecrypt(Ekey, SKR)
(9) R : [CertCo, sig, chip number] := symDecrypt(Emsg, Ks)
(10) R : if (verifyCert(CertCo) == false) abort()
(11) R : if (verifySig(sig, c, PKCo) == false) abort()
(12) R : time := getCurrentTime()
(13) R : if (isValid(chip number) == false) time := -1
(14) SC ← M ← R : Etime := symEncrypt(time, Ks)
(15) SC : lastValTime := symDecrypt(Etime, Ks)
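To make the hybrid structure of steps (4)–(9) concrete, the following plain-Java sketch shows a session key being wrapped under PKR and recovered with SKR. It uses the standard JCA rather than Java Card, and the RSA/AES algorithm and key-size choices are assumptions of this illustration, not details taken from the prototype.
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;
import java.security.KeyPair;
import java.security.KeyPairGenerator;

// Plain-Java illustration of the hybrid key transport in Table 1 (steps 4-9).
public class RevalidationKeyTransportDemo {

    public static void main(String[] args) throws Exception {
        // Revalidation center key pair (SKR, PKR); the algorithm choice is an assumption.
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
        kpg.initialize(2048);
        KeyPair revalidationKeys = kpg.generateKeyPair();

        // Card side: Ks := genRandom(); Ekey := asymEncrypt(Ks, PKR)
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey ks = kg.generateKey();
        Cipher rsa = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
        rsa.init(Cipher.ENCRYPT_MODE, revalidationKeys.getPublic());
        byte[] ekey = rsa.doFinal(ks.getEncoded());

        // Revalidation center side: Ks := asymDecrypt(Ekey, SKR)
        rsa.init(Cipher.DECRYPT_MODE, revalidationKeys.getPrivate());
        SecretKey recovered = new SecretKeySpec(rsa.doFinal(ekey), "AES");

        System.out.println("session keys match: "
                + java.util.Arrays.equals(ks.getEncoded(), recovered.getEncoded()));
    }
}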
the provider is realized by generating and encrypting a session key with PKP on the card and sending it to the provider. All communication between card and provider is now encrypted with the session key. Hereby, a cryptographic link between the authentication of the provider and card is established. The authentication of the card is then executed by signing a challenge from the provider on the card using SKCo. Release of User Attributes. After mutual authentication, the card can disclose the user’s attributes over the previously established secure channel. Attributes that are not stored on the card can be retrieved by the card from identity providers. This also requires authentication between card and identity provider. The certificates of the service and identity provider restrict the attributes that can be queried and provided respectively (see section 6). These restrictions are enforced by the card. Attributes can also be a service-specific pseudonym or a deanonymization item (i.e., an encryption of an identifier of the user). The first allows users to have persistent identifiers with a service without revealing identifiers that can be linked to other services. This identifier is generated by taking a cryptographic hash of KU with the identifier of the service provider (obtained from the certificate). The deanonymization entry is realized by probabilistically encrypting the concatenation of the service-specific pseudonym of the user towards the selected deanonymization provider, the lastValTime and the conditions required for deanonymization with the key of the deanonymization authority (i.e., {hash(KU || IDDP) || lastValTime || CertSP.deanonCond}PKDP). A list of trusted deanonymization providers is contained in the service providers’ certificate. Before users can use the services of a deanonymization provider, registration is required. During registration the deanonymization authority requires users to release some attributes and stores these in a database linked to the service-specific pseudonym of the user. The card stores the public key of the authority
Table 2. Mutual authentication between provider and card
authenticate():
(1) SC ← M ← P : CertP
(2) SC : if (verifyCert(CertP) == false) abort()
(3) SC : if (CertP.validEndTime < lastValTime) abort()
(4) SC : sesId := startNewSession();
(5) SC : session[sesId].maxRights := CertP.maxRights
(6) SC : session[sesId].Subject := CertP.Subject
(7) SC : Ks := genRandom()
(8) SC : session[sesId].Ks := Ks
(9) SC : c1 := genRandom()
(10) SC : session[sesId].chal := c1
(11) SC : Ekey := asymEncrypt(Ks, CertP.PK)
(12) SC : Emsg := symEncrypt(c1, Ks)
(13) SC → M → P : Ekey, Emsg, sesId
(14) P : Ks := asymDecrypt(Ekey, SKP)
(15) P : c1 := symDecrypt(Emsg, Ks)
(16) P : c2 := genRandom()
(17) P : Eresp := symEncrypt([c1+1, accValTime, c2], Ks)
(18) SC ← M ← P : sesId, Eresp
(19) SC : [resp, accValTime, c2] := symDecrypt(Eresp, session[sesId].Ks)
(20) SC : if (resp != session[sesId].chal+1) abort()
(21) SC : if (lastValTime < accValTime) abort()
(22) SC : session[sesId].auth = true
(23) SC : sig := sign(c2, SKCo)
(24) SC → M → P : Emsg := symEncrypt([CertCo, sig], session[sesId].Ks)
(25) P : [CertCo, sig] := symDecrypt(Emsg, Ks)
(26) P : if (verifyCert(CertCo) == false) abort()
(27) P : if (verifySig(sig, c2, CertCo.PK) == false) abort()
together with the name and identifier. Once the deanonymization authority receives a deanonymization item, it can decrypt it and, consequently, link it to the unique identifiers stored in the database. Before revealing the identity of the user, the conditions contained in the deanonymization item are verified. Although the card itself does not provide a secure interaction mechanism with its holder, the query from the service provider can be displayed to the card holder using the middleware. The attributes are only released after the user’s consent (e.g., after entering a PIN code). The user can also choose not to release some attributes requested by the service provider by removing them from the query.
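The two derived attributes described above can be sketched as follows in plain Java; the SHA-256 hash and RSA-OAEP choices, as well as the string encodings, are assumptions made for this illustration and are not taken from the prototype.
import javax.crypto.Cipher;
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;

// Illustration of the service-specific pseudonym hash(KU || IDSP) and of a
// deanonymization item encrypted under the deanonymization provider's public key.
public class DerivedAttributesDemo {

    static byte[] servicePseudonym(byte[] masterSecretKU, String serviceProviderId)
            throws Exception {
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
        sha256.update(masterSecretKU);
        sha256.update(serviceProviderId.getBytes(StandardCharsets.UTF_8));
        return sha256.digest();
    }

    static byte[] concat(byte[]... parts) {
        int len = 0;
        for (byte[] p : parts) len += p.length;
        byte[] out = new byte[len];
        int off = 0;
        for (byte[] p : parts) { System.arraycopy(p, 0, out, off, p.length); off += p.length; }
        return out;
    }

    public static void main(String[] args) throws Exception {
        byte[] ku = new byte[32]; // card's master secret (random in a real card)

        byte[] pseudonymForShop = servicePseudonym(ku, "shop.example");
        byte[] pseudonymForDP = servicePseudonym(ku, "deanonymizer.example");

        // Deanonymization item: {hash(KU || IDDP) || lastValTime || deanonCond}PKDP
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
        kpg.initialize(2048);
        KeyPair dpKeys = kpg.generateKeyPair();
        byte[] payload = concat(pseudonymForDP,
                "2011-01-01T00:00".getBytes(StandardCharsets.UTF_8),
                "court order".getBytes(StandardCharsets.UTF_8));
        Cipher rsa = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
        rsa.init(Cipher.ENCRYPT_MODE, dpKeys.getPublic()); // OAEP is probabilistic
        byte[] deanonItem = rsa.doFinal(payload);

        System.out.println("pseudonyms differ per provider: "
                + !java.util.Arrays.equals(pseudonymForShop, pseudonymForDP));
        System.out.println("deanonymization item length: " + deanonItem.length);
    }
}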
6 Implementation Details
This section zooms in on several design decisions. The prototype is developed using the Java Card 2.2.1 framework and is deployed on a TOP IM GX4 smart card. Certificates. A hybrid certificate solution is used in the prototype. The card uses standard X.509 certificates to authenticate towards providers; a provider uses card verifiable certificates (CVCs) to authenticate towards a smart card. This strategy ensures interoperability (i.e., providers do not need to install a
custom certificate verification module) while avoiding parsing complex certificate structures on the card. Each CVC contains standard information (issuer, subject, role, validity interval etc.). Moreover, each CVC that is issued to an identity provider contains an attributeList that lists the identifiers of attributes it may supply, together with a level of assurance (LOA). Authoritative identity providers typically guarantee a high level of assurance for many attributes. The CVC used by the deanonymization and service providers also contains an attributeList, which indicates the attributes that can be requested, together with the minimal level of assurance. Moreover, a trustedIDPList restricts the set of acceptable identity providers (or IDP groups) for the service. The card will only release to that particular service provider attribute values that were fetched from identity providers on the list (with an appropriate LOA). Note further that CVCs have a short lifetime. This is necessary to ensure a short window of vulnerability, as revocation checks are not performed on the card. Each provider also has an X.509 certificate with a long lifetime. The latter is used to authenticate towards CAs that issue the CVCs. Discovery of Identity Providers. Each attribute in the system is represented by an Attribute object. Each of them has a unique identifier. Some attribute values are cached on the card. Other attribute values need to be fetched from identity providers. Therefore, each card keeps a list of IdentityProvider objects. They define the set of identity providers with which the owner has enrolled. Each IdentityProvider object maintains references to the Attribute objects that can be retrieved from the respective provider. When an attribute query is received, the query handler first looks for the attribute values in the cache. The cached attributes that meet the prerequisites (e.g., LOA, trusted identity provider) are selected. The remaining attributes are fetched from identity providers. The handler selects a minimal set of acceptable identity providers that can supply the remaining attributes, hence ensuring that only a minimal number of connections to identity providers is required. Memory Management. Smart cards have limited memory. The card used for the prototype has around 70K bytes of available EEPROM. Moreover, the Java Card virtual machine does not implement a garbage collector, nor is it possible in Java to explicitly release memory. Therefore, all required memory should be allocated at the beginning of the program and continuously reused. Caching attributes. A fixed set of byte arrays of variable length is allocated to cache attribute values. These arrays are embedded in AttributeValue objects that also keep context information such as retention time, LOA, time of last usage, etc. For an optimal implementation, the distribution of the average length of each type of attribute should be calculated. This caching strategy is straightforward while limiting fragmentation. When an attribute value is fetched from an identity provider, it might be necessary to remove another attribute from the cache. The following selection strategy is applied, as sketched below. First, a predefined number of AttributeValue objects with the smallest memory footprint still large enough to keep the new attribute value are selected. From these candidates, the least recently used attribute value is then replaced. Persistent attributes (see section 6) are not considered by the cache update policy.
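A minimal sketch of this two-step replacement strategy is given below, in plain Java rather than Java Card code. The class and field names (AttributeValue, lastUsed, persistent) mirror the description above, while the value of CANDIDATES and the use of standard collections are assumptions made for readability only.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch of the cache replacement policy; not the prototype's applet code.
class AttributeCache {
    static final int CANDIDATES = 3;  // predefined number of candidate slots (assumed value)

    static class AttributeValue {
        byte[] buffer;        // fixed-length byte array allocated when the applet is installed
        long lastUsed;        // time of last usage
        boolean persistent;   // marked persistent by the user; never evicted
    }

    final List<AttributeValue> slots = new ArrayList<>();

    // Returns the slot whose value should be replaced by a new value of 'needed' bytes,
    // or null if no suitable slot exists.
    AttributeValue selectVictim(int needed) {
        // Step 1: collect non-persistent slots that are large enough and keep only the
        // predefined number with the smallest memory footprint.
        List<AttributeValue> fitting = new ArrayList<>();
        for (AttributeValue slot : slots) {
            if (!slot.persistent && slot.buffer.length >= needed) {
                fitting.add(slot);
            }
        }
        fitting.sort(Comparator.comparingInt(slot -> slot.buffer.length));
        List<AttributeValue> candidates =
            fitting.subList(0, Math.min(CANDIDATES, fitting.size()));

        // Step 2: among these candidates, evict the least recently used value.
        AttributeValue victim = null;
        for (AttributeValue slot : candidates) {
            if (victim == null || slot.lastUsed < victim.lastUsed) {
                victim = slot;
            }
        }
        return victim;
    }
}

Note that the card cannot allocate collections dynamically as in this sketch; on the card, the candidate selection would iterate over the statically allocated AttributeValue slots.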
Static memory configuration. Since all memory allocations occur when the applet is deployed on the card, the attribute query length, the maximum number of supported identity providers, the number of cached attributes, etc. are fixed. Dynamically assigning memory increases flexibility; however, replacement strategies become more complex. Therefore, in the prototype, memory is configured statically. The initializer can thus define the amount of memory assigned to the different parts of the program when installing the applet. For instance, the initializer can opt for allocating only a limited amount of memory for identity providers while increasing the attribute cache. Personalized Policies. The policy engine on the card enforces the policies of the service and identity providers. The policies are specified in the CVCs (cfr. the attributeList and trustedIDPList). The user can further restrict the access policy (i.e., he can update his policy using his PIN code). First, the user can select the attributes that will be cached on the card until their retention time has expired. Those attributes are marked as persistent. Moreover, the user can assign a trust level to service providers: untrusted, default and trusted. Requests from untrusted service providers are blocked. In the default policy, user confirmation is required before the attributes are released. If a service provider is trusted, the user is no longer involved in the attribute disclosure. The query, however, is still verified using the privileges listed in the CVC. Moreover, the user can also mark an Attribute as sensitive. If so, user consent (i.e., a PIN code) is always required if that attribute needs to be released. The card issuer determines a set of trusted CAs whose public keys are stored on the card during initialization. However, users can further restrict the list of trusted CAs by deactivating keys of untrusted CAs. Although the user cannot further expand this list of trusted CAs, previously deactivated keys can be reactivated. Analogously, keys from deanonymization providers can be deactivated (or even removed, which allows registration with a new deanonymization organization if no empty slots are available) and reactivated. Anonymous Subscriptions. For some applications (e.g., news sites, entrance control in buildings) it is not necessary for a user to disclose a persistent identifier. For instance, news sites only need to verify whether the user has a subscription that allows him to view the requested content. To support this, a Subscription object is initialized during enrollment of the user with the service provider. The Subscription object contains the id of the service provider, a validity period and a type. The type allows the service provider to verify whether the requested service (or content) is included in the subscription of the user. The validity period allows the service provider to specify subscriptions that expire after a specified time. Note that the actual validity period does not have to be released but can be verified by the card using a time provided by the service provider, as illustrated in the sketch below. The type field constraint can also be verified by the card but is less critical if released, since typically only a limited number of subscription types are available.
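The following plain-Java sketch illustrates how such a subscription check could look on the card; the field names, the encoding of the provider-supplied time and the method signature are assumptions, not the prototype's actual API.

// Illustrative sketch of a Subscription check on the card.
class Subscription {
    int serviceProviderId;  // identifier of the service provider the user enrolled with
    long validFrom;         // validity period; never released to the provider
    long validUntil;
    short type;             // subscription type (e.g., a content category)

    // The provider supplies its identifier, a current time and the type of the requested
    // service; the card only answers true or false, so neither the validity period nor
    // the type field has to be disclosed.
    boolean covers(int providerId, long providerTime, short requestedType) {
        if (providerId != serviceProviderId) return false;
        if (providerTime < validFrom || providerTime > validUntil) return false;
        return requestedType == type;
    }
}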
A pseudonym that allows the user to retain his subscription when re-enrolling with a new card (e.g., when the previous card was lost or expired) could be released during enrollment. Cryptographic Parameters. To realize the cryptographic functions defined in section 4, the cryptographic capabilities of the smart card are used. Since RSA is the only asymmetric cryptosystem available on the card, RSA keys are used for all the asymmetric keys in the system. Key lengths of 1024 bit were chosen to achieve good performance; section 7 discusses the performance impact of other key lengths and the potential use of other ciphers. For signing, a SHA-1 digest is encrypted using PKCS #1 padding. Asymmetric encryption uses OAEP padding and can, hence, also be used to realize the probabilistic encryption required for deanonymization. For authenticated encryption, 128 bit AES keys are used to achieve confidentiality, and a MAC is generated by encrypting a SHA-1 digest with a 128 bit AES integrity key. The 256 bit session key Ks is divided into two parts to obtain the integrity and confidentiality keys. Random number generation is realized using the on-board random number generator of the chip.
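A minimal sketch of this authenticated encryption construction is given below, again in plain Java rather than Java Card code. The 256 bit session key Ks is split into a confidentiality key and an integrity key, and the MAC is produced by encrypting a SHA-1 digest under the integrity key; the CBC mode, the padding choices and the zero IV for the MAC are assumptions, as the paper does not specify them.

import java.security.MessageDigest;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Illustrative sketch of the authenticated encryption over the secure channel.
class SecureChannel {
    private final SecretKeySpec confKey;  // first 128 bits of Ks: confidentiality key
    private final SecretKeySpec intKey;   // last 128 bits of Ks: integrity key

    SecureChannel(byte[] ks256) {
        confKey = new SecretKeySpec(Arrays.copyOfRange(ks256, 0, 16), "AES");
        intKey = new SecretKeySpec(Arrays.copyOfRange(ks256, 16, 32), "AES");
    }

    // Encrypts the message with the confidentiality key and appends a MAC computed by
    // encrypting the SHA-1 digest of the message with the integrity key (the 20 byte
    // digest is padded to two full AES blocks). Mode, padding and IV handling are assumed.
    byte[][] protect(byte[] message, byte[] iv) throws Exception {
        Cipher enc = Cipher.getInstance("AES/CBC/PKCS5Padding");
        enc.init(Cipher.ENCRYPT_MODE, confKey, new IvParameterSpec(iv));
        byte[] ciphertext = enc.doFinal(message);

        byte[] digest = MessageDigest.getInstance("SHA-1").digest(message);
        byte[] padded = Arrays.copyOf(digest, 32);
        Cipher mac = Cipher.getInstance("AES/CBC/NoPadding");
        mac.init(Cipher.ENCRYPT_MODE, intKey, new IvParameterSpec(new byte[16]));
        byte[] tag = mac.doFinal(padded);

        return new byte[][] { ciphertext, tag };
    }
}

A real deployment would of course follow the recommendations of [3] and [11] when combining encryption and integrity protection.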
7 Evaluation
Functionality Analysis. The proposed solution combines the benefits of smart cards and federated identity management systems. The card is a trusted module to all players in the system. Common secrets allow providers to establish authenticated sessions with the card. Over these channels, service providers can send attribute queries which are fulfilled by the card. If necessary, the requested attributes are retrieved from identity providers that are trusted by the service providers (cfr. F1). Although certificates of service providers already restrict the type of attributes that can be queried, the user can be even more restrictive by restricting the attribute query. Further, users can mark attributes as sensitive, assign trust levels to providers, influence the caching policy, etc. Hence, extensive personalization functions are available (cfr. F2). Adding new providers only requires them to retrieve an authentication certificate from a CA. This requires them to undergo an audit that determines what attributes they can request or supply. Service providers also require permissions from identity providers (or groups thereof) for requesting attributes they provide. Hence, adding new providers introduces some overhead for the respective instance and the certification authority. However, it is a clearly defined procedure that is transparent for the users and the majority of providers (cfr. F3). The caching policy maintains a set of attributes stored on the card. These cached attributes can also be used in offline settings. Hence, apart from authentication and identification, attribute queries are also supported in offline environments if the required attributes are cached on the card (cfr. F4).
Security and Privacy Analysis. Reference protocols for realizing secure authenticated sessions between card and providers using the proposed key infrastructure are given in the section above. Providers are assured that the card was not revoked before a certain date/time (i.e., accValTime < lastValTime). If required, the card will execute the card revalidation protocol before proceeding with the authentication. Revocation of the CVCs of providers is not supported; however, as they are short-lived, potential abuse is limited. Hence, S1 is satisfied. When a common key is compromised, the set of cards on which the keys are stored should be revoked. This can be done using certificate revocation lists or OCSP responders. The CRL will typically contain few entries, since smart cards are designed with countermeasures against attempts to extract secret information. When a card is lost or stolen, this should be reported to the revalidation center, which will block the respective card. Hence, these cards do not need to be added to the CRL. However, stolen cards can be misused during a service specific time interval (i.e., accValTime < lastValTime) if the PIN is compromised. A potential misuse of the revocation strategy is that providers require real-time revocation checks of the card (i.e., accValTime = currentTime) while this might not be necessary for the requested service. This decreases performance of the card as revalidation is required for each authentication. The CRLs, currently used for revocation checks, are also typically only periodically updated. This may also lead to timing attacks (i.e., linking a profile of a provider to the card number released to the revalidation center) if revalidation center and provider collude. The public key of the revalidation center stored on the card can be updated by the revalidation center itself. After establishing a secure session with the card, the new certificate can be sent to the card. It will verify that the certificate was issued to the revalidation authority, that the old validity interval precedes the new one, and that the signature is valid (using the CA public key stored on the card). The revalidation authority can also deactivate keys of CAs that are no longer trusted over the secure channel. Other keys stored on the card cannot be updated or deactivated by the revalidation center. Access to user attributes is controlled on three levels. First, an audit organization determines the set of user attributes relevant for the offered services. Second, identity providers control the set of service providers that can acquire attributes they provide. These restrictions are embedded in the service provider certificate and, hence, can be verified by the card. Third, users can implement policies or determine at runtime which attributes are released. Hence, extensive control over the release of attributes is enforced (cfr. S2). The card is the policy enforcement point and should, hence, be trusted by both the user and providers. After authentication, users can release service specific pseudonyms, which are unique pseudonyms that neither the providers nor the card issuer can link (cfr. P1). Certain services may allow users to remain anonymous but require support for deanonymization when abuse is detected. As mentioned in section 5, providers can request a probabilistic encryption of an identifier of the user using the attribute query. Hence, these providers can cooperate with the deanonymization service to identify users in case of abuse (cfr. P2). The user can select supported
(and hence trusted) deanonymization authorities by registering with the respective authorities. Further, the certification authorities should employ rigorous security requirements for the deanonymization authorities. This provides suitable security for the users and limits the number of deanonymization authorities. Further, different authorities could provide different levels of deanonymization, depending on the attributes that were released during registration. During authentication, no unique identifiers are released. Hence, users can use the subscription functionality of the card to request access to content which does not necessarily require identification (cfr. P3). Although a user can theoretically remain anonymous, note that the anonymity set for a user largely depends on the size of the set of identity cards with an identical private key and the frequency with which these cards are used. For instance, in the case of subscriptions, if the anonymity set is too small and only one user with common certificate x has a certain subscription, he is linkable. Hence, when determining the size of the set of identity cards with identical keys, one has to consider both the impact of key revocation and potentially linkable profiles. Performance and Scalability Analysis. Each party in the system has one or more clearly defined responsibilities. Further, the revocation strategy allows offline services without requiring manual updates of CRLs on these devices. Card validation can be triggered by users but, if required, can be executed transparently. Hence O1 is satisfied. The rest of this section focuses on performance results. For the prototype implementation, mutual authentication between card and provider takes, on average, 1040 ms. The influence of the communication delays between card, workstation and server is minimized by running the test applications locally. Processing the attribute queries takes around 180 ms for attribute sets of around 100 bytes. Hence, when all necessary attributes are cached on the card, the total required authentication time is around 1220 ms. For each identity provider that needs to be contacted, another 1220 ms is added to the required time. The card revalidation operation requires around 900 ms. Table 3 lists the number of cryptographic operations required during each step. In Table 4, reference performance numbers for cryptographic operations on the prototype smart card are given. These tables illustrate that the asymmetric cryptographic operations are the major performance bottleneck. During authentication, the card performs one private and two public key operations. These operations alone require around 615 ms, which is more than half of the total authentication time. However, the caching policy and the identity provider selector minimize the number of identity providers required for satisfying the query (cfr. O2). Further, replacing RSA with a more efficient algorithm provides an opportunity for significantly improving the performance of the entire system. For instance, elliptic curve cryptography (ECC) could be used to replace the private key operation on the card, which could significantly increase performance, especially for increasing key lengths. The smart card used for the prototype, however, does not yet support ECC; hence, no concrete results can be
Table 3. Number of cryptographic operations during the different stages of identification

                 Card revalidation       Mutual authentication    Attribute exchange
                 Card   Reval. Serv.     Card   Provider          Card   Provider
Verify/asymEnc   1      2                2      2                 0      0
Sign/asymDec     1      1                1      1                 0      0
Sym. Enc/Dec     2      2                4      4                 2      2
Table 4. Average (μ) timing results and standard deviation (σ) of 1000 runs of cryptographic operations on the TOP IM GX4 smart card. Tests are done with 128 byte input data, results in ms.

Key length        RSA 1024        RSA 2048        AES 128        AES 192        AES 256
in bits           μ       σ       μ       σ       μ      σ       μ      σ       μ      σ
Verification      32,00   2,85    72,10   1,02    -      -       -      -       -      -
Signing           555,33  2,62    2318    1,64    -      -       -      -       -      -
Encryption        31,00   3,00    70,10   0,99    31,20  0,40    31,20  0,40    31,21  0,41
Decryption        554,48  3,52    2316    1,42    36,93  1,85    41,9   4,57    44,78  4,03
given. Moreover, during a normal authentication procedure the card and service provider first mutually authenticate. Then user interaction is required (entering the PIN, modifying – if necessary – the attribute query), after which the card retrieves the attributes from the identity providers and sends them to the service provider. The user interaction divides the waiting time into two smaller pieces and, hence, improves the perceived experience. The instantiated applet on the card requires around 17000 bytes of memory. This includes temporary memory to store session data but excludes memory required for storing setting-dependent information (e.g., cached attributes, registered identity providers, deanonymization authorities). However, as around 70K bytes of memory are available on the prototype smart card, sufficient memory is left to support large numbers of identity providers and cached attributes.
8 Conclusion
This paper presents an implementation and evaluation of a smart card based solution for user-centric identity management [15]. Several implementation details are given and an evaluation is performed, illustrating the practical feasibility of the system. Further research is currently being conducted in two directions. First, the security of the system could be improved by using group signatures for authenticating cards to providers. If the secret keys in one smart card are stolen, it then suffices to revoke only that card. Also, support for signed attributes could give a stronger level of assurance, which might be required for some services. Second, the usability could be improved by replacing RSA with a more efficient algorithm. The architecture is also being ported to mobile phones with secure elements. To validate the choices made during implementation (e.g., caching policy, selection of identity providers), several realistic use-cases are being implemented.
References

1. Ahn, G.-J., Ko, M.: User-centric privacy management for federated identity management. In: COLCOM 2007: Proceedings of the 2007 International Conference on Collaborative Computing: Networking, Applications and Worksharing, pp. 187–195. IEEE Computer Society, Washington, DC (2007)
2. Bangerter, E., Camenisch, J., Lysyanskaya, A.: A cryptographic framework for the controlled release of certified data. In: Security Protocols Workshop, pp. 20–42 (2004)
3. Bellare, M., Namprempre, C.: Authenticated encryption: Relations among notions and analysis of the generic composition paradigm. Journal of Cryptology 21, 469–491 (2008)
4. Bertocci, V., Serack, G., Baker, C.: Understanding Windows CardSpace: An introduction to the concepts and challenges of digital identities. Addison-Wesley Professional, Reading (2007)
5. Camenisch, J., Shelat, A., Sommer, D., Fischer-Hübner, S., Hansen, M., Krasemann, H., Lacoste, G., Leenes, R., Tseng, J.: Privacy and identity management for everyone. In: DIM 2005: Proceedings of the 2005 Workshop on Digital Identity Management, pp. 20–27. ACM, New York (2005)
6. Chadwick, D.W.: Federated identity management. In: FOSAD (2008)
7. Chadwick, D.W., Inman, G., Klingenstein, N.: A conceptual model for attribute aggregation. Future Generation Computer Systems 26(7) (2010)
8. Jøsang, A., Pope, S.: User centric identity management. In: Asia Pacific Information Technology Security Conference, AusCERT 2005, Australia (2005)
9. Morgan, R.L., Cantor, S., Carmody, S., Hoehn, W., Klingenstein, K.: Federated security: The Shibboleth approach. EDUCAUSE Quarterly (2004)
10. Naumann, I., Hogben, G.: Privacy features of European eID card specifications. Technical report, ENISA (2009)
11. NIST: Block cipher modes, http://csrc.nist.gov/groups/ST/toolkit/BCM/current_modes.html
12. Pala, M., Sinclair, S., Smith, S.: Portable credentials via proxy certificates in web environments. In: Public Key Infrastructures, Services and Applications. LNCS. Springer, Heidelberg (2011)
13. Recordon, D., Reed, D.: OpenID 2.0: A platform for user-centric identity management. In: DIM 2006: Proceedings of the Second ACM Workshop on Digital Identity Management, pp. 11–16. ACM, New York (2006)
14. Suriadi, S., Foo, E., Jøsang, A.: A user-centric federated single sign-on system. Journal of Network and Computer Applications 32 (2009)
15. Vossaert, J., Lapon, J., De Decker, B., Naessens, V.: User-centric identity management using trusted modules. In: Public Key Infrastructures, Services and Applications. LNCS. Springer, Heidelberg (2011)
The Uncanny Valley Everywhere? On Privacy Perception and Expectation Management

Bibi van den Berg

Tilburg University, Tilburg Institute for Law, Technology and Society (TILT), P.O. Box 90153, 5000 LE Tilburg, The Netherlands
Abstract. In 1970 Mori introduced the notion of the 'uncanny valley' in robotics, expressing the eeriness humans may suddenly feel when confronted with robots with a very human-like appearance. I will use the model of the uncanny valley to speak about privacy relating to social network sites and emerging technologies. Using examples, I will argue that the uncanny valley effect is already manifesting itself in some social network sites. After that, I will project the uncanny valley into the near future in relation to emerging technologies, and argue that awareness of the uncanny valley effect is of great importance to technology designers, since it may be a factor in humans' acceptance of and willingness to use these technologies. Will the uncanny valley be everywhere in the technological world of tomorrow?

Keywords: uncanny valley, privacy, social network sites, emerging technologies, robotics.
1 Introduction
In the spring of 2010 Google introduced a new social service called Buzz. Buzz is an add-on to Google's GMail, which enables users to view information feeds about individuals they choose to follow, quite similar to Twitter. Days after its launch, there was a worldwide outcry regarding privacy violations in Buzz. What was all the buzz about? In an attempt to increase user-friendliness and ease of use, Google's technology developers made a fundamental error. When users accessed their GMail in the days after Buzz's launch, they were asked to complete a wizard to introduce them to this new service. In this wizard, Google's technologists presented them with a list of individuals they were very likely to know. Buzz would make their GMail profile information accessible to the individuals on this list, unless they opted out. Alternatively, users could sign up as followers of the individuals on the list, so that they would be kept up to date on changes in their profile information as well. It turned out that users were collectively outraged by this automatically generated list of contacts. They claimed that encountering a list of individuals they
knew in an online setting that is generally considered to be private, such as one's e-mail facilities, made them concerned that this was suddenly a public environment, and that, hence, their privacy had been violated. Moreover, since the lists of individuals they were presented with were quite accurate, many individuals felt highly uncomfortable and even scared by the realization of how much Google actually knew about them and the networks they operate in. What was worse, the lists were quite accurate, but not completely accurate. This made users feel eerie as well: how was it that Google knew about some of their social relations but not others, and how had the collection they were presented with – a mixture of well-known, intimate contacts and acquaintances that occupied only the fringes of their social circle – been compiled? When discussing this example at an international conference in Texas this spring, danah boyd, one of the leading experts in research on teenagers' use of social media, made the following passing remark: Google found the social equivalent of the uncanny valley. Graphics and AI folks know how eerie it is when an artificial human looks almost right but not quite. When Google gave people a list of the people they expected them to know, they were VERY close. This makes sense – they have lots of data about many users. But it wasn't quite perfect. [1] Google Buzz's users experienced the setup of this new service as a privacy infringement and were left with an eerie feeling. Some weeks after the Buzz buzz an 'eeriness incident' arose at my own institute. One bad day my boss, professor Ronald Leenes, received an e-mail from the social network site Facebook, supposedly sent by one of his contacts – whom for the sake of privacy we will call X – who is a member of Facebook. In this e-mail X invited Ronald to become a member of Facebook as well. The e-mail did not just contain a message from X, but also a list of 'other people you may know on Facebook'. Note that Ronald himself is not a member of Facebook and never has been, and that the list of people presented to him, therefore, could not be based on Ronald's behaviors, address book, or on information he may have disclosed himself. It was based entirely on data about him that had been distributed by others – accidentally, I presume. Using information about Ronald and his engagements with others, Facebook had built up a picture of his social network. What's more, eerily enough, the picture they presented was quite accurate. Ronald did indeed know almost all of these individuals. But as with the lists presented to users of Buzz, the picture was quite accurate, yet not entirely so. It was incomplete – some of Ronald's closest friends and colleagues were not on it, despite being active Facebook users – and it was a bit of a haphazard collection: distant acquaintances were mixed with closer contacts. That made it even more eerie. Where did this collection come from and how had it been composed? How much does Facebook really know, not only about its users, but also about individuals outside the network? These two incidents led me to start thinking about the term danah boyd used in the quote above: the 'uncanny valley', a concept well-known in robotics. I realized boyd had a point when she used this term in the context of social
media. Even more so, I realized that the uncanny valley may become a more frequently encountered phenomenon in the near future, when technologies will become increasingly autonomic and proactive. In this article I will explain why.
2 The Uncanny Valley – Theoretical Background
In a 1970 article the Japanese roboticist Masahiro Mori introduced the idea of an 'uncanny valley' in relation to the design of life-like and likable robots [2]. What did this valley consist of? Misselhorn summarizes the central thrust of the uncanny valley as follows: . . . the more human-like a robot or another object is made, the more positive and empathetic emotional responses from human beings it will elicit. However, when a certain degree of likeness is reached, this function is interrupted brusquely, and responses, all of a sudden, become very repulsive. The function only begins to rise again when the object in question becomes almost indistinguishable from real humans. By then, the responses of the subjects approach empathy to real human beings. The emerging gap in the graph [. . . ] is called the 'uncanny valley'. The term 'uncanny' is used to express that the relevant objects do not just fail to elicit empathy, they even produce a sensation of eeriness. [3] This trajectory is expressed in the image below. What does this picture tell us, exactly? On the far left of the figure, Mori says, we find 'industrial robots', such as robot arms. In designing these kinds of robots, which over the years have collectively come to be called 'mechanical robots' or 'mechanoids' [4], the focus is on functionality, rather than on appearance [2]. Robot arms don't need to look like anything that provokes life-likeness or empathy in human beings – they need to complete specifically defined tasks, and that is all. Since functionality is the main design goal for these kinds of robots, mechanoids tend to be “relatively machine-like in appearance” [4]. However, there are also robot types that have more recognizable animal or even human forms. This group includes 'toy robots', which can be found halfway up the rising line in Mori's diagram, and 'humanoid robots', almost at the first peak. They are much more familiar to us and seem more life-like. Mori writes: . . . if the designer of a toy robot puts importance on a robot's appearance rather than its function, the robot will have a somewhat human-like appearance with a face, two arms, two legs, and a torso. This design lets children enjoy a sense of familiarity with the humanoid toy. [2] This cluster of robots, which we will collectively call 'humanoids', is “not realistically human-like in appearance” and can easily be perceived as being robots, yet robots of this kind “possess some human-like features, which are usually stylized, simplified or cartoon-like versions of human equivalents” [4]. Sony's robot dog AIBO and Honda's humanoid Asimo are cases in point. With this type of robots we reach the first peak in the figure: they come across as being quite human-like and have a considerable degree of familiarity.
Fig. 1. The uncanny valley as introduced by Masahiro Mori in 1970
However, this is where an interesting turning-point emerges. If robots are too human-like, yet display behaviors that are less than perfectly human, Mori predicts there will be a steep decline in the level of familiarity, so much so that a deep sense of eeriness, or uncanniness, arises. If their appearance has “high fidelity”, Walters et al. write, “even slight inconsistencies in behavior can have a powerful unsettling effect” [4]. To explain how this works, Mori discusses the example of a prosthetic hand. He writes: . . . recently prosthetic hands have improved greatly, and we cannot distinguish them from real hands at a glance. Some prosthetic hands attempt to simulate veins, muscles, tendons, finger nails, and finger prints, and their color resembles human pigmentation. [. . . ] But this kind of prosthetic hand is too real and when we notice it is prosthetic, we have a sense of strangeness. So if we shake the hand, we are surprised by the lack of soft tissue and cold temperature. In this case, there is no longer a sense of familiarity. It is uncanny. In mathematical terms, strangeness can be represented by negative familiarity, so the prosthetic hand is at the bottom of the valley. So in this case, the appearance is quite humanlike, but the familiarity is negative. This is the uncanny valley.[2] Bryant remarks: The uncanny valley itself is where dwell monsters, in the classic sense of the word. Frankenstein’s creation, the undead, the ingeniously twisted demons of anime and their inspirations from legend and myth, and indeed all the walking terrors and horrors of man’s imagining belong here. [5]
Why does the uncanny valley occur exactly? In general terms, one could argue that the sense of eeriness arises when a mismatch occurs between a robot's appearance and its actual capabilities. If a robot's looks are quite sophisticated, it seems logical that individuals interacting with it will assume that its behaviors will be quite sophisticated as well – just like we assume a level of behavioral sophistication from our fellow human beings whenever we engage in interactions with them. If there is a (significant) discrepancy between the high level of human-likeness of a robot's exterior and a low level of behavioral refinement, this provokes a response of disgust and repulsion: the uncanny valley. At the right end of the diagram the valley is overcome. In the last stage of the figure the line goes up again. This is where we encounter interactions between real human beings: very human-like and very familiar. But some non-humans can also be placed on this rising slope, says Mori. These non-humans are not necessarily even better copies of human beings than the humanoids at the peak of the slope – quite the reverse is true. The solution for avoiding eeriness does not involve aspiring to ever more human-likeness, according to Mori. What is relevant is a greater degree of behavioral familiarity, or a greater similarity in movements. Mori uses bunraku puppets, used in the traditional Japanese puppet theater, as a case in point. These puppets are certainly not exceptionally human-like in their appearances – it is obvious to the viewers that puppets are used on the stage, rather than real human beings. However, Mori argues that since the movements of these puppets, generated by the human puppeteers working them, are remarkably human-like, “their familiarity is very high” [2]. Based on the uncanny valley, Mori's conclusion is that robotics designers should limit themselves to creating relatively human-like, yet quite familiar robots. They should not aspire to create robots that mimic humans too closely. In the words of Bryant: . . . designers of robots or prosthetics should not strive overly hard to duplicate human appearance, lest some seemingly minor flaw drop the hapless android or cyborg into the uncanny valley. [5] In short: to avoid the emergence of the uncanny valley, according to Mori designers should focus on the creation of 'mechanoids' and 'humanoids' only. However, despite Mori's warning, a community of researchers has been conducting research into the creation of a third category of robots: 'androids'. These are robots that have an “appearance (and behavior) which is as close to a real human appearance as technically possible. The eventual aim is to create a robot which is perceived as fully human by humans. . . ” [4]. Whether their efforts will turn out to only produce permanent inhabitants of the uncanny valley, or whether they will manage to get their machines to climb the final slope, time, practical constraints and our changing philosophical understanding of human-likeness will reveal. I will return to this point at the end of the article. For now, let's look at the impact that Mori's ideas have had in robotics and outside and see whether they can be applied to two new empirical contexts: that of social media and that of emerging technologies.
3 The Uncanny Valley Spills Over
After Mori's original Japanese article was finally translated into English in 2005, a surge of interest in this phenomenon emerged. A number of empirical studies have been conducted in recent years to investigate whether Mori's theoretical paper could be underpinned with real-world evidence [6,7,8,9,10,11]. Some of these studies claim that Mori's model is (at best) an oversimplification of a complex world [6] or even that it does not exist at all [8]. Especially those working in the field of android development feel defensive about Mori's valley, and quite understandably so. If the uncanny valley does in fact exist, it threatens the viability of their research projects. More refined (and less defensive) investigations regarding the empirical and conceptual basis of the uncanny valley exist as well [3,4,9,11]. For instance, some researchers argue that the valley may well exist, but can eventually be overcome in one of two ways. First, as time progresses robots will be developed that are ever more life-like, and display ever more complex behaviors, thus coming to mimic the behaviors and appearances of humans to an ever greater degree [12]. Second, some researchers argue that the occurrence of the uncanny valley is temporary in the sense that, as time progresses, human beings will become more and more used to dealing and interacting with (behaviorally and aesthetically) imperfect robots, and will thus overcome their initial sense(s) of repulsion. Yet other researchers have aimed to uncover what exactly constitutes the eeriness of the valley, and what exactly causes it. For instance, Karl MacDorman argues that the uncanny valley “elicits an innate fear of death and culturally supported defenses for coping with death's inevitability” [11] – an explanation that is not wholly uncontested itself [3]. Mori himself argued that the valley emerged when individuals were confronted with life-likeness – with objects (corpses, prosthetic hands) that they assumed were 'alive' at first glance, but on closer inspection turned out to be lifeless. Lack of motion was key in the emergence of eeriness, he claimed. Other researchers have developed their own hypotheses on the appearance of the uncanny valley. For example, Hanson [7] has argued that the uncanny valley does not arise so much because of a confrontation with too little life-likeness, but rather because of a confrontation with too little “physical attractiveness or beauty” [3]. One of the more convincing explanations, to my mind, is the idea that the uncanny valley effect has to do with our human tendency to anthropomorphize non-human and even non-living things, to attribute “a human form, human characteristics, or human behavior to nonhuman things such as robots, computers and animals” [13]. The more 'life-like' cues they give off, the more easily we will be tempted to ascribe intentions and animism to them, and the more easily we will be compelled to like them. All of this happens effortlessly, almost automatically, and mostly outside our awareness [13,14,15,16,17,18,19]. In the process of finding empirical and theoretical evidence for the uncanny valley, investigations into the matter have migrated from the field of robotics and android science into other research areas as well. For one, it has gained popularity, and found empirical proof, in the field of computer graphics and video game character design [10]. Moreover, it has been used in art analysis (e.g.
sculptures) and (animated) movie design. This is not surprising. As Bryant aptly summarizes it, . . . though originally intended to provide insight into human psychological reactions to robotic design, the concept expressed by [the phrase ‘the uncanny valley’] is equally applicable to interactions with nearly any nonhuman entity.[5] In the rest of this article I will attempt to apply the ideas underlying the uncanny valley to yet another domain: that of privacy perception in relation to (1) social network sites, and (2) emerging technologies, or to be more specific, to the proactive, autonomic technologies of the technological world of tomorrow.
4 Privacy Perception and the Uncanny Valley?
Let us return to the two examples discussed at the beginning of this chapter: the e-mail from Facebook, containing 'individuals you may know' and sent to a non-member of the network, and the automatically generated list of individuals to follow in Buzz, which led to widespread protests in the spring of 2010. What happened in both of these cases? Why did they cause a sense of eeriness, or, in the words of danah boyd, the “social equivalent of the uncanny valley” [1]? One of the central ideas on the emergence of the uncanny valley and robots is that of consistency, or rather, a lack thereof. There ought to be consistency between the complexity and sophistication of robots' appearance on the one hand, and the complexity and sophistication of their behaviors (movements) on the other. The moment this consistency is breached, a disconnect arises between what an individual expects of a robot in terms of behavior, based on its appearance, and what he or she perceives in reality. This we have seen above. Much of this relates to what is popularly known as 'expectation management'. A robot that looks simple and highly mechanical evokes lower expectations in human beings in terms of refined and complex behavior than one that looks highly advanced and/or resembles human beings to a great degree. Now, let us generalize this principle to web 2.0 domains such as social network sites. When interacting with software, whether in internet environments or on our own computers, we also use a wide array of assumptions regarding what kinds of behaviors we may expect from these systems. Usually, the system's appearance is a give-away in this respect: the more complex it looks, the more sophisticated its workings generally tend to be. In our interactions with various types of software, including online services, over the last years, we have thus built up a set of expectations with regard to the behaviors and possibilities of these systems relating to their appearance.1 On the same note, systems we use for a wide array
of different tasks may reasonably be expected to be more complex in terms of the services they can deliver than environments in which we only conduct simple or limited tasks. This is all the more true for add-ons. Most add-ons are single-purpose, simple little programs that add one limited feature to a system. And this is where the issues surrounding Buzz come in. Buzz was presented as a simple add-on, a new service to be added to users' GMail account, with one key functionality: keeping users updated on the information streams of individuals they chose to follow and making their own information stream public for followers in return. Buzz used a wizard that seemed accessible enough. By presenting individuals with a pre-populated list of possible followees, suddenly this seemingly simple and straightforward service displayed a level of intricacy and a depth of knowledge that caught users off guard entirely. Out of the blue Buzz displayed behaviors that most of us would consider quite sophisticated, and – even more eerie – it committed these acts with a seeming ease that surpassed even the most advanced computer users' expectations. The Facebook e-mail is another example of the same mistake. It gathered a random collection of individuals that the receiver 'might know', and, what's worse, confronted a non-member of the network with this analysis of his social circle. It delivered no context and no explanation, but merely provided a simple, straightforward message. It left the receiver feeling eerie, precisely because the simplicity of the message greatly contrasted with the intricacy, the 'wicked intelligence' some would argue, of the system behind it. It revealed immense system depth – how else could such a complex deduction have been made? – but since the simple message displayed did not reveal anything about that depth, it left the receiver feeling uncanny indeed. One thing is important to note. In robotics, we have seen, the uncanny valley emerges when the high level of sophistication of a robot's appearance does not match the limited level of sophistication of its behaviors. In these two examples the reverse is true: the low level of sophistication of the message/wizard does not match the immensely high level of sophistication of the behaviors suddenly displayed. In both cases, though, the disconnect between appearance and behavior is there, and hence an uncanny valley emerges. Software developers and system designers need to take into consideration that the emergence of the uncanny valley is lurking around the corner when this disconnect arises. User-friendliness requires a close connection between behavior and appearance, so that users' expectations are properly met and eeriness, either because of overly simplistic behavior in a complex-looking machine or because of overly intricate behavior in a simple-looking machine, can be avoided.

1 Of course, many more factors are relevant here, most notably our past use of these systems (or systems that resemble them or have a similar purpose or functionality). I am not claiming that appearance is the be-all and end-all of behavioral expectations with regard to software and online services. I am only pointing out that the system's appearance is a relevant factor, in a similar way that the appearance of a robot is a relevant factor in the way individuals predict its behavior.
5 Autonomic Technologies: The Uncanny Valley Everywhere?
In recent years visions such as autonomic computing [20,21,22], ubiquitous computing [23,24,25], the Internet of Things [26] and Ambient Intelligence [27,28,29]
have started predicting, and preparing for, a world in which technological artifacts will surround us everywhere and will constantly provide us with personalized, context-dependent, targeted information and entertainment services. They will not only do so reactively (i.e. in response to requests for information by users), but even proactively: technological artifacts and systems will cooperate to provide us with information and services that are relevant to us in the specific contexts in which we find ourselves, and they will do so by anticipating what users' needs might be. Users do not need to make their wishes explicit – the technologies in these visions of the technological world of tomorrow are said to be able to see users' needs coming, maybe even before the user himself knows. As I wrote elsewhere: This aspect of Ambient Intelligence is by far the most far-reaching. It means, among other things, that systems will be given a large responsibility in managing and maintaining a user's information sphere. The technology [. . . ] will decide what information is relevant, useful and even meaningful for the user in his current situation; the responsibility of finding, filtering and processing this information is removed from the user and placed squarely on the shoulders of the technology. It is the technology that will decide what is significant, and interesting, not the user. [29] In order to be able to proactively provide users with the right kinds and types of information and services, technologies in tomorrow's world will use 'profiles', in which users' preferences and past behaviors are stored, and thus, over time they will learn to adjust their behaviors to match users' situated needs and wants as perfectly as possible. The only way in which technologies in the world of tomorrow could learn these things is if they have long-term, intimate contact with the user. One of the crucial questions that arises in relation to these technological paradigms, therefore, is whether (or to what extent) people are going to accept the fact that they will be watched and monitored by technologies always and everywhere, particularly knowing that all the information gathered thus is stored in profiles and used to make predictions with respect to future behaviors. As researchers in various fields have pointed out, there are serious privacy issues with respect to these types of technologies, since user profiles can act as infinitely rich sources of personal information [30,31]. But profiling may cause more than privacy problems alone. It raises a number of issues that could lead to the emergence of an 'uncanny valley' in users' perceptions of their interactions with technologies. First of all, one of the key parameters of technological visions of the future, such as Ambient Intelligence, the Internet of Things and ubiquitous computing, is the idea that technologies will be embedded into the background of our everyday surroundings – into walls, windows and floors in our homes, our offices and shops, and even in public spaces. Hiding technologies from view for users improves the ease of use of these technologies and makes them less obtrusive, which is especially important as more and more technological devices are gathered in our homes and workplaces, according to the originators of these visions [24,28].
However, the combination of embedding technologies into the background of everyday spaces and profiling users' behaviors to proactively provide them with appropriate and correct information and other services leads to a high risk of ending up in an uncanny valley. After all, when technologies are hidden from view, how will users know whether they are being traced by technologies in any given situation, and how will they know what information about them is being captured and stored by cameras, sensors and other technologies in each situation? If technologies will proactively deliver personalized information, on the one hand, yet gather their data about users invisibly and imperceptibly on the other, chances are that users will regularly feel uneasy with the (eerily adequate) information they will suddenly be presented with, since they have no way of knowing how the system has come to collect that information, or how it has based its information provision on the users' behaviors. The discrepancy between the systems' complexity and the users' limited perception of that complexity (which is, after all, hidden from view) may lead users to feel the same way that non-users in Facebook or members of Buzz felt when confronted with a mismatch between a system's behavior (or system depth) and the same system's appearance: very eerie indeed. The mixture of proactivity, profiling and hiding technology from view may very well lead users to feel unpleasantly surprised by the capacities and breadth of knowledge that systems may unexpectedly display in any given place or at any given time. As Abowd and Mynatt correctly remark, it is vital that users be aware of the fact that technologies are tracking them and of how much data they are storing about them, if they are to keep a sense of control over these technologies and accept them as part of their everyday life [32]. However, as said, the key design parameters of visions such as Ambient Intelligence and ubiquitous computing that we've discussed here – embedding technology, generating massive user profiles, acting proactively – appear to contradict that possibility, and hence the emergence of uncanny valleys everywhere is a serious risk. Second, users may find the personalized, targeted information thrown at them by these systems uncanny, because technological systems will act as though they know users intimately, yet, because of the way profiles are composed, these systems are bound to make the wrong 'suggestions' every now and then, or deliver the wrong pieces of information. Profiling builds on combining two strands of information to create an expectation of individual users' future preferences, wishes and behaviors. I will use an example to show how this works. When online shoppers visit the bookstore Amazon.com, their past purchases and their search history are stored in an individual history. The past choices, wants and needs of each individual visitor to the store are registered, so that the store can provide him or her with ever more personally tailored product offers as time progresses and more visits have been made to the store. At the same time, the behaviors of all the store's visitors combined are collectively aggregated, so that new visitors can find products matching their wishes more easily and efficiently. This is how that is done. Whenever a customer buys a book or a product, he or she will be offered a set of suggestions for products in the same category that were bought by other customers. The central idea is that “if a group of people bought
book A and also bought book B, others who might buy A might also be interested in B” [33]. This form of so-called 'collaborative filtering' is also known as 'planned serendipity' [33]. In profiling, these two streams of information are combined: the totality of past behaviors and choices of a single individual are merged with the collective behaviors of a large group of people with respect to one single choice or purchase, and the integrated 'image' that arises thus is used to provide users with ever more accurate information and services, or so the thinking goes. In many cases, in fact, this thinking is quite correct. The reason why a business such as Amazon.com deploys the profiling mechanisms that it uses is – obviously – because they work: based on their buying behaviors one can conclude that shoppers are often quite happy with the product suggestions they are provided with. Needless to say, however, there will always be exceptions to the rule. Some buyers will feel uncomfortable with product suggestions as such, for instance because they will wonder how much Amazon.com is actually registering about them, or because they will feel uncomfortable that some of their past searching or buying behaviors come back to 'haunt' them in the store – imagine having conducted a search on a topic that also returned pornographic titles (which, of course, you were not out to find!), and being spammed with titles from that category ever after. Moreover, since people are different in their likes and dislikes, profiling based on the collective behaviors of a large group could easily lead to quite accurate product suggestions, but not entirely right ones. Compare this to the examples of Google Buzz and Facebook discussed above. Here, too, users may feel eerie because the personalized suggestions made are just a little too haphazard, and just a little too weird or unfit to match their exact preferences, although coming quite close. . . The uncanny valley lurks here as well. Now, when realizing how central profiling mechanisms such as those used by Amazon.com will be in tomorrow's world of smart, adaptive, proactive, personalized technologies, it becomes clear how urgent this issue really is. These two points raise the question: in the technological world of tomorrow, will the uncanny valley be everywhere?
6 Designers, Beware!
We have seen that Mori provided a solution for avoiding the eeriness of the uncanny valley. He argued that designers in the field of robotics ought to prevent the valley from opening up by limiting themselves to the creation of 'mechanoids' and 'humanoids', since the mismatch between appearance and behaviors was unlikely to occur in these types of robots [2]. Users wouldn't expect a relatively simple-looking machine to conduct incredibly complicated tasks. Similarly, the more complicated the behaviors of a machine, the more refined its appearance ought to be as well – again, to ensure a match between humans' expectations and reality. To my mind, the thrust of Mori's argument was correct. At the time he wrote his article, very few actual examples of the uncanny valley effect existed in robotics – the science of robotics was simply not advanced enough for it to arise regularly. In the past decades, however, robotics has developed rapidly.
And, although their creators are sure to contest it, the uncanny valley does in fact arise in many who watch some of the creations that have been developed, especially with respect to androids.2 Moreover, as this article has shown, the uncanny valley effect may not necessarily be limited to robotics only. Perhaps the lesson that Mori’s work teaches us is a broader one, which is that mismatches between all technologies’ performances and their appearances ought to be avoided, if we strive for smooth interactions between humans and machines. Technology designers have a role to play in ensuring that a mismatch between impressions and expectations on the one hand and actual behaviors on the other does not occur. This does not only apply to robotics, but to any system we design. In a world that becomes ever more technologically saturated this is a lesson to be learnt sooner rather than later, lest we end up living with uncanny valleys everywhere.
References 1. boyd, d.: Making sense of privacy and publicity. Paper presented at SXSW in Austin (TX), USA (2010) 2. Mori, M.: The uncanny valley (translated by Karl F. MacDorman and Takashi Minato). Energy 7(4), 33–35 (1970) 3. Misselhorn, C.: Empathy with inanimate objects and the uncanny valley. Minds & Machines 19, 345–359 (2009) 4. Walters, M.L., Syrdal, D.S., Dautenhahn, K., Te Boekhorst, R., Koay, K.L.: Avoiding the uncanny valley: Robot appearance, personality and consistency of behavior in an attention-seeking home scenario for a robot companion. Autonomous Robots 24(2), 159–178 (2008) 5. Bryant, D.: The uncanny valley: Why are monster-movie zombies so horrifying and talking animals so fascinating? (2004), http://us.vclart.net/vcl/Authors/Catspaw-DTP-Services/valley.pdf 6. Bartneck, C., Kanda, T., Ishiguro, H., Hagita, N.: My robotic doppelgänger: A critical look at the uncanny valley. Paper presented at the 18th IEEE International Symposium on Robot and Human Interactive Communication in Toyama, Japan (2009a) 7. Hanson, D.: Exploring the aesthetic range for humanoid robots. Paper presented at ICCS/Cog-Sci in Vancouver (BC), Canada (2006) 8. Hanson, D., Olney, A., Pereira, I.A., Zielke, M.: Upending the uncanny valley. Paper presented at the 20th National Conference on Artificial Intelligence (AAAI) in Pittsburgh (PA), USA (2005) 9. Brenton, H., Gillies, M., Ballin, D., Chatting, D.: The uncanny valley: Does it exist? Paper presented at the Conference of Human Computer Interaction: Workshop on Human-Animated Character Interaction (2005) 10. Schneider, E.: Exploring the uncanny valley with Japanese video game characters. Paper presented at the DiGRA 2007 Conference (Digital Games Research Association) (2007)
2 For a personal experience of the uncanny valley, please watch (just a selection of many) ‘eerie’ clips of androids on YouTube: http://www.youtube.com/watch?v=L4z3cs4Ocug, http://www.youtube.com/watch?v=MY8-sJS0W1I, and http://www.youtube.com/watch?v=091ugdiojEM.
11. MacDorman, K.F.: Androids as an experimental apparatus: Why is there an uncanny valley and can we exploit it? Paper presented at the Cognitive Science Society (CogSci 2005): Workshop ‘Toward Social Mechanisms of Android Science’ (2005a) 12. MacDorman, K.F., Minato, T., Shimada, M., Itakura, S., Cowley, S., Ishiguro, H.: Assessing human likeness by eye contact in an android testbed. Paper presented at the Cognitive Science Society, CogSci 2005 (2005b) 13. Bartneck, C., Kulic, D., Croft, E., Zoghbi, S.: Measurement instruments for the anthropomorphism, animacy, likeability, perceived intelligence, and perceived safety of robots. International Journal of Social Robotics 1(1), 71–81 (2009) 14. Nass, C.I., Moon, Y.: Machines and mindlessness: Social responses to computers. Journal of Social Issues 56(1), 81–103 (2000) 15. Nass, C.I., Moon, Y., Fogg, B.J., Reeves, B., Dryer, D.C.: Can computer personalities be human personalities? International Journal of Human-Computer Studies 43(2), 223–239 (1995) 16. Turkle, S.: The second self: Computers and the human spirit. Simon and Schuster, New York (1984) 17. Turkle, S.: Evocative objects: Things we think with. MIT Press, Cambridge (2007) 18. Reeves, B., Nass, C.I.: The media equation: How people treat computers, television, and new media like real people and places. CSLI Publications/Cambridge University Press, Stanford (CA); New York, NY (1996) 19. Friedman, B., Kahn Jr., P.H., Hagman, J.: Hardware companions? What online AIBO discussion forums reveal about the human-robotic relationship. Paper presented at the Computer-Human Interaction (CHI) Conference 2003 in Ft. Lauderdale, FL (2003) 20. Ganek, A.G., Corbi, T.A.: The dawning age of the autonomic computing era. IBM Systems Journal 42(1), 5–19 (2003) 21. Sterritt, R., Parashar, M., Tianfield, H., Unland, R.: A concise introduction to autonomic computing. Advanced Engineering Informatics 19, 181–187 (2003) 22. Hildebrandt, M.: Technology and the end of law. In: Claes, E., Devroe, W., Keirsbilck, B. (eds.) Facing the Limits of the Law. Springer, Heidelberg (2009) 23. Weiser, M.: The computer for the 21st century. Scientific American 265(3), 66–76 (1991) 24. Weiser, M., Brown, J.S.: The coming age of calm technology. Xerox PARC, Palo Alto (1996) 25. Araya, A.A.: Questioning ubiquitous computing. In: Proceedings of the 1995 Computer Science Conference. ACM Press, New York (1995) 26. ITU: The Internet of Things. In: ITU Internet Reports – Executive Summary: International Telecommunication Union (2005) 27. Aarts, E., Harwig, R., Schuurmans, M.: Ambient Intelligence. In: Denning, P.J. (ed.) The Invisible Future: The Seamless Integration of Technology into Everyday Life. McGraw-Hill, New York (2002) 28. Aarts, E., Marzano, S.: The new everyday: Views on Ambient Intelligence. 010 Publishers, Rotterdam, The Netherlands (2003) 29. Van den Berg, B.: The situated self: Identity in a world of Ambient Intelligence. Wolf Legal Publishers, Nijmegen (2010) 30. Hildebrandt, M.: Defining profiling: A new type of knowledge? In: Hildebrandt, M., Gutwirth, S. (eds.) Profiling the European Citizen: Cross-disciplinary Perspectives. Springer Science, Heidelberg (2008)
31. Punie, Y.: The future of Ambient Intelligence in Europe: The need for more everyday life. Communications & Strategies 5, 141–165 (2005) 32. Abowd, G.D., Mynatt, E.D.: Charting past, present, and future research in ubiquitous computing. ACM Transactions on Computer-Human Interaction 7(1), 29–58 (2000) 33. Weinberger, D.: Everything is miscellaneous: The power of the new digital disorder. Times Books, New York (2007)
50 Ways to Break RFID Privacy Ton van Deursen University of Luxembourg
[email protected] Abstract. We present a taxonomy of attacks on user untraceability in RFID systems. In particular, we consider RFID systems in terms of a layered model comprising a physical layer, a communication layer, and an application layer. We classify the attacks on untraceability according to their layer and discuss their applicability. Our classification includes two new attacks. We first present an attack on the RFID protocol by Kim et al. targeting the communication layer. We then show how an attacker could perform an application-layer attack on the public transportation system in Luxembourg. Finally, we show that even if all of his tags are untraceable, a person may not be untraceable. We do this by exhibiting a realistic scenario in which the attacker uses the RFID profile of a person to trace him. Keywords: RFID, privacy, untraceability, attacks, taxonomy.
1 Introduction
Radio frequency identification (RFID) systems consist of tags, readers, and a backend. RFID tags are small, inexpensive devices that communicate wirelessly with RFID readers. Most RFID tags currently in use are passively powered and respond to queries from legitimate, but also rogue, RFID readers. They make it possible to uniquely identify everyday items such as passports [1], electronic transportation tickets, and clothes. A key property of RFID systems is that tags can be scanned without the owner’s consent and without the owner even noticing it. Therefore, one must ensure that RFID tags embedded in items carried by a person do not reveal any privacy-sensitive information about that person. A major privacy threat in current RFID systems is that the RFID system maintainer can monitor and profile the behavior of its users. Consider an RFID system used for public transportation e-ticketing such as the Oyster card¹ or the OV-chipkaart². Every time a person uses public transportation a transaction is registered. By collecting this information over a long period of time, the public transportation companies build large databases of privacy-sensitive information.
Ton van Deursen was supported by a grant from the Fonds National de la Recherche (Luxembourg).
1 http://www.tfl.gov.uk/oyster/
2 http://www.ov-chipkaart.nl/
S. Fischer-Hübner et al. (Eds.): Privacy and Identity 2010, IFIP AICT 352, pp. 192–205, 2011. © IFIP International Federation for Information Processing 2011
In some cases, outsiders to the RFID system may also be interested in monitoring and profiling the users of the RFID system. If a person does not want others to know what items he carries, then the RFID tags attached to these items must not reveal this information to unauthorized RFID readers. For instance, some people may not want to reveal the kind of underwear they are wearing, the amount of money in their wallet, their nationality, or the brand of their watch. Therefore, RFID systems must enforce anonymity: the property that items and users cannot be identified [2]. An RFID system that satisfies anonymity does not necessarily prevent an attacker from linking two different actions to the same RFID tag. In this work, we study the privacy notion called untraceability. To break anonymity, the attacker’s goal is to identify the tag and its user. By contrast, to attack untraceability the attacker’s objective is to find out that two (or more) seemingly unrelated interactions were with the same tag. We define untraceability as follows:
Definition 1 (Untraceability). An RFID system satisfies untraceability if an attacker cannot distinguish, based on protocol messages, whether two actions were performed by the same tag or by two different tags.
If untraceability is not satisfied, an attacker can attribute different actions to one (possibly unknown) tag. By linking one of these actions to the person that carries the tag the attacker effectively traces that person. Untraceability of RFID tags is hard to achieve for a number of reasons. Due to their small size and the absence of an active power source, RFID tags are severely restricted in the types of computation they can perform. Also, no physical connection is needed for RFID communication, easing deployment of rogue devices by the adversary. Finally, theoretical results by Damgård and Pedersen show that it is impossible to design an RFID system that satisfies efficiency, security, and untraceability simultaneously [3]. The goal of this paper is to study untraceability of RFID systems from the attacker’s perspective. Due to the vast number of different RFID systems, no silver-bullet solution to RFID privacy exists yet. It is, therefore, essential to understand how an attacker can break untraceability before deciding what defenses to deploy. We refer to Juels [4] and Langheinrich [5] for a survey of possible defensive techniques for RFID privacy.
Contributions. Our first contribution is a classification of attacks on the untraceability of RFID systems. We describe a layered communication model for RFID communication (Section 2) consisting of a physical, a communication, and an application layer. We classify existing untraceability attacks according to the corresponding layer they attack. Section 4 describes physical-layer attacks, Section 5 describes communication-layer attacks, and Section 6 describes application-layer attacks. As a second contribution, we describe new attacks on the communication layer and the application layer. Section 5.1 presents a communication-layer attack on
the RFID protocol by Kim et al. [6] and Section 6.1 describes how an attacker can recover the date and time of the last 5 trips of a person from his public transportation card. As a last contribution we show in Section 7 that even if all tags provide untraceability and an individual tag cannot be traced, a person’s RFID profile may still allow an attacker to trace him. Such attacks consider only the particular set of tags carried by a person in order to trace him.
2 RFID Communication Model
The communication flow in an RFID system is commonly described by a set of protocols. These protocols form a layered structure reminiscent of the OSI reference model for computer networks [7]. To classify attacks on untraceability, we separate the following three layers³ (see Figure 1):
– The physical layer is the lowest layer in the model and provides a link between an RFID reader and a tag. Protocols for modulation, data encoding, and anti-collision are implemented in this layer. The physical layer provides the basic interface for transmission of messages between a reader and a tag.
– The communication layer implements various types of protocols to transfer information. Protocols implemented in this layer facilitate tasks such as identification or authentication of a device and updates of cryptographic key material stored on a device.
– The application layer implements the actual RFID applications used by the user of the system. Application-layer protocols facilitate fetching and interpretation of data, as well as updating the data on a tag. Examples of such data are account and balance information on a public transportation card and the photo on the tag in an e-passport.
Fig. 1. RFID system layers (the stack between reader and tag, from bottom to top: 1. Physical, 2. Communication, 3. Application)
As shown in Sections 4 through 6, each of the layers can leak information that can be used to trace a tag. It is, therefore, important to protect untraceability at every layer of the communication model.
3 Our model differs slightly from the layered communication model by Avoine and Oechslin [8] since they separate the physical layer into two layers. We additionally introduce an application layer which allows us to reason about high-level attacks.
3 Attacker Model
One of the difficulties in designing privacy-preserving RFID systems is that they face powerful attackers. Moreover, the cost of an attack and the knowledge required to perform it are low. Most equipment necessary to attack RFID systems can be bought for less than $100 and software libraries for most hardware devices are available online. When analyzing RFID systems we assume the attacker has the following capabilities:
– Impersonating readers: A rogue reader can be used for communication with a genuine tag. It implements the same protocol and sends the messages the tag expects to receive.
– Impersonating tags: Similar to impersonating a reader, a rogue tag can be constructed to communicate with a genuine reader.
– Eavesdropping: The attacker captures the transmitted signals using suitable radio frequency equipment [9]. He recovers the transmitted data and listens in on the communication between the reader and the tag. Since the eavesdropping device does not have to power the RFID tag itself, eavesdropping is possible from a larger distance than impersonating a reader.
– Modifying/blocking messages: Although it is hard to carry out in practice, it is possible to relay messages from a legitimate tag to a legitimate reader using a man-in-the-middle device [10]. The man-in-the-middle device can selectively modify transmitted messages, or even block them.
The main difficulty in carrying out attacks is to install the equipment close enough to the legitimate RFID readers and tags. In case of privacy attacks the attacker must carefully install his rogue equipment at a point of interest. Such locations can be entrances to a building, checkout counters of a store, or crowded places. For a discussion on communication distances and eavesdropping distances we refer to Hancke [9].
4 Physical-Layer Attacks
Physical-layer attacks exploit vulnerabilities that are introduced in the manufacturing process of the RFID tags, the transmission protocols, or the implementation of higher-level protocols. We will first explore a weakness related to the anti-collision identifiers specified by the ISO 14443 [11] standard. We subsequently describe a traceability attack by Danev et al. [12] that abuses the variations in the manufacturing process of RFID tags.
4.1 Static Anti-collision Identifiers
Most RFID tags currently available implement the physical layer defined by the ISO 14443A standard. Examples of such tags are e-passports, MIFARE tags, and near field communication (NFC) chips. ISO 14443 part 3 describes physical-layer protocols for communication with a tag. One of these
physical-layer protocols is the anti-collision protocol. The protocol allows the reader to select a particular tag with which it wants to communicate. It prevents communication collisions by ensuring that tags do not respond to the reader simultaneously. It is initiated by the reader, after which the tag broadcasts its 32-bit unique identification number (UID). The anti-collision protocol is not cryptographically protected. Therefore, anybody with an ISO 14443A compliant RFID reader can query a tag for its UID. Almost all currently available ISO 14443A compliant tags have static UIDs. The UIDs cannot be rewritten and never change. Therefore, an attacker can trace a tag (and thus its owner) by repeatedly querying for UIDs. Since static UIDs provide a unique mapping between tags and people, the attacker knows that if the same UID reappears then the same person must be present. This UID-based traceability attack is very effective in terms of success rate and investment needed. One exception, for which the attack outlined above does not work, is the e-passport. The e-passport implements randomized UIDs: it is designed to respond with a fresh randomly chosen UID during anti-collision. In terms of implementation costs the attacker needs hardware and software. The hardware needed consists of a computer and an ISO 14443A compliant RFID reader. The latter can be a low-cost off-the-shelf RFID reader currently available for approximately 30 euro⁴. Alternatively, an attacker can use an NFC-enabled phone to carry out the attack. Software to perform the communication between reader and tag can be found online in the form of free software libraries⁵.
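To make the mechanics of this UID-based attack concrete, the following minimal Python sketch (mine, not part of the original paper) simulates an attacker who logs the static UIDs reported by rogue readers at different locations and then links sightings of the same UID. The read_uid() helper is a hypothetical stand-in for whatever reader hardware and software library are actually used.

```python
# Illustrative sketch: linking sightings of static anti-collision UIDs observed
# by rogue readers at different locations. read_uid() is a hypothetical stub.
from collections import defaultdict
from datetime import datetime

def read_uid(raw_frame: bytes) -> str:
    """Hypothetical helper: extract the (first 4 bytes of the) UID from a captured frame."""
    return raw_frame[:4].hex()

sightings = defaultdict(list)  # UID -> list of (location, timestamp)

def record_sighting(raw_frame: bytes, location: str) -> None:
    uid = read_uid(raw_frame)
    sightings[uid].append((location, datetime.now()))

def traced_tags():
    """UIDs seen at more than one location: the same static UID implies the same tag."""
    return {uid: obs for uid, obs in sightings.items()
            if len({loc for loc, _ in obs}) > 1}

# Example: the same static UID observed at a shop entrance and later at a bus stop.
record_sighting(bytes.fromhex("04a2b3c1aabbccdd"), "shop entrance")
record_sighting(bytes.fromhex("04a2b3c1aabbccdd"), "bus stop")
print(traced_tags())
```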
4.2 Physical Fingerprinting
The manufacturing process of RFID tags introduces very small variations in the circuitry of an RFID tag. These variations can be used by an attacker to trace tags. Danev et al. have recently shown that if the radio frequency of the communication is varied, tags of the same brand and type behave differently [12]. Since these differences are stable, an attacker can use them to fingerprint tags and consequently trace tags. Under laboratory conditions and with a small set of tags, the attacks are quite effective. In a set of 50 identical JCOP tags a tag could be correctly recognized in 95% of the cases. The equipment needed by the attacker is relatively expensive and it is hard to perform the attack without being noticed. Therefore, the applicability of the attack is at present quite low.
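As an illustration only (with simulated numbers, not the measurement pipeline of Danev et al.), the sketch below shows the matching step of such a fingerprinting attack: each tag is assumed to yield a stable feature vector plus measurement noise, and an observed measurement is attributed to the enrolled tag with the nearest reference fingerprint.

```python
# Illustrative simulation of fingerprint matching; feature vectors are invented.
import random

def measure(tag_profile, noise=0.01):
    """Hypothetical measurement: the tag's stable feature vector plus Gaussian noise."""
    return [x + random.gauss(0.0, noise) for x in tag_profile]

def closest(fingerprint, database):
    """Return the enrolled tag whose reference fingerprint is nearest (Euclidean distance)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(database, key=lambda tag_id: dist(fingerprint, database[tag_id]))

# Enrollment: 50 "identical" tags whose reference fingerprints differ slightly by manufacture.
random.seed(1)
profiles = {f"tag{i}": [random.uniform(0, 1) for _ in range(8)] for i in range(50)}
database = {tag_id: measure(p) for tag_id, p in profiles.items()}

# A later observation of tag 7 is re-identified even though all tags are the same model.
observation = measure(profiles["tag7"])
print(closest(observation, database))  # -> 'tag7' (with high probability)
```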
5 Communication-Layer Attacks
Communication-layer attacks target the protocols that are used for identification, authentication, and cryptographic key updates, among other tasks. These protocols are often cryptographic protocols designed to securely authenticate a tag while keeping it untraceable.
4 http://www.touchatag.com/
5 http://www.libnfc.org/
5.1 Unique Attributes
In an effort to keep RFID tags cheap, RFID protocols must be computationally as lightweight as possible. Due to the implied absence of strong cryptographic primitives, RFID protocols frequently suffer from algebraic flaws that allow an attacker to perform an attribute acquisition attack [13]. In such an attack, the attacker abuses the algebraic properties of the messages exchanged in the protocol to perform a computation that results in a fixed value that is particular to a tag. By repeating this computation at a later stage on different messages and obtaining the same fixed value, the attacker can trace that tag. We will now restrict ourselves to a subclass of attribute acquisition attacks in which the attack strategy is as follows. Let f(a, T, i) denote the response sent by the tag T upon receipt of its i-th query, where the query equals a. The attacker queries two tags T1 and T2 with queries a and a′ of his choice and records the responses r and r′, where r = f(a, T1, i) and r′ = f(a′, T2, j). He then performs a computation g that takes the challenge and response as input and satisfies the following conditions: (a) if T1 and T2 are the same tag, then g(a, r) = g(a′, r′); the attribute g(a, r) is a unique attribute; (b) if T1 and T2 are different tags, then g(a, r) ≠ g(a′, r′). We capture the above intuition in the following definition.
Definition 2 (attribute acquisition attack, adapted from [13]). Let Term be the set of all possible messages of a protocol, let Tag be the set of tags in an RFID system, and let f(a, T, i) be the response of tag T in session i upon receipt of query a. We define presence of a unique attribute as follows:

∃ T ≠ T′ ∈ Tag  ∃ a, a′ ∈ Term  ∃ i ≠ j ∈ ℕ  ∃ g : Term × Term∗ → Term :
    g(a, f(a, T, i)) = g(a′, f(a′, T, j))  ∧  g(a, f(a, T, i)) ≠ g(a′, f(a′, T′, j)).

We call g(a, f(a, T, i)) a unique attribute.
The presence of a unique attribute gives the attacker an efficient way of tracing tags. The attacker merely has to query tags, perform the computation g, and compare the attributes. For a protocol to be untraceable, a necessary condition is that no unique attribute exists. The absence of unique attributes, however, does not guarantee untraceability [13]. An example of an RFID protocol that is vulnerable to an attribute acquisition attack is the protocol proposed by Kim et al. [6] depicted in Figure 2. The protocol is designed to authenticate a tag T to a reader R. Each tag has an identifier ID_T and a key kT, both known to the reader. The reader initiates the protocol by generating a fresh random value (called a nonce) n. Upon receipt of the query n, the tag generates a nonce s. It then computes the bitwise exclusive-or (⊕) of its identifier ID_T and s as well as the exclusive-or of s and the cryptographic
Fig. 2. Privacy protection protocol [6]: the reader R and the tag T both hold kT and ID_T; R generates a nonce n and sends it to T; T generates a nonce s and replies with the pair ID_T ⊕ s, h(n, kT) ⊕ s
hash of n and kT. The response is then sent to the reader and verified. The exclusive-or function has the following algebraic properties. For any terms a, b, and c and a constant term 0:

    a ⊕ a = 0        a ⊕ b = b ⊕ a
    a ⊕ 0 = a        (a ⊕ b) ⊕ c = a ⊕ (b ⊕ c)        (1)
An attribute acquisition attack can be carried out by an attacker that repeatedly queries tags with the same query a. If we let sT,i denote the nonce generated by tag T after the i-th query, then the tag’s response to query a is defined by f(a, T, i) = (ID_T ⊕ sT,i, h(a, kT) ⊕ sT,i). A unique attribute can be computed by defining g(w, (y, z)) = y ⊕ z. To show that g(a, f(a, T, i)) is indeed a unique attribute following Definition 2 requires that (a) for two sessions of the same tag, g yields the same value, and (b) for sessions of two different tags, g yields different values. By repeated application of Equations (1) we obtain:

    g(a, f(a, T, 0)) = ID_T ⊕ sT,0 ⊕ h(a, kT) ⊕ sT,0 = ID_T ⊕ h(a, kT)        (2)
    g(a, f(a, T, 1)) = ID_T ⊕ sT,1 ⊕ h(a, kT) ⊕ sT,1 = ID_T ⊕ h(a, kT)        (3)
    g(a, f(a, T′, 1)) = ID_T′ ⊕ sT′,1 ⊕ h(a, kT′) ⊕ sT′,1 = ID_T′ ⊕ h(a, kT′)        (4)
The term ID_T ⊕ h(a, kT) is a unique attribute for tag T.
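A short simulation (mine, not the authors' code) makes the algebra above concrete: XOR-ing the two components of a response cancels the fresh nonce s and exposes the session-independent attribute ID_T ⊕ h(a, kT). The hash h below is an arbitrary stand-in, since the attack relies only on the XOR structure of the protocol.

```python
# Illustrative simulation of the attribute acquisition attack on the protocol of Figure 2.
import hashlib, os

def h(n: bytes, k: bytes) -> bytes:
    """Stand-in keyed hash; any hash works, the attack uses only the XOR structure."""
    return hashlib.sha256(n + k).digest()[:8]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def tag_response(query: bytes, id_t: bytes, k_t: bytes):
    s = os.urandom(8)                       # fresh nonce per session
    return xor(id_t, s), xor(h(query, k_t), s)

def unique_attribute(query: bytes, response) -> bytes:
    y, z = response
    return xor(y, z)                        # = ID_T xor h(query, k_T), independent of s

tag_a = (os.urandom(8), os.urandom(8))      # (ID_T, k_T)
tag_b = (os.urandom(8), os.urandom(8))
a = b"\x01" * 8                             # the attacker always uses the same query

# Two sessions of tag_a yield the same attribute; tag_b yields a different one.
assert unique_attribute(a, tag_response(a, *tag_a)) == unique_attribute(a, tag_response(a, *tag_a))
assert unique_attribute(a, tag_response(a, *tag_a)) != unique_attribute(a, tag_response(a, *tag_b))
print("tag_a is traceable via its unique attribute")
```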
5.2 Desynchronization and Passport Tracing
The following two examples illustrate non-algebraic communication-layer attacks reported in the literature.
– One of the first RFID protocols with an untraceability claim was proposed by Henrici and Müller [14]. The protocol relies on a symmetric key that is updated at the end of a successful protocol execution. Avoine showed [15] that the protocol suffered from a number of weaknesses. A particularly interesting attack allowed the attacker to force the reader and tag to perform
different key updates, effectively desynchronizing the reader and the tag. As soon as that happens, a genuine reader will no longer be able to successfully complete the protocol and will thus always reject the tag. Assuming that no other tags are desynchronized, carrying out a desynchronization attack on one tag allows the attacker to recognize, and thus trace, that tag.
– In some RFID systems, an attacker can trace tags by exploiting flaws in the communication layer and physical layer simultaneously. Chothia and Smirnov demonstrated [16] that e-passports can be traced by sending a previously observed message to them. It turns out that the e-passport from which the message originated takes significantly longer to respond than a different e-passport would. Therefore, an attacker can trace tags by sending such messages and carefully measuring the time it takes for an e-passport to respond (a simple simulation of this timing distinguisher is sketched below).
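The following toy simulation (all timing values invented for illustration, not measured from real e-passports) sketches how such a timing side channel can be turned into a tracing decision: the attacker replays a previously observed message a few times and compares the average response time against a threshold.

```python
# Illustrative simulation of a response-time distinguisher; numbers are invented.
import random

def respond_time(is_origin_passport: bool) -> float:
    """Hypothetical response time in milliseconds, with measurement jitter."""
    base = 6.0 if is_origin_passport else 4.0
    return random.gauss(base, 0.3)

def looks_like_origin(samples, threshold_ms=5.0) -> bool:
    """Decide, after a few replayed messages, whether this is the passport seen before."""
    return sum(samples) / len(samples) > threshold_ms

random.seed(0)
same = [respond_time(True) for _ in range(5)]    # replay against the originating passport
other = [respond_time(False) for _ in range(5)]  # replay against some other passport
print(looks_like_origin(same))    # True  -> traced
print(looks_like_origin(other))   # False
```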
6 Application-Layer Attacks
Application-layer attacks target the application implemented by the RFID system. Therefore, if the RFID system is solely used for identification of items, the application layer does not implement any protocols. However, RFID tags are becoming more powerful and in some cases the contact interface of a smart card is replaced by a contactless interface using RFID technology. In such cases, the card becomes an RFID tag and care must be taken that the application-layer protocols do not leak any privacy-sensitive information.
6.1 E-go Transaction Data
The e-go system. In 2008, an electronic fare collection system, called e-go, was introduced for public transportation in Luxembourg. E-go is an RFID-based system in which users hold RFID tags and swipe them across RFID readers in buses and at stations. Users can purchase a book of virtual tickets which is loaded on the tag. Upon entering a bus a user swipes his e-go tag and a ticket is removed from it. Since most RFID readers of the e-go system are deployed in buses, e-go is an off-line RFID system [17]. Readers do not maintain a permanent connection with the back-end, but synchronize their data only infrequently. Since readers may have data that is out-of-date and tags may communicate with multiple readers, off-line RFID systems store data on the tags. To store, retrieve, and interpret the data stored on an RFID tag, the RFID system needs to implement application-layer protocols. The RFID tags used for the e-go system are MIFARE classic 1k tags. These tags have 16 sectors that each contain 64 bytes of data, totaling 1 kilobyte of memory. Sector keys are needed to access the data of each sector. Garcia et al. [18,19] recently showed that these keys can be easily obtained with off-the-shelf hardware. The data on the tag must therefore be considered to be freely accessible by anybody with physical access to the tag. We thus extend the attacker model from Section 3 to allow attackers to access the data on a tag.
Transaction data. Using unprotected tags for public transportation ticketing has obvious security drawbacks. The data can be modified, restored, and even corrupted. Although it is hard to prevent fraud, it can be detected and confined by regularly blacklisting abusive tags. If personal data is stored on an unprotected tag, then the privacy of the user is at stake. On tags in an off-line RFID system such as e-go one expects to find, for instance, the products purchased, the number of unused virtual tickets, and the date and time of the last swipe. A similar fare collection system in The Netherlands stores the date-of-birth of the card-holder and the last 10 transactions on the RFID tag. Researchers have discovered that attackers can recover this data by surreptitiously reading a user’s tag [20]. The transaction data provides a history of where the card holder has been on specific dates and times. An attacker could recover this data to profile the users of an RFID system. Such an application-layer attack is more powerful than attacks on the physical layer and communication layer since the attacker does not have to be present when the user swipes his card. He obtains this information from the data stored on the tag. To understand the transaction data, the attacker must know on which memory location on the tag it is stored and how it is encoded. We will now describe how an attacker can isolate and then decode the encoded transaction data.
Isolation. To recover the address of the transaction data an attacker can use an e-go tag with a book of 10 tickets on it. Upon swiping the tag a ticket is removed and a transaction is written to the tag. A common technique in digital forensics to recover data is to create memory dumps of devices [21]. The attacker can repeatedly swipe the tag and dump the memory to obtain a set of dumps. A MIFARE 1k tag’s memory consists of 16 sectors each of which may contain data. In comparing memory dumps of the tag before and after swipes, only a few sectors appear to be updated during a swipe. Five sectors are written to in a cyclic manner and are very similarly structured. It turns out that these five sectors contain the transaction data.
Decoding. If we know the location of the date and time information, all that remains is to recover how the date and time are encoded. The encoding can be recovered using the date and approximate time at which the attacker swiped the cards. Table 1(a) gives the raw data and the date of a swipe for a subset of the swipes and Table 1(b) gives similar data for the time of the swipe. A standard way of encoding date and time is to select a reference date or time and to store the number of days, minutes, seconds, or milliseconds since the reference date or time [22]. The example dates in Table 1(a) are the same if the date of the swipe is the same, but differ by 1 if the date differs by one day. An educated guess suggests that these bits represent the number of days since a particular reference date. Indeed, 01000101001111 in base 2 (4431 in base 10) indicates that the first swipe occurred 4431 days after 01/01/1997: on 18/02/2009. A similar analysis of the time information in Table 1(b) shows that the time is encoded as the number of minutes elapsed since midnight.
Table 1. Sample data for (a) date and (b) time information

(a) Date
Raw data          Date
01000101001111    18/02/2009
01000101001111    18/02/2009
01000101010000    19/02/2009
01000101010000    19/02/2009
01000101010001    20/02/2009

(b) Time
Raw data       Appr. time
01101101000    14:32
10010001011    19:32
01000000011    08:35
01010011011    11:10
01000000001    08:35
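A minimal sketch of this decoding step, assuming only the two encodings recovered above (days since 01/01/1997 and minutes since midnight); the bit strings are taken from the first rows of Table 1.

```python
# Decoding the e-go date and time fields as described in the text.
from datetime import date, time, timedelta

EPOCH = date(1997, 1, 1)

def decode_date(bits: str) -> date:
    """Date field: number of days since 01/01/1997, stored in binary."""
    return EPOCH + timedelta(days=int(bits, 2))

def decode_time(bits: str) -> time:
    """Time field: number of minutes elapsed since midnight, stored in binary."""
    hours, minutes = divmod(int(bits, 2), 60)
    return time(hours, minutes)

print(decode_date("01000101001111"))   # 2009-02-18, first row of Table 1(a)
print(decode_time("01101101000"))      # 14:32 (printed as 14:32:00), first row of Table 1(b)
```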
Once the attacker has isolated the date and time and discovered how to decode them, he has a simple procedure for performing an application-layer traceability attack. The attacker needs to have brief physical access to an e-go tag. He can then scan the tag and read its contents with his own hardware. The attacker needs to position his reader reasonably close to the tag. Therefore, crowded areas such as shopping centers or buses, or simply when the user leaves his wallet with the e-go tag in his jacket, provide excellent opportunities for an attacker to scan the tag. He then has access to the last 5 transactions stored on the tag. The date and time of these transactions can be recovered by the above decoding. The attacker then knows at what times the owner of the tag has swiped his tag.
6.2 Side-Channel Information and Compositionality
Application-layer attacks are not common in RFID systems, since most RFID systems do not implement application-layer protocols. A more prevalent type of attack is to combine application-layer and communication-layer information to attack privacy. In practice, these attacks are hard to carry out without being noticed since they often require man-in-the-middle hardware to be installed.
– Consider an RFID system that is used for building access. In such a system, the fact that a door opens indicates that the authentication protocol between the tag and the reader was carried out successfully. Such “side-channel” information can sometimes be used by the attacker to attack communication-layer protocols. An example of such an attack is given by Gilbert et al. [23] where the attacker performs a man-in-the-middle attack on a communication protocol and uses the application-layer information to trace a tag.
– An equally complicated attack abuses the fact that an RFID tag implements more than one application, for instance an identification protocol and an ownership transfer protocol. These applications implement different communication-layer protocols P1 and P2, each of which could be untraceable in isolation. However, in some situations the messages of protocol P1 can be combined with messages of P2 to trace a tag. In [24] a traceability attack that combines messages from a tag-authentication protocol and a reader-authentication protocol is described.
7 RFID Profiling
In the previous sections, we have described attacks against all layers of the RFID communication model. To maintain the privacy of a user, all these layers must be properly protected. But even if all RFID tags are untraceable, the fact that a person carries a collection of different tags can make him traceable. Recall that if a tag is untraceable it cannot be distinguished from any other tag of the same type. However, it can be easily distinguished from tags of a different type or brand. An attacker can query a tag with his rogue reader to find out what protocols it runs and hence discover the type of the tag. If everybody carries the same number of tags of the same types the attacker gains no information. However, if people carry different sets of tags an attacker can create a profile of them by scanning all their tags and registering their type. The attacker can later recognize a person if he observes the same profile. The attacks shown in the previous sections abuse design flaws that allow an attacker to trace one particular tag. Since only one person carries that tag, the attacker actually traces that person. RFID profiles, however, may be shared among a large set of people. If an attacker observes the same profile twice, he cannot be sure that he observed the same person twice. Therefore, untraceability becomes a probabilistic property. Obviously, if fewer people share the same profile, the probability that two observations of that profile belong to the same person increases. In order to show that the privacy loss due to a person’s RFID profile can be significant, we construct and analyze a possible scenario.
7.1 Scenario: United Kingdom
To create a representative data set we use statistical data on inhabitants of the United Kingdom. We study the case where driving licenses, bank cards and store loyalty cards contain RFID tags. We make the following assumptions.
– Each of these RFID tags is untraceable and can, therefore, not be distinguished from other RFID tags of the same type. For instance, a Barclays bank card cannot be distinguished from another Barclays bank card, but it can be distinguished from a driver’s license or from an HBOS bank card.
– All types of cards are distributed among the population independently at random.
– Unless stated otherwise, the probability that a person carries a tag of type A is independent of the probability that he carries a tag of type B, for any two types of tags A and B.
– If a person possesses an RFID tag, he will always carry it on him.
Since different types of tags can be distinguished, the fact that a person carries a certain type of tag reduces his privacy. After all, the person can be distinguished from people who do not carry a tag of the same type. A natural way to express privacy loss is by computing the entropy of a profile [25]. Entropy expresses the uncertainty of a random variable and we will measure it in bits. For convenience, we will refer to the entropy of a person instead of the entropy of the random variable associated with the information of a profile.
For instance, there are about 61.6 million inhabitants in the United Kingdom. If we have no identifying information about a random unknown inhabitant then the entropy is log2(61600000) ≈ 25.9 bits. Learning a fact about a person decreases the uncertainty about that person and thus the entropy. If a fact occurs with probability Pr[X], the entropy reduction is −log2(Pr[X]) bits. We will now analyze how much privacy is lost if we know which RFID tags are carried by inhabitants of the United Kingdom.
Driver’s licenses. According to the Department for Transport, 34.7 million out of 61.6 million inhabitants of the United Kingdom possess a driver’s license [26]. Carrying a driver’s license thus reduces the entropy by −log2(Pr[License]) = −log2(34700000/61600000) = 0.82 bits.
Bank cards. There are an estimated 54M checking accounts [27]. The 5 largest banks in terms of market share are Lloyds TSB (19%), RBSG (17%), Barclays (15%), HSBC Group (14%), and HBOS (14%). Nationwide has a market share of 5%. We now assume that exactly one RFID-tagged bank card exists for each checking account of these six banks and that the market shares correspond to the number of checking accounts. If inhabitants carry at most one bank card, then carrying a Nationwide card reduces a person’s entropy by −log2(Pr[Nationwide]) = −log2(0.05 · 54000000/61600000) = 4.51 bits.
Store loyalty cards. An estimated 85% of consumers are part of a store loyalty program [28]. For simplicity, we take this to mean that there are 0.85 · 61.6 = 52.4 million store loyalty cards in circulation distributed among all grocery chains according to their market shares. We assume that only 6 chains have RFID-tagged loyalty cards: Tesco, Asda, Sainsbury’s, Morrisons, Co-operative, and Netto. Their respective market shares are 30.6%, 16.9%, 15.7%, 11.3%, 9.1%, and 0.8% [29]. In our scenario, inhabitants may go shopping at different grocery stores and may thus carry more than one loyalty card. The entropy reduction of a person carrying a Co-operative card is thus −log2(0.091 · 0.85) = 3.69 bits and of a person carrying no Tesco card it is −log2(1 − (0.306 · 0.85)) = 0.43 bits. A person carrying a Co-operative and a Morrisons card, but no other store loyalty cards, loses 7.95 bits of entropy.
Implications. Each of the observations about a person’s driver’s license, bank card, and loyalty cards reduces the entropy. For instance, a person with a driver’s license, a Nationwide card, and Co-operative and Morrisons loyalty cards will lose 13.7 bits of entropy. Therefore, only one in every 2^13.7 ≈ 13300 inhabitants will have the same profile. The situation becomes worse when people carry “rare” cards. Such cards could be company badges, foreign driver’s licenses, or loyalty cards of small stores. In our scenario, a person with no driver’s license, a Nationwide bank card, and a Co-operative and Netto loyalty card will lose 17.6 bits of entropy, meaning only one in approximately 200000 will have the same profile.
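The arithmetic in this section is easy to reproduce; the sketch below recomputes some of the individual entropy reductions and combines a partial profile by simply adding bits, which is valid under the stated independence assumptions. The figures are those used in this scenario, not fresh statistics.

```python
# Reproducing a few of the entropy-reduction figures from this section.
from math import log2

def bits(p: float) -> float:
    """Entropy reduction (in bits) of observing a fact that holds with probability p."""
    return -log2(p)

population = 61_600_000
print(round(log2(population), 1))                      # ~25.9 bits of initial uncertainty

licence    = bits(34_700_000 / population)             # ~0.82 bits (driver's license)
nationwide = bits(0.05 * 54_000_000 / population)      # ~4.51 bits (Nationwide bank card)
coop       = bits(0.091 * 0.85)                        # ~3.69 bits (Co-operative loyalty card)
no_tesco   = bits(1 - 0.306 * 0.85)                    # ~0.43 bits (no Tesco card)

# Summing is justified by the independence assumptions; this is only a partial profile.
profile = licence + nationwide + coop + no_tesco
print(f"{profile:.1f} bits -> anonymity set of roughly {population / 2 ** profile:.0f} people")
```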
It is important to note that to carry out an attack that exploits “RFID profiles”, no flaw in the design of RFID systems is abused. The attacker only uses the information concerning the types of tags carried by a person to fingerprint that person. In our limited scenario, tracing a person is already possible based on profiles of driver’s licenses and some bank cards and loyalty cards. Obviously, fingerprinting becomes more effective as more RFID systems are being deployed and people carry more RFID tags on them.
8 Conclusion
The introduction of RFID tags into items we always carry with us has sparked concerns about user privacy. Understanding the attacks against RFID systems is a first step towards defending a person’s privacy. We have provided a classification of untraceability attacks according to the RFID system layer they attack. Untraceability can be violated at every layer and must therefore be studied at each layer. We have described two new attacks: one on a communication-layer protocol and one on an application-layer protocol. Finally, we have shown that even if all layers are properly protected, the “RFID profile” of a person may still allow an attacker to trace him.
Acknowledgments. The author thanks Saša Radomirović, Sjouke Mauw, and the anonymous reviewers for valuable comments that helped improve this work.
References 1. Hoepman, J.-H., Hubbers, E., Jacobs, B., Oostdijk, M., Schreur, R.W.: Crossing borders: Security and privacy issues of the European e-passport. In: Yoshiura, H., Sakurai, K., Rannenberg, K., Murayama, Y., Kawamura, S.-i. (eds.) IWSEC 2006. LNCS, vol. 4266, pp. 152–167. Springer, Heidelberg (2006) 2. Pfitzmann, A., Hansen, M.: A terminology for talking about privacy by data minimization: Anonymity, unlinkability, undetectability, unobservability, pseudonymity, and identity management, v0.34 (2010) 3. Damgård, I., Pedersen, M.Ø.: RFID security: Tradeoffs between security and efficiency. In: CT-RSA, pp. 318–332 (2008) 4. Juels, A.: RFID security and privacy: a research survey. IEEE Journal on Selected Areas in Communications 24(2), 381–394 (2006) 5. Langheinrich, M.: A Survey of RFID Privacy Approaches. Personal and Ubiquitous Computing 13(6), 413–421 (2009) 6. Kim, I.J., Choi, E.Y., Lee, D.H.: Secure mobile RFID system against privacy and security problems. In: SecPerU 2007 (2007) 7. Zimmermann, H.: OSI reference model — The ISO model of architecture for open systems interconnection. IEEE Transactions on Communications 28(4), 425–432 (1980) 8. Avoine, G., Oechslin, P.: RFID traceability: A multilayer problem. In: Patrick, A.S., Yung, M. (eds.) FC 2005. LNCS, vol. 3570, pp. 125–140. Springer, Heidelberg (2005)
9. Hancke, G.P.: Eavesdropping Attacks on High-Frequency RFID Tokens. In: Workshop on RFID Security – RFIDSec 2008 (2008) 10. Hancke, G.P.: Practical attacks on proximity identification systems (short paper). In: IEEE Symposium on Security and Privacy, pp. 328–333 (2006) 11. ISO/IEC 14443: Identification cards – Contactless integrated circuit(s) cards – proximity cards (2001) 12. Danev, B., Heydt-Benjamin, T.S., Čapkun, S.: Physical-layer identification of RFID devices. In: USENIX, pp. 125–136 (2009) 13. van Deursen, T., Radomirović, S.: Algebraic attacks on RFID protocols. In: Markowitch, O., Bilas, A., Hoepman, J.-H., Mitchell, C.J., Quisquater, J.-J. (eds.) WISTP 2009. LNCS, vol. 5746, pp. 38–51. Springer, Heidelberg (2009) 14. Henrici, D., Müller, P.: Hash-based enhancement of location privacy for radio frequency identification devices using varying identifiers. In: PerCom Workshops, pp. 149–153 (2004) 15. Avoine, G.: Adversary model for radio frequency identification. Technical Report LASEC-REPORT-2005-001, EPFL (2005) 16. Chothia, T., Smirnov, V.: A Traceability Attack against e-Passports. In: Sion, R. (ed.) FC 2010. LNCS, vol. 6052, pp. 20–34. Springer, Heidelberg (2010) 17. Garcia, F.D., van Rossum, P.: Modeling privacy for off-line RFID systems. In: Gollmann, D., Lanet, J.-L., Iguchi-Cartigny, J. (eds.) CARDIS 2010. LNCS, vol. 6035, pp. 194–208. Springer, Heidelberg (2010) 18. Garcia, F.D., de Koning Gans, G., Muijrers, R., van Rossum, P., Verdult, R., Schreur, R.W., Jacobs, B.: Dismantling MIFARE classic. In: Jajodia, S., Lopez, J. (eds.) ESORICS 2008. LNCS, vol. 5283, pp. 97–114. Springer, Heidelberg (2008) 19. Garcia, F.D., van Rossum, P., Verdult, R., Schreur, R.W.: Wirelessly pickpocketing a MIFARE classic card. In: IEEE Security and Privacy, pp. 3–15 (2009) 20. Teepe, W.: In sneltreinvaart je privacy kwijt (in Dutch). Privacy & Informatie (October 2008) 21. Swenson, C., Manes, G., Shenoi, S.: Imaging and analysis of GSM SIM cards. In: IFIP Int. Conf. Digital Forensics, pp. 205–216 (2005) 22. Boyd, C., Forster, P.: Time and date issues in forensic computing - A case study. Digital Investigation 1(1), 18–23 (2004) 23. Gilbert, H., Robshaw, M., Sibert, H.: An active attack against HB+ - A provably secure lightweight authentication protocol. Cryptology ePrint Archive, Report 2005/237 (2005) 24. van Deursen, T., Radomirović, S.: EC-RAC: Enriching a capacious RFID attack collection. In: Ors Yalcin, S.B. (ed.) RFIDSec 2010. LNCS, vol. 6370, pp. 75–90. Springer, Heidelberg (2010) 25. Eckersley, P.: How unique is your web browser? In: Atallah, M.J., Hopper, N.J. (eds.) PETS 2010. LNCS, vol. 6205, pp. 1–18. Springer, Heidelberg (2010) 26. Department of Transport Statistics: Table nts0201: Full car driving licence holders by age and gender: Great Britain, 1975/76 to 2009 (2009), http://www.dft.gov.uk/pgr/statistics/datatablespublications/nts/ 27. Office of Fair Trading: Personal current accounts in the UK (2008), http://www.oft.gov.uk/OFTwork/markets-work/completed/personal/ 28. Bosworth, M.H.: Loyalty cards: Reward or threat? (2005), http://consumeraffairs.com/news04/2005/loyalty_cards.html 29. TNS Worldpanel: Tesco share turnaround (plus an update on grocery price inflation) (2009), http://www.tnsglobal.com/news/news-56F59E8A99C8428989E9BE66187D5792.aspx
The Limits of Control – (Governmental) Identity Management from a Privacy Perspective Stefan Strauß Institute of Technology Assessment, Austrian Academy of Sciences, Strohgasse 45/5, 1030 Vienna, Austria
[email protected] Abstract. The emergence of identity management indicates that the process of identification has reached a stage where analog and digital environments converge. This is also reflected in the increased efforts of governments to introduce electronic ID systems, aiming at security improvements of public services and unifying identification procedures to contribute to administrative efficiency. Though privacy is an obvious core issue, its role is rather implicit compared to security. Based on this premise, this paper discusses a control dilemma: the general aim of identity management to compensate for a loss of control over personal data to fight increasing security and privacy threats could ironically induce a further loss of control. Potential countermeasures demand user-controlled anonymity and pseudonymity as integral system components and imply further concepts which are in their early beginnings, e.g., limiting durability of personal data and transparency enhancements with regard to freedom of information to foster user control. Keywords: privacy, IDM, e-ID, user control, e-government, transparency, freedom of information.
1 Introduction
The role of identity is changing in the information society as everyday life becomes increasingly pervaded by information and communication technologies. Novel and more sophisticated online services are emerging and transaction services are becoming mainstream activities [1]. Together with a significant increase in personalization, a growth in the provision and processing of personal data is inevitable. This development reinforces concerns about security and induces a certain demand to facilitate individuals in controlling their personal data and safeguarding their privacy. Identity management (IDM) deals with this demand and has become an emerging field of research in the information society [2]. E-government was one important trigger for the introduction of systems for electronic identity management (e-IDMS). Functional equivalents to traditional forms of identification in service relationships have to be developed for a digital environment. Thus, many governments in Europe and world-wide have already introduced e-IDMS or are about to do so. Most of the current systems are based on smart card technology as it makes it possible to combine possession (i.e., the card) and knowledge (i.e., a PIN) and thus provides a
S. Fischer-Hübner et al. (Eds.): Privacy and Identity 2010, IFIP AICT 352, pp. 206–218, 2011. © IFIP International Federation for Information Processing 2011
higher level of security than knowledge-based concepts (i.e., username and password) without a physical device. The carrier device for the electronic ID (e-ID) is not necessarily a chip card; other tokens are also possible (e.g., mobile phones or USB devices). But as chip cards already enjoy a broad range of use (e.g., ATM cards, social security cards), these are the preferred tokens [3; 4; 5]. The e-ID usually fulfills two functions: the unique identification of a person and the confirmation of the authenticity of her request. The primary intent is to enable and strengthen secure and trustworthy interactions between government, citizens and businesses. Further intentions aim at improving the security of e-commerce and at enabling new business models. Governments expect higher levels of security, efficiency and cost-effectiveness of electronic communication and transactions to be major benefits of a national e-IDMS, for the public administration itself as well as for citizens and businesses. The two central objectives of this trend towards national e-IDMS are: to improve the security of online public services and to unify the identification and authentication procedures of these services. Identification is a core function of governments and thus the creation of national e-ID systems implies far-reaching transformations with many different aspects involved (e.g., technological, organizational, legal, political) [6], which contribute “to alter the nature of citizenship itself” [7]. Thus, e-ID is more than a device for citizen identification; it becomes a policy instrument. Following the distinction between “detecting” and “effecting” tools of government [8], the e-ID more and more shifts from being a “detecting” tool to an “effecting” tool. While the former primarily denotes an instrument for supporting administrative procedures such as the ascertainment of identity in public services, the latter denotes an instrument for governments to enable services and to influence societal and political objectives [3]. This is inter alia reflected in information society policies of the European Union: an e-IDMS is seen as a “key enabler” for e-government [9]. The vision is to set up a “pan-European infrastructure for IDM in support of a wide range of e-government services” [4]. Introducing national e-ID (and, in a long-term view, also an interoperable e-IDMS for Europe) is also seen as an instrument to fight identity fraud and terrorism [4]. According to the EU action plan i2010, “one safeguard against identity fraud” is the “[a]ssertion of the authenticity of online identity” and the “easier ownership and management of personal/business data” [9]. Privacy is obviously of vast importance for e-ID. However, current governmental e-IDMS developments seem to explicitly focus on improving administrative efficiency and security, while privacy seems to be a rather implicit objective. The sometimes tense relations between privacy and security¹ are also visible in the e-ID discourse (cf. [6], [7]). The capability of an e-IDMS to enhance privacy naturally depends on the concrete system implementation and the surrounding framework it is embedded in. This paper aims to contribute to making the treatment of privacy in (governmental) IDM more explicit in the e-ID discourse and to reveal potential impacts in this regard. Of special interest are the limits of IDM regarding user control and self-determined handling of personal data, and relevant aspects for overcoming these limits. The
1 Security in the e-ID context primarily means information security, not national security, although there are many intersections between the two. However, a detailed incorporation of national security aspects would exceed the scope of this paper. For an in-depth analysis of identity cards with a focus on national security issues see e.g., [3] [7] [24].
analysis includes major privacy aspects of IDM, their implementation in national e-IDMS, as well as an assumed control dilemma of IDM. Based on these issues, potential threats to individual privacy and emerging challenges will be discussed. To some extent the paper ties in with results of a previous comparative research project (conducted in 2008/9) about selected national e-IDMS [10]. The author was involved in analyzing the innovation process of the Austrian system, where a combination of different methods was applied: 20 interviews with major e-government stakeholders, complemented by a literature review, an analysis of official documents, discussion statements, technical specifications, and practical tests in a user role. The paper is structured as follows: Section 2 describes IDM in a privacy context and outlines preconditions for privacy-enhancing IDM; Section 3 deals with their implementation in governmental e-IDMS. In Section 4 the control dilemma of e-IDM and its major determinants are explained. Section 5 discusses how to resolve this dilemma and Section 6 concludes with the major findings of the paper.
2 IDM and Concepts for Privacy Protection
A general definition describes IDM as “the representation, collection, storage and use of identity information” [11]. Of vast importance for IDM is the (often neglected) fact that every individual is not represented by one universal identity, but has multiple identities in different contexts. There is no such thing as ‘the identity’ [12] and hence IDM can be more specifically described as “managing various partial identities (...) of an individual person (...) including the development and choice of the partial identity and pseudonym to be (re-)used in a specific context or role” [12]. Privacy-enhancing IDM combines privacy and authenticity [13]. Obviously, central privacy principles (such as commensurability, purpose limitation, data minimization, transparency) have to be fulfilled to allow for informational self-determination [14] [1].
2.1 User Control
User-centricity and the users’ control over their personal data and identities are essential aspects of privacy protection. The e-ID should facilitate users in controlling which data they want to share and in which contexts these data are allowed to be processed and linked. In cases where this is not feasible, users should at least be able to comprehend who processed their data, on what basis and for which purpose [13; 14] [1]. Managing different (partial) identities is important for purpose limitation, where only data absolutely required for a specific context should be processed. In this regard, the concepts of anonymity and pseudonymity are relevant. In conjunction with governmental e-IDMS, one might presume that in every context which does not require identification or even demands anonymity, the e-ID should simply not be used at all. However, due to tendencies towards ubiquitous computing which imply a significant decrease of areas of anonymity [15], this might be insufficient, particularly when considering that an “identity as a set of attribute values valid at a particular time can stay the same or grow, but never shrink” [16]. Hence, it seems expedient to incorporate anonymity as an integral element into the system.
2.2 Unlinkability
The linkage of personal data for profiling beyond the individual’s control is a particular menace to privacy, which primarily derives from the use of unique identifiers. Thus, unlinkability is one crucial property that must be ensured to prevent “privacy-destroying linkage and aggregation of identity information across data contexts” [1]. The efficient implementation of unlinkability is a sine qua non of privacy-enhancing IDM [17]. A precondition is the use of pseudonyms in different contexts according to the intended degree of (un)linkability. In [12], five forms of pseudonyms are described: transaction pseudonyms enable the highest level of unlinkability and thus strong anonymity. Each transaction uses a new pseudonym, which is only applied for a specific context². A person pseudonym, i.e., a substitute for the civil identity of the holder (e.g., a unique number of an ID card, phone number or nickname), provides the lowest anonymity level. Moderate linkability is given by role and relationship pseudonyms, which are either limited to specific roles (e.g., client) or differ for each communication partner. Closely connected to unlinkability is the significance of decentralized data storage as well as context separation. Hence, personal data should be separated into as many different domains as possible to prevent data linkage [1]. For data minimization, only data that are strictly necessary should be processed (e.g., age verification does not demand knowing the date of birth; a query whether the person is above or below the required age is sufficient).
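As a minimal sketch of this data-minimization example (not a description of any deployed e-ID interface), the verifier below learns only a boolean answer, while the date of birth itself never leaves the holder's side; a real system would of course return a signed assertion rather than a plain boolean.

```python
# Selective disclosure sketch: answer "old enough?" without revealing the birth date.
from datetime import date

def is_over(age_limit, date_of_birth, today=None):
    today = today or date.today()
    age = today.year - date_of_birth.year - (
        (today.month, today.day) < (date_of_birth.month, date_of_birth.day))
    return age >= age_limit

# The relying party sees only True/False; the date of birth stays with the ID holder.
print(is_over(18, date(1990, 5, 1)))   # True
```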
3 Privacy Incorporation of Governmental e-IDMS
Several different dimensions including technical, organizational, legal and socio-cultural aspects influence a system’s particular shape. This is one explanation for European e-IDMS having several differences regarding technical design and privacy features [5], whereas the latter are “by no means universally implemented” [18]. Although privacy-enhanced techniques for public key infrastructure (PKI) have already existed for several years, these techniques have scarcely been adopted in mainstream applications and e-ID card schemes [18]. Hence, the level of unlinkability is rather diverse. Exceptions are the e-IDMS in Austria and Germany, which “have taken some important steps towards unlinkability and selective disclosure” [18]. Most European systems utilize unique identifiers which are often derived from national registers (e.g., social security, public registration). Some store these identifiers directly on the device (e.g., Belgium), others in an encrypted form. In Austria the unique identifier from the Central Register of Residents (CRR-no.) is used, which is unique for every citizen. The device only contains an encrypted version of the CRR-no., the so-called sourcePIN. For identification during services, this sourcePIN is not used directly either. Instead, sector-specific identifiers (ssPINs) based on an irreversible cryptographic function are created, which are unique for 26 sectors; one ssPIN allows unique identification only in the corresponding sector. Such sectors are, for instance, tax, health and education. To prevent privacy abuse, storing an ssPIN is restricted to the sector it belongs to or that is allowed to use it [10]. The sophisticated concept is similar to a relationship pseudonym, as a person is always identified with the same ssPIN in a specific sector. Although this approach theoretically allows users
E.g., transaction authentication number (TAN) method for online banking.
to manage partial identities, pseudonymity is not sufficiently implemented yet and serious privacy concerns remain. The ssPINs are used to avoid linkability and are unique for each person. However, one of the 26 sectors is delivery (of verdicts, official documents, etc.), which is part of almost every public service. As every authority providing a service that includes delivery is able to process the corresponding ssPIN, critics suspect that privacy infringement is feasible, as a person's data is linkable with this PIN over different contexts [10]. As identity data (e.g., name, address, date of birth) are still being processed in almost every service, the use of ssPINs does not sufficiently protect against illegal data linkage [10] [19]. Processing these data might be necessary for e-government transactions, but not per se for every service (e.g., information services). Currently, the user has neither influence over the pseudonyms used nor over which of her data are processed in an application. Thus, users have very limited control over their e-ID.
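As a purely illustrative sketch of how sector-specific pseudonyms can be derived from a single source identifier, consider the keyed one-way derivation below. This is not the actual Austrian algorithm; the key handling, sector labels and function are assumptions chosen only to show why pseudonyms of different sectors cannot be linked without the derivation key.

import hashlib
import hmac

def derive_sector_pin(source_pin, sector, system_key):
    # Keyed one-way derivation: the same person always yields the same pseudonym
    # within one sector, but pseudonyms of different sectors cannot be linked or
    # traced back to the source identifier without the key.
    mac = hmac.new(system_key, (sector + ":" + source_pin).encode(), hashlib.sha256)
    return mac.hexdigest()

key = b"example-system-secret"   # placeholder only, not a real key
print(derive_sector_pin("XY123456", "tax", key))
print(derive_sector_pin("XY123456", "health", key))   # differs from the tax pseudonym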
4 The Control Dilemma of e-ID
Current e-IDMS are lacking in privacy enhancement, especially as unlinkability is mostly still insufficiently provided. This circumstance, combined with the main objectives of IDM, can be described as a control dilemma: IDM primarily aims to improve the security of e-transactions and unify authentication, with privacy as an implicit aim. Or, more generally: the increasing relevance of IDM can be seen as a demand to regain control over personal data flowing in digital environments. On the other hand, tendencies towards e-ID and personalization may lead to further services which require identification. This would imply a significant reduction in anonymity. In other words: the attempt to compensate for a loss of control would ironically, at least from a user's point of view, induce yet another loss of control over personal data. The following subsections highlight some critical aspects to explain the dilemma.
4.1 "Identity Shadow" - Data Linkage without Unique Identifiers
Due to poor pseudonym management, current e-ID card schemes are often provided with more information than necessary and thus allow "unnecessary disclosure of personal data via linkage between different transactions" [18]. A basic precondition for unlinkability is that the utilization of a pseudonym does not entail further information which allows for data linkage. However, as e-ID usage usually entails further data, these can undermine unlinkability. I subsume these data under the term "identity shadow" (in recognition of the work of Alan Westin, Privacy and Freedom, 1967, and the term "data shadow"). This term comprises all the data appearing in a digital environment which can be used to (re-)identify an individual beyond her control and/or infringe her privacy. One possibility for data linkage is given by utilizing semi-identifying data or quasi-identifiers, which are not necessarily unique but are related to a person [19; 20]. In almost every (e-government) service, a set of common data (CD) is requested or is a byproduct. The common data can be distinguished, e.g., into a) person-specific data, which usually has to be entered in web forms during a user session (typical examples are name, date of birth, postal address, e-mail address, ZIP code); and b) technology-specific data, which refer to the technical devices involved in the e-ID session (e.g., the number of the smart card, MAC address, IP address). These data can be used to
gather quasi-identifiers which enable cross-linkage of separated data without the need for a unique ID. Thus, using sector-specific identifiers alone is not sufficient to prevent privacy infringement. Hence, the e-ID itself could become a privacy threat. The size of the identity shadow depends on the amount of data the e-ID entails. E.g., a mobile phone as e-ID device might provide more data than a chip card, such as the mobile phone number, geo-location, or the IMEI of the handset and the IMSI of the SIM card. Data traces of online activities (e.g., metadata, web browser history) offer further entry points for de-anonymization: e.g., web browser data can be exploited to (re-)create a digital "fingerprint" for uniquely identifying a specific user [21]. Social networks offer further ways to gather quasi-identifiers, as demonstrated in [22]: individual users were de-anonymized by applying the web browser history attack and exploiting information about users' group memberships (social networks have only limited impact on governmental IDM so far; further e-ID diffusion, however, could change this). Further potential threats may arise from protocol data which occur during the creation of elements required for e-ID. Although the function of log files is to detect unauthorized access and protect against e-ID abuse, they can also be used for privacy breaches: if every creation and usage of the e-ID items is stored in log files, then these files provide a rather comprehensive profile of the users' activities in cyberspace. Hence, log files can be abused for profiling activities. Figure 1 illustrates the different aspects and the idea of the identity shadow.
Fig. 1. The identity shadow. Alice uses her e-ID for different services, e.g., for her tax declaration, for different health services or for a social network. The identity shadow describes the problem that, despite the use of domain-specific identifiers (dsID) for providing unlinkability, data can still be linked over other common data (CD), which can be used to gather quasi-identifiers.
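A hypothetical illustration of the linkage problem sketched in the figure: the two records below carry different domain-specific identifiers, yet they can be joined on common, semi-identifying attributes. All data values are invented.

tax_records = [
    {"dsID": "a91f...", "name": "Alice Muster", "dob": "1954-03-02", "zip": "1010", "income": 28000},
]
health_records = [
    {"dsID": "7c2e...", "name": "Alice Muster", "dob": "1954-03-02", "zip": "1010", "diagnosis": "..."},
]

def quasi_id(record):
    # A combination of common data (CD) acting as a quasi-identifier.
    return (record["name"], record["dob"], record["zip"])

linked = [(t, h) for t in tax_records for h in health_records
          if quasi_id(t) == quasi_id(h)]   # linkage succeeds despite distinct dsIDs
print(len(linked))                         # 1 -> the sector separation is undermined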
As the identity "never shrinks" [16], the identity shadow cannot be expected to do so either. Current and future trends (increase of browser-based applications, mobile services, cloud computing, RFID, biometrics, etc.) towards pervasive computing environments with a further growth of data traces will make this threat even more challenging.
4.2 Function Creep
In our context, the danger of function creep refers to the extended use of identification data for purposes for which it was not originally intended. One problem is an incremental obligation to identification, which seems plausible in an increasingly pervasive computing environment. Obligation is not just meant in the sense of legal compulsion for e-ID, where one could argue that the problem might be avoided by keeping e-ID voluntary. However, an increase in the range of e-ID services leads to a situation where the e-ID becomes de facto mandatory [7]. The growth of services demanding identification could lead to a violation of privacy principles such as data avoidance and commensurability [23]. Applying the e-ID further might be convenient to some extent for services which demand identification anyway, but for services which essentially do not require identification (e.g., information or communication services), this would be of real concern. With increasing identification, anonymity shifts more and more from the norm to the exception. As one objective of governmental e-ID development is also to support e-commerce and to enable further business applications, it is conceivable that private businesses extend e-ID usage from securing e-transactions to further contexts where identification is not legally required (e.g., as a customer card or even for social networks). There is already evidence of function creep regarding e-ID cards in different countries and contexts (such as described in [3]). Some examples are: usage of the e-ID for public libraries, health services, access control, age verification, chat rooms, online reporting of child abuse, public transport, social networks and online games [3]. Identification and surveillance are strongly interrelated (e.g., [7] offers a detailed analysis), and there are many historical examples of the abuse of personal data for social discrimination and population control (cf. [3] [4] [24]). The process of identification implies the classification and categorization of personal data for rationalizing citizens' identities [24] [7], because proving one's identity requires at least one unique piece of personal data (typically a unique identifier) that serves as identification criterion. One aim of e-IDMS development is to unify the processing of personal data in the back office within public administration to make service provision for citizens and businesses more efficient. While this is an important objective for the public good, it also holds the danger that the classification of personal data leads to social sorting, i.e., "the identification, classification and assessment of individuals for determining special treatment (…)" [24]. The consequence is discrimination against particular citizen groups which become classified as suspicious (e.g., unemployed persons, welfare recipients, criminal suspects, persons with a police record, etc.). This sort of discrimination is of course already a problem without e-IDMS, but it might intensify with e-ID when the classification mechanisms become accelerated and lead to automatic decisions which reflect and foster already existing stereotypes or other prejudicial typing [24] [7].
A recent example which could lead to social sorting is provided by the current
discussion in Germany about the creation of an electronic alien card analogous to the recently introduced personal e-ID. While the storage of a fingerprint on the e-ID for Germans is voluntary, this storage is planned to be compulsory on the alien e-ID card.4 For good reason, i.e., to balance power, governmental sectors are separated in democratic societies. A lack of context separation in e-IDMS would imply linkability of inherently separated domains. With tendencies towards centralizing identity-related data which flow "through a small number of standardized infrastructure components" [1], the vulnerability of e-IDMS increases, entailing further risks of privacy infringement, as data storage, linkage and profiling by commercial as well as governmental institutions are facilitated by a "pervasive IDM layer" [1]. The danger of function creep intensifies in light of recent political measures towards extended monitoring of online activity for preventing crime: e.g., the European Data Retention Directive5, internet content-filtering plans6 (i.e., against child abuse and copyright offenses), or the INDECT project, which aims to merge several surveillance technologies (e.g., CCTV, data mining, automatic threat detection) into one intelligent information system7.
4 http://www.heise.de/newsticker/meldung/ElektronischeAufenthaltskarte-fuer-Nicht-EU-Buerger-in-der-Diskussion1083049.html
5 http://epic.org/privacy/intl/data_retention.html
6 http://www.ispreview.co.uk/story/2009/10/16/uk-mps-proposeaction-to-filter-internet-traffic-and-stop-illegal-p2p-cutoffs.html
7 http://www.telegraph.co.uk/news/uknews/6210255/EU-fundingOrwellian-artificial-intelligence-plan-to-monitor-public-forabnormal-behaviour.html; http://www.indect-project.eu/
5 Resolving the Dilemma – Towards Transparency-Enhancing IDM
The ways out of this dilemma require measures embracing different determinants of privacy to foster the effectiveness and controllability of privacy protection. The necessity of adapting privacy regulations to the changed requirements caused by new technologies has been pointed out by privacy experts for many years and has also become very visible in the e-ID discourse. Thus, new regulatory approaches might be demanded to cope with the challenges of electronic identities. However, this alone might not be sufficient, as "lawful collection and processing of personal data does not prevent per se unethical or unjust decisions based on them" [14]. Hence, a combination of different measures involving technology as well as policy aspects is required. One crucial point is how to compensate for the imbalance of control over personal data between citizens and governments, as citizens so far have very limited control. This requires an explicit focus on improving user control in combination with privacy-enhancing IDM: user-controlled linkability of personal data based on thorough data minimization [12] and purpose limitation. One crux is the implementation of anonymity and pseudonymity as integral system components. Only a few e-IDMS use pseudonym approaches which provide a certain degree of unlinkability and contribute to improving the security of
e-transactions. If applied at all, e-IDMS so far always pre-create pseudonyms, giving users very limited control over their e-ID, as there is no possibility to use the e-ID for the self-determined creation and management of pseudonyms [10; 18]. Providing pseudonym management as an additional option would enhance informational self-determination, as one could freely handle one's pseudonyms and partial identities and decide whether or not to be identifiable (in any case without an obligation to identify). The implementation of unlinkability has to extend throughout the whole system, i.e., also to the inner system logic and the databases involved. Wherever possible, anonymous credentials or transaction pseudonyms should be used. Otherwise, e.g., when unlinkability is lacking in the back office, the e-IDMS does not provide effective privacy protection for the individual and is rather cosmetic. This aspect seems underrepresented in the governmental e-ID discourse, as the procedures within the system, i.e., how personal data are being processed, are mostly opaque and unrevealed from a user's point of view. Effective prevention of de-anonymization demands data minimization. As digital data can be copied in no time to an arbitrary number of repositories and by default do not expire, technical approaches to limit data permanence might enhance control over their temporal durability. One could then decide whether data should be permanently or temporarily available. An expiration date contributes to privacy as it "is an instance of purpose limitation" [25]. One recent example of a technical approach to this idea is "Vanish" (http://vanish.cs.washington.edu), which combines cryptographic techniques with BitTorrent technology to create self-destructing data [26]. Similar concepts contribute to privacy-enhancing IDM. However, these approaches are in the early stages of investigation and development; e.g., [27] describes the vulnerability of Vanish as well as some measures required to improve its security. Hence, further research is needed before these concepts can be used in practice. But even if a more practicable technical approach already existed, an expiration date is not feasible in many applications and thus its practicability remains limited. However, the idea of an expiration date has to be understood not simply as a technical concept which cannot be realized in a strict sense, but rather as a policy concept, which could contribute to inducing a paradigm shift from the current status quo of storing data without any limits to a more prudent handling of personal data and information. But still, an expiration date will not solve the problem of imbalanced control over information [25]. This imbalance is a key determinant of the control dilemma. The system needs to have mechanisms integrated that allow citizens and the public sphere to control the proper and legal use of the data processed within the e-IDMS. Hence, other measures are required to enhance user control in addition to technical concepts for privacy enhancement. One aspect of vast importance is transparency. "Without transparency one cannot anticipate or take adequate action" [28]. Only when users can comprehend how their e-ID is being processed can they protect their privacy. Low transparency and incremental ID obligation could cause a situation similar to a panopticon: individuals have to reveal their ID without knowledge about whether and for what purpose it is used - analogous to the uncertain presence of the guard in the watchtower.
Consequences would be self-censorship and limited individual freedom [25]. In this respect, freedom of information (FOI) plays a vital role. It addresses the public's "right to know" regarding government actions [29] and aims to improve their
controllability and thus to compensate for the "knowledge asymmetry between profilers and profiled" [28]. Although freedom of information mainly addresses a policy paradigm aimed at scrutinizing governmental policies and actions [23], fostering this paradigm might contribute to privacy enhancement as well. FOI and privacy are strongly interrelated, and data protection laws also include FOI principles such as the right to access one's own personal data. For e-ID, freedom of information mainly implies options to enhance user control in this respect. [28] argues for a shift from privacy-enhancing tools (PET) to transparency-enhancing tools (TET) to limit the threats of autonomic profiling of citizens. The basic idea of TET is to give users the possibility of counter-profiling, i.e., to support users in understanding how the system processes their personal data and "which profiles may impact their life in which practical ways" [28]. While PETs aim to protect personal data, TETs aim to protect from invisible profiling [30]. One important aspect here is supporting users in their right to information and granting them access to their personal records, including information about how they are used, for what purposes, by whom and on which legal basis. Some e-ID applications already include access to personal records (e.g., some Austrian e-ID services grant access to tax, health or public registration records). However, current e-IDMS do not seem to follow a systematic approach in terms of FOI and transparency enhancement. Services that allow users to view their personal records are currently rather the exception than the norm, and the insights users get into the e-IDMS are limited (e.g., citizens could also receive information about the progress of administrative procedures they are involved in, access to public registers, etc.). The existing applications do not reveal further information about how personal data are used (one exception is the Belgian e-ID, which provides information about which government agencies have accessed a user's personal record). This is a crucial aspect of transparency enhancement, as the mentioned knowledge asymmetry can only be reduced if the system allows deeper insights into "the activities of the data controller" [30], e.g., by providing users not just with access to their personal records but also by revealing information about how these are treated and processed within the system and which user profiles are created by whom and for which purpose. Such approaches are important to improve the currently rather opaque situation of e-ID from a user perspective and contribute to raising users' awareness and comprehension of how their data are treated in the system. However, fostering transparency on an individual level for the single user is only one aspect of transparency enhancement. The controllability of an e-IDMS cannot be merely a matter for individual users, because they are not in a position to verify whether personal data are properly protected within the system (e.g., by a certain level of unlinkability). Thus, transparency is not just demanded on an individual level but has to be implemented on a systemic level as well. On the systemic level, approaches to improve transparency of the system mechanisms on a larger scale should be implemented. A scenario might be conceivable where groups or institutions, typically privacy organizations, are enabled to verify the proper treatment of personal data in the e-IDMS (e.g., with applications and tools that allow them to take random samples in order to check whether unlinkability is given in databases and registers).
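As an illustration of what such transparency-enhancing access might look like for the individual user, the following sketch models a per-citizen processing log recording which controller accessed which record, for what purpose and on which legal basis. The field names and structure are assumptions for illustration; no existing e-IDMS interface is implied.

from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class ProcessingEntry:
    controller: str       # e.g. "tax authority"
    record: str           # e.g. "income statement 2010"
    purpose: str          # e.g. "tax assessment"
    legal_basis: str      # e.g. a statutory reference
    timestamp: datetime

@dataclass
class CitizenProcessingLog:
    citizen_pseudonym: str
    entries: List[ProcessingEntry] = field(default_factory=list)

    def report(self):
        # What the citizen would see when exercising the right to information.
        return "\n".join(
            "{:%Y-%m-%d} {} accessed '{}' for {} ({})".format(
                e.timestamp, e.controller, e.record, e.purpose, e.legal_basis)
            for e in self.entries)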
However, transparency enhancement is not to be understood only as a technical approach, because the privacy challenges to cope with are primarily societal ones, which require adequate measures on at least two different levels. While one level addresses the implementation of options for improving transparency for the individual interacting with the e-IDMS, the other addresses possibilities on a larger scale for civil society and institutional actors to comprehend and examine the e-ID system, its architecture, how individual privacy is being protected, and for what purpose and on what (legal) foundation e-ID data are processed. These aspects cannot be addressed by technical means only but require a deeper understanding of the role of transparency for privacy protection by government actors and stakeholders involved in e-IDMS development.
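Returning to the expiration-date idea raised earlier in this section, the following is a minimal sketch, loosely inspired by the key-destruction principle behind Vanish but not an implementation of it: the data is stored only in encrypted form, and once the expiry time has passed the key is discarded, rendering the ciphertext unreadable. The Fernet primitive comes from the third-party Python cryptography package; the class and its in-memory key handling are illustrative assumptions.

import time
from cryptography.fernet import Fernet   # third-party package: cryptography

class ExpiringRecord:
    def __init__(self, plaintext, lifetime_seconds):
        self._key = Fernet.generate_key()
        self.ciphertext = Fernet(self._key).encrypt(plaintext)
        self.expires_at = time.time() + lifetime_seconds

    def read(self):
        if time.time() >= self.expires_at:
            self._key = None              # destroy the key: the data becomes unreadable
            raise ValueError("record expired")
        return Fernet(self._key).decrypt(self.ciphertext)

record = ExpiringRecord(b"address: Musterstrasse 1", lifetime_seconds=3600)
print(record.read())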
6 Summary and Conclusions
Governmental e-IDs are at the core of the relationship between citizens and governments and thus entail several transformations beyond a technological dimension. They are not just devices for identification but also policy instruments connected to societal and political objectives. While the primary aims are improving the security of online public services and administrative efficiency, privacy is a rather implicit goal somewhere in between these objectives. This is inter alia visible in the often neglected incorporation of privacy features. Some systems already contribute to strengthening security and privacy in e-government to some extent, but with a main focus on the security of e-transactions. Crucial aspects, i.e., anonymity and pseudonymity, are – compared to unique identification – so far underrepresented and need to become integral system components with respect to sustainable privacy-enhancing IDM. While this is not yet implemented, further emerging challenges intensify the need for effective privacy concepts. If IDM does not respond appropriately, this could lead to the outlined control dilemma: despite aiming to (re)gain control over personal data, e-IDMS itself could foster a further loss of control over individual privacy. Several issues shape this: insufficient prevention of linkability, increasing threats due to the identity shadow with data traces facilitating linkage and de-anonymization, the evident danger of function creep, and the further potential for privacy abuse entailed by centralized IDM infrastructures. To resolve this dilemma, governmental IDM should first and foremost foster stricter concepts for unlinkability with user-controlled pseudonymity. Additional approaches might be expedient, e.g., an expiration date for personal data to pro-actively support data minimization and purpose limitation. The major challenge is to compensate for the imbalanced control over personal information. This implies giving citizens and the public the possibility to effectively control their personal data and the proper processing of these data within the e-IDMS. Solutions for enhancing transparency on an individual as well as on a systemic level are demanded in line with FOI paradigms, of course in strict accordance with privacy principles. This could also strengthen the accountability of public authorities for the legal processing of personal data and thus contribute to citizens' trust in government. Additional research is necessary to reveal further determinants of the dilemma and to design appropriate strategies to cope with the resulting challenges. In order to make the concept of transparency enhancement practicable, further analysis
is demanded regarding its role for privacy protection and its different dimensions, especially on a systemic level. The effectiveness of transparency depends not least on an appropriate combination of legal and technological aspects as well as on proper system design regarding usability.
References 1. Rundle, M., Blakley, B., Broberg, J., Nadalin, A., Olds, D., Ruddy, M., Guimarares, M.T.M., Trevithick, P.: At a crossroads: ”Personhood” and digital identity in the information society, No. JT03241547, OECD (2008), http://www.oecd.org/dataoecd/31/6/40204773.doc 2. Halperin, R., Backhouse, J.: A roadmap for research on identity in the information society. Identity in the Information Society 1(1), 71–87 (2008) 3. Bennett, C.J., Lyon, D.: Playing the identity card - surveillance, security and identification in global perspective. Routledge, London (2008) 4. Comité Européen de Normalisation (CEN), CEN/ISSS Workshop eAuthentication Towards an electronic ID for the European Citizen, a strategic vision, Brussels (2004), http://www.vaestorekisterikeskus.fi/vrk/fineid/files.nsf/ files/EE116CC13DFC98D0C225708C002BA544/$file/WS-eAuth_ Vision_document+V017.pdf 5. Kubicek, H., Noack, T.: The path dependency of national electronic identities - A comparison of innovation processes in four European countries. Identity in the Information Society (2010), doi:10.1007/s12394-010-0050-2 6. Kubicek, H.: Introduction: conceptual framework and research design for a comparative analysis of national eID Management Systems in selected European countries. Identity in the Information Society (2010), doi:10.1007/s12394-010-0052-0 7. Lyon, D.: Identifying citizens - ID cards as Surveillance. Polity Press, Cambridge (2009) 8. Hood, C.C., Margetts, H.Z.: The Tools of Government in the Digital Age, 2nd edn. Public Policy and Politics. Palgrave Mcmillan, Hampshire (2007) 9. EU Commission: i2010 eGovernment Action Plan: Accelerating eGovernment in Europe for the Benefit of All, No. SEC (2006) 511, Brussels (2006) 10. Aichholzer, G., Strauß, S.: Electronic identity management in e-Government 2.0: Exploring a system innovation exemplified by Austria. Information Polity 15(1-2), 139– 152 (2010) 11. Lips, M., Pang, C.: Identity Management in Information Age Government. Exploring Concepts, Definitions, Approaches and Solutions. Research Report, Victoria University of Wellington (2008), http://www.e.govt.nz/services/authentication/library/docs/ idm-govt-08.pdf 12. Pfitzmann, A., Hansen, M.: Anonymity, Unlinkability, Unobservability, Pseudonymity, and Identity Management – A Consolidated Proposal for Terminology version 0.33 (2010), http://dud.inf.tu-dresden.de/literatur/Anon_Terminology_ v0.33.pdf 13. Clauß, S., Pfitzmann, A., Hansen, M., Herreweghen, E.V.: Privacy-Enhancing Identity Management, 67, Institute for Prospective Technological Studies, IPTS (2005) 14. De Hert, P.: Identity management of e-ID, privacy and security in Europe. A human rights view. Information Security Technical Report 13, 71–75 (2008)
15. Roßnagel, A.: Datenschutz im 21. Jahrhundert. In: Aus Politik und Zeitgeschichte Band 5-6 (Digitalisierung und Datenschutz), pp. 9–15 (2006) 16. Pfitzmann, A., Borcea-Pfitzmann, K.: Lifelong Privacy: Privacy and Identity Management for Life. In: Bezzi, M., Duquenoy, P., Fischer-Hübner, S., Hansen, M., Zhang, G. (eds.) Privacy and Identity Management for Life, 5th IFIP WG 9.2, 9.6/11.7, 11.4, 11.6/ PrimeLife, International Summer School. IFIP AICT, vol. 320, pp. 1–17. Springer, Heidelberg (2010) 17. FIDIS: Privacy modelling and identity. Deliverable-Report 13.6. Future of Identity in the Information Society (2007), http://www.fidis.net/fileadmin/fidis/deliverables/fidiswp13-del13.6_Privacy_modelling_and_identity.pdf 18. Naumann, I., Hobgen, G.: Privacy Features of European eID Card Specifications: European Network and Information Security Agency, ENISA (2009), http://tinyurl.com/2unj3la 19. Priglinger, S.: Auswirkungen der EU-DL Richtlinie auf die E-Gov-Welt. In: Jahnel, D. (ed.) Jahrbuch Datenschutzrecht und E-Government, Neuer wissenschaftl. Verlag, Graz, pp. 267–283 (2008) 20. Sweeney, L.: k-anonymity: a model for protecting privacy. Int. Journal on Uncertainty, Fuzziness and Knowledge-based Systems 10(5), 557–570 (2002) 21. Eckersley, P.: How Unique Is Your Web Browser? Electronic Frontier Foundation, EFF (2010), https://panopticlick.eff.org/browser-uniqueness.pdf 22. Wondracek, G., Holz, T., Kirda, E., Kruegel, C.: A Practical Attack to De-Anonymize Social Network Users. Technical report, iSecLab (2010), http://tinyurl.com/yccfqqd 23. Pounder, C.N.M.: Nine principles for assessing whether privacy is protected in a surveillance society. Identity in the Information Society (IDIS) 1(1), 1–22 (2008) 24. Lyon, D. (ed.): Surveillance as social sorting - privacy, risk and digital discrimination. Routledge, London (2003) 25. Mayer-Schönberger, V.: Delete: The Virtue of Forgetting in the Digital Age. Princeton University Press, Princeton (2009) 26. Geambasu, R., Kohno, T., Levy, A., Levy, H.M.: Vanish: Increasing Data Privacy with Self-Destructing Data. In: Proceedings of the USENIX Security Symposium, Montreal, Canada (2009), http://tinyurl.com/nmwfg9 27. Wolchok, S., Hofmann, O.S., Heninger, N., Felten, E.W., Halderman, J.A., Rossbach, C.J., Waters, B., Witchel, E.: Defeating Vanish with Low-Cost Sybil Attacks Against Large DHTs (2009), http://www.cse.umich.edu/~jhalderm/pub/papers/unvanish-ndss 10-web.pdf, doi:10.1.1.161.6643 28. Hildebrandt, M.: Profiling and the rule of the law. Identity in the Information Society 1(1), 55–70 (2008) 29. Mendel, T.: Freedom of information – a comparative legal survey, 2nd edn. UNESCO, Paris (2008) 30. FIDIS: Behavioural Biometric Profiling and Transparency Enhancing Tools. Deliverable Report 7.12. Future of Identity in the Information Society (2009), http://www.fidis.net/fileadmin/fidis/deliverables/fidis-wp7del7.12_behavioural-biometric_profiling_and_transparency_ enhancing_tools.pdf
Privacy Concerns in a Remote Monitoring and Social Networking Platform for Assisted Living
Peter Rothenpieler, Claudia Becker, and Stefan Fischer
Institute of Telematics, University of Lübeck, Germany
{rothenpieler,becker,fischer}@itm.uni-luebeck.de
http://www.itm.uni-luebeck.de/
Abstract. In this paper, we present an online platform for the field of Ambient Assisted Living (AAL) which is designed to support a self-determined and safe life for elderly people in their own homes instead of admission to a nursing home. This goal is achieved through in-home monitoring of the clients using wireless sensor networks in combination with a social networking approach based on personal Patrons. The platform further acts as a marketplace for third party service providers which can extend the functionality of the platform, supplying the users with individual made-to-measure assistive services. This paper provides an overview of the concept behind this platform with special focus on privacy issues. Keywords: Privacy, Ambient Assisted Living, AAL, Monitoring, Sensor Network, Social Network, SmartAssist.
1 Introduction
According to figures from the Statistisches Bundesamt (see [1]), the share of people aged over 65 in Germany will increase from 19% to over 30% in the next 50 years, preceded by an increase in the number of people in need of medical care by 58% already in the next 20 years. This phenomenon will not be limited to Germany, since many other countries, especially in Asia and Europe, are facing severe population ageing in the near future, e.g. Japan, Italy and Spain [2]. All around the world, population ageing will thus lead to an overall increase in the costs of healthcare and pensions in the next 20 to 50 years, but at the same time may cater towards profitable services which increase the quality of life for elderly people. The project described in this paper belongs to the field of Ambient Assisted Living (AAL) and aims at creating an online platform that is designed to support a self-determined and safe life for elderly people in their own homes. It is targeted at providing needs-based care by personal Patrons through the use of remotely accessible in-home sensor networks and an online social networking platform. Since Patrons can be represented by relatives, medical doctors and friends, as well as commercial service providers, several privacy issues arise out of this context. The proposed system aims at postponing the admission of elderly people to a nursing home by providing permanent monitoring in their own homes. As mentioned in the work
of [20], "the loss of privacy is significant when elderly persons are admitted to nursing homes, where they are likely to become permanent and often dependent residents". In a nursing home, the staff members are required to care for residents' intimate personal functions, e.g. in a common shower or tub room. In [20] it is further stated that "although actions to protect residents privacy are viewed as important, staff members may passively accept the premise that privacy as a right and as a norm is not feasible [...] for getting the work done". In the course of this paper, we will describe the concept and architecture of our service portal in more detail and also identify issues regarding informational self-determination as well as the protection of personal information. We will give an overview of related work in the field of social networks and data privacy and will conclude in the last section with an outlook on the steps to be taken in the future.
2 Concept
This section contains an overview of the concept and architecture of the service portal and a description of the two main user groups, Seniors and Patrons. The web-based service portal is used to provide the user with increased personal safety, independence, better means of communication as well as easier social interaction and an extendable set of individual assistive services. The basis for this support consists of two data sources: personal data entered by the user on the one hand and automatically generated data measured by an in-home sensor network on the other hand. This information is aggregated and analysed at the service portal, as depicted in Figure 1, and made accessible to the Patrons, family, friends and service providers of the user, if access is granted. To provide the user with increased safety, Patrons can monitor the automatically generated sensor readings in combination with the rich information available through the social network, e.g. recent activities or habits of the user. As mentioned above, the users can be divided into two main groups - the Seniors and the Patrons: In SmartAssist, Seniors are elderly people of age 65 or above, who are either retired or unemployed and thus spend a large amount of time in their own apartments. Seniors considered in this work are further assumed to have a score of at least 23 points on the MMSE (mini-mental state examination / Folstein test) and thus do not suffer from dementia or cognitive impairment yet. Demographic characteristics include the following: According to [3], more than 60% of people aged 65 to 80 are women, increasing to more than 70% beyond the age of 80, while 18% of people aged 65 to 70 live in single-person households, increasing to 53% for people aged 80 to 85 according to [4] and [1]. The group of Seniors can further be characterized by their motivation for using the platform. On the one hand, Seniors can be interested in preventive care and thus use the system for early detection, diagnosis and prevention of diseases and disabilities (group 1.1.3 in [5]); on the other hand, Seniors are people in need of assistance (group 1.2 in [5]) who are willing to use the services provided through the platform, or both. The role of a Patron can be played by a variety of groups, ranging from family members, friends and neighbours to medical personnel and professional care providers.
Fig. 1. Architecture overview
The first of these groups can be characterized as caregivers or healthcare assistants (group 2.1.1 in [5]), like children, grandchildren or siblings who can watch over the user, or close friends or neighbours who can periodically check whether everything is in order. The group of medical personnel (group 2.2 in [5]), like doctors, physicians and paramedics, will act in the role of a Patron and monitor or diagnose the medical condition of the user. Third party service providers (group 2.4 and 3.8 in [5]) such as security companies, domiciliary care providers or ambulance services may also be incorporated as Patrons for taking immediate action in the case of an emergency. The service portal can be divided into three main functional components: the monitoring component, the communication component and the services platform, as seen in Figure 1. We will describe each of these components in more detail in the following subsections.
2.1 Monitoring
The monitoring component of the online portal displays all information gathered through the in-home sensor network to the Patrons and gives warnings in the case of sudden or subtle changes in the readings. The sensor network gathers information about the daily routine of the Seniors by continuously monitoring the electric power and water consumption and the opening and closing of doors as well as observing ambient temperature and humidity. The following examples are used to outline the information which may be gained through the analysis of the temporal changes in the readings of the in-home sensors. The significance of the following indicators has not been analyzed yet, as this will be the subject of field tests performed in the course of this project. These indicators merely serve as examples of the quality of information collected about the users, which will be discussed in Section 3.
Fig. 2. Domestic sensors
Figure 2 shows the sensors located in the home of a prospective user as follows: Monitoring the power consumption of the coffee machine in the kitchen and the reading lamp in the bedroom is used to indicate the circadian rhythm or sleep-wake cycle of the user, while the consumption of the computer and television can provide information about his daily routine and activities. The amount of water consumed in the bathroom can, e.g., be used to monitor the usage count and frequency of the water closet. In combination with the subsequent water consumption at the lavatory, this can further be used as an indication of the personal hygiene of the user. The ambient temperature and humidity in every room and their temporal change can be used to monitor heating and ventilation in the apartment. This can not only be used to show the actual temperature but also to indicate the perceived temperature or heat index (HI). The sensor readings are gathered by the sensor nodes and transmitted to a residential gateway via a wireless connection. To protect the transmitted information against eavesdropping or manipulation, the communication between sensors and gateway is encrypted and protected by authentication and integrity mechanisms. The residential gateway stores and continuously forwards the sensor readings to the service portal via an existing Internet connection using HTTPS. Gateway and server both feature digital signal processing and pattern recognition software components to generate an alarm in case of an emergency, which is then sent to the Patrons using Short Message Service (SMS) or e-mail and displayed in the service portal. To reduce the incorrect classification of sensor readings, the sensor network repeatedly undergoes a (re-)training phase in each apartment, in which the signal processing and classification algorithms are primed to its specific environmental conditions. These components are not part of the service portal and thus beyond the scope of this paper.
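The alarm generation itself is outside the scope of this paper; purely as an illustrative sketch (not the project's actual signal-processing or classification algorithm), a simple deviation test on daily consumption values could flag sudden changes for the Patrons. The threshold and warm-up length are assumptions.

from statistics import mean, stdev

def detect_anomaly(history, today_value, threshold=3.0):
    # Flag today's reading if it deviates strongly from the recent baseline.
    if len(history) < 7:                  # assumed warm-up: need a minimal baseline first
        return False
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today_value != mu
    return abs(today_value - mu) > threshold * sigma

# Example: daily water consumption in litres over the past two weeks
baseline = [95, 102, 99, 87, 105, 110, 92, 101, 98, 96, 104, 93, 100, 97]
print(detect_anomaly(baseline, today_value=5))   # True -> possible emergency, notify a Patron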
2.2 Communication
The online portal acts as a frontend for the communication and exchange of experience and information between all users of the platform, Seniors and their Patrons alike. While being comparable to online social networking sites like, e.g., Facebook or MySpace, the proposed platform focuses on the personal relationship between the elderly people and their personal Patrons. The use of social networks in assisted living systems is also proposed in [18]. In their work, the authors argue that even though elderly people tend to live alone, they should live embedded in a caring personal network. According to [18], "this is the most effective way to ensure [...] longevity [...] (and) a good quality of life in the face of high probability of chronic physical and cognitive impairment". The service portal offers social networking features like profile creation, photo sharing and a messaging service. The Patrons and service providers can use the information stored in the user's profile, like name, age and hobbies, as metadata for the diagnosis in the remote monitoring component. Especially data about recent activities or the mood of the user, available through a micro-blogging feature comparable to the Facebook Newsfeed ("What is on your mind?"), could prove to be useful for medical diagnosis. Emotional conditions such as stress or depression or physical conditions such as headache or tiredness can hardly be monitored through the sensors but may be available to the Patrons if the user enters them in his profile. Social information may further not only be useful for medical purposes but can also help to protect the privacy of the users. In their study [20] on personal privacy in a nursing home, the authors analyze the interaction between staff and residents and suggest that privacy violations occur more easily with an increasing degree of depersonalization and create less of a guilty conscience among the violators. According to [20], depersonalization was reduced by presenting pictures and stories about the residents' past. The authors summarize this as: "Respect for privacy was respect for the person revealed in those pictures and stories". In the social network proposed in this paper, Patrons who are not already personally involved with the Seniors are supplied with photos and stories. This personal information may in turn help to reduce the above mentioned degree of depersonalization and thus help to increase the protection of the Seniors' privacy. Even though the portal focuses on the relationship between Seniors and their Patrons, users may also interact with each other and their individual peer groups. The use of messaging services, interest groups or event calendars may increase the social interaction between the users and their relatives, friends and local community. Through the use of messaging and photo sharing or virtual tea parties, social interaction across distances or for people with disabilities could be enhanced. Interest groups or event calendars, on the other hand, can be used as sources of information about local sports events, garage sales or casual game tournaments, which can strengthen the participation and integration of the users into their local community and therefore decrease their social isolation.
2.3 Services
The aforementioned service providers are not limited to the role of a Patron but can also use the platform as a marketplace to offer their own services. Just like modern cellphones can be extended in their functionality through the Android Market (Google Inc.) or App Store (Apple Inc.) by third party software, the portal allows the incorporation of user-centric services from independent providers. Users are given the choice to grant access to personal information and sensor readings to certain services, which in turn produce additional benefit for the user. Services for the target user group of elderly people include, e.g., timetables and ticketing for public transportation or "Meals on Wheels" with a menu customized for each user's needs, along with rating functions providing direct feedback on the quality of the meals to other users. On the other hand, services that health insurance companies could be interested in include the creation of anonymized statistics about users. Services comparable to the one demonstrated in [6], where a sensor-enabled blister pack watches over the medication of a patient, are also likely to be integrated into the platform. Depending on the individual use case of each service, the usage of the service may be free of charge, financed by advertising or require a monthly fee. Instead of advertising, services may also be partly or fully financed through the health insurance companies, either as a form of health care or prevention or by letting the insurance company collect statistical data about the users. Reduced monthly insurance fees or the opportunity to collect bonus points that in turn can be exchanged for future prevention or treatment may encourage users to install and participate in third party services.
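As a hedged sketch of how anonymized statistics for, e.g., a health insurance company could be produced, the function below releases only aggregates over sufficiently large user groups, a simple threshold rule in the spirit of k-anonymity. The threshold value and data layout are assumptions, not part of the SmartAssist design.

from statistics import mean

MIN_GROUP_SIZE = 10   # assumed threshold; suppress statistics over smaller groups

def aggregate_water_use(readings_per_user):
    # readings_per_user: {pseudonym: [daily litres, ...]}; only the group size and
    # the average are released, never any per-user values.
    if len(readings_per_user) < MIN_GROUP_SIZE:
        return {"released": False, "reason": "group too small"}
    daily_averages = [mean(values) for values in readings_per_user.values()]
    return {"released": True,
            "users": len(readings_per_user),
            "avg_litres_per_day": mean(daily_averages)}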
3 Privacy
The use of social networking sites alone often leads to a variety of privacy and safety concerns. Examples include the stealing and selling of user data such as e-mail addresses, passwords and personal data for spamming or phishing as well as identity theft crimes. Since these sites often contain more than just general information about people like names, addresses and dates of birth, e.g. also photos of users and friend lists, identity theft has become considerably easier and automatable. In the work of [7], an automated attack mechanism is proposed in which existing user profiles are cloned. The authors also describe how profile information can be collected through sending friend requests to the contacts of the cloned victim. This can then obviously be extended to cross-site cloning of these profiles and so on. Given the fact that today already more than 400 million users are active on Facebook every day [8], there is a big potential for this kind of attack, and especially inexperienced users are easy targets. The proposed online portal, which contains even more information about the users, including medical or behaviour pattern information, is thus an even more attractive target for attacks because it contains worthwhile information. We will describe the security threats and privacy concerns for each of the platform's components in the following subsections.
3.1 Monitoring
Even though the monitoring component is targeted at full-time surveillance, the user's privacy has to be respected, and thus total control over individual sensors and the system as a whole has to be granted to the user himself. The user needs to be enabled to make use of his right to informational self-determination - i.e. take control over which data are collected at any time, and the opportunity to view, edit and delete the data which have already been collected. In case the privacy of the user and his visiting guests is not dealt with correctly, this may very likely result in a lowered system acceptance by the user. This may also increase the user's social isolation if his friends refuse to enter the bugged apartment. To accomplish this, every sensor needs to be equipped with an easily accessible on/off switch and a status indicator in combination with a global privacy switch, e.g. located at the entrance door. A reasonable extension of this manual override could include a scheduling system which can be set to privacy mode on planned occasions/appointments in the future or even an automatic system which detects the presence of visitors when entering the user's apartment. The user further needs to be able to view all collected data and to edit incorrect information or to delete unwanted information without any restrictions or consequences. The data collected about the user have to be minimized to the amount which is required to effectively fulfil the monitoring task. This will result in the adjustment of the sample rate on the one hand but may also include different mechanisms for data fuzziness. Fuzziness may be achieved through the use of high-pass or low-pass filters or by adding a (reversible) noise signal comparable to the Selective Availability feature of GPS. The data should further be kept anonymous as long as possible and should only contain references to the originating household if this information is needed, e.g. by an ambulance. The use of wireless sensor nodes instead of primitive sensors offers the possibility of in-network analysis and processing of data. In contrast to the user's control over the system, the system has to be secured against external attacks. The communication between the sensor nodes in the user's apartment and the gateway as well as the communication between the gateway and the service portal has to be secured. All transmitted data have to be protected against manipulation, eavesdropping or man-in-the-middle attacks. The communication thus needs to be encrypted using similar means as WPA2 for IEEE 802.11 and may face similar attacks, like e.g. wardriving. If an attacker is able to gain access to the sensor data of a specific household, attacks similar to the one described in [9] may be used to plan a burglary through prediction of the user's presence in his apartment.
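A minimal sketch of the data-fuzziness idea mentioned above, assuming the residential gateway reduces the temporal resolution of readings and adds seeded, and therefore reversible, noise before forwarding them; the parameters and noise model are illustrative assumptions.

import random

def downsample(readings, factor=6):
    # Reduce temporal resolution, e.g. from 10-minute samples to hourly averages.
    return [sum(readings[i:i + factor]) / len(readings[i:i + factor])
            for i in range(0, len(readings), factor)]

def add_noise(readings, seed, scale=2.0):
    # Seeded noise can be regenerated and subtracted by anyone who knows the seed,
    # comparable to the (reversible) Selective Availability feature of GPS.
    rng = random.Random(seed)
    return [value + rng.uniform(-scale, scale) for value in readings]

raw = [0.2, 0.3, 0.1, 0.4, 0.2, 0.3, 1.5, 1.7, 1.6, 1.4, 1.8, 1.6]   # kWh, 10-minute samples
fuzzy = add_noise(downsample(raw), seed=42)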
3.2 Communication
The communication component contains the user interface and lets the user enter, view, edit and delete his personal information along with the sensor information as described in Section 3.1. In addition to the adequate presentation of this information, the user interface needs to provide means for the user to control
the access rights to the private information and to be capable of informing or warning the user about which information is currently accessible to or accessed by whom. The definition of access rights by the user needs to be accessible in terms of conformance with standards such as the WCAG (Web Content Accessibility Guidelines) of the W3C's WAI (Web Accessibility Initiative) or the German BITV (Barrierefreie Informationstechnik-Verordnung), but also needs to be comprehensible for the target audience of users aged 65 and above as described in Section 2. While the accessibility can be reviewed through automatic tests like [10], the suitability or usability for the target audience cannot. The usability of the privacy settings needs to be based upon the use of restrictive default settings (hide-all), which prevent information leakage through inaction of the user. Instead of sharing the information itself, configurable views which contain the shared information, comparable to business cards, may be used. These views should be both owner- and viewer-accessible and may as well include disinformation about the user, depending on the specific viewer or purpose of sharing. The user interface should further protect the user's information through the use of warning messages in case confidential information is about to be shared involuntarily or unknowingly. These warning messages need to be unobtrusive and not annoying for the user, to prevent the user from ignoring or even deactivating this security mechanism. Even though the information needs to be protected against illegitimate access while at the same time being fully accessible to the user who is the owner of this information, there are certain circumstances under which the user himself should be protected from accessing his own information. Along with the right of informational self-determination, it is commonly accepted [21] that the user has the right of nescience (unknowingness). Given, for example, the automatic diagnosis of a disease through the service portal, this information should not be accessible instantaneously to the user. Instead, this information should be communicated through the use of a human intermediary such as the user's family doctor. The computer system lacks the ability to correctly inform the user about the actual probability that the diagnosis is correct and about its consequences, and it cannot judge whether the user actually wants to be informed about this specific diagnosis. Reasons for this nescience do not have to be provided by the user, but can include an existing advance directive or financial and legal consequences for the user towards his insurance company. Comparable to the communication described in Section 3.1, the information exchange between the user's personal computer, which is used to display the user interface, and the service platform needs to be secured using mechanisms like HTTPS to ensure data confidentiality and integrity as protection against eavesdropping, manipulation and man-in-the-middle attacks.
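To illustrate the hide-all default and the business-card-like views described above, here is a hypothetical sketch; the attribute names and view labels are assumptions and not part of the actual portal design.

PROFILE = {"name": "A. Senior", "date_of_birth": "1938-04-12",
           "address": "Musterweg 5", "blood_type": "A+", "hobbies": ["chess"]}

# Restrictive default: nothing is shared unless a view explicitly lists it.
VIEWS = {
    "public": [],                                   # hide-all default
    "patron_family": ["name", "hobbies"],
    "patron_doctor": ["name", "date_of_birth", "blood_type"],
}

def render_view(profile, view_name):
    allowed = VIEWS.get(view_name, [])              # unknown viewers fall back to hide-all
    return {key: profile[key] for key in allowed}

print(render_view(PROFILE, "patron_doctor"))
print(render_view(PROFILE, "advertiser"))           # {} -> nothing leaks through inaction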
3.3 Services
The incorporation of external services into the platform is targeted at providing additional safety or benefit for the user through third party extensions. Access to the user's personal information and sensor readings by external service providers must be secured and restricted in the same way as mentioned in Section 3.2 but faces additional security and privacy concerns. Due to the external nature of these services, all data which has once been transferred is no longer under the direct control of the user but of the service provider. Therefore, the user must be informed explicitly about the data which is available to the external services and which information has been transferred to whom. The external service providers must in return either be forced through technical means or at least be obligated to take measures themselves to ensure that the information is protected. This can be achieved in the form of a Terms of Use Agreement which should be presented in a layered-complexity fashion, reflecting the user's capabilities and willingness to read multiple pages of legal text. This includes that information is not passed on to other external parties and that information is used only for the specific purposes accepted by the user. Data access for the external providers must therefore be supplemented with a secure and authenticated logging function, which must also include the possibility for the user to revoke, delete or alter transferred data without any restrictions. Comparable to the use of views described in Section 3.2, the user may as well provide disinformation about himself to the external services. The forging of information in this case is not only limited to personal information but may also include virtual or physical tampering with the sensor readings. The readings should be protected from physical tampering, and the platform should include a form of virtual tampering, i.e. uncertainty/privacy enhancement features where a user can control the degree of privacy which is added to the data before transmitting it to the service provider. Properties such as authenticity, uncertainty and precision of sensor readings must be determinable by the service provider, e.g. in the form of a "similarity index" to the original data, in case the sensor data is used for insurance purposes. In contrast to the concept of identity theft introduced in Section 3, the concept of sensor data theft could otherwise result in insurance fraud.
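One possible way to make the degree of added uncertainty determinable, as suggested above, is a simple similarity index between the original and the fuzzed series. The formula below, one minus the normalized mean absolute error, is only an illustrative choice and not a metric defined by the SmartAssist project.

def similarity_index(original, fuzzed):
    # Returns a value in [0, 1]; 1.0 means the transmitted series equals the original.
    if len(original) != len(fuzzed) or not original:
        raise ValueError("series must be non-empty and of equal length")
    span = (max(original) - min(original)) or 1.0
    mean_abs_error = sum(abs(o - f) for o, f in zip(original, fuzzed)) / len(original)
    return max(0.0, 1.0 - mean_abs_error / span)

print(similarity_index([100, 102, 98, 101], [100, 102, 98, 101]))   # 1.0
print(similarity_index([100, 102, 98, 101], [103, 99, 96, 105]))    # lower -> more privacy added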
4 Related Work
The domain of social networks with a focus on elderly people already features a variety of different approaches, especially in the USA. Examples of these networks include conventional approaches like Eons.com [11], which was launched in 2006, features game, photo and video sharing components and provides its users with information about topics like health, relationships, fitness, debt, retirement and insurance in the form of interviews and articles. Online platforms like PatientsLikeMe [12] and DailyStrength [13], on the other hand, focus on people who share the same disease or face the same problems in their lives.
These platforms allow their users to exchange information and experiences about symptoms and treatments of their diseases or strategies and emotional support for people dealing with, e.g., depression, a divorce or a midlife crisis. The online community MedHelp.org [14] features discussion boards on health-related topics, cooperates with medical doctors and physicians for "finding cures together" and, according to information available on the website, is visited by more than 10 million people each month. Along with the increasing amount and availability of personal information through these online platforms, the manageability of access control needs to be increased. The users on the one hand need to be informed about which information is accessible to whom, including the provider of the platform and subscribed services. On the other hand, they need to be able to use the platform's privacy settings correctly. The work of Vu et al. [15] presents a study on how users read online privacy policies and how well those policies are understood by their readers. The authors investigate the users' abilities to recall information after reading a policy or to search for information within the policy in response to specific questions. It is stated that regardless of whether the participants were allowed to search the policy while answering the questions or not, the comprehension scores were still low across all participants, with only 42% to 55% correctly answered questions. The authors conclude that important information in privacy policies is not stated in a way that can be easily understood by users. In addition, the way that privacy-relevant information is presented is confusing to the users. The fact that the participants of this study were in their twenties, experienced with computers and the Internet, and either graduate or undergraduate university students shows that even experienced users struggle with online privacy policies. To overcome the problem of privacy policy complexity for the user, the use of Agents for Privacy-Preference Specification is proposed in [16], which in our use case could include the concept of Patrons in addition to or instead of Agents. Agents or Patrons alike could in turn help the user specify or correct his privacy settings. Explanations as to how and why users make decisions to share or protect their personal information on the social networking site Facebook are given in [17]. This study demonstrates that even though users are aware of the publicity of their profiles and attempt to disclose sensitive information, the users' privacy decisions are not reflected upon very often. Once the privacy settings of a user are configured, e.g. on account creation, users only change their settings after noticeable and disturbing events, such as a privacy intrusion. It is mentioned further that users often disclose information to a broader audience than really intended, e.g. through their wall posts. The authors suggest solutions to these problems which include enforcing more restrictive privacy settings as well as more restrictive default settings. They also call for improved user interfaces which provide a more accurate mental model of the outcomes of the user's settings and actions.
5 Conclusion
In this paper we presented the current status of one of the components of the SmartAssist project. Development of the proposed service portal will continue in the following months, culminating in the deployment of the system in several households in the city of Lübeck, Germany, over the next two years. The goal of this field test will be to assess the medical benefit for the user and the significance of the indicators mentioned in Section 2.1. Our next step, prior to the field test, will be to identify the Seniors' needs and requirements through user surveys and interviews. The goal of this analysis will be to pinpoint the most important features for assisting the Seniors with their daily activities from their perspective. During the field test we will continuously improve our system, considering content and user-experience issues as mentioned in [19]. Special attention will be paid to the privacy concerns raised in Section 3, and the usability of our proposed system will be tested with regard to these concerns. The corresponding privacy mechanisms will be expanded to meet the expectations and needs of the users as well as new concerns arising during deployment. Acknowledgements. This paper is part of the research project SmartAssist funded by the Federal Ministry of Education and Research, Germany (BMBF, Förderkennzeichen: 16KT0942). SmartAssist belongs to the research area of AAL and is a joint project between the University of Lübeck, the Vorwerker Diakonie, coalesenses GmbH and the Lübecker Wachunternehmen.
References 1. Georgieff, P.: Ambient Assisted Living - Marktpotenziale IT-unterstützter Pflege für ein selbstbestimmtes Altern. In: FAZIT-Schriftenreihe Forschung, Band 17 (2008); ISSN:1861-5066, MFG Stiftung Baden-Württemberg, Fraunhofer ISI (2008) 2. Percentage distribution of the population in selected age groups by country, 2009 and 2050. In: World Population Prospects, The 2008 Revision, Summary Tables / Annex Tables, pages 2226. United Nations, Department of Economic and Social Affairs, Population Division (2009) 3. Statistisches Bundesamt. Bevölkerung und Erwerbstätigkeit, Sterbetafel Deutschland (2008) 4. Menning, S.: Haushalte, familiale Lebensformen und Wohnsituation älterer Menschen. In: GeroStat Report Altersdaten 02/2007. Deutsches Zentrum für Altersfragen, Berlin (2007) 5. Eberhardt, B.: Zielgruppen für AAL-Technologien und Dienstleistungen. In: AAL Kongress- und Fachbeiträge. AG Kommunikation, BMBF/VDE Innovationspartnerschaft AAL (2009) 6. Brandherm, B., et al.: Demo: Authorized Access on and Interaction With Digital Product Memories. In: 8th Annual IEEE International Conference on Pervasive Computing and Communications, PerCom 2010, March 29 - April 2. IEEE Computer Society, Mannheim (2010) 7. Bilge, L., Strufe, T., Balzarotti, D., Kirda, E.: All your contacts are belong to us: automated identity theft attacks on social networks. In: WWW 2009: Proceedings of the 18th International Conference on World Wide Web. ACM, New York (2009)
8. Facebook Press Room: Statistics, http://www.facebook.com/press/info.php?statistics (accessed on June 24, 2010) 9. PleaseRobMe, http://www.PleaseRobMe.com 10. BIK BITV Test, http://www.bitvtest.de 11. Eons, http://www.Eons.com 12. PatientsLikeMe, http://www.PatientsLikeMe.com 13. DailyStrength, http://www.DailyStrength.org 14. MedHelp - online discussion board on healthcare topics, http://www.MedHelp.org 15. Vu, K.-P.L., Chambers, V., Garcia, F.P., Creekmur, B., Sulaitis, J., Nelson, D., Pierce, R., Proctor, R.W.: How users read and comprehend privacy policies. In: Smith, M.J., Salvendy, G. (eds.) HCII 2007. LNCS, vol. 4558, pp. 802–811. Springer, Heidelberg (2007) 16. Proctor, R.W., Vu, K.-P.L., Ali, M.A.: Usability of user agents for privacy-preference specification. In: Smith, M.J., Salvendy, G. (eds.) HCII 2007. LNCS, vol. 4558, pp. 766–776. Springer, Heidelberg (2007) ISBN 978-3-540-73353-9 17. Strater, K., Lipford, H.R.: Strategies and struggles with privacy in an online social networking community. In: BCS-HCI 2008: Proceedings of the 22nd British HCI Group Annual Conference on HCI 2008, Swinton, UK, pp. 111–119. British Computer Society (2008) 18. Waterworth, J.A., Ballesteros, S., Peter, C., Bieber, G., Kreiner, A., Wiratanaya, A., Polymenakos, L., Wanche-Politis, S., Capobianco, M., Etxeberria, I., Lundholm, L.: Ageing in a Networked Society – Social Inclusion and Mental Stimulation. In: Proceedings of the 2nd International Conference on PErvasive Technologies Related to Assistive Environments (PETRA 2009). ACM, Corfu (2009) ISBN 978-1-60558-409-6 19. Marcus, A.: SeniorCHI: the geezers are coming! Interactions 13(6), 48–49 (2006) 20. Applegate, M., Morse, J.M.: Personal privacy and interactional patterns in a nursing home. Journal of Aging Studies 8(4), 413–434 (1994) 21. Weichert, T.: Datenschutzrechte der Patienten, https://www.datenschutzzentrum.de/medizin/arztprax/dsrdpat1.htm
Privacy Settings in Social Networking Sites: Is It Fair?*
Aleksandra Kuczerawy and Fanny Coudert
Interdisciplinary Centre for Law & ICT (ICRI) – K.U. Leuven – IBBT, Sint-Miechielsstraat 6, 3000 Leuven, Belgium
[email protected],
[email protected]
Abstract. The present paper examines privacy settings in Social Networking Sites (SNS) and their default state from a legal point of view. The analysis will be conducted on the example of Facebook as one of the most popular – and controversial – SNS and one of the most active providers, constantly amending its privacy settings. The paper will first present the notion of privacy settings and will explain how they can contribute to protecting the privacy of the user. Further on, this paper will discuss the general concerns expressed by users and data protection authorities worldwide with regard to the changes to Facebook's privacy settings introduced in February 2010. Focus will be put on the implementation of the fairness principle in SNS. This principle implies, on the one hand, that a person is not unduly pressured into supplying his data to a data controller and, on the other hand, that the processing of personal data is transparent for the data subject.
Keywords: Social Networking Sites, privacy, data protection, privacy settings, fairness principle.
1 Introduction
In 2009 a Canadian woman lost her health benefits when her insurance company discovered 'happy' pictures of her on her Facebook profile. She was on sick leave due to a long-term depression and, following her doctor's advice, she was trying to engage in fun activities. Pictures of her smiling on a beach in Cancun or during a night out were taken by her insurance company as proof that she was no longer depressed and was able to work. Although the company did not confirm that the decision was taken solely on the basis of the pictures, it admitted that it uses the popular site to investigate clients [1].
* Part of the research leading to these results has received funding from the European Community's Seventh Framework Program (FP7/2007-2013) under grant agreement n° 216483 (PrimeLife) and n° 248726 (+Spaces). The information in this document is provided "as is", and no guarantee or warranty is given that the information is fit for any particular purpose. The above referenced consortium members shall have no liability for damages of any kind including without limitation direct, special, indirect, or consequential damages that may result from the use of these materials subject to any liability which is mandatory due to applicable law.
Stories like this no longer surprise anybody, as every few days a new one appears in the news. With the explosion of the social networking tsunami, the level of exposure of private life has dramatically increased within a short period of time [2]. In the view of Mark Zuckerberg, the founder of Facebook, 'people have really gotten comfortable not only sharing more information and different kinds but more openly and with more people, and that social norm is just something that has evolved over time' [3]. The increase of public exposure Internet users seem to be willing to accept does not, however, always reflect a conscious choice, but can rather be explained by the false sense of intimacy given by the computer. The amount of highly personal data voluntarily posted by users on social network sites is enormous. Most accounts on social networks contain data like birth names and dates, addresses, phone numbers, and pictures as well as 'sensitive' data such as sexual preferences, relationship status, political views, and health information. What is astonishing is the fact that users very often do not realize the consequences of making that much information available to the public, or to other unintended recipients such as parents, teachers, employers and many others. Important work in raising the privacy awareness of users is done by increasing media coverage of privacy violations in social networks. However, cases such as the one mentioned above prove how fragile this awareness still is. It goes without saying that different people have different ideas on how much information they want to share with their friends, or how much information they want to hide from some of their contacts. After all, Facebook makes use of the Internet as a means of socialization. We shall however not forget that what is called Friendship within the Facebook environment is not always the same as friendship in the off-line world. The contact list of almost every user is full of real friends, but also of acquaintances, colleagues from work, ex-lovers, friends of friends, and sometimes even people they do not really know. This compilation of different types of contacts often leads to an oversharing of information. Allowing equal access to all information on the profile to all contacts frequently results in unwanted disclosures, for example when a grandma sees pictures from a drinking game at a college party [4] or when an employer finds out through a post that a sick leave is actually a nice time off in some exotic resort. In the off-line world people function within different social contexts and roles. For each of those contexts (e.g. e-government, e-commerce, social networks, etc.) and roles (e.g. citizen, consumer, friend, student, employee, etc.) individuals assume a different partial identity [5]. According to those contexts, roles and identities they also adjust their behaviour. Such segregation of the contexts, or of the targeted audience, prevents discrediting one role by the information related to another [5] [6]. To some extent this is also possible in SNS1. Just like we create different personas to interact at work and within the closest group of friends, we can create different personas on the Facebook profile by creating lists of contacts and adjusting the visibility of the profile depending on which persona we want to show to a different group [4]. All examples mentioned above show that most conflicting situations occur when information posted online is taken out of context because it is addressed to the wrong recipient.
Perfectly admissible behaviours in a close friends' environment may become totally inappropriate in a work environment. In the off-line world people
1 Such segregation was done in the EU-funded project PrimeLife, in the prototype application called Clique. See more at: http://clique.primelife.eu, and in [5].
learn to manage these subtle barriers by adjusting their behaviour to each situation. That way, they can reveal only a part of themselves in a certain context, and show a different face in another context [6]. This social ability, however, seems to be a struggle to reproduce in online environments. Social networking sites, as socialization platforms, need to address these concerns and empower their users to reproduce their off-line behaviour in an online environment. This has given way so far to the emergence of technical tools that enable increased granularity in the information disclosed, often designated under the term of "privacy settings". However, privacy settings have often been criticized for not being easy to manage by users, for requiring a complex learning process, and for serving other needs proper to the SNS provider, not always to the benefit of users. The business model SNS currently rely on, the free advertisement-based model proper to Web 2.0 environments, pushes service providers to encourage users to make more information publicly available to feed, amongst others, their advertisers' and third-party applications' needs. We explore in this paper whether the confusion created by not always transparent privacy settings, in addition to regular changes, complies with the fairness principle within the meaning of the 95/46/EC Data Protection Directive (hereinafter DPD). The analysis presented in this paper is conducted on the example of Facebook, as the most popular – and controversial – SNS worldwide. With 500 million active users [7], Facebook is also the most media-present SNS, and the recognition it gets is not always for positive reasons. Frequent changes of the privacy policy are always highly commented on by users, journalists, watchdog organizations and regulators. For these two reasons Facebook constitutes a one-of-a-kind case study providing enough stories to create its own 'shame chronicles'. Actually, testimonials of the most embarrassing posts and photos are already collected by independent sites like Lamebook [8] – a regularly updated proof of users' low privacy awareness. Despite the fact that this paper focuses specifically on Facebook's privacy architecture, the main question of the paper applies to all other SNSs which use the function of 'privacy settings' – a technical tool designed to allow users to control the amount of information they reveal on their SNS profile. Finally, because of the large popularity of Facebook both in the US and in the EU, similar concerns have arisen in both regions, leading to a common search for the best solution on how to tackle the problem.
2 Privacy Settings – Trick or Treat?
2.1 Privacy Settings: Empowering Users to Manage the Information They Share
Privacy settings, present in most of the major social networking sites, can be used by the user to adjust the visibility of their profile or of certain information on the profile. As a result, this could eliminate a certain amount of unwanted disclosures and upgrade the level of privacy of the profile. It is however not clear whether users are actually making use of their privacy settings. Some surveys show that very few users decide to change their privacy preferences. Only 20% of them ever touch their privacy settings, according to Facebook Chief Privacy Officer Chris Kelly [9]. A study conducted in 2007 by a security firm confirmed that 75% of users never
changed the default settings [10]. Some of them are not even aware that it is possible. By contrast, other surveys point toward a greater use of privacy settings. Two Pew Research Center studies showed that 66% of teenagers and 60% of adults restrict access to their profiles so that only friends can view them [11]. With the help of the media, significant attention is given to the risks and benefits of privacy settings. It is in any case undeniable that after numerous articles about the undesired effects of oversharing, including examples of disciplinary problems of college students, criminal charges pressed, evidence found for divorce cases and lost jobs, users start to realize that 'everything you post can be used against you' [12]. Thanks to these stories, SNS users are more often aware that they are able to adjust their profile and its visibility to better match their needs. The effectiveness of this type of warning can be seen in a growing trend to protect Facebook profiles by changing display names and tightening privacy settings to hide photos and wall posts [13]. Solutions therefore point towards an increase of users' awareness about the use of privacy settings, which should be sufficiently clear and granular to empower users to better manage the information they disclose by distinguishing between the recipients of this information – as they do in the off-line world. Facebook actually offers a large number of options to its users in the privacy settings to discriminate between the recipients of the information uploaded. First of all, they can hide their profile from the public and make it visible only to their friends. Next, they can hide it from search engines, so their profile will not be indexed and will not come up in a Google or other search engine. Another option is the possibility to customize the visibility of certain parts of the profile by adjusting it according to the various audiences of the profile. Such audience segregation can be made by creating lists and grouping contacts depending on the type of relationship, or the level of intimacy. Facebook offers highly granular options in the privacy settings, which allow users to adjust a specific visibility for each photo album, each separate photo, and even each separate post. What is more, it also offers the possibility to control what a particular contact can see by impersonating that person and seeing the profile from his perspective [4][14]. The 'view as…' function is described as a type of "privacy mirror" technology which provides "useful feedback to users by reflecting what the system currently knows about them" [14], or in this case, what other users know about them.
2.2 Limited Uses of Privacy Settings
Why, despite all the possibilities to control the level of information disclosure, are there still so many privacy incidents happening on Facebook? Such a powerful and highly granular technical tool, which allows specifying access controls differently for each contact, should be very popular among users. It enables them to avoid a decontextualisation of the information posted online. According to H. Nissenbaum, all arenas of life constitute contexts that are governed by norms of information flow, and problems occur when individuals inappropriately transmit information and collapse contexts [15]. She identifies the lack of "contextual integrity" as the main reason for the privacy problems on SNS.
Used as an impression management system, privacy settings can definitely allow users to regain control over their information and eliminate most of the unwanted situations. They are however still not commonly used or understood, and they are often seen as a 'mysterious' part of
the profile. A reason for this could be that, generally, regulating social behavior by technology seems to be problematic [5] [16]. Some commentators, like Grimmelmann, point to social aspects, such as the fact that it is "deeply alien to the human mind to manage privacy using rigid ex ante rules" [16]. This of course depends on the manner in which it is conducted. Grimmelmann was referring to a specific scenario in which SNS providers design the entire complexity of social relationships for the users to group their contacts into [16]. Such an approach indeed seems pointless, as it is the users who should describe the categories of contacts they need [5]. An example of a successful attempt to use technology to allow users to segregate their audience groups can be found in the EU project PrimeLife and its prototype application Clique. According to Nissenbaum, a general reason why privacy settings are not used as often as we would expect may be that people think about privacy in terms of social roles and not in terms of access-control lists and permissions [15][16]. Another likely explanation is that the privacy settings offered by Facebook are just too difficult to use. Numerous studies show that average users are often confused about them and about the final effects of their choices [17][18]. Most of them simply get lost between all the options. It is a sign that complex interfaces, when not explained properly, can be worse for privacy than less detailed ones [16]. According to Peterson, "superbly powerful and precise technical controls would be too unwieldy and difficult for anyone to use" [4]. It would be a shame, however, to throw the baby out with the bathwater – but is the reconciliation of complexity and simplicity possible? This leads us to the core question dealt with by this paper, namely, whether the tool itself is designed to actually facilitate privacy management.
2.3 The Dark Side of Facebook's Improved Privacy Settings: Increased User Visibility
Since the changes introduced in December 2009 and later in March 2010, it is hardly contestable that Facebook provides tools to adequately manage one's posts. Significant attention to the subject of privacy settings was first brought by the highly commented, and equally criticized, amendments of Facebook's privacy settings from December 2009. According to Facebook officials, the introduced change provides more control to users and makes the privacy settings section clearer and more user-friendly. This however did not manage to stop the flow of criticism from users and privacy organizations [19]. The improvement in privacy settings management came with an increase in the data made publicly available by default. The introduced changes allowed access not only to friends or friends of friends, but to every Facebook user [20]. Moreover, this state was actually marked by Facebook as the "recommended" one, and had to be unclicked to limit access to the profile. Another change introduced in December 2009 was the indexing of users' profiles in search engines. This as well was pre-selected and hidden in one of the sections of the privacy settings, in a way that most users did not realize they had to look for it and deselect it themselves. After a series of negative comments backed up by disappointed users whose mistrust was growing fast, ten major privacy groups filed a complaint with the US Federal Trade Commission [21].
The complaint argued that the introduced privacy settings “violate user expectations, diminish user privacy, and contradict Facebook’s own representations” [22]. The
response to this complaint is still to be seen, but the amount of media attention is reminiscent of what happened with Facebook Beacon2, when massive protest led to its bitter end. This proves that the protests of users can actually have a positive result and influence the behaviour of SNS providers. Despite the critical reception of the mentioned changes, Facebook did not hesitate to introduce even more 'improvements' in April 2010. Since then, a group of previously selected third parties is allowed to access users' accounts. This time again, the relevant box in the privacy settings was pre-selected by default. The new feature, called 'Instant Personalization', allowed three outside partners of Facebook (Pandora, Yelp, and Microsoft Docs) to access users' profiles. In order to disallow the feature, users had to dig out the appropriate field and deselect it, and then block each site separately to make sure that no information is shared through the profiles of friends who have not disabled this feature. This activity was complicated and only possible if a user knew what exactly he was looking for, and where. The introduction of Instant Personalization and the manner in which it was done resulted in another complaint to the FTC [23][24][25]. One of the arguments of the complainants was that Facebook's "privacy settings are designed to confuse users and to frustrate attempts to limit the public disclosure of personal information that many Facebook users choose to share only with family and friends" [26]. Facebook had, for instance, effectively concealed the process of disabling the feature, and only with the information provided by numerous outside articles could users oppose such processing [26][27]. Looking at the introduced changes, three groups of unwanted disclosure can be distinguished. First, data can be disclosed by Facebook to third-party service providers. Second, users' data can be disclosed by making profiles public by default. Third, data can be disclosed inadvertently by users themselves. For instance, a user with a private profile might still share information with a broader audience than intended by failing to restrict access appropriately (e.g. due to the complexity of and/or technical difficulties surrounding the reconfiguration of privacy settings). Whereas in the first case Facebook actively provides users' data to third parties, in the other two cases the intervention of Facebook is more subtle. Users are apparently the ones empowered to share (or not) their information by managing their privacy settings, i.e. the tools put at their disposal to that effect by Facebook. However, as shown above, by designing the tool in such a complex fashion, and by marking some options by default or recommending specific configurations, Facebook can covertly influence users' behavior. As Grimmelmann warns, "users are voluntarily, even enthusiastically, asking the site to share their personal information widely" [16]. It is however not clear to what extent they do so consciously, and when they are driven by Facebook's privacy settings configuration. In the end, Facebook needs users to make their data public to compete with other platforms such as Twitter3 and feed the needs
2 In 2007 Facebook introduced its new feature called 'Beacon'. Facebook formed partnerships with third-party retailers which allowed it to obtain information about users' activities on these partner businesses and publish information about these activities in a way that would be publicly visible. See more on: Facebook Halts Beacon, Gives $9.5M to Settle Lawsuit, PC World, 8 December 2009, http://www.pcworld.com/article/184029/facebook_halts_beacon_gives_95m_to_settle_lawsuit.html; RIP Facebook Beacon, Mashable – The Social Media Guide, 19 September 2009, http://mashable.com/2009/09/19/facebook-beacon-rip/
3 http://www.insidefacebook.com/2009/12/15/is-facebook-sacrificing-its-privacy-legacy-for-an-open-future/
of its advertisers and third-party applications. "Member-created data is the lifeblood of Facebook" [28]. "Facebook, and everybody else, uses all this data for marketing and advertising purposes," and "that's where it complicates things. Because our information, the public's information, is being sold left and right and reused for advertising purposes" [28]. Leaving aside the concerns raised by the behavioural advertising on which successful Web 2.0 entrepreneurs base their business, the question arises whether Facebook could be held liable for unclear privacy settings that push users to make their information publicly available, irrespective of how Facebook makes use of this information.
3 Looking for More Fairness in the Design of Privacy Settings: Is Privacy the Way Through?
3.1 The Fairness Principle under the Data Protection Directive
The DPD requires that all processing of personal data must be fair (Article 6.1.a) [29]. The concept of fairness as such is however not further defined and should be looked for in other provisions of the text. First of all, fairness means that data processing must be transparent to the data subject. Recital 38 of the DPD indicates that if the processing of data is to be fair, the data subject must be in a position to learn of the existence of a processing operation and, where data are collected from him, must be given accurate and full information, bearing in mind the circumstances of the collection. Strict compliance with the provisions contained in Articles 10 and 11 of the DPD about the information to be provided to data subjects seems crucial to ensure the transparency of the data processing. This requirement is however most often fulfilled through long privacy policies written with the clear aim of protecting the company against potential lawsuits, rather than with the intention of providing clear and readable information to the data subject. Facebook does not escape this trend, and the length of its privacy policy is often compared to the US constitution, which is shorter in number of words [30]. This phenomenon has already led data protection authorities, for instance in the case of the collection of information from minors, to require that information be provided in a clear and comprehensible way, taking into account the final recipient of the information. For example, the Safer Social Networking Principles for the EU require providers to "create clear, targeted guidance and educational materials designed to give children and young people the tools, knowledge and skills to navigate their services safely" [32]. Information designed for this group of users "should be presented in a prominent, accessible, easy-to-understand and practical format" [32]. Fairness also means that data subjects should not be unduly pressured into supplying their data to a data controller or accepting that the data are used by the controller for particular purposes [31]. This suggests a guarantee of certain protection for data subjects, whenever they are the weaker party in the relation, from abuse by data controllers of their monopoly position [31]. Fairness therefore means in this context that the consent provided to the data processing should be free, in the sense that users are not tricked into providing data. This was for instance one of the points of the investigation of Facebook by the Canadian Data Protection Authority [40]. During the investigation it became
evident that while consenting to use a third-party application, users were granting virtually unrestricted access to their personal information. This forced Facebook to introduce changes to this practice. Currently, application providers must inform users about the categories of data they need to run the application and must seek prior consent from users [41]. Finally, a third implication of the concept of fairness can be found in the obligation for data controllers to take into account the interests and reasonable expectations of data subjects when processing their personal data. In other words, it means that "controllers cannot ride roughshod over the latter" [31]. As Bygrave explains, the collection and processing of personal data must be performed in a way that does not intrude unreasonably upon the data subjects' privacy nor interfere unreasonably with their autonomy and integrity [31]. In this sense, Grimmelmann observed that Facebook's sudden changes in its privacy policy, and in the amount of information publicly available by default, "pulled the rug out from under users' expectations about privacy" [16].
3.2 The Approach of European Bodies
The Art. 29 Working Party4 and the European Commission have so far mainly tackled the problem of (lack of) fairness in Facebook's privacy settings by advocating the implementation of privacy-friendly default settings and the preference of opt-in rather than opt-out procedures. The 2009 Pact on Safer Social Networking Principles for the EU first paid significant attention to the role of privacy settings. The document introduced specific principles recommending users' empowerment through tools and technology, or enabling and encouraging users to employ a safe approach to personal information and privacy [32]. Despite being a non-binding code of conduct, it formally engaged Facebook to improve its privacy settings. However, the Pact was limited in its scope to services targeted at minor users. Following the Pact, the Art. 29 Working Party issued an opinion on social networking in June 2009 [33]. In this document the Working Party stressed the importance of clear privacy settings to empower users to consent to the disclosure of their information beyond the members of their contact list. According to the Working Party, "SNS should offer privacy-friendly default settings which allow users to freely and specifically consent to any access to their profile's content that is beyond their self-selected contacts in order to reduce the risk of unlawful processing by third parties" [33]. In February 2010, in reaction to the changes introduced by Facebook in December 2009, the European Commission announced its plans to take action and address the amendments introduced by Facebook in a broader scope [34]. Following this announcement, a letter was sent to Facebook by the Art. 29 WP in May 2010. In the letter
4 Under Article 29 of the Data Protection Directive, a Working Party on the Protection of Individuals with regard to the Processing of Personal Data is established, made up of the Data Protection Commissioners from the Member States together with a representative of the European Commission. The Working Party is independent and acts as an advisory body. The Working Party seeks to harmonize the application of data protection rules throughout the EU, and publishes opinions and recommendations on various data protection issues.
the Working Party underlined the importance of privacy-friendly default settings and called for maximum control by the user over who has access to his profile information and connection lists. It also stressed that any access by people beyond the members of contact lists should be an explicit choice by the user. The Art. 29 WP therefore called for the generalization of opt-in procedures. Great concern was expressed about the effect that the changes may have on the use of Facebook by minors and the possibility of exposing them to severe threats by allowing public access to their profiles. However, it was strongly highlighted that such control should be provided to users regardless of their age. In this context, the changes to Facebook's default privacy settings were called unacceptable [35] [36].
3.3 Limits of the Privacy Approach
In the presented context, can we consider that Facebook complies with the fairness principle as outlined under the data protection framework? In the light of the latest changes in the privacy settings that empower users to manage their online identities, the answer may not be that straightforward. Facebook tried to turn the privacy settings into a more friendly design, tackling most of the concerns raised by European bodies. At the same time, these improvements came with features inciting users to make more information public through the use of recommendations and opt-out procedures. It is not certain whether requiring Facebook to implement opt-in procedures would really change the situation. Such procedures are often presented to users in a way that encourages them to make their information public. As mentioned above, the business model of Facebook (free, advertisement-based) does not provide sufficient incentives to force the company to better protect users' rights. Some commentators observed that their business model "forces them to leverage the size of the network, instead of monetizing on individual user value", putting them "in a balancing act where the advertisement capabilities need to outweight the individual user rights in order to keep a decent revenue stream" [37]. Another issue may stem from the design of the platform. As suggested by Bygrave, fairness should not only refer to informing about a specific data processing activity; rather, it should apply to the design and the structure of the information system supporting such operations. This suggests that not only individual processing operations have to be fair but that the whole system, also from the technical perspective, should be designed with the fairness principle underlying it. After all, since Lessig's introduction of the code concept, it has already been argued by various authors that adjusting software can be a far more effective privacy-protection mechanism than, for example, adjusting the text of contractual privacy policies, simply because the conditions of the code cannot be 'breached' [38]. According to Edwards and Brown, privacy is determined by the default settings coded into the software by the designers of the SNS [38]. More concretely, this means that SNS software, which defines what users can do with their data – in our case the privacy settings – is not always consistent with users' expectations. Edwards and Brown argue that "users are (…) often mislead as to what their 'reasonable expectation of privacy' are on an SNS by the way the code has been written and defaults set" [38].
Users mainly join Facebook to share information with their social network and communicate with their friends. It is clear that any ‘reasonable’ user, when he joins free services like Facebook, usually
expects that he may receive some ads to make it worthwhile for the service provider [38]. However, he probably does not expect that access to his account will be given to unrelated service providers, or to all the people he is not friends with, and that his information will be findable through search engines [38]. It seems that the main problem lies in "reconciling reasonable user expectations of data security and privacy with the 'disclosure by design' paradigm concerning personal data on SNSs" [38]. It is however not clear whether the actions undertaken by European bodies, mainly consisting of a set of recommendations, will provide sufficient incentives for Facebook to introduce greater fairness in the design of privacy settings. A more promising solution may be found in the concept of unfair commercial practices. Such an approach has been used in the complaints to the Federal Trade Commission lodged by the US privacy groups against the new privacy settings of Facebook. In these complaints the activities of Facebook were qualified as unfair and deceptive trade practices [26]. It is hence worth investigating whether a similar approach could be adopted in Europe. The Unfair Commercial Practices Directive [39] sanctions misleading commercial practices. An action is misleading if it contains false information and is untruthful, or in any way deceives the average consumer (even if the information is factually correct) and causes him to take a transactional decision that he would not have taken otherwise. Information could also be misleading if it refers to the nature of the product or, for example, benefits, risks, the results to be expected from its use, or the motives for the commercial practice [39]. Providing information in an unclear, unintelligible, or ambiguous manner may also qualify as misleading behavior. Finally, omissions can be misleading if the information omitted, or hidden, is information that the average consumer needs. Facebook's practices as regards privacy settings could possibly fall under this definition. Users would then be able to benefit from the protective consumer law framework, often supported by consumer organizations, which have the resources and means to challenge unfair practices before the courts.
4 Conclusion
Facebook and its privacy settings are frequent guests in the news, but they rarely get positive reviews. From what was said above, it can be seen that privacy settings can play a great role in privacy protection and give users control over the information they share through SNS. It is clear that a necessary tool to prevent situations like the one mentioned at the beginning is already out there. The whole problem is the way the tool is used (or not used) by users, which is mainly a result of the ambiguity of the privacy settings and the confusion stemming from regular changes. Users' unawareness, together with the complexity of the tool and a lack of transparency, are the three major factors shaping the current privacy-challenging situation. Some dubious practices, like making profiles publicly available on an opt-out basis or offering third parties access to users' accounts, seriously undermine Facebook's attempts to convince the public that it designs its system with users' privacy in mind. At the same time, a closer look at Facebook's business model strongly suggests that confusing users with information about available options in the privacy settings is intentional, as its commercial profit depends on the amount of disclosed user data. The fairness principle of the DPD indicates the direction all SNS providers should take. It also shows that users' expectations towards privacy cannot be
ignored and always have to be taken into account. However, this does not seem to be enough to make Facebook change its ways. An alternative solution could lie in unfair trade practices regulation. With a much more developed doctrine on what is unfair, and with actual means to enforce it, this could be a way to assure more privacy on SNS. Using the 'consumer protection' approach is tempting because many SNS users, just like many consumers, are so technology-ignorant or vulnerable that some public protective measures should be extended [38]. This is a particularly relevant argument if we consider the number of children and young people without the necessary experience among SNS users. Directive 2005/29/EC contains provisions about the enforcement of its rules and urges Member States to introduce penalties for infringements of national rules on unfair trade practices. Such penalties, which must be effective, proportionate and dissuasive, if used against an SNS provider involved in unfair practices in the described context, could be a way to ensure more privacy for the users of these services. This could be an alternative path to effectively achieve more privacy through a different set of rules, and it should therefore be investigated further.
References 1. Canadian Woman Loses Benefits over Facebook photo, http://abcnews.go.com/International/wireStory?id=9147300 2. Privacy chiefs keep watch over Facebook, http://www.reuters.com/article/idUSTRE63L0UB20100422 3. Facebook’s Zukerberg Says the Age of Privacy is Over, ReadWriteWeb, January 9 (2010), http://www.readwriteweb.com/archives/facebooks_zuckerberg_ says_the_age_of_privacy_is_ov.php 4. Peterson, C.: Losing face: an environmental analysis of privacy on Facebook, draft paper (January 2010), http://works.bepress.com/cgi/viewcontent.cgi? article=1001&context=cpeterson 5. Leenes, R.E.: Context is everything: sociality and privacy in Online Social Network Sites. In: Bezzi, M., Duquenoy, P., Fischer-Hübner, S., Hansen, M., Zhang, G. (eds.) Privacy and Identity. IFIP AICT, vol. 320, pp. 48–65. Springer, Heidelberg (2010) 6. Goffman, E.: The presentation of self in everyday life. University of Edinburgh, Edinburgh (1956) 7. Facebook Statistics, http://www.facebook.com/help/?ref=drop#!/press/ info.php?statistics 8. http://www.lamebook.com (last checked on 01.07.2010) 9. Stross, R.: When Everyone’s a Friend, Is Anything Private?, N.Y. TIMES, March 7 (2009), http://www.nytimes.com/2009/03/08/business/08digi.html 10. Sophos ID Probe Shows 41% of Users Happy to Reveal All to Potential Identity Thieves, SOPHOS, August 14 (2007), http://www.sophos.com/pressoffice/news/ articles/2007/08/facebook.html 11. As referred by EPIC in the complaint submitted to the FTC on the matter of Facebook, INC, May 5, 2010: Pew Internet and American Life Project, Teens, Privacy, and Online Social Network, http://www.pewinternet.org/Reports/2007/Teens-Privacyand-Online-Social-Networks.aspx?r=1, Pew Internet and American Life Project, Social Networks Grow: Friending Mom and Dad, January 14 (2009), http://pewresearch.org/pubs/1079/social-networks-grow
12. Nelson, S., Simek, J., Foltin, J.: The Legal implications of Social Networking. 22 Regent U. L. Rev. (2009) 13. Goldberg, S.: Young job-seekers hide their Facebook pages, CNN Tech, March 29 (2010), http://www.cnn.com/2010/TECH/03/29/facebook.job-seekers/ index.html 14. Hong, J., Iachello, G.: End-User Privacy in Human–Computer Interaction. Foundations And Trends In Human–Computer Interaction 1(1) (2007) 15. Nissenbaum, H.: Privacy as contextual integrity. 79 Wash. L. Rev., 119 (2004) 16. Grimmelmann, J.: Saving Facebook. 94 Iowa L. Rev., 1185 (2009) 17. Acquisti, A., Gross, R.: Imagined Communities: Awareness, Information Sharing, and Privacy on the Facebook. In: Danezis, G., Golle, P. (eds.) Privacy-Enhancing Tech.: 6Th Int’L Workshop, vol. 36 (2006), http://privacy.cs.cmu.edu/dataprivacy/ projects/facebook/facebook2.pdf 18. Livingstone, S.: Taking Risky opportunities in youthful content creation: teenagers’ use of social networking sites for intimacy, privacy ad self-expression. New Media and Society 10 (2008) 19. The Facebook Privacy Fiasco Begins, TechCrunch, December 9 (2009), http://www.techcrunch.com/2009/12/09/facebook-privacy/ 20. Facebook’s New Privacy Changes: The Good, The Bad, and The Ugly, Electronic Frontier Foundation, December 9 (2009), http://www.eff.org/deeplinks/2009/12/ facebooks-new-privacy-changes-good-bad-and-ugly 21. Privacy groups file FTC complaint against Facebook, Guardian, December 17 (2009), http://www.guardian.co.uk/technology/blog/2009/dec/17/ facebook-privacy-ftc-complaint 22. Ten Privacy Groups File FTC Complaint Against Facebook for Recent Privacy Changes, Inside Facebook, December 17 (2009), http://www.insidefacebook.com/ 2009/12/17/ten-privacy-groups-file-ftc-complaint-againstfacebook-for-recent-privacy-changes/ 23. Paul, I.: Facebook privacy complaint: a complete breakdown, PC World, May 6 (2010), http://www.pcworld.com/article/195756/facebook_privacy_ complaint_a_complete_breakdown.html 24. Hachman, M.: Facebook targeted by new FTC privacy complaint, PCmag, May 7 (2010), http://www.pcmag.com/article2/0,2817,2363518,00.asp 25. Kafka, P.: Feds to Facebook privacy critics: let’s talk, All Things Digital, January 19 (2010), http://mediamemo.allthingsd.com/20100119/feds-to-facebookprivacy-critics-lets-talk/ 26. For the full text of the complaint see, http://epic.org/privacy/facebook/EPIC_FTC_FB_Complaint.pdf 27. Paul, I.: Facebook’s new features and your privacy: what you need to know, PC World, April 23 (2010), http://www.pcworld.com/article/194866-3/facebooks_ new_features_and_your_privacy_what_you_need_to_know.html 28. McCarthy, C.: press release, Facebook’s privacy policies hit a language barrier, CNET News.com on July 12 (2010), http://www.zdnetasia.com/facebook-sprivacy-policies-hit-a-language-barrier-62201276.htm 29. Directive 95/46/EC of the European Parliament and of the Council of 24.10.1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data (Data Protection Directive) (OJ L 281, 23.11.1995)
30. Rosen, J.: The Web Mean the End of Forgetting, New York Times, July 19 (2010), http://www.nytimes.com/2010/07/25/magazine/25privacy-t2. html?_r=3&pagewanted=1&hp 31. Bygrave, L.A.: Data Protection Law, Approaching its rationale, logic and limits (2002) 32. Safer Social Networking Principles for the EU, http://ec.europa.eu/information_society/activities/social_ networking/docs/sn_principles.pdf 33. Art. 29 Data Protection Working Party, Opinion 5/2009 on online social networking, WP 163, adopted on 12 June (2009) 34. EU to slam new Facebook privacy settings: http://www.euractiv.com/en/infosociety/eu-slam-new-facebookprivacy-settings 35. EU Watchdog slams Facebook’s privacy settings, http://www.euractiv.com/en/infosociety/eu-watchdog-slamsfacebook-privacy-settings-news-494168 36. Letter of Art. 29 Wp to Facebook, http://ec.europa.eu/justice_home/fsj/privacy/docs/wpdocs/ others/2010_05_12_letter_art29wp_facebook_en.pdf 37. Vanelsas, A.: The Facebook business model is the root cause of a lack of transparency, February 18 (2009), http://vanelsas.wordpress.com/2009/02/18/thefacebook-business-model-is-the-root-cause-of-a-lack-oftransparency/ 38. Edwards, Brown, I.: Data Control and Social Networking: Irreconcilable Ideas? In: Matwyshyn, A. (ed.) Harboring Data: Information Security, Law and the Corporation, vol. 226 (2009) 39. Directive 2005/29/EC of the European Parliament and of the Council of May 11, 2005, concerning unfair business-to-consumer commercial practices in the internal market and amending Council Directive 84/450/EEC, Directives 97/7/EC, 98/27/EC and 2002/65/EC of the European Parliament and of the Council and Regulation (EC) No 2006/2004 of the European Parliament and of the Council (‘Unfair Commercial Practices Directive’) (2005) 40. Report of Findings into the Complaint Filed by the Canadian Internet Policy and Public Interest Clinic (CIPPIC) against Facebook Inc. Under the Personal Information Protection and Electronic Documents Act, http://www.priv.gc.ca/cf-dc/2009/2009_ 008_0716_e.cfm 41. Gross, G., McMillan, R.: Canada ends Facebook privacy probe, Computerworld, September 22 (2010), http://www.computerworld.com/s/article/9187381/Canada_ends_ Facebook_privacy_probe?source=CTWNLE_nlt_security_2010-09-23
Privacy Effects of Web Bugs Amplified by Web 2.0
Jaromir Dobias
TU Dresden, Germany
[email protected]
Abstract. Web bugs are Web-based1 digital tracking objects enabling third parties to monitor access to the content in which they are embedded. Web bugs are commonly used by advertisers to monitor web users. The negative impact of web bugs on the privacy of users has been known for over a decade. In recent years, Web 2.0 technologies have introduced social aspects into the online media, enhancing the ability of ordinary users to act as content providers. However, this has also allowed end-users to place web bugs online. This has not only increased the number of potential initiators of monitoring of web surfing behaviour, but also potentially introduced new privacy threats. This paper presents a study on end-user induced web bugs. Our experimental results indicate that, in the light of Web 2.0 technologies, the well-known concept of web bugs leads to new privacy-related problems.
Keywords: Secret Tracking, Social Network, Surveillance, Web 2.0, Web Bugs, Web Privacy.
1 Introduction
Web bugs are Web-based digital tracking objects enabling third parties to monitor access to the content in which they are embedded (e.g., webpages, e-mails or other electronic documents). They have been utilised for several years by Internet marketing or advertising companies for the purpose of tracking and profiling users visiting bugged webpages and analysing their behaviour. Web bugs are based on a simple mechanism: an HTTP request is sent to the tracking server when the tracked content is accessed (see Section 3). This effect can be induced automatically, e.g., by opening a webpage or e-mail in a Web browser, an e-mail in an e-mail client2, or a document in a text processor, on condition that an external object (e.g., image, script, style sheet, web banner, audio/video stream, etc.) is embedded inside the opened content.
Part of the research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement No. 216483 for the project PrimeLife.
1 Web-based means that they are based on Web technologies.
2 This behaviour is implicitly disabled by most modern e-mail clients.
Fig. 1. This figure illustrates the observer's deployment of a web bug in the form of an external advertising image. The observer embeds an IMG element in the content provider's webpage pointing to the external location of the image on the observer's image-hosting server.
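To make the mechanism in Fig. 1 concrete, the following sketch shows the kind of markup an observer might embed. The host name, file name and "page" query parameter are hypothetical placeholders rather than values from the paper; the markup is given as a Python string so that all code examples in this text share one language.

    # Minimal illustration of a web bug: an externally hosted image whose URL
    # is controlled by the observer. "observer.example" and the "page"
    # parameter are hypothetical placeholders.
    WEB_BUG_TAG = (
        '<img src="http://observer.example/banner.gif?page=provider-home" '
        'width="1" height="1" alt="" />'
    )
    # The observer pastes this tag into the tracked content; every time the
    # content is rendered, the visitor's browser requests the image from
    # observer.example, revealing the visit.
    print(WEB_BUG_TAG)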
The existence of a significant number of scientific papers concerning web bugs [1,2,3,4,5,6] indicates that web bugs are already an old and well-known problem. However, these studies primarily deal with the 'old' World Wide Web, now known as Web 1.0, and hence do not reflect fundamental changes in the nature of the Web environment introduced by Web 2.0 technologies. In general, web bugs can be utilised for the establishment of a tracking infrastructure across multiple servers, without the need to incorporate tracking functionality into every single server hosting the tracked content. We have discovered that web bugs in Web 2.0 environments (which we will call Web 2.0 bugs), such as online Social Network Sites (SNS), have significantly different properties than traditional web bugs on webpages in the classical Web 1.0 environment. Web 1.0 bugs are usually tracking devices introduced by, or on request of, an observer, such as an Internet marketing company or a web analytics company, on a website of a content provider. A practical example might be a content provider intentionally hosting advertising banners loaded from an external location under the observer's control (see Fig. 1). This allows the observer to monitor access to the content provider's content which embeds the advertising banner. In this model, the content provider provides (dynamic) content, and the observer is only capable of keeping track of end-users' visits to that content. There is a relation between the entity initiating the tracking device and the entity hosting the content. This is different with Web 2.0 bugs. If the web bug tracking mechanism is employed in a Web 2.0 environment, the more dynamic creation of content comes into play. Web 2.0 applications are characterized by the fact that end-users are not only consumers, but can also be producers; they can contribute content. Secondly, in Web 2.0 applications end-users can not only contribute
content to their own 'domain', but they can also contribute to content provided by others. Hence, in a Web 2.0 environment the technical concept of web bugs can be employed by ordinary users for tracking access to content provided by themselves, as well as to content provided by others, without these others being aware of the introduction of the tracking mechanism. This shift from the monitoring of web behaviour by (commercial) entities to end-users 'spying' on each other motivated this research. This paper explores the world of Web 2.0 bugs to study the possibilities of user-induced Web 2.0 bugs and explore the privacy effects of those Web 2.0 bugs. The study is set up as a real-world experiment in which we developed a Web 2.0 bug and tested its operation in a real-world Web 2.0 environment.
Outline. The remainder of this paper is organized as follows. Section 2 describes existing work related to web bugs. Section 3 describes Web 2.0 bugs and the basic settings of the experiments. The results of our experiments are presented in Section 4. Section 5 summarizes the study and presents our conclusions.
2 Related Work
The term "web bug" was introduced by Richard M. Smith. In his 1999 report, The Web Bug FAQ [6], he defines a web bug as ". . . a graphics on a Web page or in an Email message that is designed to monitor who is reading the Web page or Email message". Even though this definition is not incorrect, it omits a wide range of additional Web-based objects which can also be used for tracking purposes, utilising the same tracking mechanism as web bugs. Additionally, Smith's definition neglects other types of applications integrating external Web-based objects, which makes them prone to tracking via web bugs as well (e.g., e-mail clients, word processors, RSS readers, interpreters of Flash content and others). It is therefore necessary to extend the definition of web bugs beyond utilising images for tracking purposes and to acknowledge that webpages and e-mails are not the only environments for tracking via web bugs today. Shortly after releasing his report on web bugs, Smith demonstrated that it is also possible to deploy persistent cookies via web bugs in order to link user requests to their e-mail addresses [5]. After spreading the word about the privacy-invasive capabilities of web bugs, security researchers, legal scholars, governmental agencies and privacy-aware individuals started to address this problem [2,3,4]. Trying to raise public awareness, a research team from the University of Denver developed a web bug detection tool called Bugnosis [9]. The tool was primarily developed for journalists and policy makers. The authors of Bugnosis hoped that these two groups might understand the potential threat and privacy impacts of web bugs better than ordinary users, inform the public of the risks and initiate corresponding counteractions. The idea of Bugnosis warning users against web bugs when browsing websites was successful at the time, as proven by the fact that the tool was installed by more than 100,000 users. The essential problem, however, was that Bugnosis
was capable of generating warning messages only, without taking any defensive steps to actively enhance user privacy. Another problem was that Bugnosis only dealt with external embedded images with specific properties.3 Last but not least, Bugnosis was developed as an extension plugin for Internet Explorer 5, without support for other browsers or environments. Therefore, Bugnosis is an unsuitable solution for eliminating the privacy threats caused by modern-day web bugs. Even though there are currently some solutions available to disable web bugs [8,10], none of them deals with the threat of individual users utilising web bugs. As we will see, Web 2.0 bugs introduce novel threats that warrant novel defences.
3 Web 2.0 Bugs: Web Bugs Employed by Individual End-Users
Web 2.0 bugs are web bugs introduced by users of Web 2.0 applications into the content disclosed through these applications. An example is a user embedding a web bug into a blog post on their SNS profile page, or embedding such a web bug into a comment placed on someone else's profile page. This section describes the basic settings of our experiment. It explains the technical principle of tracking via web bugs by individual users. Furthermore, it describes the key aspects of web bug tracking performed by individual users and presents the tracking code utilised in our experiment.
3.1 A Closer Look at Web 2.0 Bugs
An individual user, acting as an observer, can utilise web bugs for tracking all types of media which can carry external objects loaded from the Web. The most common media tracked via web bugs are webpages, while images are the most common external objects used for tracking. An observer who wants to track access to a particular webpage simply embeds an external object into the content of the page to be monitored, just as in the case of Web 1.0 bugs. However, in this case the observer is not a company, but an individual end-user. Next, when an arbitrary visitor opens the webpage containing the embedded object, the visitor's browser automatically generates an HTTP request to retrieve the data of that object from the object-hosting server (see Fig. 2).4 Each HTTP request incorporates attributes which are to some extent specific to the visitor and which can thus be used for differentiating a particular visitor from others. The HTTP request incorporates:
3 Bugnosis detected web bugs by evaluating additional properties of images, such as the length of the URL, the domain of the external image, unique appearance on the webpage, etc. [1].
4 It is assumed that an ordinary modern Web browser is used, which automatically loads images and other external objects embedded on a user-requested webpage.
[Fig. 2 diagram: the visitor requests a webpage from the content provider (steps 1, 2); the embedded object triggers a GET request to the observer's image-hosting server (steps 3, 4); the observer collects the logged data (step 5).]
Fig. 2. This figure illustrates tracking via web bugs (an image is used in this case). The fingerprint icon symbolises visitor-specific attributes which are contained in each visitor's HTTP request. Paths 3 and 5 depict the side effect caused by the web bug, enabling the observer to track the visitor.
– Time when the visitor accessed the tracked webpage;
– IP address of the device which loaded the webpage;5
– The URL of the tracked webpage;6
– The URL of the external object (including URL parameters);
– The type of browser accessing the webpage;7
– Cookie (previously stored in the user's browser by the web bug).
The visitor-specific identifiers contained in the visitor's HTTP requests can be collected on the object-hosting server (step 3 in Figure 2) as soon as the visitor opens the tracked webpage (step 1 in Figure 2). The observer can then collect the tracked identifiers of each visitor of the tracked webpage from the object-hosting server and secretly monitor access to the tracked webpage (step 5 in Figure 2).
3.2 Key Aspects of Tracking by Individual Users
Nowadays, there are many free public Web-hosting services enabling users to deploy server-side scripts (for instance written in PHP, ASP, or Ruby) on the web.
5 Even if the IP address is not specifically associated with the user's device (the user can be behind a NAT, VPN, proxy, anonymisation network, etc.), it can reveal some context-dependent information (see Section 4) disclosing information about the visitor.
6 This information, extracted from the Referer field of the HTTP header, can be easily spoofed or removed by the user. In our experiment, however, we detected only minor cases in which users intentionally modified this header field (see Section 4).
7 This information is extracted from the User-Agent field of the HTTP header. Though it can be easily modified by the user, it provided interesting information as well (see Section 4).
Ordinary web users acting as observers can take advantage of these services and use them for hosting web bugs and collecting information about the visitors of the tracked sites. This makes deploying web bugs much easier, because it is unnecessary for the observer to run their own dedicated server with a public IP address. The observer might additionally use an anonymisation service to upload the script, deploy the web bugs and afterwards collect the tracked data from visitors of the tracked content. This allows the observer to cover his identity, which is usually not the case when the tracking is done by a dedicated Internet marketing or advertising company. Internet marketing or advertising companies usually host the tracked content directly on devices under their own control. In contrast, an individual user acting as an observer exploits the fact that he himself can embed external objects into existing online content and that these objects are loaded from a remote location under his control. Therefore, web bugs provide a covert side-channel enabling individual users to track the content indirectly, without the involvement of the content provider. The content provider in this case provides an environment prone to the embedding of tracking devices, which allows the observer to covertly track other users that interact within the environment created by the content provider. The motives behind tracking performed by individual users may also differ from those of typical Internet marketing or advertising companies. While such companies are primarily motivated by economic interests, tracking performed by individual users may be driven by motivations such as jealousy (e.g., a jealous husband tracking his wife), sexual motives (e.g., deviants tracking their potential victims), hate (e.g., members of extreme organisations tracking their enemies), etc.
3.3 Background of the Experiment
In our experiment, we used an image object for tracking access to selected webpages. A PHP script (see Listing 1.1) was used for logging the data from the HTTP requests. When an arbitrary user accessed the tracked webpage, his browser generated an HTTP request pointing to the location of the embedded image. As soon as the PHP script was activated by the incoming request, it logged the data corresponding to the request and sent back the data of the image. The PHP script was named index.php and stored in a directory called logo.gif. The advantage of this solution was that the PHP script was activated by a URL of the form "www.example.org/logo.gif", even though logo.gif actually activated a PHP script. This way of implementing a web bug makes it unnecessary to reconfigure any settings of the object-hosting server, thus allowing the web bug to be hosted on any public web server supporting PHP scripts. This solution did not affect the user experience in any way when communicating with the tracked webpage and hence raised no suspicion about the tracking.
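To make the embedding step concrete, the external object planted by the observer can be as simple as an ordinary image tag. The following fragment is purely illustrative; the host name matches the example URL above and the id parameter mirrors the one visible in Listing 1.3, but neither value is prescribed by the experiment:

<img src="http://www.example.org/logo.gif/?id=bug-in-forum007" alt="logo">

When a visitor's browser renders the comment or post containing this tag, it automatically requests the "image" from the observer's server and thereby activates the logging script.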
In our experiment, web bugs were deployed to selected locations in an experimental social networking platform, in the well-known social network MySpace and in a public profile in a university information system.8 Afterwards, the tracked data was collected and analysed for the purposes of this experiment.
Listing 1.1. PHP script intercepting data from web bugs
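The body of Listing 1.1 is not reproduced in this text. The following is a minimal sketch, based only on the description above, of what such a logging script might look like; the log file name, the field labels and the returned image file are illustrative assumptions, not the authors' original code.

<?php
// Sketch of a web-bug logging script (illustrative reconstruction).
// Stored as index.php inside a directory named logo.gif, so that a request
// to .../logo.gif/?id=... activates the script while looking like an image URL.

$record = sprintf(
    "[TIME:] %s [IP:] %s [REFERRER:] %s [BUG LOCATION:] %s [AGENT:] %s\n",
    date('H:i:s - d.m.y'),
    $_SERVER['REMOTE_ADDR'],
    isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '-',
    $_SERVER['REQUEST_URI'],
    isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '-'
);

// Append the record to a log file readable only by the observer (assumed name).
file_put_contents('log.txt', $record, FILE_APPEND);

// Return the data of a genuine image so that nothing unusual is visible to the visitor.
header('Content-Type: image/gif');
readfile('picture.gif');  // assumed file holding the image actually displayed
?>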
4 Results
4.1 Web Bugs in the Experimental Social Networking Platform
Several selected forums in the experimental social networking platform were tracked by web bugs.9 We used images for tracking,10 which were embedded into the selected forums via comments. User access to the tracked forums was logged by the PHP script displayed in Listing 1.1.
8 We are aware that these experiments are not compliant with EU data protection regulation, which prohibits the processing of personal data without a legitimate purpose. We have not informed the users of the various systems that we were collecting personal data, nor have we provided information about the purposes. On the other hand, this is similar to how real web bug exploits would operate.
9 The Slovak experimental social networking platform kyberia.eu served as a basis for the initial phase of our experiments.
10 The image as a binary object is not necessary for activation of the web bug tracking mechanism. Only the HTML IMG tag pointing to the location of the tracking script matters.
Users' accesses were logged as long as our comments containing web bugs were displayed to visitors of the tracked forums. The number of displayed comments was adjusted individually by each user of the experimental social networking platform, and therefore, the older our comments with the embedded web bug became, the more often user access went undetected by the web bug. The situation was different in forums under our administration. We embedded web bugs directly into the content of the main topic of some forums under our administration. That guaranteed that the tracking was not dependent on the comments of users in those forums, which enabled us to detect each user's access to these forums.
Linkage of nicknames. Thanks to a built-in function of the experimental SNS that lets users see the nickname of the user who last visited a particular forum and the time of that visit, tracked records were linked to users' nicknames based on time correlation (see Listings 1.2 and 1.3). The arrival time of the user's request was extracted locally on the PHP-hosting server by using the PHP date command. The extracted time did not always precisely correspond to the last-visited time displayed in the social networking platform (due to communication delay and/or desynchronised clocks). However, this information was sufficient for manual linkage of tracked records to users' nicknames.

nickname245 [02-07-2010 - 11:37:48]
nickname475 [01-07-2010 - 09:17:20]
nickname256 [23-06-2010 - 13:46:02]
nickname023 [23-06-2010 - 13:37:00]
...

Listing 1.2. Last-visited information available for each forum of the experimental social networking platform

[TIME:] 13:37:01 - 23.06.10
[IP:] 1.2.3.4
[REFERRER:] http://example.org/forum007
[BUG LOCATION:] /logo.gif/?id=bug-in-forum007
[AGENT:] Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.6) Gecko/20100628 Ubuntu/10.04 (lucid) Firefox/3.6.6
...

Listing 1.3. Tracked record captured by the PHP script, linkable to nickname023
Visitor-identifying information. Once the tracked record was linked to a user’s nickname, further information about the user was extracted. This included WHOIS information derived from the user’s IP address, information about the user’s browser extracted from the User-Agent header (providing information identifying the underlying operating system) and information on the URL of the forum tracked by the web bug.
After having assembled this information, we were able to estimate the geographical location of the user and/or (in a few cases) the institution from which the user was connecting. That provided us with an interesting picture of the real persons behind the nicknames.
Statistical information. Since each logged record included time information, it gave us a good statistical view of how many users accessed a particular forum in a particular time frame. Based on this data, we derived which users spent more time on the tracked forum compared to the others, exposing those users' interest in the particular topic discussed in the tracked forum.
Further leaked information. Web bugs deployed to the experimental social networking platform allowed us to detect access to tracked forums via Google Cache and Google Translator. In the first case we detected which keywords a user used before visiting the forum (see Listing 1.4).11

http://webcache.googleusercontent.com/search?q=cache:FE7fiAerdEJ:example.org/forum007+XXX+YYY&cd=1&hl=ZZ

Listing 1.4. Detected keywords of a search request and access via Google Cache
In the second case, we detected the language into which the forum was translated by the user (see Listing 1.5).12 According to further information provided by the WHOIS service, the request came from a device residing in the country for which the translation was requested. Based on this information, we assume that someone from that particular foreign country was interested in the topic presented in the tracked forum.

http://translate.googleusercontent.com/translate_c?hl=XX&sl=YY&u=http://example.org/forum007&prev=/search%3Fq%3DZZZZZ%26hl%3DXX%26sa%3DG&rurl=translate.google.XX&usg=ALkJrhhjb8ybR1X4

Listing 1.5. Detected request for translation of the tracked forum (from the Referer header)
From this experience we learned that web bugs can also reveal very specific actions performed by the user when accessing the tracked content, by exploiting the data embedded in HTTP headers (search keywords in this case).
Note 1. The experimental SNS is primarily targeted at a relatively homogeneous group of users, both geographically and linguistically, and therefore a request for translation coming from a foreign IP address was considered unusual.
11 Data was anonymised. The strings "XXX" and "YYY" substitute keywords and "ZZ" substitutes a country code.
12 Data was anonymised. "XX" substitutes the code of the target language, "YY" substitutes the code of the source language and "ZZZZZ" substitutes the keyword which navigated the user to the tracked forum.
User profiles. We also aimed our research at user profiles as a potential resource of data trackable via web bugs. We embedded web bugs into the profiles of selected users in the experimental SNS by adding comments with external pictures. Next we collected the same type of data as we did in the forums before. In this case, however, the data collected by tracking user visits was not related to a specific topic, but rather to a specific identity (the profile owner). Based on the last-visit information also incorporated in user profiles, we discovered that the most frequent visitor of a particular user profile in the experimental SNS is the owner of the profile himself. The second interesting finding (specific to the platform) was that most users visit their own profiles at least once per session. This is probably caused by the design of the platform, which provides access to profile-management functions in the profile interface only. Most information gained from the profiles of the tracked users therefore originated from the owners of the profiles themselves. This involved information on their ISPs, the types of browsers and operating systems used, and how often and when they access their profiles. We also detected other users visiting some tracked profiles on a regular basis. We concluded that those users are interested in the particular tracked profiles.
4.2 Web Bugs in MySpace
The experimental SNS described in Section 4.1 was an easy target for web bugs. This was mainly due to its open and transparent design, enabling registered users to gain a lot of information on the activities and interests of other users by means of functions built into the SNS. Another reason may be the geographic, national and linguistic homogeneity within the community of the experimental SNS and the relatively small number of its active users (up to 10,000 members). Therefore we changed focus to a more established, and potentially more 'secure', environment, the well-known social network – MySpace. We deployed the web bug to a single profile hosted on MySpace, whose owner agreed to take part in our research. We intentionally selected this participant because he also had an account in the experimental social networking platform. Based on the correlation among the tracked data gained from both of his profiles, we detected access to the participant's MySpace profile by users who had previously accessed his profile in the experimental social networking platform. Furthermore, we were able to link visitors of the tracked MySpace profile to their nicknames from the experimental social networking platform. This deanonymisation of the MySpace visitors was possible because we already had sufficient information gained from the experimental social networking platform. Another factor facilitating the linking of identities was that the set of all potential different identities detected was small enough to find highly probable correlations. The detection of unique identities and their linkage would have been much easier if we had utilised a cookie mechanism assigning a unique identifier to each new visitor detected. However, in our experiments we decided to avoid using cookies because we were interested in the capabilities of simple tracking devices.
4.3 Web Bugs in the Public Profile of the University Information System
In our last experiment, we embedded the web bug into the author's profile in the local university information system, which gave us the technical capability to monitor access to it. We were unable to link detected access to our profile with any of the data previously tracked from MySpace or the experimental social networking platform. In other words, none of the users visiting the experimental social network site or the tracked MySpace profile visited the author's profile in the university information system (at least, we could not establish plausible links on the basis of the data collected in the experiments). It is important to mention that users of the university information system are identified by their real names, while user-selected nicknames are used for identifying users in the experimental SNS. If a user visiting both the author's experimental SNS profile and the author's profile in the university information system had been detected, it might have indicated a user aware of both the author's real identity and his partial identity related to the experimental SNS. On the other hand, we were able to estimate the probable location of the visitors (at least the country), their nationality and the operating systems and platforms they used. We were also able to detect visitors interested in the content of the author's profile by detecting repeated requests from the same source. In two particular cases, we were able to link the tracked data to requests from identifiable individuals. That was possible because in those cases we had additional information (gained from another channel, i.e. not via web bugs) that an individual whose identity was known to us was accessing our profile within a certain time-frame. Since no further requests from other users visiting our profile within that specific time-frame were detected, we concluded that the intercepted requests originated from the particular identifiable individual. Hence, we were able to link information on the intercepted IP addresses, operating systems and browsers to a natural living person.
4.4 Overall Findings
An interesting overall finding of the experiments conducted is that the information gained by web bugs relates to (1) the content published in the tracked media, (2) the time span of tracking, (3) the amount of resources tracked by web bugs and (4) the ability of the observer to link tracked requests to each other as well as to other user-related information. Another interesting finding is that, in order to link the actions performed by a particular user, it is not necessary for the intercepted traces (i.e., the tracked data in the log file) of the user's actions to be equipped with a globally unique identifier (e.g., a session ID contained in a cookie). It is sufficient to have a set of (not necessarily unique) attributes assuring uniqueness within a specific domain (e.g., the combination of IP address and HTTP Referer, unique within a specific time-frame).
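As an illustration of this last point, the following hypothetical sketch (not part of the original experiment) groups already-parsed log records into probable visitors using only the IP address, the Referer value and a time window; the field names and the 30-minute window are assumptions made for the example.

<?php
// Hypothetical sketch: group log records that share IP address and Referer
// and lie within a fixed time window, treating each group as one visitor.
// Assumes $records is an array of associative arrays sorted by 'time' (a Unix timestamp).
function linkRecords(array $records, $windowSeconds = 1800) {
    $groups = array();
    foreach ($records as $r) {
        $placed = false;
        foreach ($groups as $i => $g) {
            $last = end($g);
            if ($last['ip'] === $r['ip']
                && $last['referer'] === $r['referer']
                && ($r['time'] - $last['time']) <= $windowSeconds) {
                $groups[$i][] = $r;   // same quasi-identifier within the window
                $placed = true;
                break;
            }
        }
        if (!$placed) {
            $groups[] = array($r);    // start a new probable visitor
        }
    }
    return $groups;
}
?>

No globally unique identifier is involved; within a small community and a short time window, the combination (IP, Referer) is often unique enough to link requests.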
5 Conclusion
Emerging Web 2.0 applications, such as SNSs, contain more and more user-related data and provide enhanced tools supporting interaction among users. This, on the other hand, also provides a suitable environment for tracking users. In this paper we aimed to experimentally explore the potential privacy effects of web bugs amplified by Web 2.0 technologies. Our small-scale experiments show that the well-known tracking mechanism of web bugs can be exploited in a previously unexplored way – by ordinary users spying on other users in Web 2.0 environments. Unlike Internet marketing or advertising companies, who have been known as the most common employers of web bugs, the motivation of users to track other users can go beyond marketing and advertising interests, and thus lead to new potential privacy threats.
We decided to call a web bug amplified by Web 2.0 technologies a 'Web 2.0 bug', as a term for a generalised version of the already known problem. It is characteristic of a Web 2.0 bug that it can be exploited by individual users in order to track other users by means of Web 2.0 technologies. Moreover, any Web 2.0-based external object embedded in some content which automatically generates an HTTP request when the corresponding content is opened by a visitor can be seen as a potential Web 2.0 bug. Additionally, aside from webpages and e-mails, other content-providing media must also be considered as environments suitable for Web 2.0 bug tracking. Text documents, presentation documents, spreadsheets and other types of media allowing external objects to be embedded can also be used for Web 2.0 bug tracking.
The crucial problem is that Web 2.0 bugs exploit a core principle of hypertext – the interconnection of documents – which provides the foundation for Web technologies. Diminishing or eliminating these privacy-impairing effects of Web 2.0 bugs is therefore difficult. The applications most vulnerable to Web 2.0 bugs are nowadays SNSs, because of their massive concentration of user-related data and user interaction. However, Web 2.0 bugs themselves cannot influence the functionality of SNSs and are therefore usually underestimated and overlooked by providers of SNSs, which was also our experience in these experiments. On the contrary, as we showed in this paper, Web 2.0 bugs can cause huge damage to the privacy of the users of such sites, especially given that any user in fact has the ability to track other users.
For the elimination or mitigation of this problem it is therefore necessary to take into account both points of view – (1) the Web 2.0-based application as well as (2) the user of the Web 2.0-based application as a potential victim of Web 2.0 bug tracking. Web 2.0-based applications (especially SNSs, as currently the environment most prone to Web 2.0 bug tracking) should be designed in such a way that external content embedded by users does not automatically trigger the Web 2.0 bug tracking mechanism when accessed by visitors. The visitor
accessing potentially risky content should be informed about the risk of being tracked by Web 2.0 bugs. Moreover, such risky actions should require the visitor's consent. Some existing SNSs (e.g., Facebook) partially solve this problem by creating a thumbnail of an external object (e.g., an image or video stream), storing this thumbnail on trusted servers and embedding the thumbnail in place of the external object itself. The original external object behind the thumbnail is loaded from its external location on the visitor's demand only (e.g., by clicking the play button), which reduces the set of potentially tracked visitors to those wittingly interacting with the remote content.
The user should have a privacy-enhancing tool available which would provide him with user-level protection against Web 2.0 bugs. This is especially important in those cases where a particular Web 2.0 application itself does not protect its users against Web 2.0 bugs. The privacy-enhancing tool should keep track of the user's actions and warn the user in case an action would cause a Web 2.0 bug tracking effect. In such a case the user should be asked whether the external object should be loaded and whether his decision concerning the particular object should be permanent or temporary. The user should be able to manage his privacy and specify which domains he trusts. Moreover, the user should also be able to switch among several modes of operation: the deny all and accept all modes should enable the user to implicitly deny all traffic potentially having a Web 2.0 bug tracking effect (in the deny all case) or to accept all traffic regardless of the potential Web 2.0 bug tracking effects (in the accept all case). The accept trusted/ask untrusted mode should load all content considered by the user as trusted and ask the user what to do with untrusted content.13 Last but not least, the accept trusted/deny untrusted mode should automatically accept all content considered by the user as trusted and deny untrusted content without asking the user.
Currently, the Privacy Dashboard [7], developed under the PrimeLife project, seems to be a promising solution dealing with the problem of Web 2.0 bugs. The Privacy Dashboard is an extension plugin for the Firefox web browser which helps the user to track what kind of information websites collect about the user. It enables the user to see, for example, which external websites are used by the currently visited website, whether there are invisible images on the website, or whether the website enables third parties to track the user across the web. Additionally, it enables the user to block content from external resources and thus proactively disable the Web 2.0 bug tracking effect. However, even the current version of Privacy Dashboard14 does not deal with all aspects of Web 2.0 bugs discussed in this paper. Therefore, building more advanced privacy-enhancing mechanisms solving the problem of Web 2.0 bugs comprehensively remains a challenging task for the future.
13 Untrusted content is content which is not considered trusted by the user.
14 The current version of Privacy Dashboard is 0.8.3 at the time of writing this paper.
References
1. Alsaid, A., Martin, D.: Detecting Web Bugs with Bugnosis: Privacy Advocacy through Education. In: Dingledine, R., Syverson, P.F. (eds.) PET 2002. LNCS, vol. 2482, pp. 13–26. Springer, Heidelberg (2003)
2. Martin, D., Wu, H., Alsaid, A.: Hidden Surveillance by Web Sites: Web Bugs in Contemporary Use. Commun. ACM 46(12), 258–264 (2003)
3. Nichols, S.: Big Brother is Watching: An Update on Web Bugs. Tech. rep., SANS Institute (2001)
4. Office of Inspector General: Use of Internet Cookies and Web Bugs on Commerce Web Sites Raises Privacy and Security Concerns. Tech. rep., U.S. Department of Commerce (2001)
5. Smith, R.M.: Synchronizing Cookies with Email Addresses (1999), http://www.ftc.gov/bcp/workshops/profiling/comments/rsmith.htm (accessed January 2011)
6. Smith, R.M.: The Web Bug FAQ (1999), http://w2.eff.org/Privacy/Marketing/web_bug.html (accessed January 2011)
7. PrimeLife Privacy Dashboard, http://www.primelife.eu/results/opensource/76-dashboard (accessed January 2011)
8. Ghostery, http://www.ghostery.com/ (accessed January 2011)
9. Bugnosis, http://www.bugnosis.org/ (accessed June 2010)
10. Web Bug Detector, https://addons.mozilla.org/en-US/firefox/addon/9202/ (last updated April 2009) (accessed January 2011)
A Conceptual Model for Privacy Policies with Consent and Revocation Requirements
Marco Casassa Mont1, Siani Pearson1, Sadie Creese2, Michael Goldsmith2, and Nick Papanikolaou1
1 Cloud and Security Lab, HP Labs, Bristol, UK
2 International Digital Laboratory, University of Warwick, Coventry, UK
{marco.casassa-mont,siani.pearson,nick.papanikolaou}@hp.com, {S.Creese,M.H.Goldsmith}@warwick.ac.uk
Abstract. This paper proposes a conceptual model for privacy policies that takes into account privacy requirements arising from different stakeholders, with legal, business and technical backgrounds. Current approaches to privacy management are either high-level, enforcing privacy of personal data using legal compliance, risk and impact assessments, or low-level, focusing on the technical implementation of access controls to personal data held by an enterprise. High-level approaches tend to address privacy as an afterthought in ordinary business practice, and involve ad hoc enforcement practices; low-level approaches often leave out important legal and business considerations focusing solely on technical management of privacy policies. Hence, neither is a panacea and the low level approaches are often not adopted in real environments. Our conceptual model provides a means to express privacy policy requirements as well as users’ privacy preferences. It enables structured reasoning regarding containment and implementation between various policies at the high level, and enables easy traceability into the low-level policy implementations. Thus it offers a means to reason about correctness that links low-level privacy management mechanisms to stakeholder requirements, thereby encouraging exploitation of the low-level methods. We also present the notion of a consent and revocation policy. A consent and revocation policy is different from a privacy policy in that it defines not enterprise practices with regards to personal data, but more specifically, for each item of personal data held by an enterprise, what consent preferences a user may express and to what degree, and in what ways he or she can revoke their personal data. This builds on earlier work on defining the different forms of revocation for personal data, and on formal models of consent and revocation processes. The work and approach discussed in this paper is currently carried out in the context of the UK collaborative project EnCoRe (Ensuring Consent and Revocation).
1 Introduction
Enterprises manage and administer huge sets of personal data which are collected as part of normal business practice. This process is complex and involves meeting a wide range of requirements, including the need to satisfy data protection and privacy laws, as well as any service requirements made by the enterprise or the consumer. Often such requirements are captured in the form of a policy or policies. However, there is not yet
a unified view of the many and varied approaches to policy description and enforcement used in an enterprise. This makes it hard to guarantee that the combination of the various implementations does indeed meet all the requirements being made of the enterprise and is aligned with the law. Furthermore, the process of assessing this alignment is subject to human error. In general there are two extreme approaches to management and enforcement of privacy policies. There is firstly a pragmatic approach, driven mainly by risk assessment and risk management and tailored to current business practices. It involves identifying suitable high level policies and points to act on, but then typically requires the deployment of pragmatic control points, which are very dependent on the specific scenario/environment. The control points enforcing policies are often hardcoded within applications and services in an ad hoc way, and so cannot easily be reused in different scenarios and organisational contexts - but this remains the norm in business practice today. Secondly, frequently research in this space tends to focus instead on a purely technical approach and narrowly propose yet another language or formal model for security, access control or obligation policies without taking into account legal, business and operational requirements. Hence, related policy languages might be too generic or detached from real requirements; often these languages and models are of interest to the research community but seldom widely adopted in real environments. We believe that there is a major gap between the two approaches and that there is a unique opportunity to combine aspects of each and provide mechanisms to bridge the two. Our approach is to develop a conceptual model rich enough to describe high-level policies typically expressed in natural language, and structured to support their refinement and mapping into low-level technical policies for practical enforcement in an information system. In the EnCoRe (Ensuring Consent and Revocation) project1, we are exploring this approach while specifically focusing on an important aspect of privacy: the management of data subjects’ (users’) preferences with regard to the handling of their personal data. In EnCoRe such preferences actually equate to expressions of consent and revocation relating to rights to handle and process personal data. EnCoRe is exploring and demonstrating ways of handling consent and revocation policies in three key areas affecting people: enterprise data management, biobanks and assisted living. While privacy policies mostly describe an enterprise’s personal data handling practices, there is no commonly accepted way of stating clearly to customers whether (and which) mechanisms for granting and revoking consent for personal data are provided. In other words, what is desirable is a description of exactly which controls an enterprise provides to its customers. In this paper we call such a description a consent and revocation policy. There is good reason for consent and revocation policies to be expressed in a machine-readable form, in a manner analogous to website privacy policies [8, 26], for then enforcement can be fully automated; our aim here is not to define the syntax of such policies, rather their structure and general characteristics. Consent and revocation appear in many different forms, and the relationship between the two concepts is rather subtle. 
For instance, they are not symmetric: it is possible for an individual to revoke personal data for which consent has not been given (in Section 6 we will refer to this as consentless revocation). Consent itself is not a simple yes/no ("store this data" or "do not store this data"), but has many subtly different gradations, depending on the type of data it refers to.
1 See http://www.encore-project.info
2 Policy Layers and Dependencies in Organisations
Organisations need to cope with a variety of policies and constraints that emerge from many different sources, including legislation (national and international), societal expectations, business requirements and (where appropriate) individual preferences expressed by users and customers. We concern ourselves here specifically with those policies relating to the handling of personal data and privacy. Whilst privacy requirements are in general context dependent, we believe that there is a core set of privacy concepts which are common and underpin the various controls designed to deliver privacy against this varying set of requirements. They are, in effect, a tool box which can be utilised depending upon the unique requirements of the situation. But, due to the heterogeneity of the policies and of the languages used to express privacy requirements, it may not always be obvious which core privacy concepts are actually being utilised. However, if these are clearly identified, we will be able to better formalise and classify privacy-related policies, laws and technical solutions, enabling simplification and easier re-use of the technologies and methodologies designed to implement such policies. Further, the extraction of such core privacy concepts might make it easier to compare privacy legislation with the technical implementation of privacy constraints in a product.
We consider policies to fit within a layer model which in itself represents a hierarchy of policies. In this model, high-level policies express general requirements and rights that individuals have with regards to their privacy, as embodied typically in the law and in business and regulatory requirements, as these contain general constraints on business practice with regards to personal data. At the highest level of the classification, there is a set of requirements which are set out by international agreements and directives, such as the European Data Protection Directive or the EU Safe Harbour agreement. Further, many countries have national data protection legislation, such as the Data Protection Act 1998 in the UK, or HIPAA, GLBA, SB 1386, COPPA and various State Breach laws in the US. With regards to regulation in particular, there are export and transborder flow restrictions on personal data that need to be enforced. Privacy laws and regulations constitute the topmost layers of the policy hierarchy regarding personal data with which an enterprise must comply. Such policies are often expressed in natural language, as is typically the case with related data subjects' preferences. At this high level of abstraction, security requirements may include adherence to the Sarbanes-Oxley Act (SOX) for financial reporting, or the PCI Data Security Standard (DSS). These may be refined to a set of policies at a lower level. Similarly, business requirements include contractual obligations, information lifecycle policies and the enterprise's own internal guidelines. All of the above influence how personal data is collected, stored and administered.
Low-level policies are those which describe how privacy requirements are implemented in a particular piece of hardware, or in software that handles personal data. Such policies comprise detailed conditions on how particular data may be handled within a system: often these are just statements prohibiting particular accesses of the data, in which case they are referred to as access control policies.
At lower levels there are various operational and technical policies that are machine readable and enforceable by policy management frameworks, e.g. [1,4,6,19]. Among
these there will be policies expressing how a particular class of data is to be treated, and these are specific only to the data, not to the system implementing the policies. Even more low-level will be policies that are system-specific, and cannot be ported directly to other privacy-preserving platforms. For instance, policies specific to a particular health information system may contain specialized fields that do not exist in other similar systems. Figure 1 below is a diagrammatic representation of the different layers within which privacy policies are implemented.

Fig. 1. The different layers in which privacy policies are implemented

High-level policies relate to the layers from the "Application/Service Layer" and above shown in Figure 1, while the layers below can be considered as low-level policies. The preferences of a data subject are high-level policies that lie between the Business Layer and the Legal Layer.
What is clear from the above analysis is that the origins of privacy requirements which an enterprise has to meet are very diverse, and they arise at many different levels of abstraction. In an ideal world, lower-level policies should always be the result of refinements, or special cases, of the higher-level ones. In the real world, high-level requirements change over time. Data subjects and data controllers (service providers) exercise choices relating to their preferences and risk appetites. This makes it impossible for a system to always be a correct refinement of requirements, as it will take time for choices to be implemented. It will be for the data subjects to decide whether they are being offered appropriate service levels regarding the response to their choices, and for service providers to determine what level of guarantee is appropriate for their business model. Law and regulation will also evolve over time, although much more slowly and in a manner which should give enterprises sufficient time to ensure that they are addressing (or at least attempting to address) changes to related policy requirements. Privacy requirements are so heterogeneous that it is not always possible to treat them consistently, and yet it is necessary to ensure that all these assorted requirements are simultaneously met for the correct functioning of society.
A key assumption in our definition of a hierarchy, as opposed to a loose grouping of policies by theme and/or level of detail, is that there is a relation of containment between the different levels described. (In any enterprise, it should be possible to automatically check that this containment holds.) It should often be the case that higher-level policies express requirements that should be made more explicit (refined) in lower-level ones. In that sense, higher-level policies contain requirements expressed
[Fig. 2 diagram: a quadrant layout (I–IV) spanning high-level policies (legal policies, regulation, risk assessment and risk management) to low-level policies (access control policies, user preferences, policy languages such as XACML, P3P, Graham/Denning, Gunter et al.), with EnCoRe positioned between the high-level and low-level approaches.]
Fig. 2. Policy layers versus description and enforcement approaches
at the lower levels, albeit in a more abstract or generic form. This justifies their placement at the upper level of the hierarchy. The more formally a policy can be expressed, the more chance we have of creating automatic enforcement mechanisms and reusable technology. However, there will always be policies which are, by design, open to interpretation and require human intervention.
One might classify current research in privacy policy description, management and enforcement using Figure 2. The vertical axis represents the varying levels at which policies are expressed, ranging from high-level (legal, regulatory) to low-level (security/access control policies and user preferences). The horizontal axis characterises the degree to which policies are formalised, ranging from natural language to machine-readable formats. A significant amount of research falls into quadrant III. This is no surprise, as the development of policy languages goes hand in hand with the development of machine-readable descriptions of low-level technical policies. It is evident from the figure that there are many other viewpoints and levels of abstraction that are of concern in policy management, and that there is scope for much work in the areas labelled as quadrants I, II and IV. Quadrants II and III pertain to the low-level policy implementations and the degree to which they directly implement requirements expressed in natural language versus machine-readable formats. Specifically, we see that there is a research opportunity in providing a link between natural language requirements and policy implementations (II), whereas requirements expressed directly in machine-readable formats are a good fit (III). In reality we expect most requirements to begin life in II and that system implementations find ways of pulling these into III. Our conceptual model is designed to provide a formal framework within which requirements expressed in II can be traced to corresponding requirements in III. Quadrants I and IV pertain to the mapping of high-level policies directly into machine-readable formats. Here the research question is to understand just how much of this kind of policy is ambiguous and requires context-dependent human intervention. We focus on ensuring that our conceptual model is rich enough to describe all of these high-level policies, so that where core privacy requirements can be identified we can directly map them into formalised machine-readable formats to support technology controls.
The specific area of management of privacy policies, security constraints, consent and revocation [2] is of particular interest because it is at the intersection of legislation, user requirements and management of privacy and security technical policies within and across organisations. What is particularly desirable is to devise an intermediate representation of policies that embodies high-level requirements whilst being directly translatable to (potentially existing) low-level policies or access control languages such as XACML [5], EPAL, P3P [3], P-RBAC [6], PRIME [24] and the like. Such a representation should not be tied to a particular implementation language. It may be argued that our definition of a hierarchy for privacy policies is arbitrary, as the level of detail contained in privacy-related documents, from international legislation down to business and regulatory policies, varies substantially by domain of application. It is the case that how the hierarchy is defined is heavily context-dependent. Our classification is based on research within EnCoRe, taking into consideration privacy requirements coming from a variety of sources (including legal, social and technical ones) related to the following scenarios:
• Employee data held within an enterprise
• Biobanks
• Assisted living
We expect that these case studies will guide our intuition regarding a core, common set of privacy requirements, and hence suggest the evolution of our conceptual model. We note here the previous work by Samarati et al. [27] which treats further the divide between high-level policies and low-level enforcement mechanisms.
3 Towards a Conceptual Model
The examples of policy rules we have given so far demonstrate several different forms that privacy requirements take in real business applications. It is desirable to be able to automatically enforce as many policy rules as possible; for this, a machine-readable representation of the different forms of requirements is necessary. However, the purpose of a conceptual model is to provide a representation that enables systematic human reasoning about policies while at the same time being convertible into machine-readable code. A conceptual model defined in a strict mathematical way would have the benefit of being completely unambiguous, but it would likely be too restrictive, especially if it is intended to capture privacy laws and regulations. A more flexible approach would be to describe privacy requirements in a semi-formal manner. One can be very systematic and formal about the purpose of different policy rules, and in terms of syntax it should be possible to identify the main patterns of usage that occur in privacy requirements. To illustrate this last point: above we identified policy rules as typically having the structure if <some condition is met> then <some action> else <alternative action> (the else part being optional). The syntax and semantics of the conditions and actions allowed in such rules are essentially informal. However, our analysis shows that there is at least the following core set of rule types:
• notification rules: such rules describe when and how data subjects should be notified regarding accesses, uses, and transfers of their personal data. Such rules appear in low-level policies – forcing an implemented system to send email or instant messages when a condition is triggered – as well as in high-level policies and legislation: the Data Protection Act, for example, specifies that data subjects can make subject access requests (SARs), forcing a data controller to notify them of any data held about them, for which purposes, and with whom this data has been shared.
• access control rules: such rules specify who can access data held by an enterprise; for instance, personal data about employees should only be available to the HR department and to the employees (on an individual basis). The standard if-then-else structure of access control rules was first identified by Bonatti et al. [28].
• update/creation rules: these rules express who is permitted to modify personal data that is held, and under which conditions. The right to update data or even create new data is usually reserved for the data subject, and certainly such a right exists in legislation. Rules specifying who can perform such changes to data held typically take into account the role of the parties making them (cf. role-based access control).
• protection rules: there will be rules specifying protections on particular data, usually protections of a technical nature, such as encryption. These are most easily described in technical, low-level privacy policies, since the parameters and algorithm for encryption can be explicitly defined; however, requirements for encryption are increasingly found in privacy regulation and company privacy policies.
• obligation rules: these are rules specifying a data controller's commitment to implement a future policy decision, or to apply a particular rule (e.g. forcing the deletion of data at a particular date). Obligation management in information systems is a subject in its own right, and we are still studying this aspect.
These rule types are the essence of our conceptual model, and provide a natural means both of expressing high-level policies and of setting requirements for low-level implementations. They enable the expression of actions associated with granting and revoking consent for the use of personal data.
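As a purely illustrative sketch (not the EnCoRe model itself; all names below are ours), such typed rules could be captured along the following lines, keeping the rule type, condition and actions explicit so that a high-level statement remains traceable to its low-level counterpart.

<?php
// Illustrative sketch only: the if-<condition>-then-<action> rule structure,
// tagged with one of the rule types listed above. Class and field names are assumptions.
class PolicyRule {
    public $type;        // 'notification', 'access', 'update', 'protection' or 'obligation'
    public $condition;   // callable evaluated against a request context
    public $thenAction;  // callable executed when the condition holds
    public $elseAction;  // optional callable for the (optional) else branch

    public function __construct($type, $condition, $thenAction, $elseAction = null) {
        $this->type = $type;
        $this->condition = $condition;
        $this->thenAction = $thenAction;
        $this->elseAction = $elseAction;
    }

    public function apply(array $context) {
        if (call_user_func($this->condition, $context)) {
            call_user_func($this->thenAction, $context);
        } elseif ($this->elseAction !== null) {
            call_user_func($this->elseAction, $context);
        }
    }
}

// Example: a notification rule alerting the data subject when data is shared externally.
$notifyOnSharing = new PolicyRule(
    'notification',
    function (array $ctx) { return $ctx['action'] === 'share' && $ctx['external']; },
    function (array $ctx) { mail($ctx['subject_email'], 'Data sharing notice', 'Your data was shared.'); }
);
?>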
4 How Consent and Revocation Are Perceived by Different Stakeholders
This section provides an informal overview of how the concepts of consent and revocation (and related policies) are perceived and dealt with by the key stakeholders, notably organisations, data subjects and regulators. The remaining part of the paper will then primarily focus on the data subjects' perspective. Conceptually there are several layers of abstraction of policies, ranging from human-readable high-level policies to semi-formal representations (which are actionable by humans while also being translatable into operational policies), to machine-readable policies that are not usually intended to be exposed to users. This also applies to consent and revocation policies, which specifically deal with the handling of personal and confidential data based on privacy preferences and constraints driven by data subjects and legislation. Different stakeholders have different priorities and views of consent and revocation, as described below.
Organisations. Organisations have a pragmatic view about consent and revocation. Incorporating these processes into their business practices requires effort and investment, especially to provide the necessary enforcement mechanisms; it also introduces potential liabilities in case of failures. Because of this, it is common practice that these policies are represented in the form of opt-in or opt-out choices for end-users, as these are simple to handle and do not require complex enforcement mechanisms. In addition, consent and revocation policies are just a small part of the overall set of policies organisations need to take into account, for example business policies, security policies, compliance policies, etc. Current legislation enables organisations, in some circumstances, to bypass the need to obtain consent (and revocation) from end-users, based on "fair usage". Of course organisations need to take into account the trade-off between carrying on with the current minimalistic data management practice and the increasing interest and willingness of end-users to obtain control over their data. EnCoRe is exploring how to ease the pain in dealing with consent and revocation management and the enforcement of related policies, hence hoping to shift organisational behaviour towards a more privacy-friendly approach.
Data Subjects (End-Users). From the point of view of a data subject, or end-user, if an enterprise provides consent and revocation controls, it increases that user's choice regarding how his or her personal data is handled. We are interested here in the point of view of the end-user, and in Section 7 we argue that an enterprise can define its own consent and revocation policy as a way of informing end-users of the choices available to them.
Regulators. Regulators have been very active in providing laws and legislation concerned with the management (collection, processing and disclosure) of personal data. Laws are usually abstract and primarily focus on the circumstances in which consent might be necessary. On the other hand, the concept of revocation is fuzzy and not properly addressed. As a consequence, policies that derive from these laws and legislation do not properly address these aspects and, in particular, do not meet the expectations of the end-users.
5 Forms of Consent and the Notion of a Consent Variable
With regard to an individual's personal data, there are three principal things for which an enterprise or other data collector requires that individual's consent:
• collection of data,
• processing of data,
• sharing of data.
Collection of data refers to the initial process by which data is acquired and stored on the enterprise’s information system. Processing includes any access of the data that has been collected and is characterised by a stated purpose (e.g. research, marketing, aggregation to derive average customer habits). Data may be shared – internally and externally (e.g. to third parties) so that it can be processed, often elsewhere than the site of data collection.
Thus, consent may be defined as a wish for a datum d to be collected, processed, shared, or any combination of the above. This definition is too coarse, however, for it does not account for subtleties such as these:
• It may be desired to restrict data collection so that it occurs only in one country, so that it is subject to that country's privacy legislation.
• It may be desired that consent lasts for only a fixed period of time.
• It may be desired to restrict processing of data so that it is used for only certain stated purposes.
• It may be desired that the data is shared with only particular parties, and to banish uses of the data by others.
Thus we claim that consent is parameterised by certain quantities referred to as consent variables. As examples we consider: the time t for which consent is granted, the volume v of data for which consent is granted, the set S of allowed purposes (stated purposes for which consent is granted), and the set Π of parties who may access the data. Thus, consent is fully determined when there are specified:
• the task for which consent is given (collection, processing, sharing, or any combination thereof);
• for this task, the values of the consent variables of interest.
From this one can extract a mathematical definition of consent and develop a hierarchy of its different forms. Formal models of consent and the attendant logic are addressed in our other work [26].
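Continuing in the same spirit, and purely as an assumed encoding rather than anything defined in the paper, a single consent value parameterised by these variables might be recorded as follows; the array keys are ours.

<?php
// Illustrative sketch of a consent value parameterised by consent variables.
// The tasks and variable names (t, v, S, Pi) follow the text above; the encoding is assumed.
$consent = array(
    'tasks'     => array('collection', 'processing'),  // sharing is not consented to here
    'variables' => array(
        't'  => '30 days',                // time for which consent is granted
        'v'  => null,                     // volume of data (unrestricted in this example)
        'S'  => array('research'),        // set of allowed purposes
        'Pi' => array('HR department')    // set of parties who may access the data
    )
);
?>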
6 Forms of Revocation
Revocation corresponds to the withholding or withdrawal of consent. It is manifested in its simplest form as deletion of data, although there are many variations of revocation, as listed below (from [8]):
1. No Revocation At All: Personal data remains static, and once it has been disclosed, it is either physically impossible to revoke (how could one ever revoke reputation?) or prohibited for various reasons (e.g. law enforcement, data from the police's DNA database).
2. Deletion: Data are completely erased and cannot be retrieved or reconstituted in any way.
3. Revocation of Permissions to Process Data: Data subjects withdraw consent that would enable an enterprise to process or analyse their personal data for a specified purpose.
4. Revocation of Permissions for Third Party Dissemination: Data subjects withdraw consent that would enable an enterprise to disclose information to a third party.
5. Cascading Revocation: This is a variation on any of the above kinds of revocation, whereby the revocation is (recursively) passed on to any party to whom the data has been disclosed. Through this mechanism, data subjects are able to revoke data by only contacting the enterprise to which they originally disclosed their data.
6. Consentless Revocation: Personal data for whose storage and dissemination no consent has been explicitly given by the user, but which may need to be revoked. Again, any of the fundamental types of revocation may be invoked. The need to revoke consentless data emerges mainly when a breach of privacy has occurred. In the italics below we describe a characteristic example of consentless revocation.
7. Delegated Revocation: This is a kind of revocation which is exercised by a person other than the individual concerned, such as an inheritor or parent/guardian.
8. Revocation of Identity (Anonymisation): Data subjects may be happy for personal data to be held for certain purposes so long as it is not linkable back to them personally. Anonymisation may be regarded as a variant of revocation, in that data subjects request a change to data held so that it is no longer personally identifiable.
These forms of revocation should be available as actions that individuals can perform on personal data held about them by an enterprise.
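For implementations that need to refer to these forms of revocation programmatically, a simple enumeration suffices. The sketch below is an illustrative encoding (the identifier names are ours), with integer values following the numbering above, which is also how revocation types are referred to in the example policy of Section 7.

```python
from enum import IntEnum

# The eight revocation types enumerated above; values follow the paper's numbering.
class RevocationType(IntEnum):
    NO_REVOCATION = 1
    DELETION = 2
    REVOKE_PROCESSING = 3
    REVOKE_THIRD_PARTY_DISSEMINATION = 4
    CASCADING = 5
    CONSENTLESS = 6
    DELEGATED = 7
    ANONYMISATION = 8
```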
7 Defining Consent and Revocation Policies
Having enumerated and explained the main forms and variants of consent and revocation, we are in a position to define the notion of a consent and revocation policy, whose purpose is to enable an enterprise to inform customers of: a) what consent is required of them for each data field, and b) which revocation types are available to them should they wish to exercise control over their personal data. Thus, a consent and revocation policy over a set of data fields d1, ..., dn is defined as a set of tuples of the form
⟨d, c_c, c_p, c_d, R⟩
where c_c, c_p and c_d together define a value of consent (specifying collection, processing and dissemination rights respectively), and R is a set consisting of the names of allowed revocation types. As an example, consider this policy over d1, d2:
⟨d1, 30 days, –, 10 days, {2, 3}⟩
⟨d2, –, 10 days, –, {2}⟩
What this policy specifies, assuming the only consent variable of interest is the time for which data is held, is that consent is required to enable the collection of d1 for 30 days and its dissemination for 10 days; furthermore, d1 can be revoked using revocation types 2 (deletion) and 3 (revocation of permissions to process data). The example policy also states that consent to process d2 for 10 days is required, and that deletion may be performed. Note that the symbol – is used when one of c_c, c_p, c_d is omitted. Some further formalisation work is needed to make the definitions more rigorous, but we believe that the above presentation is sufficient for our purposes here. An enterprise is likely to require stipulations on the minimum consent a customer can provide for each data field. In other words, rather than stating in a policy specific consent values that an enterprise needs from a customer, it may be desirable to specify a range of values. We regard this as a direction for further investigation.
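As an illustration of how such a policy might be encoded and queried, the following sketch represents the example policy above as a small data structure. The field and function names are our own, and None stands in for the – symbol; this is not a prescribed format.

```python
from datetime import timedelta

# Illustrative encoding of the example consent and revocation policy above.
# Revocation types are referred to by their numbers from Section 6.
POLICY = {
    "d1": {
        "collect": timedelta(days=30),
        "process": None,
        "disseminate": timedelta(days=10),
        "revocation_types": {2, 3},   # deletion, revocation of processing permissions
    },
    "d2": {
        "collect": None,
        "process": timedelta(days=10),
        "disseminate": None,
        "revocation_types": {2},      # deletion
    },
}

def consent_required(field: str, task: str) -> bool:
    """True if the policy asks the customer for consent to this task on this field."""
    return POLICY[field][task] is not None

def revocation_available(field: str, revocation_type: int) -> bool:
    """True if the customer may exercise the given revocation type on this field."""
    return revocation_type in POLICY[field]["revocation_types"]
```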
8 Conclusions and Future Work
We have discussed in this paper issues concerning the description, management and enforcement of policies in organisations. Specifically, we highlighted the gap between high-level approaches to policies, driven by risk and privacy impact assessment, and low-level technical policies. We strongly believe this gap needs to be filled to enable continuity of requirements and constraints across all these levels and to enable proper enforcement of policies. To achieve this we proposed the adoption of a conceptual policy model, to enable reasoning about and mapping of concepts at lower levels of abstraction.
We have in this paper explained the significance of consent and revocation preferences for personal data, and introduced the notion of a consent and revocation policy. We have described in detail the different aspects of the notions of consent and revocation respectively, and explained how an enterprise can define policies that make clear what options are available to a customer in terms of controlling his or her personal data.
Our future work will seek to validate and refine our conceptualisation of a policy hierarchy, specifically with a view to ensuring that our conceptual model for privacy policy is rich enough to cater for all needs. We will develop a complete conceptual model encompassing a formal syntax and semantics. We will also investigate the utility of the conceptual model by application to case studies, initially within the EnCoRe project. We hope to be able to identify core privacy properties across the case studies, which can be easily mapped into reusable low-level control mechanisms. We also hope that this will offer opportunities to simplify the human interfaces and reduce the amount of human intervention required, making the overall process simpler and more cost effective.
There is much further work to be done on consent and revocation policies, most notably refining the definition of the sets of allowed revocation types, so that the revocation types include parameters analogous to consent variables. In practical terms, we expect to link consent and revocation policies to a real-world access control language, such as XACML, so that enforcement of consent and revocation can be done programmatically. There are also considerations concerning the internal consistency of such policies, such as preventing conflicts or incompatibilities between the consent requested and the revocation types made available. We believe that it is essential to incorporate consent and revocation controls in enterprise information systems that handle personal data, and that the deployment of consent and revocation policies across such systems is a useful means of ensuring that the privacy preferences of individuals will be respected.
Acknowledgements and Notes We gratefully acknowledge the support of the EnCoRe project sponsors – TSB, EPSRC and ESRC. The EnCoRe project [5] is an interdisciplinary research project, undertaken collaboratively by UK industry and academia, and partially funded by the Technology Strategy Board (TP/12/NS/P0501A), the Engineering and Physical Sciences Research Council and the Economic and Social Research Council (EP/G002541/1). It is noted here that the affiliation of author NP has changed after the original version of this paper was presented at the workshop.
References
[1] Mont, M.C.: On the Need to Explicitly Manage Privacy Obligation Policies as Part of Good Data Handling Practices. In: Proceedings of W3C Workshop on Languages for Privacy Policy Negotiation and Semantics-Driven Enforcement, Ispra, Italy, October 17-18 (2006)
[2] Mont, M.C., Pearson, S., Kounga, G., Shen, Y., Bramhall, P.: On the Management of Consent and Revocation in Enterprises: Setting the Context. Technical Report HPL-2009-49, HP Labs, Bristol (2009)
[3] Cranor, L., Dobbs, B., Egelman, S., Hogben, G., Humphrey, J., Langheinrich, M., Marchiori, M., Presler-Marshall, M., Reagle, J.M., Schunter, M., Stampley, D.A., Wenning, R.: The Platform for Privacy Preferences 1.1 (P3P1.1) Specification. World Wide Web Consortium Note NOTE-P3P11-20061113 (2006)
[4] Mont, M.C., Thyne, R.: Privacy Policy Enforcement in Enterprises with Identity Management Solutions. In: PST 2006 (2006)
[5] OASIS, eXtensible Access Control Markup Language (XACML), http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xacml
[6] Ni, Q., Trombetta, A., Bertino, E., Lobo, J.: Privacy-aware role based access control. In: Proceedings of the 12th ACM Symposium on Access Control Models and Technologies, Sophia Antipolis, France, June 20-22, pp. 41–50. ACM, New York (2007)
[7] Ferrini, R., Bertino, E.: A Comprehensive Approach for Solving Policy Heterogeneity. In: ICEIS 2009 - Proceedings of the 11th International Conference on Enterprise Information Systems, Milan, Italy, May 6-10, pp. 63–68 (2009)
[8] Agrafiotis, I., Creese, S., Goldsmith, M., Papanikolaou, N.: Reaching for Informed Revocation: Shutting Off the Tap on Personal Data. In: Proceedings of Fifth International Summer School on Privacy and Identity Management for Life, Nice, France, September 7-11 (2009)
[9] IBM, The Enterprise Privacy Authorization Language (EPAL), EPAL specification, v1.2 (2004), http://www.zurich.ibm.com/security/enterprise-privacy/epal/
[10] Vaniea, K., Karat, C., Gross, J.B., Karat, J., Brodie, C.: Evaluating assistance of natural language policy authoring. In: Proc. SOUPS 2008, vol. 337 (2008)
[11] IBM, REALM project, http://www.zurich.ibm.com/security/publications/2006/REALM-at-IRIS2006-20060217.pdf
[12] OASIS, eContracts Specification v1.0 (2007), http://www.oasis-open.org/apps/org/workgroup/legalxml-econtracts
[13] Breaux, T.D., Antón, A.I.: Analyzing Regulatory Rules for Privacy and Security Requirements. IEEE Transactions on Software Engineering 34(1), 5–20 (2008)
[14] W3C, The Platform for Privacy Preferences, v1.0 (2002), http://www.w3.org/TR/P3P/
[15] Kenny, S., Borking, J.: The Value of Privacy Engineering. Journal of Information, Law and Technology (JILT) 1 (2002), http://elj.warwick.ac.uk/jilt/02-/kenny.html
[16] Organization for Economic Co-operation and Development (OECD), Guidelines Governing the Protection of Privacy and Transborder Flow of Personal Data, OECD, Geneva (1980)
[17] Borking, J.: Privacy Rules: A Steeple Chase for Systems Architects (2007), http://www.w3.org/2006/07/privacy-ws/papers/04-borking-rules/
[18] Cranor, L.: Web Privacy with P3P. O'Reilly & Associates, Sebastopol (2002)
[19] Damianou, N., Dulay, N., Lupu, E., Sloman, M.: The Ponder Policy Specification Language (2001), http://www-dse.doc.ic.ac.uk/research/policies/index.shtml
[20] PRIME, Privacy and Identity Management for Europe (2008), http://www.prime-project.org.eu
[21] IBM: Sparcle project, http://domino.research.ibm.com/comm/research_projects.nsf/pages/sparcle.index.html
[22] The GRC-GRID, The Governance, Risk Management and Compliance Global Rules Information Database, http://www.grcroundtable.org/grc-grid.htm
[23] Archer: Compliance Management solution, http://www.archer-tech.com
[24] Pearson, S., Sander, T., Sharma, R.: A Privacy Management Tool for Global Outsourcing. In: Garcia-Alfaro, J., Navarro-Arribas, G., Cuppens-Boulahia, N., Roudier, Y. (eds.) DPM 2009. LNCS, vol. 5939. Springer, Heidelberg (2010)
[25] Ardagna, C.A., Cremonini, M., De Capitani di Vimercati, S., Samarati, P.: A Privacy-Aware Access Control System. Journal of Computer Security, JCS (2008)
[26] Westin, A.: Privacy and Freedom. Athenaeum, New York (1967)
[27] Agrafiotis, I., Creese, S., Goldsmith, M., Papanikolaou, N.: The Logic of Consent and Revocation (2010) (submitted)
[28] Samarati, P., De Capitani di Vimercati, S.: Access Control: Policies, Models, and Mechanisms. In: Focardi, R., Gorrieri, R. (eds.) FOSAD 2000. LNCS, vol. 2171, p. 137. Springer, Heidelberg (2001)
[29] Bonatti, P., Damiani, E., De Capitani di Vimercati, S., Samarati, P.: An Access Control Model for Data Archives. In: Proc. of the 16th International Conference on Information Security, Paris, France (June 2001)
Applying Formal Methods to Detect and Resolve Ambiguities in Privacy Requirements Ioannis Agrafiotis, Sadie Creese, Michael Goldsmith, and Nick Papanikolaou International Digital Laboratory, University of Warwick, Coventry, UK {I.Agrafiotis,S.Creese,M.H.Goldsmith, N.Papanikolaou}@warwick.ac.uk
Abstract. In this paper, we demonstrate how formal methods can be used to unambiguously express privacy requirements. We focus on requirements for consent and revocation controls in a real world case study that has emerged within the EnCoRe project. We analyse the ambiguities and issues that arise when requirements expressed in natural language are transformed into a formal notation, and propose solutions to address these issues. These ambiguities were brought to our attention only through the use of a formal notation, which we have designed specifically for this purpose.
1 Introduction It is common practice for individuals to disclose personal information via the Internet in order to acquire access to services, products and benefits of today's society. Thus, enterprises, organisations and government institutions alike have developed facilities to collect, process and share personal data with third parties. However, concerns about invasion of privacy are growing, mainly because of the way individuals' personal data is handled by these parties. In this paper we use the term "data controllers" to describe all the parties that handle and process personal data. That these concerns are based on solid ground is illustrated by the increasing number of incidents where data has been lost, mistreated, or shared without authority [4], making the use of privacy-enhancing technologies essential for every Internet user. Although the right to privacy has been fundamental to all democratic societies and its importance is highlighted throughout the published literature [1], the term privacy has no inherent definition [3]. It is difficult to define privacy because it is a complex, multidimensional and highly context-dependent notion. People feel differently about what privacy means to them and have developed different meanings and interpretations according to their culture and experiences. The volatility of the notion of privacy, its highly contextual nature, and the widespread availability of powerful technologies for the collection, processing and sharing of personal data, all justify the need to carefully study, develop and enforce suitable privacy controls for users in modern information systems. Definitions in the information-privacy literature describe an "implicit and limited view" of controls that an individual can invoke [3]. Westin [1] defines privacy as "the claim of individuals, groups or institutions to determine for themselves when, how
and to what extend information about them is communicated to others.” Inspired by this view, the EnCoRe1 project [5] is working to make available a virtual smörgåsbord of consent and revocation controls that an individual may choose to use in order to manage her/his data flow. In the EnCoRe project we perceive “controls” as means enabling people to manage the flow of their personal data, by expressing consent and revocation preferences that can be implemented through non-interference and privacy policies. The overall vision of the project is “to make giving consent as reliable and easy as turning on a tap and revoking that consent as reliable and easy as turning it off again” [5]. To this end, we are taking into account a variety of perspectives, including social, legal, regulatory and technological aspects. We have devised a model of consent and revocation, based on the published literature and on workshops held within the scope of the project. We have developed an accompanying logic of consent and revocation (C&R), which we use to formalise specific contextual requirements, enabling us to translate natural language expressions of C&R needs into an unambiguous form suitable for checking implementations against. In this effort we used a real world case study to validate our logic. This paper describes the ambiguities and the problems that came to light when we applied our consent and revocation logic in a real world case study in order to represent formally the specification requirements of the system. When formal methods are applied to privacy problems, “the nature of privacy offers new challenges and thus new opportunities for the formal methods” [16]. The new challenge that we describe in this paper is the ambiguities, which were not evident at first consideration of the consent and revocation models, but derive from the challenges and gaps created when the details of law, regulation, policy and social factors are combined and applied by computer scientists [8]. We extend this view and argue that these ambiguities, as well as highlighting the gap between high-level and low-level methods, unveil the complexity of the privacy problem and also arise from the gap between peoples’ desire for privacy and the data controllers’ will for security. In the second section we describe the different controls envisaged in the information-privacy literature and how we extend these controls within the project to develop a consent and revocation logic, and present a brief explanation of the logic’s semantics. In the third section we present the real world case study and in the fourth section we categorise the ambiguities that emerge during the process of formalising this scenario and we illustrate with examples of formal descriptions of some of the scenario’s use cases. We also, propose solutions to the emerging ambiguities. Finally, we propose opportunities for future work.
2 Modelling Consent and Revocation In the literature of information privacy, controls have been conceptualised mainly during the process of consent [3]. Researchers identify controls that are applied at the start of a disclosure, during the processing of data and by providing the choice for the individual to be notified. Furthermore, controls could be exercised on what personal 1
The EnCoRe project [5] is an interdisciplinary research project, undertaken collaboratively by UK industry and academia, and partially funded by the Technology Strategy Board (TP/12/NS/P0501A), the Engineering and Physical Sciences Research Council and the Economic and Social Research Council (EP/G002541/1).
data is made available to others and with whom this data is shared [3]. There are limited references to revocation controls and these are only focused on opt-out choices. We have developed a model of consent and revocation to provide a more holistic view and offer richer control mechanisms to the individuals whose personal data is held by data controllers [6]. In this paper we refer to this category of individuals with the term "data subjects". From the literature, we have identified the different consent controls highlighted above and we have conducted workshops in order to identify different types of revocation controls. We have concluded that there exist at least eight different types of revocation [6]. These are:
• No Revocation At All
• Deletion
• Revocation of Permissions to Process Data
• Revocation of Permissions for Third Party Dissemination
• Cascading Revocation
• Consentless Revocation
• Delegated Revocation
• Revocation of Identity (Anonymisation)
We have applied our model to a real world case study, in order to validate it and elicit requirements for the EnCoRe system [7]. Our logic is designed to provide a formal verification framework for privacy and identity management systems. It fills the gap between data-privacy policy languages and high-level requirements by focusing on the semantics of the process of consent and revocation when applied to the handling and use of personal data [7]. The application of formal methods to privacy mainly focuses on translating privacy policies [16] which are mostly written in natural language, into machine readable formats. Languages like P3P [14] and EPAL [15] are examples of these. Barth et al [13] have formed a logic of “contextual integrity” based on Nissenbaum’s theory about dissemination of information [12]. The logic describes how different roles are allocated to people according to context and allows or set constrains on how people of these roles transmit data between them. They applied this logic to privacy policies such as HIPAA [13] and the Children’s Online Privacy Protection Act [13]. However none of these methods handle consent while they completely neglect the notion of revocation. The logic consists of two novel models of consent and revocation, namely an access control model, described in Section 3, and a Hoare logic, described in Section 4. The access control model and the Hoare logic have been developed so as to be complementary to one another. The first model immediately supports policy enforcement architectures such as the one being developed within EnCoRe, but it does not provide an intuitive language for data subjects to express their consent and revocation behaviour. The second model provides a core set of consent and revocation actions axiomatised in their effect on rights and permissions in a way that is more familiar to data subjects. 2.1 An Access Control Model for Consent and Revocation In the access control model we formalise the semantics of consent and revocation processes using labelled transition systems [7]. The objective of this part is to express the requirements of such processes effectively. In this model, suitable for expressing
privacy preferences, consent and revocation are perceived as dynamic modifications of those preferences. There are three main tasks for which a data collector requires consent from an individual: • collection of personal data (for storage in a database) • use of personal data (for analysis, processing marketing or one of many other purposes) • sharing or dissemination of personal data (to the public domain, or to another data collector) We identify these three cases as permissions in the access control model that describe the state of the system. These permissions may be shared or revoked from the data subject. Every action in the model is of the form r, action(σ, δ, Φ, q), v → v’, where r is the data subject who gives the permission, δ is the data that the action refers to and q is the data controller to whom the permission is shared. The letter σ describes which permissions are shared or revoked from the action and tracks the changes in the state of the system. The actions described in the model are these of consent, revocation, deletion, update and notification. Furthermore, we set guards or preferences on these actions, captured in the condition Φ, which contain the data subject’s options that change according to context. The variables contained in the Φ condition that set guards on the actions and describe the data subjects’ consent and revocation preferences are depicted in Table 1. Table 1. The variables in the Access Control Model
Variable   Meaning
t          duration of consent (time-out)
v          volume of data held
s          sensitivity of data held
Π          parties that may access the data
a          persistence: how data is treated after consent has lapsed
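One plausible way to realise a Φ condition over the variables of Table 1 is as a conjunction of simple predicates, as in the sketch below. The encoding (a predicate built from optional bounds on t, v, s, Π and a) is our own illustration rather than the EnCoRe implementation.

```python
from datetime import timedelta

# Build Φ as a predicate over the current consent and revocation state.
# Any bound left as None places no constraint on that variable.
def make_phi(max_duration=None, max_volume=None, sensitivity=None,
             allowed_parties=None, persistence=None):
    def phi(state: dict) -> bool:
        if max_duration is not None and state["t"] > max_duration:       # variable t
            return False
        if max_volume is not None and state["v"] > max_volume:           # variable v
            return False
        if sensitivity is not None and state["s"] != sensitivity:        # variable s
            return False
        if allowed_parties is not None and not state["parties"] <= allowed_parties:  # Π
            return False
        if persistence is not None and state["a"] != persistence:        # variable a
            return False
        return True
    return phi

phi = make_phi(max_duration=timedelta(days=30), sensitivity="sensitive")
print(phi({"t": timedelta(days=10), "v": 1, "s": "sensitive", "parties": set(), "a": "delete"}))  # True
```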
For example the operation grant(σ, δ, Φ, q) simply updates the rights matrix with a new permission on datum δ for a principal q and ensures the resulting consent and revocation state satisfies the condition Φ. 2.2 Description of the Hoare Logic In the Hoare logic we define consent and revocation processes with a set of rights for principals. We identify how the rights and actions are combined to affect permissions and create obligations. This logic differs from the access control model in its treatment, as it effectively models C&R as the application of rights that allow certain permissions. Consequently, one action in the access control model may be described with a combination of actions in the Hoare logic. This is because we believe, the assignment of “rights” in this manner, to be much more intuitive to the way data subjects express themselves. Furthermore, in the Hoare logic the conditions that guard each action are not expressed. We deliberately abstract the Φ conditions in the Hoare
logic, as the model focuses only on permissions in order to be more familiar to the data subjects. We use the notation 〈precondition〉 t 〈postcondition〉 to express obligations, with the following intuitive meaning: from a state satisfying 〈cond1〉 there is a requirement to apply term t to produce a new state satisfying 〈cond2〉. The rules for the logic will be given in the form of Hoare triples, as follows:
{precondition} t {postcondition}
The precondition is a combination of rights and obligations. Provided that the precondition is true, every time the action t is triggered the only result is the postcondition, which is a combination of new rights and obligations. There is also the case where more than one action could be executed at the same time. We capture the concurrency of actions t1 and t2 by using the symbol "||". Thus the triple will be formalised as
{pre-condition} t1 || t2 {post-condition}
All the permitted actions and rights are presented in the tables below. We identified six different rights in the Hoare logic. These rights are presented in Table 2. The actions that allow data subjects to share and revoke rights, and to delete, anonymise and aggregate data, are illustrated in Table 3.
Table 2. The rights in the Hoare logic
Right   Meaning
aOδ     a owns (originates) δ
aLδ     a knows (where to locate) δ
aPδ     a may process δ
aAδ     a may aggregate δ
aSδ     a may share δ (one step further)
aS*δ    a may share δ transitively

Table 3. The actions in the Hoare logic
Action               Meaning
grant(a, b, δ)       grant consent for b to process δ
grant1(a, b, δ)      grant consent for b to share δ onward one step further
grant†(a, b, δ)      grant consent for b to share δ onward transitively
release(a, b, δ)     release δ for anonymous aggregation to b
revoke(a, b, δ)      revoke permission from b to process δ (personally identifiable)
revoke†(a, b, δ)     cascade revoke permission from b and friends to process δ
delete(a, b, δ)      delete δ at b
delete†(a, b, δ)     cascade delete δ
For example, a principal a may grant consent for processing of a datum δ to a principal b only if a owns δ or is able to share it. Once the action grant is completed, b will know where to find and process δ. Thus, the first rule for consent is as follows:
{aOδ ∨ aSδ} grant(a, b, δ) {bLδ ∧ bPδ}
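The rule above can be read operationally: check the precondition on a's rights, then add the rights the postcondition asserts for b. The following toy sketch does exactly that; representing rights as (principal, right, datum) triples is our own illustration, not the paper's formalism.

```python
# A toy interpreter for the consent rule above: grant(a, b, δ) is allowed only
# if a owns δ or may share it, and afterwards b knows where to locate δ and may
# process it.
class RightsState:
    def __init__(self):
        self.rights = set()   # e.g. ("Mary", "O", "address")

    def has(self, principal, right, datum):
        return (principal, right, datum) in self.rights

    def grant(self, a, b, datum):
        # precondition: aOδ ∨ aSδ
        if not (self.has(a, "O", datum) or self.has(a, "S", datum)):
            raise PermissionError(f"{a} may not grant consent on {datum}")
        # postcondition: bLδ ∧ bPδ
        self.rights.add((b, "L", datum))
        self.rights.add((b, "P", datum))

state = RightsState()
state.rights.add(("Mary", "O", "address"))   # Mary owns her address datum
state.grant("Mary", "HR", "address")          # HR may now locate and process it
```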
3 Description of the Case Study The case study selected for the validation of the models and the logic is the Enhanced Employee Data Scenario [2]. Our choice was informed by the fact that the management of employee data in organisations is a well-understood problem, and employees’ privacy offers interesting issues in terms of managing consent and revocation controls in a context where different business, legal and personal requirements need to be taken into account [2]. The case study describes a number of use case scenarios and elicits from these a list of requirements. We explore the implications of invoking consent and revocation controls. These use cases are meant to illustrate key points affecting the management of consent and revocation such as: provision or revocation of consent by a data subject; enforcing consent and revocation preferences; dealing with the overall consent and revocation controls and its impact on data including notifications, updates, auditing, assurance aspects, etc. In the employee data scenario, we focus on PII and sensitive information, such as trade union membership, financial/payment detail, home address details, family details, etc. Personal data can be gathered by different sub-organisations within the enterprise (e.g. HR department, Payroll, Occupational Health department, pensions, customer care department, help-desk and customer-relationship management services, web services to sell products, etc.). Although, this scenario is far from what can be achieved today in terms of consent and revocation controls, we believe that the emerging requirements, with reality checks, contribute to our understanding of these controls and indicate problems and ambiguities when implementing them.
4 Ambiguities in Requirements
We categorise the ambiguities that emerge from our formalisation of the requirements into two classes. The first class comprises the ambiguities created when the details of law, regulation, policy and social factors are applied by computer scientists [8]. The second class consists of ambiguities that emerge from the complexity of the notion of privacy and the gap that exists between the data subjects' demand to control the flow of their data and the data controllers' desire to reduce data subjects' interference. We argue that to detect and solve these ambiguities the use of formal methods is necessary. Initially, we identified these issues when we applied a simple version of
the logic in this environment (where the logic was designed in response to informal requirements expressions). We quickly identified that once we had resolved the ambiguities in the requirements we could not express them using our simple logic. In order to address these issues we need to represent significantly larger rule sets than we had done before [8].
4.1 Ambiguities of the First Kind
In the first class, ambiguities occur when the data subject performs controls in order to update, delete, revoke or change his or her given consent. More specifically, in the case where the data subject wishes to update his or her personal data, ambiguities can emerge as to whether previous data should be deleted or linked with the new data. Furthermore, in the case where the organisation has shared the data transitively, it is not clarified whether the changes should affect the third parties as well. Consider the example below, which we draw from the EnCoRe project's case study on Employee Data Handling. This example highlights the aforementioned ambiguity and we propose solutions described in the consent and revocation logic.
Mary (an employee of Company X) is getting married. She has to update her personal profiles and data within Company X and some of the third party services (including change of address, financial details, indication of next of kin, etc.).
It is clear that we need additional actions in the Hoare logic which will enable us to explicitly model the "update process" and a rule to define the pre- and postconditions. We introduce two actions, "update*" and "update", which allow data subjects to determine whether to delete the old data or link it with the new, respectively. Furthermore, we define the right aUδ for someone to update her/his data and to express preferences as to whether her/his updates will be transitively shared onwards or not. This is formalised below:
{ mUδ ∧ hLδ } update*( m, h, δ, δ' ) {(¬hLδ ∧ hLδ' ) ∧ (〈 hPδ || hSδ || hS*δ 〉 grant( m, h, Φ, δ' ) || grant1( m, h, Φ, δ' ) || grant*( m, h, Φ, δ' ) 〈 hPδ || hSδ' || hS*δ' 〉)}
For the formalised examples in this paper, we define m: Mary, h: HR department, t: third party and δ: Mary's personal data. With the access control model we express the same actions and permissions, but we also set the preferences of the data subject that guard every action.
(r, update(σ, δ, Φ, q), ν ) → ν΄ where ν΄ = ν [ ρ |→ ρ' ] such that ν΄ |= Φ
where Φ := (φ1 ∧ φ2), φ1 := ψt, φ2 := ψs, ψt := t < 30, ψs := s = { sensitive }
From the above formalisation, Mary decides to update her data by linking the new with the old. She doesn't choose to disseminate these updates to third parties and
she is able to set time and sensitivity preferences. This means that she controls what information is updated, how it is updated and who will process the updated information, and she also sets guards on these controls, expressed via the values of the variables that she chooses.
Similarly to the process of update, when an individual revokes or changes his or her initial consent it is not clarified whether these changes will affect third parties. Revocation of consent, even though it is analysed in detail in our model, takes on different meanings depending on the circumstances and purposes for which the data is being held. In the case of deletion, ambiguities emerge from differences between how people at the higher levels perceive the notion of deletion and how it could be technically implemented. For example, deleting data could have multiple meanings: we could render the data useless, scramble the data, delete it from the back-up system or physically destroy the hard discs. Consider the case below, which combines both types of these ambiguities:
Company X outsources the provisioning of a few mandatory enterprise services (including travel agency services, pension fund management) and voluntary services (Sport and Social Club - SSC). Part of Mary's data (financial details, address, employee references, etc.) needs to be disclosed to these third parties. Mary decides to withdraw from the voluntary SSC service. She revokes her consent to use her personal data.
The formalisation of this example in the logic is illustrated below.
{ mΟδ } revoke( m, t, δ ) || delete( m, t, δ ) {(¬tLδ ∧ ¬tAδ) ∧ ¬tPδ}
With the access control model we express the preferences of the data subject regarding the process of deletion.
(r, delete(σ, δ, Φ, q), ν ) → ν΄ where ν΄ = ν [ ρ |→ ρ' ] such that ν΄ |= Φ
where Φ := φ1, φ1 := ψα, ψα := α = {delete from back-up system}
In this example Mary chooses to revoke the third parties' ability to process her data. Additionally, she expresses her preference that the stored data should also be deleted. It is important to highlight the difference illustrated by this example between revocation of consent and deletion; this is a key source of misunderstanding among data subjects. The logic describes four different types of revocation that allow us to express formally all the different meanings that the term has, depending on the context. Another issue addressed with this formalisation is that of deletion: we include the variable a, through which Mary expresses what she perceives as deletion in the specific context.
4.2 Ambiguities of the Second Kind
The second class highlights the complexity of the privacy problem and underlines the conflicts that emerge between a data subject and a data controller. The most
interesting issues, but at the same time the most difficult to address, are those of aggregation and anonymity. The complex nature of these issues could lead to a situation where it may be technically infeasible to express data subjects' preferences or the privacy regulations in place. Aggregation unveils more information about the data subject by combining pieces of information already available. Data could be processed and shared in the proper way by the data controller, but when aggregated, this data creates new information that may compromise the data subject's privacy. In the Hoare logic we assume that whenever data is shared it could be aggregated by the data controller, who, as long as he has permission to collect the data, inherently has the right to aggregate it. A possible solution to the problem of aggregation is for the data subject to define the purpose for which the data is shared and also to control what further personal data the data controller collects.
The ambiguities that arise in the case of anonymity concern the way in which data is anonymised. These ambiguities were unveiled when we tried to formalise the case where Mary requested her medical records to be anonymised if shared with another third party. Although Mary may consent to share her anonymised medical records, the danger of her identity being unveiled always lurks, as "data can either be useful or perfectly anonymous but never both" [9]. "There is growing evidence that data anonymisation is not a reliable technical measure to protect privacy. The development of more powerful re-linking technology capabilities and the wider access to increasing amounts of data from which to drive these are reducing its effectiveness" [17]. Even if methods such as k-anonymity [10,11] become efficient, the link between the data controller and the third party captured by the logic could potentially lead to de-anonymisation of the data. In our logic we capture the data subject's request to anonymise data first and then disseminate it to a data controller, but we consider the anonymised data as new data over which the data subject has no controls, and we forbid any sharing of the old data between the data subject, the data controller and the third parties that have access to the anonymised data.
Ambiguities also emerge when the data subject exercises her/his right to revoke consent but at the same time the data controller is unable to perform such an action. For example, a data subject may request his data to be deleted while the organisation is still processing that data. To address these issues we insert a new action, which allows the data subjects to express transparency in their decisions. Additionally, we tackle the conflicts that occur between the data subjects and the data controllers by introducing a combination of permissions and obligations under certain conditions. For example, in order for a data subject to revoke his consent to process data, there is a condition that the data controller does not process the specific data at that time. Consider the example where Mary decides to withdraw from the voluntary SSC service. She revokes her consent to use and share her personal data. The formalisation of the example is shown below:
{ mOδ ∧ hPδ ∧ hSδ } revoke( m, t, δ ) || revoke1( m, t, δ ) {(¬tPδ ∧ ¬tAδ) ∧ (¬tSδ)}
In the access control model we define a binary variable p which is true only if the data controller does not process the data. We include this variable in the Φ condition to permit the execution of the action of revocation only if the data controller does not process the data.
(r, revoke(σ, δ, Φ, q), ν ) → ν΄ where ν΄ = ν [ ρ |→ ρ' ] such that ν΄ |= Φ
where Φ := φ1, φ1 := ψb, ψb := p = {true}
In this formalisation Mary chooses to revoke the ability of third parties to share her data onwards and their ability to process her data as well. In order to solve the conflicts between the data subject's will and the data controller's need to process the data, these actions can only be fulfilled under certain circumstances controlled by both Mary and the third parties. The p variable ensures that there will be no ambiguities when these actions take place.
Another area of conflict is the notification process. It is ambiguous whether it is an obligation for the data controller to notify the data subject or a right for the data subject to request to be notified. Thus, the question that arises is who will trigger the action. For example, consider the case where a data subject wants to be notified when a specific piece of data is processed. There isn't a right that provides the owner of the data with the ability to request such an action. But the enterprise is also not allowed to contact the data subject unless she/he has consented to such an action. In the law, there isn't an explicit reference on whether the data subject could request to be notified or not. However, there is a "data subject's request" right that allows data subjects to request all the data that a data controller possesses about them. In line with this reference, we will formalise the notification process as an obligation for the data controller.
Furthermore, ambiguities occur with the handling of meta-data. In the logic, meta-data mainly comprises variables and ranges of values that set the context and describe the data subject's consent and revocation preferences. In the high-level models it is not clear what happens with the data subject's control preferences. An interesting example is that of notification: how can an enterprise notify a data subject that their data were deleted completely if it does not keep their email address and the consent and revocation preferences describing the conditions for the action of notification? The following example illustrates both ambiguities, and the formalisation presented proposes a way to address these issues. Consider the case where Mary is offered the opportunity to express notification preferences about access/internal usage/disclosure to third parties of this data. The formalisation is:
{ mOδ ∧ tPδ } notify( m, t, δ ) {(〈 mNδ 〉 notify(m, t, δ ) 〈 true 〉) ∧ tLn† ∧ tLn* }
In this example, Mary expresses her preference to be notified when her data are accessed and processed by third parties. She also chooses to be notified by e-mail. We solve the problem of who will trigger the action of notification by introducing an obligation for the company to notify Mary. Furthermore, Mary controls the possible channels through which she will be notified and the reason that will trigger such an action. Also, by defining the meta-data stored in the company, in every action
performed by the data subject she/he could express exactly the same controls that apply on her/his personal data.
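Two of the resolutions described in this section, the binary variable p that blocks revocation while processing is in progress and the notification obligation placed on the data controller, can be illustrated with a short sketch. The class and method names below are our own and are not part of the EnCoRe logic or architecture.

```python
# Illustrative only: revocation of processing permission is guarded by whether
# the datum is currently being processed (the role of the variable p), and a
# successful revocation records an obligation for the controller to notify the
# data subject through her chosen channel.
class DataController:
    def __init__(self):
        self.processing = set()      # data currently being processed (p is False for these)
        self.obligations = []        # pending notification obligations

    def revoke_processing(self, subject, datum, notify_by="e-mail"):
        # Guard Φ: ψb requires p = true, i.e. the datum is not being processed now.
        if datum in self.processing:
            raise RuntimeError(f"cannot revoke {datum}: processing in progress")
        # Revocation succeeds; record the obligation to notify the data subject.
        self.obligations.append((subject, datum, notify_by))

controller = DataController()
controller.revoke_processing("Mary", "SSC-profile")
print(controller.obligations)   # [('Mary', 'SSC-profile', 'e-mail')]
```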
5 Convergence The formalisation of the case study revealed ambiguities and areas of conflict, enabling us to extend and improve system’s requirements. However, their formalisation could not be effectively addressed with the initial state of the logic. Tackling the ambiguities created both from the formalisation of law, regulation and social factors and from the complexity of the notion of privacy, enhanced the effectiveness of our logic by introducing new actions and rights and enriched its descriptiveness by identifying new variables and options for the data subject to express his preferences. We will briefly mention the novelties in the logic that allowed us to formalise effectively and unambiguously all the requirements for the first case study. We introduced four actions for updating data, enabling data subjects to update data either by deleting the previous data or by linking that with the new or propagate the updates to third parties as well. We defined an action for notification and created an obligation for the data controller providing the means to the data subjects to be notified under certain conditions and through certain communication channels (e-mail). Further to the introduced actions, we identified rights that will determine whether the actions will be completed. The data subject now has the right to be notified, the right to update data and the right to delegate rights to other individuals. Furthermore, the data controller has the right to know the location of every meta-data, enabling the data subject to express preferences on the way that the meta-data will be treated. Last but not least, the effectiveness of the old and new actions was increased by the new variables. We now created variables to determine when the data subject could revoke permissions from the data controller, the way to delete data and the purpose for which the data will be used in order to address the problem of aggregation.
6 Conclusions and Future Work Enabling individuals to control the flow of their personal data is a complex issue. In this paper we focused on consent and revocation controls and their practical implications when formalising high-level requirements. We discussed the ambiguities that occurred in a real-case scenario of a large organisation, during that process. We categorised these issues into two kinds, according to the source of their existence. Furthermore we proposed the use of formal methods in order to address these problems and we described parts of the solution. Future work will focus on validating the extended logic in a different real-case scenario and identifying new ambiguities. As our aim is to develop a general applicable logic, the emerging ambiguities should be minor and solved without extending the logic with more actions and rights. Implementing consent and revocation controls raises technological, legal and business challenges. Thus, we need to combine effectively diverse scientific fields that are not necessarily complementary. Developing a logic for consent and revocation and applying it to a real-case scenario in order to identify and address the emerging difficulties, is the first step towards that objective.
References
1. Westin, A.: Privacy and Freedom. Atheneum, New York (1967)
2. Mont, M.C., Pearson, S., Kounga, G., Shen, Y., Bramhall, P.: On the Management of Consent and Revocation in Enterprises: Setting the Context. Technical Report HPL-2009-49, HP Labs, Bristol (2009)
3. Whitley, E.A.: Information privacy consent and the "control" of personal data. Inform. Secur. Tech. Rep. (2009), doi:10.1016/j.istr.2009.10.001
4. Whitley, E.A.: Perceptions of government technology, surveillance and privacy: the UK identity cards scheme. In: Neyland, D., Goold, B. (eds.) New Directions in Privacy and Surveillance, pp. 133–156. Willan, Cullompton (2009)
5. EnCoRe, http://www.encore-project.info
6. Agrafiotis, I., Creese, S., Goldsmith, M., Papanikolaou, N.: Reaching for informed revocation: Shutting off the tap on personal data. In: Bezzi, M., Duquenoy, P., Fischer-Hübner, S., Hansen, M., Zhang, G. (eds.) Privacy and Identity Management for Life. IFIP AICT, vol. 320, pp. 246–258. Springer, Heidelberg (2010)
7. Agrafiotis, I., Creese, S., Goldsmith, M., Papanikolaou, N.: The Logic of Consent and Revocation (2010) (submitted)
8. Krasnow Waterman, K.: Pre-processing Legal Text: Policy Parsing and Isomorphic Intermediate Representation. In: Intelligent Information Privacy Management Symposium at the AAAI Spring Symposium (2010)
9. Ohm, P.: Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization. University of Colorado Law Legal Studies Research Paper No. 09-12 (2009), http://ssrn.com/abstract=1450006
10. Samarati, P.: Protecting Respondents' Identities in Microdata Release. IEEE Trans. Knowl. Data Eng. 13(6) (2001)
11. Ciriani, V., De Capitani di Vimercati, S., Foresti, S., Samarati, P.: k-anonymity. In: Secure Data Management in Decentralized Systems, pp. 323–353 (2007)
12. Nissenbaum, H.: Privacy as contextual integrity. Washington Law Review 79(1) (2004)
13. Barth, A., Datta, A., Mitchell, J.C., Nissenbaum, H.: Privacy and contextual integrity: Framework and applications. In: SP 2006: Proceedings of the 2006 IEEE Symposium on Security and Privacy, Washington, DC, USA, pp. 184–198. IEEE Computer Society, Los Alamitos (2006)
14. Cranor, L.F.: Web Privacy with P3P. O'Reilly, Sebastopol (2002)
15. Powers, C., Schunter, M.: Enterprise privacy authorization language (EPAL 1.2). W3C Member Submission (2003)
16. Tschantz, M.C., Wing, J.M.: Formal Methods for Privacy. In: Cavalcanti, A., Dams, D.R. (eds.) FM 2009. LNCS, vol. 5850, pp. 1–15. Springer, Heidelberg (2009)
17. EnCoRe Press Briefing, London School of Economics, June 29 (2010)
A Decision Support System for Design for Privacy Siani Pearson and Azzedine Benameur Cloud and Security Lab, Hewlett-Packard, Bristol, UK {siani.pearson,azzedine.benameur}@hp.com
Abstract. Privacy is receiving increased attention from both consumers, who are concerned about how they are being tracked and profiled, and regulators, who are introducing stronger penalties and encouragements for organizations to comply with legislation and to carry out Privacy Impact Assessments (PIAs). These concerns are strengthened as usage of internet services, cloud computing and social networking spread. Therefore companies have to take privacy requirements into account just as they previously had to do this for security. While security mechanisms are relatively mature, system and product developers are not often provided with concrete suggestions from a privacy angle. This can be a problem because developers do not usually possess privacy expertise. In this paper we argue that it would be useful to move beyond current best practice – where a set of searchable privacy guidelines may be provided to developers – to automated support to software developers in early phases of software development. Specifically, our proposal is a decision support system for design for privacy focused on privacy by policy, to be integrated into the development environment. We have implemented a proof of concept and are extending this work to incorporate state-of-the art consent mechanisms derived from the EnCoRe (Ensuring Consent and Revocation) project [1]. Keywords: Decision Support, Expert System, Patterns, Privacy, Software Engineering.
1 Introduction A key challenge for software engineers is to design software and services in such a way as to decrease privacy risk. As with security, it is necessary to design privacy in from the outset, and not just bolt on privacy mechanisms at a later stage. There is an increasing awareness for the need for design for privacy from both companies and governmental organisations [2,3]. However, software engineers may lack privacy knowledge and the motivation to read and inwardly digest long and complicated guidelines. Therefore, to support software engineers in implementing privacy aware systems, it is necessary to move beyond a set of searchable guidelines. In this paper we address this problem by first extracting relevant high-level privacy design concepts from existing guidelines (with a focus on current best practice for privacy by policy [4]), and we then translate these concepts into context-dependent rules and privacy design patterns. This allows us to build decision support systems to help
developers design privacy in early phases of the software development life cycle and potentially also improve design at a later stage.
2 Related Work For some decades, mechanisms have been developed to address the issue of taking security into account in early phases of the software development life cycle. However, until recently this has not been the case for privacy. To address this problem some software companies have issued privacy guidelines for their developers but these guidelines are not easily applicable by developers and often rely on their own interpretations, leading to an error-prone process. Our approach limits errors and potential misunderstanding of guidelines by shifting the reasoning and understanding from developers to a decision support system. Privacy design techniques are not a new concept: various companies, notably Microsoft [2], have produced detailed privacy design guidelines. Cannon has described processes and methodologies about how to integrate privacy considerations and engineering into the development process [5]. Privacy design guidelines in specific areas are given in [6,7], and [3] considers the case of cloud computing. In November 2007 the UK Information Commissioners Office (ICO) [8] (an organisation responsible for regulating and enforcing access to and use of personal information), launched a Privacy Impact Assessment (PIA) [8] process (incorporating privacy by design) to help organizations assess the impact of their operations on personal privacy. This process assesses the privacy requirements of new and existing systems; it is primarily intended for use in public sector risk management, but is increasingly seen to be of value to private sector businesses that process personal data. Similar methodologies exist in Australia, Canada and the USA [9]. This methodology aims to combat the slow take-up of designing in privacy protections at the enterprise level: see [10] for further discussion, [11] for further background, and [12] for a useful classification system for online privacy. There has also been related encouragement of a ‘privacy by design’ approach by the Canadian Privacy Commission. Our approach can be viewed in this context as a manifestation of a design for privacy or privacy by design approach that addresses a core subset of the concerns needed, focused around consent and notice mechanisms. Unlike a Privacy Impact Assessment, which is directed at organizations to provide an assessment of risk related to projects or activities, it is directed at developers in order to help them design privacy into online products and services. Our approach focuses on "privacy by policy", by which means privacy rights are protected through laws and organizational privacy policies, which must be enforced [4]. Privacy by policy mechanisms focus on provision of notice, choice, security safeguards, access and accountability (via audits and privacy policy management technology). Often, mechanisms are required to obtain and record consent. The ‘privacy by policy’ approach is central to the current legislative approach, although there is another approach to privacy protection, which is ‘privacy by architecture’ [4], which relies on technology to provide anonymity. The latter is often viewed as too expensive or restrictive, as it is not suitable for all situations, limits the amount of data available for data mining, research and development, targeting or other business
purposes, and may require more complicated system architectures and expensive cryptographic operations. We consider in this paper a solution to advising developers that is focused on privacy by policy as the elements can more easily be broken down; we plan in future to extend this approach to cover a hybrid approach with privacy by architecture. In order to provide a practical technique for design, we utilise design patterns [13]. Some previous work has been carried out in the privacy design pattern area: [14] describes four design patterns applicable to the design of anonymity systems. These could be integrated into our approach at a later stage, when we move on from considering our current "privacy by policy" rules to extend this to "privacy by architecture" approaches, hybrid approaches and assessment of the relative merits of the patterns. A number of existing tools provide a framework for the generation of decision support systems [15,16,17,18]. Decision support systems have been developed for privacy [19], but not for design related to privacy by policy, nor addressing our focus on suggesting design patterns related to provision of appropriate notice and consent mechanisms. In the security domain several support systems for security have been proposed, the closest approach to our work being SERENITY [20,21,22]. The latter supports developers from the early design phase by providing security solutions, going beyond a set of design patterns to provide both patterns and executable components. However, this approach has not been provided with privacy in mind and does not offer any privacy features. Delessy et al. [23] have discussed how to build upon model-driven development and the use of security patterns in order to secure applications in service-oriented architectures. Laboto et al. [24] have proposed the use of patterns to support the development of privacy policies, but – unlike in our approach – a rule engine was not proposed to automatically select appropriate patterns at the design stage, and the focus again was on security.
3 A DSS Focused on Privacy by Policy In this section we provide more information about the decision support system that we propose to address core aspects of privacy in the software design phase. We first give an overview of how the system operates, and then provide some examples of design patterns and explain more about the rulebase and inference procedures. 3.1 System Overview The architecture of our Decision Support System (DSS) is depicted in Figure 1. The thick solid lines represent input to or within the system while the thick dashed lines are the output of the system. In our DSS, two actors are present: a privacy expert in charge of refining the rules derived from guidelines and of providing an abstract representation of corresponding privacy patterns, and a developer who wishes to integrate privacy into their design and who will achieve this by implementing privacy patterns output by the DSS. The developer is the target end-user of the DSS. As a user of the DSS, the developer provides his/her requirements, as input to a questionnaire
Fig. 1. Decision Support System Architecture
shown by the system. This operation is done through a user-friendly interface but could instead be integrated into standard integrated development environments such as Eclipse. The DSS forwards this input to an inference engine. Based on the particular requirements and context set from the answers to the questionnaire, the engine queries the rule repository to obtain potential applicable patterns. The inference engine uses information about the implementation and abstract representations to reason and produce an output that is a set of candidate patterns matching the developer's requirements and context. The core of the DSS relies on the encoding of best practice guidelines into a machine readable format and this is why the assistance of privacy experts is crucial in order to have a sound rulebase. To date we have analyzed the Microsoft privacy design guidelines [2] and some other documentation related to privacy by policy, and extracted the core concepts. A couple of these concepts are presented as examples in the next section. As discussed
further in Section 5, we are now extending the knowledge base to include design techniques for consent that we have been developing within the EnCoRe project [1]. 3.2 Design Patterns Design patterns are a way to capture solutions to commonly occurring problems. In this section we present two examples related to the concept of "explicit consent". For this purpose we use radio buttons and checkboxes as some examples of how explicit consent may be provided. Design Pattern 1: Name: checkbox Classification: opt-in consent Intent: provide an opt-in consent mechanism Motivation: an explicit consent experience that is opt-in means that the proposition presented will only occur after the customer takes an action Context: you are presenting an option with a checkbox Solution: the checkbox cannot be pre-checked: the customer must actively check the box with text containing the privacy choice in order to enable the data collection, use, or transfer. Use case: checkboxes, one of which is not pre-checked and is associated with the text "I want to help make Microsoft products and services even better by sending Player usage data to Microsoft" Related patterns: radio button Design Pattern 2: Name: radio button Classification: opt-in consent Intent: provide an opt-in consent mechanism Motivation: an explicit consent experience that is opt-in means that the proposition presented will only occur after the customer takes an action Context: you are providing multiple choices to the customer Solution: the customer must select an option that was not selected by default. The design should not select any radio buttons by default, i.e. it should require the customer to actively select one of them, to consent or not consent to collection or transfer of data. Use Case: radio buttons, one of which is not pre-checked and is associated with the text "I want to sent statistic data about my usage of Chrome to Google" Related patterns: checkbox These design patterns describe mechanisms for providing consent and explicit consent using radio buttons and checkboxes and their related application context. The software developer, via UIs, provides his/her input to the decision support system that triggers a set of rules to find matching patterns related to both the input and the context. We then map these results to design patterns that show the developer how in practice this can be achieved. In this section we have presented two simple patterns related to the expression of "explicit consent". However there are a number of other design patterns that provide
alternative methods of consent (including implicit consent) and in addition various mechanisms for provision of notice (both discoverable and prominent), each suitable for different contexts. We have defined such patterns, based upon techniques described in the Microsoft developer guidelines, and these are used within our initial prototype, as described in Section 4. In addition, we have been extending this approach to build up a broader set of design patterns, including more complex consent management mechanisms, usage of obligations and sticky policies. An example from this set is of flexible policy negotiation between two entities (e.g. the data subject and data controller), where a number of different protocols for such negotiation (based upon inputting initial policies from both parties and then running a negotiation protocol to try to achieve agreement on a common, or new, set of policies) can each generate design patterns based upon the following generic template: Design Pattern: Name: Negotiation Classification: Data and policy management Intent: to allow negotiation of preference attributes to share only what is absolutely necessary Motivation: A scenario where this would be useful is when a service provider (SP) subcontracts services, but wishes to ensure that the data is deleted after a certain time and that the SP will be notified if there is further subcontracting Context: You are a user interacting with a system and want to share the least possible information to be able to interact with this system. You define a subset of your personal data. This subset represents the data you would absolutely not want to share. You can also express a sharing option on each attribute of your personal data. Problem: Systems have policies that require a certain subset of your personal data to be able to perform some transactions. The key problem here is to find a matching between the user’s preference over his/her personal attributes and the system requirements/organizational policies. The negotiation is here to provide more flexible policies and the least possible information disclosure for users. Solution: Use a policy negotiation protocol. This protocol requires users to express preferences for each of their PII’s elements. Then the negotiation consists of mandatory attribute requests from the system and seeing if this matches user preferences. A successful negotiation is when this exchange leads to a minimized set of PII attributes being shared. Design issues: • Need to have a mechanism to express preferences over each attributes for users. • Not compatible with legacy systems. Consequences: Benefits: privacy preferences of users are respected; systems benefit because it may lead to more usage as users are sharing only attributes that they want. Related patterns: Privacy Policy As described in Section 5, the extension of our design pattern repository is the subject of ongoing research.
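Purely as an illustration of the template above (the implementation described in Section 4 condenses patterns to name, classification, source and solution), a pattern could be represented in code roughly as follows; the field names mirror the template headings and are not taken from the authors' implementation.

```java
import java.util.List;

/** Minimal, hypothetical representation of the design pattern template used
 *  above; field names mirror the template headings and are illustrative only. */
public record DesignPattern(
        String name,
        String classification,   // e.g. "opt-in consent", later used for rule matching
        String intent,
        String motivation,
        String context,
        String solution,
        String useCase,
        List<String> relatedPatterns) {

    public static void main(String[] args) {
        DesignPattern checkbox = new DesignPattern(
                "checkbox",
                "opt-in consent",
                "provide an opt-in consent mechanism",
                "an opt-in experience means the proposition only occurs after the customer acts",
                "you are presenting an option with a checkbox",
                "the checkbox cannot be pre-checked; the customer must actively check it",
                "a checkbox that is not pre-checked, associated with a privacy choice text",
                List.of("radio button"));
        System.out.println(checkbox.name() + " -> " + checkbox.classification());
    }
}
```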
3.3 Rules and Representations In order to be able to reason about different privacy patterns, natural language is not suitable; instead, we use a more formal representation of the patterns introduced in subsection 3.2. in order to make inferences about patterns. In this section we consider a representation for our system of deduction, the knowledge base and the inference mechanisms that serve as a foundation for the implementation of such a system described in the following section. System of Deduction
To formalize our system of deduction we may use the representation of a propositional logic L1, where L1 = ⟨P, {¬, &, ∨, →}, Z⟩, where P is a finite alphabet of propositional variables, Z are the rules of inference, there are no axioms, and the connectives negation ¬, conjunction &, disjunction ∨ and material implication → are used to build up well-formed formulae using the symbols of P according to the standard inductive process for doing this (see for example [25] for further details). The rules of inference Z correspond to the exact propositional formulation used: for example, reductio ad absurdum, double negative elimination, conjunction introduction, conjunction elimination, disjunction introduction, disjunction elimination, modus ponens, conditional proof – or else, for example, if a sequent calculus were used, the axiom and inference rules of the propositional version of Gentzen's sequent calculus LK (see [25] for further details). The logical representation above is minimal, in the sense that a propositional logic is used as the basis for the representation. There is scope for a more complex representation to be used for this problem, notably modal logic – where we could distinguish between necessary, preferable and possible relationships [26], or possibilistic logic, where necessity-valued formulae can represent several degrees of certainty [27]. Let us now consider our system. In our case, as an initial example, P = CN ∪ Con, where CN is the set of consent and notice requirements, and Con is the set of contextual settings. We define:
CN = {C, IC, EC, CP, CA, SMC, AP, OU, OI, N, DN, PN}
Con = {S, E, D, CF, CS, EA, CPI, SI, U, A, AC, AS, SP, TP, STP, PC, SO, T, TA}
where the meaning of these symbols is as follows:
C: consent is required
IC: implicit consent is required
EC: explicit consent is required
CP: consent is required by a parent
CA: consent is required by an application or system administrator
SMC: separate mechanism for consent required
AP: authentication is required by a parent
OU: opt-out is required
OI: opt-in is required
N: notice is required
DN: discoverable notice is required
PN: prominent notice is required
S: sensitive PII is stored
E: data is used in ways that exceed the original notice
D: discrete transfer of anonymous and pseudonymous data
CF: file extensions already associated with another application are being changed
CS: users' PII is being exposed in a sharing or collaboration feature
EA: anonymous or pseudonymous data is being exposed in a sharing or collaboration feature
CPI: there is collection and disclosure of children's PII
SI: installation of software is involved
U: automatic update of software is involved
A: anonymous data is continuously collected and transferred
AC: sensitive PII is stored for the purpose of automatic completion
AS: age will be exposed in a sharing or collaboration feature
SP: PII transferred will be used for secondary purposes
TP: PII is transferred to or from the customer's system
STP: PII is shared with an independent third party
PC: PII will be stored in a persistent cookie
SO: sensitive information will be transferred and retained in a 1-time transaction
T: data will be transferred from the server over the Internet
TA: PII is transferred to an agent
Knowledge Base
Our knowledge base kb is a finite and consistent set of formulae pi, for i = 1…n, where each pi is a propositional formula. We assume the set of propositional formulae occurring in kb (denoted Propkb) forms a propositionally consistent set and as such does not lead to a contradiction. The formulae in kb have as possible interpretations all classical interpretations ("worlds") I of the propositional variables P, i.e. these are "models" (denoted |=) of the propositional formulas in kb. The knowledge base is able to answer queries based on |=I in L1. That is, the input is a propositional query formula p of L1 and evaluation of this query outputs a formula q of L1, such that q is true under the interpretation I. We initialize the knowledge base to a set of well-formed formulae of L1, which are the privacy consent and notice rules, which have truth value T in the current interpretation. We also, as described below, add an additional well-formed formula to the knowledge base that corresponds to the contextual settings of the UI from the user and that also has truth value T in the current interpretation. The initial settings of the knowledge base are as follows: Propkb = {C → IC ∨ EC, EC → OU ∨ OI, N → PN ∨ DN, PN → DN, SMC → C, S → C & N, E → SMC, D → C & DN, CF → C & N, CS → C & PN, EA → C & N, CPI → CP & AP, SI → EC & PN, U → EC & PN, A → EC & PN, AC → EC, AS → EC & PN, SP → EC & PN & SMC, TP → OI & DN, STP → OI & PN, PC → OI & SMC, SO → CA, T → DN, TA → DN}. The interpretations possible for a given case (i.e. the particular models that are possible corresponding to a system usage) will be restricted by the choices selected by
the end user when answering the questionnaire, i.e. if the parameter selected by an end user via the questionnaire (e.g. sensitivePIIinvolved, PIIexposedincollaboration, collectionofchildsPII) is set to "yes", the corresponding propositional symbol (from the set Con described above) is assigned the truth value T; and if the parameter is set to "no" then the corresponding propositional symbol would be assigned the truth value F. Multiple interpretations are possible if not all values are known. Corresponding to this interpretation, we add a formula u ∈ L1 to Propkb, where u is a clause made up of a conjunction of literals such that if pi ∈ Con has truth value T in the current interpretation we add the literal pi, but if qj ∈ Con has truth value F in the current interpretation we add the literal ¬qj, i.e. u = p1 & … & pr & ¬q1 & … & ¬qt, where pi, qj ∈ Con; u corresponds to the contextual settings of the UI from the user, and u will have truth value T in the current interpretation.
Deduction
In summary, we deduce the requirements (new well-formed formulae of L1) from Propkb using the rules of deduction Z (and then map formulae with certain properties onto design patterns as the output of the tool). The inference rules are applied to Propkb to deduce new formulae of L1. In particular, we want to find qi such that Propkb ⊢ qi. As we deduce these, we add them to Propkb. Alternatively, we can use a backwards chaining search method, starting with each qi. We assume that the user is asked all questions, or that those that are not asked can have the associated propositions in Con given a false truth value. In this way, we can assume the conjunction of all the propositional variables in Con (or their negations, if that has been indicated according to the user response). Then, what is required is for the system to deduce which propositions (or their negations) from the set CN follow. To make the inference process easier, we may convert Propkb into conjunctive normal form (CNF), i.e. into a conjunction of clauses where each clause is a disjunction of literals (atomic formula or its negation), and then reduce this further (using the commutative properties of ∨ and &, and the logical equivalence of (¬A ∨ B) & A to B & A) [28] in order to highlight the resultant literals. Thus, Propkb can be reduced to u & v, where v is a clause in CNF that represents the additional knowledge derived by the system. In a separate process we then map the conjuncts within v onto design patterns.
Example Let us consider the case where anonymous data is being collected and transferred within the system that the developer is designing. A central rule that corresponds to consent and notice advice for this situation that is given within the privacy guidelines in [2] may be paraphrased as “if anonymous data is continuously collected and transferred then explicit consent and prominent notice is required” and this may be represented using the formalism above as ‘A → EC & PN’, which is one of the rules in the knowledge base of our system.
If we have a scenario where anonymous data is continually collected, transferred, and shared with others, then according to the developer’s answers in the questionnaire, in this particular case we may assume the truth of the following formula:
¬S & ¬E & ¬D & ¬CF & ¬CS & EA & ¬CPI & ¬SI & ¬U & A & ¬AC & ¬AS & ¬SP & ¬TP & ¬STP & ¬PC & ¬SO & ¬T & ¬TA
By the mechanisms described above we may deduce the following additional formulae for this scenario that are of particular interest: C, N, EC, PN, OU ∨ OI, DN. By focusing on the more specific instances, we have the requirements: EC, PN. As a follow-on inference process, these propositions are then mapped by our system to a choice of design patterns, i.e. some of these qi ∈ Propkb (in this case, EC and PN) are used to map onto design patterns as the output of the tool. So, the user will be shown a choice of patterns to enable him/her to implement explicit consent and prominent notice. This mapping may be more complex if desired than 1-1 or 1-many: for example, we could use a Boolean trigger that includes other contextual conditions (corresponding to the values of selected propositions in P) to output a suggested design pattern, and this can be useful in refining the design patterns to particular contexts.
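As a minimal, illustrative sketch of this deduction step (not the actual implementation, which encodes the rules in JBoss Drools as described in Section 4), the following forward-chains over a handful of the definite rules from Propkb; rules with disjunctive consequents, such as C → IC ∨ EC, are omitted for simplicity.

```java
import java.util.*;

/** Illustrative forward chaining over definite rules of the form
 *  antecedent -> {consequent symbols}; a sketch only, not the DSS implementation. */
public class ConsentRuleDemo {

    // A small subset of Propkb, limited to rules whose consequent is a conjunction.
    private static final Map<String, Set<String>> RULES = Map.of(
            "EA",  Set.of("C", "N"),    // EA  -> C & N
            "A",   Set.of("EC", "PN"),  // A   -> EC & PN
            "AS",  Set.of("EC", "PN"),  // AS  -> EC & PN
            "PN",  Set.of("DN"),        // PN  -> DN
            "SMC", Set.of("C")          // SMC -> C
    );

    /** Repeatedly applies the rules until no new symbol can be derived. */
    static Set<String> deduce(Set<String> contextTrue) {
        Set<String> derived = new HashSet<>(contextTrue);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, Set<String>> rule : RULES.entrySet()) {
                if (derived.contains(rule.getKey()) && derived.addAll(rule.getValue())) {
                    changed = true;
                }
            }
        }
        return derived;
    }

    public static void main(String[] args) {
        // Scenario from the example: A and EA hold, everything else is false.
        Set<String> derived = deduce(Set.of("A", "EA"));
        derived.removeAll(Set.of("A", "EA"));
        System.out.println(derived);   // the derived requirements, e.g. [C, N, EC, PN, DN]
    }
}
```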
4 Implementation We have presented above the rule formalism used within the inference engine. These rules are implemented as executable components in Java using JBoss Drools [29]. Our DSS is implemented as a plug-in for a standard Integrated Development Environment (IDE), namely Eclipse [30]. The concept of such an extension relies on views, which are different elements of the user interface, and on perspectives, which are layouts with several views. The Decision Support perspective includes three views for three different purposes: the privacy expert view, the developer view and the output view. The privacy expert view is the dedicated user interface accessed by privacy experts to add design patterns. In our implementation we decided to condense the format of the design patterns (considered above in section 3.2) to only include the following fields: name, classification, source, solution. The classification is the basis for automated analysis about the relationship between patterns. The solution can contain text, code snippets and examples. The context is no longer defined within the pattern itself, but within the rules of the system. The developer view presents the questions that developers need to answer in order for our DSS to offer a set of patterns addressing their issues. These questions represent the contextual settings of our system and in our implementation they correspond directly to the propositions within Con, as introduced in section 3.3. So the developer would be asked questions including the following examples, and given the option to answer 'yes', 'no' or 'unsure', together with associated help where required as to what the questions mean:
• is personal data transferred to or from the customer's system?
• will the transferred personal information be used for secondary purposes?
• is personal data shared with an independent third party?
• is there collection or disclosure of children's personal data?
• is an automatic update of software involved?
• is sensitive data stored for the purposes of automatic completion?
• is personal data stored in a persistent cookie?
• will sensitive information be transferred and retained in a one-time transaction?
The set of candidate patterns is then presented in the output view, from which developers can select and implement the most suitable ones. The logic of the system is implemented using a rules engine [29]. Figure 2 is an example of the implementation of this part of the knowledge base: in this case, AS → EC & PN. This propositional formula is linked to a question (i.e. “is age exposed in a sharing or collaboration feature?”, with identity number 6, user options to answer ‘yes’, ’no’ or ‘unsure’ and associated help) and the rule shown in Figure 2 expresses that if the answer given by the user to this question is ‘yes’ then the list of requirements should include Explicit Consent and Prominent Notice. Similarly, there are analogous JBoss rules that represent the other inference rules of the system, in our case that correspond to all the formulae of Propkb. Here, the Java function ‘addToRequirementsList’ will delete duplication and the more general requirements (that duplicate the more specific requirements in the list) to output the most specific requirements. This final list is then mapped to design patterns, using the classification field of the design patterns, and these are output in a list to the developer.
Fig. 2. Example rule implementation
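As a rough, hypothetical illustration of the pruning attributed to 'addToRequirementsList' above (dropping duplicates and requirements that are more general than others already present), consider the following sketch. The generalisation map is inferred from the rules C → IC ∨ EC, N → PN ∨ DN and PN → DN; it is not taken from the authors' code.

```java
import java.util.*;

/** Hypothetical sketch of pruning a requirements list down to its most
 *  specific members; an approximation of the behaviour described in the text. */
public class RequirementsPruner {

    // A key may be dropped when any of its more specific counterparts is present.
    private static final Map<String, Set<String>> MORE_SPECIFIC = Map.of(
            "C",  Set.of("IC", "EC"),
            "N",  Set.of("PN", "DN"),
            "DN", Set.of("PN"));

    static Set<String> mostSpecific(Collection<String> requirements) {
        Set<String> input = new LinkedHashSet<>(requirements);   // removes duplicates
        Set<String> result = new LinkedHashSet<>(input);
        for (String req : input) {
            Set<String> specifics = MORE_SPECIFIC.getOrDefault(req, Set.of());
            if (specifics.stream().anyMatch(input::contains)) {
                result.remove(req);   // a more specific requirement is already listed
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Requirements deduced in the example of Section 3.3
        System.out.println(mostSpecific(List.of("C", "N", "EC", "PN", "DN")));
        // prints [EC, PN]
    }
}
```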
5 Further Work While this approach enhances privacy design, it is possible to extend the rulebase in order to provide more accurate reasoning capabilities and a more comprehensive set of output choices.
A next step is to build up our repository of privacy patterns, for example by incorporating other privacy guidelines (including those from HP and Sun). We are also investigating integration of other methodologies. Furthermore, within the EnCoRe project, we are currently developing techniques for providing enhanced consent and revocation [1] and we are extending our design pattern set and rulebase to capture such mechanisms. This approach will enable assessment of legacy systems and other contextual requirements in determining an appropriate approach to consent and revocation solutions, as well as a broader range of solutions. In order to import, export or interface our knowledge base to other systems using a different rule language (which may be the case in particular for different domains) we plan to consider the use of the Rule Interchange Format [31] to allow such interactions. Although it did not seem necessary in order to capture the MS guidelines, our representation allows for capturing more complex rules if desired (for example, A & B → C, and arbitrarily complex expressions of L1). These correspond in our implementation to the use of Boolean trigger conditions within the JBoss rules. It is also possible to ask the user the questions in a more intelligent way, in which questions are nested and only asked if appropriate, in order to reduce the number of questions asked. For example, if the answer to 'is users' personal data being exposed in a sharing or collaboration feature?' is 'no' then there is no need to ask the user the question 'will age be exposed in a sharing or collaboration feature?' because age is a type of personal data. This can be done by using templates of questions in which some are hidden, by using ontologies to detect semantic hierarchies, or else by defining JBoss rules that govern the generation of the questions themselves (see for example [32] for further details of such an approach). We also wish to include further analysis within the pattern selection process, potentially involving grading of patterns. At present we are using a simplified mapping between requirements and design patterns, using the classification field. So for example, Design Pattern 1 above has a classification of 'opt-in consent', which indicates that it is a candidate for satisfying the opt-in consent requirement. In our implementation, the system groups together patterns for all privacy contexts that have been set to T, but this decreases the granularity of the suggestions. We plan to explore the use of more complex mappings, which for example allow a hierarchy of patterns to be defined more explicitly, and also allow other background information (which can be gathered within the questionnaire) to be taken into account. More specifically, by grouping suggested patterns based on the corresponding privacy contexts from which they were derived, patterns that need to be implemented multiple times can be suggested with reference to each individual context, and the system output can display the corresponding context satisfied by implementing the patterns.
6 Conclusions We have presented an approach to support developers in designing privacy in early phases of development. As a first step, we have taken Microsoft developer privacy guidelines, encoded them into rules to allow reasoning about privacy design and
implemented a working decision support system. In doing this, we have moved beyond the state of the art (being a set of searchable privacy guidelines), to provide an architecture and a system that supports decision, selection and integration of privacy patterns, and to start building up a knowledge base that offers both privacy-enhancing mechanisms (in the form of a privacy pattern repository) and also rules that advise on the appropriate usage of those mechanisms.
References
1. The EnCoRe project: Ensuring Consent and Revocation (2008), http://www.encore-project.info
2. Microsoft Corporation, "Privacy Guidelines for Developing Software Products and Services", Version 2.1a (April 26, 2007)
3. Information Commissioner's Office, "Privacy by Design", Report (November 2008), http://www.ico.gov.uk
4. Spiekermann, S., Cranor, L.: Engineering Privacy. IEEE Transactions on Software Engineering 35(1) (January/February 2009)
5. Cannon, J.C.: Privacy: What Developers and IT Professionals Should Know. Addison Wesley, Reading (2004)
6. Patrick, A., Kenny, S.: From Privacy Legislation to Interface Design: Implementing Information Privacy in Human-Computer Interactions. In: Dingledine, R. (ed.) PET 2003. LNCS, vol. 2760, pp. 107–124. Springer, Heidelberg (2003)
7. Bellotti, V., Sellen, A.: Design for Privacy in Ubiquitous Computing Environments. In: Proc. 3rd European Conference on Computer-Supported Cooperative Work, pp. 77–92 (1993)
8. Information Commissioner's Office, PIA handbook (2007), http://www.ico.gov.uk/
9. Office of the Privacy Commissioner of Canada, "Privacy impact assessments", Fact Sheet (2007), http://www.privcom.gc.ca/
10. Information Commissioner's Office, "Privacy by Design", Report (2008), http://www.ico.gov.uk
11. Jutla, D.N., Bodorik, P.: Sociotechnical architecture for online privacy. IEEE Security and Privacy 3(2), 29–39 (2005)
12. Spiekermann, S., Cranor, L.F.: Engineering privacy. IEEE Transactions on Software Engineering, 1–42 (2008)
13. Alexander, C., Ishikawa, S., Silverstein, M., Jacobson, M., Fiksdahl-King, I., Angel, S.: A Pattern Language: Towns, Buildings, Construction. Oxford University Press, Oxford (1977)
14. Hafiz, M.: A collection of privacy design patterns. In: Pattern Languages of Programs, pp. 1–13. ACM, New York (2006)
15. Dicodess: Open Source Model-Driven DSS Generator (2009), http://dicodess.sourceforge.net
16. XpertRule: Knowledge Builder (2009), http://www.xpertrule.com/pages/info_kb.htm
17. Lumenaut: Decision Tree Package (2009), http://www.lumenaut.com/decisiontree.htm
18. OC1 Oblique Classifier 1 (2009), http://www.cbcb.umd.edu/~salzberg/announce-oc1.html
19. Pearson, S., Sander, T., Sharma, R.: Privacy Management for Global Organizations. In: Garcia-Alfaro, J., Navarro-Arribas, G., Cuppens-Boulahia, N., Roudier, Y. (eds.) DPM 2009. LNCS, vol. 5939, pp. 9–17. Springer, Heidelberg (2010)
20. SERENITY: System Engineering for Security and Dependability (2009), http://www.serenity-project.org
21. Kokolakis, S., Rizomiliotis, P., Benameur, A., Kumar Sinha, S.: Security and Dependability Solutions for Web Services and Workflows: A Patterns Approach. In: Security and Dependability for Ambient Intelligence. Springer, Heidelberg (2009)
22. Benameur, A., Fenet, S., Saidane, A., Kumar Sinha, S.: A Pattern-Based General Security Framework: An eBusiness Case Study. In: HPCC, Seoul, Korea (2009)
23. Delessy, N.A., Fernandez, E.B.: A Pattern-Driven Security Process for SOA Applications. In: ARES, pp. 416–421 (2008)
24. Lobato, L.L., Fernandez, E.B., Zorzo, S.D.: Patterns to Support the Development of Privacy Policies. In: ARES, pp. 744–774 (2009)
25. Mendelson, E.: Introduction to Mathematical Logic. D. Van Nostrand Co., New York (1964)
26. Blackburn, P., de Rijke, M., Venema, Y.: Modal Logic. Cambridge University Press, Cambridge (2001), ISBN 0-521-80200-8
27. Benferhat, S., Dubois, D., Prade, H.: Towards a possibilistic logic handling of preferences. Applied Intelligence 14(3), 303–317 (2001)
28. Bundy, A.: The Computer Modelling of Mathematical Reasoning, 2nd edn. Academic Press, London (1986)
29. JBoss, Drools (2010), http://www.jboss.org/drools/
30. Eclipse (2010), http://www.eclipse.org/
31. W3C, Rule Interchange Format (2010), http://www.w3.org/2005/rules/wiki/RIF_Working_Group
32. Pearson, S., Rao, P., Sander, T., Parry, A., Paull, A., Patruni, S., Dandamudi-Ratnakar, Sharma, P.: Scalable, Accountable Privacy Management for Large Organizations. In: 2nd International Workshop on Security and Privacy Distributed Computing, Enterprise Distributed Object Conference Workshop, pp. 168–175. IEEE, Los Alamitos (2009)
A Multi-privacy Policy Enforcement System Kaniz Fatema, David W. Chadwick, and Stijn Lievens University of Kent, Canterbury, Kent, UK {k.fatema,D.W.Chadwick,S.F.Lievens}@kent.ac.uk
Abstract. Organisations are facing huge pressure to assure their users about the privacy protection of their personal data. Organisations may need to consult the privacy policies of their users when deciding who should access their personal data. The user’s privacy policy will need to be combined with the organisation’s own policy, as well as policies from different authorities such as the issuer of the data, and the law. The authorisation system will need to ensure the enforcement of all these policies. We have designed a system that will ensure the enforcement of multiple privacy policies within an organisation and throughout a distributed system. The current paper is an enhanced version of [1] and it takes the research one step further. Keywords: Privacy Policy, AIPEP, Master PDP, Conflict Resolution, Sticky Policy.
1 Introduction Many web sites today collect PII (Personal Identity Information) such as name and address from users through online registration, surveys, user profiles, and online order fulfilment processes etc. Also different personal data such as educational records, health data, credit card information and so on are collected by different organisations in order to provide consumers with services. An example of such a service is an online job agency where people post their CVs in order to hunt for jobs worldwide. Once released, users lose control of their personal data. But personal data like CVs which contain sensitive personal information may invite not only job offers but also identity theft. Losing PII has serious consequences ranging from significant financial loss to becoming a suspect in a crime which was committed with a stolen ID. Innocent people have been arrested due to a crime committed by an identity thief [2]. In the UK, the number of ID thefts is an alarming 19.86% higher in the first quarter of 2010 compared with the same period in 2009 [3]. About 27,000 victims were recorded by CIFAS members during the first 3 months of 2010 [3]. As a consequence concerns for the privacy of electronic private data are rising day by day [4, 5]. Hence the necessity for more technical controls over personal data collected online in order for users to gain more confidence and trust about the use of their personal data. Technical controls will help to protect personal data from being misused as well as enforce privacy laws so that personal data loss from reputable organisations such as HSBC bank [6] or Zurich Insurance [7] may be avoided in future.
Policy based systems are now well established [8, 9]. They rely on an application independent policy decision point (PDP) to make authorization decisions, and an application dependent policy enforcement point (PEP) to enforce these decisions. The model assumes that all the policies are written in the same language and are evaluated by a single PDP. However in a federated identity management system we cannot assume that every service provider (SP) and identity provider (IdP) will use the same policy language for specifying their rules. This is because different policy languages support different rule sets and hence support different requirements. Today we have many examples of different policy languages e.g. XACMLv2 [10], XACMLv3 [11], PERMIS [12], P3P [13], Keynote [14] etc. and even more PDP implementations. For example, P3P was designed specifically to express privacy policies, whereas the others were designed as access control or authorization policy languages. It is simply not possible to construct policies that satisfy every access control and privacy requirement using a single policy language or PDP. Therefore we need an infrastructure that can support multiple PDPs and multiple policy languages. Private data should be protected by the policy of its owner. The sticky policy paradigm [15] ensures that private data is stuck with its policy not only within the initial system but also when transferred between systems. Obligations are actions that must be performed when a certain event occurs. When the event is reading PII, then an obligation may require the PII subject to be notified. Some obligations may need to be performed along with the enforcement of an authorization decision, others before or after the enforcement [16]. Privacy protecting systems thus need an obligations service, ideally with a standard interface so that it can be called from multiple places in an application. In this paper we propose an advanced multi-policy authorization infrastructure that will provide privacy protection of personal data using multiple PDPs, sticky policies and obligations. The rest of the paper is structured as follows. Section 2 reviews related research. Section 3 discusses the architecture and components of the proposed system. Section 4 discusses the Sticky Policy implementation strategy whilst Section 5 describes the conflict resolution policy. Some use case scenarios are provided in Section 6. Details of our implementation are provided in Section 7. Section 8 concludes by discussing our future plans.
2 Related Research IBM’s security research group has performed research on privacy protection of customer's data collected by enterprises [17-21]. They used the sticky policy paradigm where personal data is associated with its privacy policy and they are passed together when exchanging data among enterprises [17-19, 21]. But they did not provide a way to accommodate different policy languages. Also the obligations they are providing are just activity names such as ‘log’, ‘notify’, ‘getConsent’ etc. [18]. They also did not provide a way to actually enforce the obligations which our system does. HP researchers [22, 23] have also been working on providing privacy to PII by enforcing obligations. They have provided a way of transmitting encrypted confidential data with obligations to other parties by obfuscation of the data [22]. Nevertheless, the work only describes obligations related to privacy and does not provide a uniform
solution to both access control and privacy. Their work does not consider policies from different authorities nor does it integrate multiple policy languages. Qun Ni et al. [24, 25] have defined the privacy-related access control model P-RBAC to support privacy-related policies. This model theoretically associates data permissions with purposes, conditions and obligations. However, the model is too complex to be implemented practically. It has been claimed [26, 27] that the privacy policy defined by the owner of personal data should have the highest priority. But the fact is that the Law should have the highest priority. No one should be able to break the Law. No other previous work has focused on this issue. In our system we have implemented a Law PDP by converting the legal requirements into an XACML policy and this Law PDP is always given the highest priority. For example, if there is a court order for seeing someone's personal data neither the person nor the data controller can deny access to the data. To the best of our knowledge, no previous work has been concerned with integrating the policies of the law, data subject and data controller together. The PrimeLife Policy Language (PPL) [28, 29] is an extension of XACML v3 which offers access control and usage control under the same structure. PPL has a new obligation handling mechanism which integrates a set of acceptable values for each obligation parameter and thus it solves the problem of "overdoing" obligations [30]. Their work mainly considered two groups of authors of privacy policy, the Data Controller and the Data Subject [29], whereas we consider the Law and the Issuer as well. An automated matching is performed between the Data Controller's and Data Subject's policies so that any mismatching elements do not appear in the final sticky policy [29]. In contrast, our system allows all the policies from all types of authors, and we have a sophisticated, automated, dynamic conflict resolution policy that resolves the conflicts of the decisions returned by the different policies. Moreover, we have considered that these policies can be written in different languages, which leads to the need for multiple policy language support.
3 The Authorisation System Suppose that a health service provider wants to protect the privacy of personal health data (i.e. the Dr's note, history of treatment of the patient, diagnostic test report and so on) through their authorisation system. To provide privacy the authorisation system will need to include privacy rules from different authors such as the law, issuer (i.e. Dr) and data subject (the person). Suppose the data subject wants to share a part of his personal data with a health insurance company to recover the cost of treatment. When the data is shared with the insurance company it is expected that the policies related to the data will also be passed with the data and the receiving system will be capable of handling policies from different health service providers, which may be written in different policy languages. Also it is expected that the authorisation system will be able to enforce obligations such as sending email or keeping a secure log of accesses. Consequently the authorisation system will need to provide the following features: a. Enforcing multiple policies from multiple authors b. Handling sticky policies c. Distributed policy enforcement
d. Support for multiple policy languages e. Obligation enforcement. Our proposed system has the following capabilities. a.
Enforcing multiple policies from multiple authors: In a traditional access control system only the organisation’s authority can set the access control policy which makes the system unsuitable for privacy protection. Our system will accept policies from different authorities and will resolve conflicts between decisions returned by different policies according to a sophisticated, automated, dynamic conflict resolution policy. b. Handling sticky policies: Our system will accept policies stuck to PII, will store the policies, enforce them every time the PII is accessed, and will return them when the PII is transferred to a remote site. c. Distributed policy enforcement: Our system ensures that sticky privacy policy are enforced within the current system and also in the receiving system, by a binding legal agreement between the two parties, such that if an organisation accepts data with a sticky policy, it confirms that its system will only store the data if it can satisfy the obligation to start one or more PDPs that support the received policies. d. Support for multiple policy languages: Our system supports an arbitrary number of PDPs that each utilise a different policy language. e. Obligation enforcement: Our system supports the enforcement of arbitrary obligations either before, after or whilst the access decision is enforced. The system is extensible and new obligations can easily be incorporated.
In order to satisfy the various requirements presented above we introduce several new components into the privacy preserving advanced authorization infrastructure. These are explained more fully in [1]. In this section a short description is given only. Firstly we introduce an application independent policy enforcement point, the AIPEP. The AIPEP is so called because it enforces the application independent obligations and coordinates all the other components of the application independent authorization infrastructure. When the AIPEP receives an authorization decision query message (step 1 in figure 1), it first calls the CVS to validate any credentials that are contained in the message (step 2 in figure 1). If the message contains a sticky policy/ies then this/these will be stored in the policy store. The AIPEP retains a manifest which records which CVSs and PDPs are currently spawned and which policies each is configured with. The AIPEP tells the Master PDP which set of spawned PDPs to use for a particular authorization decision request. The Credential Validation Service (CVS) is the component that validates users’ credentials by checking that each credential issuer is mentioned in the credential validation policy directly, or that the credential issuer has been delegated a privilege by a trusted Attribute Authority (AA) either directly or indirectly (i.e. a chain of trusted issuers is dynamically established controlled by the Delegation Policies of the Source of Authority and the intermediate AAs in the chain). The Credential Validation Policy, written by the SOA, contains rules that govern which attributes different AAs are trusted to issue to which user groups, along with a Delegation Policy for each AA. We recognize that in distributed systems the same credential/attribute may be known by different names e.g. PhD, D Phil, Dr.-Ing etc. For this reason we introduce an
Ontology Mapping Server (see below) which knows the semantic relationships between attribute names. If two names are semantically related then the CVS can determine if an unknown attribute is valid or not. Once the CVS has finished validating the subject’s credentials, these are returned to the AIPEP as standard XACML formatted attributes (step 5), ready to be passed to the Master PDP.
Fig. 1. The privacy preserving advanced authorization system
In order to evaluate multiple authorization policies in different languages we introduce a new conceptual component called the Master PDP. The Master PDP is responsible for calling multiple PDPs (step 7) as directed by the AIPEP, obtaining their authorization decisions (step 8), and then resolving any conflicts between these decisions, before returning the overall authorization decision and any resulting obligations to the AIPEP (in step 9). Each of the policy PDPs supports the same interface, which is the SAML profile of XACML [31]. This allows the Master PDP to call any number of subordinate PDPs, each configured with its own policy in its own language. This design isolates the used policy languages from the rest of the authorization infrastructure, and the Master PDP will not be affected by any changes to any policy language as it evolves or by the introduction of any new policy language. Of course, new policy languages will require new PDPs to be written to interpret them, and these new PDPs will require new code in the PDP/CVS factory object so that it knows how to spawn them on demand. But this is a one-off occurrence for each new policy language and PDP that needs to be supported by the infrastructure. The policy store is the location where policies can be safely stored and retrieved. If the store is trusted then policies can be stored there in an unsecured manner. If the
store is not trustworthy then policies will need to be protected e.g. digitally signed and/or encrypted, to ensure that they are not tampered with and/or remain confidential. When the AIPEP stores a policy in the policy store, it provides the store with the StickyPolicy element, and the globally unique Policy ID (PID). Each policy must have a globally unique id so that it can be uniquely referenced in the distributed system, primarily for performance reasons, so that when a sticky policy is moved from system to system, the receiver can determine if it needs to analyse the received policy or not. Already known PIDs don’t need to be analysed, whereas unknown PIDs will need to be evaluated to ensure that they can be supported, otherwise the incoming data and sticky policy will need to be rejected. This design cleanly separates the implementation details of the policy store from the rest of the infrastructure, and allows different types of policy store to be constructed e.g. built on an LDAP directory or RDBMS. The sticky store holds the mapping between sticky policies and the resources to which they are stuck. This is a many to many mapping so that one policy can apply to many resources and one resource can have many sticky policies applied to it. The design requires that each resource has a locally unique resource ID (RID) which is mapped to Policy IDs. The RID is locally unique and will vary from system to system, as resources are transferred. We do not have a requirement to pass the RID from system to system so each system can compute its own. Obligations may need to be enforced before the user’s action is performed, after the user’s action has been performed, or simultaneously with the performance of the user’s action [16]. We call this the temporal type of the obligation. Examples are as follows: before the user is given access get the consent of owner; after the user has been given access, email the data owner that his/her data is accessed; simultaneously with the user’s access, write to the log the activities he/she is doing. According to the XACML model, each obligation has a unique ID (a URI). We follow this scheme in our infrastructure. Each obligations service is configured at construction time with the obligation IDs it can enforce and the obligation handling services that are responsible for enacting them. It is also configured with the temporal type(s) of the obligations it is to enforce. When passed a set of obligations by the AIPEP, the obligations service will walk through this set, ignore any obligations of the wrong temporal type or unknown ID, and call the appropriate obligation handling service for the others. If any single obligation handling service returns an error, then the obligations service stops further processing and returns an error to the AIPEP. If all obligations are processed successfully, a success result is returned. Each of the obligations enforced by the AIPEP must be of temporal type before. The Ontology Mapping Server is a service which returns the semantic relationship between two different terms. Multiple terms can name the same concept. The ontology concepts are held as a lattice, and the server will say if one term (equivalent concept) dominates the other in the lattice or if there is no domination relationship between them. The CVS calls this service to determine the relationship between an attribute name in an SOA’s policy and the attribute name in an issued credential. 
The Master PDP calls this server to determine the relationship between the different terms in the policy and the request context.
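As an illustrative sketch of the obligation dispatch behaviour described above (the class and method names below are assumptions, not the TAS³ or PERMIS API), an obligations service configured with a temporal type and a set of handlers might look as follows:

```java
import java.util.*;

/** Hypothetical obligations service: configured with the obligation IDs it can
 *  enforce and the temporal type it handles; it ignores obligations of the wrong
 *  temporal type or unknown ID, and stops on the first handler error. */
public class ObligationsService {

    enum TemporalType { BEFORE, WITH, AFTER }

    record Obligation(String id, TemporalType type) { }

    interface ObligationHandler { boolean enforce(Obligation o); }

    private final TemporalType handledType;
    private final Map<String, ObligationHandler> handlers;

    ObligationsService(TemporalType handledType, Map<String, ObligationHandler> handlers) {
        this.handledType = handledType;
        this.handlers = handlers;
    }

    boolean enforceAll(List<Obligation> obligations) {
        for (Obligation o : obligations) {
            if (o.type() != handledType) continue;   // wrong temporal type: ignore
            ObligationHandler h = handlers.get(o.id());
            if (h == null) continue;                 // unknown obligation ID: ignore
            if (!h.enforce(o)) return false;         // stop processing on first error
        }
        return true;
    }

    public static void main(String[] args) {
        ObligationHandler logHandler = o -> { System.out.println("logging access"); return true; };
        ObligationsService before = new ObligationsService(TemporalType.BEFORE,
                Map.of("urn:example:obligation:log", logHandler));   // hypothetical obligation ID
        System.out.println(before.enforceAll(List.of(
                new Obligation("urn:example:obligation:log", TemporalType.BEFORE),
                new Obligation("urn:example:obligation:email", TemporalType.AFTER))));
    }
}
```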
4 Sticky Policy Contents A sticky policy comprises the following elements:
- The policy author, i.e. the authority which wrote the policy.
- The globally unique policy ID.
- The time of creation of the policy and an optional expiry time.
- The type(s) of resource(s) that are covered by this policy.
- The type of policy this is.
- The policy language.
- The policy itself, written in the specified policy language.
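Purely as an illustration, these elements could be carried in code along the following lines; the field names are ours and this is not the StickyPAD XML schema defined by the authors.

```java
import java.time.Instant;
import java.util.List;

/** Illustrative carrier for the sticky policy elements listed above;
 *  field names are assumptions made for this sketch. */
public record StickyPolicy(
        String author,            // the authority which wrote the policy
        String policyId,          // globally unique PID
        Instant created,
        Instant expires,          // may be null if no expiry time is given
        List<String> resourceTypes,
        String policyType,
        String policyLanguage,    // e.g. XACML, PERMIS, SWI-Prolog
        String policyBody) {

    public static void main(String[] args) {
        StickyPolicy p = new StickyPolicy("data subject", "urn:policy:1234",
                Instant.now(), null, List.of("MedicalData"), "privacy", "XACML", "<Policy/>");
        System.out.println(p.policyId() + " written by " + p.author());
    }
}
```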
Any number of sticky policies can be stuck to a data resource, either in an application-dependent manner, e.g. as in a SAML attribute assertion, or by using the StickyPAD (sticky policy(ies) and data) XML structure that we have defined. The policies should be stuck to the data by using a digital signature. This could be by using the XML structure in the StickyPAD and SAML attribute assertion, or it could be externally provided e.g. by using SSL/TLS when transferring the data and policy across the Internet. It is the responsibility of the sending PEP to create the equivalent of our StickyPAD structure when sending data with a sticky policy attached, and the receiving PEP to validate its signature when it receives the message in step 0 of figure 1. The PEP should then parse and unpack the contents and pass the sticky policy to the AIPEP along with the authorization decision request (step 1 of figure 1).
5 Conflict Resolution Policy Our system includes many different PDPs, each with policies from different authorities and possibly written in different languages. As a consequence a mechanism is needed to combine the decisions returned by these PDPs and resolve any conflicts between them. We have introduced a Master PDP which is the component responsible for combining the decision results returned by the subordinate PDPs and resolving the conflicts among their decisions. The Master PDP has a conflict resolution policy (CRP) consisting of multiple conflict resolution rules (CRRs). The default CRP is read in at program initialisation time and additional CRRs are dynamically obtained from the subjects' and issuers' sticky policies. Each conflict resolution rule (CRR) comprises:
- a condition, which is tested against the request context by the Master PDP, to see if the attached decision combining rule should be used,
- a decision combining rule (DCR),
- optionally an ordering of policy authors (to be used by the FirstApplicable DCR),
- an author, and
- a time of creation.
A DCR can take one of five values: FirstApplicable, DenyOverrides, GrantOverrides, SpecificOverrides or MajorityWins, which applies to the decisions returned by the subordinate PDPs. The DCRs will be discussed shortly.
The Master PDP is called by the AIPEP and is passed the list of PDPs to call and the request context. From the request context it will get information such as the requester, requested resource type, issuer and data subject of the requested resource. The Master PDP has all the CRRs defined by different authors as well as a default one. From the request context it knows the issuer and the data subject and so can determine the relevant CRRs. It will order the CRRs of law, issuer, data subject and holder sequentially. For the same author the CRRs will be ordered according to the time of creation so that the latest CRR always comes first in the ordered list. All the conditions of a CRR need to match the request context for it to be applicable. The CRRs in the ordered CRR queue will be tested one by one against the request context. If the CRR conditions match the request context the CRR is chosen. If the CRR conditions do not match the request context the next CRR from the queue will be tested. The default CRR (which has DCR=DenyOverrides) will be placed at the end of the CRR queue and it will only be reached when no other CRR conditions match the request context. The PDPs are called according to the DCR of the chosen CRR. Each PDP can return 5 different results: Grant, Deny, BTG, NotApplicable and Indeterminate. NotApplicable means that the PDP has no policy covering the authorisation request. Indeterminate means that the request context is either malformed, e.g. a String value is found in place of an Integer, or is missing some vital information so that the PDP does not currently know the answer. BTG (Break the Glass) [32] means that the requestor is currently not allowed access but can break the glass to gain access to the resource if he so wishes. In this case his activity will be monitored and he will be made accountable for his actions. BTG provides a facility for emergency access or access over-ride and is particularly important in medical applications. If DCR=FirstApplicable the CRR is accompanied by a precedence rule (OrderOfAuthors) which says the order in which to call the PDPs. For example, if the resourceType=PII and the requestor=data subject, then the DCR=FirstApplicable and the OrderOfAuthors=(law, dataSubject, holder). The Master PDP calls each subordinate PDP in order (according to the order of authors), and stops processing when the first Grant or Deny decision is obtained. For SpecificOverrides the decision returned by the most specific policy will get preference. We define a policy to be the most specific if it is assigned to the most specific resource, as identified by its RID. We use the containment model in which the resource with the longest pathname is the most specific resource, for example C:/MyDocument/MyFile is more specific than C:/MyDocument. All the resources in the system have their RIDs formatted in the form of the URL hierarchy, e.g. kent.ac.uk/issrg2/C:/MyFiles. Each policy applicable to a resource is linked by its PID to that RID. In this containment model, policies applied to a less specific resource will also be applied to the resources contained within it, but policies applied to the more specific resource will not be applicable to the containing resource. For example, policies applied to kent.ac.uk/issrg2/C: will be applied to kent.ac.uk/issrg2/C:/MyFiles, but policies applicable to kent.ac.uk/issrg2/C:/MyFiles will not be applied to kent.ac.uk/issrg2/C:.
If multiple most specific policies exist then all the most specific policies will be evaluated and the Deny result will get precedence; in other words, DenyOverrides will be applied to the most specific set of policies.
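The containment model lends itself to a simple prefix test; the following sketch (purely illustrative, not the actual implementation) selects the policies attached to the longest covering RID, and leaves the tie-breaking DenyOverrides step described above to the caller.

```java
import java.util.*;

/** Sketch of the SpecificOverrides containment model: a policy RID covers a
 *  resource when it is a prefix of it on a path boundary, and the most specific
 *  policies are those attached to the longest covering RID. */
public class SpecificOverridesDemo {

    /** True if the policy RID covers the requested resource. */
    static boolean covers(String policyRid, String resourceRid) {
        return resourceRid.equals(policyRid) || resourceRid.startsWith(policyRid + "/");
    }

    /** Returns the policies attached to the longest (most specific) covering RID. */
    static List<String> mostSpecific(Map<String, List<String>> policiesByRid, String resourceRid) {
        return policiesByRid.keySet().stream()
                .filter(rid -> covers(rid, resourceRid))
                .max(Comparator.comparingInt(String::length))
                .map(policiesByRid::get)
                .orElse(List.of());
    }

    public static void main(String[] args) {
        Map<String, List<String>> policies = Map.of(
                "kent.ac.uk/issrg2/C:", List.of("holderPolicy"),
                "kent.ac.uk/issrg2/C:/MyFiles", List.of("subjectPolicy"));
        System.out.println(mostSpecific(policies, "kent.ac.uk/issrg2/C:/MyFiles/cv.doc"));
        // prints [subjectPolicy]
    }
}
```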
For DenyOverrides and GrantOverrides the Master PDP will call all the subordinate PDPs and will combine the decisions using the following semantics:
- DenyOverrides – A Deny result overrides all other results. The precedence of results for deny override is Deny > Indeterminate > BTG > Grant > NotApplicable.
- GrantOverrides – A Grant result overrides all other results. The precedence of results for grant override is Grant > BTG > Indeterminate > Deny > NotApplicable.
When a final result returned by the Master PDP is Grant (or Deny) the obligations of all the PDPs returning a Grant (or Deny) result are merged to form the final set of obligations. For MajorityWins all the PDPs will be called and the final decision (Grant/Deny/ BTG) will depend on the returned decision of the majority number of PDPs. If the same numbers of PDPs return Grant and Deny and there is at least one BTG, then BTG will be the final answer, otherwise Deny. If none of the PDPs return Grant, Deny or BTG then Indeterminate will override NotApplicable. Initially the system will have the law and controller PDPs running as these two are common for all request contexts. Based on the request context the issuer and the data subject’s PDP may be started.
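As an illustration of these two combining rules (a sketch only, not the Master PDP implementation), the stated precedence orders can be applied directly to the set of results returned by the subordinate PDPs:

```java
import java.util.*;

/** Illustrative combiner for DenyOverrides and GrantOverrides over the five
 *  possible PDP results; the precedence orders are taken from the text above. */
public class DecisionCombiner {

    enum Decision { GRANT, DENY, BTG, NOT_APPLICABLE, INDETERMINATE }

    // Deny > Indeterminate > BTG > Grant > NotApplicable
    static final List<Decision> DENY_OVERRIDES = List.of(
            Decision.DENY, Decision.INDETERMINATE, Decision.BTG,
            Decision.GRANT, Decision.NOT_APPLICABLE);

    // Grant > BTG > Indeterminate > Deny > NotApplicable
    static final List<Decision> GRANT_OVERRIDES = List.of(
            Decision.GRANT, Decision.BTG, Decision.INDETERMINATE,
            Decision.DENY, Decision.NOT_APPLICABLE);

    /** Returns the highest-precedence decision present among the PDP results. */
    static Decision combine(List<Decision> pdpResults, List<Decision> precedence) {
        for (Decision d : precedence) {
            if (pdpResults.contains(d)) {
                return d;
            }
        }
        return Decision.NOT_APPLICABLE;   // no PDP returned a decision
    }

    public static void main(String[] args) {
        List<Decision> results = List.of(
                Decision.NOT_APPLICABLE, Decision.GRANT, Decision.DENY);
        System.out.println(combine(results, DENY_OVERRIDES));   // DENY
        System.out.println(combine(results, GRANT_OVERRIDES));  // GRANT
    }
}
```

MajorityWins and FirstApplicable are not captured by a simple precedence list, since they need the vote count and the author ordering respectively.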
6 Use Case Scenarios Mr K wants to receive treatment from the X-Health Centre and for this he has to be registered. During the registration process he is presented with a consent form where he indicates with whom he is prepared to share his medical data. This form includes tick boxes such as:
1. Other Drs at this health centre, indicating that the patient will accept a one to many relationship with the health centre staff (as opposed to a one to one relationship with a specific Dr.).
2. Other registered Drs/Consultants of other Organisations, and a place where the name of the doctor and organisation can be written. If this box is ticked and no Dr's name and organisation are specified then the consent will be for any Dr in general.
3. Health Insurance Companies (with a place for specifying the name(s) of the company(ies), or can say all).
4. Research organisation/researcher. (A note will say that all medical data used for research purposes will be anonymised prior to release.)
5. Other organisations for promotional offers. These are for example organisations offering samples and promotions for newborn babies and their parents. In this case not all of the medical record will be available to the interested companies. What portion of medical data will be available is determined by the organisation's policy.
6. Other person (for specifying the name of someone such as next of kin).
Mr K has a policy with the Health Insurance Company HIC1 to cover his treatment costs. So he puts a tick in box 3 only and mentions HIC1 there and finishes his registration with X-Health Centre.
It is important to note that HIC1 will not have access to the complete medical records of Mr K. The policy of X-Health Centre determines what portion of the medical data can be made available to HIC1. Mr K undergoes some treatment and HIC1 submits a request to X-Health Centre for (a portion of) the medical record of Mr K. The Master PDP of X-Health Centre's authorisation system consults the CRRs of Law, issuer, data subject and holder sequentially. The law CRRs say that if resourceType=MedicalData and requestor=data subject and resourceClassification is different from Drs notes then DCR=GrantOverrides; anything else for resourceType=MedicalData leads to a DCR of DenyOverrides. Therefore in the case of medical data the CRRs of the other entities will never be consulted. Since the requestor is not the data subject the DCR is DenyOverrides. The PDPs give the following results:
• The law PDP returns NotApplicable because it only has rules pertaining to cases where the requestor is either the data subject or the creator of the data.
• Depending on the actual medical data requested, the issuer (X-Health Centre) PDP returns either Grant or Deny.
• The data subject PDP returns Grant because Mr K has allowed this on his registration with the health centre.
The overall decision is therefore the same as that of the issuer’s PDP. Assuming this is Grant, then the requested medical data is passed to HIC1 together with the sticky policies from the data subject, the law and the issuer. After receiving the medical data and sticky policies the receiving application will make a call to the authorisation system of HIC1 to see whether it is permitted to store the data. The authorisation system will reply Grant and will start two new PDPs with the received policies of the data subject (Mr K) and the issuer (X-Health Centre) – assuming it can process them. If it cannot, it will return Deny. At HIC1's site a law PDP is already running containing the same legal policy as that sent and therefore it does not need to start a new law PDP. If HIC1 had been in a different jurisdiction to X-Health Centre, then it would have needed to run a new legal PDP for Mr. X’s medical data. The policy to data mappings are duly recorded in the sticky store. HIC1’s authorisation system subscribes for updates to these policies, so that if the patient or X-Health Centre should change their policies, the new ones will be notified to HIC1’s authorisation system.. Mr K did not allow researchers to view his medical record. The researcher Mr R requests Mr K’s medical record at HIC1's system and this request is denied for the following reasons: -
• As in the previous case, the chosen DCR will be DenyOverrides, chosen by the law CRR.
• The law PDP will return NotApplicable (as before).
• The issuer’s PDP returns Grant with a “with” obligation for anonymisation (because it allows the data to be used for research).
• The holder’s (HIC1’s) PDP returns NotApplicable because it allows data subjects to determine if researchers should have access to their data or not.
• Mr K’s PDP returns Deny, which makes the overall result a Deny.
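To make the conflict resolution concrete, the sketch below shows, in simplified Java, how a Master PDP might combine the individual PDP results under a DenyOverrides DCR. The class and method names (PdpResult, combineDenyOverrides, and so on) are illustrative assumptions rather than the actual TAS³/PERMIS API; the sketch only reproduces the decision logic walked through above.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: the enum mirrors the decisions used in the text
// (Grant, Deny, NotApplicable); real XACML PDPs may also return Indeterminate.
enum Decision { GRANT, DENY, NOT_APPLICABLE }

final class PdpResult {
    final String pdpName;
    final Decision decision;
    PdpResult(String pdpName, Decision decision) {
        this.pdpName = pdpName;
        this.decision = decision;
    }
}

public class MasterPdpSketch {

    // DenyOverrides: any Deny wins; otherwise any Grant wins; otherwise NotApplicable.
    static Decision combineDenyOverrides(List<PdpResult> results) {
        boolean sawGrant = false;
        for (PdpResult r : results) {
            if (r.decision == Decision.DENY) return Decision.DENY;
            if (r.decision == Decision.GRANT) sawGrant = true;
        }
        return sawGrant ? Decision.GRANT : Decision.NOT_APPLICABLE;
    }

    public static void main(String[] args) {
        // The researcher scenario described above: the data subject's Deny dominates.
        List<PdpResult> results = Arrays.asList(
            new PdpResult("law", Decision.NOT_APPLICABLE),
            new PdpResult("issuer (X-Health Centre)", Decision.GRANT),   // carries an anonymisation obligation
            new PdpResult("holder (HIC1)", Decision.NOT_APPLICABLE),
            new PdpResult("data subject (Mr K)", Decision.DENY));

        System.out.println("Overall decision: " + combineDenyOverrides(results)); // prints DENY
    }
}
```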
Mr K now changes his preferences at the X-Health Centre and allows his data to be accessed by researchers. The X-Health Centre publishes this update and both its and HIC1’s authorisation systems update the subject’s policy with the new rule, which contains a “with” obligation to anonymise the data prior to release. If a researcher now asks for access to Mr K’s data at HIC1’s site, the DCR will be DenyOverrides chosen by the law CRR, the law PDP will return NotApplicable (as before), both the issuer’s and subject’s PDPs return Grant with the same “with” obligation to anonymise the data, and the holder’s (HIC1’s) PDP returns NotApplicable. The overall result is therefore Grant with an obligation. The obligation is passed to the Obligations Service of the PEP for it to enact simultaneously with data release. If this obligation cannot be enforced by the PEP, then it must deny the researcher’s request.
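The following fragment sketches how a PEP might couple data release to the anonymisation obligation just described. The ObligationHandler interface and its method are hypothetical stand-ins, not the project's actual Obligations Service API; the point is only that release and obligation enforcement succeed or fail together.

```java
// Hypothetical interfaces: stand-ins for the actual PEP/Obligations Service API,
// not the TAS3/PERMIS implementation.
interface ObligationHandler {
    // Applies the obligation (e.g. anonymisation) and returns the data to release,
    // or null if the obligation cannot be enforced.
    byte[] apply(byte[] data);
}

final class PepSketch {
    // A "with" obligation must be enacted simultaneously with data release;
    // if it cannot be enforced, the PEP must deny the request.
    static byte[] releaseWithObligation(byte[] record, ObligationHandler withObligation) {
        byte[] releasable = withObligation.apply(record);
        if (releasable == null) {
            throw new SecurityException("Obligation could not be enforced: request denied");
        }
        return releasable; // e.g. the anonymised copy of the medical record
    }
}
```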
7 Implementation Details
Our advanced authorization infrastructure is implemented in Java, and is being used and developed as part of the EC TAS³ Integrated Project (www.tas3.eu). The first beta version is available for download from the PERMIS web site1. This contains the AIPEP, CVS, the Obligations Service, a Master PDP, a policy store and sticky store, and multiple PDPs of different types. A number of different obligation handling services have been written that are called by the obligations service, and these can perform a variety of tasks such as writing the authorization decision to a secure audit trail, sending an email notification to a security officer, and updating the internal state information (called retained ADI in ISO/IEC 10181-3 (1996)). We have implemented state-based Break The Glass (BTG) policies [27] using the AIPEP, the obligations service and a stateless PDP. A live demo of BTG is available at http://issrg-testbed-2.cs.kent.ac.uk/. The obligation state handling BTG wrapper adds only a small overhead in most cases (between 0.3% and 50%, depending on the size of the policy and the actual request) to the performance of a stateless PDP that does not support BTG. A paper presenting the complete results is currently under preparation. We have constructed an ontology mapping server which, when given two class names (such as Visa card and credit card), will return the relationship between them. The authorization infrastructure has been tested with three different PDPs: Sun’s XACML PDP2, the PERMIS PDP3 and a behavioral trust PDP from TU-Eindhoven4. Each of these PDPs uses a different policy language: Sun’s PDP uses the XACML language, the PERMIS PDP uses its own XML-based language, whilst TU-Eindhoven’s PDP uses SWI-Prolog. The next step is to integrate a secure publish/subscribe mechanism for policy updates and to write a reasonably full set of validation tests and use cases, along with example policies, so that all the PDPs can be called together and their decisions resolved into one final decision using a conflict resolution policy that obeys the law and the wishes of all the various actors. This is a complex task since the number of permutations is infinite.

1 Advanced authz software available from http://sec.cs.kent.ac.uk/permis/downloads/Level3/standalone.shtml
2 Sun’s XACML PDP. Available from http://sunxacml.sourceforge.net/
3 PERMIS PDP. Available from http://sec.cs.kent.ac.uk/permis
4 TU-Eindhoven’s PDP. Available from http://w3.tue.nl/en/services/dpo/education_and_training/inleiding/pdp/
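Section 6 noted that HIC1’s authorisation system subscribes for updates to the sticky policies recorded in its sticky store, and the publish/subscribe mechanism mentioned above is intended to carry those updates. The sketch below illustrates, with hypothetical interfaces (StickyPolicySubscriber, PdpFactory and the rest are assumptions, not TAS³ classes), how a holder’s authorisation system might react to such a notification by replacing the stored policy and restarting the corresponding PDP.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of policy-update handling at the data holder's site.
// None of these types correspond to the actual TAS3/PERMIS classes.
interface PdpFactory {
    AutoCloseable startPdp(String policy);   // start a PDP loaded with the given policy
}

final class StickyPolicySubscriber {
    private final Map<String, String> stickyStore = new ConcurrentHashMap<>();      // dataId -> policy
    private final Map<String, AutoCloseable> runningPdps = new ConcurrentHashMap<>();
    private final PdpFactory factory;

    StickyPolicySubscriber(PdpFactory factory) {
        this.factory = factory;
    }

    // Called when the issuer or the data subject publishes a new version of a policy.
    void onPolicyUpdate(String dataId, String newPolicy) throws Exception {
        stickyStore.put(dataId, newPolicy);            // update the policy-to-data mapping
        AutoCloseable old = runningPdps.remove(dataId);
        if (old != null) old.close();                  // stop the PDP holding the stale policy
        runningPdps.put(dataId, factory.startPdp(newPolicy)); // evaluate future requests with the new one
    }
}
```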
8 Discussion, Conclusions and Future Plans
Our authorization infrastructure does not obviate the need for trust. Our infrastructure still requires trust between the various parties. It is not a digital rights management (DRM) system that assumes the receiving party is untrustworthy and wants to steal any received information from the sender. On the contrary, our infrastructure assumes that the various parties do trust each other to the extent that they want an automated infrastructure that can easily enforce each other’s policies reliably and automatically, and if it cannot, will inform the other party of the fact. Consequently data subjects must trust the organizations that they submit their PII to, so that when an organization says it will enforce a subject’s sticky policy, the subject can trust that it has every intention of doing so. Our system provides organizations with an application independent authorization infrastructure that makes it easy for them to enforce a subject’s privacy policy without having to write a significant amount of new code themselves. Furthermore, the user has the potential for more complete control over his/her privacy than now, in that the infrastructure allows the user to specify a complete privacy policy including a set of obligations which can notify the user when his/her data is accessed or transferred between organizations, e.g. by using an after obligation when giving permission for the transfer of her PII to go ahead or a before obligation before giving permission for the PII to be read. However, we expect the user interfaces for such full privacy policy creation to be too complex for most users to handle, and consequently organizations are more likely to provide their users with a policy template and a limited subset of options and boxes to tick, making the user’s task much easier. This also reduces the burden on the organization, since it won’t be sent user privacy policies that it cannot handle. The benefit of our infrastructure is that it does not constrain organizations in setting their privacy policy templates, as the infrastructure will enforce whatever combinations they choose. Organizations must also trust each other to honor the sticky policies that are passed to them when they transfer data between themselves. An untrustworthy organization can always discard any sticky policies it receives and never needs to access the authorization infrastructure to ask for permission to receive the data, but we assume that legally binding contracts between the organizations will require them to support any sticky policies that are transferred between them. Our authorization infrastructure makes it much easier for them to do this. Our final step is to perform user trials with two application demonstrators, one for the privacy protection and access to electronic medical records, the other for e-portfolios. Both of these applications require access to distributed personal information that is stored in a variety of repositories at different locations, and so a distributed sticky policy enforcement infrastructure is needed.

Acknowledgements. The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 216287 (TAS³ - Trusted Architecture for Securely Shared Services).
References [1] Chadwick, D.W., Fatema, K.: An advanced policy based authorisation infrastructure. In: Proceedings of the 5th ACM Workshop on Digital Identity Management (DIM 2009). ACM, New York (2009) [2] BBC news on 18 June (2001), http://news.bbc.co.uk/1/hi/uk/1395109.stm [3] CIFAS, http://www.cifas.org.uk/default.asp?edit_id=1014-57 [4] Msnbc report on 16 January (2008), http://www.msnbc.msn.com/id/22685515/ [5] Voice of America news report on April 29 (2008), http://www1.voanews.com/english/news/science-technology/ a-13-2008-04-29-voa44.html [6] BBC news on 22 July (2009), http://news.bbc.co.uk/1/hi/business/8162787.stm [7] BBC news on 24 August (2010), http://www.bbc.co.uk/news/business-11070217 [8] Zhu, Y., Keoh, S., Sloman, M., Lupu, E., Dulay, N., Pryce, N.: A Policy System to Support Adaptability and Security on Body Sensors. In: 5th International Summer School and Symposium on Medical Devices and Biosensors, Hong Kong, pp. 97–100 (2008) [9] Wu, J., Leangsuksun, C.B., Rampure, V., Ong, H.: Policy-based Access Control Framework for Grid Computing. In: Proceedings of the sixth IEEE International Symposium on Cluster Computing and the Grid, CCGRID, pp. 391–394 (2006) [10] OASIS XACML 2.0. eXtensible Access Control Markup Language (XACML) Version 2.0 (October 2005), http://www.oasis-open.org/committees/ tc_home.php?wg_abbrev=xacml#XACML20 [11] OASIS XACML 3.0. eXtensible Access Control Markup Language (XACML) Version 3.0, April 16 (2009), http://docs.oasis-open.org/xacml/3.0/ xacml-3.0-core-spec-en.html [12] Chadwick, D., Zhao, G., Otenko, S., Laborde, R., Su, L., Nguyen, T.A.: PERMIS: a modular authorization infrastructure. Concurrency And Computation: Practice And Experience 20(11), 1341–1357 (2008) [13] W3C: The Platform for Privacy Preferences 1.0 (P3P 1.0). Technical Report (2002) [14] Blaze, M., Feigenbaum, J., Ioannidis, J.: The KeyNote Trust-Management System Version 2. RFC 2704 (1999) [15] Chadwick, D.W., Lievens, S.F.: Enforcing “Sticky” Security Policies throughout a Distributed Application. In: MidSec 2008, Leuven, Belgium, December 1-5 (2008) [16] Chadwick, D.W., Su, L., Laborde, R.: Coordinating Access Control in Grid Services. J. Concurrency and Computation: Practice and Experience 20, 1071–1094 (2008) [17] Karjoth, G., Schunter, M., Waidner, M.: Privacy-enabled services for enterprises. In: 13th International Workshop on Database and Expert Systems Applications, pp. 483–487. IEEE Computer Society, Washington DC (2002) [18] Karjoth, G., Schunter, M., Waidner, M.: Platform for Enterprise Privacy Practices: Privacy-enabled Management of Customer Data. In: 2nd Workshop on Privacy Enhancing Technologies, San Francisco (2002) [19] Karjoth, G., Schunter, M.: A Privacy Policy Model for Enterprises. In: 15th IEEE Computer Foundations Workshop (2002)
[20] Nelson, R., Schunter, M., McCullough, M.R., Bliss, J.S.: Trust on Demand — Enabling Privacy, Security, Transparency, and Accountability in Distributed Systems. In: 33rd Research Conference on Communication, Information and Internet Policy (TPRC), Arlington VA, USA (2005) [21] Schunter, M., Berghe, C.V.: Privacy Injector — Automated Privacy Enforcement Through Aspects. In: Danezis, G., Golle, P. (eds.) PET 2006. LNCS, vol. 4258, pp. 99– 117. Springer, Heidelberg (2006) [22] Mont, M.C.: Dealing with Privacy Obligations: Important Aspects and Technical Approaches. In: International Conference on Trust and Privacy in Digital Business No1, Zaragoza (2004) [23] Mont, M.C., Pearson, S., Bramhall, P.: Towards Accountable Management of Identity and Privacy: Sticky Policy and Privacy. Technical report, Trusted System Laboratory, HP Laboratories, Bristol, HPL-2003-49 (2003) [24] Ni, Q., Trombetta, A., Bertino, E., Lobo, J.: Privacy aware role based access control. In: SACMAT 2007, Sophia Antipolis, France (2007) [25] Ni, Q., Bertino, E., Lobo, J.: An Obligation Model Bridging Access Control Policies and Privacy Policies. In: SACMAT 2008, Estes Park, Colorado, USA (2008) [26] Mont, M.C.: Dealing with Privacy Obligations: Important Aspects and Technical Approaches. In: International Conference on Trust and Privacy in Digital Business No1 (2004) [27] Mont, M.C., Beato, F.: On Parametric Obligation Policies:Enabling Privacy-aware Information Lifecycle Management in Enterprises. In: Eighth IEEE International Workshop on Policies for Distributed Systems and Networks (2007) [28] Ardagna, C.A., Bussard, L., Vimercati, S.D.C., Neven, G., Paraboschi, S., Pedrini, E., Preiss, F.-S., Raggett, D., Samarati, P., Trabelsi, S., Verdicchio, M.: PrimeLife Policy Language, Project’s position paper at W3C Workshop on Access Control Application Scenarios (November 2009) [29] Trabelsi, S., Njeh, A., Bussard, L., Neven, G.: PPL Engine: A Symmetric Architecture for Privacy Policy Handling. Position paper at W3C Workshop on Privacy and Data Usage Control (October 2010) [30] Bussard, L., Neven, G., Schallaböck, J.: Data Handling: Dependencies between Authorizations and Obligations. Position paper at W3C Workshop on Privacy and Data Usage Control (October 2010) [31] OASIS “SAML 2.0 profile of XACML, Version 2.0”. OASIS committee specification 01, August 10 (2010) [32] Ferreira, A., Chadwick, D., Farinha, P., Correia, R., Zhao, G., Chilro, R., Antunes, L.: How to securely break into RBAC: the BTG-RBAC model. In: Annual Computer Security Applications Conference, Honolulu, Hawaii, p. 23 (2009)
Designing Usable Online Privacy Mechanisms: What Can We Learn from Real World Behaviour?
Periambal L. Coopamootoo and Debi Ashenden
Department of Informatics & Systems Engineering, School of Defence & Security, Cranfield University, Shrivenham, Swindon, UK, SN6 8LA
{p.coopamootoo,d.m.ashenden}@cranfield.ac.uk
Abstract. A variety of privacy mechanisms have been designed for the online environment but they have not been effective in ensuring end-users’ privacy. In this study, we analyse existing privacy theories devised from offline sociopsychological studies and discuss which of those could be useful in the design of usable online privacy. We found that the Communication Privacy Management framework which provides boundary management processes could be used to design online privacy since it addresses information seeking, boundary rules formation, negotiation and means of addressing turbulence. We argue that since privacy is implicit within interpersonal and communication behaviour, a persuasive approach to designing online privacy could help to make privacy implicit within human-computer interactions, provide end-users with the ability to better engage with, and express their online privacy, and further ensure the usability of online privacy mechanisms. Keywords: online privacy mechanisms, persuasive technology, privacy behaviours, usable privacy.
1 Introduction
Previous research aimed at understanding online privacy behaviour and its relation to privacy concerns has shown that although end-users claimed to have high privacy concerns, they behaved very differently online [1]. Several studies were performed to understand this discrepancy which was called the “privacy dichotomy”. Among the possible explanations attributed to the phenomenon were the imbalance of information between end-users and service providers [2], the need for immediate gratification [3], behavioural biases, peer pressure to share information and the sharing of private information online to try out identities, for instance among young people [4]. Some research, however, has also claimed that online privacy approaches are too complicated for end-users to understand [5] and are not easy to use. In this paper we first explore the historical development of privacy as a concept and explain briefly the differences in privacy behaviour and the characteristics of private information shared online versus offline. We follow this with an exploration of the methodological approaches from sociology and psychology used to understand offline
privacy behaviour. We compare and contrast this with the research carried out in the information systems field (in particular Human Computer Interaction (HCI)) to develop ways of designing usable privacy mechanisms for online environments. In the discussion section we aim to determine which approaches to understanding real world behaviours around privacy could assist in designing usable online privacy mechanisms. We conclude with recommendations for how a better understanding of interpersonal privacy interactions leads us towards taking a persuasive approach in order to develop effective privacy models before providing an analysis of a disclosure-privacy scenario.
2 Privacy Online versus Offline In this section we explore the historical development of privacy as a concept and then compare the offline privacy mechanisms employed by individuals with the privacy mechanisms available online and the properties of private information shared offline versus online. Following the comparison, we briefly discuss the consequences those differences have on online privacy. The first systematic, written discussion of the concept of privacy is said to have begun in 1890 with Warren and Brandeis’ famous essay “The right to privacy” which cited political, social and economic changes which led to the recognition for the right to be let alone [6]. They argued that existing law afforded a way to protect the privacy of the individual; the privacy principle they believed was already part of the common law but that new technology, for instance photography and newspapers, made it important to explicitly and separately recognise this protection under the name of privacy. They thus laid the foundation for a concept of privacy that has come to be known as the control over information about oneself. However as explained by DeCew [7], it was only in the second half of the twentieth century that philosophical debates concerning definitions of privacy became prominent due to the development of privacy protection in the law. In addition to this it has been argued that privacy and intimacy are deeply related. Fried [8] argues that privacy has intrinsic value and is necessarily related to, and fundamental to, one’s development as an individual with a moral and social personality to be able to form intimate relationships involving respect, love, friendship and trust. Privacy is valuable because it allows one to maintain varying degrees of intimacy [8]. Gerstein [9] also supports the necessity of privacy for the intimacy which is required in communication and interpersonal relationships for a person to fully experience his or her life. Other researchers such as Rachels [10] expand the value of privacy to intimacy by emphasising the importance of developing diverse interpersonal relationships with others. Rachels’ analysis emphasises that privacy is not only about limiting control of information but also access to oneself, both of which allows control over relationships with others, thus connecting privacy to one’s behaviour and activities [10]. In more recent literature related to the advances in technology; privacy has been defined as the freedom from judgement [11-13], the ability to exercise privacy tradeoffs [12], the control over who has access to information, for what purpose it is needed and how sensitive the information is in a particular context [14]. Although the explicit impact of technology on privacy has been recognised since the arguments of
Warren et al. [6], there have been compelling arguments for overriding the privacy concerns for accountability and security needs [15]. This overview of the historical development of the concept of privacy demonstrates how legal support for privacy has come to the fore and the role of privacy as a key attribute in the development and maintenance of relationships. From a technological perspective, privacy has been depicted mainly as the control over one’s private information and the ability to exercise privacy tradeoffs. It could therefore be useful to understand whether and how technology has catered for the legal and interpersonal aspects of privacy. Privacy is required for communication and interpersonal relationships [9] and hence by extension required for the maintenance of an identity. It is embedded within the mechanisms of offline communication and participation and differs across societies where individuals socially manage their privacy with respect to others through an ongoing “boundary definition process” [16]. The mechanisms used offline are often implicit within the individual’s behaviour in the form of non-verbal cues [17] including body language, oral and visual cues, accessories such as, for example, clothing, curtains and blinds, to avoid the release of information and achieve varying degrees of privacy or openness. In the online environment, two different approaches are adopted to ensure the privacy of end-users. The regulation approach considers privacy to be a basic human right which requires protection whereas the self-regulation approach views privacy as a commodity which can be traded in the market place. While in the regulated approach, privacy is a must and although privacy mechanisms such as anonymity, pseudonymity and unlinkability technologies are provided, they are usually made explicit, are not often included within the system design and are hard for users to understand [18]. The selfregulation approach on the other hand assumes rational behaviour from online users in consenting to services in exchange for the release of personal information. This idea conflicts with research that looks at the biases and attributions that underpin the behaviour of individuals [1]. It is apparent from the above that both of these approaches cause difficulties for end-users and this may be a result of the difference between the protection offered by these online mechanisms and how individuals make decisions about privacy in the offline world. Thus while those approaches as implemented online aim to provide for the legal needs of privacy and for further protection of the shared information, they require rational and explicit privacy behaviour from the users. This type of behaviour can consequently make online privacy interactions seem impersonal, and make it hard for users to behave according to their concerns; that are often driven by the context of interpersonal relationships. The different attributes of information in the online environment may further point us towards understanding some of the reasons for the privacy paradox. Certain types of personal information shared in an offline social environment may be considered to have a brief retention time since it often relies on human memory and is bounded within the context and associated human emotions [19]. In the online environment, however, information is persistent and is easily replicated due to the nature of the internet infrastructure. 
The consequences are that the information online can be easily taken out of context at a later time, flattened of its emotional value and made available for analysis and scrutiny by systems or people of which one might not be aware. The information might be given a different meaning and secondary information might be
inferred. These characteristics might also deny users of their rights to exercise control on their personal information in terms of who has access to it, when and how. In the offline environment, individuals tend to share private information with a small number of individuals and generally tend to not broadcast it to the wider public audiences, while online broadcasting is much easier to accomplish and personal information is frequently broadcast to a large audience although the user may be sharing with a specific audience in mind [20]. The sharing of one’s personal information is also usually done by the individual or others close to the individual which differs from the online scenario where personal information is more easily accessible and can potentially be shared by anyone with access to it. For this reason the properties of online data and its transmission affect the very nature of private information and hence no longer cater for the intimacy required for communication and interpersonal relationships [10]. To summarise, while offline, privacy is implicitly linked to individual behaviour and communication and the building, development and maintenance of relationships; online it is explicitly designed and dependent on human-computer interactions. Privacy is provided by the online system and hence privacy online is constrained by the technology. Moreover, while privacy has a contextual or situational value, the personal information gathered and stored online may be deprived of its context. A lack of awareness of the properties of information online and of the consequences results in users making a poor risk assessment and unknowingly trading off privacy. The asymmetry of information transmission may also cause ambiguity and assumption of privacy where users might believe their interactions happen within a safe system within their computer system in their physical space, thus explaining the privacy paradox.
3 Methodological Approaches to Understanding Privacy Offline versus Online Design In this section, we briefly explore the social-psychological theories of privacy that laid the foundation for further studies and some extensions that have built on those theories through conceptual studies, systematic analysis and empirical studies. We then review the approaches used to design privacy online and discuss how they cater for the social aspect of privacy and the characteristics of offline privacy behaviour and of private information shared. 3.1 Foundational Social-Psychological Privacy Theories Westin and Altman’s theoretical contributions to the understanding of the socialpsychological aspects of privacy have stood the test of time and provide a firm foundation for other researchers to build on [21; 22]. Westin’s theory highlights the ways in which individuals protect themselves by temporarily limiting access to themselves by others [21]. Since privacy allows individuals, groups or institutions to determine when, how and to what extent their information is communicated to others, it is viewed in relation to social participation and is the voluntary and temporary withdrawal of an individual or group through physical or psychological means.
Westin describes privacy as being a dynamic and non-monotonic process which is also neither self-sufficient nor an end in itself. According to Westin’s theory, privacy has four states which can be thought of as privacy mechanisms, that is, the means through which privacy is maintained. These states are solitude, intimacy, anonymity and reserve. He also posits four functions or goals of privacy. These are personal autonomy, emotional release, self-evaluation, and limited and protected communication [21]. Empirical research, such as the factor analysis undertaken by Pederson [23], not only found support for Westin’s states but also tested the relationship between the states and functions of privacy. While describing his results as coherent and inclusive, he proposes a 6 x 5 ‘types of privacy x privacy functions’ model [23] which provides the link between the types of privacy behaviour individuals exhibit and the functions or goals of these. Altman [22] on the other hand places social interactions at the heart of his theory, with the environment providing mechanisms for regulating privacy. While for Altman also, privacy is the selective control of access to the self, he also identifies privacy to be a temporal and dynamic process of interpersonal boundary control. This is the process through which individuals regulate interactions with others where privacy has both a desired and actual level and privacy is non-monotonic, bidirectional and applies at the individual and group level. Altman also provides a range of privacy mechanisms for privacy regulation, such as the verbal content of communications, territorial behaviour to enable separation of personal space from others and cultural norms. He goes on to suggest that privacy should be considered as a social process, and that an in-depth psychological study of the aspects of privacy must include the interplay of people, their social world and the physical environment. Two important extensions of Altman’s regulation theory are based on the linkage of privacy and disclosure [24]; these build on Altman’s dialectical conception of privacy as a process of opening and closing a boundary to others. Petronio [25] proposes a conceptual and theoretical framework in her articulation of Communication Privacy Management (CPM), arguing that individuals depend on a rule-based boundary system when deciding whether to disclose private information. Central to Petronio’s approach is the need to strike a balance between the positive opportunities to interact with others, made possible by technology, and the dangers of losing the means to control and regulate access to one by others. The rules are used to balance revealing and concealing private information - that is disclosure and privacy. These rules are dynamic since they can change, grow or remain stable for periods. Derlega & Chaikin [26] on the other hand extend Altman’s boundary concept to a dual boundary model while exploring its applicability to information privacy. They suggest that individuals function within a dyadic boundary that is perceived as a safe zone within which they disclose to invited others or across which disclosure does not pass [26]. Newell [27] performed a systematic classification of past privacy studies across a variety of disciplines and proposed a framework which extends from both Westin’s and Altman’s theories. She classifies past studies into person-centred, place-centred and interaction perspectives. 
Within the interaction perspective, privacy is an attitude, a behaviour, a goal or a process. Privacy as behaviour includes choice, control, boundary regulation, interaction management and information management [27] since privacy presupposes the existence of others, the opportunity of interactions with them and the ability to control this interaction.
From the above overview of socio-psychological studies, interpersonal communication through a process of boundary control within behaviour is highlighted as prerequisite to the development and maintenance of relationships. This corresponds with Gerstein [9] and Rachels’ [10] ideas outlined earlier around the necessity of privacy for communication and the development of interpersonal relationships. 3.2 Approaches to Designing Online Privacy We can see that the design of privacy into online systems had its roots in the legal need to protect end-users from the threat of misuse of their personal data and for them to provide an informed consent to its further processing or sharing. Thus privacy policies were implemented as liability shields for businesses and are often long texts that are too legalistic and complicated for end-users to read and understand. In response to this some research has looked at the design of privacy policy plug-ins or user agents [28] that allow end-users to select their preferences and make it easier for them to be alerted when a website does not comply with their preferences. This, however, has to be set a priori to interactions and does not form part of the humancomputer interaction during the online experience (for example, in ecommerce transactions or online social networking). Other kinds of plug-ins devised have tried to minimise the collection of profiling information which end-users might not be aware of such as browser filtering or cookie removers. Primelife has also worked towards enhancing the transparency of policies through the ‘Creative Commons’ type layered approach to privacy [29]. Whilst needing to overcome the challenges listed by Hansen (2010), this is an important work in progress towards enhancing end-users’ understanding of their privacy online. There are also tools and research initiatives that look at embedded access control mechanisms which provide users with the technological functionality of controlling access to themselves and provide them with feedback for the control applied. Examples of this are the fine-grained access controls of Facebook and the usercontrolled privacy research carried out by Cornwell et al. [30]. A few steps further and we find user-centric identity management systems that provide users with control of the private details they share and therefore their online identity. The Prime project for instance has analysed and translated legal principles into HCI requirements which were further supplemented by social needs [31]. They extend the privacy policy useragent by allowing end-users to express their policy preference regarding data disclosure as well as negotiate it with the service provider. The negotiated policy is attached to the data shared and a data track feature can provide the end-users with a comprehensive report of their history of data sharing and of the policies attached to the disclosures [32].
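As a rough illustration of the kind of matching such policy user agents perform, the sketch below compares a user's stated preferences with a site's declared data-handling policy and flags any mismatch. It is a simplified, hypothetical example (the types and attribute names are assumptions, not P3P, APPEL or PRIME syntax), intended only to show why the check can be run and the user alerted before any data is disclosed.

```java
import java.util.Map;
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of preference-vs-policy matching; not real P3P/APPEL processing.
public class PreferenceMatcher {

    // For each data item, the user lists the purposes he or she permits.
    // The site policy declares, per data item, the purposes it will use the data for.
    static List<String> findViolations(Map<String, List<String>> userPreferences,
                                       Map<String, List<String>> sitePolicy) {
        List<String> violations = new ArrayList<>();
        for (Map.Entry<String, List<String>> declared : sitePolicy.entrySet()) {
            String dataItem = declared.getKey();
            List<String> allowed = userPreferences.getOrDefault(dataItem, List.of());
            for (String purpose : declared.getValue()) {
                if (!allowed.contains(purpose)) {
                    violations.add(dataItem + " used for '" + purpose + "' is not permitted");
                }
            }
        }
        return violations; // an empty list means the site policy matches the preferences
    }

    public static void main(String[] args) {
        Map<String, List<String>> prefs = Map.of("email", List.of("order-processing"));
        Map<String, List<String>> site  = Map.of("email", List.of("order-processing", "marketing"));
        // A user agent would surface these alerts before any disclosure takes place:
        findViolations(prefs, site).forEach(System.out::println);
    }
}
```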
4 Discussion
As reviewed above, studies for the design of online privacy have concentrated on providing technological solutions for the legal needs of privacy while working towards making it usable and providing some control to the end-user. Projects such as Primelife [32] also provide for policy negotiation and reporting facilities which might be more helpful for certain types of end-users to manage their privacy. While this approach
provides the technological solution for further control, and feedback mechanisms about information shared online and hence one’s identity, it does not consider privacy as an implicit process within interpersonal communications which is an essential component of privacy and disclosure behaviours and of how one manages one’s identity. Moreover, privacy in this approach is made explicit and adds additional steps to be performed during the online experience. Making privacy an additional task during online interactions, although enhancing trust in the service provider could make it cumbersome to use the service and make online communication less natural which undermines the aim of social features embedded within online systems. The aim is to persuade endusers of the interpersonal aspect of human-computer interactions in order to enhance online participation - a large proportion of which includes the disclosure of private information. Newell [27] suggested more than a decade ago that the vagueness and ambiguity in the definition and representation of privacy could be resolved and wide support obtained if privacy was viewed as an interactive condition of the person and the environment. Pederson’s [23] states versus functions matrix (adapted from Westin [21]) might help to design mechanisms if the privacy goals are known. Moreover, while Pederson’s matrix is highly valuable for securing the link between the types or means of privacy and the functions or goals of privacy, it is at a high level and cannot be practically applied to interaction design. An application of the matrix will require further research to ensure the concepts can be translated into online interaction design requirements. Petronio’s Communication Privacy Management (CPM) framework [25] on the other hand has been used to understand how people decide to disclose private information in offline settings and also to understand and address the tension between disclosure and privacy by examining how and why people decide to reveal or conceal private information within the ecommerce context [33]. CPM is also a practical framework which can more easily be used to assess online systems although there are fundamental differences between the nature of offline and online environments. CPM is a rule-based theory proposing that individuals develop rules to aid decisions about whether to reveal or conceal information, the rules developed help people maximise the benefits while minimising the risks of disclosure and are a function of the context and disclosure goals. The theory proposes three iterative processes for boundary management [25]. The first process, boundary rule formation includes the seeking of information and rules development to regulate when and under what circumstances people will reveal rather than withhold personal information whereas the second process, boundary coordination refers to the negotiation of privacy rules between parties through the setting and maintenance of boundary linkages, boundary ownership rights and boundary permeability. The third process, boundary turbulence might result from differences in privacy rules between parties, privacy rule violations or deficient boundary coordination. Boundary turbulence refers to the dynamic process of maintaining and negotiating boundaries to manage personal disclosures. The different CPM processes could cater for the properties of privacy identified above. 
For instance, CPM caters for the dynamic and temporal nature of privacy through the coordination and turbulence recovery processes and for bidirectional flow of information through the information seeking and negotiation processes. In addition,
despite the fact that the internet causes persistency of data, a CPM approach to online privacy would allow the user to be to some extent in control of the lifespan of the private information shared according to the coordinated and negotiated boundary. Also, the CPM approach would mean that the user would have control over the audience to which his or her private details are broadcasted. The processes of the CPM, that is, boundary rule formation, coordination and turbulence are however, dependent on the interaction mechanisms employed within the online environment which can be highly persuasive in favouring rules that minimise the apparent effects of risks and maximise benefits of sharing personal details. Information technology is never neutral but always influences users’ attitudes and/or behaviour in one way or another [34]. Moreover, privacy is embedded within human behaviour and communication while expression and persuasion is a big part of communication. Hence persuasion is important for expressive privacy which relates to the social and communication dimension of privacy and encompasses an individual’s ability and effort to control social contacts [22]. Thus to enable end-users to express their privacy through human-computer interactions and for service providers to be better able to convey details of information processing, a persuasive approach could be useful. Direct persuasion approaches such as rational arguments or indirect approaches such as simple cues could allow end-users to be in a better position to communicate and participate online while maintaining their privacy. Privacy behaviour for instance, consists of boundary regulation while enjoying social interactions without negatively affecting oneself or the other party. Furthermore, since individuals want to be private and do not explicitly perform privacy decisions at every instant their privacy behaviour is implicit. But in online interfaces privacy mechanisms are made explicit and mostly kept away from the interaction paths. It seems very relevant then to suggest that a persuasive approach could also cater for the implicitness property of privacy behaviour within human-computer interaction and consequently enhance the usability of the online privacy mechanisms.
5 The Persuasive CPM In this section we first introduce persuasive systems and propose the persuasive CPM approach as a means to enhance usability of privacy online. We then evaluate a disclosure-privacy scenario with respect to the CPM stages in an attempt to identify the boundary rules that can be formed with the current interaction design and to understand whether the design caters for privacy as a communication process. 5.1 Persuasive Systems Persuasive systems are ‘computerised software or information system designed to reinforce, shape or change attitudes or behaviours or both without using coercion or deception’ [35]. Such systems employ persuasive techniques that are designed to enable compliance, change behaviour or attitudes and that rely on the voluntary participation of end-users [36]. Studies have shown that human beings have an innate privacy need and hence attitudes to privacy. Since end-users already have a privacy attitude, we suggest employing persuasive techniques to influence behaviour. Hence, the context of the persuasive topic tackled by this paper and research is online privacy behaviour change for adopted, or learnt disclosure behaviour, privacy reinforcement
for those privacy behaviours that are already present but hard to maintain and sustain and the shaping of new privacy behaviours. A persuasive system design approach [37] though quite a recently developed approach can be used to direct the analysis of systems requiring persuasive strategies and the selection of specific persuasive principles that can be used to achieve specific goals, in different contexts. Persuasive design principles [37] can provide for primary task support, human-computer dialogue support, perceived system credibility and social influence. 5.2 The Proposed Persuasive CPM In Figure 1 below we propose a persuasive CPM for usable online privacy. The persuasive CPM model is a preliminary adaptation of CPM with persuasive techniques selected for each process of privacy boundary management using the four categories of persuasive systems principles of the persuasive system design approach of Oinas-Kukkonen & Harjumaa [37].
Fig. 1. The persuasive CPM – a preliminary diagram based on Petronio’s CPM [25] and Oinas-Kukkonen & Harjumaa’s persuasive system design approach [37]
For each stage of the CPM we propose a set of persuasive principles that have the potential of enhancing usability of privacy interactions. For instance, for boundary formation, information is sought and new rules are formed or existing rules are acquired. Hence, the persuasive principles listed in Figure 1 for boundary formation
could facilitate communication to make the end-user aware of the type of boundaries being formed, to help decide on the type of privacy rules and also help facilitate the process of setting these rules. While this model still needs to be subjected to rigorous analysis and testing, we can already advance that this persuasive CPM approach could cater for the transparency need of privacy and also help towards providing the ability, motivation and trigger to end-users to exercise their right of control over their private information. 5.3 Analysis of a Disclosure-Privacy Scenario with Respect to the CPM and the Persuasive System Design Principles We analysed the profile creation of Amazon UK in an attempt to understand how interactions within this scenario have been designed with respect to disclosure and privacy using the CPM processes. We explored the scenario to understand how information can be sought, what boundary rules can be formed, how they are coordinated and negotiated and how turbulence can be resolved. We then identified persuasive principles used within this scenario that would favour disclosure and privacy. While the first two criteria of rule formation, culture and gender are not affected by the interaction design, the design can however contribute to other criteria of rule formation such as motivation, context and risk-benefit ratio awareness. Moreover, the boundaries coordinated during disclosure-privacy can be inclusive, intersected or unified. In order to identify the type of boundary that can be formed within the scenario, we look for the possibility of coordinating boundaries that is by forming linkage, ownership and permeability rules. For each of the different types of boundaries, different linkages and ownership rules can be coordinated. For instance, within inclusive boundaries, role, coercive or susceptibility linkages and manipulative, benevolent or obligatory ownerships occur. Within intersected boundaries, the linkages are goal or identity linkages, and both parties share responsibility of ownership. Findings. The interface does not provide explicit information to notify or explain that entering personal data will result in a dyadic boundary nor is the risk of disclosure highlighted. On the other hand, it motivates the end-user to disclose using the words ‘share’ and ‘friends’. Thus, the end-user might not realise that he or she is disclosing information to the service provider rather than still being within his or her personal boundary. In fact, the creation of a profile involves a dyadic boundary formation, where the criteria used to trigger boundary rule formation includes context (type of products bought), motivation and benefits - apart from culture and gender. The rules formed within this dyadic boundary are also acquired by the end-user without his or her awareness. Since there are no disclosure warnings and rules cannot be negotiated, the end-user has to accept pre-existing rules set by the service provider. As the end-user goes through a process of boundary appropriation by appropriating an already defined and set boundary and without having been provided with information about the type of boundary being formed, the coordinated boundary is inclusive. In this type of boundary coordination in the current scenario, the linkages formed can be either coercive or role linkages. In this case it is coercive linkage since the end-user is not made aware that he or she is leaving the personal boundary to form
a dyadic boundary during the interactions of profile creation. If the end-user was aware of this it would be the result of prior experience and the boundary linkage within this inclusive coordination would be role linkages. Role linkages refer to linkages formed with the service provider who takes control of the information disclosed by the end-user due to the former’s role of providing services that requires disclosure from the part of the end-user. In both cases, the end-user does not know who else might be linked and have access to this new boundary. The type of boundary ownership formed is either manipulative or obligatory. That is the service provider manipulates the end-user into disclosing while not making the end-user aware that a dyadic boundary is being formed and that control of ownership of information shared has been lost. The ownership could also be obligatory if the end-user can understand that he or she has left his personal boundary but is obliged to disclose and give up ownership in order to benefit from services. Hence the end-user is not given any control over how the disclosed information can be distributed. Profile creation in this scenario precludes editing and the visibility of the profile is automatically set to ‘public’. The profile creation page does not lead immediately to profile editing meaning that one would not know how the profile is visible. We also identified the persuasive techniques that are present within the interaction design and the table below provides the identified list from each of the four categories of persuasive system design principles that favours disclosure. Table 1. Persuasive techniques used to favour disclosure
Primary task support:
– Reduction: ‘It’s easy! Just choose a public name for Your Profile.’
– Self-monitoring: ‘Your profile contains information about you and your Amazon activities such as your Wish List and reviews you’ve written.’
– Suggestion: of a name in the textbox.
– Personalisation: personalised suggested name.
– Tunnelling: by providing a text box to write a name and a yellow button to create profile.

Dialogue support:
– Rewards: ‘Your Profile is a one-stop place for your friends and other people to find you and learn more about you.’

System credibility support:
– Trustworthiness: trustworthy since it is Amazon (widely used) and says ‘If you are not X, click here’.
– Surface credibility: the interface/website looks and feels competent; there are for instance no adverts.

Social support:
– Normative influence: ‘Connect with friends and other Amazon customers.’
For persuasive techniques that could favour privacy, there is only a link at the bottom of the page in very small print. Clicking on the link reveals a page of text that provides for reduction since the text is divided into sections. However, the text within the separate sections is condensed and would probably not encourage reading. This analysis has allowed us to identify the linkages and ownership rules that could be coordinated within the current scenario and the type of boundary that could be
formed. We also found that the interactions designed within this scenario fail to provide for the boundary management processes of the CPM. We identified some persuasive system design principles that could favour disclosure but only one that would favour privacy. It would be valuable to find out whether the addition of persuasive principles for privacy (via the boundary management process of the CPM) would enhance privacy usability online.
6 Conclusion and Future Work In this paper we have discussed the differences between offline and online privacy. The online and offline environments differ fundamentally in a way that causes a difference in the properties of the private details shared. We explored socialpsychological theories of privacy and highlighted that privacy is intertwined with communication and interpersonal relationships. We then suggested an approach through which offline privacy behaviour could to some extent be replicated online, that is by using Petronio’s [25] privacy boundary management theory. We proposed a persuasive Communication Privacy Management (CPM) as a means for end-users to be better able to express, and communicate, and hence engage, with their privacy online and analysed a disclosure-privacy scenario with respect to the CPM. The proposed approach has not yet been tested but paves the way for research which considers the changing, reinforcing and shaping of online privacy behaviour through enhancement of human-computer privacy interactions which will lead to usable online privacy. The next step for the research is to analyse other scenarios of online privacy mechanisms added and embedded within systems with respect to the CPM framework and then analyse their persuasiveness according to the persuasive system design approach. We will then perform empirical usability studies with an aim to explore the effect of a persuasive CPM approach on the usability of privacy mechanisms.
References 1. Acquisti, A., Grossklags, J.: Losses, Gains, Hyperbolic discounting: an experimental approach to information security attitudes and behaviour. In: 2nd Annual Workshop on Economics and Information Security - WEIS 2003, May 29-30, University of Maryland (2003) 2. Spiekermann, S., Grossklags, J., Berendt, B.: E-privacy in 2nd Generation E-commerce: Privacy preferences versus actual behaviour. In: 3rd ACM Conference on Electronic Commerce, Tampa, Florida, USA, October 14-17, pp. 38–47. ACM, New York (2001) 3. Acquisti, A.: Privacy in electronic commerce and the economics of immediate gratification. In: 5th ACM Conference on Electronic Commerce, May 17-20, pp. 21–29. ACM, New York (2004) 4. boyd, d.: Why youth love social network sites; The role of networked publics in teenage social life. In: Buckingham, D. (ed.) MacArthur Foundation Series on Digital Learning Youth, Identity and Digital Media, p. 119. The MIT Press, Cambridge (2007) 5. Cranor, L., McDonald, A., Reeder, R., Gage Kelley, P.: A Comparative Study of Online Privacy Policies and Formats. In: Goldberg, I., Atallah, M.J. (eds.) PETS 2009. LNCS, vol. 5672, pp. 37–55. Springer, Heidelberg (2009)
6. Warren, S.D., Brandeis, L.: The right to privacy. Harvard Law Review 4, 193–220 (1890) 7. DeCew, J.: Privacy. In: Zalta, E.N. (ed.) The Stanford Encyclopedia of Philosophy (Fall 2008), http://plato.stanford.edu/archives/fall2008/entries/privacy 8. Fried, C.: An Anatomy of Values: Problems of personal and social choice. Harvard University Press, Cambridge (1970) 9. Gerstein, R.: Intimacy and Privacy. Ethics 89, 76–81 (1978) 10. Rachels, R.: Why Privacy is important? Philosophy of Public Affairs 4, 323–333 (1975) 11. Itrona, L.D., Pouloudi, A.: Privacy in the Information Age: Stakeholders, Interests and Values. Journal of Business Ethics 22(1), 27–38 (1999) 12. Adams, A., Sasse, M.A.: Privacy issues in ubiquitous multimedia environments: wake sleeping dogs, or let them lie? In: Sasse, M.A., Johnson, C. (eds.) Seventh IFIP Conference on Human-Computer Interaction INTERACT 1999. Edinburgh Conference Centre, August 30-September 3. IOS Press, Riccarton (1999) 13. Strater, K., Richter, H.: Examining privacy and disclosure in a social networking community. In: 3rd Symposium on Usable Privacy and Security, July 18-20, vol. 229, pp. 157–158. ACM, New York (2007) 14. Adams, A., Sasse, M.A.: Taming the wolf in sheeps clothing: Privacy in multimedia communications. In: Seventh ACM International Conference on Multimedia, Orlando, Florida, United States, October 30 - November 05, pp. 101–107. ACM, New York (1999) 15. Swire, P.: Privacy and Information Sharing in the War on Terrorism. Villanova Law Review 51, 101–129 (2006) 16. Palen, L., Dourish, P.: ”Unpacking” privacy for a networked world. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Ft. Lauderdale, Florida, USA, p. 126. ACM, NY (2003) 17. Patterson, M.L., Mullens, S., Romano, J.: Compensatory reactions to spatial intention. Sociometry 34, 114–121 (1971) 18. Zwick, Dholakia, N.: Models of privacy in the Digital Age: Implications for Marketing and E-commerce (unpublished Paper), Research Institute for Telecommunications and Information Marketing, RITIM, University of Rhode Island (1999) 19. Blanchette, J., Johnson, D.G.: Data retention and the panoptic Society: The social benefits of forgetfulness. The Information society 18(1), 33–45 (2002) 20. Richter-Lipford, H., Besmer, A., Watson, J.: Understanding Privacy settings in facebook with an audience view. In: Churchill, E., Dhamija, R. (eds.) Proceedings of the 1st Conference on Usability, Psychology, and Security, San Francisco, California, April 14, pp. 1–8. USENIX Association, Berkeley (2008) 21. Westin, A.: Privacy and Freedom. Athenum (1967) 22. Altman, I.: The Environment and Social Behaviour: Privacy, Personal Space, Territory and Crowding. Brooks/Cole Publishing, Monterey, California (1975) 23. Pedersen, P.M.: Models for types of privacy by privacy functions. Journal of Environmental Psychology 19(4), 397–405 (1999) 24. Margulis, S.T.: On the status and collaboration of Westins’s and Altman’s Theories of Privacy. Journal of Social Issues 59(2), 411–429 (2003) 25. Petronio, S.: Boundaries of privacy: dialectics of disclosure. State University of New York Press, Albany (2002) 26. Derlega, V.J., Chaikin, A.L.: Privacy and self-disclosure in social relationships. Journal of Social Issues 33(3), 102–115 (1977) 27. Newell, P.B.: Perspectives on Privacy. Journal of Environmental Psychology 15(2), 87– 104 (1995)
28. Cranor, L., Guguru, P., Arjula, M.: User interfaces for privacy agents. ACM Trans. Computer-Human Interaction 13, 135–176 (2006) 29. Hansen, M.: Unabhängiges Landeszentrum für Datenschutz Schleswig-Holstein, Holstenstr. 98, 24103 Kiel, Putting privacy pictograms into practice: a european perspective (2010), http://subs.emis.de/LNI/Proceedings/Proceedings154/ gi-proc-154-134.pdf 30. Cornwell, J., Fette, I., Hsieh, G., Prabaker, M., Rao, J., Tang, K., Vaniea, K., Bauer, L., Cranor, L., Hong, J., McLaren, B., Reiter, M., Sadeh, N.: User-controllable security and privacy for pervasive computing. In: Eighth IEEE Workshop on Mobile Computing Systems and Applications, HOTMOBILE, March 08-09, pp. 14–19. IEEE Computer Society, Washington DC (2007) 31. PRIME WP06.1, HCI Guidelines, D06.1.f (2008), https://www.prime-project.eu/prime_products/reports/arch/ pub_del_D06.1.f_ec_wp06.1_v1_final.pdf 32. Leenes, R.E.: User-Centric Identity Management as an indispensable tool for privacy protection. International Journal of Intellectual Property Management 2(4), 345–371 (2008) 33. Metzger, M.: Communication Privacy Management in Electronic Commerce. Journal of Computer-Mediated Communication 12(2) (2007), http://jcmc.indiana.edu/vol12/issue2/metzger.html (June 2010) 34. Oinas-Kukkonen, H., Harjumaa, M.: Persuasive system design: key issues, process model and system features. Communications of the Association of Information Systems 24, 485– 500 (2009) 35. Oinas-Kukkonen, H., Harjumaa, M.: Towards deeper understanding of persuasion in software and information systems. In: First International Conference on Advances in Computer-Human Interaction, Sainte Luce, Martinique, ACHI, February 10-15, pp. 200– 205. IEEE Computer Society, Washington, DC (2008) 36. Oinas-Kukkonen, H.: Behaviour change support systems: a research model and agenda. In: Ploug, T., Hassle, P., Oinas-Kukkonen, H. (eds.) 5th International Conference, Persuasive 2010, pp. 4–14. Springer, Heidelberg (2010) 37. Oinas-Kukkonen, H., Harjumaa, M.: A Systematic Framework for Designing and Evaluating Persuasive Systems. In: Oinas-Kukkonen, H., Hasle, P., Harjumaa, M., Segerståhl, K., Øhrstrøm, P. (eds.) PERSUASIVE 2008. LNCS, vol. 5033, pp. 164–176. Springer, Heidelberg (2008)
PrimeLife Checkout
A Privacy-Enabling e-Shopping User Interface
Ulrich König
Unabhängiges Landeszentrum für Datenschutz Schleswig-Holstein (ULD), Germany
[email protected]
Abstract. The PrimeLife Checkout user interface aims at supporting the user in enforcing her privacy preferences in an online purchase process. The user can choose how much privacy protection she wants, and the system visualises which shipping and payment methods match her needs, or why the selected methods are not suitable for the protection level of her choice. The interface displays in a user-friendly way which personal data will be transferred to whom and for what purposes. In most webshops, this information can only be retrieved by reading the shop's full privacy policy. In contrast, the proposed approach informs the user what happens with her data for which purpose while she is entering her data. Thereby it specifically addresses the challenge of a user-friendly and more transparent policy display.
1 Introduction
The PrimeLife1 Checkout user interface (PLC) is designed to give users control of their data in a checkout process. Usually the web-based checkout process works in many small steps. In each step, today's checkout systems ask the user for specific data, such as name, e-mail address or payment information. After each step, the data are transferred to the server of the webshop, where the user has no control over what the shop operator does with them. Some shops employ scoring systems that use the data collected so far to decide which payment methods they offer to the user in a later step, as described in [LEP+01].
The PLC offers a different approach. The user-entered data are checked on the user's local machine to see whether they are valid. The rules for validating the data have to be provided by the shop at the beginning of the transaction; in this particular demonstrator, this is done with JavaScript entirely on the user's side. No data are transferred to the service until all checks are done and the user finally confirms that she is willing to perform the purchase. Only then are the data transferred to the webshop.
This text is organised as follows: Section 2 explains the "three steps design" chosen for the PLC. Section 3 outlines the different components of the PLC demonstrator and shows their functionality. The technical background is described in section 4, followed by evaluation results in section 5. Finally, section 6 summarises the findings.
1 The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no 216483 for the project PrimeLife.
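To illustrate the local validation step just described, a minimal JavaScript sketch could look as follows; the rule format, field names and messages are assumptions for illustration and are not taken from the PLC demonstrator.

// Minimal sketch: shop-provided validation rules, evaluated locally in the browser.
// Field names, rule format and messages are illustrative assumptions.
const shopRules = {
  name:    { required: true,  pattern: /^.{2,}$/, message: "Please enter your name." },
  email:   { required: true,  pattern: /^[^@\s]+@[^@\s]+\.[^@\s]+$/, message: "Please enter a valid e-mail address." },
  zipCode: { required: false, pattern: /^\d{5}$/, message: "A German postal code has five digits." }
};

// Validate a single field; returns null if valid, otherwise an error text.
function validateField(fieldName, value) {
  const rule = shopRules[fieldName];
  if (!rule) return null;                        // field not requested by the shop
  if (!value || value.trim() === "") {
    return rule.required ? rule.message : null;  // empty optional fields are fine
  }
  return rule.pattern.test(value.trim()) ? null : rule.message;
}

// Validate the complete form; no data leave the browser here.
function validateAll(formData) {
  const errors = {};
  for (const fieldName of Object.keys(shopRules)) {
    const error = validateField(fieldName, formData[fieldName]);
    if (error) errors[fieldName] = error;
  }
  return errors;                                 // empty object means everything is valid
}

// Only after the user's final confirmation would formData be sent to the webshop.
function onOrderAndTransfer(formData) {
  const errors = validateAll(formData);
  if (Object.keys(errors).length > 0) {
    console.log("Please correct:", errors);      // instant feedback instead of a server round trip
    return;
  }
  // the single transfer of formData to the shop would happen here, and only here
}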
2 Three Steps Design
In the online shopping process, users typically have to go through a number of steps to complete the checkout. In every step, they have to enter and transfer personal data, such as the shipping address or payment information. For example, seven steps are needed to buy something at amazon.com, as visualised in Figure 1. However, usability tests of the multiple-step "Send Personal Data" dialogue performed in the PrimeLife project showed that users regarded six steps for the data collection alone as too many [WP410]. A further problem is that the user does not know whether the data she has entered will be accepted by the system.
Fig. 1. Seven Steps solution by amazon.com: (1) Shopping Cart, (2) Shipping Address, (3) Shipping Method, (4) Payment, (5) Billing Address, (6) Place Order, (7) Confirmation
A typical scenario is that the user has to enter data and click "next" to reach the next step, where again she is asked for data. The whole set of data collected in one of the steps is immediately sent to the server, where its validity is checked. If all data are correct, the user is taken to the next step. If not, the user receives an error notice, has to fix the problem and try again. She has no control over what the webshop does with these data until the transaction is finished or cancelled. In addition, the user will not know whether the privacy policy of the shop fulfils her requirements. For example, at amazon.com the user is told in step 6, after she has entered and transferred all her data to the amazon.com server: "Please review and submit your order", "By placing your order, you agree to amazon.com's privacy notice and conditions of use". Interested users may follow the link to the privacy policy displayed on all pages of the shop, but users are not actively notified of the parts of the privacy policy relevant to the transaction. In general, another popular way of using data within the checkout process is to request a score value from credit agencies without the users' consent or at least
without clearly informing the users. The offered payment methods are then selected based on the result of the score [LEP+01], instead of asking the user beforehand or considering whether such a credit rating is necessary at all; it is not necessary, for instance, when the user opts for cash on delivery or any form of prepayment. The PLC solution chooses a different approach. The objective is to collect all necessary data in one single step. The moment the user enters the data, PLC checks their validity. This step is performed within the realm of the user's browser, without transferring any data to the server. Therefore, the whole validity check is done before any data are sent to the webshop's server2. The user gets instant feedback on whether the entered data are formally valid and fulfil all requirements of the service or whether she has to correct something. PLC can be used in two different ways: it can be integrated directly into the webshop interface, so that it is seamlessly usable by the user in her browser, or it can be installed on the user's computer as a stand-alone application. The details are described in section 4. The resulting steps are shown in Figure 2 – the procedure needs just three steps for the purchase, the user gets instantaneous feedback, and she may correct her data if needed. This reduces the number of clicks required to finish the user's purchase. The individual steps are described in sections 2.1-2.3.
Fig. 2. PrimeLife Checkout three steps solution: (1) Shopping Cart + Address + Shipping + Payment, (2) Summary, (3) Confirmation
The major advantage of the PLC solution presented in this section is that all data will only be transferred at the end of the process in one data container and only if the user finishes the whole process. The webshop cannot score the user and select payment methods by the score value without the user's consent. If the user cancels the process at any time, no data will be transferred to the server.
2 To perform this step, the webshop needs to transfer all of its validity checks to PLC. This may seem like a heavy burden, but it has two advantages: the user gets a clear overview of which data she has filled in and is going to share, and she does not disclose any of her data until she finally confirms the order. A thorough evaluation can be found in [WP410], section 5.2.
Some would argue that a disadvantage of the PLC is that by putting all of the fields the user has to fill out on one screen, the screen gets too complex and the user may be overwhelmed. They might argue that an Amazon-like step-by-step solution is easier to understand. However, two points can be raised in defence of PLC: First, users get a clear overview of what data are required for the transaction from the beginning. The information is not hidden in different steps that are only accessible if the user has correctly filled out all fields in the preceding steps. Second, while only a very small number of user tests have been conducted so far, the complexity of the screen was never an issue in the test results.
2.1 Step 1: Shopping Cart
In the first step, the user has to enter all data necessary to perform the purchase, as shown in Figure 3. She is informed that she is in step 1 of 3 and that no data will be transferred to the webshop until she confirms the data transfer in the next step. The user also gets a view of her current shopping cart with the option to change the quantities of the selected items. Details about Figure 3 can be found in section 3.
2.2 Step 2: Summary
In step 2, the PLC displays all the data that have been collected in step 1 again in the same layout, but without the option to modify them, as shown in Figure 4. The user has the opportunity to save the data she entered in step 1, plus the information on to whom the data may be transferred and for what purposes, to her privacy settings. She can check whether all data she has entered are correct and confirm the purchase and data disclosure by clicking on the "Order and Transfer Data →" link, or go back and make changes by clicking on "← Return to Shopping Cart". The goal of this step is to give the user a final view of her purchase and of what data will be transferred to whom for what purposes, before she finally confirms the purchase and the data transfer.
2.3 Step 3: Confirmation
In step 3, the PLC confirms the purchase. It also notifies the user that the entered data have been transferred to the service providers listed in the “My Data” field and in the bar on the right side. The user has the opportunity to save the privacy settings for future use and reference.
3 Components of the PLC
In this section, we provide a short description of the components of the PLC. At the top of the PLC user interface, there is an overview of all steps where the current step is displayed in bold and underlined font.
Fig. 3. Step 1: Shopping Cart
Fig. 4. Step 2: Summary
In step 1, the user gets the information that no personal data entered within the grey area will be transferred until she finally clicks "Order and Transfer Data". In step 3, this box shows the confirmation that the data have been transferred to the recipients listed in the right bar (see Figures 3 and 4).
3.1 My Privacy Settings
An important part of the PLC is the “My Privacy Settings” box illustrated in Figure 5 where the user selects her preferred privacy settings. There are three predefined settings available: “Nearly Anonymous”, “Few Data” and “Don’t Care”. In addition, there is a field for the user to insert her privacy settings with the help of a policy editor.
Fig. 5. “My Privacy Settings”
The standard privacy settings are predefined and always the same. The user also has the option to use her own customized privacy settings via the "Insert Privacy Settings" link. The names of the privacy settings indicate how much personal information the user is willing to disclose:
– "Nearly Anonymous" is the most restrictive setting, in which the user wants to reveal as little personally identifiable information as possible, i.e., she desires to act anonymously. In web transactions, complete anonymity is hard to obtain or measure. This is why we have called this setting "Nearly Anonymous", to prevent a false sense of safety in users.
– With "Few Data", the shop, the shipping company and the payment provider will by default only get the data necessary to perform the transaction. So the default of "Few Data" is identical to the "Nearly Anonymous" setting. The difference to "Nearly Anonymous" is that the user can agree to transfer more data than necessary in the "Nearly Anonymous" case. This might enable the user to, e.g., select a different payment provider. Additional purposes for data collection and data usage, such as marketing, are disabled by default within this setting.
– The setting "Don't Care" disables all restrictions. The user can disclose her data without being warned by the system that the webshop's privacy policy does not match her privacy settings.
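As a rough illustration, the three predefined settings could be encoded as simple configuration objects along the following lines; the attribute names are assumptions for this sketch and not taken from the PLC implementation.

// Illustrative encoding of the three predefined privacy settings (assumed attribute names).
const privacySettings = {
  nearlyAnonymous: {
    allowOptionalData: false,      // only data strictly required for the transaction
    allowSecondaryPurposes: false, // e.g. marketing stays disabled
    warnOnMismatch: true
  },
  fewData: {
    allowOptionalData: true,       // the user may agree to disclose more than necessary
    allowSecondaryPurposes: false, // marketing and similar purposes are off by default
    warnOnMismatch: true
  },
  dontCare: {
    allowOptionalData: true,
    allowSecondaryPurposes: true,
    warnOnMismatch: false          // no warnings if the shop's policy does not match
  }
};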
3.2 My Data
The "My Data" box shown in Figure 6 contains text fields in which the user can enter the personal data requested by the service provider.
Fig. 6. “My Data” field including matching results
It also contains a matrix in which the user can use checkboxes to select which data should be transferred to whom for what purposes.
Text fields. In the text fields, the user can enter the personal data requested. A field is greyed out if data for that field are not requested by the service provider, yellow if the information in the field is necessary for completing the transaction but not provided yet or filled out incorrectly, and white if the data are necessary and filled out correctly. One issue with the use of colours relates to colour-blind users who cannot perceive the difference between a yellow and a grey field. A solution could be a symbol behind the text field which indicates its status; this has not been implemented within PLC.
Checkboxes. The checkboxes are arranged in a matrix with data fields as rows and the combinations of data controllers and the respective purposes as columns.
Every checkbox symbolises whether information from the user is disclosed to a data controller or not. If there is an asterisk next to a checkbox, the data field is mandatorily requested by the data controller. To successfully perform the purchase, the user has to give her consent to the respective processing by checking the checkbox next to the asterisk. Otherwise, there will be a mismatch between the "Privacy Settings" and the "Privacy Policy" of the data controllers in the involved column. In such a case the user may look for a different option which does not require this type of data, e.g., opt for an anonymous payment method or choose a shipping provider who allows anonymous pickup of the shipped goods.
Fig. 7. Left: “My Data” field with a “User Settings” mismatch. Right: “My Data” field with a “Privacy Settings” mismatch.
Matching. The PLC matches the "Privacy Settings" selected by the user with the "Privacy Policy" of the different data controllers. The choice of the "Privacy Settings" determines the statuses of the checkboxes in the matrix. The "Privacy Policy" of the data controllers is compared with the statuses of the checkboxes in the matrix. If both match, a "MATCH" is shown in the box on the right side of the matrix, otherwise a "MISMATCH". If the mismatch can be resolved by the user without changing her privacy settings, an orange exclamation mark "!" appears on the right side of the checkbox where the mismatch is located, and the mismatch is displayed in the "User Settings Matching" box, as illustrated in Figure 7. If the user has to change the privacy settings in order to resolve the conflict, the exclamation mark is red and the mismatch is displayed in the "Privacy Settings Matching" box, as shown in Figure 7. The details of the mismatches are displayed in the boxes on the right side of the matrix. Visually impaired persons can distinguish the different mismatches with the help of the text in the boxes. The user can only proceed with the shopping transaction if there are no mismatches.
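A minimal sketch of such a matching step is given below; the data model, purpose names and classification logic are illustrative assumptions and not the PLC code. It distinguishes mismatches the user can resolve herself from those that require a change of the chosen privacy settings, mirroring the orange and red exclamation marks described above.

// Each policy entry says which data field a controller requests for which purpose,
// and whether the field is mandatory (the asterisk in the matrix). All names are assumed.
const controllerPolicies = [
  { controller: "webshop",  purpose: "purchase",  field: "name",            mandatory: true  },
  { controller: "shipping", purpose: "delivery",  field: "shippingAddress", mandatory: true  },
  { controller: "webshop",  purpose: "marketing", field: "email",           mandatory: false }
];

// userConsent[controller][purpose] holds the fields the user has ticked in the matrix.
function isTicked(userConsent, entry) {
  const purposes = userConsent[entry.controller] || {};
  return (purposes[entry.purpose] || []).includes(entry.field);
}

// Classify every mandatory request the user has not consented to:
// "user"     -> resolvable without changing the privacy settings (orange "!")
// "settings" -> requires a change of the chosen privacy settings (red "!")
function matchPolicies(policies, userConsent, setting) {
  const mismatches = [];
  for (const entry of policies) {
    if (!entry.mandatory || isTicked(userConsent, entry)) continue;
    const secondary = entry.purpose !== "purchase" && entry.purpose !== "delivery";
    const blockedBySetting = secondary && !setting.allowSecondaryPurposes;
    mismatches.push({ ...entry, kind: blockedBySetting ? "settings" : "user" });
  }
  return { match: mismatches.length === 0, mismatches };
}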
Overview of data transfer. One main objective of the PLC is to visualise in a user-friendly manner what will be done with the data disclosed by the user. This is expressed by a list of all data the user has entered, which shows all involved data controllers with all data fields (attributes) and field contents (attribute values) that will be transmitted if the transaction is completed. The list is updated in real time so that the user can immediately observe the consequences of all her changes. To achieve this, the "Data to Transfer" box on the right side of the user interface displays for each data controller what data will be disclosed for what purposes and what the data retention periods for the respective purposes will be, see Figure 8. In addition, links to the full privacy policies of the data controllers are provided to comply with the Art. 29 Working Party recommendation on multi-layered privacy policies [Par04]. The "Data to Transfer" box confirms the user input, while the "My Data" box is used to input data.
Fig. 8. "Data to Transfer" box
4 Technical Issues
This section deals with some technical issues of the PLC. Section 4.1 describes alternative ways in which the user's data can be transferred to third-party data controllers. Section 4.2 describes different options for how PLC can be deployed and used.
4.1 Direct Data Transfer
In the "classic" approach, the user transfers all her data to the webshop from the first step of the buying procedure onwards. If necessary, the webshop forwards the information to the other data providers, also called downstream data controllers [ABD+09]. The problem is that the webshop gets the complete data set of every user, including information not necessary to perform the transaction. There are two alternatives to this "classic" approach. The first solution may be to transfer the necessary data directly to the third parties and other service providers (downstream data controllers), e.g., the shipping address directly to the shipping company and the payment data to the payment company. In this case, the webshop would just need a primary key for each process and the corresponding data provider. The second solution for such downstream controllers could be to encrypt the data with the public key of the downstream provider and then forward them to the provider via the webshop.
Neither of these options has been implemented in the PLC because the PLC is just a GUI demonstrator, but in any implementation in a real-life environment it is an important design decision how to transfer the data to the downstream controllers and how to make this transparent to the user. For example: A user wants to buy "The Blues Brothers" DVD at a webshop, using the shipping services of DHL and the payment services of Visa Card. The webshop gets the information that the user wants to buy "The Blues Brothers", that the payment is done by Visa Card with the primary key 156345 and that the shipping will be done by DHL with the primary key 95328. Visa gets the information that the user wants to pay €10, with credit card no. 1234-5678-9012-3456, valid until 10/2015, with card security code 987, to the owner of the primary key 156345. DHL gets the information that one parcel with the primary key 95328 has to be shipped to Holstenstr. 98, 24103 Kiel, Germany. Visa informs the webshop that €10 from primary key 156345 has been paid and will be transferred to the webshop. The webshop passes the DVD to DHL with the primary key 95328 and pays the shipping costs. DHL ships the DVD to the user. This architecture enables the user to expose her private data to a minimal group of data providers, and for each of these providers only a minimal set of data is exposed. This approach works under the assumption that the data providers have a privacy policy that does not allow forwarding data to another data provider for a foreign purpose, and that the data providers comply with their own privacy policy.
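A rough sketch of the first alternative, splitting an order into per-controller messages that are linked only by transaction-specific keys as in the example above, could look as follows; the message layout and field names are assumptions for illustration, not part of the PLC demonstrator.

// Sketch: split one order into minimal per-controller messages, linked only by random keys.
// The structure mirrors the DVD example above; all field names are illustrative assumptions.
function randomKey() {
  // transaction-specific key; a real implementation would use a cryptographically secure source
  return Math.floor(100000 + Math.random() * 900000);
}

function splitOrder(order) {
  const paymentKey = randomKey();
  const shippingKey = randomKey();
  return {
    // The webshop only learns what is bought and which keys refer to payment and shipping.
    toWebshop:  { items: order.items, paymentKey, shippingKey },
    // The payment provider learns the amount and card data, but not the goods or the address.
    toPayment:  { amount: order.amount, card: order.card, paymentKey },
    // The shipping company learns the address, but not the goods' content or the payment data.
    toShipping: { address: order.address, shippingKey }
  };
}

// Example corresponding to the text: a DVD purchase, paid by credit card, shipped by parcel service.
const messages = splitOrder({
  items:   [{ title: "The Blues Brothers (DVD)", quantity: 1 }],
  amount:  "EUR 10.00",
  card:    { number: "1234-5678-9012-3456", validUntil: "10/2015", csc: "987" },
  address: "Holstenstr. 98, 24103 Kiel, Germany"
});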
4.2 Low Barriers for Usage
The PLC could run directly from the website of the webshop or as a stand-alone program on the user's computer. All the user needs is a web browser with JavaScript support. This lowers the barrier for users who do not know PLC and do not want to install anything from a third party on their computer, or whose operating system is not supported. The output of PLC is also designed to be screen-reader compatible, to enable visually handicapped people to use it, too. Users can also install PLC directly from the PrimeLife website or another trusted party on their PC, so that the shop provider has no way of manipulating it. The data necessary to perform the transaction have to be provided by the shop provider at the beginning of the transaction. This includes validation checks for the personal data of the user, additional costs for different shipping and payment methods, and the allowed payment and shipping methods. To transfer these data from the shop provider to the PLC, a policy language similar to the PrimeLife Policy Language [ABD+09] is needed, which ensures that requirements and checks are performed correctly. All checks have to be done offline. If code is transferred that will be executed on the users' computers, it has to be ensured that this code runs in a sandbox-like environment, without network or file access. Such a policy language has not been researched as part of the PLC work. If data have to be transferred for a validation or availability check, the user has to give her informed consent to transfer the specific data. A user-interface demonstration is available at [Kö10].
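The transaction data the shop has to provide could, for instance, be expressed in a declarative structure along the following lines, as a simplified stand-in for a policy language such as the PrimeLife Policy Language; all names, values and costs are assumptions for this sketch.

// Simplified, illustrative stand-in for the shop-provided transaction description.
// A real deployment would use a policy language such as PPL; names and values are assumptions.
const transactionDescription = {
  requestedData: {
    name:            { mandatory: true,  pattern: "^.{2,}$" },
    shippingAddress: { mandatory: true,  pattern: "^.{5,}$" },
    email:           { mandatory: false, pattern: "^[^@\\s]+@[^@\\s]+$" }
  },
  shippingMethods: [
    { id: "parcel",  label: "Parcel service", cost: 4.90, anonymousPickup: false },
    { id: "station", label: "Pickup station", cost: 4.90, anonymousPickup: true  }
  ],
  paymentMethods: [
    { id: "creditcard", label: "Credit card",            cost: 0.00, anonymous: false },
    { id: "prepaid",    label: "Anonymous prepaid card", cost: 1.50, anonymous: true  }
  ]
};

// The validation patterns are plain strings, so no executable shop code reaches the user's
// machine; the PLC would compile them into RegExp objects and evaluate them offline.
function compileChecks(description) {
  const checks = {};
  for (const [field, rule] of Object.entries(description.requestedData)) {
    checks[field] = { mandatory: rule.mandatory, regexp: new RegExp(rule.pattern) };
  }
  return checks;
}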
5 Evaluation
The PLC has shown some strengths in discussions with other researchers; e.g., the "Data to Transfer" section was well accepted. It is an easy way to make transparent to customers what data will be transferred to whom and for which purpose. The three-step design is also a big step forward in bringing privacy to the users. Most webshops are designed to make it as easy as possible for the customer to buy something; the goal is to lose a minimal number of customers during the checkout process. Nevertheless, easy is not equal to transparent, and nowadays transparency becomes a more and more important issue for customers in times of phishing, identity theft, and massive data warehousing.
On the other hand, the PLC has some weaknesses. One of the major weaknesses is the checkboxes in the "My Data" section. There are too many checkboxes, and it is difficult for users to understand what to do with all of these boxes. Users may not understand that they are giving their consent to transfer data by clicking a checkbox. Another problem is that the user has too many choices. Many of the combinations of choices that can be selected in the GUI make no sense and will lead to some kind of error. It would help the user a lot if the GUI prevented the selection of senseless combinations. It may also help if all payment/shipping providers that are not compatible with the chosen "My Privacy Settings" were hidden or visually disabled. For example, rows with unused fields, such as the credit card number, are displayed even when they are not needed. Similarly, unused columns are displayed: when the user has selected "Nearly Anonymous", a column for marketing purposes makes no sense. In addition, an automatic mismatch solver would probably help users to deal with the PLC interface. Moreover, it has to be taken into account that the interface should make it easier to select a privacy-friendly solution than to give consent to privacy-unfriendly data processing.
There has been an evaluation of the PLC by Staffan Gustavsson, but with just five participants, so the results are not very reliable. It can be found in [WP410], section 5.2.
6 Conclusion
PLC introduces a new, three-step way of handling the checkout process that makes the whole process more transparent for users compared with other multiple-step processes. The concept may influence the design of real-life webshops if privacy becomes a selling point beside the price. In that case, there is a good chance that privacy-aware webshops will include something like the "Data to Transfer" box in their websites, because this would support the intelligibility of their privacy policy. The other parts of PLC contain many valuable ideas, but need some reworking before they can be transferred into productive systems.
The next step is to make privacy more transparent and comparable, so that webshops start to compete not just on the price, but also on who guarantees the best privacy and makes this transparent to potential customers.
References
[ABD+09] Ardagna, C.A., Bussard, L., De Capitani di Vimercati, S., Neven, G., Paraboschi, S., Pedrini, E., Preiss, S., Raggett, D., Samarati, P., Trabelsi, S., Verdicchio, M.: PrimeLife Policy Language (2009), http://www.w3.org/2009/policy-ws/papers/Trabelisi.pdf
[Kö10] König, U.: PrimeLife Checkout live demo (April 2010), http://www.primelife.eu/images/stories/releases/checkout/PrimeLifeMockUp_3_2.html or http://is.gd/e3AFr
[LEP+01] Loosemore, P., Egelhaaf, C., Peeters, E., Jakobsson, S., Zygourakis, K., Quinn, N.: Technology assessment of middleware for telecommunications. Eurescom Project P910 (April 2001), http://www.eurescom.eu/~pub/deliverables/documents/P900-series/P910/TI_5/p910ti5.pdf
[Par04] Article 29 Data Protection Working Party: Opinion on more harmonised information provisions (November 2004), http://ec.europa.eu/justice_home/fsj/privacy/docs/wpdocs/2004/wp100_en.pdf
[WP410] WP4.3: UI prototypes: Policy administration and presentation - version 2. In: Fischer-Hübner, S., Zwingelberg, H. (eds.) PrimeLife Deliverable 4.3.2, PrimeLife (June 2010), http://www.primelife.eu/results/documents
Towards Displaying Privacy Information with Icons
Leif-Erik Holtz, Katharina Nocun, and Marit Hansen
Unabhängiges Landeszentrum für Datenschutz Schleswig-Holstein, Holstenstr. 98, 24103 Kiel, Germany
[email protected]
Abstract. European data protection regulation obliges every service provider to show a privacy policy on his web site. Many privacy policies are too long and too complicated to understand, and reading them is hardly appealing. To enhance users' awareness of who is collecting and handling their personal data for what purpose, and to depict core information of the policy, privacy icons could be used in addition to written policies. Further, specific privacy icons could be helpful for expressing possible, planned or performed data processing between individuals, e.g., in social networks.
Keywords: privacy icons, privacy policies, privacy pictograms, social networks.
1 Introduction1
Every person has an individual view on her privacy: what to protect and what information to share with others. Effective protection of informational privacy [1] requires clarity on the data processing and its possible consequences for the individual, so that a decision on when to disclose which personal data to whom is based on correct information. However, users are rarely aware of the planned or actual data processing or of other aspects possibly relevant to their privacy. Sometimes the necessary information is not given by the data controllers, but even if they show the legally demanded information in their web site's privacy policy, most users refrain from studying it. This was the reason for proposals of machine-readable privacy policies that could be interpreted by the user's machine according to her preferences. The most popular attempt was the specification of P3P – Platform for Privacy Preferences by the World Wide Web Consortium [2]. Still, P3P and other policy languages that are being developed lack widespread implementations. The Art. 29 Data Protection Working Party pursued another approach: multi-layered privacy policies [3] should display the most relevant information on the first
1 The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 216483. The information in this document is provided "as is", and no guarantee or warranty is given that the information is fit for any particular purpose. The above referenced consortium members shall have no liability for damages of any kind including without limitation direct, special, indirect, or consequential damages that may result from the use of these materials subject to any liability which is mandatory due to applicable law.
layer: the identity of the controller, the purposes of processing and any additional information which, in view of the particular circumstances of the case, must be provided beforehand to ensure fair processing. This first layer, the so-called "short notice", should be directly visible to all users concerned. The second and third layers would give more detailed information to interested persons. However, this approach, too, is rarely implemented. Instead, privacy policies consist of legalese that is not appealing to most users, and in addition they often lack the preciseness, or leave out important parts, needed to enable data protection authorities to assess whether the outlined data processing is legally compliant or not.
Another approach to showing core aspects of a privacy policy is the use of (privacy) icons. In general, icons are used to visualize specific statements or properties, e.g., for emergency fire exits or subway stations. Well-designed icons may allow for quick comprehensibility for everybody who is not visually impaired. Note that today's icons are often not fully self-explanatory, but can either be understood from the context they are used in or belong to general knowledge that has to be learned. Privacy icons should offer at least some valuable information at first glance for users and point to core issues related to the processing of data in a given case. The PrimeLife project investigates the design of such icons for different scenarios and tests them with users [4].
This text provides insight into PrimeLife's work in progress on privacy icons. It is organized as follows: Since various privacy icon sets have been proposed in the last few years, section 2 discusses related work. Section 3 shows some privacy icons that are currently being evaluated: some are designated to be used in e-commerce or other classical client-server scenarios, others are rather relevant in social networks or in peer-to-peer settings. Section 4 provides first results of a user test and an online survey for the evaluation of alternative privacy icons in the PrimeLife project. Section 5 summarizes the results and gives an outlook.
2 Related Work
"Privacy icons" are understood as simplified pictures expressing privacy-related statements. Various areas of use can be distinguished [5]:
1. statements on results of data protection audits or similar evaluations concerning informational privacy relevant components of data processing, e.g., privacy seals or trust marks,
2. statements on how well a situation matches the privacy preferences of a user, e.g., Cranor's PrivacyBird for P3P [6],
3. statements from privacy policies on planned or performed processing of potentially personal data or on guarantees concerning the use of these data, e.g., proposals from Rundle [7], Mehldau [8], Helton [9] and Raskin [10] as well as the evaluative approach in the KnowPrivacy report [11],
4. statements on how personal data may be used by others, e.g., Bickerstaff strengthening the user's perspective and proposing "Privacy Commons" analogue to "Creative Commons" [12], an icon set tailored to users in social networks by Iannella and Finden [13], or the Privicon proposal that senders of e-mails should be able to express easily how recipients should handle the message [14].
L.-E. Holtz, K. Nocun, and M. Hansen
Except for some trust marks and certain security-related icons, e.g., the SSL lock in web browsers showing the encryption status, none of the icon proposals in the privacy area has gained much outreach, yet. In addition, legal departments may advise service providers against implementing privacy icons in addition to (or even instead of!) their policy because they cannot be as expressive as the privacy policy which may cause a misunderstanding by users or supervisory authorities. However, current work being done on privacy icons does not aim at expressing all possible privacy-related aspects by these pictograms and thereby substitute the privacy policy. It is even pointed out that icons may be valuable for illustrating privacy policies to help users in understanding the text of the policy as well as getting used to the icons and learning on the fly what the icons stand for [4]. Research on icons especially for social networks that combine both pictograms for the privacy policy of the provider and pictograms for expressing the peer-to-peer aspects of social networks are still in the early stages of development [13].
3 Approaches to Implement Privacy Icons Privacy icons could have a vast area of usage: for indicating rights and limitations for own data provided via e-mail, social networks or blogs, for web sites showing prominently their illustrated privacy policy, for web sites providing machine-readable policies to be interpreted by the user’s software, or even for third-party services commenting others’ privacy policies [e.g., 11]. Today, the use of icons alone, i.e., without a written privacy policy spelling out the details, cannot be a sufficient substitute for the information that has to be provided to the user. Thus, privacy icons can be used in association with a written privacy policy. It is important to note that the documents from the Art. 29 Working Party such as [3] do not oppose the idea of icons. Catchy icons may be more attractive and informative for a large group of people than lengthy texts in a technical or legal language. In the PrimeLife project icon sets have been developed for general use as well as for specific use in social networks, as exemplarily shown in the following sections. 3.1 Icons for General Usage The developed icon set for general usage includes categories like types of data, purposes and data processing steps. Fig. 1 shows a few examples of icons for general usage.
Fig. 1. Excerpt of possible icons for general usage
3.2 Icons for an e-Commerce Scenario
An icon set for an e-commerce scenario or other client-server applications dealing with personal data should address the data types that usually play a role in these settings, deal with timely erasure of data (e.g., if IP addresses are stored for a short period of time) and comprise icons for specific purposes such as shipping, cf. Fig. 2. The purpose "legal obligations" does not inform users to a sufficient extent about the exact purpose, but it calls for providing more information on at least the specific regulation obliging the data controller.
Fig. 2. Excerpt of possible icons in an e-commerce scenario
3.3 Icons for a Social Network Scenario
In social networks, additional privacy-related statements are helpful for users, in particular to visualize who – mostly in addition to the provider of the social network – will get access to which information, or what happens to their data on the server of the social network [13]. Fig. 3 deals with possible icons for recipients of data pieces of social network users. These icons could also be used in combination with configuring privacy settings, e.g., to directly select individuals that may or must not get access to personal data. In addition, they may work as a reminder whenever the user looks at her profile. Note that this is work in progress: the selection of icons in Fig. 3 is an excerpt from different strands of icon development where alternatives are evaluated in user tests.
Fig. 3. Excerpt of possible icons in a social network scenario
4 Online Survey and First Test Results Privacy icons should allow for quick comprehension by all possible groups of users regardless of their cultural or social background. The different constructions of privacy and individual freedom should not hamper grasping the meaning of icons. Social factors like education and age must not restrict their user-friendliness. Furthermore, it should be possible to understand the icons within different legal frameworks. In order to create icons that are generally understandable by an international target group, the employment of symbols which are not limited to certain areas or countries is crucial. The shape of such icons might serve as an example here: the icons that have been developed and tested have a circular shape and not, e.g., a triangular shape that is widely associated with warning symbols. The developed icons also refrain from color use, but are simply black and white because colors like red, orange or yellow often have a warning function, too. Further, since some users are color-blind, the correct interpretation of icons should not depend on usage of colors. Moreover, the icons should be designed in a fashion that enables a thorough depiction of information. Varied icon sets have been designed and evaluated in the PrimeLife project. One test with about 20 students from Sweden and China was performed at Karlstad University (KAU) in Sweden. The PrimeLife project has also assessed the privacy icons by way of an online survey, interviewing 70 participants from at least ten different countries. The test results from KAU plead for the assumption that the icons shown in Fig. 4 seem to be good approaches. The results from the online survey showed similar but in particular more granulated results. For instance, the participants were asked to decide between two alternative icons or to rate them according to their understandability, clearness, and feasibility. But most importantly, every question left enough room for the interviewees to add comments or suggestions of their own and to elaborate on their points of critique or approval.
Fig. 4. Excerpt of icons for general usage tested by KAU
The survey returned occasionally quite surprising results. Some icons which were deemed rather suitable by the developers were rejected by the interviewees. However, the major part of the survey outcome revealed that the development of such icons were well worth the effort. The icon for shipping might serve as a good example for an internationally comprehensible icon. In the survey an alternative icon was presented to the participants showing a posthorn (cf. Fig. 5). A few hundred years ago, the postal service was characterized by blowing a posthorn, i.e., a bugle that served as a widely noticeable audio signal. Even later when the posthorn was not used any more by the postal service, its picture can be found on letterboxes, post office vans or in company logos of public or private postal services in least in many European countries. Although most of the participants of the survey understood the meaning of this symbol, they doubted that users from abroad would be able to, since they lacked that specific knowledge. For this reason, the icon depicting a parcel was preferred by the majority of the interviewees. This example stresses the importance of knowledge regarding the historical background of some symbols, for instance. While they seemed to be perfectly suitable inside certain areas or countries, they were deemed inappropriate for the integration into an internationally standardized set of privacy icons.
Fig. 5. Icons for shipping
The icons for payment data (cf. Fig. 6) and medical care (cf. Fig. 7) show further examples of rather unapt icons. While the icons presented in Fig. 6 and Fig. 7 on the left side returned a very good rating, the alternative suggestions were seen as unsuitable. The accompanying comments revealed that the interviewees thought the efforts involved in making these icons understandable in an international context stood in no relation to the possible advantages of such an endeavor. The alternative
icons contained various symbols which were closely related to the meaning of health care or money, respectively, in several areas of the world. But the participants felt rather confused by the different symbols.
Fig. 6. Icons for payment/banking data (approval rates2: 70% vs. 15.7%)
Fig. 7. Icons for medical data (approval rates: 1.4% vs. 50%)
The survey showed that instead of gaining from this fusion, the icons rather lost the ability to transfer the intended message. For example, the depiction of different currencies made the participants think that only these three major currencies were accepted. The alternative icon for medical data was criticized for being too crowded. In order to create icons that are generally understandable by users of all ages, it is crucial to employ symbols that are not limited to a specific time or technology used in a certain period of time. This might be depicted with the example in Fig. 8: The symbol for storage consists of a floppy disk pictogram. This icon got a good rating in the KAU test. Although this icon was very easy to recognize and to link to the process of data storage, most participants of the online survey still felt uncomfortable about it. They argued that younger generations of internet users might not be used to a floppy disk pictogram since they had never used this storage medium and preferred CDROM or USB devices instead. Because of this critique most survey participants rated this icon as inadequate.
Fig. 8. Icon for storage
2 The approval rates are relative values compared to alternative icon proposals.
In order to create icons which are easily as well as intuitively recognizable, the symbols employed must be as simplified as possible. Especially when there is only very limited space for an icon, the recognizability of such icons must still be ensured. Thus, a high degree of simplification is essential. Some comments therefore suggested to concentrate on certain aspects of an icon. For instance, one of the icons meaning pseudonymization depicts a person wearing a mask. Several participants proposed to utilize only the mask instead and to employ simplifications rather than detailed depictions (cf. Fig. 9).
Fig. 9. Icon for pseudonymization
In order to clarify that every privacy icon is part of a larger set of icons, they must also adopt a common design. Some sets of icons presented in the survey that transferred the same messages used different margins or varied in quality of design (cf. Fig. 10). This was perceived as highly suggestive. Therefore, it is required to define certain standards – such as a circular shape – for all basic parameters of the design in order to standardize their appearance.
Fig. 10. Different design styles (examples: Identification, "Friends" in SNSs, Tracking Tools)
The survey has not only supported the endeavours to find crucial aspects for the development of an internationally standardized set of icons, but has also revealed that the previous efforts had been fruitful. There were a couple of icons that were perceived as outstanding ways of depicting complex content by more than three quarters of the participants (cf. Fig. 11). The high approval rate might testify to the success of one of the project’s goals, namely the depiction of complex scenarios of data processing in online services. The intermediate results of the tests indicate shortcomings of the existing approaches as well as good solutions and will therefore help developing final icon sets. Future surveys might help to identify further suitable icons and to develop an overall concept of privacy icons which could enhance users’ control of their privacy management in day-to-day life.
Fig. 11. Examples for very well fitting icons
Still, there are some preconditions for a significant result which are important for the design of future surveys. The number of questioned male and female participants should be balanced in order to be representative of the population. Also, people of every age should be asked to prevent difficulties in understanding for certain groups. Furthermore, the sample should include interviewees from different cultures and countries as well as people of different educational backgrounds to guarantee the significance of the results on a wider scale. Taking these aspects into consideration, future surveys will improve the efforts to create a set of icons that fits the scope and the scale of the intended purposes and will help to make privacy issues better understandable for the users of the World Wide Web.
5 Conclusions and Outlook Privacy icons may be important means of conveying relevant information about the processing of personal data to a user and thereby enhance her awareness concerning her privacy. Also there are several obstacles when trying to develop and promote icon schemes that are understandable world wide, the amount of research groups working on that topic and exchanging their ideas looks promising for getting at least a few – hopefully standardized – icons in at least some specific areas. However, clarity on the meaning and extent of legal binding should be achieved. The icons that are being developed in the project PrimeLife have been evaluated in user tests that involve individuals from different cultures (e.g., Swedish and Chinese users). These tests confirm that indeed there may be cultural differences in understanding specific icons, e.g., the interpretation of a posthorn in the meaning of postal services was not understood by the Chinese test users. The preliminary results show that the large icon sets should be reduced to that extent and complexity that interested users will be able to understand and to deal with. The usability should be improved, among others, by providing information about the icons’ meaning via the mouse-over function and links to the concerning part of the written privacy policy. In the next iteration, the improved icon sets and proposals for their integration in applications will be evaluated again and put forward for public discussion. Special attention will be given to possibilities of combining the icon approach with machine-readable service policies and user preferences. The development in this research field over the last decade, starting from the work on P3P [2], has meanwhile led to the PLING working group with ongoing discussion on languages and frameworks as well as their interoperability [14]. While the singular use of either privacy icons or machine-readable policies have already some advantages, their
combination can be even more fruitful, provided that their semantics including legal effects and conditions are clear, service providers and system developers see the benefit, and at best the data protection authorities give their blessing.
References 1. Solove, D.: Understanding Privacy. Harvard University Press, Cambridge (2008) 2. Platform for Privacy Preferences (P3P) Project: P3P1.0 Specification, W3C Recommendation 2002 / P3P1.1 Specification, W3C Working Group Note 2006 (2002/2006), http://www.w3.org/P3P/ 3. Art. 29 Working Party: Opinion 10/2004 on More Harmonised Information Provisions, WP 100, 11987/04/EN (November 2004), http://ec.europa.eu/justice_home/fsj/privacy/docs/wpdocs/ 2004/wp100_en.pdf 4. Fischer-Hübner, S., Wästlund, E., Zwingelberg, H. (eds.): UI prototypes: Policy administration and presentation version 1. Deliverable D4.3.1 of the EC FP7 project PrimeLife (2009), http://www.primelife.eu/images/stories/deliverables/ d4.3.1-ui_prototypes-policy_administration_and_ presentation_v1.pdf 5. Hansen, M.: Putting Privacy Pictograms into Practice – A European Perspective. In: Fischer, S., Maehle, E., Reischuk, R. (eds.) Proceedings of Informatik 2009 – Im Focus das Leben. LNI P, vol. 154, pp. 1703–1716. Köllen Verlag, Bonn (2009) 6. Cranor, L.F.: Privacy Policies and Privacy Preferences. In: Cranor, L.F., Garfinkel, S. (eds.) Security and Usability – Designing Secure Systems That People Can Use, pp. 447– 471. O’Reilly, Sebastopol (2005) 7. Rundle, M.: International Data Protection and Digital Identity Management Tools. Presentation at IGF 2006, Privacy Workshop I, Athens (2006), http://identityproject.lse.ac.uk/mary.pdf 8. Mehldau, M.: Iconset for Data-Privacy Declarations v0.1 (2007), http://netzpolitik.org/wp-upload/data-privacy-icons-v01.pdf 9. Helton, A.: Privacy Commons Icon Set (2009), http://aaronhelton.wordpress.com/2009/02/20/privacy-commonsicon-set/ 10. Raskin, A.: Privacy Icons – Making your online privacy rights understandable. Project web site (2010), http://www.drumbeat.org/project/privacy-icons 11. Gomez, J., Pinnick, T., Soltani, A.: KnowPrivacy, June 1 (2009), http://www.knowprivacy.org/report/KnowPrivacy_Final_Report. pdf, therein: Policy Coding Methodology (2009), http://www.knowprivacy. org/policies_methodology.html 12. Bickerstaff, R.: Towards a Commons Approach to Online Privacy – a “Privacy Commons”. Presentation at SCL Information Governance Conference 2008, London (May 2008), http://www.healthymedia.co.uk/scl-2008-may-governance/ pdf/scl-2008-05-privacy-commons-roger-bickerstaff.pdf, Updated presentation: Towards a Commons Approach to Online Privacy for Social Networking Services – a Privacy Commons (2009), http://www.ico.gov.uk/upload/ documents/pio_conference_2009/roger_bickerstaff_birdandbird_ presentation.pdf
13. Iannella, R., Finden, A.: Privacy Awareness: Icons and Expression for Social Networks. In: 8th International Workshop for Technical, Economic and Legal Aspects of Business Models for Virtual Goods Incorporating the 6th International ODRL Workshop, Namur, Belgium (2010), http://semanticidentity.com/Resources/Entries/2010/7/ 1_Virtual_Goods_+_ODRL_Workshop_2010_files/vg+odrl2010-wspaper.pdf 14. Privicons project (2010), http://www.privicons.org/projects/icons/ 15. W3C Policy Languages Interest Group, PLING (2010), http://www.w3.org/Policy/pling/wiki/Main_Page
Andreas Pfitzmann 1958-2010: Pioneer of Technical Privacy Protection in the Information Society
Hannes Federrath, Marit Hansen, and Michael Waidner*
Abstract. On September 23rd, 2010, Prof. Dr. Andreas Pfitzmann died at the age of 52 after a short but serious illness. The focus of his reasoning had been the individual and, with him, the society in which he lives. During his life as a researcher, Andreas Pfitzmann contributed decisively and groundbreakingly to the technical implementation of the constitutional right to informational self-determination.
The academic career of Andreas Pfitzmann began in 1982 at the University of Karlsruhe as a research fellow at the chair of Prof. Winfried Görke. From the beginning he was certain that even though his research work had to have a strong technical core, it was at the same time more important to have social significance and value. In 1983, the Federal Constitutional Court of Germany developed in its Census Decision the term “Right to Informational Self-Determination“. Andreas Pfitzmann was one of the first to recognize that the implementation of this right could only succeed if law and technology interacted. In 1983, the Federal Constitutional Court declared: “Under the modern conditions of data processing free development of personality requires the protection of the individual against the unlimited survey, storage, use and disclosure of his personal data. [...] Whoever cannot overlook with sufficient certainty, which of the information regarding him in certain areas of his social environment are known, and whoever cannot measure the knowledge of possible communication partners to any degree, can be fundamentally limited in his personal freedom to plan and make decisions out of his own self-determination. A society in which the citizen cannot know anymore, who knows what, when and at which opportunity about him, is not compatible with the right to informational self-determination.“ (1. BvR 209/83 paragraph C II.1, p. 43) Working for a university department specialized in computer architecture and fault tolerance, Andreas Pfitzmann began to research network anonymity, pseudonyms, signatures and electronic legal transactions in 1983. Together with his students then and later colleagues, Birgit Pfitzmann and Michael Waidner, he founded a working group and converted his office into the “Café Pfitzmann“ as the group, that worked there more or less around-the-clock, consumated enormous quantities of coffee, peppermint tea and chocolate. *
* Our thanks for references, amendments and corrections go to Katrin Borcea-Pfitzmann, Rüdiger Dierstein, Rüdiger Grimm, Hermann Härtig, Steffen Hölldobler, Hartmut Pohl, Kai Rannenberg and Manfred Reitenspieß. The text was translated from German to English by Donate Reimer. An extended German version of this text has been published in Informatik Spektrum Heft 1, 2011.
Within a year the group developed basic terms for what was later to be called “Privacy by Technology“ and “Multilateral Security“: Privacy has to be supported, controlled and finally enforced by technology. Privacy cannot only be achieved by law. Systems that are used by multiple parties have to support the security interests of all these parties. Then revolutionary and utopian, these thoughts now are commonly used in Computer Science as „Privacy Enhancing Technologies (PETs)“. In 1984, Andreas Pfitzmann met the American cryptographer David Chaum. Chaum then worked for the CWI in Amsterdam, where he developed the cryptographic background for pseudonymity and anonymity within networks. Andreas Pfitzmann soon recognized the practical potential of the theoretical works of David Chaum and an intensive working relationship developed between the two groups that continued over many years. Andreas Pfitzmann began to probe the theoretical concepts of David Chaum and others as to their pracitical value. Then as now, privacy and security were seen as in opposition to one another within the political debate. With their publicaton “Legal Security despite Anonymity“ the group around Andreas Pfitzmann tried to explain coherently to lawyers and technicians how privacy and security could be made compatible. Approximately from 1987 on, Andreas Pfitzmann began to analyze the developed concepts and methods for privacy enhancing technologies in prototypical implementations and system concepts. Together with students in Karlsruhe and later in Hildesheim he developed the presumably first data protection practicum in Germany. In 1988, the group around Andreas Pfitzmann developed and analyzed for the first time the concept of “ISDN-Mixes“ – the first practically applicable method for anonymous communication in real time. With his method and many of his other ideas, Andreas Pfitzmann was ahead of the main stream by 5-10 years: What used to be labeled as utopian with regards to ISDN, proved to be visionary and groundbreaking with the success of the internet in the mid-90s. In 1989 Andreas Pfitzmann earned his Ph.D. for his dissertation: “Services Integrating Communication Networks with Participant-verifiable Privacy“ and in 1991, he moved on as an assistant professor to the chair of Prof. Joachim Biskup at the University of Hildesheim. Together with David Chaum he applied for the EU project “CAFÉ“, which implemented and demonstrated the practical application of the first secure and anonymous smart card based payment system. In 1993, Andreas Pfitzmann received a professorship at the Technical University of Dresden. With his promotion to professor he began his scientific work at the TU Dresden. His article on the protection of mobile participants from monitoring and localization published in 1993 in the German journal “Datenschutz und Datensicherheit“ (= Data Protection and Data Security) was trendsetting for his work at the TU Dresden within the first four years. In the beginning he dealt with the application and adaption of “ISDN-Mixes“ to GSM-based mobile networks. With research funds from the German Research Foundation and the Gottlieb-Daimler- and Karl-Benz Foundation, the newly founded working group around Andreas Pfitzmann developed methods for mobile networks that contrasted the popular acceptance that mobile network operators had to know the constant geographical locations of its users, with new solutions regarding the protection of confidentiality. Suddenly it was
possible to be available by mobile phone without the network operator always knowing ones current whereabouts. Within the context of the Daimler-Benz Kolleg „Security in Communication Technology“ the terms “Technical Privacy“ and “Multilateral Security“ were developed further in the years 1993-1999 by him and other scientists, mainly the head of the Kolleg, Prof. Günter Müller, and the Kolleg coordinator, Kai Rannenberg. With regard to “Multilateral Security“ the protection interests of all participants had to be embraced as well as the resulting protection conflicts at the time of establishment of any communication connection. Andreas Pfitzmann received great recognition from science, industry and politics. The Alcatel SEL Foundation awarded him with the research prize “Technical Communication 1998“, which was a milestone in the public perception and general acceptance of his works. Among his great political successes was his work during the crypto-debate of the 90s. When, around 1997, the experience of governmental powerlessness with regard to surveilling internet communication assumed grotesque shapes, the scientist and citizen Andreas Pfitzmann unremittingly fought for the free und unlimited application of cryptography in the internet. One of his essential messages was that with a ban on crypto, criminals could use unobservable technical concealment possibilities, while innocent citizens became transparent persons for the state. From 2000 on, important works of Andreas Pfitzmann and his group concentrated on the area of anonymous communication in the internet. Together with Hannes Federrath and Marit Hansen, he successfully requested adequate research projects from the German Research Foundation and the Federal Ministry of Economics. With the internet anonymization service AN.ON he produced one of the first self-protection tools for citizens and companies alike, which made the early theoretical works for the internet practically applicable. His teaching and students were most important to him. In 2001, he was the first to receive the Best Course Award of the Department of Computer Science at the TU Dresden for the best course in the graduate study programme. As a long-standing dean of the Department of Computer Science, he lived and cultivated the unity – and freedom – of research and teaching. With his expert report on the “Modernization of Privacy“, commissioned by the Federal Ministry of the Interior in 2001, which he co-edited with Prof. Alexander Roßnagel and Prof. Hansjürgen Garstka, he hoped that his ideas of technical privacy protection would also be reflected in legislation. The upcoming amendment of the Data Protection Act can make this wish finally – almost a decade later – come true. Again, it becomes apparent that Andreas Pfitzmann was ahead of his time. In any case, his latest works on the extension of the classical protection goals (confidentiality, integrity and availability) by special privacy protection goals such as transparency and unlinkability will have an impact on legislation. Federal data protection commissioners have referred to them in their current discussion. Andreas Pfitzmann was invited as an expert to the political debate as well as requested by different courts, among others on issues like the application of biometry, on data retention and on online investigation. He particularly received considerable attention as an active expert for the Federal Constitutional Court on online investigation in 2007. 
Hence, he contributed to the Federal Constitutional Court's formulation, in February 2008, of a new "computer constitutional right": the "Right to the Guarantee of the Confidentiality and Integrity of IT Systems".

The topic of anonymity in its diverse facets formed a large part of his research. With the "Workshop on Design Issues in Anonymity and Unobservability" he organized in 2000, from which the yearly Privacy Enhancing Technologies Symposium (PETS) developed, he began the effort to systematically compile the terminology of anonymity and related terms. The resulting "Terminology Paper" by Andreas Pfitzmann and Marit Hansen has been improved with contributions from the community over the last ten years [http://dud.inf.tu-dresden.de/Anon_Terminology.shtml].

In the early days, Andreas Pfitzmann's view of privacy protection was dominated by the concept of data minimisation: if there are no personal data, there is no risk that they will be misused. Since absolute data avoidance is impossible in many cases, he expanded his view of privacy protection to include the principle of control by the individual concerned – a principle that fits well with the right to informational self-determination and the concept of multilateral security. His research work on identity management, pursued since 2000, also emphasized this. From 2004 until recently, he and his research group published important contributions in the area of privacy-enhancing identity management in the online world within the EU-funded projects "PRIME – Privacy and Identity Management for Europe" and "PrimeLife". Furthermore, he was significantly involved in the European Network of Excellence "FIDIS – Future of Identity in the Information Society" (2004-2009). At the IFIP/PrimeLife Summer School in August 2010, he was still able to sketch his latest ideas on privacy concepts, which on the one hand should enable life-long privacy and on the other hand should provide a contextual binding of personal data, thereby giving new impulses to the PrimeLife Project.
Visionary and Pioneer

Andreas Pfitzmann was a visionary and a pioneer. With his distinctive observation skills, his deep understanding of details, his high intelligence and his determination to bring together people with similar – as well as different – interests, he contributed invaluably as a scientist and human being to improving our world.
Author Index
Agrafiotis, Ioannis 271
Alkassar, Ammar 120
Andrade, Norberto Nuno Gomes de 90
Ashenden, Debi 311
Becker, Claudia 219
Benameur, Azzedine 283
Berg, Manuela 15
Berthold, Stefan 27
Borcea-Pfitzmann, Katrin 15
Casassa Mont, Marco 258
Chadwick, David W. 297
Coopamootoo, Periambal L. 311
Coudert, Fanny 231
Creese, Sadie 258, 271
De Decker, Bart 164
de Meer, Hermann 120
Dobias, Jaromir 244
Dowd, Michael 78
Fatema, Kaniz 297
Federrath, Hannes 349
Fischer, Stefan 219
Flick, Catherine 64
Friedewald, Michael 1
Fritsch, Lothar 52
Goldsmith, Michael 258, 271
Haas, Sebastian 120
Hansen, Marit 338, 349
Herkenhöner, Ralph 120
Holtz, Leif-Erik 338
Keenan, Thomas P. 108
König, Ulrich 325
Kuczerawy, Aleksandra 231
Leitold, Herbert 144
Lievens, Stijn 297
Müller, Günter 120
Naessens, Vincent 164
Nocun, Katharina 338
Paintsil, Ebenezer 52
Papanikolaou, Nick 258, 271
Pearson, Siani 258, 283
Rajbhandari, Lisa 41
Rothenpieler, Peter 219
Royer, Denis 120
Schütz, Philip 1
Shahmehri, Nahid 130
Snekkenes, Einar Arthur 41
Stahl, Bernd Carsten 64
Strauß, Stefan 206
van den Berg, Bibi 178
van Deursen, Ton 192
Vapen, Anna 130
Verhaeghe, Pieter 164
Vossaert, Jan 164
Waidner, Michael 349
Zwingelberg, Harald 151