Lecture Notes in Artificial Intelligence Edited by R. Goebel, J. Siekmann, and W. Wahlster
Subseries of Lecture Notes in Computer Science
6334
Yiyu Yao Ron Sun Tomaso Poggio Jiming Liu Ning Zhong Jimmy Huang (Eds.)
Brain Informatics International Conference, BI 2010 Toronto, ON, Canada, August 28-30, 2010 Proceedings
Series Editors Randy Goebel, University of Alberta, Edmonton, Canada Jörg Siekmann, University of Saarland, Saarbrücken, Germany Wolfgang Wahlster, DFKI and University of Saarland, Saarbrücken, Germany Volume Editors Yiyu Yao University of Regina, Regina, SK, Canada E-mail:
[email protected] Ron Sun Rensselaer Polytechnic Institute, Troy, NY, USA E-mail:
[email protected] Tomaso Poggio Massachusetts Institute of Technology, Cambridge, MA, USA E-mail:
[email protected] Jiming Liu Hong Kong Baptist University, Kowloon Tong, Hong Kong E-mail:
[email protected] Ning Zhong Maebashi Institute of Technology, Maebashi-City, Japan E-mail:
[email protected] Jimmy Huang York University, Toronto, ON, Canada E-mail:
[email protected]
Library of Congress Control Number: 2010932525
CR Subject Classification (1998): I.2, I.4, I.5, H.3, H.5, H.4
LNCS Sublibrary: SL 7 – Artificial Intelligence
ISSN 0302-9743
ISBN-10 3-642-15313-5 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-15313-6 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. springer.com © Springer-Verlag Berlin Heidelberg 2010 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper 06/3180
Preface
This volume contains the papers selected for presentation at The 2010 International Conference on Brain Informatics (BI 2010) held at York University, Toronto, Canada, during August 28–30, 2010. It was organized by the Web Intelligence Consortium (WIC), the IEEE Computational Intelligence Society Task Force on Brain Informatics (IEEE-CIS TF-BI), and York University. The conference was held jointly with the 2010 International Conference on Active Media Technology (AMT 2010).

Brain informatics (BI) has emerged as an interdisciplinary research field that focuses on studying the mechanisms underlying the human information processing system (HIPS). It investigates the essential functions of the brain, ranging from perception to thinking, and encompassing such areas as multi-perception, attention, memory, language, computation, heuristic search, reasoning, planning, decision-making, problem-solving, learning, discovery, and creativity. The goal of BI is to develop and demonstrate a systematic approach to achieving an integrated understanding of both macroscopic- and microscopic-level working principles of the brain, by means of experimental, computational, and cognitive neuroscience studies, as well as by utilizing advanced Web intelligence (WI)-centric information technologies.

BI represents a potentially revolutionary shift in the way that research is undertaken. It attempts to capture new forms of collaborative and interdisciplinary work. In this vision, new kinds of BI methods and global research communities will emerge, through infrastructure on the wisdom Web and knowledge grids that enables high-speed, distributed, large-scale analysis and computation, and radically new ways of sharing data and knowledge.

The Brain Informatics conferences started with the First WICI International Workshop on Web Intelligence meets Brain Informatics (WImBI 2006), held in Beijing, China, December 15–16, 2006. The second conference, Brain Informatics 2009, was again held in Beijing, China, October 22–24, 2009. This series is the first conference series specifically dedicated to interdisciplinary research in BI, and it provides an international forum to bring together researchers and practitioners from diverse fields, such as computer science, information technology, artificial intelligence, Web intelligence, cognitive science, neuroscience, medical science, life science, economics, data mining, data science and knowledge science, intelligent agent technology, human–computer interaction, complex systems, and systems science, to present the state of the art in the development of BI, and to explore the main research problems in BI that lie in the interplay between the studies of the human brain and the research of informatics.

All the papers submitted to BI 2010 were rigorously reviewed by three committee members and external reviewers. The selected papers offered new insights into the research challenges and development of BI.
There are two mutually supporting tracks of BI research. In one direction, the functions of the human brain are modeled and characterized based on the notions of information processing systems, and WI-centric information technologies are applied to support brain science studies. For instance, the wisdom Web, knowledge grids, and cloud computing enable high-speed, large-scale analysis, simulation, and computation, as well as new ways of sharing research data and scientific discoveries. In the other direction, informatics-enabled brain studies, e.g., based on fMRI, EEG, and MEG, significantly broaden the spectrum of theories and models of brain sciences and offer new insights into the development of human-level intelligence toward brain-inspired wisdom Web computing.

BI 2010 had a very exciting program with many features, ranging from keynote talks and regular technical sessions to WIC featured sessions and social programs. All of this would not have been possible without the great support of the authors in submitting and presenting their best and latest research results; the distinguished contributions of the keynote speakers, Vinod Goel (York University, Canada), Jianhua Ma (Hosei University, Japan), Ben Shneiderman (University of Maryland, USA), and Yingxu Wang (University of Calgary, Canada), in preparing and delivering their very stimulating talks; and the generous dedication of the Program Committee members and the external reviewers in reviewing the submitted papers. We wish to express our gratitude to all authors, the keynote speakers, and the members of the conference committees for their instrumental and unfailing support.

BI 2010 could not have taken place without the great team effort of the Local Organizing Committee and the support of the International WIC Institute, Beijing University of Technology, China, and York University, Canada. Our special thanks go to Aijun An, Juzhen Dong, Jian Yang, and Daniel Tao for organizing and promoting BI 2010 and coordinating with AMT 2010. We are grateful to the Springer Lecture Notes in Computer Science (LNCS/LNAI) team for their generous support. We thank Alfred Hofmann and Anna Kramer of Springer for their help in coordinating the publication of this special volume in an emerging and interdisciplinary research field.

August 2010
Yiyu Yao Ron Sun Tomaso Poggio Jiming Liu Ning Zhong Jimmy Huang
Conference Organization
Conference General Chairs Tomaso Poggio Jiming Liu
Massachusetts Institute of Technology, USA International WIC Institute, Beijing University of Technology, China Hong Kong Baptist University, Hong Kong
Program Chairs Yiyu Yao
Ron Sun
International WIC Institute, Beijing University of Technology, China University of Regina, Canada Rensselaer Polytechnic Institute, USA
Organizing Chair Jimmy Huang
York University, Toronto, Canada
Publicity Chairs Jian Yang Daniel Tao
International WIC Institute, Beijing University of Technology, China Queensland University of Technology, Australia
IEEE-CIS TF-BI Chair Ning Zhong
Maebashi Institute of Technology, Japan International WIC Institute, Beijing University of Technology, China
WIC Co-chairs/Directors Ning Zhong Jiming Liu
Maebashi Institute of Technology, Japan Hong Kong Baptist University, Hong Kong
WIC Advisory Board Edward A. Feigenbaum Setsuo Ohsuga
Stanford University, USA University of Tokyo, Japan
Benjamin Wah Philip Yu L.A. Zadeh
University of Illinois, Urbana-Champaign, USA University of Illinois, Chicago, USA University of California, Berkeley, USA
WIC Technical Committee Jeffrey Bradshaw Nick Cercone Dieter Fensel Georg Gottlob Lakhmi Jain Jianchang Mao Pierre Morizet-Mahoudeaux Hiroshi Motoda Toyoaki Nishida Andrzej Skowron Jinglong Wu Xindong Wu Yiyu Yao
UWF/Institute for Human and Machine Cognition, USA York University, Canada University of Innsbruck, Austria Oxford University, UK University of South Australia, Australia Yahoo! Inc., USA Compiegne University of Technology, France Osaka University, Japan Kyoto University, Japan Warsaw University, Poland Okayama University, Japan University of Vermont, USA University of Regina, Canada
Program Committee John R. Anderson Chang Cai Xiaocong Fan Mohand-Said Hacid D. Frank Hsu Kazuyuki Imamura Kuncheng Li Peipeng Liang Pawan Lingras Duoqian Miao Mariofanna Milanova Sankar Kumar Pal Frank Ritter Hideyuki Sawada Lael Schooler Tomoaki Shirao Andrzej Skowron Dominik Slezak
Carnegie Mellon University, USA National Rehabilitation Center for Persons with Disabilities, Japan The Pennsylvania State University, USA Universite Claude Bernard Lyon 1, France Fordham University, USA Maebashi Institute of Technology, Japan Xuanwu Hospital, China Beijing University of Technology, China Saint Mary’s University, Canada Tongji University, China University of Arkansas at Little Rock, USA Indian Statistical Institute, India Penn State University, USA Kagawa University, Japan Max Planck Institute for Human Development, Germany Gunma University Graduate School of Medicine, Japan Warsaw University, Poland University of Warsaw and Infobright Inc., Poland
Diego Sona Piotr S. Szczepaniak Shusaku Tsumoto Frank van der Velde Guoyin Wang
Fondazione Bruno Kessler, Italy Technical University of Lodz, Poland Shimane University, Japan Leiden University, The Netherlands Chongqing University of Posts and Telecommunications, China Okayama University, Japan International WIC Institute, Beijing University of Technology, China University of Rome “Tor Vergata”, Italy Tsinghua University, China Georgia State University, USA Maebashi Institute of Technology, Japan International WIC Institute, Beijing University of Technology, China Fudan University, China
Jinglong Wu Jian Yang Fabio Massimo Zanzotto Bo Zhang Yanqing Zhang Ning Zhong Haiyan Zhou Yangyong Zhu
Additional Reviewers Paolo Avesani Emanuele Olivetti
Yang Mei Linchang Qin
Andrea Mognon Shujuan Zhang
Table of Contents
Keynote Talks Fractionating the Rational Brain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Vinod Goel Cognitive Informatics and Denotational Mathematical Means for Brain Informatics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yingxu Wang
1
2
Cognitive Computing An Adaptive Model for Dynamics of Desiring and Feeling Based on Hebbian Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tibor Bosse, Mark Hoogendoorn, Zulfiqar A. Memon, Jan Treur, and Muhammad Umair Modelling the Emergence of Group Decisions Based on Mirroring and Somatic Marking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mark Hoogendoorn, Jan Treur, C. Natalie van der Wal, and Arlette van Wissen
14
29
Rank-Score Characteristics (RSC) Function and Cognitive Diversity . . . . D. Frank Hsu, Bruce S. Kristal, and Christina Schweikert
42
Cognitive Effort for Multi-agent Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . Luca Longo and Stephen Barrett
55
Behavioural Abstraction of Agent Models Addressing Mutual Interaction of Cognitive and Affective Processes . . . . . . . . . . . . . . . . . . . . . . Alexei Sharpanskykh and Jan Treur
67
Data Brain and Analysis The Effect of the Normalization Strategy on Voxel-Based Analysis of DTI Images: A Pattern Recognition Based Assessment . . . . . . . . . . . . . . . Gloria Díaz, Gonzalo Pajares, Eduardo Romero, Juan Alvarez-Linera, Eva López, Juan Antonio Hernández-Tamames, and Norberto Malpica
78
Single Trial Classification of EEG and Peripheral Physiological Signals for Recognition of Emotions Induced by Music Videos . . . . . . . . . . . . . . Sander Koelstra, Ashkan Yazdani, Mohammad Soleymani, Christian Mühl, Jong-Seok Lee, Anton Nijholt, Thierry Pun, Touradj Ebrahimi, and Ioannis Patras Brain Signal Recognition and Conversion towards Symbiosis with Ambulatory Humanoids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yasuo Matsuyama, Keita Noguchi, Takashi Hatakeyama, Nimiko Ochiai, and Tatsuro Hori Feature Rating by Random Subspaces for Functional Brain Mapping . . . Diego Sona and Paolo Avesani
89
101
112
Recurrence Plots for Identifying Memory Components in Single-Trial EEGs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Nasibeh Talebi and Ali Motie Nasrabadi
124
Comparing EEG/ERP-Like and fMRI-Like Techniques for Reading Machine Thoughts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fabio Massimo Zanzotto and Danilo Croce
133
Improving Individual Identification in Security Check with an EEG Based Biometric Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Qinglin Zhao, Hong Peng, Bin Hu, Quanying Liu, Li Liu, YanBing Qi, and Lanlan Li
145
Neuronal Modeling and Brain Modeling Segmentation of 3D Brain Structures Using the Bayesian Generalized Fast Marching Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mohamed Baghdadi, Nac´era Benamrane, and Lakhdar Sais
156
Domain-Specific Modeling as a Pragmatic Approach to Neuronal Model Descriptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ralf Ansorg and Lars Schwabe
168
Guessing What’s on Your Mind: Using the N400 in Brain Computer Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Marijn van Vliet, Christian Mühl, Boris Reuderink, and Mannes Poel
180
A Brain Data Integration Model Based on Multiple Ontology and Semantic Similarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Li Xue, Yun Xiong, and Yangyong Zhu
192
Perception and Information Processing How Does Repetition of Signals Increase Precision of Numerical Judgment? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Eike B. Kroll, Jörg Rieger, and Bodo Vogt
200
Sparse Regression Models of Pain Perception . . . . . . . . . . . . . . . . . . . . . . . . Irina Rish, Guillermo A. Cecchi, Marwan N. Baliki, and A. Vania Apkarian
212
A Study of Mozart Effect on Arousal, Mood, and Attentional Blink . . . . Chen Xie, Lun Zhao, Duoqian Miao, Deng Wang, Zhihua Wei, and Hongyun Zhang
224
Learning Attentional Disengage from Test-Related Pictures in Test-Anxious Students: Evidence from Event-Related Potentials . . . . . . . . . . . . . . . . . . . . Rui Chen and Renlai Zhou
232
Concept Learning in Text Comprehension . . . . . . . . . . . . . . . . . . . . . . . . . . . Manas Hardas and Javed Khan
240
A Qualitative Approach of Learning in Parkinson’s Disease . . . . . . . . . . . . Delphine Penny-Leguy and Josiane Caron-Pargue
252
Cognition-Inspired Applications Modelling Caregiving Interactions during Stress . . . . . . . . . . . . . . . . . . . . . . Azizi Ab Aziz, Jan Treur, and C. Natalie van der Wal Computational Modeling and Analysis of Therapeutical Interventions for Depression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fiemke Both, Mark Hoogendoorn, Michel C.A. Klein, and Jan Treur A Time Series Based Method for Analyzing and Predicting Personalized Medical Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Qinwin Vivian Hu, Xiangji Jimmy Huang, William Melek, and C. Joseph Kurian Language Analytics for Assessing Brain Health: Cognitive Impairment, Depression and Pre-symptomatic Alzheimer’s Disease . . . . . . . . . . . . . . . . . William L. Jarrold, Bart Peintner, Eric Yeh, Ruth Krasnow, Harold S. Javitz, and Gary E. Swan The Effect of Sequence Complexity on the Construction of Protein-Protein Interaction Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mehdi Kargar and Aijun An
263
274
288
299
308
Data Fusion and Feature Selection for Alzheimer’s Diagnosis . . . . . . . . . . Blake Lemoine, Sara Rayburn, and Ryan Benton A Cognitive Architecture Based on Neuroscience for the Control of Virtual 3D Human Creatures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Felipe Rodríguez, Francisco Galvan, Félix Ramos, Erick Castellanos, Gregorio García, and Pablo Covarrubias Towards Inexpensive BCI Control for Wheelchair Navigation in the Enabled Environment – A Hardware Survey . . . . . . . . . . . . . . . . . . . . . . . . . Kenyon Stamps and Yskandar Hamam Expression Recognition Methods Based on Feature Fusion . . . . . . . . . . . . . Chang Su, Jiefang Deng, Yong Yang, and Guoyin Wang Investigation on Human Characteristics of Japanese Katakana Recognition by Active Touch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Suguru Yokotani, Jiajia Yang, and Jinglong Wu
320
328
336 346
357
WICI Perspectives on Brain Informatics Towards Systematic Human Brain Data Management Using a Data-Brain Based GLS-BI System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jianhui Chen, Ning Zhong, and Runhe Huang
365
The Role of the Parahippocampal Cortex in Memory Encoding and Retrieval: An fMRI Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mi Li, Shengfu Lu, Jiaojiao Li, and Ning Zhong
377
Brain Activation and Deactivation in Human Inductive Reasoning: An fMRI Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Peipeng Liang, Yang Mei, Xiuqin Jia, Yanhui Yang, Shengfu Lu, Ning Zhong, and Kuncheng Li Clustering of fMRI Data Using Affinity Propagation . . . . . . . . . . . . . . . . . . Dazhong Liu, Wanxuan Lu, and Ning Zhong Interaction between Visual Attention and Goal Control for Speeding Up Human Heuristic Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rifeng Wang, Jie Xiang, and Ning Zhong The Role of Posterior Parietal Cortex in Problem Representation . . . . . . Jie Xiang, Yulin Qin, Junjie Chen, Haiyan Zhou, Kuncheng Li, and Ning Zhong Basic Level Advantage and Its Switching during Information Retrieval: An fMRI Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Haiyan Zhou, Jieyu Liu, Wei Jing, Yulin Qin, Shengfu Lu, Yiyu Yao, and Ning Zhong Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
387
399
407 417
427
437
Fractionating the Rational Brain Vinod Goel York University, Canada http://www.yorku.ca/vgoel
Considerable progress has been made over the past decade in our understanding of the neural basis of logical reasoning. Unsurprisingly these data are telling us that the brain is organized in ways not anticipated by cognitive theory. In particular, they’re forcing us to confront the possibility that there may be no unitary reasoning system in the brain (be it mental models or mental logic). Rather, the evidence points to a fractionated system that is dynamically configured in response to certain task and environmental cues. I will review three lines of demarcation including (a) systems for heuristic and formal processes (with evidence for some degree of content specificity in the heuristic system), (b) conflict detection/resolution systems, and (c) systems for dealing with certain and uncertain inferences; and then offer a tentative account of how the systems might interact to facilitate logical reasoning. Sensitivity to data generated by neuroimaging and patient methodologies will move us beyond the sterility of mental models vs. mental logic debate and further the development of cognitive theories of reasoning.
Cognitive Informatics and Denotational Mathematical Means for Brain Informatics Yingxu Wang Director, International Institute of Cognitive Informatics and Cognitive Computing (IICICC) Director, Theoretical and Empirical Software Engineering Research Centre (TESERC) Dept. of Electrical and Computer Engineering, Schulich School of Engineering University of Calgary 2500 University Drive NW, Calgary, Alberta, Canada T2N 1N4 Tel.: (403) 220 6141, Fax: (403) 282 6855
[email protected] http://enel.ucalgary.ca/People/wangyx
Abstract. Cognitive informatics studies the natural intelligence and the brain from a theoretical and a computational approach, which rigorously explains the mechanisms of the brain by a fundamental theory known as abstract intelligence, and formally models the brain by contemporary denotational mathematics. This paper, as an extended summary of the invited keynote presented in AMT-BI 2010, describes the interplay of cognitive informatics, abstract intelligence, denotational mathematics, brain informatics, and computational intelligence. Some of the theoretical foundations for brain informatics developed in cognitive informatics are elaborated. A key notion recognized in recent studies in cognitive informatics is that the root and profound objective in natural, abstract, and artificial intelligence in general, and in cognitive informatics and brain informatics in particular, is to seek suitable mathematical means for their special needs that were missing in the last six decades. A layered reference model of the brain and a set of cognitive processes of the mind are systematically developed towards the exploration of the theoretical framework of brain informatics. The current methodologies for brain studies are reviewed and their strengths and weaknesses are analyzed. A wide range of applications of cognitive informatics and denotational mathematics are recognized in brain informatics toward the implementation of highly intelligent systems such as world-wide wisdom (WWW+), cognitive knowledge search engines, autonomous learning machines, and cognitive robots. Keywords: Cognitive informatics, abstract intelligence, brain informatics, cognitive computing, cognitive computers, natural intelligence, artificial intelligence, machinable intelligence, computational intelligence, denotational mathematics, concept algebra, system algebra, RTPA, visual semantic algebra, granular algebra, eBrain, engineering applications.
1 Introduction The contemporary wonder of science and engineering has recently refocused on its own starting point: how the brain processes internal and external information
autonomously and cognitively rather than imperatively as those of conventional computers. The latest advances and engineering applications of CI have led to the emergence of cognitive computing and the development of cognitive computers that perceive, learn, and reason [9, 18, 20, 23, 24, 32]. CI has also fundamentally contributed to autonomous agent systems and cognitive robots. A wide range of applications of CI are identified such as in the development of cognitive computers, cognitive robots, cognitive agent systems, cognitive search engines, cognitive learning systems, and artificial brains. The work in CI may also lead to a fundamental solution to computational linguistics, Computing with Natural Language (CNL), and Computing with Words (CWW) [34, 35]. Cognitive Informatics is a term coined by Wang in the first IEEE International Conference on Cognitive Informatics (ICCI 2002) [6]. Cognitive informatics [6, 8, 11, 12, 26, 27, 28, 29, 31] studies the natural intelligence and the brain from a theoretical and a computational approach, which rigorously explains the mechanisms of the brain by a fundamental theory known as abstract intelligence, and formally models the brain by contemporary denotational mathematics such as concept algebra [Wang, 2008b], real-time process algebra (RTPA) [7, 16], system algebra [15, 30], and visual semantic algebra (VSA) [19]. The latest advances in CI have led to a systematic solution for explaining brain informatics and the future generation of intelligent computers. A key notion recognized in recent studies in cognitive informatics is that the root and profound objective in natural, abstract, and artificial intelligence in general, and in cognitive informatics and brain informatics in particular, is to seek suitable mathematical means for their special needs, which were missing in the last six decades. This is a general need and requirement for searching the metamethodology in any discipline particularly those of emerging fields where no suitable mathematics has been developed or of traditional fields where persistent hard problems have been unsolved efficiently or completely [1, 2, 4, 13]. This paper is an extended summary of the invited keynote lecture presented in the 2010 joint International Conferences on Active Media Technology and Brain Informatics (AMT-BI 2010), which covers some of the theoretical foundations of brain informatics (BI) developed in cognitive informatics and denotational mathematics. In this paper, cognitive informatics as the science of abstract intelligence and cognitive computing is briefly described in Section 2. The fundamental theories and expressive tools for cognitive informatics, brain Informatics, and computational intelligence, collectively known as denotational mathematics, are introduced in Section 3. Applications of cognitive informatics and denotational mathematics in BI and cognitive computing are elaborated in Sections 4, where the layered reference model of the brain and a set of cognitive processes of the mind are systematically modeled towards the exploration of the theoretical framework of brain informatics.
2 Cognitive Informatics: The Science of Abstract Intelligence and Computational Intelligence Information is the third essence of the world, supplementing energy and matter. A key discovery in information science is the basic unit of information, the bit, abbreviated from "binary digit", which forms a shared foundation of computer science and informatics.
The science of information, informatics, has gone through three generations of evolution, known as the classic, modern, and cognitive informatics, since Shannon proposed the classic notion of information [5]. The classical information theory founded by Shannon (1948) defined information as a probabilistic measure of the variability of message that can be obtained from a message source. Along with the development in computer science and in the IT industry, the domain of informatics has been dramatically extended in the last few decades. This led to the modern informatics that treats information as entities of messages rather than a probabilistic measurement of the variability of messages as in that of the classic information theory. The new perception of information is found better to explain the theories in computer science and practices in the IT industry. However, both classic and modern views on information are only focused on external information. The real sources and destinations of information, the human brains, are often overlooked. This leads to the third generation of informatics, cognitive informatics, which focuses on the nature of information in the brain, such as information acquisition, memory, categorization, retrieve, generation, representation, and communication. Information in cognitive informatics is defined as the abstract artifacts and their relations that can be modeled, processed, stored and processed by human brains. Cognitive informatics [6, 8, 11, 12, 26, 27, 28, 29, 31] is emerged and developed based on the multidisciplinary research in cognitive science, computing science, information science, abstract intelligence, and denotational mathematics since the inauguration of the 1st IEEE ICCI’02 [6]. Definition 1. Cognitive informatics (CI) is a transdisciplinary enquiry of computer science, information science, cognitive science, and intelligence science that investigates into the internal information processing mechanisms and processes of the brain and natural intelligence, as well as their engineering applications in cognitive computing. CI is a cutting-edge and multidisciplinary research area that tackles the fundamental problems shared by modern informatics, computation, software engineering, AI, cybernetics, cognitive science, neuropsychology, medical science, philosophy, linguistics, brain sciences, and many others. The development and the cross fertilization among the aforementioned science and engineering disciplines have led to a whole range of extremely interesting new research areas. The theoretical framework of CI encompasses four main areas of basic and applied research [11] such as: a) fundamental theories of natural intelligence; b) abstract intelligence; c) denotational mathematics; and d) cognitive computing. These areas of CI are elaborated in the following subsections. Fundamental theories developed in CI covers the Information-Matter-Energy (IME) model [8], the Layered Reference Model of the Brain (LRMB) [28], the Object-Attribute-Relation (OAR) model of information/knowledge representation in the brain [12], the cognitive informatics model of the brain [23, 26], Natural Intelligence (NI) [8], and neuroinformatics [12]. 
Recent studies on LRMB in cognitive informatics reveal an entire set of cognitive functions of the brain and their cognitive process models, which explain the functional mechanisms of the natural intelligence with 43 cognitive processes at seven layers known as the sensation, memory, perception, action, meta-cognitive, metainference, and higher cognitive layers [28].
Definition 2. Abstract intelligence (αI) is a universal mathematical form of intelligence that transfers information into actions and behaviors. The studies on αI form a field of enquiry for both natural and artificial intelligence at the reductive levels of neural, cognitive, functional, and logical from the bottom up [17]. The paradigms of αI are such as natural, artificial, machinable, and computational intelligence. The studies in CI and αI lay a theoretical foundation toward revealing the basic mechanisms of different forms of intelligence [25]. As a result, cognitive computers may be developed, which are characterized as a knowledge processor beyond those of data processors in conventional computing. Definition 3. Cognitive Computing (CC) is an emerging paradigm of intelligent computing methodologies and systems that implements computational intelligence by autonomous inferences and perceptions mimicking the mechanisms of the brain. CC is emerged and developed based on the transdisciplinary research in cognitive informatics and abstract intelligence. The term computing in a narrow sense is an application of computers to solve a given problem by imperative instructions; while in a broad sense, it is a process to implement the instructive intelligence by a system that transfers a set of given information or instructions into expected intelligent behaviors. The essences of computing are both its data objects and their predefined computational operations. From these facets, different computing paradigms may be comparatively analyzed as follows: a) Conventional computing - Data objects: abstract bits and structured data - Operations: logic, arithmetic, and functional operations
(1a)

b) Cognitive computing (CC)
- Data objects: words, concepts, syntax, and semantics
- Basic operations: syntactic analyses and semantic analyses
- Advanced operations: concept formulation, knowledge representation, comprehension, learning, inferences, and causal analyses (1b)

The latest advances in cognitive informatics, abstract intelligence, and denotational mathematics have led to a systematic solution for the future generation of intelligent computers known as cognitive computers [9, 18]. Definition 4. A cognitive computer (cC) is an intelligent computer for knowledge processing that perceives, learns, and reasons. As with conventional von Neumann computers for data processing, cCs are designed to embody machinable intelligence such as computational inferences, causal analyses, knowledge manipulations, learning, and problem solving. According to the above analyses, a cC is driven by a cognitive CPU with a cognitive learning engine and a formal inference engine for intelligent operations on abstract concepts as the basic unit of human knowledge. cCs are designed based on contemporary denotational mathematics [13, 21], particularly concept algebra, much as conventional von Neumann architecture computers are based on Boolean algebra. cC is an important extension of conventional computing in both its data object modeling capabilities and its advanced operations at the abstract level of concepts beyond bits. Therefore, cC is
an intelligent knowledge processor that is much closer to the capability of human brains thinking at the level of concepts rather than bits. It is recognized that the basic unit of human knowledge in natural language representation is a concept rather than a word [14], because the former conveys the structured semantics of the latter with its intention (attributes), extension (objects), and relations to other concepts in the context of a knowledge network. Main applications of the fundamental theories and technologies of CI can be divided into two categories. The first category of applications uses informatics and computing techniques to investigate intelligence science, cognitive science, and knowledge science problems, such as abstract intelligence, memory, learning, and reasoning. The second category includes the areas that use cognitive informatics theories to investigate problems in informatics, computing, software engineering, knowledge engineering, and computational intelligence. CI focuses on the nature of information processing in the brain, such as information acquisition, representation, memory, retrieval, creation, and communication. Via the interdisciplinary approach and with the support of modern information and neuroscience technologies, intelligent mechanisms of the brain and cognitive processes of the mind may be systematically explored [33] within the framework of CI.
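To make the contrast between (1a) and (1b) concrete, the following minimal Python sketch (illustrative only; the dictionary-based concept representation and the compose_concepts function are invented for this example and are not a CI/CC API) juxtaposes a conventional bit-level operation with a simple concept-level operation of the kind a cognitive computer is intended to support.

# Illustrative contrast between conventional (1a) and cognitive (1b) computing.
# The concept representation and compose_concepts() are hypothetical simplifications.

def conventional_op(x: int, y: int) -> int:
    """(1a): a logic/arithmetic operation on bit-level data objects."""
    return x | y                                     # plain Boolean manipulation of bits

def compose_concepts(c1: dict, c2: dict) -> dict:
    """(1b), naively: a concept-level operation that composes two concepts by
    merging their objects (extension) and attributes (intension)."""
    return {"objects": c1["objects"] | c2["objects"],
            "attributes": c1["attributes"] | c2["attributes"]}

pen = {"objects": {"ballpoint", "fountain pen"}, "attributes": {"writing tool", "uses ink"}}
brush = {"objects": {"paintbrush"}, "attributes": {"writing tool", "has bristles"}}

print(conventional_op(0b1010, 0b0110))                      # 14: a result at the level of bits
print(sorted(compose_concepts(pen, brush)["attributes"]))   # a result at the level of concepts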
3 Denotational Mathematics: A Metamethodology for Cognitive Informatics, Brain Informatics, Cognitive Computing, and Computational Intelligence It is recognized that the maturity of a scientific discipline is characterized by the maturity of its mathematical (meta-methodological) means. A key notion recognized in recent studies in cognitive informatics and computational intelligence is that the root and profound problem in natural, abstract, and artificial intelligence in general, and in cognitive informatics and brain informatics in particular, is to seek suitable mathematical means for their special needs. This is a general need and requirement for searching the metamethodology in any discipline particularly the emerging fields where no suitable mathematics has been developed and the traditional fields where persistent hard problems have been unsolved efficiently or completely [1, 2, 3, 4, 10, 13]. Definition 5. Denotational mathematics (DM) is a category of expressive mathematical structures that deals with high-level mathematical entities beyond numbers and sets, such as abstract objects, complex relations, perceptual information, abstract concepts, knowledge, intelligent behaviors, behavioral processes, inferences, and systems. A number of DMs have been created and developed [13, 21] such as concept algebra [14], system algebra [15, 30], real-time process algebra (RTPA) [7, 16], granular algebra [22], visual semantic algebra (VSA) [19], and formal causal inference methodologies. As summarized in Table 1 with their structures, mathematical entities, algebraic operations, and usages, the set of DMs provide a coherent set of contemporary mathematical means and explicit expressive power for CI, αI, CC, AI, and computational intelligence.
Table 1. Paradigms of Denotational Mathematics

Concept algebra (CA)
  Structure: CA ≜ (C, OP, Θ) = ({O, A, Rc, Ri, Ro}, {•r, •c}, ΘC)
  Mathematical entities: abstract concepts c ≜ (O, A, Rc, Ri, Ro)
  Algebraic operations: relational operations •r and compositional operations •c
  Usage: algebraic manipulations on abstract concepts

System algebra (SA)
  Structure: SA ≜ (S, OP, Θ) = ({C, Rc, Ri, Ro, B, Ω}, {•r, •c}, Θ)
  Mathematical entities: abstract systems S ≜ (C, Rc, Ri, Ro, B, Ω, Θ)
  Algebraic operations: relational operations •r and compositional operations •c
  Usage: algebraic manipulations on abstract systems

Real-time process algebra (RTPA)
  Structure: RTPA ≜ (T, P, N)
  Mathematical entities: processes over primitive types such as N, Z, R, S, BL, B, H, P, TI, D, DT, RT, ST
  Algebraic operations: meta-processes and process composition operations (sequential, branch, iterative, parallel, interrupt, etc.)
  Usage: algebraic manipulations on abstract processes

Visual semantic algebra (VSA)
  Structure: VSA ≜ (O, •VSA)
  Mathematical entities: abstract visual objects O
  Algebraic operations: spatial and compositional operations •VSA
  Usage: algebraic manipulations on abstract visual objects/patterns

Granular algebra (GrA)
  Structure: GrA ≜ (G, •r, •p, •c)
  Mathematical entities: abstract granules G ≜ (C, Rc, Ri, Ro, B, Ω, Θ)
  Algebraic operations: operations •r, •p, and •c
  Usage: algebraic manipulations on abstract granules

(The complete operation sets of each paradigm are defined in [13, 21].)
}
Among the above collection of denotational mathematics, concept algebra is an abstract mathematical structure for the formal treatment of concepts as the basic unit of human reasoning and their algebraic relations, operations, and associative rules for composing complex concepts. It is noteworthy that, according to concept algebra, although the semantics of words may be ambiguity, the semantics of concept is always unique and precise in CC. Example 1. The word, “bank”, is ambiguity because it may be a notion of a financial institution, a geographic location of raised ground of a river/lake, and/or a storage of something. However, the three distinguished concepts related to “bank”, i.e., bo = bank(organization), br = bank(river), and bs = bank(storage), are precisely unique, which can be formally described in concept algebra [14] for CC as shown in Fig. 1, where K represents the entire concepts existed in the analyser’s knowledge. All given concrete concepts share a generic framework, known as the universal abstract concept as modeled in concept algebra as given below. Definition 6. An abstract concept, c, is a 5-tuple, i.e.:
c
(O, A, R c , Ri , R o )
(2)
where •
O is a nonempty set of objects of the concept, O = {o1, o2, …, om} ⊆ ÞO, where ÞO denotes a power set of abstract objects in the universal discourse
U.
8
Y. Wang
•
A is a nonempty set of attributes, A = {a1, a2, …, an} ⊆ ÞA, where ÞA
• •
denotes a power set of attributes in U. Rc = O × A is a set of internal relations. Ri ⊆ C′ × c is a set of input relations, where C′ is a set of external concepts in U.
•
Ro ⊆ c × C′ is a set of output relations. boST
(A, O, Rc, Ri, Ro)
// bank(organization)
= ( boST.A = {organization, company, financial business, money, deposit, withdraw, invest, exchange}, boST.O = {international bank, national bank, local bank, investment bank, ATM} boST.Rc = O × A, boST.Ri = K × boST, boST.Ro = boST × K ) brST
(A, O, Rc, Ri, Ro)
// bank(river)
= ( brST.A = {sides of a river, raised ground, a pile of earth, location}, brST.O = {river bank, lake bank, canal bank} brST.Rc = O × A, brST.Ri = K × brST, brST.Ro = brST × K ) bsST
(A, O, Rc, Ri, Ro)
// bank(storage)
= ( bsST.A = {storage, container, place, organization}, bsST.O = {information bank, human resource bank, blood bank} bsST.Rc = O × A, bsST.Ri = K × bsST, bsST.Ro = bsST × K )
Fig. 1. Formal and distinguished concepts derived from the word “bank”
Concept algebra provides a set of 8 relational and 9 compositional operations on abstract concepts as summarized in Table 1. Detailed definitions of operations defined in concept algebra may be referred to [14]. A Cognitive Learning Engine (CLE), known as the "CPU" of cCs, is under developing in my lab on the basis of concept algebra, which implements the basic and advanced cognitive computational operations of concepts and knowledge for cCs as outlined in Eq. 1b. Additional concept operations may be introduced in order to reveal the underpinning mechanisms of learning and natural language comprehension. One of
Cognitive Informatics and Denotational Mathematical Means for Brain Informatics
9
the advanced operations in concept algebra for knowledge processing is known as knowledge differential, which can be formalized in concept algebra as follows. Definition 7. Knowledge differential, dK/dt, is an eliciting operation on a set of knowledge K represented by a set of concepts over time that recalls new concepts learnt during a given period t1 through t2, i.e.: dK dt
d (OAR) dt = OAR(t2 ) − OAR(t1 )
(3)
= OAR.C (t2 ) \ OAR.C (t1 )
where the set of concepts, OAR.C(t1), are existing concepts that have already been known at time point t1.

Example 2. As given in Example 1, assume the following concepts, OAR.C(t1) = {Co}, are known at t1, and the system's learning result at t2 is OAR.C(t2) = {Co, Cr, Cs}. Then, a knowledge differential can be carried out using Eq. 3 as follows:

dK/dt ≜ d(OAR)/dt = OAR.C(t2) \ OAR.C(t1) = {Co, Cr, Cs} \ {Co} = {Cr, Cs}
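Operationally, the knowledge differential of Eq. 3 reduces to a set difference between the concepts held at two time points. The following Python fragment is an illustrative reading of Example 2; the function and variable names are invented for this sketch.

# Knowledge differential dK/dt = OAR.C(t2) \ OAR.C(t1): the concepts newly learnt
# between t1 and t2, modelled here simply as a set difference of concept identifiers.
def knowledge_differential(oar_c_t1, oar_c_t2):
    return set(oar_c_t2) - set(oar_c_t1)

oar_c_t1 = {"Co"}                  # bank(organization) already known at t1
oar_c_t2 = {"Co", "Cr", "Cs"}      # bank(river) and bank(storage) learnt by t2
print(knowledge_differential(oar_c_t1, oar_c_t2))   # {'Cr', 'Cs'}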
Concept algebra provides a powerful denotational mathematical means for algebraic manipulations of abstract concepts. Concept algebra can be used to model, specify, and manipulate generic “to be” type problems, particularly system architectures, knowledge bases, and detail-level system designs, in cognitive informatics, intelligence science, computational intelligence, computing science, software science, and knowledge science. The work in this area may also lead to a fundamental solution to computational linguistics, Computing with Natural Language (CNL), and Computing with Words (CWW) [34, 35].
4 Applications of Cognitive Informatics and Denotational Mathematics in Brain Informatics This section introduces the notion of brain informatics as developed by Zhong and his colleagues [36]. A functional and logical reference model of the brain and a set of cognitive processes of the mind are systematically developed towards the exploration of the theoretical framework of brain informatics. The current methodologies for brain studies are reviewed and their strengths and weaknesses are analyzed. Definition 8. Brain informatics (BI) is a joint field of brain and information sciences that studies the information processing mechanisms of the brain by computing and medical imaging technologies. A variety of life functions and their cognitive processes have been identified in cognitive informatics, neuropsychology, cognitive science, and neurophilosophy.
Based on the advances of research in cognitive informatics and related fields, a Layered Reference Model of the Brain (LRMB) is developed by Wang and his colleagues [28]. The LRMB model explains the functional mechanisms and cognitive processes of the natural and artificial brains with 43 cognitive processes at seven layers. LRMB elicits the core and highly recurrent cognitive processes from a huge variety of life functions, which may shed light on the study of the fundamental mechanisms and interactions of complicated mental processes as well as of cognitive systems, particularly the relationships and interactions between the inherited and the acquired life functions as well as those of the subconscious and conscious cognitive processes. Any everyday life function or behavior, such as reading or driving, is a concurrent combination of part or all of the 43 fundamental cognitive processes according to LRMB. The basic methodologies in CI and BI are: a) logic (formal and mathematical) modeling and reasoning; b) empirical introspection; c) experiments (particularly abductive observations on brain patients); and d) using high technologies particularly brain imaging technologies. The central roles of formal logical and functional modeling for BI have been demonstrated in Sections 2 and 3 by CI, αI, and denotational mathematics. The advantage and disadvantaged of the latest methodologies of brain imaging are analyzed in the following subsections. Modern brain imaging technologies such as EEG, fMRI, MEG, and PET are illustrated as shown in Fig. 2. Although many promising results on cognitive functions of the brain have been derived by brain imaging studies in cognitive tests and neurobiology, they are limited to simple cognitive functions compared with the entire framework of the brain as revealed in LRMB. Moreover, there is a lack of a systematic knowledge about what roles particular types of neurons may play in complex cognitive functions such as learning and memorization, because neuroimages cannot pinpoint to detailed relationships between structures and functions in the brain.
Fig. 2. Major imaging technologies in brain studies
The limitations of current brain imaging technologies such as PET and fMRI for understanding the functions of the brain may be likened to the problem of examining the functions of a computer by looking at its layout and the locations where they are
active using imaging technologies. It is well recognized that without understanding the logical and functional models and mechanisms of the CPU as shown in Fig. 3, nobody can explain the functions of it by fine pictures of the intricate interconnections of millions of transistors (gates). Further, it would be more confusing because the control unit (CU) and arithmetic and logic unit (ALU) of the CPU and its buses are always active for almost all different kind of operations. So do unfortunately, brain science and neurobiology. Without a rational guide to the high-level life functions and cognitive processes as shown in the LRMB reference model, nobody may pinpoint rational functional relationship between a brain image and a specific behaviour such as an action of learning and its effect in memory, a recall of a particular knowledge retained in long-term memory, and a mapping of the same mental object from shortterm memory to long-term memory.
Fig. 3. The layout of a CPU
The above case study indicates that neuroscience theories and artificial intelligence technologies toward the brain have so far been studied at almost separate levels in biophysics, neurology, cognitive science, and computational/artificial intelligence. However, a synergic model such as LRMB, which maps the architectures and functions of the brain across individual disciplines, is necessary to explain the complexity and underpinning mechanisms of the brain. This coherent approach will lead to the development of novel engineering applications of CI, αI, DM, CC, and BI, such as cognitive computers, artificial brains, cognitive robots, and cognitive software agents, which mimic the natural intelligence of the brain based on the theories and denotational mathematical means developed in cognitive informatics and abstract intelligence.
5 Conclusions Cognitive Informatics (CI) has been described as a transdisciplinary enquiry of computer science, information sciences, cognitive science, and intelligence science that investigates into the internal information processing mechanisms and processes of the brain and natural intelligence, as well as their engineering applications in
cognitive computing. Brain informatics (BI) has been introduced as a joint field of brain and information sciences that studies the information processing mechanisms of the brain by computing and medical imagination technologies. This paper has presented some of the theoretical foundations of brain informatics developed in cognitive informatics, abstract intelligence, and denotational mathematics. In this paper, cognitive informatics as the science of abstract intelligence and cognitive computing has been briefly introduced. A set of denotational mathematics, particularly concept algebra, has been elaborated in order to enhance the fundamental theories and mathematical means for cognitive informatics, brain Informatics, and computational intelligence. Applications of cognitive informatics and denotational mathematics in brain informatics and cognitive computing are demonstrated based on the Layered Reference Model of the Brain (LRMB) and a set of cognitive processes of the mind towards the exploration of the theoretical framework of brain informatics.
References 1. Bender, E.A.: Mathematical Methods in Artificial Intelligence. IEEE CS Press, Los Alamitos (1996) 2. Boole, G.: The Laws of Thought, 1854. Prometheus Books, NY (2003) 3. Kline, M.: Mathematical Thought: From Ancient to Modern Times, Oxford, UK (1972) 4. Russell, B.: The Principles of Mathematics, 1903. W.W. Norton & Co., NY (1996) 5. Shannon, C.E.: A Mathematical Theory of Communication. Bell System Technical Journal 27, 379–423, 623–656 (1948) 6. Wang, Y.: Keynote: On Cognitive Informatics. In: Proc. 1st IEEE International Conference on Cognitive Informatics (ICCI 2002), Calgary, Canada, pp. 34–42. IEEE CS Press, Los Alamitos (August 2002a) 7. Wang, Y.: The Real-Time Process Algebra (RTPA). Annals of Software Engineering 14, 235–274 (2002b) 8. Wang, Y.: On Cognitive Informatics. Brain and Mind: A Transdisciplinary Journal of Neuroscience and Neurophilosophy 4(3), 151–167 (2003) 9. Wang, Y.: Keynote: Cognitive Informatics - Towards the Future Generation Computers that Think and Feel. In: Proc. 5th IEEE International Conference on Cognitive Informatics (ICCI 2006), Beijing, China, pp. 3–7. IEEE CS Press, Los Alamitos (July 2006) 10. Wang, Y.: Software Engineering Foundations: A Software Science Perspective, July 2007. CRC Series in Software Engineering, vol. II. Auerbach Publications, NY (July 2007a) 11. Wang, Y.: The Theoretical Framework of Cognitive Informatics. International Journal of Cognitive Informatics and Natural Intelligence 1(1), 1–27 (2007b) 12. Wang, Y.: The OAR Model of Neural Informatics for Internal Knowledge Representation in the Brain. International Journal of Cognitive Informatics and Natural Intelligence 1(3), 64–75 (2007c) 13. Wang, Y.: On Contemporary Denotational Mathematics for Computational Intelligence. Transactions of Computational Science 2, 6–29 (2008a) 14. Wang, Y.: On Concept Algebra: A Denotational Mathematical Structure for Knowledge and Software Modeling. International Journal of Cognitive Informatics and Natural Intelligence 2(2), 1–19 (2008b) 15. Wang, Y.: On System Algebra: A Denotational Mathematical Structure for Abstract System Modeling. International Journal of Cognitive Informatics and Natural Intelligence 2(2), 20–42 (2008c)
16. Wang, Y.: RTPA: A Denotational Mathematics for Manipulating Intelligent and Computational Behaviors. International Journal of Cognitive Informatics and Natural Intelligence 2(2), 44–62 (2008d) 17. Wang, Y.: On Abstract Intelligence: Toward a Unified Theory of Natural, Artificial, Machinable, and Computational Intelligence. International Journal of Software Science and Computational Intelligence 1(1), 1–18 (2009a) 18. Wang, Y.: On Cognitive Computing. International Journal of Software Science and Computational Intelligence 1(3), 1–15 (2009b) 19. Wang, Y.: On Visual Semantic Algebra (VSA): A Denotational Mathematical Structure for Modeling and Manipulating Visual Objects and Patterns. International Journal of Software Science and Computational Intelligence 1(4), 1–15 (2009c) 20. Wang, Y.(ed.): Special Issue on Cognitive Computing. International Journal of Software Science and Computational Intelligence 1(3) (July 2009d) 21. Wang, Y.: Paradigms of Denotational Mathematics for Cognitive Informatics and Cognitive Computing. Fundamenta Informaticae 90(3), 282–303 (2009e) 22. Wang, Y.: Granular Algebra for Modeling Granular Systems and Granular Computing. In: Proc. 8th IEEE International Conference on Cognitive Informatics (ICCI 2009), Hong Kong, pp. 145–154. IEEE CS Press, Los Alamitos (2009f) 23. Wang, Y.: Toward a Cognitive Behavioral Reference Model of Artificial Brains. Journal of Computational and Theoretical Nanoscience (2010a) (to appear) 24. Wang, Y.: Abstract Intelligence and Cognitive Robots. Journal of Behavioral Robotics 1(1), 66–72 (2010b) 25. Wang, Y.: A Sociopsychological Perspective on Collective Intelligence in Metaheuristic Computing. International Journal of Applied Metaheuristic Computing 1(1), 110–128 (2010c) 26. Wang, Y., Wang, Y.: Cognitive Informatics Models of the Brain. IEEE Trans. on Systems, Man, and Cybernetics (C) 36(2), 203–207 (2006) 27. Wang, Y., Kinsner, W.: Recent Advances in Cognitive Informatics. IEEE Transactions on Systems, Man, and Cybernetics (C) 36(2), 121–123 (2006a) 28. Wang, Y., Wang, Y., Patel, S., Patel, D.: A Layered Reference Model of the Brain (LRMB). IEEE Trans. on Systems, Man, and Cybernetics (C) 36(2), 124–133 (2006b) 29. Wang, Y., Kinsner, W., Zhang, D.: Contemporary Cybernetics and its Faces of Cognitive Informatics and Computational Intelligence. IEEE Trans. on System, Man, and Cybernetics (B) 39(4), 1–11 (2009a) 30. Wang, Y., Zadeh, L.A., Yao, Y.: On the System Algebra Foundations for Granular Computing. International Journal of Software Science and Computational Intelligence (1), 1–17 (2009b) 31. Wang, Y., Kinsner, W., Anderson, J.A., Zhang, D., Yao, Y., Sheu, P., Tsai, J., Pedrycz, W., Latombe, J.-C., Zadeh, L.A., Patel, D., Chan, C.: A Doctrine of Cognitive Informatics. Fundamenta Informaticae 90(3), 203–228 (2009c) 32. Wang, Y., Zhang, D., Tsumoto, S.: Cognitive Informatics, Cognitive Computing, and Their Denotational Mathematical Foundations (I). Fundamenta Informaticae 90(3), 1–7 (2009d) 33. Wang, Y., Chiew, V.: On the Cognitive Process of Human Problem Solving. Cognitive Systems Research: An International Journal 11(1), 81–92 (2010) 34. Zadeh, L.A.: Fuzzy Logic and Approximate Reasoning. Syntheses 30, 407–428 (1975) 35. Zadeh, L.A.: From Computing with Numbers to Computing with Words – from Manipulation of Measurements to Manipulation of Perception. IEEE Trans. on Circuits and Systems I 45(1), 105–119 (1999) 36. Zhong, N.: A Unified Study on Human and Web Granular Reasoning. In: Proc. 8th Int’l. Conf. 
Cognitive Informatics (ICCI 2009), Hong Kong, pp. 3–4. IEEE CS Press, Los Alamitos (July 2009)
An Adaptive Model for Dynamics of Desiring and Feeling Based on Hebbian Learning Tibor Bosse1, Mark Hoogendoorn1, Zulfiqar A. Memon1,2, Jan Treur1, and Muhammad Umair1,3 1 VU University Amsterdam, Department of AI De Boelelaan 1081, 1081 HV Amsterdam, The Netherlands 2 Sukkur Ins. of Business Administration (Sukkur IBA) Air Port Road Sukkur, Sindh, Pakistan 3 COMSATS Institute of Information Technology, Dept. of Computer Science Lahore, Pakistan {tbosse,mhoogen,zamemon,treur,mumair}@few.vu.nl http://www.few.vu.nl/~{tbosse,mhoogen,zamemon,treur,mumair}
Abstract. Within cognitive models, desires are often considered as functional concepts that play a role in efficient focusing of behaviour. In practice a desire often goes hand in hand with having certain feelings. In this paper, by adopting neurological theories, a model is introduced that incorporates both cognitive and affective aspects in the dynamics of desiring and feeling. Example simulations are presented, and both a mathematical and a logical analysis are included.
1 Introduction
Desires play an important role in human functioning. To provide automated support for human functioning in various domains [2], it may be important to also monitor the human's states of desiring. Desires [13] are often considered cognitive states with the function of focusing behaviour by constraining or indicating the options for actions to be chosen. Yet, there is much more to the concept of desire, especially concerning associated affective aspects. Cognitive functioning is often strongly related to affective processes, as has been shown more generally in empirical work as described in, for example, [9, 19]. In this paper a model is introduced that addresses both cognitive and affective aspects related to desires, adopting neurological theories as described in, for example, [3, 6, 7, 8, 19]. The aim of developing such a model is both to analyse adaptive dynamics of interacting cognitive and affective processes, and to provide a basis for an ambient agent that supports a person; cf. [14, 16, 2]. Evaluation criteria include the extent to which the model shows emerging patterns that are considered plausible, and the possibility to use the model in model-based reasoning within an ambient agent; cf. [2]. Within the presented model an activated desire induces a set of responses in the form of preparations for actions to fulfil the desire, involving changing body states. By a recursive as-if body loop each of these preparations generates a level of feeling [18] that in turn can strengthen the level of the related preparation. These loops result in equilibria for both the strength of the preparation and of the feeling,
and when these are strong enough, the action is actually activated. The specific strengths of the connections from the desire to the preparations, and within the recursive as-if body loops can be innate, or are acquired during lifetime. The computational model is based on neurological notions such as somatic marking, body loop and as-if body loop. The adaptivity in the model is based on Hebbian learning. Any mental state in a person induces emotions felt by this person, as described in [7, 8]; e.g., [8], p. 93: ‘… few if any exceptions of any object or event, actually present or recalled from memory, are ever neutral in emotional terms. Through either innate design or by learning, we react to most, perhaps all, objects with emotions, however weak, and subsequent feelings, however feeble.’ More specifically, in this paper it is assumed that responses in relation to a mental state of desiring roughly proceed according to the following causal chain for a body loop, based on elements from [3, 7, 8]: desire → preparation for bodily response → body state modification → sensing body state → sensory representation of body state → induced feeling
In addition, an as-if body loop uses a direct causal relation preparation for bodily response → sensory representation of body state
as a shortcut in the causal chain; cf. [7]. The body loop (or as-if body loop) is extended to a recursive (as-if) body loop by assuming that the preparation of the bodily response is also affected by the state of feeling the emotion: feeling → preparation for the bodily response
Such recursion is suggested in [8], pp. 91-92, noticing that what is felt is a body state under the person’s control: ‘The brain has a direct means to respond to the object as feelings unfold because the object at the origin is inside the body, rather than external to it. The brain can act directly on the very object it is perceiving. (…) The object at the origin on the one hand, and the brain map of that object on the other, can influence each other in a sort of reverberative process that is not to be found, for example, in the perception of an external object.’ Within the model presented in this paper, both the bodily response and the feeling are assigned a level (or gradation), expressed by a number. The causal cycle is triggered by an activation of the desire and converges to certain activation levels of feeling and preparation for a body state. The activation of a specific action preparation is based on both the activation level of the desire and of the feeling associated to this action. This illustrates Damasio’s theory on decision making by somatic marking, called the Somatic Marker Hypothesis; cf. [1, 6, 8]. The strengths of the connections from feeling to preparation may be subject to learning. Especially when a specific action is performed and it leads to a strong effect in feeling, by Hebbian learning [10, 12] this may give a positive effect on the strength of this connection and consequently on future activations of the preparation of this specific action. Through such a mechanism experiences in the past may have their effect on behavioural choices made in the future, as also described as part of Damasio’s Somatic Marker Hypothesis [6]. In the computational model described below, this is applied in the form of a Hebbian learning rule realising that actions induced by a certain desire which result in stronger experiences of satisfaction felt will be chosen more often to fulfil this desire.
In Section 2 the computational model for the dynamics of desiring and feeling is described. Section 3 presents some simulation results. In Section 4, formal analysis of the model is addressed, both by mathematical analysis of equilibria and automated logical verification of properties. Finally, Section 5 is a discussion.
2 Modelling Desiring and Feeling In this section the computational model for desiring and feeling is presented; for an overview see Fig. 1. This picture also shows representations from the detailed specifications explained below. The precise numerical relations between the indicated variables V shown are not expressed in this picture, but in the detailed specifications of properties below, which are labelled by LP0 to LP9 (where LP stands for Local Property), as also shown in the picture. The detailed specification (both informally and formally) of the computational model is presented below. Here capitals are used for (assumed universally quantified) variables. The model was specified in LEADSTO [4], where the temporal relation a → → b denotes that when a state property a occurs, then after a certain time delay (which can be specified as any positive real number), state property b will occur. In LEADSTO both logical and numerical relations can be specified. Generating a desire by sensing a bodily unbalance The desire considered in the example scenario is assumed to be generated by sensing an unbalance in a body state b, according to the principle that organisms aim at maintaining homeostasis of their internal milieu. The first dynamic property addresses how body states are sensed. LP0 Sensing a body state If body state property B has level V, then the sensor state for B will have level V. body_state(B, V) → sensor_state(B, V)
For the example scenario this dynamic property is used by the person to sense the body state b from which the desire originates (e.g., a state of being hungry), and the body states bi involved in feeling satisfaction with specific ways in which the desire is being fulfilled. From sensor states, sensory representations are generated as follows. LP1 Generating a sensory representation for a sensed body state If a sensor state for B has level V, then the sensory representation for B will have level V. sensor_state(B, V) → srs(B, V)
Next the dynamic property for the process for desire generation is described, from the sensory representation of the body state unbalance. LP2 Generating a desire based on a sensory representation If a sensory representation for B has level V, then the desire to address B will have level V. srs(B, V) → desire(B, V)
Inducing preparations It is assumed that activation of a desire, together with a feeling, induces preparations for a number of action options: those actions considered relevant to satisfy the desire, for example based on earlier experiences. Dynamic property LP3 describes such responses in the form of the preparation for specific actions. It combines the activation levels V and Vi of two states (desire and feeling) through connection
strengths ω1i and ω2i respectively. This specifies part of the recursive as-if loop between feeling and body state. This dynamic property uses a combination model based on a function g(σ, τ, V, Vi, ω1i, ω2i) which includes a sigmoid threshold function

th(σ, τ, V) = 1/(1 + e^(−σ(V − τ)))

with steepness σ and threshold τ. For this model g(σ, τ, V, Vi, ω1i, ω2i) is defined as

g(σ, τ, V, Vi, ω1i, ω2i) = th(σ, τ, ω1iV + ω2iVi)
with V, Vi activation levels and ω1i, ω2i weights of the connections to the preparation state. Note that alternative combination functions g could be used as well, for example quadratic functions such as used in [15]. Property LP3 is formalised in LEADSTO as: LP3 From desire and feeling to preparation If the desire for b has level V and feeling the associated body state bi has level Vi and the preparation state for bi has level Ui and ω1i is the strength of the connection from desire for b to preparation for bi and ω2i is the strength of the connection from feeling of bi to preparation for bi and σ i is the steepness value for the preparation for bi and τ i is the threshold value for the preparation for bi and γ 1 is the person’s flexibility for bodily responses then the preparation state for bi will have level Ui + γ 1(g(σi, τ i, V, Vi, ω1i, ω2i) - Ui) Δt. desire(b, V) & feeling(bi, Vi) & prep_state(bi, Ui) & has_steepness(prep_state(bi), σi) & has_threshold(prep_state(bi), τi) → prep_state(bi, Ui + γ1 (g(σi, τ i, V, Vi, ω1i, ω2i) - Ui) Δt)
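To make the numerical part of LP3 concrete, the following minimal Python sketch (an illustration with arbitrary values, not the LEADSTO implementation used by the authors, and assuming the logistic form of th given above) shows the threshold function th, the combination function g, and one LP3 update step.

```python
import math

def th(sigma, tau, v):
    # Sigmoid threshold function with steepness sigma and threshold tau
    return 1.0 / (1.0 + math.exp(-sigma * (v - tau)))

def g(sigma, tau, v, vi, w1i, w2i):
    # Combination of desire level v and feeling level vi via connection weights
    return th(sigma, tau, w1i * v + w2i * vi)

def lp3_step(u_i, v_desire, v_feeling, w1i, w2i, sigma, tau, gamma1, dt):
    # One LP3 update of the preparation level u_i towards the combined target
    return u_i + gamma1 * (g(sigma, tau, v_desire, v_feeling, w1i, w2i) - u_i) * dt

# Example step with illustrative values (sigma=10, tau=0.5 as in Simulation Trace 1)
print(lp3_step(u_i=0.0, v_desire=0.8, v_feeling=0.3, w1i=1.0, w2i=0.5,
               sigma=10, tau=0.5, gamma1=0.05, dt=1.0))
```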
From preparation to feeling Dynamic properties LP4 and LP5 describe how the as-if body loop together with the body loop affects the feeling. LP4 From preparation and sensor state to sensory representation of body state If the preparation state for body state B has level V1 and the sensor state for B has level V2 and the sensory representation for B has level U and σ is the steepness value for the sensory representation of B and τ is the threshold value for the sensory representation of B and γ2 is the person's flexibility for bodily responses then the sensory representation for body state B will have level U + γ2 (g(σ, τ, V1, V2, 1, 1) - U) Δt. prep_state(B, V1) & sensor_state(B, V2) & srs(B, U) & has_steepness(srs(B), σ) & has_threshold(srs(B), τ) → srs(B, U + γ2 (g(σ, τ, V1, V2, 1, 1) - U) Δt)
Dynamic property LP5 describes the remaining part of the as-if body loop. LP5 From sensory representation of body state to feeling If a sensory representation for body state B has level V, then B will be felt with level V. srs(B, V) → feeling(B, V)
Action performance and effects on body states Temporal relationships LP6 and LP7 below describe the preparations of body states bi and their effects on body states b and bi. The idea is that the actions performed by body states bi are different means to satisfy the desire related to b, by having an impact on the body state that decreases the activation level V (indicating the extent of
unbalance) of body state b. In addition, when performed, each of them involves an effect on a specific body state bi which can be interpreted as a basis for a form of satisfaction felt for the specific way in which b was satisfied. So, an action performance involving bi has an effect on both body state b, by decreasing the level of unbalance entailed by b, and on body state bi by increasing the specific level of satisfaction. This specific level of satisfaction may or may not be proportional to the extent to which the unbalance is reduced.
[Fig. 1 depicts the state nodes body_state, sensor_state, srs, desire, feeling, prep_state and effector_state for b and b1–b3, connected by the local properties LP0–LP8.]
Fig. 1. Overview of the computational model for desiring and feeling
As the possible actions to fulfil a desire are considered different, they differ in the extents of their effects on these two types of body states, according to an effectiveness rate αi between 0 and 1 for b, and an effectiveness rate βi between 0 and 1 for bi. The effectiveness rates αi and βi can be considered a kind of connection strengths from the effector state to the body states b and bi, respectively. In common situations for each action these two rates may be equal (i.e., αi = βi), but especially in more pathological
cases they may also have different values, where the satisfaction felt based on rate βi for bi may be disproportionally higher or lower in comparison to the effect on b based on rate αi (i.e., βi > αi or βi < αi). An example of this situation would be a case of addiction to one of the actions. To express the extent of disproportionality between βi and αi, a parameter λi, called the satisfaction disproportion rate, between -1 and 1 is used; here: λi = (βi - αi)/(1-αi) if βi ≥ αi; λi = (βi - αi)/αi if βi ≤ αi. This parameter can also be used to relate βi to αi using a function βi = f(λi, αi). Here f(λ, α) satisfies f(0, α) = α, f(-1, α) = 0, f(1, α) = 1. The piecewise linear function f(λ, α) can be defined in a continuous manner as: f(λ, α) = α + λ(1-α) if λ ≥ 0; f(λ, α) = (1+λ)α if λ ≤ 0. Using this, for normal cases λi = 0 is taken, for cases where satisfaction is higher 0 < λi ≤ 1, and for cases where satisfaction is lower -1 ≤ λi < 0.

LP6 From preparation to effector state
If preparation state for B has level V, then the effector state for body state B will have level V. prep_state(B, V) → effector_state(B, V)
LP7 From effector state to modified body state bi If the effector state for bi has level Vi, and for each i the effectivity of bi for b is αi and the satisfaction disproportion rate for bi for b is λi then body state bi will have level f(λi, αi)Vi. effector_state(bi, Vi) & is_effectivity_for(αi, bi, b) & is_disproportion_rate_for(λi, bi) → body_state(bi, f(λi, αi)Vi)
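For illustration, the satisfaction disproportion function f(λ, α) used in LP7 can be coded directly; the following small Python sketch also checks the three boundary conditions stated above.

```python
def f(lam, alpha):
    # Piecewise linear relation between effectiveness alpha and satisfaction beta = f(lam, alpha)
    # lam = 0: beta = alpha (proportional); lam = 1: beta = 1; lam = -1: beta = 0
    if lam >= 0:
        return alpha + lam * (1.0 - alpha)
    return (1.0 + lam) * alpha

# Boundary conditions from the text: f(0, a) = a, f(-1, a) = 0, f(1, a) = 1
assert f(0, 0.25) == 0.25 and f(-1, 0.25) == 0.0 and f(1, 0.25) == 1.0
```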
LP8 From effector state to modified body state b If the effector states for bi have levels Vi, and body state b has level V, and for each i the effectivity of bi for b is αi then body state b will have level V +(ϑ * (1-V) – ρ * (1 – ( (1 - α1 * V1) * (1 - α2 * V2) * (1 - α3 * V3) )) * V) Δt. effector_state(bi, Vi) & body_state(b, V) & is_effectivity_for(αi, bi, b) → body_state(b, V + (ϑ * (1-V) – ρ * (1 – ( (1 - α1*V1) * (1 - α2*V2) * (1 - α3*V3) )) * V) Δt
Note that in case only one action is performed (i.e., Vj = 0 for all j ≠ i), the formula in LP8 above reduces to V +(ϑ * (1-V) – ρ*αi*Vi * V) Δt. In the formula ϑ is a rate of developing unbalance over time (for example, getting hungry), and ρ a general rate of compensating for this unbalance. Note that the specific formula used here to adapt the level of b is meant as just an example. As no assumptions on body state b are made, this formula is meant as a stand-in for more realistic formulae that could be used for specific body states b. Learning of the connections from desire to preparation The strengths ω2i of the connections from feeling bi to preparation of bi are considered to be subjected to learning. When an action involving bi is performed and leads to a strong effect on bi, by Hebbian learning [10, 12] this increases the strength of this connection. This is an adaptive mechanism that models how experiences in the past may have their effect on behavioural choices made in the future, as also described in Damasio’s Somatic Marker Hypothesis [6]. Within the model the strength ω2i of the connection from feeling to preparation is adapted using the following Hebbian learning rule. It takes into account a maximal connection strength 1, a learning rate η, and an extinction rate ζ.
LP9 Hebbian learning for the connection from feeling to preparation
If the connection from feeling bi to preparation of bi has strength ω2i
and the feeling bi has level V1i
and the preparation of bi has level V2i
and the learning rate from feeling bi to preparation of bi is η
and the extinction rate from feeling bi to preparation of bi is ζ
then after Δt the connection strength from feeling bi to preparation of bi will be ω2i + (ηV1iV2i (1 - ω2i) - ζω2i) Δt.
has_connection_strength(feeling(bi), preparation(bi), ω2i) & feeling(bi, V1i) & preparation(bi, V2i) & has_learning_rate(feeling(bi), preparation(bi), η) & has_extinction_rate(feeling(bi), preparation(bi), ζ) → has_connection_strength(feeling(bi), preparation(bi), ω2i + (ηV1iV2i (1 - ω2i) - ζω2i) Δt)
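As an illustration, the numerical core of LP9 can be sketched in a few lines of Python (using the parameter values of Simulation Trace 1; this is an illustration, not the LEADSTO specification itself); repeated application shows the convergence towards the maximal value derived in Section 4.

```python
def lp9_step(w2i, v_feeling, v_prep, eta, zeta, dt):
    # Hebbian strengthening bounded by 1, with extinction term zeta * w2i
    return w2i + (eta * v_feeling * v_prep * (1.0 - w2i) - zeta * w2i) * dt

# Repeated application with constant feeling/preparation levels converges towards
# eta*V1*V2 / (eta*V1*V2 + zeta), as derived in the analysis of Section 4
w = 0.0
for _ in range(2000):
    w = lp9_step(w, v_feeling=1.0, v_prep=1.0, eta=0.04, zeta=0.01, dt=1.0)
print(round(w, 3))  # approaches 0.8 for eta=0.04, zeta=0.01
```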
3 Example Simulation Results
Based on the model described in the previous section, a number of simulations have been performed. A first example simulation trace, included in this section as an illustration, is shown in Fig. 2; in all traces, the time delays within the temporal LEADSTO relations were taken to be 1 time unit. Note that only a selection of the relevant nodes (represented as state properties) is shown. In all of the figures time is on the horizontal axis, and the activation levels of state properties are on the vertical axis.
[Fig. 2 consists of four panels plotted over time: (a) effector states effector1–effector3; (b) body state b and body states body1–body3; (c) feelings feeling1–feeling3; (d) connection strengths 1–3.]
Fig. 2. Simulation Trace 1 – Normal behavior (σ1=σ2=10, τ1=τ2=0.5, γ1=γ2=0.05, α1=β1 =0.05, α2=β2 =0.25, α3=β3=1, ρ=0.8, ϑ=0.1, η=0.04, ζ=0.01)
For the example shown in Fig. 2, for each i it was taken λi = 0, so satisfaction felt is in proportion with fulfilment of the desire. Action option 3 has the highest effectiveness rate, i.e. α3 =1. Its value is higher as compared to the other two action options. This effect has been propagated to their respective body states as shown in
Fig. 2(b). All these body states have a positive effect on body state b, decreasing the level of unbalance, as shown in Fig. 2(b), where the value of body state b (which was set initially to 0.3) decreases over time until it reaches an equilibrium state. Each of these body states generates feelings by a recursive as-if body loop, as shown in Fig. 2(c). Furthermore, it has a strong effect on the strength of the connection from feeling to preparation. The connection strength keeps increasing over time until it reaches an equilibrium state, as shown in Fig. 2(d). As the extinction rate (ζ=0.01) is small compared to the learning rate (η=0.04), the connection strength becomes 0.8, which is close to 1, as confirmed by the mathematical analysis in Section 4. Fig. 3 shows the simulation of an example scenario where the person is addicted to a particular action, in this case to action option 1, with λ1 = 1. But because the effectiveness rate α1 for this option is very low (0.05), the addiction means that the person is not very effective in fulfilling the desire: the level of unbalance remains around 0.3; the person mainly selects action option 1 because of its higher satisfaction.
Fig. 3. Simulation Trace 2 – Addiction-like behaviour (σ1=σ2=10, τ1=τ2=0.5, γ1=γ2=0.05, α1=0.05, α2=β2=0.1, α3=β3=0.7, ρ =0.8, ϑ=0.1, η=0.02, ζ=0.01)
In the next trace (see Fig. 4), the effectiveness rates for the different action options have been given a distinct pattern, i.e. after some time α1 has been gradually increased in steps of 0.009, starting with an initial value of 0.05, until it reaches the value of 1; thereafter it has been kept constant at 1. In the same period the effectiveness rate α3 has been gradually decreased in steps of 0.009, starting with an initial value of 1, until it reaches the value of 0.05; thereafter it has been kept constant at 0.05, showing the exact opposite pattern to α1. Effectiveness rate α2 is kept constant at 0.15 for all
the time points. As can be seen in Fig. 4, first the person selects action option 3 as the most effective one, but after a change in circumstances the person shows adaptation by selecting action option 1, which now has a higher effectiveness rate.
Fig. 4. Simulation Trace 3 – Adapting to changing circumstances (σ1=σ2=6, τ1=τ2=0.5, γ1=γ2=0.1, α1=β1 increasing from 0.05 to 1, α2=β2=0.15, α3=β3 decreasing from 1 to 0.05, ρ =0.8, ϑ=0.1, η=0.04, ζ=0.02)
4 Formal Analysis of the Model
This section addresses formal analysis of the model and the simulation results as presented above. First a mathematical analysis of the equilibria is made. Next, a number of more globally emerging dynamic properties are verified for a set of simulation traces.

Mathematical analysis of equilibria
For an equilibrium of the strength of the connection from feeling bi to preparation of bi, by LP9 it holds ηV1iV2i (1 - ω2i) - ζω2i = 0, with values V1i for the feeling level and V2i for the preparation level for bi. This can be rewritten into

ω2i = ηV1iV2i / (ηV1iV2i + ζ) = 1 / (1 + ζ/(ηV1iV2i))

Using V1i, V2i ≤ 1, from this it follows that ω2i ≤ 1/(1 + ζ/η), so the value 1/(1 + ζ/η)
gives a maximal connection strength that can be obtained. This shows that given the extinction, the maximal connection strength will be lower than 1, but may be close to
1 when the extinction rate is small compared to the learning rate. For example, for the trace shown in Fig. 2 with ζ = 0.01 and η = 0.04, this bound is 0.8, which indeed is reached for option 3. For the traces in Fig. 3 and 4 with ζ/η = ½ this maximum is 2/3, which is indeed reached for option 1 in Fig. 3 and options 3, resp. 1 in Fig. 4. Whether or not this maximally possible value for ω2i is approximated for a certain option also depends on the equilibrium values for the feeling level V1i and preparation level V2i for bi. For values of V1i and V2i that are 1 or close to 1, the maximal possible value of ω2i is approximated. When, in contrast, these values are very low, the equilibrium value for ω2i will also be low, since

ω2i = ηV1iV2i / (ηV1iV2i + ζ) ≤ ηV1iV2i / ζ
So, when one of V1i and V2i is 0, then also ω2i = 0 (and conversely). This is illustrated by options 1 and 2 in Fig. 2, and option 2 in Fig. 3. Given the sigmoid combination functions it is not possible to analytically solve the equilibrium equations in general. Therefore the patterns emerging in the simulations cannot be derived mathematically in a precise manner. However, as the combination functions are monotonic, some relationships between inequalities can be found:

(1) V1jV2j ≤ V1kV2k ⇒ ω2j ≤ ω2k
(2) ω2j < ω2k ⇒ V1jV2j < V1kV2k
(3) ω2j ≤ ω2k & V1j ≤ V1k ⇒ ω2j V1j ≤ ω2k V1k ⇒ V2j ≤ V2k
(4) V2j < V2k ⇒ ω2j V1j < ω2k V1k
(5) βj ≤ βk & V2j ≤ V2k ⇒ (1+βj) V2j ≤ (1+βk) V2k ⇒ V1j ≤ V1k
(6) V1j < V1k ⇒ (1+βj) V2j < (1+βk) V2k
Here (1) and (2) follow from the above expressions based on LP9. Moreover, (3) and (4) follow from LP3, and (5) and (6) from the properties LP4, LP5, LP6, LP7, LP0 and LP1 describing the body loop and as-if body loop. For the case that one action dominates exclusively, i.e., V2k = 0 and ω2k = 0 for all k ≠ i, and V2i > 0, by LP8 it holds ϑ * (1-V) – ρ * αi * V2i * V = 0, where V is the level of body state b. Therefore for ϑ > 0 it holds

V = 1 / (1 + ρ αi V2i / ϑ) ≥ 1 / (1 + (ρ/ϑ) αi)
As V2i > 0 is assumed, this shows that if ϑ is close to 0 (almost no development of unbalance), and ρ > 0 and αi > 0, the value V can be close to 0 as well. If, in contrast, the value of ϑ is high (strong development of unbalance) compared to ρ and αi, then the equilibrium value V will be close to 1. For the example traces in Fig. 2, 3 and 4, ρ =0.8 and ϑ=0.1, so ρ /ϑ = 8. Therefore for a dominating option with αi = 1, it holds V ≥ 0.11, which can be seen in Fig. 2 and 4. In Fig. 3 the effectiveness of option 1 is very low (α1 = 0.05), and therefore the potential of this option to decrease V is low: V ≥ 0.7. However, as in Fig. 3 also option 3 is partially active, V reaches values around 0.35. Note that for the special case ϑ = 0 (no development of unbalance) it follows that ρ * αi * V2i * V = 0 which shows that V = 0. Values for V at or close to 0 confirm that in such an equilibrium state the desire is fulfilled or is close to being fulfilled (via LP0, LP1 and LP2 which show that the same value V occurs for the desire).
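The equilibrium value of V derived above can also be checked numerically by iterating the LP8 update for a single dominating action; the following Python sketch (illustrative only, using the parameter values of Simulation Trace 1) reproduces the bound discussed above.

```python
def lp8_step_single(v, v_effector, alpha, theta, rho, dt):
    # LP8 reduced to one active action: unbalance grows with rate theta and is
    # compensated proportionally to alpha * v_effector
    return v + (theta * (1.0 - v) - rho * alpha * v_effector * v) * dt

v = 0.3
for _ in range(2000):
    v = lp8_step_single(v, v_effector=1.0, alpha=1.0, theta=0.1, rho=0.8, dt=1.0)
print(round(v, 3))  # approaches 1/(1 + (rho/theta)*alpha) = 1/9, about 0.111
```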
Logical verification of properties on simulation traces In order to investigate particular patterns in the processes shown in the simulation runs, a number of properties have been formulated. Formal specification of the properties enabled automatic verification of them against simulation traces, using the logical language and verification tool TTL (cf. [5]). The purpose of this type of verification is to check whether the simulation model behaves as it should. A typical example of a property that may be checked is whether certain equilibria occur, or whether the appropriate actions are selected. The temporal predicate logical language TTL supports formal specification and analysis of dynamic properties, covering both qualitative and quantitative aspects. TTL is built on atoms referring to states of the world, time points and traces, i.e. trajectories of states over time. Dynamic properties are temporal statements formulated with respect to traces based on the state ontology Ont in the following manner. Given a trace γ over state ontology Ont, the state in γ at time point t is denoted by state(γ, t). These states are related to state properties via the infix predicate |=, where state(γ, t) |= p denotes that state property p holds in trace γ at time t. Based on these statements, dynamic properties are formulated in a sorted predicate logic, using quantifiers over time and traces and the usual logical connectives such as ¬, ∧, ∨, ⇒, ∀, ∃. For more details on TTL, see [5]. A number of properties have been identified for the processes modelled. Note that not all properties are expected to always hold for all traces. The first property, GP1 (short for Global Property 1), expresses that eventually the preparation state with respect to an action will stabilise.
Next, in property GP2 it is expressed that eventually the action which has the most positive feeling associated with it will have the highest preparation state value. GP2: Action with best feeling is eventually selected For all traces there exists a time point such that the bi with the highest value for feeling eventually also has the highest activation level. ∀γ:TRACE, B:BODY_STATE, t1:TIME<end_time, V:VALUE [ [ state(γ, t1) |= feeling(B, V) & ∀B2:BODY_STATE, V2:VALUE [ state(γ, t1) |= feeling(B2, V2) ⇒ V2 ≤ V] ⇒ [ ∃t2:TIME > t1, V1:VALUE [ state(γ, t2) |= prep_state(B, V1) & ∀B3:BODY_STATE, V3:VALUE [ state(γ, t2) |= prep_state(B3, V3) ⇒ V3 ≤ V1 ] ] ] ]
Property GP3 expresses that if the accumulated positive feelings experienced in the past are higher compared to another time point, and the number of negative experiences is lower or equal, then the weight through Hebbian learning will be higher. GP3: Accumulation of positive experiences If at time point t1 the accumulated feeling for bi is higher than the accumulated feeling at time point t2, then the weight of the connection from bi is higher than at t1 compared to t2.
∀γ:TRACE, B:BODY_STATE, a:ACTION, t1, t2:TIME<end_time, V1, V2:VALUE [ [state(γ, t1) |= accumulated_feeling(B, V1) & state(γ, t2) |= accumulated_feeling(B, V2) & V1>V2 ] ⇒ ∃W1, W2:VALUE [state(γ, t1) |= has_connection_strength(feeling(B), preparation(B), W1) & state(γ, t2) |= has_connection_strength(feeling(B), preparation(B), W2) & W1 ≥ W2 ] ]
Next, property GP4 specifies a monotonicity property where two traces are compared. It expresses that strictly higher feeling levels result in a higher weight of the connection between the feeling and the preparation state. GP4: High feelings lead to high connection strength If at time point t1 in a trace γ1 the feelings have been strictly higher level compared to another trace γ2, then the weight of the connection between the feeling and the preparation state will also be strictly higher. ∀γ1, γ2:TRACE, B:BODY_STATE, t1:TIME<end_time, W1, W2:VALUE [∀t’ < t1:TIME, V1, V2:VALUE [ [ state(γ1, t’) |= feeling(B, V1) & state(γ2, t’) |= feeling(B, V2) ] ⇒ V1 > V2 ] & state(γ1, t1) |= has_connection_strength(feeling(B), preparation(B), W1) & state(γ2, t1) |= has_connection_strength(feeling(B), preparation(B), W2) ⇒ W1 ≥ W2 ]
Finally, property GP5 analyses traces that address cases of addiction. In particular, it checks whether it is the case that if a person is addicted to a certain action (i.e., has a high value for the satisfaction disproportion rate λ for this action), this results in a situation of unbalance (i.e., a situation in which the feeling caused by this action stays higher than the overall body state). An example of such a situation is found in simulation trace 2 (in Fig. 3). GP5: Addiction leads to unbalance between feeling and body state For all traces, if a certain action has λ > 0, then there will be a time point t1 after which the feeling caused by this action stays higher than the overall body state. ∀γ:TRACE, B1:BODY_STATE, L1:VALUE [ state(γ, 0) |= has_lambda(B1,L1) & L1 > 0 ⇒ [ ∃t1:TIME < last_time ∀t2:TIME>t1 X,X1:VALUE [ state(γ, t2) |= body_state(b, X) & body_state(B1, X1) ⇒ X < X1 ] ] ]
An overview of the results of the verification process is shown in Table 1 for the three traces that have been considered in Section 4. The results show that several expected global properties of the model were confirmed. For example, the first row indicates that for all traces, eventually an equilibrium occurs in which the values of the preparation states never deviate more than 0.0005 (this number can still be decreased by running the simulation for a longer time period). Also, the checks indicate that some properties do not hold. In such cases, the TTL checker software provides a counter example, i.e., a situation in which the property does not hold. This way, it could be concluded, for example, that property GP1 only holds for the generated traces if d is not chosen too small.

Table 1. Results of verification
property   trace 1       trace 2       trace 3
GP1(X)     X ≥ 0.0001    X ≥ 0.0005    X ≥ 0.0001
GP2        satisfied     satisfied     satisfied
GP3        satisfied     satisfied     satisfied
GP4        satisfied for all pairs of traces
GP5        satisfied     satisfied     satisfied
5 Discussion
In this paper an adaptive computational model was introduced for the dynamics of cognitive and affective aspects of desiring, based on neurological theories involving (as-if) body loops, somatic marking, and Hebbian learning. The introduced model describes more specifically how a desire induces (as a response) a set of preparations for a number of possible actions, involving certain body states, which each affect sensory representations of the body states involved and thus provide associated feelings. In turn, these feelings affect the preparations, for example, by amplifying them. In this way a model is obtained for desiring which integrates both cognitive and affective aspects of mental functioning. For the interaction between feeling and preparation of responses, a converging recursive body loop is included in the model, based on elements taken from [3, 7, 8]. Both the strength of the preparation and of the feeling emerge as a result of the dynamic pattern generated by this loop. The model is adaptive in the sense that within these loops the connection strengths from feelings to preparations are adapted over time by Hebbian learning. By this adaptation mechanism, in principle the person comes to choose the most effective action to fulfil a desire. However, the model can also be used to cover persons for whom satisfaction for an action is not in proportion with the fulfilment of the desire, as occurs, e.g., in certain cases of temptation and addiction, such as illustrated in [14]. Despite growing interest in integrating cognitive and affective aspects of mental functioning in recent years, both in informally described approaches [9, 19] and in formal and computational approaches [11, 15], the relation of affective and cognitive aspects of desires has received less than adequate attention. Moreover, most existing formal models that integrate cognitive and affective aspects in mental functioning adopt the BDI (belief-desire-intention) paradigm and/or are based on appraisal theory (e.g., [11]). The proposed model is the first to show the effect of desire on feeling in a formalised computational manner and is based on neurological theories given in the literature, as opposed to the BDI paradigm or appraisal-based theories. An interesting contrasting proposal of representing feelings as resistance to variance is put forward in [17]; this model is, however, not computational. The computational model was specified in the hybrid dynamic modelling language LEADSTO, and simulations were performed in its software environment; cf. [4]. The computational model was analysed through a number of simulations for a variety of different settings and scenarios, and by formal analyses both by mathematical methods and by automated logical verification of dynamic properties on a set of simulation traces. Several expected global properties, such as the occurrence of equilibria and the selection of appropriate actions, were confirmed for the generated traces. Although this is not an exhaustive proof, it is an important indication that the model behaves as expected. Currently the model is generic in the sense that it does not address any specific desire or feeling. It would be interesting future work to parameterise the model to analyse desires relating to different types of feeling. Future work will also focus on a more extensive validation of the model.
It was shown that under normal circumstances the behaviour of the person indeed focuses more and more over time on actions that provide higher levels of desire fulfilment and stronger feelings of satisfaction, thus improving the effectiveness of desire fulfilment. Also less standard circumstances have been analysed: particular cases in
which the fulfilment of the desire and the feeling of satisfaction are out of proportion, as, for example, shown in some types of addictive behaviour. Indeed, such cases are also covered well by the model, as it shows over time a stronger focus on the action for which the satisfaction is unreasonably high, thereby reducing the effectiveness in fulfilling the desire. In [14] it is reported how this model can be used as a basis for an ambient agent performing model-based reasoning and supporting addicted persons in avoiding temptations.
References
[1] Bechara, A., Damasio, A.: The Somatic Marker Hypothesis: a neural theory of economic decision. Games and Economic Behavior 52, 336–372 (2004)
[2] Bosse, T., Both, F., Gerritsen, C., Hoogendoorn, M., Treur, J.: Model-Based Reasoning Methods within an Ambient Intelligent Agent Model. In: Mühlhäuser, M., Ferscha, A., Aitenbichler, E. (eds.) Proceedings of the First International Workshop on Human Aspects in Ambient Intelligence, Constructing Ambient Intelligence: AmI-2007 Workshops Proceedings. Communications in Computer and Information Science (CCIS), vol. 11, pp. 352–370. Springer, Heidelberg (2008)
[3] Bosse, T., Jonker, C.M., Treur, J.: Formalisation of Damasio’s Theory of Emotion, Feeling and Core Consciousness. Consciousness and Cognition 17, 94–113 (2008)
[4] Bosse, T., Jonker, C.M., van der Meij, L., Treur, J.: A Language and Environment for Analysis of Dynamics by Simulation. International Journal of Artificial Intelligence Tools 16, 435–464 (2007)
[5] Bosse, T., Jonker, C.M., van der Meij, L., Sharpanskykh, A., Treur, J.: Specification and Verification of Dynamics in Agent Models. International Journal of Cooperative Information Systems 18, 167–193 (2009)
[6] Damasio, A.: Descartes’ Error: Emotion, Reason and the Human Brain. Papermac (1994)
[7] Damasio, A.: The Feeling of What Happens: Body and Emotion in the Making of Consciousness. Harcourt Brace, New York (1999)
[8] Damasio, A.: Looking for Spinoza. Vintage Books, London (2004)
[9] Eich, E., Kihlstrom, J.F., Bower, G.H., Forgas, J.P., Niedenthal, P.M.: Cognition and Emotion. Oxford University Press, New York (2000)
[10] Gerstner, W., Kistler, W.M.: Mathematical formulations of Hebbian learning. Biol. Cybern. 87, 404–415 (2002)
[11] Gratch, J., Marsella, S.: A domain independent framework for modeling emotion. Journal of Cognitive Systems Research 5, 269–306 (2004)
[12] Hebb, D.: The Organisation of Behavior. Wiley, New York (1949)
[13] Marks, J.: The Ways of Desire: New Essays in Philosophical Psychology on the Concept of Wanting. Transaction Publishers, New Brunswick (1986)
[14] Memon, Z.A., Treur, J.: An Adaptive Integrative Ambient Agent Model to Intervene in the Dynamics of Beliefs and Emotions. In: Catrambone, R., Ohlsson, S. (eds.) Proc. of the 32nd Annual Conference of the Cognitive Science Society, CogSci 2010. Cognitive Science Society, Austin (2010, to appear)
[15] Memon, Z.A., Treur, J.: Modelling the Reciprocal Interaction between Believing and Feeling from a Neurological Perspective. In: Zhong, N., Li, K., Lu, S., Chen, L. (eds.) BI 2009. Lecture Notes in Computer Science (LNAI), vol. 5819, pp. 13–24. Springer, Heidelberg (2009)
[16] Riva, G., Vatalaro, F., Davide, F., Alcañiz, M. (eds.): Ambient Intelligence. IOS Press, Amsterdam (2005)
[17] Rudrauf, D., Damasio, A.: A conjecture regarding the biological mechanism of subjectivity and feeling. Journal of Consciousness Studies 12(8-10), 26–42 (2005)
[18] Solomon, R.C. (ed.): Thinking About Feeling: Contemporary Philosophers on Emotions. Oxford University Press, Oxford (2004)
[19] Winkielman, P., Niedenthal, P.M., Oberman, L.M.: Embodied Perspective on Emotion-Cognition Interactions. In: Pineda, J.A. (ed.) Mirror Neuron Systems: the Role of Mirroring Processes in Social Cognition, pp. 235–257. Humana Press/Springer Science (2009)
Modelling the Emergence of Group Decisions Based on Mirroring and Somatic Marking

Mark Hoogendoorn, Jan Treur, C. Natalie van der Wal, and Arlette van Wissen

Vrije Universiteit Amsterdam, Department of Artificial Intelligence, De Boelelaan 1081, 1081 HV Amsterdam, The Netherlands
{mhoogen,treur,cn.van.der.wal,wissen}@few.vu.nl
http://www.few.vu.nl/~{mhoogen,treur,cn.van.der.wal,wissen}
Abstract. This paper introduces a neurologically inspired computational model for the emergence of group decisions. The model combines an individual decision making model based on Damasio’s Somatic Marker Hypothesis with mutual effects of group members on each other via mirroring of emotions and intentions. The obtained model shows how this combination of assumed neural mechanisms can form an adequate basis for the emergence of common group decisions, while, in addition, there is a feeling of wellness with these common decisions amongst the group members.
1 Introduction
To express the impossibility of a task, sometimes the expression ‘like managing a herd of cats’ is used, for example, in relation to managing a group of researchers. This is meant to indicate that no single direction or decision will come out of such a group, no matter how hard one tries. As an alternative, sometimes a reference is made to ‘riding a garden-cart with frogs’. It seems that such a lack of coherence-directed tendency in a group is considered something exceptional, a kind of surprise, and in a way unfair. However, as each group member is an autonomous agent with his or her own neurological structures, patterns and states, carrying, for example, their own emotions, desires, preferences, and intentions, it would be more reasonable to expect that the surprise concerns the opposite side: how is it possible that so often, groups – even those of researchers – develop coherent directions and decisions, and, moreover, why do the group members in some miraculous manner even seem to feel good about these? This paper presents a neurologically inspired computational modelling approach for the emergence of group decisions. It incorporates the ideas of somatic marking as a basis for individual decision making, see [1], [3], [5], [6], and mirroring of emotions and intentions as a basis for mutual influences between group members, see [7], [11], [12], [14], [15], [16], [18]. The model shows how for many cases the combination of these two neural mechanisms is indeed sufficient to obtain the emergence of common group decisions on the one hand, and, on the other hand, to achieve that the group members have a feeling of wellness with these decisions.
The paper is organised as follows. In Section 2 a brief introduction of the neurological ideas underlying the approach is presented: mirroring and somatic marking. Next, in Section 3 the computational model is described in detail. Section 4 presents a number of simulation results. Section 5 addresses verification of the model against formally specified properties describing expected emerging patterns. Finally, Section 6 is a discussion.
2 Somatic Marking and Mirroring Cognitive states of a person, such as sensory or other representations often induce emotions felt within this person, as described by neurologist Damasio, [4], [5]; for example: ‘Even when we somewhat misuse the notion of feeling – as in “I feel I am right about this” or “I feel I cannot agree with you” – we are referring, at least vaguely, to the feeling that accompanies the idea of believing a certain fact or endorsing a certain view. This is because believing and endorsing cause a certain emotion to happen.’ ([5], p. 93)
Damasio’s Somatic Marker Hypothesis (cf. [1], [3], [5], [6]) is a theory on decision making which provides a central role to emotions felt. Within a given context, each represented decision option induces (via an emotional response) a feeling which is used to mark the option. For example, a strongly negative somatic marker linked to a particular option occurs as a strongly negative feeling for that option. Similarly, a positive somatic marker occurs as a positive feeling for that option. Damasio describes the use of somatic markers in the following way: ‘the somatic marker (..) forces attention on the negative outcome to which a given action may lead, and functions as an automated alarm signal which says: beware of danger ahead if you choose the option which leads to this outcome. The signal may lead you to reject, immediately, the negative course of action and thus make you choose among other alternatives. (…) When a positive somatic marker is juxtaposed instead, it becomes a beacon of incentive.’ ([3], pp. 173-174)
Usually the Somatic Marker Hypothesis is applied to provide endorsements or valuations for options for a person’s actions, thus shaping a decision process. Somatic markers may be innate, but may also be adaptive, related to experiences: ‘Somatic markers are thus acquired through experience, under the control of an internal preference system and under the influence of an external set of circumstances which include not only entities and events with which the organism must interact, but also social conventions and ethical rules.’ ([3], p. 179)
In a social context, the idea of somatic marking can be combined with recent neurological findings on the mirroring function of certain neurons (e.g., [7], [11], [12], [14], [15], [16], [17], [18]). Mirror neurons are neurons which, in the context of the neural circuits in which they are embedded, show both a function to prepare for certain actions or bodily changes and a function to mirror states of other persons. They are active not only when a person intends to perform a specific action or body change, but also when the person observes somebody else intending or performing this action or body change. This includes expressing emotions in body states, such as facial expressions. For example, there is strong evidence that (already from an age of just 1 hour) sensing somebody else’s face expression leads (within about 300 milliseconds) to preparing for and showing the same face expression ([10], pp. 129-130). The idea is
that these neurons and the neural circuits in which they are embedded play an important role in social functioning and in (empathic) understanding of others (e.g., [7], [11], [17], [18]). The discovery of mirror neurons is often considered a crucial step for the further development of the discipline of social cognition, comparable to the role the discovery of DNA has played for biology, as it provides a biological basis for many social phenomena; cf. [11]. Indeed, when states of other persons are mirrored by some of the person’s own states that at the same time are connected via neural circuits to states that are crucial for the own feelings and actions, then this provides an effective basic mechanism for how in a social context persons fundamentally affect each other’s actions and feelings. Given the general principles described above, the mirroring function relates to decision making in two different ways. In the first place, mirroring of emotions indicates how emotions felt in different individuals about a certain considered decision option mutually affect each other, and, assuming a context of somatic marking, in this way affect how individuals valuate decision options in relation to how they feel about them. A second way in which the mirroring function relates to decision making is by applying it to the mirroring of intentions or action tendencies of individuals for the respective decision options. This may work when, by verbal and/or nonverbal behaviour, individuals show to what extent they tend to choose a certain option. For example, in ([9], p.70) action tendencies are described as ‘states of readiness to execute a given kind of action, [which] is defined by its end result aimed at or achieved’. Both of these (emotion and intention) mirroring effects are incorporated in the computational model introduced below.
3 The Computational Model for Group Decision Making In this section, based on the neurological principles of somatic marking and mirroring discussed in the previous section, the computational model for group decision making is introduced. To design such a model a choice has to be made for the grain-size: for example, it has to be decided in which level of detail the internal neurological processes of individuals are described. Such a choice depends on the aim of the model. In this case the aim was more to be able to simulate emerging patterns in groups of individuals, than to obtain a more detailed account of the intermediate neurological patterns and states involved. Therefore the choice was made to abstract to a certain extent from the latter types of intermediate processes. For example, the process of mirroring is described in an abstract manner by a direct causal relation from the emotional state shown by an individual to the emotional state shown by another individual, and the process of somatic marking is described by a direct causal relation from the emotional state shown for a certain option to the intention shown for this option (see Figure 1). These choices provide a model that is easier to handle for larger numbers of individuals. However, the model can easily be refined into a model that also incorporates more detailed intermediate internal processes, for example, based on recursive as-if body loops involving preparation and sensory neuron activations and the states of feeling the emotion, as shown in [13].
[Fig. 1 depicts, for a person A and an option O: the emotion and intention states of the other group members for O, A’s mirroring of emotion and of intention for O, A’s emotion state for O, A’s somatic marking for O, and A’s intention state for O.]
Fig. 1. Abstract causal relations induced by mirroring and somatic marking by person A
First for a given state S of a person (for example, an emotion or an intention) the impact due to the person’s mirroring function is described. This is done by a basic building block called the contagion strength for any particular state S between two individuals within a group. This contagion strength from person B to person A for state S is defined as follows:

γSBA = εSB ⋅ αSBA ⋅ δSA    (1)
Here εSB is the personal characteristic expressiveness of the sender (person B) for S, δSA the personal characteristic openness of the receiver (person A) for S, and αSBA the interaction characteristic channel strength for S from sender B to receiver A. The expressiveness describes the strength of expression of given internal states by verbal and/or nonverbal behaviour (e.g., body states). The openness describes how strongly stimuli from outside are propagated internally. The channel strength depends on the type of connection between the two persons, for example their closeness. To determine the level qSA(t) of an agent A for a specific state S the following model is used. First, the overall contagion strength γSA from the group towards agent A is calculated:

γSA = ∑B≠A γSBA    (2)
This value is used to determine the weighed impact qSA*(t) of all the other agents upon state S of agent A:

qSA*(t) = ∑B≠A γSBA ⋅ qSB(t) / γSA    (3)
How much this external influence actually changes state S of the agent A is determined by two additional personal characteristics of the agent, namely the tendency ηSA to absorb or to amplify the level of a state and the bias βSA towards positive or negative impact for the value of the state. The model to update the value of qSA(t) over time is then expressed as follows:

qSA(t + Δt) = qSA(t) + γSA · [ηSA · [βSA · (1 - (1-qSA*(t)) · (1-qSA(t))) + (1 - βSA) · qSA*(t) · qSA(t)] + (1 - ηSA) · qSA*(t) - qSA(t)] Δt    (4)
Here the new value of the state is the old value, plus the change of the value based on the contagion. This change is defined as the multiplication of the contagion strength times a factor for the amplification of information plus a factor for the absorption of information. The absorption part (after 1 - ηSA) simply considers the difference between the incoming contagion and the current level for S. The amplification part (after ηSA) depends on the tendency or bias of the agent towards a more positive (part of the equation multiplied by βSA) or negative (part of the equation multiplied by 1 - βSA) level for S. Table 1 summarizes the most important parameters and state variables within the model (note that the last two parameters will be explained below).

Table 1. Parameters and state variables
qSA(t)   level for state S of agent A at time t
εSA      extent to which agent A expresses state S
δSA      extent to which agent A is open to state S
ηSA      tendency of agent A to absorb or amplify state S
βSA      positive or negative bias of agent A on state S
αSBA     channel strength for state S from sender B to receiver A
γSBA     contagion strength for S from sender B to receiver A
ωOIA     weight for group intention impact on agent A's intention for O
ωOEA     weight for own emotion impact on agent A's intention for O
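As an illustration of the basic mirroring building blocks, the following small Python sketch (illustrative names and values, not the authors' Matlab implementation) computes the contagion strengths of formula (1) and the overall strength and weighed impact of formulas (2) and (3) for one receiving agent.

```python
def contagion_strength(eps_sender, alpha_channel, delta_receiver):
    # Formula (1): gamma_SBA = eps_SB * alpha_SBA * delta_SA
    return eps_sender * alpha_channel * delta_receiver

def weighed_impact(levels_others, gammas_others):
    # Formulas (2) and (3): overall strength gamma_SA and weighed impact q*_SA
    gamma_total = sum(gammas_others)
    q_star = sum(g * q for g, q in zip(gammas_others, levels_others)) / gamma_total
    return gamma_total, q_star

# Impact of agents B, C and D on agent A for some state S (arbitrary illustrative values)
gammas = [contagion_strength(0.9, 0.8, 0.5), contagion_strength(0.4, 0.3, 0.5),
          contagion_strength(0.6, 0.2, 0.5)]
print(weighed_impact([0.9, 0.2, 0.4], gammas))
```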
The abstract model for mirroring described above applies to both emotion and intention states S for an option O, but does not describe any interplay between them yet. Taking the Somatic Marker Hypothesis on decision making as a point of departure, not only intentions of others, but also one's own emotions affect one's own intentions. To incorporate such an interaction, the basic model is extended as follows: to update qSA(t) for an intention state S relating to an option O, both the intention states of others for O and the qS'A(t) values for the emotion state S' for O are taken into account. These intention and emotion states S and S' for option O are denoted by OI and OE, respectively:

Level of emotion for option O of person A: qOEA(t)
Level of intention indication for O of person A: qOIA(t)
The combination of the own (positive) emotion level and the rest of the group's aggregated intention is made by a weighted average of the two:

qOIA**(t) = (ωOIA/ωOA) qOIA*(t) + (ωOEA/ωOA) qOEA(t)    (5)
γOIA* = ω γOIA

where ωOIA and ωOEA are the weights for the contributions of the group intention impact (by mirroring) and the own emotion impact (by somatic marking) on the intention of A for O, respectively, and ωOA = ωOIA + ωOEA. Then the model for the intention and emotion contagion based on mirroring and somatic marking becomes:

qOEA(t + Δt) = qOEA(t) + γOEA [ηOEA (βOEA (1 - (1-qOEA*(t))(1-qOEA(t))) + (1-βOEA) qOEA*(t) qOEA(t)) + (1 - ηOEA) qOEA*(t) - qOEA(t)] ⋅ Δt    (6)

qOIA(t + Δt) = qOIA(t) + γOIA* [ηOIA (βOIA (1 - (1-qOIA**(t))(1-qOIA(t))) + (1-βOIA) qOIA**(t) qOIA(t)) + (1 - ηOIA) qOIA**(t) - qOIA(t)] ⋅ Δt    (7)
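To illustrate how formulas (5)–(7) work together, the following minimal Python sketch (illustrative parameter values, not the authors' Matlab implementation) performs one combined update step for one agent and one option. The intention contagion strength is passed in directly as gamma_i; treating it as the adjusted strength γOIA* is an assumption of the sketch.

```python
def amplify(q_star, q, eta, beta):
    # Amplification/absorption part shared by formulas (6) and (7)
    amplified = beta * (1 - (1 - q_star) * (1 - q)) + (1 - beta) * q_star * q
    return eta * amplified + (1 - eta) * q_star

def step(q_emotion, q_intention, q_emotion_star, q_intention_star,
         gamma_e, gamma_i, eta, beta, w_i, w_e, dt):
    # Formula (5): combine the group intention impact with the agent's own emotion
    q_int_combined = (w_i * q_intention_star + w_e * q_emotion) / (w_i + w_e)
    # Formulas (6) and (7); gamma_i is used in place of gamma_OIA* (assumption)
    q_e = q_emotion + gamma_e * (amplify(q_emotion_star, q_emotion, eta, beta) - q_emotion) * dt
    q_i = q_intention + gamma_i * (amplify(q_int_combined, q_intention, eta, beta) - q_intention) * dt
    return q_e, q_i

print(step(0.3, 0.2, 0.7, 0.6, gamma_e=0.4, gamma_i=0.4,
           eta=0.5, beta=0.6, w_i=1.0, w_e=0.5, dt=0.5))
```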
4 Simulation Results
The model has been studied in several scenarios in order to examine whether the proposed approach indeed exhibits the patterns that can be expected from the literature. The investigated domain consists of a group of four agents who have to make a choice between four different options: A, B, C or D. The model has been implemented in Matlab by constructing three different scenarios which are characterized by different relationships (i.e., channel strengths) between the agents. The scenarios used involve two more specific types of agents: leaders and followers. Some agents have strong leadership abilities while others play a more timid role within the group. The general characteristics of leaders and followers as they were used in the experiments, which can be manifested differently within all agents, can be found in Table 2.

Table 2. Parameters and state variables for leaders and followers
                    Leader A                        Follower B
emotion level       qOEA high for particular O
intention level     qOIA high for particular O
expressivity        εSA high                        εSB low
channel strength    αSAB high, αSBA low             αSAB high, αSBA low

[Fig. 2 shows three network diagrams of the group, labelled scenario 1, scenario 2 and scenario 3.]
Fig. 2. Scenarios for the presented simulation experiments
The different scenarios are depicted in Figure 2. Scenario 1 consists of a group of agents in which agent1 has strong leadership abilities and high channel strengths with all other agents. His initial levels of emotion and intention for option A are very high. Scenario 2 depicts a situation where there are two agents with leadership abilities in the group, agent1 and agent4. Agent1 has strong channel strength to agent2, while agent4 has a strong connection to agent3. Agent1 has an initial state of high (positive) emotion and intention for option A, while agent4 has strong emotion and intention states for option D. Agent2 and agent3 show no strong intentions and emotions for any of the options in their initial emotion and intention states. In Scenario 3 there are no evident leaders. Instead, all agents have moderate channel strengths with each other. A majority of the agents (agent3 and agent4) prefers option C, i.e., initially they have high intention and emotion states for option C. For both scenarios two variants have been created, one with similar agent characteristics within the group (besides the
difference between leader and follower characteristics), and the second with a greater variety of agent personalities. In this section, only the main results using the greater variety in agent characteristics are shown for the sake of brevity. For the formal verification (Section 6) both have been used. The results of scenario 1 clearly show how one influential leader can influence the emotions and intentions in a group. This is shown in the left graph of Figure 3, where the z-axis shows the value for the respective states, and the x- and y-axes represent time and the various agents. The emotion and intention of the leader (in this case agent1) spread through the network of agents, while the emotions and intentions of other agents hardly spread. Consequently, the emotions and intentions for option A, which is the preferred option of the leader, develop to be high in all agents. As can be seen in the figure, there are small differences between the developments of emotions and intentions of the agents. This is because they have different personality characteristics, which are reflected in the settings for the scenario.¹ Depending on their openness, agents are more or less influenced by the states of others. Those agents with low openness (such as agent4) are hardly influenced by intentions and emotions of others.
Fig. 3. Simulation results for scenario 1 (left) and scenario 2 (right)
In scenario 2 (as shown in the right graph of Figure 3), the leader has somewhat positive emotions about option C as well, which explains the small but increasing spread of emotions (and after a while also intentions) concerning option C through the social network. Even though agent3 and agent2 both have a moderate intention for option B, their only strong channel strength is with each other, causing only some contagion between the two of them. Their intention does not spread because of a low
1 A full description of the characteristics and different parameter settings of the agents can be found in Appendix A: http://www.cs.vu.nl/~wai/Papers/group_decisions_appendix1.pdf
expressive nature and low amplification rate of both agents. The patterns found in the simulation of scenario 2 are similar to the ones of scenario 1, with the addition that both leaders highly dominate the spread of the emotions and intentions. The figure shows that the emotions and intentions of agent2 turn out to depend highly on the emotions and intentions of agent1, whereas the emotions and intentions of agent3 highly depend on those of agent4. As can be seen in the figure, any preferences for option D and C by agent2 and agent3 quickly grow silent.
Fig. 4. Simulation results for scenario 3
Scenario 3 shows how a group converges to the same high emotions and intentions for an option when there is no authority. In general, the graphs show that when there is no clear leadership, the majority determines the option with the highest emotions and intentions in all agents. Option C, initially preferred by agent4 and agent3, eventually is the preferred option for all. However, the emotions and intentions for option A also spread and increase, though to a lesser extent. This is due to the fact that agent1 has strong feelings and intentions for option A and a high amplification level for these states. Furthermore, he has a significant channel strength with agent3, explaining why agent3 shows the strongest increase in emotions and intentions for option A. However, the majority has the most important vote in this scenario. Furthermore, some general statements can be made about the behaviour of the model. In case a leader has high emotions but low intentions for a particular option, both the intentions and emotions of all followers will increase for that option. On the other hand, if a leader has high intentions for a particular option, but not high emotions for that option, this intention will not spread to other agents.
5 Mathematical Analysis of Equilibria

During simulations it turns out that eventually equilibria are reached: all variables approximate values for which no change occurs anymore. Such equilibrium values can also be determined by mathematical analysis of the differential equations for the model:

dqOEA(t)/dt = γOEA [ηOEA (βOEA (1 - (1-qOEA*(t))(1-qOEA(t))) + (1-βOEA) qOEA*(t) qOEA(t)) + (1 - ηOEA) qOEA*(t) - qOEA(t)]    (8)

dqOIA(t)/dt = γOIA [ηOIA (βOIA (1 - (1-qOIA**(t))(1-qOIA(t))) + (1-βOIA) qOIA**(t) qOIA(t)) + (1 - ηOIA) qOIA**(t) - qOIA(t)]    (9)
Putting dqOEA(t)/dt = 0 and dqOIA(t)/dt = 0, and assuming γOEA and γOIA nonzero, provides the following equilibrium equations for each agent A:

ηOEA (βOEA (1 - (1-qOEA*)(1-qOEA)) + (1-βOEA) qOEA* qOEA) + (1 - ηOEA) qOEA* - qOEA = 0    (10)

ηOIA (βOIA (1 - (1-qOIA**)(1-qOIA)) + (1-βOIA) qOIA** qOIA) + (1 - ηOIA) qOIA** - qOIA = 0    (11)
For given values of the parameters ηOEA, βOEA, ηOIA, and βOIA, these equations may be solved analytically or by standard numerical approximation procedures. Moreover, by considering when dqOEA(t)/dt > 0 or dqOEA(t)/dt < 0 one can find out when qOEA(t) is strictly increasing and when strictly decreasing, and similarly for qOIA(t). For example, for equation (11), one of the cases considered is the following.

Case ηOIA = 1 and βOIA = 1
For this case, equation (11) reduces to (1 - (1-qOIA**)(1-qOIA)) - qOIA = 0. This can easily be rewritten via (1-qOIA) - (1-qOIA**)(1-qOIA) = 0 into qOIA**(1-qOIA) = 0. From this, it can be concluded that equilibrium values satisfy qOIA** = 0 or qOIA = 1, that qOIA is never strictly decreasing, and that it is strictly increasing when qOIA** > 0 and qOIA < 1. Now the condition qOIA** = 0 is equivalent to

(ωOIA/ωOA) qOIA* + (ωOEA/ωOA) qOEA = 0 ⇔ qOIA* = 0 if ωOIA > 0 and qOEA = 0 if ωOEA > 0
where qOIA* = 0 is equivalent to ∑B≠A γOIBA ⋅ qOIB / γOIA = 0 ⇔ qOIB = 0 for all B≠A with γOIBA > 0. Assuming both ωOIA and ωOEA nonzero, this results in the following:

equilibrium: qOIA = 1, or qOIA < 1 and qOEA = 0 and qOIB = 0 for all B≠A with γOIBA > 0
strictly increasing: qOIA < 1 and (qOEA > 0 or qOIB > 0 for some B≠A with γOIBA > 0)

For a number of cases such results have been found, as summarised in Table 3. This table considers any agent A in the group. Suppose A is the agent in the group with highest qOEA, i.e., qOEB ≤ qOEA for all B≠A. This implies that qOEA* = ∑B≠A γOEBA ⋅ qOEB / γOEA ≤ ∑B≠A γOEBA ⋅ qOEA / γOEA = qOEA ∑B≠A γOEBA / γOEA = qOEA. So in this case always qOEA* ≤ qOEA. Note that when qOEB < qOEA for some B≠A with γOEBA > 0, then qOEA* = ∑B≠A γOEBA ⋅ qOEB / γOEA < ∑B≠A γOEBA ⋅ qOEA / γOEA = qOEA ∑B≠A γOEBA / γOEA = qOEA. Therefore qOEA* = qOEA implies qOEB = qOEA for all B≠A with γOEBA > 0. Similarly, when A has the lowest qOEA of the group, then always qOEA* ≥ qOEA and again qOEA* = qOEA implies qOEB = qOEA for all B≠A with γOEBA > 0. This implies, for example, for ηOEA = 1 and βOEA = 0.5, assuming nonzero γOEBA, that for each option the members' emotion levels for option O will always converge to one value in the group (everybody will feel the same about option O).
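The dynamics leading to these equilibria can be illustrated by numerically integrating equations (8) and (9). The sketch below is a minimal forward-Euler integration for a single option O, with qOEA*, qOIA* and qOIA** computed as the weighted impacts defined above; the channel strengths, personality parameters and initial states are illustrative stand-ins rather than the scenario settings of Appendix A, and ωOA is assumed to equal ωOIA + ωOEA.

```python
import numpy as np

def simulate(n_agents=4, n_steps=5000, dt=0.01, seed=0):
    """Forward-Euler sketch of equations (8)-(9) for a single option O."""
    rng = np.random.default_rng(seed)
    # Channel strengths gamma_OEBA / gamma_OIBA from sender B (row) to receiver A (column).
    gE = rng.uniform(0.2, 1.0, (n_agents, n_agents)); np.fill_diagonal(gE, 0.0)
    gI = rng.uniform(0.2, 1.0, (n_agents, n_agents)); np.fill_diagonal(gI, 0.0)
    etaE = np.full(n_agents, 0.8); betaE = np.full(n_agents, 0.6)   # illustrative personality settings
    etaI = np.full(n_agents, 0.8); betaI = np.full(n_agents, 0.6)
    wI, wE = 0.5, 0.5                      # omega_OIA, omega_OEA (omega_OA assumed = wI + wE)
    qE = rng.uniform(0.1, 0.5, n_agents)   # q_OEA(0)
    qI = rng.uniform(0.1, 0.5, n_agents)   # q_OIA(0)
    qE[0] = qI[0] = 0.95                   # a leader-like agent with high emotion/intention for O

    gammaE = gE.sum(axis=0)                # gamma_OEA = sum over B of gamma_OEBA
    gammaI = gI.sum(axis=0)
    for _ in range(n_steps):
        qE_star = gE.T @ qE / gammaE                       # q_OEA*  (weighted group emotion)
        qI_star = gI.T @ qI / gammaI                       # q_OIA*  (weighted group intention)
        qI_2star = (wI * qI_star + wE * qE) / (wI + wE)    # q_OIA**
        dE = etaE * (betaE * (1 - (1 - qE_star) * (1 - qE))
                     + (1 - betaE) * qE_star * qE) + (1 - etaE) * qE_star - qE
        dI = etaI * (betaI * (1 - (1 - qI_2star) * (1 - qI))
                     + (1 - betaI) * qI_2star * qI) + (1 - etaI) * qI_2star - qI
        qE = np.clip(qE + dt * gammaE * dE, 0.0, 1.0)      # Euler step of eq. (8)
        qI = np.clip(qI + dt * gammaI * dI, 0.0, 1.0)      # Euler step of eq. (9)
    return qE, qI

print(simulate())   # emotion and intention values settle towards an equilibrium
```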
Table 3. Equilibria cases for an agent A with both ωOEA > 0, ωOIA > 0, and γOEBA > 0 for all B

Intention cases (columns):
I1: ηOIA = 1, βOIA = 1, branch qOIA = 1
I2: ηOIA = 1, βOIA = 1, branch qOIA < 1, qOEA = 0, qOIB = 0 for all B ≠ A
I3: ηOIA = 1, βOIA = 0.5 (qOIA** = qOIA)
I4: ηOIA = 1, βOIA = 0, branch qOIA = 0
I5: ηOIA = 1, βOIA = 0, branch qOIA > 0, qOEA = 1, qOIB = 1 for all B ≠ A

Emotion cases (rows):
E1: ηOEA = 1, βOEA = 1, branch qOEA = 1
E2: ηOEA = 1, βOEA = 1, branch qOEA < 1, qOEB = 0 for all B ≠ A
E3: ηOEA = 1, βOEA = 0.5 (qOEA* = qOEA)
E4: ηOEA = 1, βOEA = 0, branch qOEA = 0
E5: ηOEA = 1, βOEA = 0, branch qOEA > 0, qOEB = 1 for all B ≠ A

Joint equilibria per combination:
E1 × I1: qOEA = 1, qOIA = 1
E1 × I2: none
E1 × I3: qOEA = 1, qOIA** = qOIA
E1 × I4: qOEA = 1, qOIA = 0
E1 × I5: qOEA = 1, qOIA > 0, qOIB = 1 for all B ≠ A
E2 × I1: qOEA < 1, qOIA = 1, qOEB = 0 for all B ≠ A
E2 × I2: qOEC = 0 for all C, qOIA < 1, qOIB = 0 for all B ≠ A
E2 × I3: qOEA < 1, qOIA** = qOIA, qOEB = 0 for all B ≠ A
E2 × I4: qOEA < 1, qOIA = 0, qOEB = 0 for all B ≠ A
E2 × I5: none
E3 × I1: qOEA* = qOEA, qOIA = 1
E3 × I2: qOEC = 0 for all C, qOIA < 1, qOIB = 0 for all B ≠ A
E3 × I3: qOEA* = qOEA, qOIA** = qOIA
E3 × I4: qOEA* = qOEA, qOIA = 0
E3 × I5: qOEC = 1 for all C, qOIA > 0, qOIB = 1 for all B ≠ A
E4 × I1: qOEA = 0, qOIA = 1
E4 × I2: qOEA = 0, qOIA < 1, qOIB = 0 for all B ≠ A
E4 × I3: qOEA = 0, qOIA** = qOIA
E4 × I4: qOEA = 0, qOIA = 0
E4 × I5: none
E5 × I1: qOEA > 0, qOIA = 1, qOEB = 1 for all B ≠ A
E5 × I2: none
E5 × I3: qOEA > 0, qOIA** = qOIA, qOEB = 1 for all B ≠ A
E5 × I4: qOEA > 0, qOIA = 0, qOEB = 1 for all B ≠ A
E5 × I5: qOIA > 0, qOEC = 1 for all C, qOIB = 1 for all B ≠ A
6 Verifying Properties Specifying Emerging Patterns

This section addresses the analysis of the group decision making model by specification and verification of properties expressing dynamic patterns that emerge. The purpose of this type of verification is to check whether the model behaves as it should, by automatically verifying such properties against the simulation traces for the various scenarios. In this way the modeller can easily detect inappropriate behaviours and locate sources of errors in the model. A typical example of a property that may be checked is whether no unexpected situations occur, such as a variable running out of its bounds (e.g., qA(t) > 1, for some time point t and agent A), or whether eventually an equilibrium value is reached, but also more detailed expected properties of the model such as compliance with the theories found in the literature. A number of dynamic properties have been identified, formalized in the Temporal Trace Language (TTL), cf. [2], and automatically checked. The TTL software environment includes a dedicated editor supporting the specification of dynamic properties, yielding formally represented formulae in the temporal predicate logic language TTL. In addition, an automated checker is included that takes such a formula and a set of traces as input, and verifies automatically whether the formula holds for the traces. The language TTL is built on atoms referring to states of the world, time points and traces, i.e.
trajectories of states over time. In addition, dynamic properties are temporal predicate logic statements that can be formulated with respect to traces based on a state ontology. Below, a number of the dynamic properties that were identified for the group decision making model are introduced, both in semi-formal and in informal notation (where state(γ, t) |= p denotes that p holds in trace γ at time t). The first property counts the number of subgroups that are present. Here, a subgroup is defined as a group of agents having the same highest intention. Each agent has 4 intention values (namely one for each of the four options that exist), therefore the number of subgroups that can emerge is always 1, 2, 3 or 4.

P1 – number of subgroups
The number of subgroups in a trace γ is the number of options for which there exists at least one agent that has an intention for this option as its highest valued intention.
P1_number_of_subgroups(γ:TRACE) ≡ sum(I:INTENTION, case(highest_intention(γ, I), 1, 0))
where highest_intention(γ:TRACE, I:INTENTION) ≡ ∃A:AGENT [∀R1:REAL state(γ, te) |= has_value(A, I, R1) ⇒ ∀I2:INTENTION≠I, ∀R2:REAL [state(γ, te) |= has_value(A, I2, R2) ⇒ R2 < R1]]
In this property, the expression case(p, 1, 0) in TTL functions such that if property p holds it is evaluated to the second argument (1 in this example), and to the third argument (0 in this example) if the property does not hold. The sum operator simply adds these over the number of elements in the sort over which the sum is calculated (the intentions in this case). Furthermore, when tb or te are used in a property, they denote the begin or end time of the simulation, whereby at te an equilibrium is often reached. Property P1 can be used to count the number of subgroups that emerge. A subgroup is defined as a group of agents that each have the same intention as their intention with highest value. This property was checked on multiple traces that each belong to one of the three scenarios discussed in the simulation results section. For the traces of both variants of scenario 1 a single subgroup was found, for scenario 2 two subgroups were found, and for scenario 3 a single subgroup was found, which is precisely according to the expectations. The second property counts the number of agents in each of the subgroups, using a similar construct.

P2 – subgroup size
The number of agents in a subgroup for intention I is the number of agents that have this intention as their highest intention.
P2_subgroup_size(γ:TRACE, I:INTENTION) ≡ sum(A:AGENT, case(highest_intention_for(γ, I, A), 1, 0))
where highest_intention_for(γ:TRACE, I:INTENTION, A:AGENT) ≡ ∀R1:REAL [state(γ, te) |= has_level(A, I, R1) ⇒ ∀I2:OPTION≠I, ∀R2:REAL [state(γ, te) |= has_level(A, I2, R2) ⇒ R2 < R1]]
In the traces for scenario 1 the size of the single subgroup that occurred was 4 agents. For scenario 2 two subgroups of 2 agents each were found. Finally, in scenario 3 only a single subgroup combining 4 agents was found. These findings are correct; they indeed correspond to the simulation results.
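Outside the TTL environment, the counts of P1 and P2 can be reproduced directly on the end-time intention values of a trace. The sketch below is a minimal Python rendering of the two counts; the agent-by-option array is an illustrative stand-in in the spirit of scenario 1, not actual trace data.

```python
import numpy as np

# q_te[a, o]: agent a's intention value for option o at the end time te (illustrative).
q_te = np.array([[0.90, 0.10, 0.12, 0.08],
                 [0.85, 0.20, 0.10, 0.05],
                 [0.80, 0.15, 0.10, 0.10],
                 [0.75, 0.10, 0.05, 0.20]])

top = q_te.argmax(axis=1)                            # each agent's highest-valued intention
options, sizes = np.unique(top, return_counts=True)

print(len(options))                                  # P1: number of subgroups -> 1
print(dict(zip(options.tolist(), sizes.tolist())))   # P2: subgroup sizes -> {0: 4} (option index 0 = A)
```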
The final property, P3, expresses that an agent is a leader in case its intention values have changed the least over the whole simulation trace, as seen from its initial intention values and compared to the other agents (thereby assuming that these agents moved towards the intention of the leader that managed to convince them of this intention).

P3 – leader
An agent is considered a leader in a trace if the number of intentions for which it has the lowest change is at least as high as for all other agents.
P3_leader(γ:TRACE, A:AGENT) ≡ ∀A2:AGENT≠A sum(I:INTENTION, case(leader_for_intention(γ, A, I), 1, 0)) ≥ sum(I:INTENTION, case(leader_for_intention(γ, A2, I), 1, 0))
where leader_for_intention(γ:TRACE, A:AGENT, I:INTENTION) ≡ ∀R1, R2:REAL [ [state(γ, tb) |= has_value(A, I, R1) & state(γ, te) |= has_value(A, I, R2)] ⇒ ∀R3, R4:REAL, ∀A2:AGENT≠A [state(γ, tb) |= has_value(A2, I, R4) & state(γ, te) |= has_value(A2, I, R3) ⇒ |R2-R1| < |R3-R4|]]
Using this definition, only agent 1 qualifies as a leader in scenario 1. For scenario 2 only agent 4 is a leader. Finally, in scenario 3 both agent 1 and agent 3 are found to be leaders, as they have an equal number of intentions for which they change the least.
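Property P3 can likewise be approximated on the begin- and end-time intention values of a trace: per option, find the agent that changed least and count per agent. The sketch below does this in Python; the arrays are illustrative (again in the spirit of scenario 1), and ties are resolved jointly by argmin, whereas the TTL definition of leader_for_intention uses a strict inequality.

```python
import numpy as np

def p3_leaders(q_tb, q_te):
    """q_tb[a, o], q_te[a, o]: intention values at begin and end time.
    Returns the agents whose count of least-changed intentions is maximal."""
    change = np.abs(q_te - q_tb)                 # |R2 - R1| per agent and option
    least = change.argmin(axis=0)                # agent with the lowest change per option
    counts = np.bincount(least, minlength=q_tb.shape[0])
    return np.flatnonzero(counts == counts.max())

# Illustrative begin/end values: agent 1 (index 0) barely changes, the others
# move towards its preferred option.
q_tb = np.array([[0.90, 0.10, 0.12, 0.08],
                 [0.20, 0.60, 0.10, 0.05],
                 [0.10, 0.50, 0.20, 0.10],
                 [0.30, 0.10, 0.05, 0.40]])
q_te = np.array([[0.90, 0.10, 0.12, 0.08],
                 [0.85, 0.20, 0.10, 0.05],
                 [0.80, 0.15, 0.10, 0.10],
                 [0.75, 0.10, 0.05, 0.20]])
print(p3_leaders(q_tb, q_te))                    # -> [0]: only agent 1 qualifies as leader
```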
7 Discussion

In this paper, an approach has been presented to model the emergence of group decisions. The current model has been based on the neurological concept of mirroring (see e.g. [12], [18]) in combination with the Somatic Marker Hypothesis of Damasio (cf. [1], [3], [5], [6]). An existing model of emotion contagion (cf. [8]) was taken as inspiration, and has been generalised to contagion of both emotions and intentions, and extended with interaction between the two, in the form of influences of emotions upon intentions. Several scenarios have been simulated by the model to investigate the emerging patterns, and also to look at leadership of agents within groups. The results of these simulation experiments show patterns as desired and expected. In order to make this claim more solid, both a mathematical analysis and a formal verification of the simulation traces have been performed, showing that the model indeed behaves properly. For future work, an interesting element would be to scale up the simulations and investigate the behaviour of agents in larger scale simulations. Furthermore, modelling a more detailed neurological model is also part of future work, thereby defining an abstraction relation mapping between this detailed level model and the current model.

Acknowledgements. This research has partly been conducted as part of the FP7 ICT Future Enabling Technologies program of the European Commission under grant agreement No 231288 (SOCIONICAL).
References
1. Bechara, A., Damasio, A.: The Somatic Marker Hypothesis: a neural theory of economic decision. Games and Economic Behavior 52, 336–372 (2004)
2. Bosse, T., Jonker, C.M., van der Meij, L., Sharpanskykh, A., Treur, J.: Specification and Verification of Dynamics in Agent Models. International Journal of Cooperative Information Systems 18, 167–193 (2009)
3. Damasio, A.: Descartes' Error: Emotion, Reason and the Human Brain. Papermac, London (1994)
4. Damasio, A.: The Feeling of What Happens: Body and Emotion in the Making of Consciousness. Harcourt Brace, New York (1999)
5. Damasio, A.: Looking for Spinoza. Vintage Books, London (2003)
6. Damasio, A.: The Somatic Marker Hypothesis and the Possible Functions of the Prefrontal Cortex. Philosophical Transactions of the Royal Society: Biological Sciences 351, 1413–1420 (1996)
7. Damasio, A., Meyer, K.: Behind the looking-glass. Nature 454, 167–168 (2008)
8. Duell, R., Memon, Z.A., Treur, J., van der Wal, C.N.: An Ambient Agent Model for Group Emotion Support. In: Cohn, J., Nijholt, A., Pantic, M. (eds.) Proceedings of the Third International Conference on Affective Computing and Intelligent Interaction, ACII 2009, pp. 550–557. IEEE Computer Society Press, Los Alamitos (2009)
9. Frijda, N.H.: The Emotions. Studies in Emotion and Social Interaction. Cambridge University Press, Cambridge (1987)
10. Goldman, A.I.: Simulating Minds: The Philosophy, Psychology, and Neuroscience of Mindreading. Oxford Univ. Press, New York (2006)
11. Iacoboni, M.: Mirroring People. Farrar, Straus & Giroux, New York (2008)
12. Iacoboni, M.: Understanding others: imitation, language, empathy. In: Hurley, S., Chater, N. (eds.) Perspectives on imitation: from cognitive neuroscience to social science, vol. 1, pp. 77–100. MIT Press, Cambridge (2005)
13. Memon, Z.A., Treur, J.: Modelling the Reciprocal Interaction between Believing and Feeling from a Neurological Perspective. In: Zhong, N., Li, K., Lu, S., Chen, L. (eds.) BI 2009. LNCS (LNAI), vol. 5819, pp. 13–24. Springer, Heidelberg (2009)
14. Rizzolatti, G.: The mirror-neuron system and imitation. In: Hurley, S., Chater, N. (eds.) Perspectives on imitation: from cognitive neuroscience to social science, vol. 1, pp. 55–76. MIT Press, Cambridge (2005)
15. Rizzolatti, G., Craighero, L.: The mirror-neuron system. Annu. Rev. Neurosci. 27, 169–192 (2004)
16. Rizzolatti, G., Fogassi, L., Gallese, V.: Neuro-physiological mechanisms underlying the understanding and imitation of action. Nature Rev. Neurosci. 2, 661–670 (2001)
17. Rizzolatti, G., Sinigaglia, C.: Mirrors in the Brain: How Our Minds Share Actions and Emotions. Oxford University Press, Oxford (2008)
18. Pineda, J.A. (ed.): Mirror Neuron Systems: the Role of Mirroring Processes in Social Cognition. Humana Press Inc., Totowa (2009)
Rank-Score Characteristics (RSC) Function and Cognitive Diversity
D. Frank Hsu1, Bruce S. Kristal2, and Christina Schweikert1
1 Department of Computer and Information Science, Fordham University, New York, NY 10023, USA
2 Department of Neurosurgery, Brigham and Women's Hospital, Boston, MA 02115, USA and Department of Surgery, Harvard Medical School, Boston, MA 02115, USA
Abstract. In Combinatorial Fusion Analysis (CFA), a set of multiple scoring systems is used to facilitate integration and fusion of data, features, and/or decisions so as to improve the quality of resultant decisions and actions. Specifically, in a recently developed information fusion method, each system consists of a score function, a rank function, and a Rank-Score Characteristic (RSC) function. The RSC function illustrates the scoring (or ranking) behavior of the system. In this report, we show that RSC functions can be computed easily and RSC functions can be used to measure cognitive diversity for two or more scoring systems. In addition, we show that measuring diversity using the RSC function is inherently distinct from the concept of correlation in statistics and can be used to improve fusion results in classification and decision making. Among a set of domain applications, we discuss information retrieval, virtual screening, and target tracking.
1 Introduction

In the second half of the last century, and now as we enter the second decade of the twenty-first century, information and scientific revolutions have taken place and new progress is being made. The emerging digital and genomic landscapes have shaped our life, community, culture, society, and the world.

1.1 The Digital Landscape and the Genomic Landscape

The number of information providers and users has increased tremendously over the last 2-3 decades, and now includes a large percentage of the population of the developed world. The nature of information content has also changed drastically from text to a mix of text, speech, still and video images, to histories of interactions with colleagues, friends, information sources and their automated proxies. Raw data sources of interest also now include tracks of sensor readings from GPS devices, medical devices, and possibly other embedded sensors and robots in our environment [23]. Communication conduits have included twisted pairs, coaxial cables, optical fibers, wireline, wireless, satellite, and the Internet. More recently, the list extends to include radio, iPod, iPhone, Blackberry, laptop, notebook, desktop, and iPad. As such, a pipeline has been formed [7]:
Data ---> Information ---> Knowledge

Medicine is beginning to make strides using genomic information and biomarkers to study, diagnose, and treat diseases and disorders -- heralding the beginning of the era of personalized medicine. Moreover, a renewed emphasis on translational science (from bench to bed side) has similarly begun to enhance the diagnostics and screening of diseases and disorders and to improve the process of (and way for) treatment (and therapy). More recently, molecular networks, which connect molecular biology to clinical medicine, have become a major focus for translational science [24].

1.2 The Fourth Paradigm and the "CompXInfor" Evolution

Jim Gray, in his presentation to the Computer Science and Telecommunications Board, proposed what he considered a new paradigm for scientific discovery that he called the Fourth Paradigm [10]. He argued that, as early as a thousand years ago, science could be described as "empirical," describing natural phenomena. Then, in the last few hundred years, "theoretical" branches of science used models, methods, and generalizations. In the last few decades, scientists have increasingly used "computational" models and simulations as an adjunct to study complex phenomena. Indeed, one branch of contemporary scientific approaches utilizes "data exploration" (what he called e-science) to attempt to synergistically probe and/or unify experiment, theory, and simulation. Similarly, experiments today increasingly involve megavariate datasets captured by instruments or generated by simulators and processed by software. Information and knowledge are stored in computers or data centers as databases. These information sources and databases are analyzed using statistical and computational tools and techniques. A point raised by Jim Gray in the above exposition [10] is that one of the central problems in scientific discovery is how to codify and represent knowledge in a given discipline X. Several generic problems include: data ingest, managing large datasets, identifying and enforcing common schema, how to organize and reorganize these data and their associated analyses, building and executing models, documenting experiments, curation, long-term preservation, interpretation of information, and transformation of information to knowledge. All these issues require computational and informatics tools and techniques. Hence "CompXInfor" is born, which means computational-X and X-informatics for a given discipline X. One example is computational biology and bioinformatics. Another is computational neuroscience and neuroinformatics. The name of this conference is related to computational brain and brain informatics.

1.3 Informatics and Information Fusion

The word "Informatics" has been used very often in several different contexts and disciplines. Webster's Dictionary (10th Edition) describes it as "Information science", which is stated as "the collection, classification, storage, retrieval, and dissemination of recorded knowledge treated both as a pure and as an applied science." In an attempt to place the framework and issues in proper perspective, we suggest the following: "Informatics is the science that studies and investigates the acquisition, representation, processing, interpretation, and transformation of information in, for, and by living organisms, neuronal systems, interconnection networks, and other complex systems."
Informatics as an emerging scientific discipline consisting of methods, processes, and applications is the crucial link between domain data and domain knowledge (see Figure 1 and Figure 2).
Fig. 1. Scope and Scale of Informatics
Fig. 2. The Autopoiesis of Informatics
Information fusion is the “integration” or “combination” of information (or data) from multiple sensors, sources, features, classifiers, and decisions in order to improve the quality of situation analysis, ensembled decisions, and action outcomes (see [1, 6, 14, 16, and 26]). Information fusion is a crucial and integral part of the process and function of informatics. Combinatorial Fusion Analysis (CFA), a recently developed information fusion method, uses multiple scoring systems to facilitate fusion of data, features, and decisions [14]. Figure 3 depicts the CFA architecture and its informatics flow.
Fig. 3. The CFA Architecture
2 Combinatorial Fusion Analysis (CFA): An Emerging Fusion Method

As stated in Section 1.2 and depicted in Figure 1 and Figure 3, the representation of data and the process of informatics takes a sequence of formats: Data ---> Features ---> Decision. The following remark might help clarify the complexity of the informatic process and justify the need for information fusion [11, 12, 14].

Remark 2.1: (a) Real world applications today typically involve data sets collected from different devices/sources or generated from different information sources/experiments. Different features/attributes/indicators/cues frequently use different, often non-interconvertible kinds of measurements or parameters, and different decisions/methods may be appropriate for different feature sets, different data sets, and different temporal traces. (b) Different methods/systems for decision and action may be ensembled to address potential solutions to the same problem with the same (or different) data and feature sets.

2.1 Multiple Scoring Systems (MSS)

Let D be a set of, for example, documents, genes, molecules, or classes with |D| = n. Let N = [1, n] be the set of integers from 1 to n and R be the set of real numbers. We have the following:

Remark 2.2: In a set of p scoring systems A1, A2, …, Ap on D, each scoring system A consists of a score function sA, a rank function rA derived by sorting the score function sA, and a Rank-Score Characteristic (RSC) function fA defined as fA: N→R, as in Figure 4:
Fig. 4. Rank-Score Characteristic (RSC) Function
In a set of p scoring systems A1, A2, …, Ap, there are many, indeed essentially an infinite number of different ways to combine these scoring systems into a single system A* (e.g. see [14] and [29]). Hsu and Taksa studied comparisons between score combination and rank combination [13], which are defined as follows:
Remark 2.3: Let A1, A2, …, Ap be p scoring systems. Let Cs(∑Ai) = E and Cr(∑Ai) = F be the score combination and rank combination defined by sE(d) = (1/p) ∑ sAi(d) and sF(d) = (1/p) ∑ rAi(d), where rE and rF are derived by sorting sE and sF in decreasing order and increasing order, respectively.

Depending on application domains, performances can be evaluated using different measurements such as true/false positives and true/false negatives, precision and recall, goodness of hit, specificity and sensitivity, etc. Once one or more performance measurements have been agreed upon, the following remark states two of the most fundamental problems in information fusion. For simplicity, we only describe the case where p = 2 and a single performance metric is used.

Remark 2.4: Let A and B be two scoring systems on the domain set D. Let E = Cs(A,B) and F = Cr(A,B) be a score combination and a rank combination of A and B. Let P be a performance measurement. (a) When is P(E) or P(F) greater than or equal to max{P(A), P(B)}? (b) When is P(F) greater than or equal to P(E)?

2.2 How to Compute the RSC Function fA?

For a scoring system A with score function sA, as stated in Remark 2.2 and shown in Figure 4, its rank function rA can be derived by sorting the score values in decreasing order and assigning a rank value to replace each score value. The diagram in Figure 4 shows that, mathematically, fA(i) = (sA ∘ rA⁻¹)(i) = sA(rA⁻¹(i)) for i in N = [1, n]. Computationally, we can derive fA simply by sorting the score values using the rank values as the keys. The example in Figure 5 illustrates an RSC function on D = {d1, d2, …, d12} using this computational approach, which is short and easy.
D:                    d1   d2   d3   d4   d5   d6   d7   d8   d9   d10  d11  d12
Score function s:D→R: 3    8.5  8    4.5  4    10   9.5  3.5  1    2    5    5.5
Rank function r:D→N:  10   3    4    7    8    1    2    9    12   11   6    5
RSC function f:N→R:   f(1)=10, f(2)=9.5, f(3)=8.5, f(4)=8, f(5)=5.5, f(6)=5, f(7)=4.5, f(8)=4, f(9)=3.5, f(10)=3, f(11)=2, f(12)=1
Fig. 5. Computational Derivation of RSC Function
(a) Score Function and Rank Function
di:    d1   d2   d3   d4   d5   d6   d7   d8   d9   d10  d11  d12
s(di): 3    8.5  8    4.5  4    10   9.5  3.5  1    2    5    5.5
r(di): 10   3    4    7    8    1    2    9    12   11   6    5
(b) epdf = empirical probability distribution function
(c) cumulative epdf = cepdf ---> epcdf ---> cdf = cumulative distribution function
(d) i-cdf = inverse-cdf
(e) reversed inverse-cdf
Figure 6(c) → Figure 6(d): interchange the x, y coordinates, x ↔ y
Figure 6(d) → Figure 6(e): re-label the x-coordinates 12x → x and transform f(i) → f(13-i)
(f) Resulting RSC function f = s ∘ r⁻¹:
i (in N):    1    2    3    4    5    6    7    8    9    10   11   12
f(i) (in R): 10   9.5  8.5  8    5.5  5    4.5  4    3.5  3    2    1
Fig. 6. Statistical Derivation of RSC Function
However, Figures 6(a)-(e) depict a statistical approach to derive the same RSC function, which is much longer and more complicated. For the sake of contrast and comparison, we use the same data set D = {d1, d2, …, d12}. The function in Figure 6(d) is the inverse of that in Figure 6(c). The function in Figure 6(e) is derived from Figure 6(d) by re-labeling the x-coordinates 12x → x and applying the transformation f(i) → f(13-i). According to this statistical approach, the function in Figure 6(e), as derived from Figure 6(a), would have been called a "reversed inverse empirical probability cumulated distribution function."
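The computational derivation of Figure 5 amounts to two sorts. The sketch below reproduces the Figure 5 example with the numbers given there, assuming there are no tied scores.

```python
import numpy as np

# Figure 5 example: scores of the twelve items d1..d12.
scores = np.array([3, 8.5, 8, 4.5, 4, 10, 9.5, 3.5, 1, 2, 5, 5.5])

# Rank function r_A: rank 1 for the highest score (no ties in this example).
order = np.argsort(-scores)                  # item indices sorted by decreasing score
ranks = np.empty_like(order)
ranks[order] = np.arange(1, len(scores) + 1)
print(ranks)        # -> [10  3  4  7  8  1  2  9 12 11  6  5], as in Figure 5

# RSC function f_A(i) = s_A(r_A^{-1}(i)): the score of the item ranked i,
# i.e. the scores sorted in decreasing order.
rsc = scores[order]
print(rsc)          # -> [10.  9.5  8.5  8.  5.5  5.  4.5  4.  3.5  3.  2.  1.]
```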
3 Diversity vs. Correlation

3.1 RSC Function for Computing Cognitive Diversity

Let D be a set of twenty students, and consider the example of three professors A, B, C assigning scores to this class at the end of a semester. Figure 7 illustrates three potential RSC functions fA, fB, and fC, respectively. In this case, each RSC function illustrates the scoring (or ranking) behavior of the scoring system, which is one of the professors. The example shows that Professor A has a very evenly distributed scoring practice, while Professor B gives fewer students high scores and Professor C gives more students high scores.
Fig. 7. Three RSC functions fA, fB, and fC
This example thus highlights a use of multiple scoring systems, as we could use a multiple scoring system to assess how good a given student was in the combined views of the three professors. Specifically, in a multiple scoring system, suppose we have two systems A and B. The concept of diversity d(A,B) can be defined as follows (see [14]).

Remark 3.1: For scoring systems A and B, the diversity d(A,B) between A and B can be defined as: (a) d(A,B) = 1 - d(sA,sB), where d(sA,sB) is the correlation (e.g. Pearson's z correlation) between score functions sA and sB; (b) d(A,B) = 1 - d(rA,rB), where d(rA,rB) is the rank correlation (e.g. Kendall's τ or Spearman's ρ) between rank functions rA and rB; and (c) d(A,B) = d(fA, fB), the diversity between RSC functions fA and fB.
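A small sketch of the three options in Remark 3.1 is given below. The RSC-based diversity d(fA, fB) in part (c) is instantiated here as the mean absolute difference between the two RSC curves, a discrete, normalized version of the area between them (Section 4.1 mentions the area as one possible measure); the two scoring systems and their values are purely illustrative.

```python
import numpy as np
from scipy.stats import pearsonr, kendalltau

def rsc(scores):
    """RSC function as a vector: the scores sorted in decreasing order."""
    return np.sort(scores)[::-1]

def diversities(sA, sB):
    sA, sB = np.asarray(sA, float), np.asarray(sB, float)
    rA = (-sA).argsort().argsort() + 1          # rank functions (rank 1 = best, no ties assumed)
    rB = (-sB).argsort().argsort() + 1
    d_score = 1 - pearsonr(sA, sB)[0]           # Remark 3.1(a): 1 - score correlation
    d_rank = 1 - kendalltau(rA, rB)[0]          # Remark 3.1(b): 1 - rank correlation
    # Remark 3.1(c): diversity of the RSC functions, here the normalized
    # area between the two RSC curves -- one possible instantiation.
    d_rsc = np.abs(rsc(sA) - rsc(sB)).mean()
    return d_score, d_rank, d_rsc

# Two illustrative scoring systems over the same ten items:
A = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]    # evenly spread scores
B = [0.95, 0.9, 0.85, 0.2, 0.15, 0.1, 0.09, 0.08, 0.07, 0.06]  # high scores concentrated on few items
print(diversities(A, B))   # ranks are identical (d_rank = 0) yet the RSC curves differ
```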
Correlation is a central concept in statistics. Correlation has been shown to be very useful in many application domains which use statistical methods and tools. However, it is always a challenge to process, predict, and interpret correlations in a complex system or dynamic environment. More recently, for example, Engle discussed the challenge of forecasting dynamic correlations which play an essential role in risk management, portfolio management, and other financial activities [8]. Diversity, on the other hand, is a crucial concept in informatics. In machine learning, data mining, and information fusion, it has been shown that when combining multiple classifier systems, multiple neural nets, and multiple scoring systems, higher diversity is a necessary condition for improvement [2, 14, 16, 26, 28]. Figure 8 shows some characteristic differences between correlation and diversity.
                        Correlation / Similarity    Diversity / Heterogeneity
Likely Target           Object                      Subject
Domain Rules            Syntactic                   Semantic
Reasoning / Method      Statistics                  Informatics
Opposite Concept        Difference                  Homogeneity
Measurement / Judgment  Data                        Decision
Fusion Level            Data                        Feature / Decision

Fig. 8. Correlation/Similarity vs. Diversity/Heterogeneity
3.2 The Role of Diversity in Information Fusion

In regression, multiple classifier systems, multiple artificial neural nets, and multiple scoring systems, it has been shown that combination, fusion, or ensembles of these systems can outperform single systems. A variety of diversity measures and fusion methods have been proposed and studied [2, 14, 16, 26, 28]. However, it remains a challenging problem to predict and assess the fused system performance in terms of the performance of and the diversity among individual systems. In regression, Krogh and Vedelsby [15] established the following elegant and practical relationship: E = Ē - Ā, where E is the quadratic error of the combined estimate, Ē is the average quadratic error of the individual estimates, and Ā is the variance among the individual estimates. In classification, a relationship of this kind is not as clear. In classifier ensembles, using majority voting and entropy diversity, Chung, Hsu and Tang [4] showed that

max{P̄ - D̄, p(P̄ + D̄) + 1 - p} ≤ Pm ≤ min{P̄ + D̄, p(P̄ - D̄)},

where Pm is the performance of the ensemble of p classifiers using majority voting, P̄ is the average performance of the p classifiers, and D̄ is the average entropy diversity among the p individual classifiers. These upper and lower bounds were shown to be tight using the concept of a performance distribution pattern (PDP) for the input set. More recently, tight bounds of Pm in terms of P̄ and Dis (the pairwise disagreement measure), and similar results in terms of P̄ and D̄ for tight bounds of Ppl using plurality voting, have been established [3, 5].
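For the regression case, the Krogh-Vedelsby relationship can be checked directly on a toy ensemble. In the sketch below, the target and the individual estimates at a single input point are arbitrary numbers chosen only to illustrate that E = Ē - Ā holds exactly.

```python
import numpy as np

# Toy check of the Krogh-Vedelsby decomposition E = E_bar - A_bar at one input point.
y = 2.0                                     # target value (arbitrary)
f = np.array([1.5, 2.4, 2.9, 1.8])          # individual ensemble estimates (arbitrary)
f_ens = f.mean()                            # combined (averaged) estimate

E = (f_ens - y) ** 2                        # quadratic error of the combination
E_bar = ((f - y) ** 2).mean()               # average quadratic error of the members
A_bar = ((f - f_ens) ** 2).mean()           # variance of the members around their mean

print(E, E_bar - A_bar)                     # both ~0.0225: the two sides agree
```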
In multiple scoring systems, several results have been obtained that demonstrate that combining multiple scoring systems can improve the result only if (a) each of the individual scoring systems has relatively high performance and (b) the individual scoring systems are diverse [13, 14, 19, 22, 29]. A closed formula or bounds for the combined system are yet to be found. In the next section, we discuss several examples in different application domains with the two necessary conditions (a) and (b) for the combination to be positive, i.e. for its performance to be better than that of each individual system.
4 Examples of Domain Applications

In this section, we show examples of domain applications in information retrieval, virtual screening, and target tracking where the RSC function is used to define cognitive diversity [13, 19, 29]. Applications in other domains include bioinformatics, text mining, protein structure prediction, portfolio management, and online learning [17, 18, 20, 21, 25, 27].

4.1 Comparing Rank and Score Combination Methods

Let A and B be two scoring systems on a set of five hundred documents in a retrieval system. Let fA and fB be the RSC functions of A and B, respectively, as defined in Remark 2.2. Let E and F be two new scoring systems related to score combination and rank combination, respectively, as defined in Remark 2.3. Using the symmetric group S500 as the sample space for rank functions with respect to five hundred documents, Hsu and Taksa [13] showed the following:

Remark 4.1: Under certain conditions, such as the greatest value of the diversity d(fA, fB), the performance of rank combination is better than that of score combination, P(F) ≥ P(E), under performance evaluation by both precision and average precision.

Figure 9 gives an illustration of two sets of RSC functions: one with ten documents and scores ranging from 1 to 10 and the other with five hundred documents and scores ranging from 1 to 100. These two examples indicate that as long as the diversity
Fig. 9. Two RSC functions with (a) n=10 and s=10 and (b) n=500 and s=100
between two RSC functions d(fA, fB) is large (one measure is the area between these two functions fA and fB), the rank combination of A and B, Cr(A, B) = F, has better precision than the score combination of A and B, Cs(A, B) = E [13].

4.2 Improving Enrichment in Virtual Screening

Virtual screening of molecular compound libraries has, in the past decades, been shown to be a distinct, useful and potentially faster and less expensive method for novel lead compound discovery in drug design and discovery. However, a major weakness of virtual screening - the inability to consistently and accurately identify true positives - is probably due to our insufficient understanding of the chemistry involved in ligand binding and the scoring systems used to screen these ligands. Although it has been demonstrated that combining multiple scoring systems (consensus scoring) can improve the enrichment of true positives, it has been a challenge to provide a theoretical foundation which explains when and how combination for virtual screening should be done. Using five scoring systems with two genetic docking algorithms on four target proteins: thymidine kinase (TK), human dihydrofolate reductase (DHFR), and estrogen receptors of antagonists and agonists (ER antagonist and ER agonist), Yang et al. [29] demonstrated that high performance ratio and high diversity are two conditions necessary for the fusion to be positive, i.e. for the combination to perform better than each of the individual systems. Figure 10 illustrates the two necessary conditions (high performance ratio and high diversity using RSC functions) for positive enrichment among eighty rank or score combinations of two individual systems. These examples suggest that applications of consensus scoring could increase the hit rate and reduce the false positive rate. Moreover, the improvement depends heavily on both factors: (a) the performance of each of the individual systems and (b) the diversity among the individual scoring systems, as stated in Section 3.2.
Fig. 10. Positive vs. Negative cases
4.3 Target Tracking under Occlusion

Target tracking is the process of predicting the future state of a target by examining the current state from a sequence of information and data collected by a group of sensors, sources, and databases. Multi-target tracking can be very complicated because targets can occlude one another, affecting feature or cue measurements. Lyons and Hsu [19] applied a multisensory fusion approach, based on Combinatorial Fusion Analysis and the RSC function to measure cognitive diversity, to study the problem of multisensory video tracking with occlusion. Each sensory cue is considered as a scoring system. An RSC function is used to characterize the scoring (or ranking) behavior of each system (or sensor). A diversity measure, computed using the variation in the RSC function, is used to dynamically choose the best scoring systems to combine and the best operations to fuse. The relationship between the diversity measure and the tracking accuracy of two fusion operations (rank combination vs. score combination) is evaluated using a set of twelve video sequences. In this study, Lyons and Hsu [19] demonstrated that using the RSC function as a diversity measure is an effective method to study target tracking in video with occlusions. The experiments by Lyons and Hsu [19] confirm what Hsu and Taksa [13] and Hsu, Chung, and Kristal [14] proposed: that the RSC function is a feasible and useful characteristic to define cognitive diversity and to guide us in the process of fusing multiple scoring systems.
5 Conclusion and Remarks

In this paper, we show that the Rank-Score Characteristic (RSC) function as defined in Combinatorial Fusion Analysis can be computed easily and can be used to measure cognitive diversity among two or more scoring systems. We also show that the diversity measure using the RSC function is different from the concept of correlation in statistics. Moreover, the notion of diversity using the RSC function plays an important role in the fusion of data, features, cues, and decisions in classification and other decision making. Three domain applications in information retrieval and search algorithms, virtual screening and drug discovery, and target tracking and recognition were discussed [13, 19, 29]. We wish to include other domain applications in bioinformatics, text mining, protein structure prediction, online learning, and portfolio management in a future report [17, 18, 20, 21, 25, 27]. In the future, we wish to study diversity using the RSC function in application domains such as sports team ranking, figure skating judgment, ecology, and biodiversity (e.g. [9]). Living organisms, neuronal systems, interconnection networks, and other complex systems are in great need of efficient and effective methods and techniques such as computational-X and X-informatics in the emerging field of Informatics. Information fusion plays an important role in the CompXInfor e-science approach in modern day scientific discovery.
References
[1] Bleiholder, J., Naumann, F.: Data fusion. ACM Computing Surveys 41(1), 1–41 (2008)
[2] Brown, G., Wyatt, J.L., Harris, R., Yao, X.: Diversity creation methods: A survey and categorisation. Journal of Information Fusion 6(1), 5–20 (2005a)
[3] Chun, Y.S., Hsu, D.F., Tang, C.Y.: On the relationships among various diversity measures in multiple classifier systems. In: 2008 International Symposium on Parallel Architectures, Algorithms, and Networks (ISPAN 2008), pp. 184–190 (2008)
[4] Chung, Y.S., Hsu, D.F., Tang, C.Y.: On the Diversity-Performance Relationship for Majority Voting in Classifier Ensembles. MCS, 407–420 (2007)
[5] Chung, Y.S., Hsu, D.F., Liu, C.Y., Tang, C.Y.: Performance Evaluation of Classifier Ensembles in Terms of Diversity and Performance of Individual Systems (submitted)
[6] Dasarathy, B.V.: Elucidative fusion systems—an exposition. Information Fusion 1, 5–15 (2000)
[7] Denning, P.J.: The profession of IT: The IT schools movement. Commun. ACM 44(8), 19–22 (2001)
[8] Engle, R.: Anticipating Correlations: A New Paradigm for Risk Management. Princeton University Press, Princeton (2009)
[9] Gewin, V.: Rack and Field. Nature 460, 944–946 (2009)
[10] Hey, T., et al. (eds.): Jim Gray on eScience: A Transformed Scientific Method. In: The Fourth Paradigm, pp. 17–31. Microsoft Research (2009)
[11] Ho, T.K.: Multiple classifier combination: Lessons and next steps. In: Bunke, H., Kandel, A. (eds.) Hybrid Methods in Pattern Recognition, pp. 171–198. World Scientific, Singapore (2002)
[12] Ho, T.K., Hull, J.J., Srihari, S.N.: Decision combination in multiple classifier system. IEEE Trans. on Pattern Analysis and Machine Intelligence 16(1), 66–75 (1994)
[13] Hsu, D.F., Taksa, I.: Comparing rank and score combination methods for data fusion in information retrieval. Information Retrieval 8(3), 449–480 (2005)
[14] Hsu, D.F., Chung, Y.S., Kristal, B.S.: Combinatorial fusion analysis: methods and practice of combining multiple scoring systems. In: Hsu, H.H. (ed.) Advanced Data Mining Technologies in Bioinformatics. Idea Group Inc., USA (2006)
[15] Krogh, A., Vedelsby, J.: Neural Network Ensembles, Cross Validation, and Active Learning. In: Advances in Neural Information Processing Systems, vol. 7, pp. 231–238. MIT Press, Cambridge (1995)
[16] Kuncheva, L.I.: Combining Pattern Classifiers: Methods and Algorithms. Wiley-Interscience, Hoboken (2004)
[17] Li, Y., Hsu, D.F., Chung, S.M.: Combining Multiple Feature Selection Methods for Text Categorization by Using Rank-Score Characteristics. In: 21st IEEE International Conference on Tools with Artificial Intelligence, pp. 508–517 (2009)
[18] Lin, K.-L., et al.: Feature Selection and Combination Criteria for Improving Accuracy in Protein Structure Prediction. IEEE Transactions on Nanobioscience 6(2), 186–196 (2007)
[19] Lyons, D.M., Hsu, D.F.: Combining multiple scoring systems for target tracking using rank-score characteristics. Information Fusion 10(2), 124–136 (2009)
[20] McMunn-Coffran, C., Schweikert, C., Hsu, D.F.: Microarray Gene Expression Analysis Using Combinatorial Fusion. BIBE, 410–414 (2009)
[21] Mesterharm, C., Hsu, D.F.: Combinatorial Fusion with On-line Learning Algorithms. In: The 11th International Conference on Information Fusion, pp. 1117–1124 (2008)
[22] Ng, K.B., Kantor, P.B.: Predicting the effectiveness of naive data fusion on the basis of system characteristics. J. Am. Soc. Inform. Sci. 51(12), 1177–1189 (2000)
[23] Norvig, P.: Search. In: "2020 visions". Nature 463, 26 (2010)
[24] Schadt, E.: Molecular networks as sensors and drivers of common human diseases. Nature 461, 218–223 (2009)
[25] Schweikert, C., Li, Y., Dayya, D., Yens, D., Torrents, M., Hsu, D.F.: Analysis of Autism Prevalence and Neurotoxins Using Combinatorial Fusion and Association Rule Mining. BIBE, 400–404 (2009)
[26] Sharkey, A.J.C. (ed.): Combining Artificial Neural Nets: Ensemble and Modular Multi-Net Systems. Perspectives in Neural Computing. Springer, London (1999)
[27] Vinod, H.D., Hsu, D.F., Tian, Y.: Combinatorial Fusion for Improving Portfolio Performance. In: Advances in Social Science Research Using R, pp. 95–105. Springer, Heidelberg (2010)
[28] Whittle, M., Gillet, V.J., Willett, P.: Analysis of data fusion methods in virtual screening: Theoretical model. Journal of Chemical Information and Modeling 46, 2193–2205 (2006)
[29] Yang, J.M., Chen, Y.F., Shen, T.W., Kristal, B.S., Hsu, D.F.: Consensus scoring for improving enrichment in virtual screening. Journal of Chemical Information and Modeling 45, 1134–1146 (2005)
Cognitive Effort for Multi-agent Systems
Luca Longo and Stephen Barrett
Department of Computer Science and Statistics, Trinity College Dublin
{llongo,stephen.barrett}@cs.tcd.ie
Abstract. Cognitive Effort is a multi-faceted phenomenon that has suffered from an imperfect understanding, an informal use in everyday life and numerous definitions. This paper attempts to clarify the concept, along with some of the main influencing factors, by presenting a possible heuristic formalism intended to be implemented as a computational concept, and therefore be embedded in an artificial agent capable of cognitive effort-based decision support. Its applicability in the domain of Artificial Intelligence and Multi-Agent Systems is discussed. The technical challenge of this contribution is to start an active discussion towards the formalisation of Cognitive Effort and its application in AI.
1 Introduction
Theoretical constructs of attention and cognitive effort have a long history in psychology [11]. Cognitive effort is often understood as a multi-faceted phenomenon and a subjective concept, influenced by attention, that changes within individuals in response to individual and environmental factors [18]. Such a view, sustained by motivation theories, contrasts with empirical studies that have tended to treat attention as a static concept [6]. Theories of information processing consider cognitive effort as a hypothetical construct, regarded as a limited-capacity resource that affects the speed of information processing [11]. Studies suggest that, even though cognitive effort may be a hypothetical construct, it is manifest as a subjective state that people have introspective access to [10]. Attention can be related to physiological states of stress and effort, to subjective experiences of stress, mental effort, and time pressure, and to objective measures of performance, ranging from normal performance levels to breakdowns in performance. These various aspects of attention have led to distinct means for assessing cognitive effort, including physiological criteria such as heart rate, performance criteria such as quantity and quality of performance, and subjective criteria such as ratings of level of effort. Despite the interest in the topic for the past 40 years, there is no universally accepted and clear definition of cognitive effort, often referred to as mental workload [9]. There appears to be little work to link the measurement of workload by any one paradigm to others, and the lack of a formal theory of cognitive effort has led to a proliferation of several methods with little chance of reconciliation [7]. Formalising cognitive effort as a computational concept would appear to be an interesting step towards a common definition and an opportunity to provide a usable structure for investigating behaviours. The goal of this paper is to facilitate such a development through the presentation of a formalisation of cognitive
effort in an organised fashion using formal tools. The principal reason for measuring cognitive effort is to quantify the mental cost of performing tasks in order to predict operator and system performance. It is studied here from the point of view of artificial agents: our formalism does not aim to be the de-facto standard, but it provides the tools necessary for its own revision. We are concerned with two key issues: How can we formalise cognitive effort as a usable computational concept? How can we provide cognitive effort-based decision supporting capabilities to an artificial agent? The methodology adopted to model cognitive effort is presented in section 2. The subjective nature of the concept is underlined in section 3, where a literature review identifies some of the main factors, amenable to computational treatment, that influence cognitive effort, along with related works. We present our heuristic formalism in section 4. In section 5 an optimisation problem in multi-agent systems is presented that aims to clarify a possible application of our heuristic formalism. We address open issues and future challenges in section 6.
2 Attacking the Phenomenon: Our Approach
Cognitive effort is a subjective, elusive concept and its precise definition is far from trivial. Indeed, the contextual aspect of the phenomenon may render attempts at both precise and generally applicable definition impossible in practice. Our approach tries to study the essential behaviour of cognitive effort and seeks to capture some of its aspects in a formalism structured as an open and extensible framework. Our method is based on a generalist assessment of the available literature, seeking to merge together different observations, intuitions and definitions towards a tool for the assessment of Cognitive effort in practical scenarios. The multi-agent paradigm is a powerful tool for investigating the problem. Although an agent’s cognitive model of its human peer is not necessarily precise, having at least a realistic model can be beneficial in offering unintrusive help, bias reduction, as well as trustable and self-adjustable autonomy. It is feasible to develop agents as cognitive aids to alleviate human bias, as long as an agent can be trained to obtain a model of a human’s cognitive inclination. Furthermore, with a realistic human cognitive model, an agent can also better adjust its automation level [19].
3 Cognitive Effort and Related Work
The assessment of the cognitive effort expended in the completion of a task depends on several factors such as individual skill, background and status, that is, the individual's subjective experience and cognitive ability. Self-regulation theories [4] suggest that individuals with different levels of cognitive ability may react to changes in task difficulty in different ways because their perception of the task may be different. High-ability individuals have a larger pool of cognitive resources than their counterparts, who need to make larger resource adjustments to achieve the same outcome. People of low ability who perceive a high degree
of difficulty in a task will expend greater cognitive effort [20]. Similarly, intentions play a role in attention, and individuals with strong intentions allocate more cognitive effort to a task: highly conscientious individuals choose to work harder and persevere longer than their counterparts [1]. In the literature, curiosity, motivation, psychological stress and anxiety are often referred to as arousals [11] and have a strong impact on attention and therefore on cognitive effort. Similarly, time plays a central role in attention as well: time-pressure may increase the amount of attention an individual needs to allocate to a task. Furthermore, performing a task requires an interval of time in which an individual has to elicit an amount of cognitive effort. Finally, contextual biases may influence attention over time: these may be unpredictable external distractions, or contextual or task-related constraints. All these factors represent a sub-portion of all the possible factors used by several existing models of workload and mental effort. The popular NASA Task Load Index, for instance, consists of six clusters of variables such as mental, physical and temporal demands, frustration, effort and performance [8]. The Subjective Workload Assessment Technique is a subjective rating technique that considers time load, mental effort and psychological stress load to assess workload [14]. Multi-agent systems are often used to model social structures where artificial agents collaborate with each other towards a common goal [17], seeking to find the best solution for their problems autonomously without human intervention. Most of the work in agent-based systems has assumed highly simplified agent models, and the artificial agents developed so far incorporate a wide range of cognitive functionalities such as memory, representation, learning and sensory motor capabilities. However, at present, there is a weak consideration of cognitive effort in multi-agent systems [16]. Integrating cognitive effort in an artificial agent may increase its robustness in terms of interdependence with other agents and its ability in the decision-making process, without losing any of the freedom of choice such agents will be expected to possess.
4 A Presumption-Based Heuristic Formalism
As discussed briefly in section 3, models of cognitive effort involve a highly contextual and individual-dependent set of factors. Our approach begins by focusing on a set of context-dependent influencing factors, each representing a presumption or interpretation of facts in the literature useful for inferring cognitive effort. Each presumption needs to be formally conceptualised in order to be computable, and in the following paragraphs we present six factors of differing difficulty of formalisation. The set of factors considered here can be expanded, refined, criticised and reduced: we provide these as illustrative of our approach. The aim of our framework is to be open, extensible and applicable in different contexts where only some influencing factors can be monitored, captured and conceptualised formally.

Cognitive Ability. Some people obviously and consistently understand new concepts quicker, solve new problems faster, see relationships and are more
knowledgeable about a wider range of topics than others. Modern psychological theory views cognitive ability as a multidimensional concept, and several studies, today known as IQ tests, tried to measure this trait [5]. Carroll suggested in his work [3] that there is a tendency for people who perform well in a specific range of activities to perform well in all others as well. Prof. T. Salthouse suggested in his recent work [15] that some aspects of people's cognitive ability peak around the age of 22 and begin a slow decline starting around 27. However, he pointed out that there is a great deal of variance among people and most cognitive functions remain at a highly effective level into their final years, even when living a long life. Some types of mental flexibility decrease relatively early in adulthood, but how much knowledge one has, and the effectiveness of integrating it with one's abilities, may increase throughout all of adulthood if there are no pathological diseases. This research provides suitable evidence to model cognitive ability with a long-term growing function such as the flexible sigmoid function proposed by Yin [22]:

CA : [1..Gth]3 ⊆ ℵ3 → [0..1] ⊆ ℝ
CA(Gth, Gr, t) = CAmax (1 + (Gth - t)/(Gth - Gr)) (t/Gth)^(Gth/(Gth - Gr))

where CA is cognitive ability, whose maximum level is defined by CAmax (in this case equal to 1), and t is the age in years of an individual. Gth is the growing threshold, set to an average mortality of 85 years, and Gr is the growing rate, set to 22 years, which identifies where the curve reaches the maximum growing weight and from which it increases moderately. The properties Gth and Gr are flexible because they may be set by considering environmental factors.
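The formula above is Yin's flexible sigmoid (beta growth) function; below is a minimal sketch of it, assuming the reconstruction given above and the values stated in the text (CAmax = 1, Gth = 85, Gr = 22).

```python
def cognitive_ability(t, G_th=85, G_r=22, CA_max=1.0):
    """Flexible sigmoid (beta growth) function as reconstructed above:
    CA rises from 0, grows fastest around the growing rate G_r (22 years) and
    approaches CA_max moderately towards the growing threshold G_th (85 years)."""
    return CA_max * (1 + (G_th - t) / (G_th - G_r)) * (t / G_th) ** (G_th / (G_th - G_r))

for age in (10, 22, 27, 40, 60, 85):
    print(age, round(cognitive_ability(age), 3))   # monotone increase towards CA_max at G_th
```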
W : {w | w ∈ [0..1] ⊂ ℝ}
R : {∀ a_i ∈ A ∃! r | r : A × A → W, r(a_i, a_p) = w}
A_explicit^leaf ∪ A_aggregated^internal = A;   ∀ a_i ∈ A ∃! path(a_i, a_root)
All the nodes have a path towards the root node: this property guarantees the absence of cycles. Leaf nodes (nodes without children) are values explicitly provided by an agent: they indicate the related degree of a given type of arousal
(e.g., 0 is not motivated at all, 1 is highly motivated). Internal nodes represent aggregation nodes and, like the root node, their values are inferred from the relationships with their children defined in R, along with the related strengths in W. In particular, each internal node's value is the weighted sum of its c children's values:

a_explicit^leaf ∈ [0..1] ⊂ ℝ,   a_aggregated^internal = Σ_{z=0}^{c} (a_z · w_z) ≤ 1
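To make the two formalisations above concrete, the following minimal Python sketch implements the growth function CA (as reconstructed above) and a weighted-sum evaluation of an arousal taxonomy. The class and function names are ours, not part of the formalism, and the example tree and its weights are illustrative only.

    def cognitive_ability(age, g_th=85, g_r=22, ca_max=1.0):
        # Flexible sigmoid growth function (Yin et al. [22]) with the defaults used in the text
        return ca_max * (1 + (g_th - age) / (g_th - g_r)) * (age / g_th) ** (g_th / (g_th - g_r))

    class ArousalNode:
        # A node of the subjective arousal taxonomy <A, W, R>
        def __init__(self, name, value=None, children=()):
            self.name, self.value = name, value        # leaf: explicit value in [0, 1]
            self.children = list(children)             # list of (child, weight) pairs

        def evaluate(self):
            if not self.children:                      # leaf node: explicit agent-provided value
                return self.value
            # internal (or root) node: weighted sum of children (weights assumed to sum to <= 1)
            return sum(w * child.evaluate() for child, w in self.children)

    # Illustrative taxonomy; structure and weights are assumptions, not taken from the paper
    root = ArousalNode("arousal", children=[
        (ArousalNode("motivation", 0.6), 0.5),
        (ArousalNode("anxiety", 0.5), 0.3),
        (ArousalNode("curiosity", 0.7), 0.2),
    ])
    print(round(cognitive_ability(18), 2), round(root.evaluate(), 2))   # 0.25 (cf. Table 2, agent a1), 0.59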
Finally, the root node is a special internal node with weight w_root = 1 and, as it has no parent, its relation r_root = ∅. The weights w in the arousal taxonomy may be derived from the literature or learnt, while the explicit values a_explicit^leaf represent an individual's subjective status before starting a task. An example of a possible subjective arousal taxonomy is depicted in figure 1. Based on the level of arousal, we may adopt the descriptive Yerkes-Dodson law [21], which captures the empirically observed relationship between performance and arousal. For example, the authors discovered that increasing the intensity of a shock administered to mice facilitated the learning of brightness discrimination, up to a point; further increases of shock intensity, however, caused learning to deteriorate. These conclusions appear to be valid in an extraordinarily wide range of situations. The law is usually modelled with an inverted U-shaped curve in which performance increases at low levels of arousal and then decreases at higher levels of arousal. The law is task-dependent: different tasks require different levels of arousal for optimal performance, thus the shape of the curve can be highly variable. The first part of the curve, which increases, is positively affected by the strength of arousal, while the second part, which decreases, is influenced by the negative effect of arousal on cognitive effort. The law is useful to study the maximum performance an agent can achieve based on its subjective status before performing a task. As each task may have a different complexity and as the law is task-dependent, we propose to introduce a task dictionary, formally described as a tuple <TS, A, P, TP, D, YD, δ_yd, δ_d, δ_tp>:
· TS ⊆ ℕ is the set of possible tasks;
· A, P, TP, D ⊆ [0..1] ⊂ ℝ are the possible sets of values for Arousal, Performance, Time-pressure and Difficulty;
· YD ⊆ {f_yd : A → P} is the set of possible functions that model the Yerkes-Dodson law. Each of them takes an arousal level and returns a performance value;
· δ_yd : TS → YD assigns a Y.D. law function to a task;
· δ_tp : TS → TP assigns a degree of time-pressure to a task;
· δ_d : TS → D assigns a level of difficulty to a task.
A task dictionary example is depicted in table 1, and a small computational sketch of it follows the table. Here the YD laws associated with each task are for descriptive purposes only; in reality they may be approximated experimentally via numerical analysis. Once we have the subjective arousal taxonomy for a subject and the task difficulty dictionary, we are able to study the effect of arousal on cognitive effort. The derived performance is the maximum level of attention that a subject can elicit on a certain task. Formally, given a task
Table 1. Tasks dictionary with descriptive YD equations

Description        D    TP   YD law (a ∈ A)
math equation      0.8  0.9  f_yd(a) = e^(−a²)
reading/summary    0.6  0.7  f_yd(a) = −a² + a
reading            0.3  0.2  f_yd(a) = −2a² + 2a
dictate            0.4  0.8  f_yd(a) = e^(−(a−2)²)
memorising poetry  0.7  0.6  f_yd(a) = −3a² + a
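As a concrete illustration of the task dictionary of Table 1, the short Python sketch below encodes the mappings δ_d, δ_tp and δ_yd and derives the maximum performance p = (δ_yd(ts))(a) for a given arousal level. The dictionary keys and function names are our own illustrative choices.

    import math

    # Task dictionary: task -> (difficulty delta_d, time-pressure delta_tp, YD law delta_yd)
    TASKS = {
        "math equation":     (0.8, 0.9, lambda a: math.exp(-a ** 2)),
        "reading/summary":   (0.6, 0.7, lambda a: -a ** 2 + a),
        "reading":           (0.3, 0.2, lambda a: -2 * a ** 2 + 2 * a),
        "dictate":           (0.4, 0.8, lambda a: math.exp(-(a - 2) ** 2)),
        "memorising poetry": (0.7, 0.6, lambda a: -3 * a ** 2 + a),
    }

    def max_performance(task, arousal):
        # p = (delta_yd(ts))(a): performance derived from the task's Yerkes-Dodson law
        _, _, yd_law = TASKS[task]
        return yd_law(arousal)

    # e.g. an agent with root arousal 0.67 facing the 'math equation' task
    print(round(max_performance("math equation", 0.67), 2))   # 0.64, cf. Table 2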
Fig. 1. A possible Subjective Arousal Taxonomy
ts ∈ TS, the maximum performance p on the task ts is derived from the associated Yerkes-Dodson law with an input level of arousal a, that is, p = (δ_yd(ts))(a). Intentions. A subject's intentions play an important role in determining the amount of cognitive effort while performing a task. As with arousal, this is an individual, subjective concept that may be split into short-term and long-term intentions, and that may be modelled with real values. We refer to short-term (or momentary) intentions as I_ST and to long-term intentions as I_LT. These are subjective judgments in the range [−1..1] ⊂ ℝ (−1: no intention at all; 1: highly intentioned). The overall degree of intentions I is a weighted average of the above values and may have a negative, positive or null influence on cognitive effort:

I : ([−1..1] ⊂ ℝ)² → [−1..1] ⊂ ℝ,   I_ST, I_LT ∈ [−1..1] ⊂ ℝ
I(I_ST, I_LT) = (2/3) I_ST + (1/3) I_LT

This model deals with intentional shades: an individual may be momentarily intent on succeeding in an IQ test without any longer-term intention. Involuntary Context Bias. Several external factors may influence cognitive effort as pseudo-static and unpredictable biases. The former refers to biases that are almost static and depend on environmental aspects. For instance, there are large differences across ethnic groups and geographic areas in the available knowledge: people living in poor African countries have reduced access to knowledge compared to their counterparts in Western countries, so they may find a question dependent on access to information more difficult to answer. Another pseudo-static bias is the task's difficulty. Even though it is hard to estimate exactly the complexity of different tasks, it is not unreasonable
to claim that reading a newspaper demands less cognitive effort than resolving a math equation. Unpredictable context biases represent involuntary distractions such as a phone ringing, questions from colleagues, or e-mail delivery in a working context. These involuntary distractions and environmental aspects, in comparison to arousals and intentions, are easier to embed in a formalism as they are not individual-dependent. We propose real fuzzy values to model contextual knowledge availability and unpredictable bias, while the level of task difficulty is obtained from the task dictionary. Knowledge availability is a positive factor, meaning that it elicits less cognitive effort, while task difficulty and unpredictable bias are negative factors, as they require more cognitive effort as their values increase. The higher the value of the contextual bias, the more a subject has to concentrate, allocating more cognitive effort to the task. To model how context bias negatively affects attention, we take the complement of knowledge availability:

CB : ([0..1] ⊂ ℝ)³ → [0..1] ⊂ ℝ,   C_know, T_diff, U_bias ∈ [0..1] ⊂ ℝ
CB(C_know, T_diff, U_bias) = ([1 − C_know] + T_diff + U_bias) / 3
where CB is the total context bias, C_know is the contextual knowledge availability, T_diff is the task difficulty and U_bias is the unpredictable bias. Perception. The same task may be perceived differently by two subjects. In the literature there is evidence suggesting that perceived difficulty is higher when individuals are presented with a new task: they may not know what the optimal amount of effort is, given a particular difficulty level [20]. We propose to model this concept as a simple real fuzzy value P_diff ∈ [0..1] ⊂ ℝ, where values close to 0 indicate a task perceived as highly complex. Perception is connected to cognitive ability and skill acquisition. Intermediate students may perceive the resolution of math equations to be difficult compared to university students, due to their limited experience, preparation and background. Perception has a negative effect, as a subject who perceives a task to be difficult needs to allocate more resources, eliciting higher cognitive effort. Time. Time is a crucial factor that must be considered in modelling cognitive effort. Temporal properties are essential because performing a task is not a single-instant action but rather an action over time; therefore cognitive effort's influencing factors need to be considered over time. Our environment is dynamic and, consequently, time-related: the temporal dimension is an important aspect of perception necessary to guide effective action in the real world [12]. Several temporal theories are available in the computer science literature, but less effort has been spent on the temporal aspect of cognitive effort. Firstly, we take into consideration time as a single stimulus that influences attention. We refer to this as time-pressure, which is sometimes imposed by explicit instruction to hurry and sometimes by intrinsic characteristics of the task. The former may be modelled as a fuzzy value T_press^explicit ∈ [0..1] ⊂ ℝ. For instance, a student may have to resolve a task within an interval of 10 minutes. In this case we need to estimate or learn the maximum time to perform a task
(mapped to 1) and transform the 10 minutes into the scale [0..1]. The latter may be modelled as a fuzzy value T_press^implicit ∈ [0..1] ⊂ ℝ, and we propose to adopt the task-related time-pressure value from the task dictionary previously proposed, which captures the intrinsic pressure imposed by a certain task. For instance, a student may resolve an integral equation, which requires self-discipline and rigour in performing the task. He must keep track of the initial problem, partial results and the next step, requiring greater cognitive effort: slowing down or even stopping for just an instant may force the student to start again. The more storage a difficult arithmetic problem requires, the higher the time-pressure it imposes, eliciting greater cognitive effort [11]. The final degree of time-pressure is modelled as the average of the above values:

T_press ∈ [0..1] ⊂ ℝ,   T_press = (1/2) T_press^explicit + (1/2) T_press^implicit
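The intentions, context-bias and time-pressure formulas above are simple affine combinations; the following minimal Python helpers (with our own function names) reproduce them and can be checked against the values reported in Table 2.

    def intentions(i_st, i_lt):
        # I = (2/3) I_ST + (1/3) I_LT, both in [-1, 1]
        return (2.0 / 3.0) * i_st + (1.0 / 3.0) * i_lt

    def context_bias(c_know, t_diff, u_bias):
        # CB = ([1 - C_know] + T_diff + U_bias) / 3, all in [0, 1]
        return ((1.0 - c_know) + t_diff + u_bias) / 3.0

    def time_pressure(explicit, implicit):
        # T_press = average of the explicit and implicit (task-dictionary) pressures
        return 0.5 * explicit + 0.5 * implicit

    # Agent a1 of Table 2: I = 0.47, CB = 0.47
    print(round(intentions(0.4, 0.6), 2), round(context_bias(1.0, 0.8, 0.6), 2))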
Everyday experience suggests that time intervals also play an important role in directing our attention to the external world. Cognitive effort may vary while performing a task due to variations in the degrees of focused attention and sustained attention. The former refers to the ability to respond discretely to specific visual, auditory or tactile stimuli, while the latter refers to the ability to maintain a consistent behavioural response while performing a certain task [11]. Modelling focused and sustained attention is not easy at all, and these properties are individual-dependent. However, for a given task, a trajectory describing how the degree of sustained attention would likely behave for most people on that task would be useful. To deal with this we propose an extension of our task dictionary by adding an estimate of the time needed to complete a certain task. This value may be learnt through experimentation using unsupervised techniques and is needed to estimate the end of a certain task in order to model the focused attention function. This function likely has an S-shape that increases quickly at the beginning, reaching the maximum peak of attention, then decreases very moderately during the sustained-attention time interval, and decreases more quickly towards the estimated time for the completion of the task. This function may be approximated by applying numerical analysis and should model the fact that, at the beginning, people elicit almost the highest degree of attention, which is the maximum performance level obtained from the Yerkes-Dodson law of a task ts with a given arousal a as defined before (p = (δ_yd(ts))(a)); from here follows an interval of time in which individuals perform well, maintaining a high level of sustained attention. Then the curve starts to decrease towards the estimated end-point of the task, from which the function persists but at very low levels, indicating that only a small amount of cognitive effort is dedicated to the task. Formally, we add to the task dictionary:
· T ⊂ ℝ is the domain of time;
· AT ⊆ P is the set of possible degrees of attention;
· SA : {f_fa : T → AT} is the set of possible functions that model the concept of focused attention for tasks;
· δ_fa : TS → SA is the function that maps an S-shaped function from the domain SA to a given task;
· δ_TE : TS → T is the function that assigns to a task an estimated completion time.
The completion time is useful for understanding whether an agent performed similarly to others, required further time to complete a task, or gave up before completing it. Taking into account the explanations so far, we are now able to provide a general formula to compute the cognitive effort of an agent on a given task, along with the formalism summary depicted in figure 2:

CE : ([0..1] ⊂ ℝ)⁵ × ([−1..1] ⊂ ℝ) × TS × T² → ℝ

C_A′ = CA(G_th, G_r, t),   A′ = (δ_yd(α))(A_root),   P_D′ = P_D
t′ = T_press,   I′ = I(I_ST, I_LT),   C_B′ = CB(C_know, T_diff, U_bias)

CE(C_A′, A′, I′, C_B′, P_D′, t′, α, t0, t1) = ((C_A′ + A′ + I′ + C_B′ + P_D′ + t′) / 6) · ∫_{t0}^{t1} (δ_fa(α))(x) dx

where C_A is the cognitive ability, A represents arousal, I the intentions, C_B the contextual bias, T_press the time pressure, t0 the start time and t1 the time spent on the task α.
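Following the reconstruction of the CE formula above (average of the six factors multiplied by the integrated focused-attention trajectory), the Python sketch below shows one way to evaluate CE numerically. The logistic trajectory parameters b and a follow Section 5; the offset a is not stated explicitly in the paper, so the value used here is an assumption chosen to approximately reproduce the reported results.

    import math

    def focused_attention(t, b=0.15, a=5.5):
        # Decreasing S-shaped trajectory f_fa(t) = [1 + e^(b t - a)]^(-1); a = 5.5 is an assumed offset
        return 1.0 / (1.0 + math.exp(b * t - a))

    def integrate(f, t0, t1, steps=1000):
        # Simple trapezoidal approximation of the attention integral
        h = (t1 - t0) / steps
        return h * (0.5 * f(t0) + sum(f(t0 + i * h) for i in range(1, steps)) + 0.5 * f(t1))

    def cognitive_effort(ca, arousal_perf, intent, cbias, pdiff, tpress, t0, t1):
        # CE = average of the six factors times the integrated focused attention
        factors = (ca + arousal_perf + intent + cbias + pdiff + tpress) / 6.0
        return factors * integrate(focused_attention, t0, t1)

    # A worked call with the values of Table 2 is given at the end of Section 5.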
Fig. 2. The Cognitive Effort’s formalism
5 A Multi-agent Application
In this section we take the viewpoint of an agent α situated in an open environment, trying to choose the best interaction partners from a pool of potential agents A and deciding on the strategy to adopt with them to resolve an effortful task T in an optimal way. Our heuristic, based on cognitive effort, represents a possible strategy to select reliable partners. Each agent in the system has certain cognitive properties such as experience, motivation, intentions and cognitive ability, and it is realistic to assume that the agents operate in environments with different constraints and biases. Furthermore, we assume each agent acts honestly and provides truthful information about its cognitive status. An agent α may split the
Table 2. Agents, influencing factors and Cognitive Effort

Factor                          a1      a2      a3      a4      a5
Gth (Growing Threshold)         85      85      85      85      85
GR (Growing Rate)               22      22      22      22      22
Age (Years)                     18      25      28      40      55
CA (Cognitive Ability)          0.25    0.37    0.43    0.62    0.82
IST (Short-term Intentions)     0.4     0.6     -0.5    -1      0.5
ILT (Long-term Intentions)      0.6     -0.3    -1      0       0.2
I (Intentions)                  0.47    0.3     -0.67   -0.67   0.4
Pdiff (Perceived Difficulty)    0.7     0.7     0.6     0.5     0.7
TP (Time Pressure)              0.5     0.5     0.5     0.5     0.5
Cknow (Context Knowledge)       1       1       0.9     0.8     1
Tdiff (Task Difficulty)         0.8     0.8     0.8     0.8     0.8
Ubias (Unpredictable Bias)      0.6     0.3     0.4     0.6     0.7
CB (Contextual Bias)            0.47    0.37    0.43    0.53    0.5
ar1 (Anxiety)                   0.5     0.7     0.5     0.4     0.3
ar2 (Curiosity)                 0.7     0.3     0.5     0.8     0.4
ar3 (Sleepiness)                0.3     0.3     0.5     0.7     0.4
ar4 (Tiredness)                 0.4     0.3     0.6     0.2     0.5
ar5 (Motivation)                0.6     1       0.8     0.6     0.3
Aroot (Arousal Root)            0.67    0.81    0.72    0.67    0.37
fyd(A) = e^(−Aroot²)            0.64    0.52    0.59    0.64    0.87
t0/t1 (secs)                    0/55    0/40    0/45    0/40    0/50
C.E. (Cognitive Effort)         18.30   14.32   11.00   11.87   22.63
task T into partial sub-tasks t1 ... tn with the same estimate of required effort. We suppose it has direct connections with 5 agents, a1 .. a5 ∈ A, and that it forwards to each of them one of the 5 sub-tasks t1 .. t5. Now, each agent starts to resolve the sub-task of its competence by using its own resources, skills and experience. Once the sub-task is completed, they send back to α their subjective status of arousal, their intentions, cognitive ability, perception, involuntary context bias and the start/stop times needed to complete the assigned sub-task. Let us assume the agent α adopts the first task (math equation) of the task dictionary depicted in table 1 and uses the subjective arousal taxonomy depicted in figure 1 with the explicit values (ar_i) provided by each agent and shown in table 2. The Yerkes-Dodson law associated with the task is δ_yd(α) = f_yd(a) = e^(−a²), the task difficulty is δ_d(α) = 0.8, and the time pressure is δ_tp(α) = 0.9. The focused attention trajectory is δ_fa(α) = f_fa(t) = [1 + e^(bt−a)]^(−1). The parameter b shrinks the S-shaped curve while a shifts the function to the right. We set b = 15/100 to model sustained attention at the beginning of α for around 20 seconds, and a so that the function effectively starts from 0 on the time line (x-axis) with attention at a high level (1). The function decreases quickly after 20 seconds, reaching low levels at around 50 seconds (δ_TE(α) = 50), which is the estimated time we set for the completion of T. α uses our heuristic formalism as a potential decision-supporting tool to generate an index of cognitive effort for each partner: the results obtained are presented in table 2, and a worked numerical sketch is given at the end of this section. It may forward the remaining sub-tasks in proportion to the agents' elicited cognitive effort; for example, it may deliver more sub-tasks to agents that showed less cognitive effort (e.g., a3, a4) in completing the assigned work. Furthermore, α has knowledge of its partners' skills and their subjective status, and over time it can infer something about their
behaviour. For instance, information about the learning rate may be learnt, as α might assume that its partners, over time, should acquire experience and become more skilled, therefore manifesting less cognitive effort in performing similar tasks.
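As a worked sketch of this scenario, the following lines reuse the cognitive_effort helper sketched after the CE formula in Section 4 and plug in agent a1's values from Table 2. Since the attention-trajectory offset a is only described qualitatively in the text and was assumed above, the result approximates rather than exactly reproduces the 18.30 reported in the table.

    import math

    # Agent a1 (Table 2): CA, A' = e^(-Aroot^2), I, CB, P_D, T_press, and t0/t1 = 0/55 seconds
    a_prime = math.exp(-0.67 ** 2)                      # = 0.64, the Yerkes-Dodson performance
    ce_a1 = cognitive_effort(0.25, a_prime, 0.47, 0.47, 0.7, 0.5, 0, 55)
    print(round(ce_a1, 2))                              # ~18.3, close to the C.E. index of a1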
6 Open Issues and Future Challenges
Cognitive effort is a subjective phenomenon and its formalisation for a virtual agent is not a trivial problem. In this paper we tackled the problem by analysing the current state of the art in psychology, cognitive science and neuroscience to build a formalism that is extensible and open to further refinements. The heuristic proposed here can be embedded in an artificial agent, providing it with a cognitive-effort-based decision supporting system. The computational model is an aggregation of a subset of the possible presumptions or factors influencing cognitive effort, such as cognitive ability, arousal, intentions, contextual bias, perception and time. We intend this to be the starting point of an active discussion among researchers in the social and computer science fields. In this work we have considered each factor's influence as being the same, but a simple aggregation is not subtle enough to provide good estimates of cognitive effort. Argumentation theory provides a framework for systematically studying how the factors influencing cognitive effort may be combined, sustained or discarded in a computable formalism towards a robust approximation of the concept. In our opinion, cognitive effort shares some of the properties of a non-monotonic concept, by which we mean that adding a factor to the overall formalism may produce a reduction of its set of consequences [2]. Adding a new argument and reasoning on its plausibility and its combination with previous ones increases the robustness of the overall formalism. A new factor may attack or support an existing one, thereby amplifying or diminishing its strength. The consideration of mutual relationships among arguments is fundamental in assessing an index of cognitive effort; therefore a future challenge might be the investigation of the strength of each argument and their mutual influence by using non-monotonic logics such as the defeasible reasoning semantics proposed by Pollock [13]. It remains to demonstrate this aspect of the computation of cognitive effort. In terms of evaluation, popular frameworks such as the NASA-TLX [8] and SWAT [14] may be useful for comparisons. Furthermore, our framework, conceived to be open and adaptable to different contexts, may be applied in operational environments and, for instance, populated with physiologically-based arguments derived from neuroscience equipment such as fMRI, EEG and other types of physiological scanners.
References
1. Barrick, M.R., Mount, M.K., Strauss, J.P.: Conscientiousness and performance of sales representatives: Test of the mediating effects of goal setting. Journal of Applied Psychology 78(5) (1993)
2. Brewka, G., Niemelä, I., Truszczynski, M.: Nonmonotonic reasoning. In: Handbook of Knowledge Representation, pp. 239–284 (2007)
3. Carroll, J.B.: Human Cognitive Abilities: A Survey of Factor-analytic Studies. Cambridge University Press, Cambridge (1993)
4. Carver, C.S., Scheier, M.F.: On the Self-regulation of Behavior. Cambridge University Press, UK (1998)
5. Dickens, T.W.: Cognitive ability. In: New Palgrave Dictionary of Economics (forthcoming)
6. Fried, Y., Slowik, L.H.: Enriching goal-setting theory with time: An integrated approach. Academy of Management Review 29(3), 404–422 (2004)
7. Gopher, D., Braune, R.: On the psychophysics of workload: Why bother with subjective measures? Human Factors 26(5) (1984)
8. Hart, S.G., Staveland, L.E.: Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In: Human Mental Workload, pp. 139–183 (1988)
9. Huey, F.M., Wickens, C.D.: Workload Transition: Implications for Individual and Team Performance. National Academy Press, Washington (1993)
10. Humphreys, M.S., Revelle, W.: Personality, motivation and performance: A theory of the relationship between individual differences and information processing. Psychological Review 91(2) (1984)
11. Kahneman, D.: Attention and Effort. Prentice Hall, NJ (1973)
12. Miniussi, C., Wilding, E.L., Coull, J.T., Nobre, A.C.: Orienting attention in time: Modulation of brain potentials. Brain 122(8)
13. Pollock, J.L.: Cognitive Carpentry: A Blueprint for How to Build a Person. MIT Press, Cambridge (1995)
14. Reid, G.B., Nygren, T.E.: The subjective workload assessment technique: A scaling procedure for measuring mental workload. In: Human Mental Workload, pp. 185–218 (1988)
15. Salthouse, T.: When does age-related cognitive decline begin? Neurobiology of Aging 30(4), 507–515 (2009)
16. Sun, R.: Duality of the Mind. Lawrence Erlbaum Assoc., NJ (2002)
17. Sun, R., Naveh, I.: Simulating organizational decision-making using a cognitively realistic agent model. Journal of Artificial Societies and Social Simulation 7(3) (2004)
18. Vroom, V.H.: Work and Motivation. Wiley, NY (1964)
19. Fan, X., Yen, J.: Realistic cognitive load modeling for enhancing shared mental models in human-agent collaboration. In: AAMAS 2007 – Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems, p. 60 (2007)
20. Yeo, G., Neal, A.: Subjective cognitive effort: a model of states, traits, and time. Journal of Applied Psychology 93(3) (2008)
21. Yerkes, R.M., Dodson, J.D.: The relation of strength of stimulus to rapidity of habit-formation. Journal of Comparative Neurology and Psychology 18, 459–482 (1908)
22. Yin, X., Goudriaan, J., Lantinga, E.A., Vos, J., Spiertz, H.J.: A flexible sigmoid function of determinate growth. Annals of Botany 91, 361–371 (2003)
Behavioural Abstraction of Agent Models Addressing Mutual Interaction of Cognitive and Affective Processes

Alexei Sharpanskykh and Jan Treur

VU University Amsterdam, Department of Artificial Intelligence, De Boelelaan 1081, 1081 HV Amsterdam, The Netherlands
{sharp,treur}@few.vu.nl
http://www.few.vu.nl/~{sharp,treur}

Abstract. In this paper the issue of relating a specification of the internal processes within an agent to a specification of the behaviour of the agent is addressed. A previously proposed approach for automated generation of behavioural specifications from an internal specification was limited to stratified specifications of internal processes. Therefore, it cannot be applied to mutually interacting cognitive and affective processes described by interacting loops. However, such processes are not rare in agent models addressing integration of cognitive and affective processes and agent learning. In this paper a novel approach is proposed which addresses this issue. The proposed technique for loop abstraction is based on identifying dependencies of equilibrium states for interacting loops. The technique is illustrated by an example of an internal agent model with interdependent processes of believing, feeling, and trusting.
1 Introduction
The dynamics of an agent are usually modelled by an internal agent model specifying relations between the mental states of the agent. Often such agent models are specified in an executable format following a noncyclic causal graph (e.g., [12]). However, for more complex and adaptive types of agents, such agent models may have the format of dynamical systems including internal loops. Such cyclic interactions are well known from the neurological and brain research areas; for example, agents have been modelled in which as-if body loops [5] are used to capture the interaction between feelings and other mental states (e.g., [9]). Thus, although the noncyclic graph assumption behind most existing agent models (as, for example, in [12]) may be useful for the design of software agents, it seriously limits applicability for modelling more realistic, neurologically founded processes in natural or human-like agents. To perform simulations with agents it is often only the behaviour of the agents that matters, and the internal states can be kept out of the simulation model. Other work shows that automated transformations are possible (1) to obtain an executable internal model for a given behavioural specification (e.g., [13]), and (2) to obtain a behavioural specification from an executable internal model. The approach available for the second type of transformation (cf. [12]) has a severe limitation, as an executable internal model is assumed which has a noncyclic, stratified form. This limitation excludes the approach from being applied to agent models addressing more complex internal processes in which internal loops play a crucial role.
In this paper a more generally applicable automated transformation is introduced from an internal agent model to a behavioural model, abstracting from the internal states. Within this transformation, techniques for loop abstraction are applied by identifying how equilibrium states depend on the inputs for these loops. It is also shown how interaction between loops is addressed. The proposed approach is particularly useful for agent models in which the interaction between cognitive and affective processes plays an important role. Empirical work such as that described in, for example, [8, 10] reports such effects of emotions on beliefs. From the area of neuroscience, informal theories and models have been proposed (e.g., [5, 6]) involving a causal relation from feeling to belief, which is in line, for example, with the Somatic Marker Hypothesis described in [2], and may also be justified by a Hebbian learning principle (cf. [4]). These informal theories have been formalised in an abstracted computational form to obtain internal agent models (e.g., [16]). The transformation is illustrated for two agent models that include interaction between cognitive and affective processes. A single-loop case is illustrated for an existing agent model for emotion-affected beliefs, described in [9]. In addition, a novel agent model with interdependent processes of believing, feeling, and trusting is introduced in this paper, illustrating a case with two interacting loops. The paper is organised as follows. First, in Section 2 the modelling approach is briefly introduced. Section 3 presents the transformation procedure. The applications of the procedure are described in Section 4. Finally, Section 5 is a discussion.
2 Specifying Internal Agent Models As in [12], both behavioural specifications and internal agent models are specified using the reified temporal predicate language RTPL, a many-sorted temporal predicate logic language that allows specification and reasoning about the dynamics of a system. To express state properties ontologies are used. An ontology is a signature specified by a tuple <S1,…, Sn,…, C, f, P, arity>, where Si is a sort for i=1,.., n, C is a finite set of constant symbols, f is a finite set of function symbols, P is a finite set of predicate symbols, arity is a mapping of function or predicate symbols to a natural number. An interaction ontology InteractOnt is used to describe the (externally observable) behaviour of an agent. It is the union of input (for observations and incoming communications) and output (for actions and outgoing communications) ontologies: InteractOnt = InputOnt ∪ OutputOnt. For example, observed(a, t) means that an agent has an observation of state property a at time point t, communicated(a1, a2, m, v, t) means that message m with confidence v is communicated from agent a1 to agent a2 at time point t, and performing_action(b) represents action b. The internal ontology InternalOnt is used to describe the agent’s internal cognitive state properties. Within the state ontology also numbers are included with the usual relations and functions. In RTPL state properties as represented by formulae within the state language are used as terms (denoting objects). The set of function symbols of RTPL includes ∧, ∨, →, ↔: STATPROP x STATPROP → STATPROP; not: STATPROP → STATPROP, and ∀, ∃: SVARS x STATPROP → STATPROP, of which the counterparts in the state language are Boolean propositional connectives and quantifiers. To represent dynamics of a system sort TIME (a set of time points) and the ordering relation > : TIME x TIME are introduced in RTPL. To indicate that some state property holds at some time point the relation at: STATPROP x TIME is introduced. The terms of
RTPL are constructed by induction in a standard way from variables, constants and function symbols typed with all before-mentioned sorts. The set of well-formed RTPL formulae is defined inductively in a standard way using Boolean connectives and quantifiers over variables of RTPL sorts. More details can be found in [12]. Agent models are specified within RTPL in the following format: at(a, t) ⇒ at(b, t+d) where d is the time delay of the effect of state property a on state property b, which for dynamical systems is often indicated by Δt. These state properties may involve variables, for example for real numbers. This format subsumes both causal modelling languages (e.g., GARP [8]) and dynamical system modelling languages based on difference or differential equations (e.g., [10]), as well as hybrid languages combining the two, such as LEADSTO [3].
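To illustrate the executable format at(a, t) ⇒ at(b, t+d), the following minimal Python sketch (our own illustrative code, not part of RTPL or LEADSTO) forward-executes a small set of such delayed temporal rules over a trace of time-stamped state properties.

    # Each rule is (antecedent_property, consequent_property, delay d)
    RULES = [
        ("observed(stimulus)", "belief(stimulus)", 1),
        ("belief(stimulus)",   "performing_action(respond)", 2),
    ]

    def run(initial_facts, horizon):
        # Forward-execute at(a, t) => at(b, t+d) rules over discrete time points
        trace = {(p, t) for p, t in initial_facts}
        for t in range(horizon):
            for ante, cons, d in RULES:
                if (ante, t) in trace:
                    trace.add((cons, t + d))
        return sorted(trace, key=lambda x: x[1])

    print(run([("observed(stimulus)", 0)], horizon=10))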
3 Abstraction of an Internal Agent Model: Eliminating Loops
In this section first the general transformation procedure, as adopted from [12], is described. Next, the contributed loop elimination procedure is addressed, starting with a discussion of the assumptions underlying the procedure, and then showing in more detail how both single loops and interactions between loops can be handled.
The general transformation procedure. The format at(a, t) ⇒ at(b, t+d) is equivalent to at(a, t-d) ⇒ at(b, t), where t is a variable of sort TIME. When a number of such specifications are available for one atom at(b, t), by taking the disjunction of the antecedents one specification in past-to-present format can be obtained: ∨i at(ai, t-di) ⇒ at(b, t). When in addition a form of closed world assumption is made, also the format ∨i at(ai, t-di) ⇔ at(b, t) is obtained, which specifies the equivalence of the state formula b at t with a past formula. This type of format, called pp-format, is used in the abstraction procedure introduced in [12]. The rough idea behind the overall procedure is as follows. Suppose a pp-specification B ⇔ at(p, t) is available. Moreover, suppose that in B only two atoms of the form at(p1, t1) and at(p2, t2) occur, whereas as part of the agent model also specifications B1 ⇔ at(p1, t1) and B2 ⇔ at(p2, t2) are available. Then, within B the atoms can be replaced (by substitution) by the formulae B1 and B2. This results in B[B1/at(p1, t1), B2/at(p2, t2)] ⇔ at(p, t), which again is a pp-specification. Here, for any formula C, the expression C[x/y] denotes the formula C transformed by substituting x for y. Such a substitution corresponds to an abstraction step. In the general case the procedure includes a sequence of abstraction steps; the last step produces a behavioural specification that corresponds to the given agent model.
Assumptions underlying the loop elimination approach
1. Internal dynamics develop an order of magnitude faster than the dynamics of the world external to the agent.
2. Loops are internal in the sense that they do not involve the agent's output states.
3. Different loops have limited mutual interaction; in particular, loops may contain internal loops; loops may interact in couples; interacting couples of loops may interact with each other by forming noncyclic interaction chains.
4. For static input information any internal loop reaches an equilibrium state for this input information.
5. It can be specified how the value for this equilibrium state of a given loop depends on the input values for the loop. 6. In the agent model the loop can be replaced by the equilibrium specification of 4. The idea is that when these assumptions are fulfilled, for each received input, before new input information arrives, the agent computes its internal equilibrium states, and based on that determines its behaviour. Loop elimination setup To address the loop elimination process, the following representation of a loop is assumed at(has_value(u, V1) ∧ has_value(p, V2), t) ⇒ at(has_value(p, V2 + f(V1, V2)d), t+d)
(1)
Here u is the name of an input variable, p the name of the loop variable, t is a variable of sort TIME, and f(V1, V2) is a function combining the input value with the current value for p. Note that an equilibrium state for a given input value V1 in (1) is a value V2 for p such that f(V1, V2) = 0. A specification of how V2 depends on V1 is a function g such that f(V1, g(V1)) = 0. Note that the latter expression is an implicit function definition, and under mild conditions (e.g., ∂f(V1, V2)/∂V2 ≠ 0, or strict monotonicity of the function V2 → f(V1, V2)) the Implicit Function Theorem from calculus guarantees the existence (mathematically) of such a function g. However, knowing of such existence in the mathematical sense is not sufficient to obtain a procedure to calculate the value of g for any given input value V1. When such a specification of g is obtained, the loop representation shown above can be transformed into: at(has_value(u, V1), t) ⇒ at(has_value(p, g(V1)), t+D),
where D is chosen as a timing parameter for the process of approximating the equilibrium value up to some accuracy level. To obtain a procedure to compute g based on a given function f, two options are available. The first option is, for a given input V1, numerical approximation of the solution V2 of the equation f(V1, V2) = 0. This method can always be applied and is not difficult to implement using very efficient standard procedures from numerical analysis, taking only a few steps to reach high precision. The second option, elaborated further below, is to symbolically solve the equation f(V1, V2) = 0 depending on V1 in order to obtain an explicit algebraic expression for the function g. This option can be used successfully when the symbolic expression for the function f is not too complex; it may, however, still be nonlinear. In various agent models involving such loops a threshold function is used to keep the combined values within a certain interval, for example [0, 1]. A threshold function can be defined, for example, in three ways: (1) as a piecewise constant function, jumping from 0 to 1 at some threshold value; (2) by a logistic function of the form 1/(1+exp(-σ(V1+V2-τ))); or (3) by the function β(1-(1-V1)(1-V2)) + (1-β) V1 V2. The first option provides a discontinuous function, which is not desirable for analysis. The third format is used here, since it provides a continuous function, can be used for explicit symbolic manipulation, and is effective as a way of keeping the values between bounds. Note that this function can be written as a linear function of V2 with coefficients in V1 as follows:
f(V1, V2) = β(1 − (1 − V1)(1 − V2)) + (1 − β) V1 V2 − V2 = −[(1 − β)(1 − V1) + β V1] V2 + β V1

From this form it follows that

∂f(V1, V2)/∂V2 = ∂(−[(1 − β)(1 − V1) + β V1] V2 + β V1)/∂V2 = −[(1 − β)(1 − V1) + β V1] ≤ 0
This is only 0 for the extreme cases β = 0 and V1 = 1, or β = 1 and V1 = 0. So, for the general case V2 → f(V1, V2) is strictly monotonically decreasing, which shows that it fulfills the conditions of the Implicit Function Theorem, thus guaranteeing the existence of a function g as desired.
Obtaining the equilibrium specification: single loop case. Using the above expression, the equation f(V1, V2) = 0 can easily be solved symbolically: V2 = β V1 / [(1 − β)(1 − V1) + β V1]. This provides an explicit symbolic definition of the function g: g(V1) = V2 = β V1 / [(1 − β)(1 − V1) + β V1]. For each β with 0

Pe(I′′), ∀ q ∈ EP, if Pi(I) > Pe(I)
Pi(I′) > Pi(I′′), ∀ q ∈ EP, if Pi(I) < Pe(I)
(30)
where I, I′ and I′′ are the gray levels of the voxels x, x′ and q respectively. Then we calculate Pbridge(x) using formula (29). The data consistency term SF at a point x belonging to the interface is defined as a decreasing function of Pbridge(x), and is given by:
SF = gb(Pbridge(xI, ω))
(31)
The decreasing function gb is given by the following formula:

∀ x ∈ [0,1],  gb(x) = 1 − 3x²   if x < 0.5
              gb(x) = 3(1 − x)²  otherwise
(32)
4 Experiments and Results
In order to test the BGFMM for the segmentation of brain structures, several series of experiments were carried out on various databases; in addition, quantitative measurements were performed on a database that provides manual segmentations used as reference (IBSR). For all series of experiments we initialised the number of classes for the FCMA method to 5, in order to detect five regions: gray matter, white matter, cerebrospinal fluid, and two other classes consisting of voxels corresponding to the partial volumes (WM-GM) and (GM-CSF).
Fig. 2. Results of the segmentation of WM (first line), GM (second line) and CSF (third line of the image). The left images represent the 3D view of the results.
Fig. 3. Segmentation of (a) the whole brain and (b) the ventricles using the BGFMM
First, the method was applied to the segmentation of a particular class of tissue (white matter, grey matter, cerebro-spinal fluid). For this purpose, we initialised the surface with a small cube located inside the tissue. An example of the results is shown in figure 2 (in order to improve the visibility of the results, a white mask was superimposed on the surface of the tissues). We also evaluated the method by computing the overlap index DSC (Dice Similarity Coefficient) between the manual segmentations of the three tissues provided with the IBSR database, considered as ground truth, and our results. The similarity values obtained for each tissue over the 18 processed volumes of this database are presented in the curves of figure 4(A). In another series of experiments, we segmented the whole brain and the ventricles. For the whole brain, the segmentation was initialised with a 100×80×80 cube, and the classes
Fig. 4. Values of DSC for the segmentation of: (A) white matter, grey matter, and CSF. (B) whole brain, and the ventricles.
of interest (grey matter + white matter) were automatically determined as described in section 3.4. An example of the results is shown in figure 3, and the quantitative results are presented in figure 4(B). The method gives satisfactory results for the segmentation of the three brain tissues (WM, GM, CSF) across all the databases, although the quality varies somewhat between subjects. The same holds for the brain and the ventricles; in particular, the furrows (sulci) of the brain and the contours of the ventricles are well segmented. Nevertheless, the segmentation of the ventricles is not always perfect, as expressed by a lower DSC value. This can be explained by the fact that the ventricles are structures of small size, which is reflected in the DSC values. The characteristics of the database also influence the results, because the intensity dynamics are reduced, sometimes severely, for certain volumes of the IBSR database.
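The DSC used here is the standard overlap measure DSC(A, B) = 2|A ∩ B| / (|A| + |B|); a minimal NumPy sketch of its computation (illustrative code, not taken from the paper) is:

    import numpy as np

    def dice_coefficient(seg, ref):
        # Dice Similarity Coefficient between two binary 3D label volumes
        seg, ref = seg.astype(bool), ref.astype(bool)
        intersection = np.logical_and(seg, ref).sum()
        denom = seg.sum() + ref.sum()
        return 2.0 * intersection / denom if denom > 0 else 1.0

    # e.g. compare an automatic white-matter mask against the IBSR manual reference:
    # dsc_wm = dice_coefficient(auto_wm_mask, ibsr_wm_mask)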
5 Conclusion
In this paper we have proposed a new approach, based on a deformable model and on a Bayesian model, for the segmentation of 3D medical images. This new model was applied more specifically to brain MRI volumes for the segmentation of brain structures. The approach is divided into two parts. First, a preliminary stage constructs the information map. Then, a deformable model, implemented with the Generalized Fast Marching Method (GFMM), evolves towards the structure to be segmented. Our contribution consists of the use and improvement of the GFMM for the segmentation of 3D images and the design of a robust evolution model based on adaptive parameters. In future work we expect to enrich this evolution model with a priori knowledge (expert knowledge, anatomical atlases, shape models, spatial relations, etc.) to improve the performance of the method and also to extend it to more difficult and complicated applications.
Domain-Specific Modeling as a Pragmatic Approach to Neuronal Model Descriptions

Ralf Ansorg¹ and Lars Schwabe²

¹ Technische Universität Berlin, Dept. of Electrical Engineering and Computer Science, 10623 Berlin, Germany
² Universität Rostock, Dept. of Computer Science and Electrical Engineering, Adaptive and Regenerative Software Systems, 18051 Rostock, Germany
Abstract. Biologically realistic modeling has been greatly facilitated by the development of neuro-simulators, and the development of simulator-independent formats for model exchange is the subject of multiple initiatives. Neuronal systems need to be described at multiple levels of granularity, and compared to other such multi-level systems they also exhibit emergent properties, which are best described with computational and psychological terminology. The links between these levels are often neither clear in terms of concepts nor of the underlying mathematics. Given that modeling and simulation depend on explicit formal descriptions, we argue that rapid prototyping of model descriptions and their mutual relations will be a key to making progress here. Here we propose to adapt the paradigm of domain-specific modeling from software engineering. Using the popular Eclipse platform, we develop the modular and extensible NeuroBench¹ model and showcase a toolchain for code generation, which can also support, mediate between, and complement ongoing initiatives. This may kick-start the development of a multiplicity of model descriptions, which eventually may lead to ontologically sound multi-level descriptions of neuronal systems capturing neuronal, computational, and even psychological and social phenomena.
1 Introduction
Publications of simulation studies of neuronal systems need to contain enough details about the model and its parameters in order to allow for re-implementing the model, but due to the high level of detail such re-implementations are often tedious and time-consuming. Sharing models in terms of scripts for established simulators like NEURON, GENESIS or NEST has been a major step towards model exchange. Recently, these efforts have been extended by the development of simulator-independent model descriptions, such as generating models via Python scripts [8], renewed interest in NeuroML [3], and efforts of the International Neuroinformatics Coordinating Facility (INCF) to develop an open standard for neuronal model descriptions. Compared to systems biology, however, corresponding efforts in computational neuroscience are less developed in terms of model
See http://www.neurobench.org
exchange [9], because a standard as widely accepted as the Systems Biology Markup Language (SBML) [10] is still missing. This advantage of systems biology could be partly traced back to the success of bioinformatics, which has always been a heavily data-driven enterprise depending on common standards and formats. Nowadays, most major publications of models in systems biology provide a model description in SBML, but most publications of models in computational neuroscience do not provide any machine-readable description or executable code. Is this solely due to the fact that neuroinformatics matured after computational neuroscience, as compared to systems biology becoming more popular after bioinformatics? Here we argue that computational neuroscience is facing a few challenges, which are seldom considered in systems biology. For example, the notion of the "biological function" of a certain "neuronal computation" is omnipresent in most computational neuroscience publications, whereas the focus on quantitative models in systems biology has so far masked the possible computational nature of many subcellular processes (but see [14]). Here we identify and focus on four desirable properties of neuronal model descriptions: First, they should be usable by domain experts such as mainly experimentally working neuroscientists, not only neuroinformatics professionals. Second, they should allow for formulating models at different levels of granularity. Examples include quantitative models of synaptic plasticity [12] at a very fine-grained spatial and temporal scale, which may include receptor trafficking, binding mechanisms, or the movements of vesicles. Third, they should allow for formulating models at different levels of abstraction. Examples include phenomenological models of synaptic plasticity [13] without an explicit link to the underlying molecular mechanisms, or computational models aiming at teleological explanations based on first principles derived within domains outside biophysics like information theory, statistical inference, or decision and game theory. Finally, they should allow for formulating explicit links between different ontological spheres. Examples include the link between population activity ("electrical discharges") and computations hypothesized to take place in sensory systems ("algorithms and computations"), or ultimately the link between the electrical neuronal activations and cognitive phenomena like perception and imagery at the phenomenal level. How can we obtain and agree upon a single model description standard with these properties? On the one hand, proper tool support is an essential factor determining the "success" of any new standard. On the other hand, properties of a standard like modularity and extensibility determine the nature of the software ecosystem developing around it. For example, a large and still growing ecosystem has developed around SBML, and the availability of toolboxes and libraries further facilitates its adoption as a standard for model exchange. Thus, for neuronal model descriptions both the tool support as well as the openness and extensibility of the standard need to be considered. Here we argue that a single model description with all four properties is neither achievable nor desirable,
but aiming at many different model descriptions with explicit relations between each other is a promising and feasible approach. More specifically, we suggest applying the software engineering methodology of domain-specific modeling (DSM) to the modeling of neuronal systems. We suggest making use of model-to-model transformations to interface between intentionally very specific domain models and more general-purpose model descriptions, which are suitable for driving popular neuro-simulators via automatic code generation. Here we develop a modular and extensible model for a general-purpose neuronal model description and describe a domain-specific "visual cortex modeling language" as a toy example to showcase model-to-model transformation and code generation using the popular Eclipse platform [7,11]. Our main contribution in this work is to demonstrate that industry-proven methods and tools can be applied in a neuroinformatics setting for the benefit of fostering a community-based development of a multitude of model descriptions, which acknowledge the multiplicity of "world views" in the neuroscience community but, due to the model-to-model transformations (and finally the code generation), are explicitly related to each other.
2 Background
2.1 Domain-Specific Modeling
DSM is a software engineering methodology where highly specific modeling languages are used in order to generate software via fully automatic code generation directly from the models. This shall be contrasted with the use of general-purpose modeling languages like the Unified Modeling Language (UML). The UML defines graphical notations for concepts which are already present in object-oriented programming languages. While UML certainly eases communication between developers, or software manufacturers and their clients, it does not raise the level of abstraction much above the abstractions already present in the programming languages. In contrast, the so-called domain-specific languages (DSLs) of DSM make use of concepts from the particular application domain. Together with the fully automatic code generation from models expressed in such DSLs, the productivity gain of DSM compared to more classical methods using general-purpose languages like UML can be dramatic [17]. Applying DSM calls for redefining the roles of the people involved in software development. A major change in DSM compared to more classical approaches is the separation between a domain expert and the author of the code generators. While the latter is usually a well-experienced programmer with much knowledge about the target platform, the former only needs to be an expert in the application domain. Most important, however, is the active involvement of the domain expert in the development process: using a DSL, the domain expert is doing the actual development and the "programming" via the code generators as tools. For that reason, the author of the code generators is sometimes referred to as the "toolsmith," which is the terminology we adopt here.
Since a DSL has to be developed for each particular application domain, DSM can only be applied successfully with sufficient experience in developing, continuously improving, refining, and adapting the DSLs, where proper tool support is crucial. The former methodological aspect is the subject of ongoing research and calls for communicating "best practices" between practitioners. In terms of tool support, multiple commercial (see, e.g., [2,16]) and open source options are available. While individual projects and approaches may differ in terms of technology and in the specificities of the proposed methodology, a common property is their productive use of models beyond plain documentation purposes. Here we use the popular Eclipse platform in order to develop a toolchain, which builds upon well-established object-oriented paradigms, methodologies, and formats like MOF and the OMG's MDA.
2.2 OMG's MOF and MDA with QVT
The Object Management Group (OMG) is an international consortium which now focuses on modeling and the corresponding standards. Currently, the main application is in model-driven architecture (MDA), which is a model-based approach for developing software systems, but the methods and standards are applicable in the broader field of engineering in general (see [4] for a prominent example). One standard proposed by the OMG is the Meta-Object Facility (MOF), which is a DSL to define metamodels, such as the metamodel for the UML. More specifically, MOF proposes a multi-layered architecture: objects in the real world have a one-to-one correspondence to objects in the runtime environment (M0-model), which are instances of classes defined in a domain-specific model (M1-model). For example, individual book objects are entities in M0-models, whereas the class book is defined in an M1-model. The domain-specific M1-models are expressed using structures defined in the layer above (M2-model or metamodel). The most frequently used metamodel is the UML, which specifies how classes and relations between classes in M1-models are defined. For example, the UML metamodel states that classes have methods and attributes. Metamodels other than UML could specify how, for example, other M1-models like Petri nets or tables in relational databases are defined. In a similar manner, the MOF specifies how M2-models are defined. For that reason, the MOF is an M3-model and can be viewed as a meta-metamodel or a DSL for describing metamodels. A major technological contribution of the OMG is the specification of the XML Metadata Interchange (XMI) format. It was initially developed in order to exchange metadata and hence became a de facto standard for exchanging UML models, but it can also be used to exchange instances of models. Note that, for example, a particular UML model is nothing but an instance of the UML metamodel. In order to use models in a productive way within MDA, the OMG specified Query/View/Transformation (QVT). Using QVT, a transformation from a source to a target model can be defined and executed without user intervention. Hence, QVT is a candidate for a key tool in a DSM toolchain. However, the extent to which DSM shall make use of such model-to-model transformations
compared to a one-shot transformation from a high-level description into code is the subject of an ongoing debate [17].
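To make the layered picture concrete, the toy Python sketch below mimics the M-levels with plain data structures and shows a trivial model-to-model transformation in the spirit of QVT; it is purely illustrative and does not use the actual OMG or Eclipse tooling.

    # M2 (metamodel): which constructs an M1 model may contain (here: classes with attributes)
    METAMODEL = {"Class": {"attributes": list}}

    # M1 (model): a domain-specific model conforming to the metamodel
    book_model = {"Class": "Book", "attributes": ["title", "author"]}

    # M0 (instances): objects conforming to the M1 model
    books = [{"title": "Attention and Effort", "author": "Kahneman"}]

    def to_relational(m1_model):
        # A minimal model-to-model transformation: class model -> relational table schema
        return {"table": m1_model["Class"].lower() + "s",
                "columns": list(m1_model["attributes"])}

    print(to_relational(book_model))   # {'table': 'books', 'columns': ['title', 'author']}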
3 Proposal for the Extendable NeuroBench Model
3.1 The Core of the NeuroBench Model
Here we propose the NeuroBench model as a modular and extensible model for describing neuronal models. To be specific in terms of terminology: We propose to follow the OMG’s approach to MDA and develop an M1-model with instances of this model corresponding to descriptions of neuronal models. The core of this model is shown in Fig. 1. Root objects of model descriptions are of class ModelDesc, which contain general information such as name, version, or author. The actual model elements are contained within the root object and are of class ModelElement. Subclasses of ModelElement may define specific models like, for example, a single compartment Hodgkin-Huxley neuron model. Most important, however, is the explicit distinction between ModelElement and Factory, which itself is a ModelElement. Model-to-model transformations and code generators need to operationalize this distinction as follows: When processing a NeuroBench model description, only objects of class Factory shall trigger the instantiation of Factory.copies objects in the destination model or the generation of code, based upon the ModelElement referenced by the Factory as template. While such a constructive approach is part of other neuronal model descriptions [3] in order to allow for constructing, for example, large populations of model neurons without enumerating them, a key advantage of our model is that every Factory is also a ModelElement. This allows a very compact compositional description of neuronal models of even very large neuronal structures like populations of populations of neurons, which in a certain context may correspond to a functionally defined neocortical brain area (see Sec. 5 for examples). Besides referencing a ModelElement as a template, a Factory may or may not have associated labelings (class Labeling) for the to-be-created objects. Such labelings can be used to attach spatial positions or other non-physiological attributes to the to-be-created objects, which can be used when establishing connections between them. As to whether such a labeling shall be created for each processing of a Factory (like randomly assigning spatial positions to each created model neuron) or shared between multiple invocations for object creation is represented in the isGlobal attribute. A Factory as well as a ModelElement can be compositional (CompositeFactory and CompositeModelElement, the latter not shown in Fig. 1), which serves mainly to structure larger model descriptions. Another key property of the core model is the specification of state variables of model elements. Attributes of subclasses of ModelElement shall be viewed as constants like the membrane time constant in a possible subclass IaFNeuron, which could represent an integrate-and-fire model neuron. However, the membrane potential of a model neuron is certainly a state variable, which needs to be allocated for each created model instance. Our core model separates the definition of such state variables into the specification and the declaration. For
For example, a variable specification named “membrane_potential” (VarSpec.name) together with the type float (or an SI unit in a future revision of the core model) would be contained as a VarSpec in the root object, which is then available for the whole model description. Each ModelElement contains declarations (ModelElement.vars) for the state variables, which contain an Initializer. Initializers shall drive code generators such that state variables are set to initial values. Note that code generators can also make the initialization depend on the labelings attached to a Factory. We make use of this in the examples in Sec. 5. Connections between ModelElement objects are represented by a DirectedLink. While such a connection could correspond to a synaptic connection between model neurons, a DirectedLink could also represent a link between two compartments of a multi-compartment neuron model, or a whole projection pattern between two populations of neurons. In the latter case, the source and destination objects would be of class Factory having a model neuron, i. e. a subclass of ModelElement, as the template.
Fig. 1. Core of the NeuroBench model for neuronal model descriptions. The root object of any model description is of class ModelDesc, and the ModelElements describe the individual model elements in greater detail. Particular neuron or synapse models are subclasses of ModelElement and DirectedLink, respectively. A key property of the core model is to consider a Factory of model elements as a ModelElement itself, which allows for compact compositional descriptions of even very large neuronal structures as a Factory’s template can also be a Factory. See text for a detailed explanation.
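To make the structure of Fig. 1 easier to follow, here is an illustrative Python mirror of the core classes as we read them from the figure and the text; it is a sketch for explanation only, not the actual Ecore/EMF definition shipped with NeuroBench, and all default values are our assumptions.

# Illustrative Python mirror of the core model in Fig. 1 (not the shipped Ecore model).
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class VarSpec:                      # specification of a state variable (name + type)
    name: str
    type: str = "float"

@dataclass
class Initializer:                  # how a declared state variable gets its initial value
    defaultValue: float = 0.0

@dataclass
class VarDecl:                      # declaration: binds a VarSpec to an Initializer
    varspec: VarSpec
    initializer: Initializer

@dataclass
class Labeling:                     # non-physiological attributes, e.g. spatial positions
    name: str
    type: str = "float"
    isGlobal: bool = False

@dataclass
class ModelElement:                 # base class of all model elements
    name: str
    comment: str = ""
    vars: List[VarDecl] = field(default_factory=list)

@dataclass
class DirectedLink(ModelElement):   # connection between a source and a destination element
    src: Optional[ModelElement] = None
    dst: Optional[ModelElement] = None

@dataclass
class Factory(ModelElement):        # a Factory is itself a ModelElement (key property)
    template: Optional[ModelElement] = None
    copies: int = 1
    labelings: List[Labeling] = field(default_factory=list)

@dataclass
class CompositeFactory(Factory):    # groups factories to structure larger descriptions
    factories: List[Factory] = field(default_factory=list)

@dataclass
class ModelDesc:                    # root object of a model description
    name: str
    author: str = ""
    version: str = ""
    varspecs: List[VarSpec] = field(default_factory=list)
    modelfactory: Optional[CompositeFactory] = None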
3.2 Extensions for Neurons, Synapses and Connectivity
Extending the core model with particular models for neurons and synapses is straightforward: For specialized neuron models and synapses, the ModelElement or DirectedLink class needs to be extended, respectively. The details of such subclasses are not of interest here, as they could be defined on demand using the definition of a particular target simulation platform, referring to external definitions in other standards like SBML, or using the “lowest common denominator approach” of PyNN [8]. Note that if a particular model does not call for any constants, there is also no need to subclass ModelElement or DirectedLink. While we have subclassed ModelElement into a LinearRateNeuron with threshold and gain as constants in order to describe firing rate models, we attached a state variable called Weight to a DirectedLink in order to describe the coupling between such rate-based neuron models, i. e. no subclassing of DirectedLink is necessary in such cases; but for kinetic synapse models one may want to store the transition rates between states of synaptic channels as constants in a proper subclass of DirectedLink. So far, the support of the NeuroBench model for connectivity is kept intentionally minimal, but it still allows for DSM with full code generation. The model defines a FullDirectedLinkFactory subclass of Factory and demands that the corresponding template of this factory is a DirectedLink. In other words, such a special factory shall trigger the creation of DirectedLink objects, which connect model elements. The core model also defines a subclass ToolsmithDefinedInitializer of Initializer with funname (of type String), as well as arg1, arg2, etc. (of type Double) as attributes. If an initializer for the state variables of such a DirectedLink (the template of a FullDirectedLinkFactory) is a ToolsmithDefinedInitializer, then the code generator shall delegate the initialization to a function funname of the desired target platform with optional arguments arg1, arg2, etc. Here, the code generator may or may not pass the values of the labelings as additional arguments. We defined all extensions in separate models, which import the NeuroBench core model (Fig. 1). In the same manner, other users can extend the core model or our extensions and develop their own code generators.
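Continuing the illustrative Python sketch above (and reusing its classes), the extensions named in this subsection could look roughly as follows; the class and attribute names LinearRateNeuron, FullDirectedLinkFactory, ToolsmithDefinedInitializer, gain, th and funname follow the text, while collapsing arg1, arg2, etc. into a single list is our simplification.

# Illustrative extensions of the core sketch above (not the shipped NeuroBench models).
from dataclasses import dataclass, field
from typing import List

@dataclass
class LinearRateNeuron(ModelElement):     # constants only; state lives in VarDecls
    gain: float = 1.0
    th: float = 1.0                       # threshold of the threshold-linear activation

@dataclass
class FullDirectedLinkFactory(Factory):   # template must be a DirectedLink; triggers
    pass                                  # all-to-all creation of links between elements

@dataclass
class ToolsmithDefinedInitializer(Initializer):
    funname: str = ""                                     # target-platform init function
    args: List[float] = field(default_factory=list)       # arg1, arg2, ... as one list here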
4 An Eclipse-Based Toolchain
4.1 The Eclipse Platform
DSM claims to raise the level of productivity in software development similarly to the move from assembler to higher-level programming languages [17]. While such productivity increases depend on a few tool-independent abstractions, proper tool support is essential for DSM. While the OMG’s approach to MDA is only one among other approaches (see, e. g., [6,16]), building upon OMG standards such as MOF and XMI prevents re-inventing some wheels and ensures interoperability among tools and libraries. The popular Eclipse platform makes intensive use of OMG standards, and we selected it as the ecosystem for the NeuroBench model
and toolchain. In short, Eclipse was originally developed by IBM in order to unify all its development environments on a single code base. While it is widely known as an Integrated Development Environment (IDE), it provides a whole ecosystem for software and system development purposes, now based upon the Equinox OSGi framework. As an IDE it supports multiple target programming languages, and as a DSM infrastructure it supports any target platform.
4.2 M2M and M2T in the Eclipse Modeling Project
The Eclipse Modeling Project [1] brings together multiple Eclipse-based technologies for MDA. The NeuroBench toolchain has been set up using the M2M and M2T subprojects. The M2M subproject realizes so-called model-to-model transformations. For example, a model of decision making (defined with computational concepts) would be translated into a neuronal network model (defined with biophysical concepts) using M2M’s technologies. We make use of QVT for such transformations. The M2T subproject is used for model-to-text transformations, i. e. for code generation. We make use of Xpand for these tasks. All NeuroBench software artifacts, together with a short tutorial-like introduction for applying them to neuronal modeling, can be downloaded from www.neurobench.org/publications/braininf2010.tgz as supplementary online material (SOM).
5 Examples
5.1 Recurrently Connected Rate-Based Neurons
Let us consider a first example in order to demonstrate how neuronal models can be expressed using the NeuroBench model. Here we consider the structural description of a simple recurrent network model with, say, N = 191 excitatory neurons (Fig. 2a). Each neuron has an activity state variable, and with each neuron i a one-dimensional position xi is associated. The N = 191 neurons shall be equally spaced between the positions −10 and 10. The neurons shall be recurrently connected, where the weight of the synaptic connection between any two neurons is computed by a user-defined function, which may depend on the positions of the neurons. Fig. 2b shows an instance of the corresponding NeuroBench model as an object diagram. We defined a LinearRateNeuron as a subclass of ModelElement, and a LinspaceLabeling as a subclass of Labeling. The former represents a rate-based neuron with a threshold-linear activation function, the latter the particular labeling strategy of equal spacing. The corresponding XMI file storing these objects has been created and edited with an editor generated from the NeuroBench model using the Eclipse Modeling Framework (see SOM). First, note the strict usage of Factory objects, each having their own template. While the CompositeFactory referenced by the root object serves to structure the model description and does not have a template itself, the two contained Factory objects reference a ModelElement, i. e. a LinearRateNeuron and a SynapticConnection (subclassed from DirectedLink).
Fig. 2. Example of a network with N = 191 recurrently connected neurons. a) Illustration of the network model with neurons as circles and recurrent synaptic connections. b) Instance of the corresponding NeuroBench model as an object diagram. Not shown is the containment of the two VarSpecs in the ModelDesc.
Second, note the separation of the variable specifications and declarations. Each of the ModelElement objects declares a state variable (“Activity” and a “Weight”). Third, note that the initialization of the state variables is delegated to Initializer objects. In order to showcase code generation within the proposed DSM approach, we set up so-called Xpand templates in order to transform such a NeuroBench model into proper Matlab code. Listing 1 shows the definition and initialization of the state variables, and a different set of templates was used to generate C code (see SOM). Variables of ModelElement and DirectedLink became vectors and matrices, respectively. Code generation for neuro-simulators would be even more straightforward, as they already provide many domain-specific abstractions for general purpose neuronal modeling. In a second example we extended the model shown in Fig. 2b in order to showcase how small changes to a NeuroBench model description together with the very same code generation templates can be used in order to model large neuronal structures such as a sheet of recurrently connected orientation hypercolumns in a model of orientation tuning and contextual effects in primary visual cortex (V1) (see, e. g., [15]). In addition to another input state variable, each neuron has two associated labels, an orientation bias corresponding to a
“preferred orientation” of an orientation-selective V1 neuron and a position. Most important, however, is the creation of neurons via a factory of factories. Here, the Factory “ExcSheet” (with ExcSheet.copies=25) has another Factory “ExcNeurons” (with ExcNeurons.copies=80) as a template. The FullDirectedLinkFactory “ExcToExcHCs” has ExcToExcHCs.src = ExcToExcHCs.dst = ExcSheet. It connects a factory of factories with another factory of factories, which is possible due to Factory being a ModelElement itself. As toolsmiths we exploited this in the code generation templates and generated the code in Listing 2. Note that the values of the labels are passed to the user-defined function initializing the weights.
Listing 1. Matlab code generated from the first example
mExcNeurons_Activity = zeros(191);
for iExcNeurons = 1:191
  mExcNeurons_Activity(iExcNeurons) = 0.0;
end
mExcToExcPrj_Weight = zeros(191, 191);
for iDst_ExcNeurons = 1:191
  vLabelsDst = [];
  vLabelsDst = [vLabelsDst vExcNeurons_Position(iDst_ExcNeurons)];
  for iSrc_ExcNeurons = 1:191
    vLabelsSrc = [];
    vLabelsSrc = [vLabelsSrc vExcNeurons_Position(iSrc_ExcNeurons)];
    mExcToExcPrj_Weight(iDst_ExcNeurons, iSrc_ExcNeurons) = ...
      myWeightInitFun(vLabelsDst, vLabelsSrc, 3.0, 1.0, 19.0);
  end
end
Listing 2. Matlab code generated from the second example
N_EXCNEURONS = 80;
% Labelings for 'ExcNeurons'
vExcNeurons_OrientationBias = ...
  linspace(-90.0, 90.0, N_EXCNEURONS);
vExcNeurons_Position = ...
  linspace(23.0, -42.0, N_EXCNEURONS);
% State variables for 'ExcNeurons'
mExcSheet_ExcNeurons_Activity = zeros(25, 80);
mExcSheet_ExcNeurons_Input = zeros(25, 80);
for iExcSheet = 1:25
  for iExcNeurons = 1:80
    mExcSheet_ExcNeurons_Activity(iExcSheet, iExcNeurons) = 0.0;
    mExcSheet_ExcNeurons_Input(iExcSheet, iExcNeurons) = 0.0;
  end
end
% State variables for 'mExcToExcHCs'
mExcToExcHCs_Weight = zeros(25, 80, 25, 80);
for iDst_ExcSheet = 1:25
  for iDst_ExcNeurons = 1:80
    vLabelsDst = [];
    vLabelsDst = [vLabelsDst]; % for 'ExcSheet'
    vLabelsDst = [vLabelsDst vExcNeurons_OrientationBias(iDst_ExcNeurons) ...
      vExcNeurons_Position(iDst_ExcNeurons)]; % for 'ExcNeurons'
    for iSrc_ExcSheet = 1:25
      for iSrc_ExcNeurons = 1:80
        vLabelsSrc = [];
        vLabelsSrc = [vLabelsSrc]; % Labels for ExcSheet
        vLabelsSrc = [vLabelsSrc vExcNeurons_OrientationBias(iSrc_ExcNeurons) ...
          vExcNeurons_Position(iSrc_ExcNeurons)]; % for 'ExcNeurons'
        mExcToExcHCs_Weight( ...
          iDst_ExcSheet, iDst_ExcNeurons, ...
          iSrc_ExcSheet, iSrc_ExcNeurons) = ...
          myOriDiffInit(vLabelsDst, vLabelsSrc, 10.0, 0.0, 0.0);
      end
    end
  end
end
5.2 Model-to-Model Transformation
The second example suggests that the use of a Factory object as a template of another Factory, together with CompositeFactory objects for structuring larger models, may be sufficient for building and exchanging models of large neuronal structures. However, while modeling and code generation using the NeuroBench model is already powerful and, due to customizable Xpand templates, flexible, true DSM shall raise the level of abstraction by making use of highly specific DSLs. In order to highlight this distinction, we set up a toy DSL for modeling in the visual system (called “VDSL”), which contains only a single construct: a class OrientationHypercolumn with the single attribute cols to define the number of orientation columns within an orientation hypercolumn. This class is not related to any class in the NeuroBench model, because the latter is intended for general purpose modeling, whereas the VDSL shall be used by a visual system domain expert. More specifically, QVT scripts (see SOM) translate each instance of an OrientationHypercolumn object into a Factory with a LinearRateNeuron as a template, proper recurrent synaptic connections, etc.
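The actual transformation is implemented as QVT scripts (see SOM). Purely as an illustration of the mapping logic, and reusing the Python classes sketched in Sec. 3, the translation of the single VDSL construct could be expressed as follows; the concrete parameter values and element names are assumptions.

# Illustration only: the real transformation is a set of QVT scripts (see SOM).
from dataclasses import dataclass

@dataclass
class OrientationHypercolumn:      # the single construct of the toy "VDSL"
    cols: int                      # number of orientation columns per hypercolumn

def vdsl_to_neurobench(hc: OrientationHypercolumn) -> Factory:
    """Map one OrientationHypercolumn onto a Factory of rate-based neurons."""
    neuron = LinearRateNeuron(name="V1Neuron", gain=1.0, th=1.0)
    bias = Labeling(name="OrientationBias", type="float", isGlobal=True)
    hypercolumn = Factory(name="Hypercolumn", template=neuron,
                          copies=hc.cols, labelings=[bias])
    # recurrent connectivity within the hypercolumn: an all-to-all link factory
    # whose template is a DirectedLink from the population onto itself
    recurrent = FullDirectedLinkFactory(
        name="RecurrentLinks",
        template=DirectedLink(name="Synapse", src=hypercolumn, dst=hypercolumn))
    return CompositeFactory(name="HypercolumnModel",
                            factories=[hypercolumn, recurrent])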
6 Discussion
Here we argued for adopting DSM for developing neuronal model descriptions. We developed the NeuroBench model for general purpose neuronal modeling, and we presented a toolchain based upon the popular Eclipse platform. However, in which ways could the promise of DSM for increased productivity carry over to neuronal modeling, and in which ways is the DSM approach different from ongoing initiatives? Current initiatives are facing the major challenge of converging on common model descriptions and formats, which involves proper abstractions for general purpose modeling. Our NeuroBench model is yet another general purpose model description. We argue that this multiplicity of model descriptions is beneficial for neuroinformatics at this point, because only such a grass-roots approach to model descriptions will ultimately yield widely accepted and usable standards. In particular, we selected the OMG’s approach to MDA as a conceptual basis, and the popular Eclipse platform as the infrastructure. In other words, our approach is inherently open, because mediating between different model descriptions becomes just another model-to-model or model-to-text transformation. For example, the NeuroML initiative provides XML Schema definitions, which can be readily imported as a model into the Eclipse Modeling Project, and hence be the source or target of model-to-model transformations. Another example is PyNN, which can be considered as a prime target for model-to-text transformations. Thus, the promise of DSM could be fulfilled in neuronal modeling as long as neuroscience modelers finally address a currently underappreciated but intellectually stimulating enterprise: meta-modeling and the explicit formulation of links between different levels of abstraction. We also need to point out that the NeuroBench model has been constructed as a minimal model located at a strategic position within the hierarchy of
abstractions. It covers only structural aspects, but so far no explicit description of the dynamics of a model. We argue that such aspects could also be delegated to the toolsmith rather than being stated explicitly in a model description (although the latter would be desirable). In other words, we intentionally ignore the definition of the operational semantics of a model, and in this sense our approach is pragmatic. Future extensions of the NeuroBench model and transformations to other more fine- and coarse-grained descriptions, as well as connections to modeling approaches rooted in psychology such as, for example, ACT-R [5], will show if the power of DSM can be unleashed in neuronal modeling. In other words, probably the best way to evaluate and compare our approach is to measure increases in productivity for formulating and simulating models spanning multiple levels of granularity and abstraction.
References
1. The Eclipse Modeling Project, http://www.eclipse.org/modeling/
2. MetaCase, http://www.metacase.com/
3. NeuroML, http://www.neuroml.org/
4. Open System Engineering Environment (OSEE), http://www.eclipse.org/osee/
5. Anderson, J.R., Byrne, M.D., Douglass, S., Lebiere, C., Qin, Y.: An integrated theory of the mind. Psychological Review 111(4), 1036–1050 (2004)
6. Czarnecki, K., Eisenecker, U.W.: Generative Programming: Methods, Tools, and Applications. Addison-Wesley, Reading (2000)
7. Paternostro, M., Merks, E., Steinberg, D., Budinsky, F.: EMF: Eclipse Modeling Framework, 2nd edn. Addison-Wesley, Reading (2009)
8. Eppler, J.M., Kremkow, J., Muller, E., Pecevski, D.A., Perrinet, L., Yger, P., Davison, A.P., Bruederle, D.: PyNN: a common interface for neuronal network simulators. Front. Neuroinform. 2 (2008)
9. De Schutter, E.: Why are computational neuroscience and systems biology so separate? PLoS Comput. Biol. 4(5), e1000078 (2008)
10. Hucka, M., et al.: The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 19(4), 524–531 (2003)
11. Gronback, R.C.: Eclipse Modeling Project: A Domain-Specific Language (DSL) Toolkit. Addison-Wesley, Reading (2009)
12. Kotaleski, J.H., Blackwell, K.T.: Modelling the molecular mechanisms of synaptic plasticity using systems biology approaches. Nat. Rev. Neurosci. 11(4), 239–251 (2010)
13. Morrison, A., Diesmann, M., Gerstner, W.: Phenomenological models of synaptic plasticity based on spike timing. Biol. Cybern. 98(6), 459–478 (2008)
14. Regev, A., Shapiro, E.: Cellular abstractions: Cells as computation. Nature 419(6905), 343 (2002)
15. Schwabe, L., Obermayer, K., Angelucci, A., Bressloff, P.C.: The role of feedback in shaping the extra-classical receptive field of cortical neurons: a recurrent network model. J. Neurosci. 26(36), 9117–9129 (2006)
16. Simonyi, C.: Intentional software, http://intentsoft.com/
17. Tolvanen, J.-P., Kelly, S.: Domain-Specific Modeling: Enabling Full Code Generation. Wiley, Chichester (2008)
Guessing What’s on Your Mind: Using the N400 in Brain Computer Interfaces
Marijn van Vliet, Christian Mühl, Boris Reuderink, and Mannes Poel
University of Twente, Human Media Interaction, Enschede 7522 NB, NL
[email protected] Abstract. In this paper, a method is proposed for using a simple neurophysiological brain response, the N400 potential, to determine a deeper underlying brain state. The goal is to construct a BCI that can determine what the user is ‘thinking about’, where ‘thinking about’ is defined as being primed on. The results indicate that a subject can prime himself on a physical object by actively thinking about it during the experiment, as opposed to being shown explicit priming stimuli. Probe words are presented that elicit an N400 response whose amplitude is modulated by the associative relatedness of the probe word to the object the user has primed himself on.
1 Introduction
Brain Computer Interfaces (BCI) are devices that let a user control a computer program without any physical movement. A BCI measures the activity within the brain directly, interprets it and sends a control signal to a computer. By actively or passively changing his brain activity, the user can send different control signals and, by doing so, operate the system. The effectiveness of a BCI depends highly on the ability to measure relevant processes within the brain and on the performance of the classification of the signal.
1.1 Low Level Brain Responses versus High Level Cognitive Processes
Without significant progress in recording and signal processing technology, brain-computer interfaces (BCI) that rely on electroencephalography (EEG) recordings only have access to basic, neurophysiological responses. Examples include the P300 response, event related (de)synchronization (ERD/ERS) and steady state visually evoked potentials (SSVEP). These directly measurable phenomena can be regarded as low level responses. They are manifestations of higher level cognitive processes that are more complex and cannot be directly measured, such as object recognition, intention of movement and visual processing. When measuring the low level responses, most information about the specifics of the higher level processing is lost. However, the low level responses can give insight into the higher level brain processes by using probes in a search scheme. By using probes, different possibilities for the high level brain state can be tested until the correct one is found.
One example of existing BCIs that try to determine a deeper underlying brain state are those that try to determine the memories of the user, usually by exploiting the P300 effect [9]. The P300 potential is linked to an oddball task: when multiple stimuli are presented to the subject, the task-related interesting stimulus will elicit a bigger P300 than an uninteresting one, due to a response triggered by increased attention upon recognition. This method allows the user to both consciously control which stimulus to select by focusing on a task and unconsciously be probed by stimuli that trigger a recognition response. The unconscious probes could for instance be used to determine whether a subject looked into a box containing some objects or not. He would be instructed to look for images of birds in a collection of various photographs. Photographs of birds and those of objects in the box both elicit an enlarged P300 potential in relation to irrelevant photographs [1]. In this paper, the N400 potential is used as the low level brain response in a probe-based search scheme to determine the high level brain state. While the P300 is related to attention, the N400 has been associated with semantic processing [6].
1.2 N400
The N400 potential was first discovered by Kutas et al. [5], who were analyzing the Event Related Potential (ERP) of subjects that were reading sentences. They studied the effect of adding words that did not make sense given the preceding ones in a sentence, in order to get an insight into brain activity during semantic parsing. To this end, a set of sentences was created of which half ended in a semantically congruent word (e.g. I drink coffee with milk and sugar) and half ended in an incongruent word (e.g. I drink coffee with milk and socks).
Fig. 1. Grand-average ERP of a subject who was shown seven-word sentences (channel PO3; time in ms). When the last word is shown, there is a distinctive difference between a word that lies in the line of expectation (labeled as correct) and a word that does not (labeled as incorrect).
The grand average ERPs of both classes, recorded at position PO3, are shown in fig. 1 for one subject. Each second, a word is flashed on the screen and a recognizable series of ERP components reappears after the onset of each word. One of these components, which appears around 400ms after the word is shown, changes amplitude when the word is unrelated to the rest of the sentence and is called the N400 potential. The N400 has been linked to the concept of priming. Priming is an improvement in performance in a perceptual or cognitive task, relative to an appropriate baseline, which is caused by previous, related experience. In semantic priming, the ability to decode a stimulus carrying meaning is improved by previous exposure to a stimulus with a related meaning. In the experiment described above, the subject is primed on the first six words of a sentence, and the seventh word is either semantically related to the prime or not. It was later discovered that the N400 effect not only occurs in sentences; a whole range of stimuli can be used. The underlying strategy is to first show a prime stimulus and, a short time after, show a probe stimulus, where the prime and probe can for example be word–word, image–word, image–image or even sound–word pairs [2,10]. There are two competing theories as to the cause of the N400 potential [12]. The integration view states that the N400 is caused by a difference in the difficulty of integrating the symbol in a context. This theory is obviously in line with the results of the experiments of Kutas et al., where the last word had to be integrated with the rest of the sentence. The results with word pairs can be explained by regarding the first word as a context into which the second word has to be integrated. The lexical view states that the N400 is caused by a difference in the difficulty of long term memory access. According to the spreading activation model, the activation of a symbol from memory causes nearby symbols to preactivate, making subsequent access of these symbols easier. Findings in [8], in which fMRI is used, and [7], in which MEG is used to localize the N400 effect, suggest that the effect is primarily due to facilitated memory access, which makes a case for the lexical view.
1.3 Goals of the Present Research
Like the P300, the N400 potential can give insight into high level brain processes by using probes. In this paper, the possibility is explored of using the N400 potential to determine which one out of several possible objects the user is thinking of, for example, to differentiate between the user thinking about a coffee mug or a tomato. The advantage of using the N400 over the P300 is that the N400 effect will occur not only on stimuli that correspond exactly to the object the subject is thinking of, but also on stimuli that are closely related. This could in the future allow a system to deploy a binary search algorithm in order to find the target object, allowing a much larger choice for the subject. For instance, the system can first try to determine whether the object is a living
organism or not, before descending down the search tree, playing a BCI version of 20 questions (a game in which the player is allowed 20 yes-or-no questions to determine what the opponent is thinking of; see also http://www.20q.net) with the user. When showing a probe word, the N400 effect can be used to detect whether the subject was primed on a stimulus related to the probe word or not. Current N400 research does not leave any choice to the subject as to which stimulus he will be primed on, so the prime will always be known in advance. If, however, the subject is allowed to choose his prime and this choice can be detected, the N400 potential will be a useful feature to use in a BCI. This choice presents a problem when showing the priming stimulus: how can a system know which stimulus the subject wants to be primed on? Showing the stimuli corresponding to all the choices will most likely prime the subject on all of them, disallowing any choice. In this paper, a method is investigated in which the subject is not presented with a priming stimulus, but must achieve the proper priming by actively thinking about a physical object. When a user has primed himself on an object, showing probe words corresponding to all the possibilities and examining the N400 potential elicited by them may enable an automated system to determine which object the user was thinking about. The ambiguous term ‘thinking about’ is now defined as ‘being primed on’. Since the priming effect occurs for many different types of stimuli, such as words, images and sounds, the hypothesis that a subject can prime himself when told to think about an object is plausible and is evaluated with an experiment. The goal of the experiment is to determine whether a subject can prime himself on an object without being shown a priming stimulus.
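The probe-driven binary search is only envisioned here, not implemented in this study. Assuming a future classifier that turns the EEG response to a probe word into a relatedness score (the measure_relatedness callable below is hypothetical, as is the tree structure), the descent could be sketched as follows.

# Hedged sketch of the envisioned 20-questions-style search (not part of this study).
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    probe_word: str                 # word shown as probe for this category or object
    label: str = ""                 # object name at a leaf
    children: List["Node"] = field(default_factory=list)

def guess_object(tree: Node, measure_relatedness) -> str:
    """Descend a binary semantic tree, probing with a category word at each node."""
    node = tree
    while node.children:
        left, right = node.children
        # hypothetical: higher score means a stronger (less attenuated) priming effect
        node = left if measure_relatedness(left.probe_word) >= measure_relatedness(right.probe_word) else right
    return node.label or node.probe_word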
2 Method
Probe words have been prepared that correspond to one of two possible primes (e.g. a book or a bottle). The problem is how to convey the choices of prime to the subject. Telling him the choices may cause him to be primed on all of them. In the experiment, the subject was therefore not given a choice. He was given a physical object, such as a book, mug or tomato, to hold. Physical objects were chosen so as not to limit the subject’s mind to a visual, auditory or linguistic stimulus, but to allow him to choose for himself how to go about thinking about the object. In order to promote the latter, two auditory beeps are played before showing the probe word. The subject is instructed to close his eyes on the first (low-pitched) beep and concentrate on the object, and to open his eyes on the second (high-pitched) beep and look at the screen, where the probe word appears.
2.1 Participants
The experiment was performed on three participants aged between 23 and 28, all of whom were male, right-handed, and native speakers of Dutch. They were placed in a comfortable chair in front of a desk with a computer screen and did not leave the chair for the duration of the experiment.
2.2 Design
Each participant completed one session which consisted of three blocks. The procedure during a block is as follows:
1. An object is given to the subject, allowed to be held, and placed before him on the table.
2. Instructor leaves the room.
3. 100 trials are performed: 50 trials matching the object shown to the subject, 50 matching a different object, not shown to the subject.
4. Instructor enters the room.
5. 5–10 minute break.
Fig. 2 summarizes the way each trial was presented to the subject. Prior to each word, a low beep was heard. The subject was instructed to close his eyes when hearing this beep and think about the shown object. Two seconds later, a high beep followed. The subject was instructed to open his eyes when hearing this beep. A fixation cross appears and the subject’s eyes are drawn to the center of the screen. This closing and opening of the eyes will produce a large EOG (electro-oculography: the electric current produced by eye movements that shows up in the EEG recording) artifact, which is dealt with in the signal processing step later on. After 2 seconds, to prevent overlap with the artifact, a probe word replaces the fixation cross for 200ms. The next beep would sound 1800ms after that, prompting the subject to close his eyes again for the next trial.
Fig. 2. Timeline of a single trial. Two beeps sound: one instructing the subject to close his eyes and think about the object, and one instructing him to open his eyes.
2.3 Procedure
1. Subject is seated in a comfortable chair in front of a computer screen.
2. Subject is told about the goal of the experiment and given instructions on the procedure.
3. Subject is fitted with electrodes during the explanation of the experiment.
4. 10 test trials were presented to acquaint the subject with the procedure.
5. Three blocks were performed.
6. End of experiment.
2.4 Stimuli
The experiment consists of 3 blocks. In each block, the subject is given a physical object, and 10 words are shown that match the given object and 10 words that match a different object. Each word was included in the randomized sequence 5 times, in order to average the recordings of the 5 repetitions later on. The words that were shown to the subject have to be closely related to one object, but not at all related to the other objects. To achieve this, the Dutch Wordnet [13] was used. This wordnet is a graph with Dutch words as nodes and semantic relations, such as synonyms, hyponyms and opposites, drawn between them as edges. The 6 physical objects are chosen to be exemplars which can be described with one word (e.g. a book, a mug, a tomato, etc.), hereafter called the object name. The goal is to associate 10 Dutch words with each object name. Candidate words are generated by traversing the Dutch Wordnet, starting at the object word o_1 (for instance ‘book’) and spreading outwards in a breadth-first fashion. For each word that is encountered in this fashion, the distance to each of the 6 object names is calculated and a score is computed:

s = \sum_{i=2}^{6} d(w, o_i) - 5\, d(w, o_1)    (1)
where s is the score, w is the word under consideration, o_i is one of the object names, with o_1 being the object name from which the search was started, and d(x, y) is the distance function between two words: the number of edges in the shortest path between them, which is used as a measure of relatedness of the words. Words with a high score will be close to the object we are searching from, but distant from any of the other objects. A search like this will generate many words that are either uncommon or not necessarily associatively related to the object. For each object name, a list of 30 words is created by sorting all the generated words by score, highest to lowest, and manually taking the first 30 words that were judged by the instructor to be strongly associatively related to the object. This subjective selection improves the effectiveness of the dataset, because the distance function used takes only purely semantic relatedness into account, whereas the N400 effect is also attributed to associative relatedness between words. The total list of 180 words (30 words times 2 objects times 3 blocks) was presented to each subject at least a week before the experiment. The subject was asked to score each word in relation to a photograph of the corresponding object. A 5-point scale was used, 1 being not related at all, 5 being practically the same thing. For each object, the 10 words that the subject scored highest were chosen to be used in the experiment. When choosing between words with the same score, the scores given by the other subjects were taken into account and the words with a high score assigned by all subjects were favored. For example, fig. 3 lists the words that the first subject scored highest in relation to an object. The full list of words used during the experiment is included in appendix A.
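As an illustration of Eq. (1), the following sketch scores candidate words on a wordnet loaded as an undirected networkx graph; the graph construction itself and the cap on distances to unreachable words are our assumptions, not part of the original procedure.

# Sketch of the candidate scoring in Eq. (1); assumes the Dutch Wordnet is available
# as an undirected networkx graph G with words as nodes and semantic relations as edges.
import networkx as nx

def score_candidates(G, object_names, target_index=0, max_dist=50):
    """Return {word: score}: high scores are close to the target object name (o_1)
    but far from the other object names (Eq. 1 generalized to len(others) objects)."""
    target = object_names[target_index]
    others = [o for o in object_names if o != target]
    # breadth-first distances from every object name to all reachable words
    dist = {o: nx.single_source_shortest_path_length(G, o) for o in object_names}
    scores = {}
    for w, d_target in dist[target].items():            # words reachable from the target
        d_others = [dist[o].get(w, max_dist) for o in others]   # cap unreachable words (assumption)
        scores[w] = sum(d_others) - len(others) * d_target
    return scores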
Dutch original: woordenboek, bibliotheek, bijbel, bladzijde, hoofdstuk, paragraaf, uitgever, verhaal, auteur, kaft. English translation: dictionary, library, bible, page, chapter, section, publisher, story, author, cover.
Fig. 3. left: a photograph of a sample object that was shown to the subjects. right: the 10 words marked as most related to the object in the photograph by the first subject.
2.5 Method of Analysis
A schematic overview of the data analysis method is presented in fig. 4; numbers in the text below correspond to the blocks in the diagram. The recordings were made with a 32-channel EEG cap and 3 external electrodes placed in the middle of the forehead and below each eye (1). All data was recorded with a sample rate of 256Hz, average referenced (2) and bandpass filtered between 0.3Hz and 30Hz (3).
Fig. 4. Diagram summarizing the data analysis process. Each block has a number corresponding to the numbers in the text.
Trials were extracted on the interval t = −3s to t = 2s, relative to the onset of the probe word at t = 0s (4). This includes the moment the subject opens his eyes until the moment any N400 effects should no longer be visible. Each probe word occurred 5 times in the presentation sequence. The corresponding trials were averaged (5) to form the final trials. These averaged trials were filtered with an automated method for reducing EOG effects [11], which involves calculating a mixing matrix between the recorded EOG signal and the recorded EEG (6). Using this matrix, the EOG can be subtracted from the EEG, reducing the effect of eye movements in the data, which are severe but predictable in this experiment, since the subject was instructed to close and open his eyes before being shown a probe word. Application to the average of 5 trials instead of unaveraged data increases the effectiveness of this filtering [3]. Each trial was baselined on the interval t = −1s to t = 0s (7) and resampled to 100Hz (8) to reduce the number of data points. For each class, an Event Related Potential (ERP) plot was created (10). Student’s t-tests were performed on each 10ms segment (11) between both classes to determine the statistical significance of any differences between the two classes.
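For readers who want to retrace the pipeline, a minimal numpy/scipy sketch of blocks (2), (3), (4), (7), (8) and (11) is given below; the EOG regression of [11] and the averaging over the 5 repetitions per word are omitted for brevity, and the array shapes are our assumptions.

# Minimal sketch of the analysis pipeline (assumed shapes: eeg is (n_channels, n_samples)
# at 256 Hz, word_onsets holds sample indices of probe-word onsets).
import numpy as np
from scipy import signal, stats

FS = 256

def preprocess(eeg):
    eeg = eeg - eeg.mean(axis=0, keepdims=True)           # (2) common average reference
    b, a = signal.butter(4, [0.3, 30.0], btype="bandpass", fs=FS)
    return signal.filtfilt(b, a, eeg, axis=1)             # (3) band-pass 0.3-30 Hz

def extract_trials(eeg, word_onsets):
    # (4) epochs from -3 s to +2 s around each probe-word onset
    return np.stack([eeg[:, o - 3 * FS:o + 2 * FS] for o in word_onsets])

def finalize(trials):
    # (7) baseline on [-1, 0] s (samples 2*FS to 3*FS of the epoch), (8) resample to 100 Hz
    baseline = trials[:, :, 2 * FS:3 * FS].mean(axis=2, keepdims=True)
    return signal.resample(trials - baseline, 5 * 100, axis=2)

def significance(correct, incorrect):
    # (11) Student's t-test between classes; at 100 Hz one sample spans 10 ms
    return stats.ttest_ind(correct, incorrect, axis=0).pvalue   # shape: (channels, samples)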
3 Results
The resulting ERP plots are presented in fig. 5. It can be seen that, starting around 400ms, the waveforms diverge between the classes for all subjects. From the topo plots it can be seen that the location of the N400 effect differs for each subject. This could be explained by the fact that the subjects employed different strategies for concentrating on an object, ranging from visualizing it to thinking about related symbols. The ERP plots show that there is a dipole effect: the N400 is a positive deflection in relation to the baseline at frontal/right positions and a negative deflection at occipital/left positions. Variation of the N400 amplitude and timing is to be expected between subjects, as is also the case in other studies (see for an example [4], figure 16). There are also differences in the duration of the effect: in the recordings for subject 3, the effect is measurable for more than a second, up until the subject closes his eyes again, while subject 1 only shows the effect for a few hundred milliseconds. All data preprocessing steps are performed on the dataset as a whole, except for the baselining. It is possible that the separate calculation of the baseline for each class creates an artificial difference between the ERPs. This could for instance be the case when the baseline is calculated on an unstable portion of the signal containing lots of artifacts. Such periods exist in the recordings, where the subject opens and closes his eyes, causing an EOG artifact. Calculating the baseline on the wrong portion of the signal will effectively generate a random baseline value for each class. In order to rule this possibility out, the exact same data analysis was performed again, but the trials were assigned random class labels by shuffling them and assigning the first half the label correct and the last half the label incorrect. The result was that any differences between the classes that could be seen were not statistically significant and were randomly distributed over channels and time. This bestows confidence in the method of analysis.
Fig. 5. Top: For each subject, the topo plot shows the mean significance for each channel during the time interval 500–1000ms. The values are given as 1/p-value, so a higher value means the difference between the classes is more significant. Bottom: ERP plots for all three subjects. Each row corresponds to a subject and the 4 most significant channels. Shaded areas indicate a statistically significant (p ≤ 0.01) difference between the waveforms.
4 Conclusions
The purpose of the experiment was to explore whether a subject can be primed without being shown an explicit stimulus, such as a word, image or sound. The subject was instead asked to prime himself by actively thinking about an object when hearing a beep. Single probe words were used to trigger an N400 response when the word matched the previously shown object.
The recordings indicate that the N400 effect is indeed elicited using this strategy, and so the experiment supports the hypothesis that a subject can prime himself by thinking about an object in such a way that the N400 effect occurs when shown probe words. Evidence is given that priming can be achieved without using explicit stimuli, leaving the subject free to choose his own prime. Using this pilot experiment as a basis for further research, a BCI could be constructed which can guess the object that the user is primed on and, by extension, what it is the user is ‘thinking about’.
5 Future Work
Many questions remain to be answered before the N400 signal can be reliably used in a BCI context as envisioned in this paper. The decision to ask the subject to close his eyes made signal processing considerably harder, because of the generated EOG artifacts. It was included to make it easier for the subject to concentrate and not be distracted by outside stimuli. In retrospect, this might have done more harm than good, so a future experiment can be performed to compare results without the closing of the eyes. A first attempt has been made at automatically classifying the trials. A linear support vector machine was trained on the data segment corresponding to the 4 channels with the lowest average t-test scores (i.e. the most significant ones) and the time interval 400ms–600ms, resulting in 80 data points per trial. However, naive classification of this kind proved to be insufficient, as the performance was around chance level. After the experiment, the subjects were familiar with the words used, which can have an influence on their ability to trigger an N400 effect if the same dataset were used again. Research can be done to determine the impact of these repetition effects. When constructing a BCI, care must perhaps be taken to present different probe words every time it is used by the same user. Improvements due to user training can also be explored.
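A minimal sketch of this naive classification attempt, using scikit-learn under assumed array shapes, could look as follows; the 4-channel selection and the 400–600 ms window follow the text, everything else (shapes, sample indexing, cross-validation scheme) is an assumption.

# Sketch of the naive linear-SVM classification (assumed: trials is
# (n_trials, n_channels, n_samples) at 100 Hz with word onset at sample 300 of the
# -3..+2 s epoch; labels holds the two classes; best_channels are the 4 selected channels).
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

def classify(trials, labels, best_channels, t0=0.4, t1=0.6, fs=100, onset_sample=300):
    sel = trials[:, best_channels, onset_sample + int(t0 * fs):onset_sample + int(t1 * fs)]
    X = sel.reshape(len(trials), -1)      # 4 channels x 20 samples = 80 features per trial
    return cross_val_score(LinearSVC(), X, labels, cv=5).mean()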
Acknowledgements The authors gratefully acknowledge the support of the BrainGain Smart Mix Programme of the Netherlands Ministry of Economic Affairs and the Netherlands Ministry of Education, Culture and Science. This work is also made possible by the academic-assistant project of the Royal Dutch Academy of Science, financed by the Dutch Ministry of Education, Culture and Science. We would also like to thank the test subjects for their efforts.
References
1. Abootalebi, V., Moradi, M.H., Khalilzadeh, M.A.: A new approach for EEG feature extraction in P300-based lie detection. Computer Methods and Programs in Biomedicine 94(1), 48–57 (2009)
2. Bajo, M.T.: Semantic facilitation with pictures and words. Journal of Experimental Psychology: Learning, Memory, and Cognition 14(4), 579–589 (1988)
3. Croft, R.J., Barry, R.J.: EOG correction: a new aligned-artifact average solution. Electroencephalography and Clinical Neurophysiology 107(6), 395–401 (1998)
4. Hagoort, P., Brown, C.M., Swaab, T.Y.: Lexical-semantic event-related potential effects in patients with left hemisphere lesions and aphasia, and patients with right hemisphere lesions without aphasia. Brain 119, 627–649 (1996)
5. Kutas, M., Hillyard, S.A.: Reading senseless sentences: brain potentials reflect semantic incongruity. Science 207(4427), 203–205 (1980)
6. Kutas, M., Hillyard, S.A.: Brain potentials during reading reflect word expectancy and semantic association. Nature 307, 161–163 (1984)
7. Lau, E., et al.: A lexical basis for N400 context effects: evidence from MEG. Brain and Language 111(3), 161–172 (2009)
8. Lau, E.F., Phillips, C., Poeppel, D.: A cortical network for semantics: (de)constructing the N400. Nature Reviews Neuroscience 9(12), 920–933 (2008)
9. Meegan, D.V.: Neuroimaging techniques for memory detection: scientific, ethical, and legal issues. The American Journal of Bioethics 8(1), 9–20 (2008)
10. Orgs, G., Lange, K., Dombrowski, J.H., Heil, M.: Conceptual priming for environmental sounds and words: an ERP study. Brain and Cognition 62(3), 267–272 (2006)
11. Schlögl, A., et al.: A fully automated correction method of EOG artifacts in EEG recordings. Clinical Neurophysiology 118(1), 98–104 (2007)
12. Thompson-Schill, S.L., Kurtz, K.J., Gabrieli, J.D.E.: Effects of semantic and associative relatedness on automatic priming. Journal of Memory and Language 38(4), 440–458 (1998)
13. Vossen, P., Bloksma, L., Boersma, P.: The Dutch Wordnet (1999)
A Words Used in Experiment
zin (sentence), bibliotheek (library), trilogie (trilogy), letter (letter), paragraaf (section), hoofdstuk (chapter), inleiding (introduction), voorwoord (foreword), nawoord (epilogue), uitgever (publisher), kaft (cover), woordenboek (dictionary), bladzijde (page), verhaal (story), auteur (author); pint (pint), glas (glass), hals (neck), kroonkurk (bottle cap), alcohol (alcohol), biertap (beer tap), pul (pint), doorzichtig (transparent), gezellig (merry), fust (cask), bier (beer), wijn (wine), statiegeld (deposit), krat (crate), flessenopener (bottle opener)
snuiten (blow), servet (napkin), snuiven (sniff), wegdoen (put away), hoesten (cough), doekje (handkerchief), niezen (sneeze), afvegen (wipe), broekzak (pocket), neus (nose), snotteren (snivel), weggooien (throw away), niesbui (sneezing fit), opvouwen (fold)
karton (cardboard), verhuizen (move), bewaren (keep), verpakken (package), etiket (label), inpakken (pack), tillen (lift), magazijn (storehouse), plakband (duct tape), stapelen (stack), opbergen (store), schoenendoos (shoe box), opslag (storage), zolder (attic), dragen (carry); fruit (fruit), groente (vegetable), tortilla (tortilla), paprika (paprika), peterselie (parsley), voeden (feed), ovenschotel (oven dish), tros (bunch), plant (plant), voedsel (food), lekker (delicious), ketchup (ketchup), saus (sauce), pizza (pizza), groentenboer (greengrocer), plukken (pick), gerecht (dish), maaltijd (meal), salade (salad); beker (cup), thee (tea), chocolademelk (chocolate milk), theelepel (teaspoon), slurpen (slurp), schenken (pour out), gieten (pour), bord (plate), oortje (ear), onderzetter (coaster), koffie (coffee), breken (break), keukenkast (cupboard), drinken (drink), drank (beverage)
A Brain Data Integration Model Based on Multiple Ontology and Semantic Similarity Li Xue, Yun Xiong, and Yangyong Zhu School of Computer Science, Fudan University Shanghai 200433, P.R. China
[email protected],
[email protected],
[email protected] Abstract. In this paper, a brain data integration model (BDIM) is proposed by building up the Brain Science Ontology (BSO), which integrates the existing literature ontologies used in brain informatics research. Considering the features of current brain data sources, which are usually large-scale, heterogeneous and distributed, our model offers brain scientists an effective way to share brain data, and helps them optimize the systematic management of those data. Besides, a brain data integration framework (BDIF) is presented in accordance with this model. Finally, key issues about brain data integration are also discussed, including semantic similarity computation, new data source insertion and brain data extraction.
1 Introduction
In recent years, the research of brain science has offered BI plenty of useful brain data. However, it is difficult, ineffective and error-prone to extract data from diverse brain databases, due to inherent features such as heterogeneity and decentralization. Moreover, the volume of such data still increases rapidly with the development of research related to brain science. Due to these problems, BI researchers are facing some common pain points, which mainly lie in three aspects:
– The researchers have to master the usage of several specific query languages and interfaces, because different database systems do not support one common query language and application interface.
– Due to the heterogeneities of the existing brain databases, contradictions usually occur, thus most of the brain data need to be reorganized and cleaned manually.
– For obtaining the latest and the most complete brain data, the problems mentioned in the first two points usually occur repeatedly, because the source databases are updated in real time together with new developments in BI research.
To address these problems, we propose a brain data integration model based on integrating some existing ontologies used in brain science. Furthermore, to deal with the inconsistencies according to a unified semantics, the measure of the
semantic similarity between concepts in the ontology is defined, which represents the similarity degree of brain concepts from a semantic point of view. Several other related methods for handling brain data are also given, covering new data source insertion and data extraction. The rest of this paper is organized as follows. Section 2 introduces related works. Section 3 describes our brain data integration model, including the model structure, the function of each component, and the model development process. Section 4 presents the integration framework built up in accordance with this model, as well as several critical issues, in particular a novel semantic similarity computation method. Finally, Section 5 gives concluding remarks.
2 Related Works
Currently, there are several research branches related to brain science, such as Neuroscience, Cognitive Informatics, Cognitive Neuroscience, and Brain Informatics, among which many effective integration platforms have been developed by using information technologies. However, most of the integration models have been applied in a limited area, due to researchers’ specific motivations. Neuroscience researchers explore brain functions by combining mathematical theory and computer simulation methods; for example, Gardner et al. [1] design and implement a Neuroscience Information Framework on the Web. However, the research attention is limited to brain structure data and function data. Cognitive informatics is an interdisciplinary research field of cognitive science and machine learning methods, which investigates the brain from the view of informatics, but the researchers of this domain do not carry out systematic research on large-scale data integration. As to cognitive neuroscience, scientists focus on the mechanism of cognitive activities of the brain. That is, they study how the brain regulates its components on different levels, i.e., the molecule level, cell level, brain region level, and the whole-brain level. Finally, Brain Informatics researchers take advantage of both neuroscience methods and Web intelligence technologies to perform systematic investigation of human cognitive mechanisms. Chen et al. [2] bring out the Data-Brain concept model for explicitly representing various relationships among multiple human brain data sources, with respect to all major aspects and capabilities of human information processing systems (HIPS), which can be utilized for data sharing and integration. Besides, although most of the aforementioned integration systems take ontology as a critical tool for building up conceptual models, a universal ontology of brain science is still missing, which greatly limits further investigation into brain science. Inspired by the existing literature, we propose a brain data integration model by building up a universal brain science ontology, which integrates many existing ontologies related to brain science. Based on this model, a brain data integration framework is built up too, which offers a more effective way to manipulate the brain data in many heterogeneous data sources.
3 The Brain Data Integration Model
3.1 The Structure of BDIM
The BDIM is comprised of two main parts: the core of BDIM and selected existing data sources. As shown in Fig. 1, the core of BDIM is the fundamental part of this model, which is made up of three basic modules: the Brain Data Access Interface (BDAI), the Brain Data Management Agent (BDMA), and the Brain Science Ontology (BSO).
Fig. 1. The Brain Data Integration Model
The BDAI module offers the users a unified access method to the distributed brain data sources. The BDMA module mainly implements basic management of the brain data, such as querying, modifying and deleting. The BSO acts as the global ontology of brain science and covers comprehensive concepts and relationships in this field. The role of the BSO is to link and integrate the existing ontologies, such as the ERP ontology, the fMRI ontology, etc. Within the core part of BDIM, building up the BSO is the most important task. With respect to state-of-the-art methods of integrating multiple ontologies, our method can be categorized as a hybrid approach [3]. Following this approach, the semantics of each data source is described by its own ontology. In order to make the source ontologies comparable to each other, all the ontologies are built upon a global shared vocabulary. The shared vocabulary contains basic terms (the primitives) of a domain. Thus, for creating compound terms of a source ontology, the primitives are combined by some operators. In this case, the terms of all source ontologies are based on common primitives, so it is easier to compare them than in multiple-ontology approaches. Sometimes the shared vocabulary is
also an ontology [4]; the BSO is such a case, extracting its primitives from the Unified Medical Language System (UMLS) and some influential ontologies, e.g., the Gene Ontology (GO), the fMRI ontology, and the ERP ontology.
3.2 The Development of BSO
According to ontology development methodology, there are many successful methods (e.g. METHONTOLOGY or TOVE). The development process of the BSO follows the method proposed by Uschold and Gruninger [5], which divides the development process into four phases:
Phase one. Identifying development purpose and scope: specialization, intended use, scenarios, and the set of terms including their characteristics and granularity. Obviously, the BSO is created for brain science research, but it covers many different research levels and aspects, as mentioned above. Hence, the concepts abstracted from UMLS and other source ontologies also vary a lot, reflecting diverse levels and various aspects. In fact, building up appropriate concept and relationship sets is an extremely critical and tough task, which has to deal with the many inconsistencies among source ontologies. Generally, the inconsistencies between ontologies are considered on three levels [6]: inconsistency on the instance level, inconsistency on the concept level, and inconsistency on the relation level.
Phase two. Building the ontology: (a) Ontology capture: interacts with the requirements of phase one for knowledge acquisition. (b) Ontology coding: building up the conceptual model for domain knowledge. (c) Integrating existing ontologies: the reuse of existing ontologies speeds up the development process, but some new problems arise at the same time, in particular the inconsistency problem mentioned in phase one. During the development process of the BSO, we apply three different strategies, proposed by Ngoc Thanh Nguyen [6], to these three types of inconsistency problems accordingly.
Phase three. Evaluation: verification and validation.
Phase four. Guidelines for each phase.
4 The Integration Framework Based on BDIM
4.1 The Structure of the Integration Framework
In this section, we present an integration framework based on BDIM. This framework is composed of four layers, as shown in Fig. 2.

Data Source Layer: contains the brain data sources and is the lowest layer of the integration framework. At the beginning, we select some influential and authoritative brain databases as data sources of the integration platform, e.g., FlyBase (Drosophila), the Saccharomyces Genome Database (SGD), and the Mouse Genome Database (MGD).

Data Integration Layer: implements the data integration task by using Extraction, Transformation and Load (ETL) tools.
Fig. 2. The Brain Data Integration Framework
Supervised by the BSO, the metadata mappings between the data sources and the target data warehouse are defined. According to these mappings, the ETL tools automatically extract data from the source databases, transform the source data into the target format, and finally load them into the target data warehouse.

User Data View Layer: the platform applies both physical and logical integration strategies. The physical integration strategy is performed for frequently used brain data. In other cases, the platform follows the logical strategy, which does not load the integrated brain data into a real physical database. This compromise integration strategy takes both time and space factors into consideration.

Application Service Layer: offers the users various data application services as follows: a) offering researchers standard brain data, which can be used as a criterion for analyzing brain data; b) brain data query, including general brain data query and bibliographic data query; c) brain data mining; d) brain data display and demonstration; e) online analysis via Web services.
4.2 Several Critical Techniques
New Data Source Insertion. To insert a new data source into the BDIF, the researchers need to perform the following steps: a) wrap and publish the data source as a service node before inserting it into the integration platform; b) authenticate the database register tool before using it; c) seek and download the related virtual table for the data source from the information center using the register tool; d) submit the data source information to the information center using the register tool. After these steps, the service node can be discovered by the information center.
Semantic Similarity Computation. In the BDIF, a novel similarity computation method based on semantic path coverage (SPC) is adopted, which we first proposed in [7]. The basic idea of this method is presented after the following definitions.

Definition 1. Ontological Link-structure Graph. An ontological link-structure graph is an acyclic graph, denoted by G = <V, E, W, r>, where V (V ≠ ∅) is the node set, r is the root node, and E (E ⊆ V × V) denotes the directed arc set. W is a weighting function that maps V to the positive real numbers.

Definition 2. Semantic Path. For a given ontological link-structure graph G = <V, E, W, r>, each path from r to v (v ∈ V) is called a semantic path of v. In G, each node v has at least one semantic path, and its semantic path set is denoted by Φ(v).

Definition 3. Intersection of Semantic Paths. Given two semantic paths P = (v_0, v_1, ..., v_n) and Q = (v'_0, v'_1, ..., v'_m) in the ontological link-structure graph G = <V, E, W, r>, and supposing n ≥ m > 0, the intersection of P and Q is a semantic path, denoted by P ∩ Q.

Definition 4. Union of Semantic Paths. Suppose P = (v_0, v_1, ..., v_n) and Q = (v'_0, v'_1, ..., v'_m) are two semantic paths in the ontological link-structure graph G = <V, E, W, r>, where n > 0, m > 0. The union of P and Q is the node set containing all the nodes in P and Q, denoted by P ∪ Q.

Based on the above definitions, our proposed method computes the semantic similarity of two nodes v_1 and v_2 in the BSO by the following steps:

Step one: compute the total number N of nodes in the BSO.
Step two: compute the number of descendant nodes of each node ϕ, denoted by |ϕ|.
Step three: compute the occurrence probability of the descendants of ϕ, p(ϕ) = |ϕ|/N.
Step four: compute the information content of ϕ according to information theory: IC(ϕ) = −log(p(ϕ)).
Step five: compute the intersection of the semantic paths of v_1 and v_2, denoted by α:
α = (⋂_{P_i ∈ Φ(v_1)} P_i) ∩ (⋂_{P_j ∈ Φ(v_2)} P_j),
where Φ(v_1) and Φ(v_2) denote the semantic path sets of nodes v_1 and v_2, respectively.
Step six: compute the union of the semantic paths of v_1 and v_2, denoted by β:
β = (⋃_{P_i ∈ Φ(v_1)} P_i) ∪ (⋃_{P_j ∈ Φ(v_2)} P_j).
Step seven: compute the sum of the information content of the nodes in α: IC(α) = Σ_{ω_i ∈ α} IC(ω_i).
Step eight: compute the sum of the information content of the nodes in β: IC(β) = Σ_{ω_i ∈ β} IC(ω_i).
Step nine: define the semantic similarity of v_1 and v_2 as the ratio of IC(α) to IC(β), that is:
Sim(v_1, v_2) = IC(α)/IC(β).
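As a minimal illustration of these steps, the following Python sketch computes the SPC similarity on a toy graph. The handling of leaf nodes (counting a node together with its descendants so that the probability is never zero) and the reading of the path intersection as the set of nodes shared by all semantic paths are our assumptions, not details fixed by the text above.

```python
import math

def descendants(graph, node):
    """All nodes reachable from `node` along directed arcs (graph: node -> list of children)."""
    seen, stack = set(), list(graph.get(node, []))
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(graph.get(n, []))
    return seen

def semantic_paths(graph, root, target, prefix=()):
    """All root-to-target paths (Definition 2), each returned as a tuple of nodes."""
    prefix = prefix + (root,)
    if root == target:
        return [prefix]
    return [p for child in graph.get(root, [])
              for p in semantic_paths(graph, child, target, prefix)]

def spc_similarity(graph, root, v1, v2):
    nodes = {root} | set(graph) | {c for kids in graph.values() for c in kids}
    N = len(nodes)                                        # step one
    def ic(node):                                         # steps two to four
        # The node itself is counted along with its descendants so that leaves
        # do not get probability zero -- an implementation choice of ours.
        return -math.log((len(descendants(graph, node)) + 1) / N)
    paths1 = [set(p) for p in semantic_paths(graph, root, v1)]
    paths2 = [set(p) for p in semantic_paths(graph, root, v2)]
    alpha = set.intersection(*paths1, *paths2)            # step five
    beta = set.union(*paths1, *paths2)                    # step six
    return sum(ic(w) for w in alpha) / sum(ic(w) for w in beta)  # steps seven to nine

# Toy ontology graph: r -> {a, b}, a -> {c, d}
g = {"r": ["a", "b"], "a": ["c", "d"]}
print(spc_similarity(g, "r", "c", "d"))   # siblings under "a": similarity > 0
print(spc_similarity(g, "r", "c", "b"))   # only the root in common: 0.0
```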
Brain Data Extraction
– For general brain databases: the BDIF extracts brain data from general brain databases by building wrappers for complex data objects, mainly adopting the instance segmentation method. This approach combines top-down and bottom-up methods. Following the top-down idea, the locator (the set of segmentation tags) on sibling nodes is developed for narrowing the search area to a probable scope. Then, the method checks whether an instance is an extraction object according to the constraints of the node.
– For bibliographic databases: compared with the general brain databases, the bibliographic databases of brain science are a special group of data sources whose data are mainly unstructured textual data. The BDIF applies a method named the Question Net (QNet) extraction method, which we proposed in [8], to the bibliographic data. The QNet method transforms each sentence into a directed graph, which takes both the weights and the order of words into consideration. It is a two-phase extraction method: it first builds up the question net through a training process, and then finds sentences similar to the question net as candidate objectives. The final objectives are picked out from these candidates.
5 Conclusion
Brain data integration is a hot issue in BI research and is becoming increasingly important for brain science studies. To realize systematic data management for brain investigation, a novel brain data integration model is proposed, which integrates multiple brain databases by linking existing ontologies through a global Brain Science Ontology. The fundamental idea of our model is an extension of our former idea of the Data-Brain model [9]. Furthermore, a concrete framework has been built according to this model, which offers brain scientists an effective way to obtain data from various heterogeneous data sources and is well suited to performing multi-dimensional studies in brain science.
Acknowledgement. The research was supported by the National Science Foundation Project of China under Grant No. 60903075 and the Shanghai Leading Academic Discipline Project under Grant No. B114.
References 1. Gardner, D., Akil, H., Ascoli, G.A., Bowden, D.M., Bug, W., Donohue, D.E., et al.: The Neuroscience Information Framework: a data and knowledge environment for neuroscience. Neuroinformatics (2008), doi:10.1007/s12021-008-9024-z 2. Chen, J.H., Zhong, N.: Data-Brain Modeling Based on Brain Informatics Methodology. In: IEEE/WIC/ACM International Conference on Web Intelligence, pp. 41–47. IEEE Computer Society Press, Los Alamitos (2008) 3. Wache, H., Voegele, T., Visser, U., Stuckenschmidt, H., Schuster, G., Neumann, H., Huebner, S.: Ontology-Based Integration of Information - A Survey of Existing Approaches. In: Proceedings of the IJCAI 2001 Workshop on Ontologies and Information Sharing, pp. 108–118 (2001) 4. Stuckenschmidt, H., Wache, H., Vogele, T., Visser, U.: Enabling technologies for interoperability. In: Workshop on the 14th International Symposium of Computer Science for Environmental Protection, pp. 35–46 (2000) 5. Uschold, M., Gruniger, M.: Ontologies: Principles, methods and applications. Knowledge Engineering Review 11(2), 93–155 (1996) 6. Nguyen, N.T.: Advanced Methods for Inconsistent Knowledge Management, pp. 242–262. Springer, Heidelberg (2008) 7. Li, R., Chao, S., Li, Y., Tan, H., Zhu, Y., Zhou, Y., Li, Y.: Ontological Similarity Computation Method Based on Semantic Path Coverage. Progress in Nature Science 16(07), 916–919 (2006) 8. Yang, Q., Zheng, G., Xiong, Y., Zhu, Y.: Qnet-BSTM: An Algorithm for Mining Transcription Factor Binding Site from Literature. Journal of Computer Research and Development 45(suppl.), 323–329 (2009) (in Chinese) 9. Zhu, Y., Zhong, N., Xiong, Y.: Data Explosion, Data Nature and Dataology. In: IEEE/WIC International Conference on Brain Informatics, pp. 147–158. Springer, Heidelberg (2009)
How Does Repetition of Signals Increase Precision of Numerical Judgment?
Eike B. Kroll1, Jörg Rieger2, and Bodo Vogt2
1 Karlsruhe Institute of Technology (KIT), Institute of Economic Theory and Statistics, Karlsruhe, Germany
2 Otto-von-Guericke University Magdeburg, Chair of Empirical Economics, Magdeburg, Germany
Abstract. This paper investigates the processing of repeated complex information. The focus of this study is how precision of stated numerical estimates is influenced by repetition of the signal and information about the estimates of others. The key question is whether individuals use the law of large numbers in their estimates. In an experiment, participants are asked to estimate the number of points in a scatter plot, which is visible for a short time. The setting of the experiment allows for stating intervals and/or point estimates. Our analysis shows that the estimated interval gets smaller with each repetition of the signal, but the pattern does not follow the prediction of statistical models. The difference between their own estimates and information about the estimates of others does not lead to higher stated precision of the estimate, but does improve its average quality, i.e. the difference between answer and signal gets smaller.
1 Introduction
In advanced societies, people constantly face complex decision tasks in their jobs and private lives. Although standard economic models of decision making assume full information and an unlimited capacity for information processing in human brains, experimental work suggests that the human capacity for information processing is limited. In consequence, complex information can be processed only to a certain degree of abstraction. The result of this information processing is diffuse information, which in turn leads to imprecision of judgment [1]. In the scientific literature it is well established that humans cannot grasp a large number of objects without counting them [2]. The perception of visual information and its numerical transformation is also investigated in the literature under the term numerosity [3,4]. Economic agents face similar decisions and are required to make judgments based on numerical information. This paper deals with human information processing. In other contexts of economic research, departures from theoretical predictions are considered to be caused by the decision-makers' lack of experience. This paper addresses this issue by analyzing the effect of task repetition on the accuracy of subjects' statements. More specifically, this paper analyzes whether the estimation of decision-makers, when facing complex information, follows statistical estimation methods. Therefore, an experimental setting is derived to show the degree of precision that subjects can provide when facing a complex estimation task and how the precision of subjects' estimates changes when a task
is repeated. Specifically, the proposed experiment allows us to analyze whether subjects follow a statistical method, the law of large numbers, which has been shown to be used intuitively in various contexts [5,6]. The applicability of the law of large numbers is discussed in the following section, where the research hypotheses are defined. The issue of information processing in human brains is not only considered in economic research; it is also the subject of psychological studies identifying how humans process signals. Following the literature on human signal processing, one can argue that imprecision is caused by the complexity of the tasks faced by individuals [7]. The processing capacity of the human brain necessary for solving a decision problem is the determinant of complexity. The fact that the information processing of the human brain is limited is the main aspect discussed with respect to the phenomenon of imprecision of judgment caused by the processing of complex information [8]. This limitation in itself forces the human brain to simplify complex information. However, this process of simplifying the input causes a certain degree of imprecision when interpreting subjects' decisions. Following this argument, there seems to be a trade-off between problem simplification and the quality of the decision. How the degree of vagueness or imprecision in judgment affects the outcome of a decision-making process remains an open question. It seems to be an established fact that humans have a limited capacity for processing information coming from the outside world. Early work on psychophysical perception shows that stimuli from the outside world need to reach a certain threshold of signal intensity in order to be perceived by the human brain at all [9], which was later termed the Weber-Fechner law. The implication of this finding is that part of the stimulus will not be perceived and therefore cannot be processed. As argued above, this can be regarded as a simplification due to the limitations of human brains and as one cause of the imprecision of judgment revealed when answering complex questions. These fundamental findings about the way humans perceive their environment have been adopted in theoretical models. One example is the theory of the "window of attention" [10], describing how visual perception is limited in human brains. This model is also based on the assumption that human perception is associated with a certain degree of imprecision. This "lack of precision" can be explained by a restriction of the conscious perception of visual information by the visual field [10]. That means that the model of how visual information is processed and perceived by humans includes a process of simplifying the signal and creating imprecision before the perception of visual information becomes conscious. Furthermore, psychological experiments have shown a correlation between short-term memory and judgment [11]. These findings inspired a modeling approach for human information processing based on the assumption that the structure of the human brain only allows the processing of a limited number of information chunks [11]. While the initial finding suggested that humans have a memory span of seven information chunks, the exact number is debatable. However, this work focuses on the limitations of information processing in human brains as the cause of imprecision in judgment.
In summary, one can argue that decisions in economic contexts contain some degree of imprecision caused by mental processes developed to cope with the complexity of the world surrounding us. Following the argumentation above, it seems established that decision processes are limited by the capacity of the human brain to
process incoming signals. The question arises as to what happens when individuals gain more experience with the task at hand or are provided with additional or repeated information. While psychologists and economists seem to agree that more information leads to better performance in judgment, more recent research suggests that increasing information does not necessarily lead to better performance of the individuals [12]. It may be possible that the degree of imprecision inherent in judgment decreases with the repetition of signals when individuals face similar decisions over and over again. This argument follows a line of reasoning similar to that in discussions about anomalies in expected utility theory [13]. That is, the frequency of departures from expected utility theory decreases when subjects gain more experience with otherwise unfamiliar tasks [14]. The question arises as to how the performance of signal processing changes and how models can describe this process. One possibility for predicting the processing of repeated complex information by the human brain could be the use of statistical models. Following the argument that approximate numerical attributes play a central role in human mathematical thinking [15], one can argue that statistical models are used in human decision processes. This is based on the idea that judgment indeed comes with a degree of imprecision, but when people are faced with the same task repeatedly, the precision increases as the individual gains experience with each repetition of the task at hand [16]. In order to address the question of whether the repetition of a task leads to a decrease in imprecision, we design an experimental task which is explained in further detail in the next section of the paper. Furthermore, the analysis of this experiment provides insights into how an increase in the information available to the subjects affects the precision of their statements. In particular, we check in an experiment whether the increase in precision follows statistical models. While there are a variety of statistical models that could be applied to the task provided in the experiment, this paper focuses on the law of large numbers. The justification for using the law of large numbers for the theoretical prediction of the observed effects is as follows. First and foremost, it is based on the assumption that using the mean of different independent estimations is the simplest procedure available to human beings. This is supported by different kinds of experiments and studies based on the identification of the phenomenon of "the wisdom of the crowds" [17]. The law of large numbers indicates that the relative frequency of random results approximates the probability of those results when the underlying random experiment is performed repeatedly. That means it provides a projection of the probability distribution of the result of a repeated random event. In combination with using the mean of independent estimates, this means that the variance of the estimate decreases with each repetition of the signal. We further investigate the differences between receiving information oneself and additionally receiving information about the interpretation of that information by others. Therefore, the focus is on whether there is a difference between receiving a signal repeatedly and observing the interpretation of the signal by a number of other players who received the same signal.
In a last step, we analyze the general effect that receiving information about the observations of others has on the precision and quality of statements about the signal.
2 The Task
We designed a simple task in which the way numerical information is aggregated in the brain can be analyzed and compared with standard technical procedures for this aggregation. Experimental subjects faced the following task: they were shown, for ten seconds, a scatter plot with a fixed number of points, which constitutes the true value (in this time, subjects were not able to count all the points). In the next stage, subjects were asked to estimate the number of points. Specifically, they were asked to state an interval framing the actual number of points shown to them before. Subjects were asked to state this interval as accurately as possible but as widely as necessary. This task was repeated for ten rounds, with each scatter plot showing the same number of points but with a varying distribution on the screen. This fact was known to the subjects.
Fig. 1. Example of scatter plot as shown to the subjects
The problem analyzed by means of this task is not how human beings try to count the points, but how they deal with the different counts they get per round. We are interested in how they aggregate the imprecise information. A statistical description of this task is that every round the subject observes a random variable, which is characterized by its mean and variance. We assume that the mean and variance are constant in every round. An effective strategy for dealing with this task is to determine an estimated number of points every round and then use the law of large numbers. The estimations should be independent of each other. If in round n a subject calculates the mean of these estimates, this mean will approach the true value and the variance (measured as an interval) should decrease by a factor of n. In a second task, a subject is informed about the estimates of other subjects. In this task the observations are independent, since the individuals did not interact.
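As an illustration only (not part of the original experiment), the following Python sketch simulates subjects who receive a noisy count of the 143 points each round and simply average the counts they have seen so far. The true value of 143 comes from the experiment described in Section 4, while the noise level sigma is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
true_value, sigma, rounds, subjects = 143, 20.0, 10, 10000  # sigma chosen for illustration

# Rows: simulated subjects; columns: the noisy per-round count each subject obtains.
counts = rng.normal(true_value, sigma, size=(subjects, rounds))

for n in (1, 10):
    mean_estimate = counts[:, :n].mean(axis=1)
    print(f"after {n:2d} round(s): sd of the mean estimate = {mean_estimate.std():.2f}")

# The two standard deviations differ by roughly sqrt(10) ~ 3.16, the shrink
# factor the law of large numbers predicts between round one and round ten.
```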
3 Hypotheses
Information is one of the most important factors in economics. Therefore, a lot of research focuses on how new information is processed by individuals as well as
groups. Although economics assumes perfect sensitivity of decision makers [18], more recent research on decision-making has brought attention to imprecise judgments as a factor when choices are observed in experimental laboratories [19]. Therefore, the question arises as to how precise the perception of new information can be when the information received is not perfect and requires the recipient to process the information and calculate an approximation. The estimate moves from an extreme towards the mean of a distribution, because subjects use the law of large numbers for high-frequency observations [20]. The law of large numbers states that, with an increasing number of trials of the same random experiment, the average observation will be close to the expected value of the experiment. Furthermore, the accuracy increases with an increasing number of trials. That means the variance of the estimator decreases with each additional trial. Thus, if decision makers were perfect statisticians, or in this case followed the law of large numbers, the precision of their estimates would increase with the number of observations. We will use the stated standard deviation as a measure of the precision of a response. The quality of a response is measured as the difference between the response and the true value. Following the law of large numbers, this difference should tend to zero. In the following part we state and derive the three hypotheses we will test.

Hypothesis 1a: Precision of an estimate increases with the observation of repetitions of the signal following the law of large numbers.

Following the law of large numbers, the precision increases by a defined factor. The level of precision in this case is reflected by the variance of the stated estimates. One can further assume that, for increasing numbers of repetitions of the random event, the distribution of the sample mean converges to a normal distribution. In this case, assuming that the experimental trials are independent, the variance can be calculated. Applying the law of large numbers to a signal being repeated n times, one can conclude that the standard deviation of the stated estimate decreases by a factor of √n. However, there is a discussion in the economic and psychological literature as to whether humans use the law of large numbers when faced with imprecise information. For example, experimental subjects are known to misconceive the fairness of a random draw, leading to a distorted perception of the probability of successes in random outcomes [21]. Following this argument, subjects are overly confident in their prognosis of future outcomes when a series of signals is observed. Therefore, the perceived precision is even higher than calculated by the law of large numbers. The application of the law of large numbers in human judgment is a controversial subject. There are arguments favoring the law of large numbers as a good approximation of human behavior [5], as even children are found to have an intuitive understanding of it [6]. Because experiments reveal human behavior that is close to statistical models [22], it seems that people act on the basis of a rule system similar to the law of large numbers [23]. In contrast, other researchers found that people tend to disregard sample sizes when facing decisions [24] and do not realize the effect of sample size on sample heuristics applied to decision tasks [25].
Following their argument, the number of repetitions of an incoming signal is neglected by the subjects, and the precision of the estimate does not change in accordance with the law of large numbers.
While there are different arguments for and against statistical models, and the law of large numbers in particular, the comparison of different tasks in the economic literature suggests that the law of large numbers is applied with relatively high frequency when questions concern the mean rather than the tail of a distribution [26]. Furthermore, the law of large numbers holds for decisions in frequency distribution tasks [16]. Therefore, the experiment reported in this paper uses a question about the mean of a distribution in a frequency distribution task.

Hypothesis 1b: Precision increases with the knowledge of estimates of others following the law of large numbers.

Generally speaking, two possibilities exist for receiving additional information about a signal. The first is receiving the signal multiple times. The second is receiving information about the estimates of other subjects who received the same signal. We expect the law of large numbers to also hold if the observations of others are included.

Hypothesis 2: The increase in precision is lower when the estimates of other participants are observed compared to when subjects receive their own information.

Experimental results show that subjects tend to copy the behavior of others. For example, analysts adjust their own forecasts if information about the forecasts of others is available, neglecting their own private information [27], and even in simple games with two agents, subjects copy the decision of the first mover significantly more often when it is observable than when it is not [28]. In games where the information of other subjects is observable for the individual, cascade behavior is initiated [29]. That means people make the same choices as others without using their own private information. Furthermore, participants do not recognize the cascade behavior of others [30], which is sometimes described as the persuasion bias [31]. Additionally, one can find differences in mental activity between processing private information and information provided by others, which can explain why people tend to follow the behavior of others [32]. Following these arguments, it seems that humans prefer to stick with the behavior of the group where that behavior is observable and where their own decisions can be changed accordingly after observing it. Therefore, one can conclude that subjects should feel more certain about their own answers when following a group. Thus, the precision of answers should be even higher when the estimates of other participants are observable. Hence, subjects would place a higher value on signals from other participants than on receiving more information themselves. On the other hand, overconfidence [33] predicts that participants rely more on their own information than on the information of others. In our setting, no strategic implications have to be considered. We also do not compare subjects' estimates in front of the group, such that they might feel pressure to stick to the group opinion. We simply test which information gets heavier weight in the estimates: their own observations or the information of others. We think that for individual estimates their own information gets a higher weight, and we have stated the hypothesis above accordingly.

Hypothesis 3: The individual estimate is closer to the true value when estimates of others can be taken into account, i.e., the quality gets higher.
One of the most famous examples of dealing with one's own estimates compared to the estimates of others is the winner's curse [34]. This effect describes the change in a person's perception of her own signal after being made aware of the estimates of other persons that are made public through a price mechanism. Furthermore, it stresses the importance of receiving signals from others in order to make better estimates of an imprecise signal. In auctions exhibiting the winner's curse, bidders are found to play a best response to the distribution of other bids [35], depending on their beliefs about the rivals' uncertainty [33]. Since, according to the theory, the winner's curse disappears in market settings [36] and players correct their behavior with respect to the received information [35], one can expect group estimates to have a higher quality than a series of individual estimates. In our setting, the observation of the estimates of others should also lead to a higher quality of one's own estimate, since the estimates of the others are independent observations.
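Before turning to the experiment, the quantitative benchmark behind Hypothesis 1a can be stated compactly, under the assumption (made in the text above) that the per-round estimates are independent with a common standard deviation σ:

```latex
% Benchmark implied by the law of large numbers for n independent estimates
% with common standard deviation \sigma.
\[
  \operatorname{sd}\bigl(\bar{X}_n\bigr) = \frac{\sigma}{\sqrt{n}},
  \qquad
  \frac{\operatorname{sd}(\bar{X}_1)}{\operatorname{sd}(\bar{X}_{10})} = \sqrt{10} \approx 3.16 .
\]
```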
4 Experiment
The experiment was conducted in a laboratory environment at the MaXLab of the Otto-von-Guericke University in Magdeburg. The group of participants consisted of 48 students of the Otto-von-Guericke University Magdeburg enrolled in different fields of study, matched randomly using the ORSEE system [37] into two different groups and several sessions. The subjects faced the task described in Section 2: they had to give an estimate of the numerical value representing the number of points in a cloud. The true value was 143. The experiment consists of two different treatments. In the first treatment, subjects could just look at the plot for ten seconds before they gave their answer. In the second treatment, additional information was added: in each round, after stating the interval, the subjects were shown a table with the estimated intervals of ten other participants before seeing the next scatter plot. In the second treatment, the number of observations is therefore ten times as high as in the first treatment. All groups played both treatments, with one half of the groups starting with the first treatment and the other half with the second treatment. Therefore, we were able to test for sequence effects. The software implementation of the experiment in the laboratory was designed with z-Tree [38].

Table 1. Sequence of the experiment and comparison of treatments

Step | Treatment | Screen                                           | Time Frame
1    | 1 & 2     | Scatter plot                                     | 10 sec.
2    | 1 & 2     | Provide upper and lower bound of interval        | Press OK
3    | 2         | Table with answers of all the other participants | Press OK
5 Results
In the experiment, an interval or a point estimate is stated by the participants, framing the true value of the number of points in the scatter plot. Therefore, the width of this
interval represents how confident the subjects are about their stated estimation. We use this interval as a proxy for the standard deviation attributed to the estimate by the participants.

5.1 Hypothesis 1a and 1b

The analysis of the estimates without revealing the answers of other participants (Treatment 1) shows a decreasing width of these intervals. While the mean interval width in the first round is 40, it shrinks to a mean width of 27.5 in the tenth round. Thus, the interval decreases with the number of observations (Wilcoxon-Test, 5%-level). When comparing the width of the interval in rounds one and ten, the law of large numbers would predict a decrease of the interval by the factor √10 ≈ 3.16. For the analysis we calculate this factor for each individual by dividing the interval width of round one by the interval width of round ten. A factor larger than 3.16 can be interpreted as the subject being in line with the law of large numbers, and a smaller factor as the subject not being in line with the law of large numbers. It has to be noted here that one session consisted of only eight subjects because not all recruited subjects showed up. For this session the factor was corrected for the decreased number of observations. Individuals do not provide estimates that can be explained by the law of large numbers (Binomial-Test, 5%-level). That means that while it is true that the precision of the stated estimates increases with the number of observations (Hypothesis 1a), the participants of this experiment do not follow the law of large numbers (Hypothesis 1b). The decrease of the interval width is by far less than predicted. The analysis stated above focused on the repeated reception of a signal and the precision of subjects' perception of this signal. As stated previously, there is a second possibility of acquiring information about a signal, which is the observation of others' estimates. Analyzing the data of the second treatment, we see that the width of the estimated interval gets smaller as well. This in turn confirms the observation that the precision of the estimated interval is increasing.

5.2 Hypothesis 2

Following the arguments derived from the literature in developing Hypothesis 2, the expectation is that the same number of observations from the estimates of others would lead to higher precision than when the observations were made by the subject. Therefore, the width of the intervals in round two of Treatment 2 (with information) was compared with the width of the intervals in round ten of Treatment 1 (without information). At these stages, participants had the same number of observations, with the only difference being that in Treatment 1 their own observations were considered, while in Treatment 2 the observations are estimates of other participants. Following the law of large numbers, one would not expect a difference between these data points. When considering the arguments on information cascade behavior, one would expect the intervals in round two with information to be even smaller than the interval in round ten without information. However, the contrary is true. The interval is significantly smaller for the same number of observations in Treatment 1 (Wilcoxon-Test, 1%-level). Furthermore, the changes in the width of the intervals over increasing numbers of rounds follow very similar patterns in
both treatments. While the precision in round ten with information is slightly smaller on average than without information, the difference in the data is not significant at any common level of significance (Wilcoxon-Test). This is in line with Hypothesis 2.

5.3 Hypothesis 3

Until now, the analysis has focused on the precision of the estimates provided by the subjects. The data can also be interpreted in terms of how confident the subjects are in their point estimation. Further analysis is required to check the quality of the estimates. Therefore, the question is whether a group average or a group decision yields an estimate of higher quality. This is checked by comparing the distance between the estimates provided by the participants and the true number of points in the scatter plot. For the analysis of the quality of the estimates, we calculate the midpoints of the intervals provided by the participants for all rounds in both treatments. Then the distance between these midpoints and the true value of 143 is calculated. Comparison of the treatments shows that without information about others (Treatment 1), the mean of the calculated midpoints remains at 153.75, while with information about others (Treatment 2), the mean shifts to 143.75 and is significantly closer to the true value (Wilcoxon-Test, 10%-level). The basis of this test is the difference between the midpoints of the estimated intervals and the true number of points. Using the Wilcoxon-Test, we tested whether these differences differ significantly between treatments. Therefore, one can conclude that the quality of an estimate is higher for group decisions (Hypothesis 3); however, the confidence of the participants, as reflected in the precision of the stated estimate, does not differ between the groups. This favors our Hypothesis 3.
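A minimal sketch of the statistical comparisons reported above, assuming the per-subject interval widths are available as arrays (the variable names and the toy data are ours). SciPy's signed-rank and binomial tests stand in for the Wilcoxon and binomial tests mentioned in the text, and the null proportion of 0.5 in the binomial test is our assumption, not stated in the paper.

```python
import numpy as np
from scipy import stats  # SciPy >= 1.7 for binomtest

# Toy stand-ins for the per-subject interval widths in rounds one and ten of
# Treatment 1; in the actual study these come from the z-Tree output.
rng = np.random.default_rng(1)
width_round1 = rng.uniform(30.0, 50.0, size=48)
width_round10 = width_round1 / rng.uniform(1.2, 2.0, size=48)  # shrinks, but by less than sqrt(10)

# Does the interval width decrease between rounds? (Wilcoxon signed-rank test)
print(stats.wilcoxon(width_round1, width_round10))

# Is the per-subject shrink factor compatible with the law of large numbers?
factor = width_round1 / width_round10
in_line = int(np.sum(factor >= np.sqrt(10)))   # sqrt(10) ~ 3.16
print(stats.binomtest(in_line, n=len(factor), p=0.5))

# Quality comparison (Hypothesis 3): distance of interval midpoints from the
# true value of 143, compared between treatments with another Wilcoxon test.
```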
Fig. 2. Results compared by treatment
6 Conclusion
This paper deals with the question of how the perception of imprecise signals changes in an experimental setting when the signal is repeated. We show that the processing of complex information by the human brain in most cases leads to imprecise numerical judgments. Furthermore, it was investigated whether or not the repeated consideration of complex information leads to more accurate judgments. It was shown that over the rounds of the experiment the responses get more precise. However, this improvement does not reach the theoretically expected factor. The question of whether humans follow statistical models when dealing with repeated inputs, with the law of large numbers as the benchmark, must therefore be answered in the negative. While the notion that precision increases with repetition of the signal is true, we find that it does so to a significantly lower extent than the law of large numbers predicts. Furthermore, this paper underlines the importance of considering imprecise judgment in economic decision making even for repeated situations. In addition, it can be shown that the imprecise judgment of repeated information does not improve over the rounds as much as theoretical methods would predict. The second issue analyzed in this study is the value of observing the estimates of others in a group. The theoretical argument states that it does not matter whether additional information is received from observing the information of others or from an additional private observation; that is, the source of the information is not considered to be a factor. In our experiment, we show that information about the decisions of the other experiment participants has an influence on the numerical response. We also find that responses made with information about the responses of the other participants were closer to the true value than responses made without this information. Although the literature on information cascades shows that people tend to be more comfortable with copying the actions of others, our data show that observing the estimates of others has little to no effect on the precision of estimates. However, the quality of the estimates increases when this information is available. This finding is partly in line with overconfidence, since the precision increases, and partly with other literature, which describes the avoidance of errors by considering the opinion of others.
References [1] Kahneman, D., Knetsch, J.L.: The Endowment Effect, Loss Aversion, and Status Quo Bias. Journal of Economic Perspectives 5, 193–206 (1991) [2] Jevons, W.: The power of numerical discrimination. Nature 3, 281–282 (1871) [3] Braunstein, M.L.: Depth perception in rotating dot patterns: effects of numerosity and perspective. Journal of experimental psychology 64, 415–420 (1962) [4] Piazza, M., Mechelli, A., Price, C.J., Butterworth, B.: Exact and approximate judgements of visual and auditory numerosity: an fMRI study. Brain research 1106, 177–188 (2006) [5] Peterson, C.R., Beach, L.R.: Man As an Intuitive Statistician. Psychological Bulletin 68, 29–46 (1967) [6] Piaget, J., Inhelder, B.: The Origin of the Idea of Chance in Children. Norton, New York (1975)
[7] Akin, O., Chase, W.: Quantification of Three-Dimensional Structures. Journal of Experimental Psychology: Human Perception and Performance 4, 397–410 (1978) [8] Miller, J.: Discrete and continuous models of human information processing: theoretical distinctions and empirical results. Acta Psychologica 67, 191–257 (1988) [9] Fechner, G.T.: In Sachen der Psychophysik. Kessinger, Leipzig (1877) [10] Olshausen, B.A., Anderson, C.H., Van Essen, D.C.: A Neurobiological Model of Visual Attention and Invariant Pattern Recognition Based on Dynamic Routing of Information. The Journal of Neuroscience 13, 4700–4719 (1993) [11] Miller, G.A.: The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychological Review 63, 81–97 (1956) [12] Fiedler, K., Kareev, Y.: Does Decision Quality (Always) Increase With the Size of Information Samples? Some Vicissitudes in Applying the Law of Large Numbers. Journal of Experimental Psychology / Learning, Memory & Cognition 32, 883–903 (2006) [13] Kahneman, D., Tversky, A.: Prospect Theory: An analysis of decision under risk. Econometrica 47, 263–292 (1979) [14] Cox, J.C., Grether, D.M.: The preference reversal phenomenon: Response mode, markets and incentives. Economic Theory 7, 381–405 (1996) [15] Borst, A., Theunissen, F.E.: Information theory and neural coding. Nature Neuroscience 2, 947–957 (1999) [16] Sedlmeier, P., Gigerenzer, G.: Intuitions about sample size: the empirical law of large numbers. Journal of Behavioral Decision Making 10, 33–51 (1997) [17] Surowiecki, J.: The Wisdom of Crowds: Why the Many Are Smarter Than the Few. Abacus, Great Britain (2004) [18] Edwards, W.: The theory of decision making. Psychological Bulletin 51, 380–417 (1954) [19] Loomes, G., Butler, D.J.: Imprecision as an account of the preference reversal phenomenon. American Economic Review 1, 277–297 (2007) [20] Rabin, M.: Inference by Believers in the Law of Small Numbers. Quarterly Journal of Economics 117, 775–816 (2002) [21] Tversky, A., Kahneman, D.: Belief in the law of small numbers. Psychological Bulletin 76, 105–110 (1971) [22] Evans, J., Pollard, P.: Intuitive statistical inferences about normally distributed data. Acta Psychologica 60, 57–71 (1985) [23] Nisbett, R.E.: Rules for Reasoning. Erlbaum, Hillsdale (1992) [24] Kahneman, D., Tversky, A.: Subjective probability: A judgment of representativeness. Cognitive Psychology 3, 430–454 (1972) [25] Reagan, R.T.: Variations on a seminal demonstration of people’s insensitivity to sample size. Organizational Behavior and Human Decision Processes 43, 52–57 (1989) [26] Well, A.D., Pollatzek, A., Boyce, S.J.: Understanding the effects of sample size on the variability of the mean. Organizational Behavior and Human Decision Processes 47, 289–312 (1990) [27] Bloomfield, R., Hales, J.: An Experimental Investigation of the Positive and Negative Effects of Mutual Observation. The Accounting Review 84, 331–354 (2009) [28] González, M., Modernell, R., París, E.: Herding behavior inside the board: an experimental approach. Corporate Governance: An International Review 14, 388–405 (2005) [29] Anderson, L.R., Holt, C.A.: Information cascades in the laboratory. American Economic Review 87, 847–862 (1997) [30] Grebe, T., Schmidt, J., Stiehler, A.: Do individuals recognize cascade behavior of others? An experimental study. Journal of Economic Psychology 29, 197–209 (2008)
[31] DeMarzo, P.M., Vayanos, D., Zwiebel, J.: Persuasion Bias, Social Influence, and Unidimensional Opinions. The Quarterly Journal of Economics 118, 909–968 (2003) [32] Prechter, R.R.: Unconscious herding behavior as the psychological basis of financial market trends and patterns. Journal of Psychology 2, 120–125 (2003) [33] Charness, G., Levin, D.: The Origin of the Winner’s Curse: A Laboratory Study. American Economic Journal: Microeconomics 1, 207–236 (2009) [34] Capen, E.C., Clapp, R.V., Campbel, W.M.: Competitive bidding in high-risk situations. Journal of Petroleum Technology 23, 641–653 (1971) [35] Eyster, E., Rabin, M.: Cursed Equilibrium. Econometrica 73, 1623–1672 (2005) [36] Cox, J.C., Isaac, R.M.: In Search of the Winner’s Curse. Economic Inquiry 22, 579–592 (2007) [37] Greiner, B.: The Online Recruitment System ORSEE 2.0 - A Guide for the Organization of Experiments in Economics. University of Cologne, Cologne (2004) [38] Fischbacher, U.: z-Tree: Zurich toolbox for ready-made economic experiments. Experimental Economics 10, 171–178 (2007)
Sparse Regression Models of Pain Perception
Irina Rish1, Guillermo A. Cecchi1, Marwan N. Baliki2, and A. Vania Apkarian2
1 IBM T.J. Watson Research Center, Yorktown Heights, NY
2 Northwestern University, Chicago, IL
Abstract. Discovering brain mechanisms underlying pain perception remains a challenging neuroscientific problem with important practical applications, such as developing better treatments for chronic pain. Herein, we focus on statistical analysis of functional MRI (fMRI) data associated with pain stimuli. While the traditional mass-univariate GLM [8] analysis of pain-related brain activation can miss potentially informative voxel interaction patterns, our approach relies instead on multivariate predictive modeling methods such as sparse regression (LASSO [17] and, more generally, Elastic Net (EN) [18]) that can learn accurate predictive models of pain and simultaneously discover brain activity patterns (relatively small subsets of voxels) allowing for such predictions. Moreover, we investigate the effect of temporal (time-lagged) information, often ignored in traditional fMRI studies, on the predictive accuracy and on the selection of brain areas relevant to pain perception. We demonstrate that (1) Elastic Net regression can be highly predictive of pain perception, by far outperforming ordinary least-squares (OLS) linear regression; (2) temporal information is very important for pain perception modeling and can significantly increase the prediction accuracy; (3) moreover, regression models that incorporate temporal information discover brain activation patterns undetected by non-temporal models.
1 Introduction
Brain imaging studies of pain perception are a rapidly growing area of neuroscience, motivated both by the scientific goal of improving our understanding of pain mechanisms in the human brain and by practical medical applications [2,3,1,5,15]. Localizing pain-specific brain areas remains a challenging problem due to the complex nature of pain perception, which involves activations of multiple brain processes [2]. In this work, we focus on pain perception analysis based on fMRI studies, and explore the advantages of statistical predictive modeling techniques known as sparse (l1-regularized) regression. To our knowledge, this is the first attempt to analyze pain perception using the sparse regression methodology. Functional Magnetic Resonance Imaging (fMRI) uses an MR scanner to measure the blood-oxygenation-level dependent (BOLD) signal, known to be correlated with neural activity in response to some input stimuli. Such scans produce a sequence of 3D images, where each image typically has on the order of 10,000-100,000 subvolumes, or voxels, and the sequence typically contains a few hundred time points, or TRs (time repetitions). Standard fMRI analysis approaches, such as the General Linear Model (GLM) [8], examine mass-univariate relationships between each voxel and the
stimulus in order to build statistical parametric maps that associate each voxel with some statistic reflecting its relationship to the stimulus. Commonly used activation maps depict the “activity” level of each voxel, determined by the linear correlation of its time course with the stimulus. However, the GLM approach models each voxel separately, missing potentially important information contained in the interactions among voxels. Indeed, as shown in [12], highly predictive models of mental states can be built from voxels with submaximal activation. Recently, applying multivariate predictive methods to fMRI has become an active area of research, focused on predicting “mental states” from fMRI data (see, for example, [14,9,4,6]). In this paper, we focus on sparse regression modeling, a fast-growing statistical field that aims at learning predictive models from data while simultaneously discovering sparse predictive patterns. Two main advantages of sparse modeling are (1) effective regularization via the l1-norm constraint, which helps to avoid overfitting on small-sample, high-dimensional data (typical for fMRI), and (2) variable selection, naturally embedded into the model estimation due to the sparsity-enforcing properties of l1-regularization. Such embedded variable selection leads to both predictive and interpretable statistical models that pinpoint informative variables (e.g., groups of voxels) that are most relevant for prediction. (From the variable-selection perspective, the GLM approach can be viewed as a more simplistic filter-based variable selection, where each variable/voxel is ranked separately by its relevance to the response/stimulus using a univariate criterion such as correlation; besides, GLM does not provide a predictive model of the response.) Specifically, we experiment with the Elastic Net (EN) [18] regression, a recent extension of the original l1-regularized linear regression method called Lasso [17]. Besides sparsity, EN also enforces a grouping property that Lasso lacks; namely, it tends to assign similar coefficients to highly correlated variables, thus including (or excluding) them as groups. This EN property yields more interpretable solutions that show whole groups of relevant predictors (e.g., spatially coherent groups of voxels) rather than just single representatives of such groups [18,6]. We observe that the Elastic Net is capable of learning highly predictive models of subjective pain perception from fMRI data, often achieving 0.7-0.8 correlation between the predicted and actual pain ratings. In practically all cases, EN outperforms Ordinary Least Squares (OLS) regression, often by far, indeed making use of regularization to prevent the overfitting that usually hurts OLS. (Similar results are also observed when predicting ratings of a visual stimulus that are included in our fMRI experiment together with pain ratings.) Another key aspect of this work is exploring the effects of temporal information on predictive modeling. As we demonstrate, incorporating functional dynamics by using information from past time slices (up to 8 in this study) provides consistent and often significant improvement in predictive accuracy. Moreover, using such temporal information may provide new insights into the brain mechanisms related to pain perception, since sparse temporal models discover highly predictive and thus functionally relevant brain activity patterns that are left undetected by more traditional, non-temporal models.
2 Materials and Methods
2.1 Experimental Setup
Our analysis was performed on the fMRI dataset originally presented in [2]. A group of 14 healthy subjects participated in this study, including 7 healthy women and 7 healthy men, of age 35.21±11.48 yr. All gave informed consent to procedures approved by the Northwestern University Institutional Review Board committee. The experiment consisted of two sessions, focusing on two different rating tasks, respectively: pain rating and visual rating. The visual task was included in order to compare the activation patterns that relate specifically to pain perception versus the patterns that relate, in general, to rating the magnitude of different types of stimuli; as was observed by [2], “brain activations segregate into two groups, one preferentially activated for pain and another one equally activated for both visual and pain magnitude ratings”. During the first session (pain rating), the subjects in the scanner were asked to rate their pain level (using a finger-span device) in response to painful stimuli applied to their back. An fMRI-compatible device was used to deliver fast-ramping (20°C/s) painful thermal stimuli (baseline 38°C; peak temperatures 47, 49, and 51°C) via a contact probe. During each session, nine such stimuli were generated sequentially, ranging in duration from 10s to 40s, with similar-length rest intervals in between. During the second session (visual stimulus rating), subjects had to rate the magnitude of the length of a bar, which was actually following their ratings of the thermal stimulus (although the subjects were unaware of this). The data were acquired on a 3T Siemens Trio scanner with echo-planar imaging (EPI) capability using the standard radio-frequency head coil. An average of 240 volumes were acquired for each subject and each task, with a repetition time (TR) of 2.5s. Each volume consists of 36 slices (slice thickness 3mm), each of size 64 × 64, covering the whole brain from the cerebellum to the vertex. The standard fMRI data preprocessing was performed using the Oxford Centre for Functional MRI of the Brain (FMRIB) Expert Analysis Tool (FEAT; Smith et al. 2004, http://www.fmrib.ox.ac.uk/fsl), including, for each subject: skull extraction using a brain extraction tool (BET), slice time correction, motion correction, spatial smoothing using a Gaussian kernel of full-width half-maximum 5 mm, nonlinear high-pass temporal filtering (120 s), and subtraction of the mean of each voxel time course from that time course. Pain and visual ratings were convolved with a generalized hemodynamic response function (gamma function with 6s lag and 3s SD).
2.2 Methods
Let X1, · · · , XN be a set of N predictors, such as voxel intensities (BOLD signals), and let Y be the response variable, such as the pain perception rating or the visual stimulus. Let X = (x1| · · · |xN) denote the M × N data matrix, where each xi is an M-dimensional vector consisting of the values of predictor Xi for M data samples, while the M-dimensional vector y denotes the corresponding values of the response variable Y. When using regularized regression, such as Lasso and Elastic Net, the data are usually preprocessed, ensuring that the response variable is centered to have zero mean and
all predictors have been standardized to have zero mean and unit length. Herein, we consider the problem of estimating the coefficients βi in the following linear regression model

ŷ = x1 β1 + · · · + xN βN = Xβ,   (1)

where ŷ is an approximation of y. As a baseline, we use Ordinary Least Squares (OLS) regression, which finds a set of βi that minimize the sum-squared approximation error ||y − Xβ||₂² of the above linear model. When X has full column rank (which also implies that the number of samples M is larger than the number of variables N), OLS has the unique closed-form solution β̂ = (XᵀX)⁻¹Xᵀy. However, when N > M, as is often the case in fMRI data with (dozens of) thousands of predictors (voxels) and only a few hundred samples (TRs), there is no unique solution, and some additional constraints are required. (Herein, we used the pseudoinverse, based on the Matlab pinv function, in order to solve OLS when N > M.) However, in general, OLS solutions are often unsatisfactory, since (1) their predictive accuracy can be low due to overfitting, especially in the presence of a large number of variables and a relatively small number of samples, and (2) no variable selection occurs with OLS (i.e., all coefficients tend to be nonzero), so that it is hard to pinpoint which predictors (e.g., voxels) are most relevant to the response. Various regularization approaches have been proposed in order to handle large-N, small-M datasets and to avoid overfitting [13,10,11,17]. Moreover, recently proposed sparse regularization methods such as Lasso [17] and Elastic Net [18] address both of the OLS shortcomings, since variable selection is embedded into their model-fitting process. Sparse regularization methods include the l1-norm regularization on the coefficients¹, which is known to produce sparse solutions, i.e., solutions with many zeros, thus eliminating predictors that are not essential. In this paper, we use Elastic Net (EN) regression [18], which finds an optimal solution to the least-squares (OLS) objective augmented with additional regularization terms: a sparsity-enforcing l1-norm constraint on the regression coefficients that "shrinks" some coefficients to zero, and a "grouping" l2-norm constraint that enforces similar coefficients on predictors that are highly correlated with each other, thus allowing selection of relevant groups of voxels, which the l1-constraint alone does not provide. This can improve the interpretability of the model, for example, by including a group of similarly relevant voxels rather than one representative voxel from the group. Formally, EN regression optimizes the following function²:

L_{λ1,λ2}(β) = ||y − Xβ||₂² + λ1 ||β||₁ + λ2 ||β||₂².   (2)
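To make Eqs. (1)–(2) concrete, the following is a minimal Python sketch (an illustration only, not the authors' Matlab pipeline): OLS is solved with the pseudoinverse, and the Elastic Net is fit with scikit-learn. The synthetic data sizes and the scikit-learn penalty parameters (alpha, l1_ratio) are assumptions for illustration and do not correspond exactly to (λ1, λ2) in Eq. (2).

```python
# Sketch: OLS via pseudoinverse (Eq. 1 baseline) vs. Elastic Net (spirit of Eq. 2).
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
M, N = 240, 5000                              # samples (TRs) x predictors (voxels), N >> M
X = rng.standard_normal((M, N))
beta_true = np.zeros(N); beta_true[:20] = 1.0
y = X @ beta_true + 0.5 * rng.standard_normal(M)

# Preprocessing described in the text: centered response, unit-length predictors.
Xc = X - X.mean(axis=0)
X = Xc / np.linalg.norm(Xc, axis=0)
y = y - y.mean()

# OLS baseline via the pseudoinverse (with N > M this interpolates the training data).
beta_ols = np.linalg.pinv(X) @ y

# Elastic Net: l1 (sparsity) + l2 (grouping) penalties, scikit-learn parameterization.
en = ElasticNet(alpha=0.05, l1_ratio=0.5, max_iter=10000).fit(X, y)

print("nonzero EN coefficients:", int(np.sum(en.coef_ != 0)))
print("train corr OLS:", pearsonr(X @ beta_ols, y)[0], "| train corr EN:", pearsonr(en.predict(X), y)[0])
```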
¹ Given some q > 0, the lq-norm is defined as lq(β) = (Σᵢ₌₁ᴺ |βᵢ|^q)^{1/q}. E.g., ||β||₁ = Σᵢ₌₁ᴺ |βᵢ|, and ||β||₂² = Σᵢ₌₁ᴺ βᵢ².
² Note that EN becomes equivalent to Lasso when λ2 = 0 and λ1 > 0, while for λ1 = 0 and λ2 > 0 it is equivalent to ridge regression.

In order to solve the EN problem, we use the publicly available Matlab code [16] that implements the LARS-EN algorithm of [18]. It takes as input the grouping parameter λ2 and a sparsity parameter that specifies the desired number of selected predictors. Since this number corresponds to a unique value of λ1 in Eq. 2, as shown in [7], we
will slightly abuse the notation and, following [6], denote the sparsity parameter as λ1, while always interpreting it as the number of selected predictors.

Selecting Predictor Sets: Temporal and Non-temporal. When predicting a stimulus or behavior from fMRI data, it is typical to use as predictors the voxel intensities at the current TR, and to treat TRs as independent and identically distributed (i.i.d.) samples [6]. However, temporal information from past TRs may sometimes improve the predictive model, as we demonstrate in this paper. We considered as a set of predictors all voxels from the past 8 time lags (previous TRs), including the current TR. However, due to the very high dimensionality of this set, we selected only the subset of those voxels that were correlated with the response variable above a given threshold (herein, we used 0.2). (Note that each time-lagged voxel time series was shifted forward by the appropriate lag in order to properly align it with the response time series.) Overall, we experimented with the following sets of predictors: Set1 - all brain voxels at the current TR; Set2 - a subset of Set1 that included only (current-TR) voxels correlated with the response variable above the same threshold of 0.2, for a fairer comparison with the time-lagged voxel subset described above, which we denote Set3. For the response variable, we first used the pain perception rating and then the visual stimulus (recomputing Set2 and Set3 according to the correlation with the different response; we denote the corresponding sets for the visual stimulus as Set2V and Set3V). Moreover, we experimented with two more sets of predictors, which we refer to as pain-only voxels, obtained by removing the "visual" voxels from the "pain" voxels in the corresponding time-lag and no-lag settings. Specifically, the "pain-only" time-lagged voxel set Set4 was obtained by removing the time-lagged visual voxels Set3V from the time-lagged pain voxels Set3, while the "pain-only" no-lag (current-TR) voxel set Set5 was obtained by removing the no-lag visual voxels Set2V from the no-lag pain voxels Set2. The objective of the experiments with the "pain-only" voxels, inspired by the similar work of [2] (which was performed in a GLM rather than a predictive setting, and without considering temporal information), was to test the hypothesis that excluding the voxels common to both pain and visual stimulus rating (and thus possibly relevant just to magnitude rating) leaves a set of only pain-relevant voxels that contains enough information for good predictive modeling of pain. We experimented with EN, varying the sparsity (number of voxels selected into the EN solution) and grouping (weight on the l2-norm) parameters, and compared the results with OLS as a baseline. We used the first 120 TRs for training the model and the remaining 120 TRs for testing its predictive accuracy, measured by the Pearson correlation coefficient ρ(ŷ, y) between the response variable y and its prediction ŷ. The resulting sparse EN solutions were visualized as brain maps.
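As a rough illustration of the procedure just described, the sketch below builds a lagged predictor matrix, applies the correlation threshold, and evaluates an Elastic Net fit on a held-out half of the time series. It is a simplified sketch under assumed data shapes (a TRs × voxels array and a rating vector), not the authors' Matlab pipeline, and the scikit-learn penalty parameters are stand-ins for (λ1, λ2).

```python
# Sketch: lagged predictor construction + correlation screening + EN evaluation.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import ElasticNet

def lagged_predictors(data, rating, max_lag=8, thresh=0.2):
    """Stack voxel time series at lags 0..max_lag and keep those whose correlation
    with the rating exceeds `thresh` (analogue of Set3). Lagged series are shifted
    forward so that each predictor aligns with the later response sample."""
    n_tr, n_vox = data.shape
    y = (rating - rating.mean()) / rating.std()
    blocks = []
    for lag in range(max_lag + 1):
        shifted = np.vstack([np.zeros((lag, n_vox)), data[:n_tr - lag]])
        z = (shifted - shifted.mean(0)) / (shifted.std(0) + 1e-12)
        r = (z.T @ y) / n_tr                      # vectorized Pearson correlation per column
        blocks.append(shifted[:, np.abs(r) > thresh])
    return np.hstack(blocks)

# Toy data standing in for one subject/task (240 TRs, 5000 voxels).
rng = np.random.default_rng(1)
data = rng.standard_normal((240, 5000))
rating = data[:, :5].mean(axis=1) + 0.5 * rng.standard_normal(240)

X = lagged_predictors(data, rating)
Xtr, Xte, ytr, yte = X[:120], X[120:], rating[:120], rating[120:]   # first/second half split
en = ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=10000).fit(Xtr, ytr)
rho, _ = pearsonr(en.predict(Xte), yte)
print("candidate predictors:", X.shape[1], "| nonzero coefs:", int(np.sum(en.coef_ != 0)), "| test corr:", rho)
```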
3 Results

EN parameter selection. We explored a range of grouping parameters for EN, from λ2 = 0.1 to 20, and observed, similarly to [18,6], that higher values of the grouping parameter yielded similar
[Figure 1 plot: pain prediction; predictive accuracy (correlation with response) vs. number of voxels (sparsity), for OLS and EN with λ2 = 0.1, 1, 5, and 10.]
Fig. 1. Effects of sparsity and grouping parameters on the performance of EN
(and often better) predictive accuracy while allowing EN to select larger and more spatially coherent clusters of correlated voxels, thus improving the interpretability of the corresponding brain maps. A typical behavior of EN as a function of its sparsity and grouping parameters is shown in Fig. 1, for one of the subjects: as λ2 increases, the peak performance is achieved for a larger number of selected voxels. As we can also see, higher values of λ2 achieve better peak performance than lower ones, which seems to be a common trend in other subjects as well, although there are a few exceptions. We also noticed that increasing λ2 beyond 20 did not produce any further significant improvement in performance, and thus we decided to fix the grouping parameter to λ2 = 20 in our experiments. In the following, we present the results for EN with a fixed sparsity parameter of 1000 voxels, since EN's predictive accuracy often reached a plateau around this number. Later in this section we will also present the full set of experiments with varying sparsity, for all subjects and for all subsets of predictors discussed above.

Elastic Net versus OLS. First, we observe a significant improvement in predictive accuracy when comparing EN to OLS on the same subsets of voxels. Fig. 2 shows the results for OLS versus EN with a fixed sparsity parameter λ1 = 1000 voxels and grouping parameter λ2 = 20. Specifically, Fig. 2a shows the results for both methods on the same Set3 of temporal (time-lagged) voxels. For all 14 subjects, EN made a better prediction than OLS, measured by the correlation between the predicted and actual pain rating of a particular subject, and the improvement was often quite significant (e.g., from 0.2 to about 0.65, or from 0.3-0.4 to 0.55). (Note that the straight line in Fig. 2a corresponds to equally predictive values, and EN is always above it.) Similar results were observed for visual stimulus prediction, where EN was compared to OLS on Set3V, the corresponding set of temporal (time-lagged) voxels for the visual stimulus (Fig. 2c). On 12 out of 14 subjects, EN made more accurate predictions than OLS, often improving the correlation of the prediction with the response from about 0.3-0.5 to about 0.6-0.7. Finally, EN also clearly outperformed OLS on the set of time-lagged "pain-only" voxels (Set4), as shown in Fig. 3d.
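As an illustration of this kind of parameter exploration (again a sketch, not the LARS-EN code of [16], which lets one request the number of selected voxels directly), one can sweep the scikit-learn penalty strength and record the resulting number of nonzero coefficients and the test correlation:

```python
# Sketch: sweep the EN penalty and record sparsity vs. held-out correlation.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(4)
X = rng.standard_normal((240, 2000))                 # toy stand-in: TRs x candidate voxels
y = X[:, :15].sum(axis=1) + rng.standard_normal(240)
Xtr, Xte, ytr, yte = X[:120], X[120:], y[:120], y[120:]

for alpha in [1.0, 0.3, 0.1, 0.03, 0.01]:            # stronger -> weaker penalty (fewer -> more voxels)
    en = ElasticNet(alpha=alpha, l1_ratio=0.5, max_iter=20000).fit(Xtr, ytr)
    n_sel = int(np.sum(en.coef_ != 0))
    rho = pearsonr(en.predict(Xte), yte)[0]
    print(f"alpha={alpha:<5} selected={n_sel:<5} test corr={rho:.3f}")
```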
[Figure 2 scatter plots, one point per subject: (a) Pain: EN vs. OLS, both with lag (Set3); (b) Pain: EN with lag (Set3) vs. EN without lag (Set1); (c) Visual: EN vs. OLS, both with lag (Set3V); (d) Visual: EN with lag (Set3V) vs. EN without lag (Set1).]
Fig. 2. Prediction results for Elastic Net (fixed sparsity λ1 = 1000 voxels, grouping λ2 = 20) versus OLS, for predicting pain perception and visual stimulus. First column: Elastic Net outperforms OLS for (a) pain perception prediction on the voxel Set3 (time-lagged pain voxels) and (c) visual stimulus prediction on the Set3V (time-lagged visual voxels). Second column: effects of temporal information - EN w/ time-lag outperforms EN w/ no lag for (b) pain prediction on the time-lagged Set3 voxels vs Set1 voxels (no-lag, full-brain) and (d) visual prediction on the time-lagged visual voxels Set3V vs Set1 voxels (no-lag, full-brain).
Temporal information: EN with time lag outperforms EN without time lag. Next, we compare the prediction results for EN on the time-lagged voxels versus EN on the current-TR (no-lag) voxels. Fig. 2b shows the results for pain perception prediction when using EN on the time-lagged voxels (Set3) versus the current-TR, full-brain voxel set (Set1). (Note that the results for EN on Set1 and Set2 were almost identical, perhaps because EN would not select voxels whose correlation with the response variable was below the 0.2 threshold anyway; similarly, for visual prediction, we did not see much difference between Set1 and Set2V - for more detail, see Fig. 5a,b.) We can see that using the temporal information in the time-lagged voxels very often improves the predictive performance, sometimes quite dramatically, e.g., from about 0.47 to about 0.6. Again, similar results are observed for visual stimulus prediction (Fig. 2d, using time-lagged
[Figure 3 scatter plots, one point per subject: (a) EN on lagged "pain-only" voxels vs. EN on all lagged voxels; (b) EN on lagged "pain-only" voxels vs. EN on no-lag, all voxels; (c) EN on lagged "pain-only" voxels vs. EN on no-lag "pain-only" voxels; (d) EN vs. OLS on lagged "pain-only" (set-difference) voxels.]
Fig. 3. Information preserved in "pain-only" voxels: EN pain perception prediction on (a) time-lagged "pain-only" voxels (Set4) versus all time-lagged voxels (Set3), (b) time-lagged "pain-only" voxels (Set4) versus full-brain, no-lag voxels (Set1), (c) time-lagged "pain-only" voxels (Set4) versus the "pain-only" voxels selected without the lag (Set5), and (d) EN versus OLS for pain perception prediction using "pain-only" time-lagged voxels (Set4).
Set3V vs. no-lag Set1), and for pain perception prediction when using pain-only voxels (Set4 vs. Set5) (Fig. 3c), although the advantages of using time-lagged versus no-lag voxels were most clear for pain perception, where the time lag practically always improved the performance, unlike the cases of visual and pain-only voxels. This suggests that brain states immediately preceding the pain rating contain a significant amount of information related to pain perception, and thus should be taken into account in any pain perception model (note that current GLM approaches do not incorporate the time lag).

Predictive information preserved in pain-only voxels. Finally, we investigate how much information about pain perception is preserved in the pain-only time-lagged voxels (Set4), i.e., the voxels remaining after eliminating from the "pain" time-lagged voxels (Set3) all the voxels that also appear in the "visual" time-lagged voxels (Set3V). The hypothesis suggested by [2], as we mentioned before, is that the pain-rating task also activates brain areas generally related to rating the
magnitude of a stimulus (e.g., a visual stimulus) besides activating pain-specific areas; removing voxels common to pain and visual ratings would therefore allow for a better localization of the pain-specific areas. While [2] investigate such areas in a GLM rather than a predictive setting and do not exploit temporal information, we explore the "pain-only" voxel sets chosen by a sparse predictive model such as EN, both with and without the time lag. We observe that: (a) as expected, just reducing the set of all time-lagged voxels (Set3) to its pain-only subset may somewhat lower the predictive accuracy (Fig. 3a); (b) however, time-lagged pain-only voxels still preserve enough information to frequently (6 out of 14 subjects) outperform even the full-brain set of voxels that ignores such temporal information (Set1), as shown in Fig. 3b; (c) moreover, time-lagged pain-only voxels outperform, or are comparable with, the no-lag pain-only voxels (Set5) even more frequently, on 9 out of 14 subjects (Fig. 3c); and (d) finally, EN on the pain-only, time-lagged voxels (Set4) clearly outperforms OLS on the same set of voxels, just as observed earlier for the other voxel subsets (Fig. 3d).

Varying EN sparsity level. While the above results were obtained for EN with a fixed sparsity (1000 voxels), Fig. 5 shows a more comprehensive set of results, where the sparsity, i.e., the number of voxels selected by EN, was varied from 30 to 2000. The first row shows the results for pain perception prediction (using the time-lagged Set3 and the no-lag Set1 and Set2), the second row shows the results for visual stimulus prediction (using the corresponding time-lagged Set3V and the no-lag Set1 and Set2V), and the third row shows the results for pain perception prediction when using pain-only, time-lagged voxels (Set4) and the no-lag Set5, also compared with the no-lag, full-brain set of voxels Set1. Each subplot shows the results for one subject, and in each row subjects are sorted by the predictive accuracy of OLS on the corresponding set of voxels. We can see that the accuracy of EN, especially on Set1 and Set2, typically increases with the number of voxels selected and stabilizes around 1000 voxels (this is why we selected this sparsity level for the comparison presented above). Interestingly, however, the time-lagged
Fig. 4. Brain maps visualizing sparse EN solutions over no-lag (Set1, red and blue) versus time-lag (Set3, green and pink) subsets of voxels
voxels in Set3 and Set4 sometimes reach their best performance for a much lower number of voxels (from 30 to 500), after which the performance may actually decline. This suggests that a relatively small number of time-lagged voxels may contain better predictive information than a similar or larger number of non-temporal voxels. Clearly, using cross-validation to select the best sparsity parameter value for each voxel subset, rather than fixing the sparsity level to 1000 voxels as we did in Fig. 2 and Fig. 3, would show an even more dramatic improvement in predictive performance due to the inclusion of time-lagged voxels. (We hope to include these cross-validation results in the final version of the paper.)
[Figure 5 panels (predictive accuracy vs. number of voxels, 30 to 2000; one subplot per subject): top row: predicting pain perception with OLS and EN on time-lagged voxels (Set3) and EN on no-lag voxels (Set1, Set2); middle row: predicting the visual stimulus with OLS and EN on time-lagged voxels (Set3V) and EN on no-lag voxels (Set1, Set2V); bottom row: predicting pain perception with pain-only voxels, using OLS and EN on time-lagged pain-only voxels (Set4), EN on no-lag pain-only voxels (Set5), and EN on Set1.]
Fig. 5. Prediction results for Elastic Net with sparsity varying from 30 to 2000 predictors/voxels, and the same grouping parameter λ2 = 20 as in Figures 2 and 3
Brain Maps: Visualizing Sparse Solutions. Finally, Fig. 4 displays the brain maps corresponding to the sparse models found by EN when applied to the time-lagged (Set3) vs. no-lag, full-brain (Set1) sets of predictors. For each of the 14 subjects, we produce two EN solution maps (for lag vs. no-lag voxels), where the value at each voxel corresponds to its coefficient in the regression model. Next, we aggregate the maps of each type (lag vs. no-lag) over the 14 subjects by selecting only statistically significant voxels, using a binomial test of statistical significance. We show the superposition of the two resulting spatial maps corresponding to the EN solutions on the no-lag Set1 (red and blue for positive and negative values, respectively) and on the time-lagged Set3 (green and pink). Given that the lag model also includes zero lags, it is expected that the two models overlap. Indeed, this is what we observe in most of the areas identified by the full model as relevant for prediction; we highlight three of them with dashed circles. Less obviously, although intuitively expected, we also observe that the lag model selects a significant number of voxels that are disregarded by the no-lag model; some of them are indicated by the arrows, and include both positive (green) and negative (pink) values. Given that our time-lagged models are highly predictive, these areas must contain functionally relevant information about pain perception that is ignored by the non-temporal models.
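The paper does not spell out the details of the binomial aggregation, so the following is only a plausible sketch of such a group-level test: for each voxel we count in how many of the 14 subjects it received a nonzero EN coefficient and compare that count with what would be expected by chance, where the chance probability p0 (taken here as the average selection rate) is an assumption.

```python
# Hedged sketch of a group-level binomial test over per-subject EN support maps.
# `coef_maps` is assumed to be an (n_subjects x n_voxels) array of EN coefficients.
import numpy as np
from scipy.stats import binom

def significant_voxels(coef_maps, alpha=0.05):
    n_subj, n_vox = coef_maps.shape
    selected = coef_maps != 0                          # per-subject support of the sparse model
    p0 = selected.mean()                               # assumed chance level: average selection rate
    counts = selected.sum(axis=0)                      # how many subjects selected each voxel
    pvals = binom.sf(counts - 1, n_subj, p0)           # P(X >= counts) under Binomial(n_subj, p0)
    return counts, pvals < alpha

# Toy example: 14 subjects, 5000 voxels, roughly 20% nonzero coefficients each.
rng = np.random.default_rng(2)
coef_maps = rng.standard_normal((14, 5000)) * (rng.random((14, 5000)) < 0.2)
counts, sig = significant_voxels(coef_maps)
print("voxels passing the binomial test:", int(sig.sum()))
```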
4 Conclusions

Based on our results, we conclude that: (a) the use of sparse predictive modeling for pain perception analysis reveals that functional MRI signals carry considerably more information than a simple linear regression approach would suggest; (b) functional dynamics can also considerably increase the amount of information about the subject's performance, as opposed to single-TR, or "instantaneous", approaches; and (c) the sparse, temporal models reveal functional areas that are not identified by non-temporal approaches, and yet can be highly predictive and thus functionally relevant.
Acknowledgements Marwan N. Baliki was supported by an anonymous donor; A. Vania Apkarian and experimental work were supported by NIH/NINDS grant NS35115.
References 1. Apkarian, A.V., Bushnell, M.C., Treede, R.D., Zubieta, J.K.: Human brain mechanisms of pain perception and regulation in health and disease. Eur. J. Pain (9), 463–484 (2005) 2. Baliki, M.N., Geha, P.Y., Apkarian, A.V.: Parsing pain perception between nociceptive representation and magnitude estimation. Journal of Neurophysiology (101), 875–887 (2009) 3. Baliki, M.N., Geha, P.Y., Apkarian, A.V., Chialvo, D.R.: Beyond feeling: chronic pain hurts the brain, disrupting the default-mode network dynamics. J. Neurosci. (28), 1398–1403 (2008) 4. Battle, A., Chechik, G., Koller, D.: Temporal and Cross-Subject Probabilistic Models for fMRI Prediction Tasks. In: Schölkopf, B., Platt, J., Hoffman, T. (eds.) Advances in Neural Information Processing Systems, vol. 19, pp. 121–128. MIT Press, Cambridge (2007)
5. Buchel, C., Bornhovd, K., Quante, M., Glauche, V., Bromm, B., Weiller, C.: Dissociable neural responses related to pain intensity, stimulus intensity, and stimulus awareness within the anterior cingulate cortex: a parametric single-trial laser functional magnetic resonance imaging study. J. Neurosci. (22), 970–976 (2002) 6. Carroll, M.K., Cecchi, G.A., Rish, I., Garg, R., Rao, A.R.: Prediction and Interpretation of Distributed Neural Activity with Sparse Models. Neuroimage 44(1), 112–122 (2009) 7. Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Statist. 32(1), 407–499 (2004) 8. Friston, K.J., et al.: Statistical parametric maps in functional imaging - a general linear approach. Human Brain Mapping 2, 189–210 (1995) 9. Pereira, F., Gordon, G.: The Support Vector Decomposition Machine. In: ICML 2006, pp. 689–696 (2006) 10. Frank, I., Friedman, J.: A statistical view of some chemometrics regression tools. Technometrics 35(2), 109–148 (1993) 11. Fu, W.: Penalized regression: the bridge versus the lasso. J. Comput. Graph. Statist. 7(2), 397–416 (1998) 12. Haxby, J.V., Gobbini, M.I., Furey, M.L., Ishai, A., Schouten, J.L., Pietrini, P.: Distributed and Overlapping Representations of Faces and Objects in Ventral Temporal Cortex. Science 293(5539), 2425–2430 (2001) 13. Hoerl, A., Kennard, R.: Ridge regression. Encyclopedia of Statistical Sciences 8(2), 129–136 (1988) 14. Mitchell, T.M., Hutchinson, R., Niculescu, R.S., Pereira, F., Wang, X., Just, M., Newman, S.: Learning to Decode Cognitive States from Brain Images. Machine Learning 57, 145–175 (2004) 15. Price, D.D.: Psychological and neural mechanisms of the affective dimension of pain. Science (288), 1769–1772 (2000) 16. Sj¨ostrand, K.: Matlab implementation of LASSO, LARS, the elastic net and SPCA, Version 2.0. (June 2005) 17. Tibshirani, R.: Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B 58(1), 267–288 (1996) 18. Zou, H., Hastie, T.: Regularization and variable selection via the Elastic Net. Journal of the Royal Statistical Society, Series B 67(2), 301–320 (2005)
A Study of Mozart Effect on Arousal, Mood, and Attentional Blink Chen Xie1,2,3 , Lun Zhao4,5, Duoqian Miao1,2,3, , Deng Wang1,2,3 , Zhihua Wei1,2,3 , and Hongyun Zhang1,2,3 1
Department of Computer Science and Technology, Tongji University, Shanghai 201804, P.R.China 2 Key Laboratory of Embedded System & Service Computing, Ministry of Education of China, Tongji University, Shanghai 201804, P.R.China 3 Tongji Branch, National Engineering & Technology Center of High Performance Computer, Shanghai 201804, P.R.China Tel.: +86-21-69589375
[email protected] 4 Visual Art & Brain Cognition Lab., Beijing Shengkun Yanlun Technology Co. Ltd., Beijing 100192, P.R.China 5 Institute of Public Opinion, Renmin University of China, Beijing, P.R.China
Abstract. In this study, we investigated the existence of a temporal component of the Mozart effect and analyzed how changes in arousal or mood influence attentional blink task performance while listening to a Mozart sonata. The results of the experiment showed that the performance of subjects in the attentional blink task did not significantly improve when they listened to the Mozart sonata played at either normal or fast speed. This indicates that the temporal component of the Mozart effect does not exist in general. We propose that the Mozart sonata may induce shifts in the listener's arousal or mood, but does not significantly influence temporal attention.
1 Introduction
A set of research results indicates that listening to Mozart's music may induce a short-term improvement in the performance of certain kinds of mental tasks. The Mozart effect was first reported by Rauscher, Shaw, and Ky (1993) [11], who investigated the effect of listening to music by Mozart on spatial reasoning. In their study, the subjects showed an 8- to 9-point improvement in spatial-temporal tasks after they listened to 10 minutes of Mozart's Sonata for Two Pianos in D Major, K.448. However, among the large number of attempts to replicate the findings, some have indeed reproduced them, while others failed to show a significant effect of listening to Mozart's music. Nonetheless, despite critical discussions, the more widely accepted account of these failures of replication is that Mozart's music may change the listener's arousal or mood rather than their spatial-reasoning ability, and that this change may influence
Corresponding author.
spatial reasoning processing. It is well known that arousal and mood influence cognition. According to the arousal-mood hypothesis, listening to music affects arousal and mood, which then influence performance on various cognitive skills [3][5][10][12][13][14]. Several studies reported similar results: participants who listened to quick, major-mode music performed better in tests than those who listened to slow, minor-mode music. For example, in one study, researchers examined the effects of musical tempo and mode on arousal, mood, and spatial abilities. Participants were asked to perform the paper-folding-and-cutting (PF&C) task while listening to one of four versions of the Mozart sonata (K.448) obtained by adjusting specific properties of the music: tempo (fast or slow) and mode (major or minor). According to their results, exposure to the fast-major K.448 significantly improved participants' performance [2]. Furthermore, another report claimed to find a temporal component of the 'Mozart effect' in non-spatial visual attentional blink experiments [1]. The authors compared participants' temporal attention in an attentional blink task under three conditions (Mozart sonata played normally, played in reverse, and silence), and concluded that the 'Mozart effect' influenced temporal attention. They discussed that this temporal influence may depend on the change of arousal or mood induced by Mozart's music. It would be an exciting finding if a temporal component of the Mozart effect did exist. To assess the validity of, and determine the explanation for, the temporal influence of the Mozart effect, more evidence and analysis are needed. The attentional blink will be introduced in detail in 'Prior Knowledge' below. The purpose of the present study was to validate whether the Mozart effect can influence temporal attention in a general way, in other words, whether the temporal influence of the Mozart effect is a robust phenomenon. Following this, we further investigate a reliable explanation of the temporal influence of the Mozart effect, if it indeed exists. Toward this end, we also used an attentional blink experiment, as the attentional blink can be viewed as a method to assess the limits of humans' ability to consciously perceive stimuli distributed across time. We manipulated the audio background conditions in the experiment as follows: silence (baseline), Mozart Sonata (K.448, D Major) played normally, and Mozart Sonata (K.448, D Major) played at fast speed. We predicted that, if the temporal influence of the Mozart effect exists, participants should perform better in the attentional blink task when they listen to the Mozart sonata played normally than in silence. As enjoyment ratings were much higher when listening to faster major music, if the temporal influence of the Mozart effect depends on the arousal or enjoyment induced, participants should perform best in the attentional blink task under the fast Mozart Sonata (K.448, D Major) condition among the three audio background conditions. In the following, we refer to these three experimental conditions as the silence, Mozart normal, and Mozart fast conditions.
2 Prior Knowledge
Visual attention plays a vital role in visual cognition. The mechanism of visual attention has been studied for over 50 years as one of the major goals of both
cognitive science and neuroscience [6]. In the last 15 years, the intense interest among researchers has shifted from the mechanisms and processes involved in deploying attention across the spatial dimension to the temporal dimension [9]. The attentional blink is a robust phenomenon which reflects a constraint on human attention. In a typical attentional blink experiment, participants are required to observe a rapid serial visual presentation (RSVP) stream of items. Two targets (T1 and T2) are embedded in the stream of nontargets (i.e., distracters). Participants are instructed to report the two targets after the stimulus stream has ended. The attentional blink is said to have occurred when T1 is reported correctly but the report of T2 is inaccurate at short T1-T2 lags, typically between 100 and 500 ms, while accuracy recovers to the baseline level at longer intervals. Fig. 1 shows a standard attentional blink task and typical results.
Fig. 1. Standard attentional blink task and results
Theoretical accounts of the attentional blink indicate that the attentional demands of selecting T1 prevent attentional resources from being applied to T2 and transiently impair the redeployment of these resources to subsequent targets at short T1-T2 lags. Research on the attentional blink helps us to investigate human reactions in real-life situations in which multiple events may rapidly succeed each other (e.g., in traffic).
3 Methods

3.1 Subjects
Twenty-six participants between 21 and 27 years old (mean = 23.9) were recruited from the local university of applied sciences; twelve were female, and all were right-handed. Subjects were paid for participation, and oral consent was obtained prior to the start of the experiment. All participants had normal or corrected-to-normal visual acuity and normal hearing by self-report. The experiment lasted approximately 40 min. None of the participants had any specific music or instrument training.
3.2 Apparatus and Materials
The software program E-Prime (Psychology Software Tools, Inc., Pittsburgh, PA), installed on a desktop computer with a CRT monitor (screen refresh rate of 85 Hz), was used to display the visual stimuli and record the data. The distance between the participant and the monitor screen was approximately 65 cm. Each participant sat directly in front of the monitor in a quiet experimental room and had a comfortable view of the screen. Visual stimuli consisted of letters of the alphabet (omitting I, O, Q, and S) and digits 2 to 9, displayed in black in the center of a gray background in Courier New font, size 22. The auditory stimulus was Mozart's Sonata for Two Pianos in D Major, K.448, played at normal speed (tempo of 120 bpm) or fast speed (tempo of 65 bpm) over headphones. In the silence condition, no music was played over the headphones.

3.3 Design and Procedure
The present study employed a dual-target task in four blocks. The first block was a practice block with 10 trials under the silence condition and was not included in the statistical analysis; the remaining three blocks were experimental blocks with 100 trials each, run under the silence, Mozart normal, and Mozart fast conditions, respectively. Each trial began with the presentation of a fixation cross '+' for 1000 ms, followed by 13-21 distracter letters (drawn randomly without replacement from the 22 letters excluding 'X'), one of which was replaced by a digit (the first target T1, drawn randomly from 8 digits). The letters and the digit were presented for 65 ms each, followed by a 15 ms blank interval. The second target T2 in each trial was the letter 'X', presented on 80% of the trials, placed randomly 3-6 positions from the end of the stimulus stream. The first target digit (T1) was presented
Fig. 2. Sequence of screen presentation of a typical trial
randomly 1, 3, 5, or 8 stream positions (80 ms, 240 ms, 400 ms, 640 ms) before T2. After the presentation of the RSVP stream in each trial, two questions about T1 and T2 ('Was the digit an even number or an odd number?', 'Was there a letter X in the stream?') were presented in order. The participants were instructed to answer these two questions by pressing the specified letter keys on the computer keyboard at the end of each trial. The second question was presented 250 ms after the response to the first question. The next trial began 500 ms after the participants had responded to the second question (see Fig. 2). Participants were asked to concentrate on the RSVP stream on the screen and to answer the two questions as accurately as possible. All responses of the participants were recorded. The experiment used a within-participants manipulation with a balanced block design of conditions (i.e., Mozart Normal-Silence-Mozart Fast, Silence-Mozart Fast-Mozart Normal, Mozart Fast-Mozart Normal-Silence).
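As an illustration of the trial structure just described, the sketch below generates one RSVP trial (letter stream, T1 digit, optional T2 'X' at one of the lags). This is not the authors' E-Prime script; the helper names are hypothetical, and only the stream composition is modeled, not the screen timing.

```python
# Hedged sketch of RSVP trial generation (stream composition only, no timing).
import random

LETTERS = list("ABCDEFGHJKLMNPRTUVWYZ")   # alphabet minus I, O, Q, S and the T2 letter X
DIGITS = list("23456789")
LAGS = [1, 3, 5, 8]                        # T1 precedes T2 by this many positions

def make_trial(rng=random):
    n_items = rng.randint(13, 21)
    stream = rng.sample(LETTERS, n_items)             # distracters, drawn without replacement
    t2_present = rng.random() < 0.8
    if t2_present:
        lag = rng.choice(LAGS)
        t2_pos = n_items - rng.randint(3, 6)          # 3-6 positions from the end of the stream
        if t2_pos - lag < 1:                          # sketch-level guard to keep T1 inside the stream
            t2_pos = lag + 1
        t1_pos = t2_pos - lag
        stream[t2_pos] = "X"
    else:
        t1_pos = rng.randint(4, n_items - 4)          # arbitrary placement when no T2 is shown
        lag = None
    stream[t1_pos] = rng.choice(DIGITS)               # T1 replaces one distracter
    return stream, t1_pos, t2_present, lag

stream, t1_pos, t2_present, lag = make_trial()
print("".join(stream), "| T1 at", t1_pos, "| T2 present:", t2_present, "| lag:", lag)
```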
4 Results
The data of all twenty-six participants were included in the statistical analysis. We focused on second-target (T2) report accuracy on the trials in which the first target was reported correctly. Fig. 3 shows mean T2 detection accuracy, given correct T1 detection, as a function of condition and lag. Lag 0 represents the trials which contained no letter 'X'; lags 1, 3, 5, and 8 represent the T1-T2 lags. As we can see in Fig. 3, at lags 0, 5, and 8 the accuracy of T2 is almost the same across conditions, while at lag 1 and lag 3 the accuracy of T2 is slightly better under the Mozart normal condition than under the silence condition. Nevertheless, under the Mozart fast condition, the accuracy
Fig. 3. Mean T2 detection accuracy, given correct T1 detection, as a function of condition and lag
of T2 is worse than under the Mozart normal condition and almost the same as under the silence condition. This is not consistent with the previous hypothesis. This result indicates that even if some influence of the Mozart effect exists, it is not induced by the change of arousal and enjoyment. To determine whether the differences in T2 accuracy among the three experimental conditions were significant, we performed a two-way analysis of variance (ANOVA) on the T2 accuracy data with the within-participants factors of condition (Mozart normal, Mozart fast, or silence) and T1-T2 lag (0, 1, 3, 5, 8). According to the statistical results obtained in SPSS, there was no main effect of condition, F(2, 50) = 1.045, p > .05, nor any interaction between condition and lag, F(8, 200) = 0.731, p > .05. The main effect of lag was significant, F(4, 100) = 45.089, p < .001. Subsequent pairwise comparisons revealed significant differences among all four lags (omitting lag 0, which is not related to the attentional blink phenomenon), p < .05, except for the difference between lag 1 and lag 3 (p > .05).
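A minimal sketch of this analysis is given below, assuming a long-format trial table with columns subject, condition, lag, t1_correct, and t2_correct (these column names are placeholders, not the authors' actual data files); it computes T2|T1 accuracy per participant and runs a two-way repeated-measures ANOVA with statsmodels.

```python
# Hedged sketch: T2|T1 accuracy per subject/condition/lag + repeated-measures ANOVA.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

def t2_given_t1_accuracy(trials: pd.DataFrame) -> pd.DataFrame:
    """Mean T2 accuracy on trials where T1 was answered correctly."""
    ok = trials[trials["t1_correct"] == 1]
    return (ok.groupby(["subject", "condition", "lag"])["t2_correct"]
              .mean().rename("t2_acc").reset_index())

def condition_by_lag_anova(acc: pd.DataFrame):
    """Two-way within-participants ANOVA: condition x lag."""
    return AnovaRM(acc, depvar="t2_acc", subject="subject",
                   within=["condition", "lag"]).fit()

# Toy data standing in for the real trial table (26 subjects x 3 conditions x 5 lags x 20 trials).
rng = np.random.default_rng(0)
rows = [dict(subject=s, condition=c, lag=l,
             t1_correct=1, t2_correct=int(rng.random() < 0.8))
        for s in range(26) for c in ["silence", "normal", "fast"]
        for l in [0, 1, 3, 5, 8] for _ in range(20)]
trials = pd.DataFrame(rows)

acc = t2_given_t1_accuracy(trials)
print(condition_by_lag_anova(acc).anova_table)
```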
5 Discussion
In the present study, we conducted attentional blink experiments under three conditions (Mozart normal, Mozart fast, and silence). The results revealed that, although there seemed to be a slight trend toward improved accuracy in detecting the second target T2 at lag 1 and lag 3 under the Mozart normal condition compared with the silence condition, there was no significant difference between these two conditions. In other words, we did not observe a temporal component of the Mozart effect in the present study. In a different report [1], which claimed a significant temporal influence of the Mozart sonata, the ANOVA result for the difference between the Mozart sonata and silence was only slightly below the significance level, and slightly above it when the non-blink participants were excluded. One explanation for this inconsistency could be that the temporal influence of the Mozart effect on attention does not exist, or is not strong, in general. Even if this influence does exist, the factor inducing it cannot be the change of arousal caused by the Mozart effect, since in the present study the detection accuracy for T2 under the Mozart fast condition was almost the same as under the silence condition, and worse than under the Mozart normal condition, whereas the arousal theory predicts the opposite. In contrast, Olivers and Nieuwenhuis (2005) [7] also reported improved T2 accuracy under a music condition relative to a silence condition. However, the music they used in their experiment was a tune with a continuous beat, which does not have the same musical meaning as a work such as the Mozart sonata. It can be argued that a rhythmic beat induces arousal changes more easily and can attract attention, so that hearing the beat becomes a task-irrelevant activity for the participant. That irrelevant activity caused a redeployment of the attentional resources that the participant applied to the first target T1, and eventually improved detection of the second target T2. It has been validated in the laboratory, and is also experienced in real life, that music, including the Mozart sonata, does change the listener's arousal or mood [4]. It
might bring a change in the detection accuracy of T2 if the participants' arousal or mood was shifted, as supported by the resource theory of the attentional blink. Why did this not appear in the present study? One explanation is that the arousal change caused by the Mozart effect is not strong enough to influence attention, and that the mood change induced by music often does not occur immediately [8]. Further investigation is needed to examine whether the Mozart sonata has a delayed (post) effect on the attentional blink. Another possible explanation of the present result is a cultural gap. All the participants were Chinese with no special musical education. They self-reported that they seldom listened to classical music, and none of them had even heard the name of the Mozart sonata before. Their cognitive activity while listening to the Mozart sonata might differ from that of people who are familiar with classical music or who grew up in a Western cultural environment.
6 Conclusion
The present study revealed that the influence of the Mozart effect on temporal attention does not exist in general. Although the Mozart sonata changed listeners' arousal or mood in many studies, it failed to induce any temporal influence in the present experiment.

Acknowledgments. This research was supported by the National Natural Science Foundation of China (No. 60775036, No. 60970061) and the Research Fund for the Doctoral Program of Higher Education (No. 20060247039).
References 1. Cristy, H., Oliver, M., Charles, S.: An investigation into the temporal dimension of the Mozart effect: Evidence from the attentional blink task. Acta Psychologica 125, 117–128 (2007) 2. Gabriala, H., William, F.T., Glenn S, E.: Effects of musical tempo and mode on arousal, mood, and spatial abilities. Music Perception 20(2), 151–171 (2002) 3. Gabrielsson, A.: Emotions in strong experiences with music: Music and emotion: Theory and research. In: Juslin, P.N., Sloboda, J.A. (eds.), pp. 431–449. Oxford University Press, New York (2001) 4. Gabrielsson, A., Lindström, E.: The influence of musical structure on emotional expression: Music and emotion: Theory and research. In: Juslin, P.N., Sloboda, J.A. (eds.), pp. 223–248. Oxford University Press, New York (2001) 5. Krumhansl, C.L.: An exploratory study of musical emotions and psychophysiology. Canadian Journal of Experimental Psychology 51, 336–352 (1997) 6. Miller, G.A.: The cognitive revolution: A historical perspective. Trends in Cognitive Sciences 7, 141–144 (2003) 7. Olivers, C.N.L., Nieuwenhuis, S.: The beneficial effect of concurrent task-irrelevant mental activity on temporal attention. Psychological Science 16, 265–269 (2005) 8. Panksepp, J., Bernatzky, G.: Emotional sounds and the brain: the neuro-affective foundations of musical appreciation. Behavioural Processes 60, 133–155 (2002)
A Study of Mozart Effect on Arousal, Mood, and Attentional Blink
231
9. Paul, E.D.: The attentional blink: A review of data and theory. Attention, Perception, & Psychophysics 71(8), 1683–1700 (2009) 10. Peretz, I.: Listen to the brain: A biological perspective on musical emotions. In: Juslin, P.N., Sloboda, J.A. (eds.) Music and emotion: Theory and research, pp. 105–134. Oxford University Press, Oxford (2001a) 11. Rauscher, F., Shaw, G., Ky, K.: Music and spatial task performance. Nature, 365– 611 (1993) 12. Schmidt, L.A., Trainor, L.J.: Frontal brain electrical activity (EEG) distinguishes valence and intensity of musical emotions. Cognition and Emotion 15, 487–500 (2001) 13. Sloboda, J.A., Juslin, P.N.: Psychological perspectives on music and emotion. In: Juslin, P.N., Sloboda, J.A. (eds.) Music and emotion: Theory and research, pp. 71–104. Oxford University Press, New York (2001) 14. Thayer, J.F., Levenson, R.W.: Effects of music on psychophysiological responses to a stressful film. Psychomusicology 3, 44–54 (1983)
Attentional Disengage from Test-Related Pictures in Test-Anxious Students: Evidence from Event-Related Potentials Rui Chen1,2 and Renlai Zhou1,2,3,4 1
Key Laboratory of Child Development and Learning Science (Southeast University), Nanjing, 210096 2 Research Center of Learning Science, Southeast University, Nanjing, 210096 3 State Key Laboratory of Cognitive Neuroscience and Learning(Beijing Normal University), Beijing, 100875 4 Beijing Key Lab of Applied Experimental Psychology (Beijing Normal University), Beijing, 100875
Abstract. The present study aims to investigate the neural correlates of attentional disengagement in test-anxious students. Event-related potentials were recorded from 28 undergraduates, grouped according to their scores on the Sarason Test Anxiety Scale (TAS). All students performed the same central-cue task. The response time (RT) results show that a slowing effect of test-related stimuli appeared in high test-anxious students only. The ERP results show that targets following test-related cues captured more attentional processing in the early period and more attentional resource allocation (enhanced N100 and P300 amplitudes) in both high and low test-anxious students. These findings indicate that behavioral performance is consistent with cognitive processing in high test-anxious students only. This means that the test-related cue captured more attentional resources in high test-anxious students and made it difficult for them to shift attention away from a target following a test-related cue. For the low test-anxious students, however, there is no slowing effect on test-related trials. Keywords: test anxiety, undergraduates, attentional disengagement, ERPs.
1 Introduction

Test anxiety is a situation-specific trait anxiety [1]. Almost all students suffer from test anxiety when faced with an examination. In a questionnaire survey, Wang (2001) found that the rate of high test anxiety among Chinese undergraduates was 21.8% [2]. Generally, test anxiety is described as an emotional state in which a person experiences distress before, during, or after an examination or other assessment, to such an extent that this anxiety causes poor performance or interferes with normal learning. A number of previous studies have shown that the negative mood of highly anxious individuals can be elicited and maintained by their attentional bias toward threat stimuli. This finally leads to a response-delay effect on threat stimuli, meaning that highly anxious individuals have more difficulty in disengaging attention from threat stimuli
than low anxious individuals [3-5]. Moreover, research with the cue-target paradigm indicated that an attentional bias toward threatening locations may often arise from slow reactions to neutral locations, due to delays in disengaging from the threatening locations, and that there is no evidence for facilitated detection of threatening information [6,7]. For high test-anxious students, the index reflecting attentional disengagement from test-related threatening words was higher than for low test-anxious ones in the cue-target paradigm [8]. However, visuospatial factors might influence the allocation of attention, which means that the difficulty of disengaging attention from a threat stimulus could be elicited by its location rather than by its valence or any other feature. In the current experiment, we used a central-cue paradigm to investigate pure attentional disengagement without visuospatial factors. In this paradigm, all stimuli are presented in the centre of the screen, which ensures that no shifts of visuospatial attention are required [5]. In order to investigate the neural mechanism of attentional processing, event-related potentials (ERPs) were used in this experiment. Previous studies suggested that the N100 and P300 components, which are related to attentional modulation, are more enhanced following threat-stimulus cues than following non-threat-stimulus cues [9,10]. These two components are mostly detected over prefrontal, frontal, and parietal cortex [11,12]. The enhancement of the amplitudes of both components is interpreted as a sign of facilitated attentional processing and of more attentional resources being occupied. In detail, the enhancement of the N100 component is a sign of active attentional orienting to a task-relevant location, and the P300 indexes a complex late positive component (LPC) which relates mostly to the allocation of attentional resources [13,14]. To summarize the previous studies, the attentional resources of highly anxious individuals can be captured by threat stimuli more easily. Meanwhile, both the N100 and P300 ERP components are important for attentional processing, reflecting the allocation of attentional resources during a cognitive task. Thus, since test anxiety is a subset of anxiety, is a test-related stimulus often treated as a threat stimulus by test-anxious individuals? Are there any differences in behavioral responses and ERPs between high and low test-anxious students? Accordingly, the present study aims to investigate the differences in cognitive and neural mechanisms between high and low test-anxious students in a central-cue paradigm. The results from the EEG data, combined with response times (RTs), should reflect how early attentional selection and response preparation are influenced by the different types of cueing stimulus (test-related vs. test-unrelated) within the central-cue paradigm.
2 Methods

2.1 Participants

Twenty-eight right-handed students from Southeast University in China volunteered for this experiment. Thirteen of them were high test-anxious students (TAS ≥ 20) and the rest were low test-anxious (TAS ≤ 12), according to their scores on the TAS (Sarason, 1978). These 28 subjects, 12 females and 16 males, were aged between 19 and 27 years (mean = 22.25, S.D. = 1.79). All participants had normal or corrected-to-normal vision.
2.2 Stimuli

The cue stimuli in this experiment consisted of two different types of pictures (test-related and test-unrelated). In order to avoid any effect of color on participants' emotions, the 30 pictures of each type were converted to greyscale with Adobe Photoshop. The target stimuli also consisted of two types: an arrow pointing left and an arrow pointing right. Both cue and target stimuli were displayed in the center of a light grey box on the screen, sized 9 cm × 9 cm. The cue stimuli were assessed on the basis of the three-factor theory of emotions [15,16]. For test-related pictures the average pleasure-displeasure index was 4.96, and for test-unrelated pictures it was 5.02, ensuring that all pictures were neutral, without interference from positive or negative emotion. The other two dimensions of emotion were not involved in this experiment.

2.3 EEG Recording

The electroencephalogram was recorded continuously (band pass 0.05-100 Hz, sampling rate 1000 Hz) with a Neuroscan Synamp2 amplifier (Scan 4.3.1, Neurosoft Labs, Inc.), using an electrode cap with 64 Ag/AgCl electrodes mounted according to the extended international 10-20 system and referenced to the linked left mastoid. Vertical and horizontal electrooculograms were recorded with two pairs of electrodes, one placed above and below the right eye, and another 10 mm from the lateral canthi. Electrode impedance was maintained below 5 kOhm throughout the experiment. All 64 sites were chosen for statistical analysis. The early attentional orienting component and the late positive component were measured in the 110-170 ms and 290-340 ms time windows, respectively. A repeated-measures analysis of variance (ANOVA) was conducted on each ERP component with three factors: test-anxiety group (high/low), picture type (test-related/-unrelated), and electrode site.

2.4 Procedure

Participants were seated in an isolated recording room, in a comfortable chair, 60 cm from the screen. As already indicated, they were told to look continuously at a light grey box located in the center of the black background screen. All stimuli were presented in the central box using E-Prime version 1.1 software. The experiment consisted of 12 practice trials and 270 experimental trials split equally into 3 blocks. In order to avoid anticipatory responses to the target, 10% of all experimental trials had blank targets. Each trial began with a central light grey box for 1000 ms, which remained on the screen throughout the experiment. A cue stimulus then appeared within the grey box for 200 ms and, after a blank mask lasting 200 to 600 ms, a target (left or right arrow) was presented until a response was made or 3000 ms had elapsed without a response. Participants were instructed to press one of two horizontally positioned buttons on a keyboard, using the index finger of each hand, responding as quickly and accurately as possible to the type of target (i.e., they were instructed to press the left key if the arrow pointed left). There was a variable inter-trial interval (ITI) ranging from 500 to 1050 ms. An equal number of trials with each type of cue valence (test-related and test-unrelated) and target direction
(left and right) were required in this experiment. Trials were presented in a new random order for each participant.

2.5 Preparation of EEG and RT Data

Both RTs and EEG data from incorrect trials were eliminated. Following inspection of the RT data in SPSS, RTs beyond 3 SDs of each participant's mean were excluded in order to reduce the influence of outliers.
3 Results

3.1 Task Performance

To estimate response delay (slowing) effects in the central cue task, a response time difference score, also called the attentional disengagement index, was calculated for each participant by subtracting the mean RT on test-unrelated cue trials from the mean RT on test-related cue trials, so that positive values indicate a slowing effect and negative values indicate a speeding effect.

Table 1. Mean RTs (ms) and SD for the low and high test anxiety groups in each condition of the central cue paradigm

Subjects group    | N  | Stimuli type   | Mean (ms) | Std. Deviation
High test-anxiety | 13 | Test-related   | 425.30    | 104.61
High test-anxiety | 13 | Test-unrelated | 418.28    | 101.32
Low test-anxiety  | 15 | Test-related   | 431.15    | 108.37
Low test-anxiety  | 15 | Test-unrelated | 432.85    | 103.02
A 2 × 2 mixed-design ANOVA on the RTs was carried out with the factors test-anxiety group (high and low) and cue type (test-related and test-unrelated) (see Table 1). The results show a significant group × cue type interaction, F(1, 26) = 5.45, p < .05, ηp² = .17. A further simple-effects analysis finds that RTs to the test-related cues are significantly longer than to test-unrelated ones only in high test-anxious students, F(1, 12) = 6.58, p < .05, ηp² = .20. No other significant results were found in the ANOVA. Furthermore, an independent-samples t-test was used to compare the attentional disengagement index between the two anxiety groups (see Table 2). This indicates a significant difference between the high and low test-anxious groups, t(26) = 2.33, p < 0.05, d = 0.88: the high test-anxious students show a slowing effect following the test-related cues.

Table 2. Mean and SD of the attentional disengagement index (ms) for the low and high test anxiety groups in the central cue paradigm

Subjects group    | N  | Mean  | Std. Deviation
High test-anxiety | 13 | 7.02  | 10.20
Low test-anxiety  | 15 | -1.70 | 9.57
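As a small illustration of this analysis (not the authors' SPSS workflow), the sketch below computes the attentional disengagement index from a long-format RT table, applies the 3-SD outlier rule mentioned in Sect. 2.5, and runs the independent-samples t-test; the column names are assumed placeholders.

```python
# Hedged sketch: attentional disengagement index + group comparison.
# Assumes `rt` is a pandas DataFrame with columns: subject, group ('high'/'low'),
# cue ('related'/'unrelated'), correct (0/1), rt_ms.
import pandas as pd
from scipy.stats import ttest_ind

def disengagement_index(rt: pd.DataFrame) -> pd.DataFrame:
    rt = rt[rt["correct"] == 1].copy()
    # Per-participant 3-SD outlier exclusion.
    z = rt.groupby("subject")["rt_ms"].transform(lambda x: (x - x.mean()) / x.std())
    rt = rt[z.abs() <= 3]
    means = rt.groupby(["subject", "group", "cue"])["rt_ms"].mean().unstack("cue")
    means["index"] = means["related"] - means["unrelated"]     # positive = slowing effect
    return means.reset_index()

# Example usage (with a real RT table):
# idx = disengagement_index(rt)
# hi = idx.loc[idx["group"] == "high", "index"]
# lo = idx.loc[idx["group"] == "low", "index"]
# print(ttest_ind(hi, lo))
```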
3.2 Event-Related Potential Data

Peak amplitudes were calculated for the N100 (110-170 ms), P100 (150-200 ms), and N200 (180-230 ms) time windows, and average amplitudes were calculated for the P300 time window (290-340 ms).
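The following is a minimal numpy sketch of this kind of amplitude extraction from already-epoched data (an epochs array of shape trials × channels × time points with an accompanying time vector, both assumed here); it is not tied to the Neuroscan software actually used.

```python
# Hedged sketch: peak and mean amplitude extraction in fixed ERP time windows.
# Assumes `epochs` has shape (n_trials, n_channels, n_times) in microvolts and
# `times_ms` is a matching 1-D vector of sample times in milliseconds.
import numpy as np

WINDOWS = {"N100": (110, 170), "P100": (150, 200), "N200": (180, 230), "P300": (290, 340)}

def window_amplitude(epochs, times_ms, t0, t1, kind="peak", polarity=-1):
    """Average over trials, then take the peak (signed by `polarity`) or the mean
    amplitude per channel within [t0, t1] ms."""
    mask = (times_ms >= t0) & (times_ms <= t1)
    erp = epochs.mean(axis=0)[:, mask]                      # grand-average ERP, channels x window samples
    if kind == "peak":
        return polarity * np.max(polarity * erp, axis=1)    # most negative (or positive) value in window
    return erp.mean(axis=1)                                 # mean amplitude (used here for P300)

# Toy example: 100 trials, 64 channels, 700 samples covering -100..600 ms.
rng = np.random.default_rng(3)
times_ms = np.linspace(-100, 600, 700)
epochs = rng.standard_normal((100, 64, 700))
n100 = window_amplitude(epochs, times_ms, *WINDOWS["N100"], kind="peak", polarity=-1)
p300 = window_amplitude(epochs, times_ms, *WINDOWS["P300"], kind="mean")
print(n100.shape, p300.shape)   # one value per channel
```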
Fig. 1. Grand averages separating high and low groups of subjects according to their TAS scores in each condition. An average of the six recording channels is represented.
For the N100 component, the amplitude of the most prominent peak was computed for each individual ERP. The peak amplitude for test-unrelated trials is more enhanced than for test-related trials in both high and low test-anxious students (see Fig. 1). These differences are significant at the prefrontal region (F(1, 26) = 27.53, p < .01, ηp² = .51), frontal region (F(1, 26) = 32.38, p < .01, ηp² = .56), and centro-parietal region (F(1, 26) = 11.44, p < .01, ηp² = .31). Furthermore, the P100 and N200 components were inspected only in the high test-anxious group. A one-way ANOVA for P100 indicates that a significant difference was detected at the left frontal lobe (F(1, 12) = 4.95, p < .05, ηp² = .29); the peak amplitude for test-related trials is more positive than for test-unrelated ones. For the N200 component, significant differences are detected at the left frontal lobe (F(1, 12) = 6.82, p < .05, ηp² = .36), the centro-frontal area (F(1, 12) = 5.84, p < .05, ηp² = .31), and the centro-parietal region (F(1, 12) = 5.73, p < .05, ηp² = .32); in all of these areas, the peak amplitudes for test-unrelated trials are more negative than for test-related ones. In addition, Fig. 2 shows that the average amplitude of the P300 component for test-related trials is more enhanced than for test-unrelated trials in both high and low test-anxious students. These differences are significant at the centro-frontal and superior parietal regions (F(1, 26) = 9.15, p < .01, ηp² = .26) and the posterior parietal region (F(1, 26) = 4.92, p < .05, ηp² = .16).
Fig. 2. Grand averages separating high and low groups of subjects according to their TAS scores in each condition. An average of the four recording channels is represented.
4 Discussion

The primary aim of this study was to characterize the behavioral responses, combined with ERPs, to test-related and test-unrelated cues in both high and low test-anxious students. We hypothesized that a slowing effect in task performance would appear only in high test-anxious students when the target stimulus is presented following a test-related cue, but not in low test-anxious students. The RT results are consistent with this hypothesis. An explanation for this result is the theory of the behavioral inhibition system (BIS), which asserts that risk assessment and risk aversion increase in conflict situations [17]. This means that the test-related cues are detected as threat stimuli by high test-anxious students only, and can capture more attentional resources. The BIS is activated to increase assessment of the valence of the threat stimuli and interferes with the response to the targets. Finally, the slowing effect for high test-anxious students in the test-related trials is generated. According to the ERP results, the most prominent N100 amplitudes are mainly detected at the prefrontal and centro-parietal regions. This is an early ERP component, related to attentional processing and the active orienting of attention to a task-relevant location [13]. In the present study, N100 peak amplitudes are more enhanced in the test-unrelated trials in both high and low test-anxious students. This means that test-related cues capture more attentional resources before the target
stimuli appear, leaving fewer attentional resources to be allocated to the following target stimuli. Another significant component in this experiment is the P300, which reflects a readjustment of cognitive strategies in preparation for future stimulus processing [9]. P300 average amplitudes are typically measured most strongly at electrodes covering the parietal region, although the component is generated by various parts of the brain. In the current study, the P300 amplitude in test-related trials is more enhanced than in test-unrelated trials. This implies that the test-related trials occupy more attentional resources in both high and low test-anxious individuals. Moreover, the P100 and N200 components appeared only in high test-anxious students, indicating that more feature processing (enhanced P100) and stronger behavioral inhibition (enhanced N200) occur in test-related trials. It can be inferred that both the P100 and N200 components are related to the slowing effect observed in the behavioral performance of high test-anxious students. According to the above analyses, there is no difference between high and low test-anxious students during the stages of attentional orienting and readjustment of cognitive strategies. The main differences between the two groups appear at the stages of feature processing and response inhibition. Therefore, the P100 and N200 should be regarded as components specific to high test-anxious students, reflecting the difficulty of disengaging attention from test-related stimuli seen in the behavioral performance.
References

1. Keogh, E., French, C.C.: Test anxiety, evaluative stress, and susceptibility to distraction from threat. European Journal of Personality 15, 123–141 (2001)
2. Wang, C.K.: Reliability and Validity of Test Anxiety Scale (Chinese Version). Chinese Mental Health Journal 2, 96–97 (2001)
3. Fox, E., Russo, R., Dutton, K.: Attentional bias for threat: Evidence for delayed disengagement from emotional faces. Cognition and Emotion 16, 355–379 (2002)
4. Jongen, E.M.M., Smulders, F.T.Y., et al.: Attentional bias and general orienting processes in bipolar disorder. Journal of Behavior Therapy and Experimental Psychiatry 38, 168–183 (2007)
5. Mogg, K., Holmes, A., et al.: Effects of threat cues on attentional shifting, disengagement and response slowing in anxious individuals. Behaviour Research and Therapy 46, 656–667 (2008)
6. Derryberry, D., Reed, M.A.: Temperament and attention: orienting toward and away from positive and negative signals. Journal of Personality and Social Psychology 66, 1128–1139 (1994)
7. Koster, E.H.W., Crombez, G., et al.: Selective attention to threat in the dot probe paradigm: differentiating vigilance and difficulty to disengage. Behaviour Research and Therapy 42, 1183–1192 (2004)
8. Liu, Y., Zhou, R.L.: The Cognitive Mechanism of Attentional Bias in Test-anxious Students. Annual Report of Southeast University, Nanjing (2008)
9. Wright, M.J., Geffen, G.M., et al.: Event related potentials during covert orientation of visual attention: effects of cue validity and directionality. Biological Psychology 41, 183–202 (1995)
10. Bar-Haim, Y., Lamy, D., et al.: Threat-related attentional bias in anxious and nonanxious individuals: a meta-analytic study. Psychological Bulletin 133, 1–24 (2007)
11. Woods, D.L., Knight, R.T.: Electrophysiologic evidence of increased distractibility after dorsolateral prefrontal lesions. Neurology 36, 212–216 (1986)
12. Eimer, M.: Effects of attention and stimulus probability on ERPs in a Go/Nogo task. Biological Psychology 35, 123–138 (1993)
13. Heinze, H.J., Luck, S.J., et al.: Visual event-related potentials index focused attention within bilateral stimulus arrays. I. Evidence for early selection. Electroencephalography and Clinical Neurophysiology 75, 511–527 (1990)
14. Gray, H.M., Ambady, N., et al.: P300 as an index of attention to self-relevant stimuli. Journal of Experimental Social Psychology 40, 216–224 (2004)
15. Tucker, D.M., Hartry-Speiser, A., et al.: Mood and spatial memory: emotion and right hemisphere contribution to spatial cognition. Biological Psychology 50, 103–125 (1999)
16. Kemp, A.H., Gray, M.A., et al.: Steady-state visually evoked potential topography during processing of emotional valence in healthy subjects. NeuroImage 17, 1684–1692 (2002)
17. Putman, P., Hermans, E., et al.: Emotional Stroop performance for masked angry faces: it's BAS, not BIS. Emotion 4, 305–311 (2004)
Concept Learning in Text Comprehension

Manas Hardas and Javed Khan

Computer Science Department, Kent State University, Kent, Ohio 44240, USA
{mhardas,javed}@cs.kent.edu
Abstract. This paper presents a mechanism to reverse engineer the cognitive concept association graph (CAG) that a reader forms while reading a piece of text. During text comprehension a human reader recognizes some concepts and skips others. The recognized concepts are retained to construct the meaning of the text, while the remaining concepts are discarded. Which concepts are recognized and which are discarded varies across readers because of differences in their prior knowledge. We propose a theoretical forward-calculation model that predicts which concepts are recognized on the basis of prior knowledge. To demonstrate the plausibility of this model, we employ a reverse-engineering approach to calculate a concept association graph according to the rules defined by the model. An empirical study is conducted of how six readers from an undergraduate Computer Networks class form a concept association graph when given a paragraph of text to read. The model computes a resultant graph which is flexible and can give quantitative insights into the more complex processes involved in human concept learning.
1 Introduction
Text comprehension is a high-level cognitive process performed remarkably well by humans. From a computational viewpoint, it is remarkable because humans can understand and comprehend any piece of text they have never seen before and learn completely new concepts from it. Human concept learning, as defined by Bruner et al. (1967), is the correct classification of examples into categories. Previous computational models of human concept learning (Tenenbaum 1999; Dietterich et al. 1997) give very good approximations of this kind of concept learning. However, learning concepts from text is unlike learning from examples. Completely new concepts are not learnt by hypothesis induction but by making new associations with prior knowledge. Therefore a cognitive theory of concept construction, rather than a theory of generalization, is needed to explain this process. Constructivism (Piaget, 1937) is a cognitive theory of learning which explains how concepts are internalized on the basis of previously acquired concepts through assimilation and accommodation. It gives a systematic cognitive model for acquiring new concepts in the context of prior knowledge. Hence we propose that the process of concept learning from text be examined in the light of the cognitive processes involved in constructivism.
Text comprehension research in the cognitive sciences considers comprehension either from a process-model point of view (Kintsch 1988; van den Broek, Risden, Fletcher and Thurlow, 1999; Tzeng, van den Broek, Kendeou and Lee, 2005; Gerrig and McKoon, 1998; Myers and O'Brien, 1998) or from a knowledge point of view (Landauer and Dumais, 1988, 1997). Both approaches to text comprehension necessitate a mathematical model for learning and for the involvement of prior knowledge. There is ample evidence for the importance of prior knowledge in learning from texts (Kintsch, E., and Kintsch, W., 1995; McKeown, M. G., et al., 1992; Means, M., et al., 1985; Schneider, W., et al., 1990). As observed by Verhoeven and Perfetti (2008), over the past decade research on text comprehension has moved towards models in which connectionist, memory-based, and constructivist aspects of comprehension are more integrated. There are two main contributions of this paper. The first is a mechanistic model of the cognitive processes involved in concept learning during text comprehension. The process of text comprehension is defined as concept recognition and concept association. During human text comprehension the CAG goes through a series of incremental changes as specific concepts are recognized or discarded depending upon the reader's prior knowledge. This paper proposes a computational model for these two processes. The second main contribution is a reverse-engineering approach to the model for obtaining the concept association graph. When a person reads text, the CAG which is formed cannot be known beforehand. Hence we need a reverse-engineering approach to find the CAG from the data generated by the subjects. An empirical study is conducted in which reader-drawn CAGs are fed into a constraint satisfaction system which computes the CAG for a reader or a group of readers. This novel method of computing the comprehension graph can be used efficiently to comment on the state of learning of a reader or a group of readers.
2 Computational Model

2.1 Knowledge Representation
The knowledge representation is a concept association graph (CAG). The graph consists of nodes, which represent concepts, and edges between the nodes, which signify the association between concepts. The strength of an association is given by an association strength. Association strengths are positive, but can be negative in some special circumstances. Any CAG, for example T, has a set of concepts and a set of associations, represented by the tuples C_T = [c_1, c_2, c_3, ...] and A_T = [l_{c_1,c_2}, l_{c_1,c_3}, l_{c_1,c_4}, ...] respectively. Figure 1 shows an example of a simple concept association graph. From the graph it can be seen that the concept "ethernet" is most strongly associated with "CSMA" because of its high association strength. Similarly, "LAN" and "CSMA" are the two most weakly associated concepts. The time line, represented by t=1, t=2 and so on, is the order in which the concepts were acquired. A lower time value means that the concepts were acquired relatively earlier in learning. The semantics of the association strengths is provided by the theory of constructivism.
Fig. 1. Example of a simple concept association graph (CAG)
For the concepts which are acquired at t=2, the concepts acquired at t=1 are considered previous knowledge. Since, according to the theory, all new concepts can only be acquired in the context of previous concepts, the strength of an association signifies how important the existence of a prior concept is for learning a particular new concept. For example, to learn the concept "CSMA", it is more important to know "ethernet" than "LAN", because the association strength between ethernet and CSMA (10) is much greater than that between LAN and CSMA (2).
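To make the representation concrete, the sketch below shows one possible way to encode such a CAG in Python, as concepts with acquisition times plus a dictionary of undirected weighted edges. The ethernet–CSMA (10) and LAN–CSMA (2) strengths come from the example above; the class interface and the acquisition times are illustrative assumptions rather than part of the original model.

```python
# Minimal sketch of a concept association graph (CAG); the ethernet-CSMA (10)
# and LAN-CSMA (2) strengths follow the example in the text, everything else
# (interface, acquisition times) is an illustrative assumption.

class CAG:
    def __init__(self):
        self.acquired_at = {}          # concept -> time step of acquisition
        self.strength = {}             # frozenset({a, b}) -> association strength

    def add_concept(self, concept, t):
        self.acquired_at[concept] = t

    def associate(self, a, b, strength):
        self.strength[frozenset((a, b))] = strength

    def association(self, a, b):
        return self.strength.get(frozenset((a, b)), 0.0)

cag = CAG()
cag.add_concept("LAN", t=1)
cag.add_concept("ethernet", t=1)
cag.add_concept("CSMA", t=2)
cag.associate("ethernet", "CSMA", 10)  # strong: ethernet matters most for learning CSMA
cag.associate("LAN", "CSMA", 2)        # weak: LAN contributes little

print(cag.association("ethernet", "CSMA"))   # 10
```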
2.2 Terminology
1. L(t) is defined as the current learned graph at time t. It is the graph formed by the reader from episodic memory of the text. The graph goes through a series of changes during text comprehension and is continuously evolving as L(t → ∞).
2. Z is the graph which represents the global prior knowledge a reader possesses, i.e., the non-text/domain-specific knowledge. It is assumed, for computational purposes, that Z has the association information for all new concepts. So whenever a reader is confronted with a totally new concept not present in L(t), the new concept is acquired in L(t + 1) by getting the association information from Z.
3. S(t) is a series of graphs, each representing the newly introduced concepts in a comprehension episode. Based on the prior knowledge, some concepts from this graph are recognized while some are discarded. The concepts which are recognized are then associated with concepts from L(t) to get L(t + 1); see Figure 2.
4. Learning: a series of 'comprehension episodes', each a two-step process. Latent concept recognition: from the set of presented concepts, some are recognized while some are not. Latent concept association: the recognized concepts are associated with previously known concepts.
3 CAG Transition Process
In the given model we assume two instances of the CAG. The first one is the initially learned CAG, L(t = 1), represented by the concept set C_{L(t=1)} and the association set A_{L(t=1)}. Learning progresses through a set of learning episodes. It starts with L(t = 1) and then incrementally constructs L(t = 2, 3, 4, ...). In each learning episode a small graph S(t) is presented; S(t) is the graph for the new sentence at time t. It too has a concept set C_{S(t=1)} and an association set A_{S(t=1)}. In general a learning episode can range from reading a sentence, a part of a sentence, or simply a word. By the immediacy assumption we take the smallest unit of a learning episode to be a sentence. S(t = 1) includes novel concepts and associations which are not in L(t = 1), as well as known elements. The second instance of the CAG is Z. Z provides a learner with the connections between the current L(t = 1) and the newly acquired concepts from S(t = 1). By definition L(t = 1) cannot have association information that can connect the new concepts in S(t = 1) to those of L(t = 1). Thus the model requires an imaginary CAG, Z, to provide the learner with some basis of computation to discover new concepts and attach them to the current L(t). Whenever a new concept is presented to the learner, the association information to connect the concept to L(t = 1) is acquired from Z. It is assumed that Z has all the concepts and association information. From the example it can be seen that in the first learning episode, when concepts "d" and "e" are presented, the connection between "a" and "d" is obtained from Z. Concept "e" is not present in Z and therefore has no connectivity information; hence it is discarded while forming the learned graph L(t = 2). In the second episode, when concepts "f" and "g" are presented, the connection information is again obtained from Z and L(t = 3) is formed. Fig. 2 shows only the relevant part of Z w.r.t. this example.
Fig. 2. CAG transition
4 Processing in a Learning Episode
As mentioned in the previous section, a learning episode is made up of two distinct processes, as detailed below.

4.1 Latent Concept Recognition
The first process is called latent concept recognition. A new concept graph, denoted by S(t), is presented, out of which some concepts are recognized based on the prior knowledge and some are not. Let L(t) be the learned CAG, denoted by its concept set C_{L(t)} and association set A_{L(t)}. A new sentence S(t), presented at step t, has a finite set of discrete concepts C_{S(t)} and an association set A_{S(t)}. The set may contain already learned concepts (i.e., those already in L(t)) or new concepts (i.e., those not already in L(t)). The new concepts from S(t) which are recognized by the learner are called "latent concepts" and denoted by the set C_{lat(t)}, where C_{lat(t)} ⊂ C_{S(t)}. The latent concept set C_{lat(t)} is formed by evaluating a comprehension function which returns a comprehension strength h_i for each concept i in S(t). The concepts for which h_i exceeds a certain threshold h_T are added to C_{lat(t)}, i.e., are recognized.

C_{lat(t)} = [i | i ∈ C_{S(t)}; h_i > h_T]     (1)

The comprehension strength for a node i is a function of the association strengths of the links between concept i and the prior concepts in L(t). It is computed as h_i = f(l_{s,i}), ∀s ∈ C_{L(t)}. We assume a linear relationship between the comprehension strength and the threshold coefficient, analogous to a linear weighted-sum activation function in a simple artificial neural network. It is possible that a nonlinear function holds true, but that is part of another discussion. Therefore,

h_i = Σ_{s ∈ C_{L(t)}} l_{s,i}     (2)
An example calculation of the comprehension strength for each concept in C_{S(t)} is shown in Fig. 3. Assume that the association strengths between the concepts in L(t) and S(t) are known, and let the threshold coefficient for this particular example be h_T = 10. Calculating the individual comprehension strengths for the concepts in C_{S(t=1)} we have:
1. h_CSMA = l_{ethernet,CSMA} + l_{LAN,CSMA} = 12 > h_T (10)
2. h_{carrier sense} = l_{ethernet,carrier sense} + l_{LAN,carrier sense} = 9 < h_T (10)
3. h_{collision detect} = l_{ethernet,collision detect} + l_{LAN,collision detect} = 13 > h_T (10)
Since the comprehension strength for "carrier sense" is below the threshold, it is not recognized and not included in C_{lat(t=1)}. After the step of latent concept recognition the set consists of C_{lat(t=1)} = (CSMA, collision detect).
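A minimal sketch of this recognition step (Eqs. 1–2) is given below. The prior graph L(t) = {ethernet, LAN}, the threshold h_T = 10, and the resulting sums (12, 9, 13) follow the worked example; how each sum is split over the two incoming links is an assumption made only for illustration.

```python
# Sketch of latent concept recognition: h_i is the sum of link strengths from the
# prior concepts, and concepts exceeding the threshold h_T are recognized.

def comprehension_strength(concept, prior_concepts, link):
    # h_i = sum of association strengths between concept i and the prior concepts
    return sum(link.get((s, concept), 0.0) for s in prior_concepts)

def recognize(candidates, prior_concepts, link, h_T):
    # C_lat(t): candidates whose comprehension strength exceeds the threshold
    return [c for c in candidates
            if comprehension_strength(c, prior_concepts, link) > h_T]

# Illustrative link strengths; only the sums (12, 9, 13) are given in the text.
link = {("ethernet", "CSMA"): 10, ("LAN", "CSMA"): 2,
        ("ethernet", "carrier sense"): 5, ("LAN", "carrier sense"): 4,
        ("ethernet", "collision detect"): 8, ("LAN", "collision detect"): 5}

prior = ["ethernet", "LAN"]
candidates = ["CSMA", "carrier sense", "collision detect"]
print(recognize(candidates, prior, link, h_T=10))  # ['CSMA', 'collision detect']
```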
Fig. 3. Latent concept recognition
4.2 Latent Concept Association
The second process associates the latent concepts in C_{lat(t)} with the concept(s) in the learned set L(t) to form L(t + 1). The set of latent associations is denoted by A_{lat(t)}, where A_{lat(t)} ⊂ A_Z. The latent association set is formed by evaluating an association function which gives an association strength a_{i,j} for concepts i and j. The association strength a_{i,j} is simply the scalar link strength l_{i,j}. All associations with strengths greater than or equal to a certain threshold a_T are included.

A_{lat(t)} = [l_{i,j}; ∀i ∈ C_{L(t)}, ∀j ∈ C_{lat(t)} | a_{i,j} ≥ a_T]     (3)
Fig. 4. Latent concept association
If we assume the association threshold a_T = 5, then from the figure it can be seen that the association between "LAN" and "CSMA" is dropped, because its strength is below the association threshold: a_{LAN,CSMA} (2) < a_T (5). After the processes of recognition and association, the concept map evolves from L(t) to L(t + 1), represented by a concept set and an association set computed as follows:

C_{L(t+1)} = C_{L(t)} ∪ C_{lat(t)}   and   A_{L(t+1)} = A_{L(t)} ∪ A_{lat(t)}     (4)
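The association step (Eq. 3) and the update of the learned graph (Eq. 4) can be sketched in the same style; the link strengths below reuse the illustrative values from the recognition sketch, with a_T = 5 as in the text, so the weak LAN–CSMA link (strength 2) is dropped.

```python
# Sketch of latent concept association and the graph update; values other than
# a_T = 5 and the LAN-CSMA strength (2) are illustrative assumptions.

def associate(latent, prior_concepts, link, a_T):
    # A_lat(t): links from prior concepts to recognized (latent) concepts
    # whose strength is at least the association threshold a_T
    return [(i, j) for i in prior_concepts for j in latent
            if link.get((i, j), 0.0) >= a_T]

def update(L_concepts, L_links, latent, latent_links):
    # Eq. 4: C_L(t+1) = C_L(t) union C_lat(t),  A_L(t+1) = A_L(t) union A_lat(t)
    return L_concepts | set(latent), L_links | set(latent_links)

link = {("ethernet", "CSMA"): 10, ("LAN", "CSMA"): 2,
        ("ethernet", "collision detect"): 8, ("LAN", "collision detect"): 5}
latent = ["CSMA", "collision detect"]
latent_links = associate(latent, ["ethernet", "LAN"], link, a_T=5)
C, A = update({"ethernet", "LAN"}, set(), latent, latent_links)
print(latent_links)  # the LAN-CSMA link (strength 2) is excluded
```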
5 Reverse Engineering the Association Strengths
We present a constraint satisfaction model to calculate the association strengths for the example CAG as shown in Figure 5. Since the concepts (c, d and g) are
recognized, it means that the comprehension strength of each of these nodes is greater than the comprehension threshold. This can be represented by a set of linear equations called recognition threshold (h_T) equations:

h_T equations:
l_ac + l_bc ≥ h_T
l_ad + l_bd ≥ h_T
l_ae + l_be ≤ h_T
l_af + l_bf + l_cf + l_df ≤ h_T
l_ag + l_bg + l_cg + l_dg ≥ h_T

a_T equations:
l_ac ≥ a_T, l_ac > 0
l_bc ≤ a_T, l_bc > 0
l_ad ≥ a_T, l_ad > 0
l_bd ≥ a_T, l_bd > 0
l_ag ≤ a_T, l_ag > 0
l_bg ≤ a_T, l_bg > 0
l_cg ≥ a_T, l_cg > 0
l_dg ≥ a_T, l_dg > 0
It is seen from the graph that concepts "e" and "f" are not recognized; therefore no associations exist for them. Also, the links l_bc, l_ag and l_bg are less than the threshold, so these links are not present in the learned graph. In this discussion we do not consider associations between concepts recognized at the same time. So we form the association threshold (a_T) equations. The h_T equations constrain the recognition of concepts, while the a_T equations constrain the association of concepts. It may happen that an association strength between concepts in L(t) and C_{S(t)} − C_{lat(t)} is greater than the association threshold, for example l_ae > a_T. But since the sum of l_ae and l_be is not greater than h_T, concept "e" is not recognized, and since concept "e" is not recognized we do not put a_T constraints on its associations. All links are constrained to be greater than 0. The matrix representation of the equations as a linear programming problem is as follows: min f*x subject to the constraints A*x ≤ b, where x is the vector of variables and f, A and b are the coefficient matrices for the objective function, the equation set and the result. Since we are not trying to optimize an objective function, all the members of f are set to 0. An important observation here is that the recognition and association thresholds (h_T and a_T) are variable and factored into the coefficient matrix for the equation set. Fig. 6 shows an example matrix representation. Solving this gives the association strengths for all the associations of a given CAG.

Fig. 5. Example of a learned CAG at time t=3

Fig. 6. Matrix representation
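As a hedged illustration of this formulation, the sketch below encodes a subset of the h_T and a_T constraints from the example as a feasibility problem for SciPy's linprog. The variable ordering, the small epsilon used to approximate the strict inequalities, and the restriction to a few constraints are choices made for the sketch, not part of the original implementation.

```python
# Feasibility sketch of the constraint system: min 0*x subject to A_ub*x <= b_ub.
import numpy as np
from scipy.optimize import linprog

# variable order: [l_ac, l_bc, l_ad, l_bd, l_ae, l_be, h_T, a_T]
n = 8
eps = 1e-3
A_ub, b_ub = [], []

def leq(coeffs, rhs=0.0):
    # add one row of the constraint system: sum_k coeffs[k] * x[k] <= rhs
    row = np.zeros(n)
    for idx, val in coeffs.items():
        row[idx] = val
    A_ub.append(row)
    b_ub.append(rhs)

# recognition (h_T) constraints from the example:
leq({0: -1, 1: -1, 6: 1})   # l_ac + l_bc >= h_T
leq({2: -1, 3: -1, 6: 1})   # l_ad + l_bd >= h_T
leq({4: 1, 5: 1, 6: -1})    # l_ae + l_be <= h_T  (concept "e" is not recognized)

# association (a_T) constraints from the example:
leq({0: -1, 7: 1})          # l_ac >= a_T
leq({1: 1, 7: -1})          # l_bc <= a_T  (link b-c is dropped)
leq({2: -1, 7: 1})          # l_ad >= a_T
leq({3: -1, 7: 1})          # l_bd >= a_T

bounds = [(eps, None)] * n  # all strengths and both thresholds strictly positive (approximated)
res = linprog(c=np.zeros(n), A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=bounds)
print(res.status, res.x)    # status 0 indicates a consistent assignment was found
```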
6 Finding and Analyzing the Solution CAG

6.1 Experiment Setup
To find the comprehensive complex graph that can explain the concept learning for a particular example text, an experiment was conducted in a classroom setting with a group of six students in the undergraduate "Computer Networks" class. Subjects were given a paragraph of text about the concept "Ethernet" from the standard textbook prescribed for that class and were asked to go through each sentence in the paragraph, simultaneously identify each concept in the sentence, and progressively draw CAGs. The paragraph contained 8 sentences, so the concept learning activity was divided into 8 learning episodes. By the end of the eighth episode the students had drawn CAGs for t=1–8 using the concepts from the text.

6.2 Observations
Figure 7(a) shows the concept graph drawn by one of the students. The student-drawn CAGs are used to reconstruct CAGs for all students, indicating the concepts which were recognized and those which were not. To construct these graphs we first have to find C_{S(t)} for t=1–8. This is done by collecting the concepts at t=1 to 8 for every student. For example, at t=3 the possible set of recognized concepts which covers all students is C_{S(t=3)} = (PARC, Network, Shared link). Out of these the student recognized only the concept "Shared link". Therefore, C_{lat(t=3)} = (Shared link) and C_{S(t=3)} − C_{lat(t=3)} = (PARC, Network). Figure 7(b) shows the reconstruction of C_{S(t=3)} and C_{lat(t=3)} for a student. After determining C_{S(t)} and C_{lat(t)} for every sentence for every student, the concept maps are reconstructed to include the recognized as well as the unrecognized concepts.
Fig. 7. Reconstructing a student-drawn CAG: (a) student-drawn CAG; (b) reconstructed CAG
Once the CAGs for all students are reconstructed, they are converted into a set of linear equations and solved to obtain the values of the association strengths which satisfy all the constraints. The result is a fully connected CAG, called the solution CAG, which can mathematically explain the concept learning of all students according to the laws of concept recognition and concept association specified before.

6.3 Analysis of Solution CAG
The solution CAG is a fully connected graph between concepts from the text and special "hidden" nodes. These nodes are introduced at the stage of reconstructing a particular student's CAG. A single hidden node is inserted at time t=1 for every student. This hidden node signifies the background knowledge of a particular student. For an experiment like this, it is impossible to actually construct a graph of a student's entire background knowledge; there exists no method which can accurately hypothesize a person's concept knowledge graph. Hence we assume that all the background concept knowledge possessed by a student is encompassed in a single hidden node, namely "std1" for student 1, and so on. Thus the resultant CAG contains six such hidden nodes, one for each student. The hidden nodes have connections to all the other concept nodes in the CAG. The associations from a hidden node can have positive as well as negative strengths. This can be explained intuitively: sometimes the student's background knowledge helps in learning new concepts, whereas sometimes it is found to be an obstacle. If the association from the hidden node to a concept node is positive, it implies that the hidden node is beneficial in learning the
new concept, whereas a negative association strength implies that the hidden node is actually detrimental to learning the new concept. A zero-strength association from a hidden node implies it is neither beneficial nor detrimental to learning. The existence of hidden nodes also helps in solving another known problem, that of learning the XOR function using this model, since the hidden-node associations are allowed to take negative values. The solution CAG is in fact the imaginary CAG Z, which we assumed to contain all the connectivity and association strength information. Z can thus be calculated by reverse engineering the observed student-drawn CAGs.

Association strength distribution. In this section we analyze the distribution of association strengths. Figure 8(a) shows the sorted association strengths between hidden nodes and concept nodes, and Figure 8(b) shows those between concepts only. Some of the links between hidden nodes and concepts have negative strengths, but most have positive strengths, indicating that more often than not background knowledge helps in learning new concepts. This plot can be used to determine which of the associations are most important and need reinforcement.
Fig. 8. Association strength distribution: (a) between hidden and concept nodes; (b) between concept nodes
Node strength distribution. The node strength of a node is calculated by summing up all the association strengths to that particular node. Figure 9(a) shows the sorted node strengths for the hidden nodes. It is seen that the std5 hidden node has the highest node strength. Figure 9(b) shows the sorted node strengths for the actual concepts. "CSMA/CD" has the highest node strength, signifying its importance in comprehension of this particular paragraph of text. This graph gives us an idea about which concepts are central in comprehending this paragraph of text. As seen from the figure, the concepts "CSMA/CD", "Collision detect", "Aloha", "frames", etc. are much more central in comprehending the concept of "ethernet" than, say, "bus", "coax cable" or "shared medium".
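A short sketch of the node-strength measure is given below; the link strengths are invented for illustration and only reproduce the qualitative point that "CSMA/CD" comes out as the most central concept.

```python
# Node strength: the sum of the strengths of all associations incident to a node.
def node_strengths(links):
    # links: dict mapping (node_a, node_b) -> association strength
    strengths = {}
    for (a, b), w in links.items():
        strengths[a] = strengths.get(a, 0.0) + w
        strengths[b] = strengths.get(b, 0.0) + w
    return sorted(strengths.items(), key=lambda kv: kv[1], reverse=True)

links = {("ethernet", "CSMA/CD"): 10, ("CSMA/CD", "collision detect"): 8,
         ("ethernet", "LAN"): 3}          # illustrative values only
print(node_strengths(links))              # CSMA/CD comes out as the most central concept
```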
Fig. 9. Node strength distribution: (a) for hidden nodes; (b) for concept nodes
Correlation with h_T. Each student is assumed to have variable h_T and a_T. These variables are factored into the problem while constructing the equations and coefficient matrices. The variable h_T for each student signifies the difficulty or ease with which that student comprehends the particular paragraph of text. A lower value of h_T implies that the student possibly has a lower threshold for learning new concepts, meaning the student is more easily capable of learning new concepts than one with a high threshold. To observe this, we simply plot the correlation between the threshold h_T for each of the students and the number of concepts recognized (n) by the student. The table in Figure 10 shows the exact values of h_T against n, together with the plot. As expected, there is a negative correlation between the two variables, equal to -0.172.
student    h_T        n
1          554.027    17
2          582.764    16
3          622.879    11
4          631.583    10
5          775.794    13
6          574.077     9
Fig. 10. Correlation between hT and n for six students is -0.172
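The reported correlation can be reproduced directly from the values in the table; the following sketch simply recomputes the Pearson correlation between h_T and n.

```python
# Recomputing the correlation between each student's threshold h_T and the
# number of recognized concepts n, using the values from the table above.
import numpy as np

h_T = np.array([554.027, 582.764, 622.879, 631.583, 775.794, 574.077])
n   = np.array([17, 16, 11, 10, 13, 9])

r = np.corrcoef(h_T, n)[0, 1]
print(round(r, 3))   # approximately -0.172, matching the value reported in Fig. 10
```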
7 Conclusion and Potential Directions
In this paper we proposed a computational model for computing the concept association graph which is formed during human text comprehension. A study
is conducted to explain concept learning for a group of six readers, and the approach can be extrapolated to any number of subjects. We perform simple graph analysis on the obtained CAG to find peculiar characteristics a cognitive concept graph might have. Some of the questions we are able to answer are: which associations are more important than others, what distribution the association strengths have, which concept is central in comprehending a particular topic, which student has the greatest chance of learning new concepts, and what the significance of the threshold coefficient is in learning new concepts. The CAG can be subjected to rigorous complex network analysis to derive other interesting inferences. From the theory it is clear that prior concepts play an important role in learning new concepts. As a potential direction we plan to study how the sequence of concepts affects concept learning.
References

1. Dietterich, T., Lathrop, R., Lozano-Perez, T.: Solving the multiple-instance problem with axis-parallel rectangles. Artificial Intelligence 89(1-2), 31–71 (1997)
2. Tenenbaum, J.B.: Bayesian modeling of human concept learning. In: Proceedings of the 1998 conference on Advances in neural information processing systems, vol. II, pp. 59–65 (July 1999)
3. Kintsch, W.: Predication. Cognitive Science: A Multidisciplinary Journal 25(2), 173–202 (2001)
4. Kintsch, W.: The Role of Knowledge in Discourse Comprehension: A Construction-Integration Model. Psychological Review 95(2), 163–182 (1988)
5. Kintsch, W.: Text Comprehension, Memory, and Learning. American Psychologist 49(4), 294–303 (1994)
6. Kintsch, W., Van Dijk, T.A.: Toward a Model of Text Comprehension and Production. Psychological Review 85(5), 363–394 (1978)
7. Chater, N., Manning, C.D.: Probabilistic Models of Language Processing and Acquisition. Trends in Cognitive Sciences, Special issue: Probabilistic models of cognition 10(7), 335–344 (2006)
8. Landauer, T.K., Laham, D., Foltz, P.: Learning human-like knowledge by singular value decomposition: a progress report. In: Proceedings of the 1997 conference on Advances in neural information processing systems, Denver, Colorado, United States, vol. 10, pp. 45–51 (July 1998)
A Qualitative Approach of Learning in Parkinson's Disease

Delphine Penny-Leguy1,2 and Josiane Caron-Pargue1

1 Department of Psychology, 97 Avenue du Recteur Pineau, F-86022 Poitiers cedex
2 CHU-Poitiers, 2 rue de la Milétrie, F-86000 Poitiers
[email protected], [email protected]

Abstract. Verbal reports of PD patients and of two control groups (their matched Elderly and Young), obtained during the solving of the 4-disks Tower of Hanoi, were analyzed in terms of enunciative operations, cognitively interpreted. The analysis focuses on the processes involved in the reconstruction of implicit knowledge in a new context. Results show processes of deconstruction of implicit knowledge and several impairments in the processes involved in its reconstruction and stabilization. Notably, instead of locating objects relative to one another at the declarative level, PD patients locate objects at the procedural level, by locating the places where they are or where they go relative to one another.
1 Introduction

Many studies attest to the emergence and the progress of cognitive impairments in Parkinson's disease (PD). These impairments affect controlled processing and executive functions. More specifically, they are mainly due to working memory, notably when the task requires manipulation of information, but they remain secondary in visual recognition [3], [6], [10]. Furthermore, several difficulties in working memory have been specified and linked to PD. A lack of flexibility appears when patients are confronted with new situations. A process of de-automatization arises when automaticity would have to be applied again, or when its construction has to be completed. Difficulties affect parallel processing, mainly at the declarative level, while the procedural level seems independent of executive impairments [8], [9], [11]. Furthermore, impairments in language affect mainly the production of verbs, sentence comprehension, and pragmatic communication abilities (cf. [7] for a review). In fact, all these impairments appear to be the result of deeper dysfunctions. The key point remains to characterize the deficient cognitive processes underlying them. Our assumption is that new insights could be brought from a cognitive approach of language by recent contributions in cognitive linguistics, notably by Culioli's enunciative theory [5]. Indeed, some cognitive processes, marked by enunciative operations, intervene in the reconstruction of knowledge within the task [1], [2]. The aim of this paper is to compare verbal reports produced during the solving of the Tower of Hanoi test by PD patients and two control groups, their matched Elderly (CE) and Young (CY). Our intent is to focus on three kinds of enunciative operations: external locations, in relation or not with an internal access to abstraction, modal
expressions, and connectives which mark the planning. All of them intervene in the construction of cognitive units. Our general hypothesis is that impairments in PD can be specified at these points, in order to show impairments in the construction of implicit knowledge and in the planning.

The task. All participants solved the 4-Disks Tower of Hanoi four consecutive times. Half of each group verbalized during the solving process, and the other half did not.1 Only the verbalizing groups are considered in this paper. The Tower of Hanoi was a wooden one, with three pegs (A, B, C) from left to right and four colored disks of decreasing size (pink disk 1, green disk 2, yellow disk 3, black disk 4, from the smallest to the largest). In the initial state, all disks stood on peg A, each on a larger one. The goal was to move all disks to peg C in the same configuration, with two constraints: not to move a disk onto a smaller one, and not to move a disk on which another one was lying.
2 Theoretical Background

Our assumption is that conceptual knowledge is reconstructed in the task as contextual and local knowledge, restricted to the current situation in time and space, before being more or less generalized and de-contextualized [1], [2], [4]. Processes marked by linguistic forms, such as links between pieces of information or detachments from the situation, play a role in these de-contextualizations. They operate by means of interactions between the different parts of distributed representations, notably internal-external interactions. Our hypothesis is that the PD patients' impairments which lead to de-automatization and to a lack of generalization concern these cognitive processes. A theoretical and methodological way of characterizing cognitive operations from linguistic markers is to consider two semiotic levels, one identifying enunciative operations from linguistic markers, the other giving a cognitive interpretation of those operations [1], [2]. Indeed, enunciative operations constitute a formal construct accounting for various steps in the construction of utterances from propositional contents. So a cognitive interpretation of those operations, close to their formal definition, allows the characterization of cognitive re-organizations, escaping from both literal and subjective interpretations. In order to show PD patients' impairments, we will consider links involved in the structure of implicit knowledge and in the planning. These links can be marked by linguistic forms as follows.

Propositional contents. A succession of similar propositional contents may be interpreted as a step toward the construction of implicit knowledge. According to our assumption, the cognitive processes involved in the actualization of propositional contents do not generally retain the organization of this knowledge, but re-organize it.
1 A preliminary experiment, not reported here, examined possible effects of verbalization. The results showed only a significant increase of total time in the three verbalizing groups (Young, Elderly, PD patients). No significant difference arose for the number of moves between the verbalizing and non-verbalizing groups.
Then, we can expect impairments for the PD group in the internal-external interaction, which requires several manipulations of information in working memory. These impairments must be found notably in the operations which intervene in the processes of internalization and externalization, and which categorize external occurrences. They could lock previous implicit knowledge to a very local situation by lack of internalization, or to a completely detached level unusable in the current situation by lack of externalization. Then, a new explanation could be given for the already mentioned de-automatization in PD.

Locations. Every oriented relation from a toward b is formalized in Culioli's theory by an operation of location (fr. repérage), and cognitively interpreted as an attentional focusing on a, and as b coming with a [1]. Indeed, locations intervene in the re-organization of propositional contents as cognitive aggregates, in order to construct decontextualized and stabilized knowledge by means of internalization and externalization. The basic re-organization of a propositional content stands in its contextualization by means of locating it either relative to the situation or to the internal subjective space. The criterion used in order to recognize an internal location is the presence of a starting term. A starting term is defined by the detachment of an argument from the propositional content, this detachment being marked by an anaphora. A starting term marks an access to abstraction and to conceptual knowledge in order to internally re-organize and partly re-construct it within the situation. However, there is a constraint for that: at least one external location has to be categorized, or else the categorization is reduced to the prior local external occurrence [1]. The categorization of external occurrences plays a double role, first in internalization, re-inscribing the categorized locations at the internal level, and second in externalization, re-introducing them at the external level. The absence of a starting term is the criterion used to recognize a cognitive processing in the external space. External locations may occur either at first as a very local processing, or later when decontextualized. Besides, procedural vs declarative aggregates, both with or without starting terms, may be constructed at different levels of control [2]. The criterion used in order to recognize them is the repetition of lexical choices associated either with the objects or with the moves. In the case of PD, one can expect impairments in the construction of declarative aggregates and in their matching with procedural aggregates.

Modal expressions and planning. Two kinds of planning may be distinguished [2], one being automatic, without difficulties, marked mainly by connectives, without modal expressions. The other, marked by modal expressions, is the critical, strategic planning. It involves a detachment from the situation aiming at recovering information, considering other possibilities, and re-organizing the planning. It arises in case of difficulties in the current processing of the situation. The critical planning may concern a strategic positioning of goals (marked by modal verbs can, want, have to), a strategic initialization of a sequence (marked by well), or a strategic access to memory in terms of storage or retrieval (marked by interjections and verbs such as I think, I believe). It marks consecutive steps for identifying the constraints linked to the task and to the situation [2].
Impairments in PD can be expected in the planning, as a deterioration of automatic planning and an increase of critical planning, notably concerning local re-organizations.
3 Method

3.1 Participants

Three groups of French subjects participated in the experiment: PD, a group of 20 non-demented patients at the beginning of Parkinson's disease according to Hoehn and Yahr's score (stages 1 to 3); CE, a control group of 20 elderly matched to the Parkinson group; and CY, a control group of 20 young students. The mean age of each group was respectively 62, 61, and 18. All participants were healthy, right-handed, and novices at solving the ToH puzzle. Each group comprised 10 women and 10 men. PD patients were medicated with L-Dopa. Both PD and CE were selected on the basis of their scores on the following tests: a score higher than or equal to 25 on the Mini Mental State Examination (MMSE); a score lower than or equal to 20 on Montgomery and Asberg's scale of depression (MADRS); and a score between 120 and 136 on the Mattis scale of impairments in executive functions.

3.2 Linguistic Criteria

The blocks. The succession of similar propositional contents verbalizing consecutive moves characterizes implicit units of knowledge, called 'blocks'. Different kinds of blocks can be defined and differentiated by criteria defining the predicative relations as follows, independently of final lexical choices (PD1 means PD patient, trial 1; CE3 Elderly, trial 3; CY3 Young, trial 3):

TGT (Take, Go, To), e.g. (PD1) I take the pink disk I place it on the C; (CE3) I remove the pink one it goes on the A; (CY3) I take the pink one I put it to A.
GT (Go, To), e.g. (PD1) I put the B on the C; (CE2) then I pass the pink one to B.
T (To), e.g. (CY4) pink one on B; (PD3) the C on the B.
0 when the predicative relation is not verbalized. For example: (CP4) and then ABC no?; (CE4) and the pink one; (CP4) oh right!; (CP3) no verbalization at all.

Then four kinds of blocks may be defined, each of them associated with one of the predicative relations: BTGT, BGT, BT, B0. The succession of these blocks may imply either a simplification or a complexification of the verbalization, according to the following criteria:

Simplified blocks: two blocks follow each other in the order BTGT, BGT, BT, B0.
Complexified blocks: the reverse order arises between two blocks.

Furthermore, blocks can be reorganized by the following linguistic markers, which can appear inside a block or between blocks:

Connectives. Inside a block, connectives partition the implicit knowledge; between blocks, they link them.
Modal expressions. Inside a block, modal expressions mark a local modification of the planning; outside, they mark a substantial reorganization of the planning.
Internal-external organization. Our intent is to study this organization from the distribution of locations between two consecutive moves, constructing various aggregates of moves, with or without starting terms.

Starting terms, e.g. (PD1) I take the green disk I put it on the B; (CE1) the pink one I put it on the C. The terms the green disk and the pink one both have the status of starting terms, marked by the anaphora it.

External aggregates, e.g. (CY3) the green one on the A – the pink one on the A; (CE1) the green one on the yellow one – the pink one on the green one. The criterion for recognizing an aggregate is the repetition of the lexical choice of arguments: either the repetition of the naming of pegs, e.g. the A in (CY3), or the repetition of the naming of disks, e.g. the green one in (CE1). Furthermore, no starting term appears in the verbalization of an external aggregate.

Categorized aggregates, e.g. (PD1) I take the green disk I place it on the C – I take the pink disk I place it on the green one on the C; (CE2) then the green one the A I put it on the C – the pink one the B I put it on the C. The criterion is that there is at least one starting term in the verbalization of an aggregate: e.g., in (PD1) the green disk and the pink disk are both starting terms, marked by the anaphora it, with two aggregates, respectively marked by the repetitions the green and the C; in (CE2) the green one and the pink one are both starting terms, marked by the anaphora it, with an aggregate marked by the repetition the C.

Declarative-procedural organization. This category refers to aggregates with or without starting terms.

Declarative aggregates. The criterion for recognizing them is the repetition of the naming of disks, e.g. the repetition of the green one in (CE1) then the green one I put it on the B on the yellow disk – the pink one on the green one; the repetition of the black one in (CY2) the black one on the C – the pink one on the black one.

Procedural aggregates. The criterion is the repetition of the naming of pegs. We distinguish the specific case where the repeated naming is the naming of a peg referring to a disk:

Peg-aggregates, e.g. with the repetition the B in (CY1) the yellow one on the B – er the pink one on the B; the repetition B in (CE2) right! the C to B – the A to B.

Disk-peg-aggregates, e.g. with the repetition B in (CE2) the yellow one B returns on the C – the pink one on the B; the repetition A in (PD2) I take the A I put it on the C – I take the pink one I put it on the A; the repetition the A in (PD4) the A on the B – the C on the A; the repetition A in (PD4) the pink disk on the A – the disk A on the C; the repetition AB in (PD4) the AB on the C – and the pink one AB.

3.3 Dependent Variables

Dependent variables were constructed in relation to the different linguistic criteria used for studying the structure of propositional contents and the re-organization of this structure. Blocks of identical propositional contents, each verbalizing a move, were identified and characterized by their respective number and length.
The ratio to moves was computed for each linguistic criterion as 100 × the ratio to the total number of moves, except for the ratio of blocks, which was computed in ‰. A ratio to words (1000 × the ratio to the total number of words) was also computed, but is not presented here. The length of blocks and the ratios of simplified and complexified blocks were computed as follows:

Length: the length of a block is the number of moves inside the block.
Ratio of simplified blocks: the ratio of the number of blocks followed by a simpler one, to the total number of blocks of the trial.
Ratio of complexified blocks: the ratio of the number of blocks followed by a more complex one, to the total number of blocks of the trial.
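As a rough illustration of these block measures, the sketch below ranks the block types from the most explicit (BTGT) to the least (B0) and counts simplifying and complexifying transitions between consecutive blocks; the block sequence used is invented purely for illustration, and the ratios are expressed as plain fractions rather than in the units used in the paper.

```python
# Sketch of the simplification/complexification measures for a block sequence.
RANK = {"BTGT": 0, "BGT": 1, "BT": 2, "B0": 3}   # order BTGT > BGT > BT > B0

def block_ratios(blocks):
    simplified = sum(1 for a, b in zip(blocks, blocks[1:]) if RANK[b] > RANK[a])
    complexified = sum(1 for a, b in zip(blocks, blocks[1:]) if RANK[b] < RANK[a])
    total = len(blocks)
    return simplified / total, complexified / total

trial = ["BTGT", "BGT", "BT", "BGT", "B0"]       # hypothetical block sequence for one trial
print(block_ratios(trial))                        # (0.6, 0.2)
```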
4 Results

Specificities of PD patients. Both the ratio and the length of blocks B0 are higher for PD than for CE and CY, F(2, 27) = 4.62(3.60), p < .05(.05) respectively (cf. Table 1), both non-significantly decreasing with trials. Furthermore, in blocks BTGT, the length decreases with trials for PD, F(3, 27) = 2.68, p = .05, without a significant effect for the others. In blocks BGT, the length is higher in trial 1, decreasing faster with trials than for CE, with an interaction group × trials, F(3, 54) = 4.51, p < .01. The simplification of blocks shows a higher ratio for PD than for CE, itself higher for CE than for CY, F(2, 27) = 9.21, p < .05 (means PD: 4.8, CE: 2.8, CY: 1.3). Furthermore, both the ratios of simplification and of complexification decrease for CE (p < .05) and for CY (p < .01), while decreasing non-significantly for PD. Inside blocks, the ratio of Modal Expressions is higher on trial 1, mainly in blocks BGT, for PD than for CE, but without a significant effect on the other trials, with an interaction groups × trials, F(6, 54) = 3.30, p < .05. It persists for PD in blocks BT and B0, and almost disappears for CE. The ratio of Connectives decreases with trials for CE and CY, but not significantly for PD. In the whole strategy, independently of blocks: at the procedural level, the ratio of Disk-peg-aggregates increases with trials for PD, while this effect is non-significant for CE, with a significant interaction groups × trials, F(3, 54) = 3.53, p < .05 (cf. Fig. 1).

Table 1. Means of ratio and length for the different blocks and groups
             Ratio (‰)            Length
             PD    CE    CY       PD    CE    CY
Block BTGT   1.5   2.3   1.5      0.8   1.6   1.8
Block BGT    3.0   6.8   1.0      1.7   1.2   0.2
Block BT     8.5   9.0   6.0      9.8   9.0   13.8
Block B0     4.9   1.2   0.7      2.8   0.3   0.1
Fig. 1. Ratio to moves of linguistic markers with trials for the three groups PD, CE, and CY. Cnn: Connectives, Mod: Modal expressions, Loc: Aggregates, LocST: Categorized aggregates, PegLoc: Peg-Aggregates, D-Peg-Loc: Disk-Peg-Aggregates.
Similarities between PD patients and Young. Other specificities of PD patients are marked by a non-significant difference with CY but a significant one with CE, as follows. Between blocks, the ratio of Connectives is lower for PD and CY than for CE, respectively F(1, 18) = 5.05(4.62), p < .05(.05). In the whole strategy, independently of blocks (cf. Fig. 1): Categorized aggregates show a non-significant effect of trials for PD and CY, but a significant decrease for CE, F(3, 27) = 3.68, p < .05. Declarative aggregates have their ratio decreasing with trials for PD and CY, respectively F(3, 27) = 2.7(3.15), p = .05(< .05), while increasing for CE, F(3, 27) = 3.67, p < .05. Peg aggregates have an increasing ratio with trials for PD and CY, respectively F(3, 27) = 2.86(4.01), p = .05(< .05), but the effect is non-significant for CE.

Similarities between PD patients and Elderly. No significant difference appears between PD and CE, both being distinct from CY. In block BGT, the ratio and the decreasing length are both higher for PD and CE than for CY, respectively F(2, 27) = 3.68(4.06), p < .05(.05). The complexification of blocks shows a higher ratio for PD and CE than for CY (means PD: 2.5, CE: 2.1, CY: 0.4), F(2, 27) = 4.70, p < .05. Inside blocks, the ratio of Connectives does not decrease significantly for PD, but does for CY, F(3, 27) = 2.65, p = .05. Between blocks, for Modal expressions, the ratio decreases with trials for both PD and CE, F(3, 54) = 4.32, p < .01. In the whole strategy, independently of blocks: Modal expressions show no group effect and no effect of trials between PD and CE; the ratio for CY is very low (cf. Fig. 1). Starting terms show a higher ratio for PD and CE than for CY (means PD: 3.1, CE: 4.8, CY: 1.1), with a significant difference between CE and CY, F(1, 18) = 4.95, p < .05. Their ratio decreases with trials for PD and CE, F(3, 81) = 11.03, p < .001. External aggregates show an effect of trials, with an increase of their ratio, significant for PD and CE, F(3, 81) = 5.25, p < .01, but not for CY. Peg aggregates: a significant effect of group shows a lower ratio for PD and CE than for CY, F(2, 27) = 3.30, p = .05. Disk-peg aggregates: only PD and CE produce them (means PD: 1.28, CE: 0.79, CY: 0.06); there is no significant effect of group between PD and CE.

Similarities among the three groups. In blocks BTGT and BT, there is no effect of group for ratio and length. In BT, the ratio increases with trials in the three groups. Inside blocks there is no effect of group but a decreasing effect of trials for Connectives. There are very few Modal expressions in blocks BTGT for the three groups. In the whole strategy, independently of blocks: for Connectives, there is no effect of group, but a decrease with trials; for Declarative aggregates, there is no main effect or interaction (means PD: 0.22, CE: 0.44, CY: 0.24); for External and Categorized aggregates, there is no group effect.
5 Discussion

Our results show two complementary kinds of processes underlying PD patients' impairments, both explaining disorganization and lack of stabilization in knowledge. The first kind of impairment concerns implicit knowledge, which is disorganized as it is constructed. The construction of implicit knowledge by PD patients shows two specificities. One is due to the highest ratio and the highest length of blocks B0, defined by the absence of a verb, and the other to a high ratio, persisting along trials, in the simplification of the propositional contents defining blocks. This simplification is progressive and begins with a decrease in the length of blocks BTGT and BGT, and with a decrease of re-organizations inside blocks with Connectives and Modal expressions. But at the same time implicit knowledge is de-constructed, and gives rise to an apparent de-automatization [8], [9], [11]. That is marked by the highest ratio of complexification of propositional contents, persisting along trials, and by the amount of Modal expressions inside blocks BGT, persisting in blocks BT and B0. To explain this, one cannot resort to impairments in the use of verbs [7], because verbs appear without difference in blocks BTGT and BT among the three groups of subjects, and in BGT between PD patients and Elderly. The explanation rather lies in impairments in the interaction between two kinds of detachment from the situation, one marked by Modal expressions, the other by starting terms. Both kinds of detachment aim at the re-organization of the solving process and at the identification of constraints. They should give rise to an automatic planning, marked mainly by connectives instead of modal expressions. In fact, this is not the case, as shown by the absence of a significant difference for the ratio of Connectives between PD patients and Young, while this ratio is significantly higher for Elderly. This discrepancy suggests that the apparent similarity between PD patients and Young does not rely on the same kind of processes, while the discrepancy between PD patients and Elderly marks a real impairment for PD patients.

A second kind of impairment observed for PD patients involves qualitative differences in procedural processing, and constitutes the most original part of our data. It relies on a specific increase of the ratio of Disk-peg aggregates for PD patients, while it decreases for Elderly and is almost nil for Young. These differences show an odd way of locating disks relative to one another, which finds its explanation in the light of Bégoin-Augereau and Caron-Pargue's data [2]. Indeed, the location of disks relative to one another should have been marked by a repetition of the naming of disks, so constructing declarative aggregates. That is not the case: the naming of disks refers to the places where the disks are. Then, instead of the disks being located relative to one another, it is the places where the disks are that are located relative to one another, thus constructing procedural aggregates. This arises at the expense of the further constitution of disks into larger units, notably pyramid 12, and therefore at the expense of the chunking of the conditions of procedures. Here again, the apparent absence of impairment of PD patients relative to Young, in the decrease with trials of Declarative aggregates, probably relies on different kinds of processes, while there is a real impairment relative to their matched Elderly.
One might think that such impairments result from the motor training of PD patients in order to compensate for them, which might entail a shift of attention toward the places of objects to the detriment of the objects themselves. But the convergence of our
data convinces us that the problem lies elsewhere. The apparent absence of impairment of PD patients relative to Young for Categorized aggregates is, in fact, a real impairment relative to Elderly. Furthermore, complementary analyses, not presented here, show that the non-significant decrease with trials of Categorized aggregates, observed for PD patients, concerns mainly the Categorized-Disk-peg-aggregates, while a significant decrease arises for both Categorized-Peg-aggregates and Categorized-Disk-aggregates. That is a specificity of PD patients because, on the contrary, the ratio of Categorized-Disk-peg-aggregates decreases significantly for Elderly, while the ratios of Categorized-Disk-aggregates and Categorized-Peg-aggregates decrease non-significantly for both Elderly and Young. In fact, the categorized aggregates play an essential role in the processes of generalization by internalization and externalization, and in adjustments between conditions and actions, giving rise to the matching between the declarative and procedural levels. The amount of Disk-peg-aggregates hampers these processes. In sum, our data show that PD patients might begin the internal-external interactions essential for stabilization and for the manipulation of information in working memory. But they do not succeed in carrying on this interaction, because slight deviations occur during the usual processing. That hampers their whole coordination, which necessarily intervenes in the generalization of knowledge. Some of these odd processings were characterized in our analysis, but further investigations are needed.
6 Conclusion

A main assumption underlying this research is that implicit knowledge does not have the same nature as stabilized representational knowledge. Between them, there is a gap where several re-organizations take place and result in the already observed impairments of PD patients. Our data provide new explanations relying on the characterization of refined cognitive processes underlying these impairments. These processes, marked by linguistic forms, may give rise to testable predictions able to characterize early PD. Nevertheless, this research, which takes place within a larger study aiming at formalizing problem-solving processes from linguistic markers, remains exploratory. Furthermore, the case of the 4-disks Tower of Hanoi, too simple for the control groups, does not allow the grasp of all the steps involved in the processes we have characterized. Further analyses are needed. By and large, our approach involves a semiotic approach to general cognitive processes, relying on an enunciative approach to relations in the context of the current situation. Such an approach could be extended to the study of communication [2], and suggests very refined and interacting neurological processes underlying those processes.
References

1. Bégoin-Augereau, S., Caron-Pargue, J.: Linguistic Markers of Decision Processes in a Problem Solving Task. Cognitive Systems Research 10, 102–123 (2009)
2. Bégoin-Augereau, S., Caron-Pargue, J.: Modified decision processes marked by linguistic forms in a problem solving task. Cognitive Systems Research 11, 260–286 (2010)
3. Blanchet, S., Marié, R.M., Dauvillier, F., Landeau, B., Benali, K., Eustache, F., Chavoix, C.: Cognitive processes involved in delayed non-matching-to-sample performance in Parkinson's disease. European Journal of Neurology 7, 473–483 (2000)
4. Clancey, W.J.: Is abstraction a kind of idea or how conceptualization works? Cognitive Science Quarterly 1, 389–421 (2001)
5. Culioli, A.: Cognition and representation in linguistic theory. J. Benjamins, Amsterdam (1995)
6. Goel, V., Pullara, D., Grafman, J.: A computational model of frontal lobe dysfunction: working memory and the Tower of Hanoi task. Cognitive Science 25, 287–313 (2001)
7. Holtgraves, T., McNamara, P., Cappaert, K., Durso, R.: Linguistic correlates of asymmetric motor symptom severity in Parkinson's disease. Brain and Cognition 72, 189–196 (2010)
8. Koerts, J., Leenders, K.L., Brouwer, W.H.: Cognitive dysfunction in non demented Parkinson's disease patients: controlled and automatic behavior. Cortex 45, 922–929 (2009)
9. Muslimovic, D., Post, B., Speelman, D., Schmand, B.: Motor procedural learning in Parkinson's disease. Brain 130, 2887–2897 (2007)
10. Owen, A.M., Iddon, J.L., Hodges, J.R., Summers, B.A.: Spatial and non spatial working memory at different stages of Parkinson's disease. Neuropsychologia 35, 519–532 (1997)
11. Taylor, A.E., Saint-Cyr, J.A.: The neuropsychology of Parkinson's disease. Brain and Cognition 28, 281–296 (1995)
Modelling Caregiving Interactions during Stress Azizi Ab Aziz, Jan Treur, and C. Natalie van der Wal Department of Artificial Intelligence, VU University Amsterdam De Boelelaan 1081, 1081HV Amsterdam, The Netherlands {mraaziz,treur,cn.van.der.wal}@few.vu.nl http://www.few.vu.nl/~{mraaziz,treur,cn.van.der.wal}
Abstract. Few studies describing caregiver stress and coping have focused on the effects of informal caregiving for depressed care recipients. The major purpose of this paper is to investigate the dynamics of informal care support and receipt interactions among caregivers and care recipients using a computational modelling approach. Important concepts from studies on coping skills, strong-tie support networks, and stress buffering were used as a basis for the model design and verification. Simulation experiments for several cases showed that the model is able to reproduce interactions among strong-tie network members during stress. In addition, the possible equilibria of the model have been determined, and the model has been automatically verified against expected overall properties.
1 Introduction
Caring for a family member, spouse or friend (informal caregiving) who is diagnosed with a severe illness (e.g., a unipolar disorder) can be a stressful experience. While most caregivers adapt well to the situation of caring for a person with a unipolar depression, some do not. A number of studies investigate the negative consequences for the informal caregiver, such as the development of depression, burden, burnout, or (chronic) stress, when caring for elderly patients or patients with illnesses like dementia or Parkinson's [5], [6], [7], [9], [10]. The current paper addresses the development of stress in informal caregivers of patients with unipolar depression and the effect of this stress on the interactions between the caregiver and care recipient. To understand the caregiver's adaptations to the cognitive disabilities of his/her close acquaintance, the complex nature of stress processes must be accounted for and the constructs and factors that play a role in caregiving must be considered. For each individual a number of cognitive and physiological mechanisms regulate the impact of stress on health and well-being. Individuals typically occupy multiple roles in life; becoming a caregiver of a person with depression introduces an additional role, and therefore requires some rearrangement of priorities and redirection of energy [10]. Not only is this likely to produce strain at a personal level, but it is also likely to spur reactions (potentially negative) from diverse people who are connected to the person through his or her roles outside the realm of caregiving. Although much work has been dedicated to understanding the caregiving mechanism, little attention has been paid to a computational modelling angle on how caregivers work together to support their close acquaintances under stress. The caregiving process
is highly dynamic in nature, and monitoring such a process in the real world requires substantial resources [6]. The aim of this paper is to present a computational model that can be used to simulate the dynamics of the caregiver and care recipient under the influence of external events. The current work extends our previous model of social support selection; in the current model, individuals in a depressive state receive help from close acquaintances [1]. The paper is organized as follows. Section 2 describes several theoretical concepts of social support networks and their relation to stress. From this point of view, a formal model is designed (Section 3). Later, in Section 4, a number of simulation traces are presented to illustrate how the proposed model satisfies the expected outcomes. In Section 5, a mathematical analysis is performed in order to identify possible equilibria in the model, followed by verification of the model against formally specified expected overall patterns, using an automated verification tool (Section 6). Finally, Section 7 concludes the paper.
2 Underlying Principles in Informal Caregiving Interactions
Researchers from several domains have become increasingly interested in social support, caregiving, and mental health. For instance, researchers in the nursing and healthcare domains have contributed several theories to explain those relationships by presenting foundations on coping behaviours, mediating attributes, caregiving adaptation, and stress. One of the theories that has been used to explain these interactions is the Theory of Caregiver Stress and Coping, which combines important principles from Lazarus' Stress-Coping Theory, the Interpersonal Framework of Stress-Coping, and Pearlin's Stress Process Theory [3], [4], [11]. Within the model introduced, three aspects play important roles in regulating support and maintaining the caregiver's personal health: 1) externally generated stressors (negative events), 2) mediating conditions, and 3) caregiver outcomes [4], [6], [10]. For the first aspect, stressors are related to specific internal or external demands (primary stressors) that the caregiver has to manage. For example, several studies show that sufficient caregiver personal resources (e.g., financial income, social resources) reduce the perception of caregiving burden, while a loss of emotional resources (long term emotional exhaustion) amplifies the perceived burden [9]. The second aspect represents how the caregiver reacts (coping strategies) when facing adversity in caregiving. In the proposed model, caregivers who face a primary stressful situation generally use a combination of problem-focused coping and emotion-focused coping. Problem-focused coping is associated with positive interpersonal efforts to get the problem solved [3]. In contrast, emotion-focused coping strategies (thinking rather than acting to change the person-environment relationship) entail efforts to regulate the emotional consequences (e.g., avoidance) of stressful or potentially stressful events [4]. The choice of coping is related to the caregiver's personality; for example, a caregiver with a positive personality (e.g., low in neuroticism) tends to choose a problem-focused approach [5]. Another important concept that can be derived from these coping strategies is relationship-focused coping (positive or negative). The combination of high caregiver empathy (perceiving the inner feelings of the care recipient) and problem-focused coping leads to positive relationship-focused coping, and vice versa [4], [7], [8]. The third aspect is related to the caregiver's outcome. This component ranges on a continuum from bonadaptation
(meeting the needs to support the care recipient) to maladaptation (a continued negative situation and need for referral and assistance) [4], [11]. In addition, bonadaptation is related to high personal accomplishment (expected personal gain) and provided support (social support), while maladaptation is linked to emotional exhaustion [9]. A high expected personal gain reduces the short term and long term stress levels in caregivers, which improves interaction during the caregiving process [7]. When care recipients receive support, it reduces their stress: the support serves as an insulating factor, or stress buffer, so that people who have more social support resources are less affected by negative events [5], [6].
3 Modeling Approach
Based on the analysis of the dynamics in coping behaviours, mediating attributes, caregiving adaptation, and stress, as given in the previous section, it is possible to specify computational properties for the multi-agent model. The interactions between these variables form several relationships, both instantaneous and temporal. To represent these relationships in agent terms, each variable is coupled with an agent's name (A or B) and a time variable t. The agent variable A refers to the caregiver agent and B to the care recipient agent. This convention is used throughout the development of the model in this paper. The details of this model are shown in Fig. 1.
Fig. 1. Global Relationships for Caregiving Interactions During Stress
3.1 The Caregiver Model
This component of the overall model aims to formalise important concepts within the caregiver. The instantaneous relationships are expressed as follows. The problem-focused coping PfC is calculated using the combination of the caregiver personality GpP and burden Bd. Note that a burden level close to 1 has the effect that the choice of using problem-focused coping becomes smaller.
PfCA(t) = GpPA(t).(1-BdA(t))   (1)
EfCA(t) = (1-GpPA(t)).BdA(t)   (2)
However, in emotion-focused coping EfC, those factors have the opposite effect. Positive relationship-focused coping (RfC+) depends on the relation between problem-focused coping and the caregiver's empathy GE. A high empathy increases this function, while reducing its counterpart, the negative relationship-focused coping (RfC-).
RfCA+(t) = PfCA(t).GEA(t)   (3)
RfCA-(t) = EfCA(t).(1-GEA(t))   (4)
Burden (Bd) is determined by a proportional contribution (regulated by β) of the caregiver's primary stressors (GpS) and long term emotional exhaustion (ExH), dampened by the caregiver's personal resources (GpR). Expected personal gain (PgN) is measured using the proportional contribution (determined by σ) of bonadaptation (Bn) and experienced personal satisfaction (EpN). Short term emotional exhaustion (EsH) is measured by combining maladaptation (Md) with the complement of the expected personal gain.
BdA(t) = [β.GpSA(t)+(1-β).ExHA(t)].(1-GpRA(t))   (5)
PgNA(t) = σ.BnA(t) + (1-σ).EpNA(t)   (6)
EsHA(t) = MdA(t).(1-PgNA(t))   (7)
Caregiver short term stress GsS is related to the presence of caregiver negative events GnE and burden Bd; note that a high expected personal gain reduces the short term stress level. Maladaptation Md combines negative relationship-focused coping (RfC-) and emotion-focused coping, attenuated by positive relationship-focused coping. Bonadaptation Bn, in turn, combines positive relationship-focused coping and problem-focused coping, attenuated by negative relationship-focused coping. Parameters φ, ϒ, and ρ provide the proportional contribution factors in the respective relationships. In addition to the instantaneous relations, there are four temporal relationships involved, namely experienced personal satisfaction EpN, long term emotional exhaustion ExH, caregiver long term stress GlS, and social support provision ScP. The rates of change of these temporal relationships are determined by the flexibility rates γ, ϑ, ϕ, and ψ, respectively.
GsSA(t) = [φ.GnEA(t) + (1-φ).BdA(t)].(1-PgNA(t))   (8)
MdA(t) = [ϒ.RfCA-(t)+(1-ϒ).EfCA(t)].(1-RfCA+(t))   (9)
BnA(t) = [ρ.RfCA+(t)+(1-ρ).PfCA(t)].(1-RfCA-(t))   (10)
The current value of each of these temporal relations depends on its previous value. It should be noted that the change process is measured over a time interval between t and t+Δt. The operator Pos for the positive part is defined by Pos(x) = (x + |x|)/2 or, alternatively, Pos(x) = x if x ≥ 0 and 0 otherwise.
ExHA(t+Δt) = ExHA(t) + γ.[(Pos(EsHA(t)-ExHA(t)).(1-ExHA(t))) - Pos(-(EsHA(t)-ExHA(t)).ExHA(t))].Δt   (11)
EpNA(t+Δt) = EpNA(t) + ϑ.[(Pos((ScPA(t)-GpSA(t))-EpNA(t)).(1-EpNA(t))) - Pos(-((ScPA(t)-GpSA(t))-EpNA(t)).EpNA(t))].Δt   (12)
GlSA(t+Δt) = GlSA(t) + ϕ.(GsSA(t)-GlSA(t)).(1-GlSA(t)).GlSA(t).Δt   (13)
ScPA(t+Δt) = ScPA(t) + ψ.[(Pos(PgNA(t)-ScPA(t)).(1-ScPA(t))) - Pos(-(PgNA(t)-ScPA(t)).ScPA(t))].Δt   (14)
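To make the update scheme concrete, the following is a minimal Python sketch of equations (1)-(14). It is not the authors' implementation: the state-dictionary layout, the keyword-parameter names, and the default values are our own, and Pos is the operator defined above.

def pos(x):
    # positive part operator: Pos(x) = (x + |x|)/2
    return x if x > 0 else 0.0

def caregiver_step(s, dt=0.3, beta=0.5, sigma=0.5, phi=0.5, upsilon=0.5, rho=0.5,
                   gamma=0.3, theta=0.3, phi_rate=0.3, psi=0.3):
    """One update of caregiver agent A; 's' maps variable names to values in [0, 1]."""
    bd = (beta * s['GpS'] + (1 - beta) * s['ExH']) * (1 - s['GpR'])   # (5) burden
    pfc = s['GpP'] * (1 - bd)                                         # (1) problem-focused coping
    efc = (1 - s['GpP']) * bd                                         # (2) emotion-focused coping
    rfc_pos = pfc * s['GE']                                           # (3) positive relationship-focused coping
    rfc_neg = efc * (1 - s['GE'])                                     # (4) negative relationship-focused coping
    bn = (rho * rfc_pos + (1 - rho) * pfc) * (1 - rfc_neg)            # (10) bonadaptation
    md = (upsilon * rfc_neg + (1 - upsilon) * efc) * (1 - rfc_pos)    # (9) maladaptation
    pgn = sigma * bn + (1 - sigma) * s['EpN']                         # (6) expected personal gain
    esh = md * (1 - pgn)                                              # (7) short-term emotional exhaustion
    gss = (phi * s['GnE'] + (1 - phi) * bd) * (1 - pgn)               # (8) short-term stress
    # temporal relations (11)-(14): move towards the target, scaled to stay in [0, 1]
    s['ExH'] += gamma * (pos(esh - s['ExH']) * (1 - s['ExH'])
                         - pos(-(esh - s['ExH'])) * s['ExH']) * dt    # (11)
    target = s['ScP'] - s['GpS']
    s['EpN'] += theta * (pos(target - s['EpN']) * (1 - s['EpN'])
                         - pos(-(target - s['EpN'])) * s['EpN']) * dt # (12)
    s['GlS'] += phi_rate * (gss - s['GlS']) * (1 - s['GlS']) * s['GlS'] * dt  # (13)
    s['ScP'] += psi * (pos(pgn - s['ScP']) * (1 - s['ScP'])
                       - pos(-(pgn - s['ScP'])) * s['ScP']) * dt      # (14)
    s.update(Bd=bd, PgN=pgn, GsS=gss, EsH=esh)
    return s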
3.2 The Care Recipient Model
The care recipient model is the other interacting component in the overall model. It has five instantaneous relations (care recipient perceived stress RpS, stress buffer SbF, care recipient short term stress RsS, care recipient functional status RfS, and behavioural status RbS) and one temporal relation (care recipient long term stress RlS).
RpSB(t) = τ.RnIB(t) + (1-τ).RnEB(t)   (15)
SbFB(t) = ω.RsGB(t)   (16)
RsSB(t) = [λ.RpB(t) + (1-λ).(1-RcSB(t))].RpSB(t).(1-SbFB(t))   (17)
RfSB(t) = RhSB(t).RlSB(t)   (18)
RbSB(t) = RpB(t).RlSB(t)   (19)
RlSB(t+Δt) = RlSB(t) + η.(RsSB(t)-RlSB(t)).(1-RlSB(t)).RlSB(t).Δt   (20)
Care recipient perceived stress is modelled as an instantaneous relation (regulated by a proportional factor τ) between the care recipient's negative interactions RnI and negative events RnE. The stress buffer is determined as ω times the received support RsG. Care recipient short term stress depends on the stress buffer SbF and on the proportional contribution (λ) of the care recipient's negative personality Rp, coping skills RcS, and perceived stress RpS. The care recipient functional and behavioural status levels are calculated by multiplying the care recipient health problem status RhS and the negative personality Rp, respectively, with the care recipient long term stress RlS. Finally, the temporal relation for care recipient long term stress accumulates the exposure to care recipient short term stress with flexibility rate η.
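The care recipient side can be transcribed in the same way. This is again a sketch of equations (15)-(20) under our own naming conventions, not the authors' code.

def care_recipient_step(r, received_support, dt=0.3, tau=0.5, lam=0.5, omega=0.8, eta=0.3):
    """One update of care recipient agent B; 'r' maps variable names to values in [0, 1]."""
    rps = tau * r['RnI'] + (1 - tau) * r['RnE']                           # (15) perceived stress
    sbf = omega * received_support                                        # (16) stress buffer from received support RsG
    rss = (lam * r['Rp'] + (1 - lam) * (1 - r['RcS'])) * rps * (1 - sbf)  # (17) short-term stress
    rfs = r['RhS'] * r['RlS']                                             # (18) functional status
    rbs = r['Rp'] * r['RlS']                                              # (19) behavioural status
    r['RlS'] += eta * (rss - r['RlS']) * (1 - r['RlS']) * r['RlS'] * dt   # (20) long-term stress
    r.update(RpS=rps, SbF=sbf, RsS=rss, RfS=rfs, RbS=rbs)
    return r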
4 Simulation Results
In this section, a number of simulated scenarios with a variety of different conditions of individuals are discussed. Three conditions are considered: prolonged stressors, fluctuating stressors, and non-stressful events, each combined with different personality profiles. For clarity, cg and cr denote caregiver and care recipient agent profiles respectively. The labels ‘good’ and ‘bad’ in Table 1 can also be read as ‘effective’ and ‘ineffective’ or ‘bonadaptive’ and ‘maladaptive’.

Table 1. Individual Profiles

Caregiver                     GpR   GE    GpP
cg1 (‘good’ caregiver)        0.8   0.7   0.7
cg2 (‘bad’ caregiver)         0.1   0.2   0.2

Care recipient                RhS   Rp    RcS
cr1 (‘good’ coping skills)    0.9   0.9   0.8
cr2 (‘bad’ coping skills)     0.9   0.9   0.1
Corresponding to these settings, the level of severity (or potential onset) is measured by defining that a caregiver or care recipient agent whose long term stress level exceeds 0.5 for more than 336 time steps is experiencing stress. There are several parameters that can be varied to simulate different characteristics. The current simulations used the following parameter settings: tmax=1000 (representing a monitoring activity of up to 42 days), Δt=0.3, flexibility rates ϕ=η=γ=ψ=ϑ=0.3, regulatory rates α=β=ϒ=ρ=σ=φ=τ=λ=0.5,
ω=ξ=0.8. These settings were obtained from previous systematic experiments to determine the most suitable parameter values in the model.
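As an illustration only, the two sketches above can be wired together into a simple driver. The initial values not fixed by Table 1 (e.g., the caregiver's primary stressors and the starting levels of the temporal variables), the mapping from provided support to the recipient's negative interaction, and the reading of the 336-step criterion are our own assumptions, not values from the paper.

def run_pair(cg, cr, negative_events, tmax=1000, dt=0.3):
    """Couple one caregiver and one care recipient and apply the severity criterion."""
    steps_above = {'caregiver': 0, 'care recipient': 0}
    for t in range(tmax):
        level = negative_events(t)
        cg['GnE'] = cr['RnE'] = level          # shared external stressor stream
        cr['RnI'] = 1 - cg['ScP']              # assumption: poor support is felt as negative interaction
        cg = caregiver_step(cg, dt=dt)
        cr = care_recipient_step(cr, received_support=cg['ScP'], dt=dt)
        steps_above['caregiver'] += cg['GlS'] > 0.5
        steps_above['care recipient'] += cr['RlS'] > 0.5
    # more than 336 steps with long-term stress above 0.5 counts as experiencing stress
    return {who: n > 336 for who, n in steps_above.items()}

cg1 = dict(GpR=0.8, GE=0.7, GpP=0.7,                     # 'good' caregiver profile (Table 1)
           GpS=0.5, GnE=0.0, ExH=0.1, EpN=0.5, GlS=0.1, ScP=0.5)
cr2 = dict(RhS=0.9, Rp=0.9, RcS=0.1,                     # 'bad' coping care recipient (Table 1)
           RnE=0.0, RnI=0.3, RlS=0.1)
print(run_pair(cg1, cr2, negative_events=lambda t: 0.9))  # prolonged stressors, Result #1 style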
Result # 1: Caregiver and receiver experience negative events. During this simulation, all agents have been exposed to an extreme case of stressor events. This kind of pattern is comparable to prolonged stressors throughout a lifetime. For the first simulation trace (Fig. 2(a)), a good caregiver tends to provide good social support towards its care recipient, even when facing persistently heightened stressors. This pattern is in line with the findings reported in [5]. One of the factors that can be used to explain this condition is the increasing level of the caregiver's personal gain. It suggests that caregivers do not unequivocally view caregiving as an overwhelmingly negative experience but can appraise the demands of caregiving as rewarding [4], [9]. Previous research has also suggested that caregiving satisfaction is an important aspect of the caregiving experience and seems to share parallel relationships with other variables (e.g., personality and empathy) [4], [11]. Moreover, a good caregiver normally uses problem-focused coping to solve the perceived problem and later increases positive relationship-focused coping. By the same token, research has consistently established a significant relationship between personal gains, problem-focused coping, and positive social support. For example, several studies reported that caregivers who were satisfied with caregiving used more problem-focused coping [3]. With this in motion, the caregiver maintains a positive view of social support, which later translates into support received by the care recipient. In the second simulation trace (shown in Fig. 2(b)), both agents (caregiver and care recipient) face high long term stress levels in the long run. The precursors of these conditions are the perception of caregiving as a burden and the inability of the caregiver to provide positive coping during stressful events [11]. These factors lower the caregiver's positive relationship-focused coping and experienced personal gain, and later reduce the ability to provide support. In the real world, this can be perceived as feeling overwhelmed and out of control of the situation; this condition occurs in the majority of caregivers when they feel burdened by the demands of caregiving [6].

Fig. 2. Simulations during prolonged stressors for (a, upper graph) a good caregiver and bad care recipient (b, lower graph) a bad caregiver and bad recipient
Result # 2: Caregiver and receiver experience different types of negative events. In this simulation, a new kind of stressor was introduced. This stressor comprises two parts: the first with very high, constant, prolonged stressors, followed by a second with a very low stressor level. During the simulation, the caregiver agents (cg1 and cg2) were exposed to these stressors, while the care recipient agents only experience prolonged stressors. As can be seen from Fig. 3(a), both agents (cg1 and cr2) experience gradual drops in their long term stress. Comparing Fig. 2(a) and Fig. 3(a) shows that the scenarios have almost similar patterns, but in 3(a) there is a substantial decrease in the caregiver's long term stress level after the first half of the simulation. This is consistent with the findings that caregivers with a positive personality, empathy, and high personal resources tend to help more if they experience fewer negative events [3], [8]. Meanwhile, Fig. 3(b) provides a different scenario. The simulation results show that a caregiver with a negative personality, less empathy, and low personal resources is incapable of providing support during the caregiving process. Note that although the caregivers experience non-stressor events after the first half of the simulation, their care recipient still experiences a high long term stress level. Similar findings can be found in [5], [10].

Fig. 3. Simulation traces during different stressors for (a, upper graph) a good caregiver and bad care recipient (b, lower graph) a bad caregiver and bad recipient

Result # 3: Managing a good care recipient. In this part, a simulation was carried out to investigate the effects of the caregiving behaviours of caregiver agents with different profiles towards good care recipients, during prolonged negative stressors. The interaction between a good caregiver and a good recipient shows that both agents have low long term stress levels, while the recipient's stress buffer and the caregiver's expected personal gain increase [5], [7]. On the contrary, the interaction between a bad caregiver and a good care recipient indicates that both agents experience high long term stress levels. However, the care recipient experiences less long term stress compared to the caregiver.
5 Mathematical Analysis
In this section it is discussed which equilibrium values are possible for the model, i.e., values of the model variables for which no change will occur. As a first step
the temporal relations for both caregiver and care recipient will be inspected (refer to equations (11), (12), (13), (14), and (20)). An equilibrium state is characterised by: ExHA(t+Δt) = ExHA(t), ScPA(t+Δt) = ScPA(t), GlSA(t+Δt) = GlSA(t), EpNA(t+Δt) = EpNA(t), and RlSB(t+Δt) = RlSB(t). Assuming γ, ψ, ϕ, ϑ, and η nonzero, and leaving out t, this is equivalent to:
[(Pos(EsHA-ExHA).(1-ExHA)) - Pos(-(EsHA-ExHA).ExHA)] = 0
[(Pos(PgNA-ScPA).(1-ScPA)) - Pos(-(PgNA-ScPA).ScPA)] = 0
(GsSA-GlSA).(1-GlSA).GlSA = 0
[(Pos((ScPA-GpSA)-EpNA).(1-EpNA)) - Pos(-((ScPA-GpSA)-EpNA).EpNA)] = 0
(RsSB-RlSB).(1-RlSB).RlSB = 0
These equations are equivalent to:
(EsHA-ExHA).(1-ExHA) = 0 and (EsHA-ExHA).ExHA = 0
(PgNA-ScPA).(1-ScPA) = 0 and (PgNA-ScPA).ScPA = 0
(GsSA-GlSA).(1-GlSA).GlSA = 0
((ScPA-GpSA)-EpNA).(1-EpNA) = 0 and ((ScPA-GpSA)-EpNA).EpNA = 0
RlSB = RsSB or RlSB = 0 or RlSB = 1
These have the following solutions:
EsHA = ExHA   (21)
PgNA = ScPA   (22)
GlSA = GsSA or GlSA = 0 or GlSA = 1   (23)
ScPA - GpSA = EpNA   (24)
RlSB = RsSB or RlSB = 0 or RlSB = 1   (25)
This means that for the caregiver, short term and long term emotional exhaustion are equal (21). Also, for both the caregiver and the care recipient, short term and long term stress are the same when the long term stress is not 0 or 1, by (23) and (25). Moreover, for the caregiver, social support provision is equal to the expected personal gain (22) and, on the other hand, to the sum of the experienced personal satisfaction and the caregiver's primary stressors (24).
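These conditions can also be probed numerically. The following sketch reuses the caregiver_step and care_recipient_step functions introduced above (with the same illustrative profiles); it simply iterates the model with constant inputs and tests (21)-(25) at the final state, up to a tolerance. It is a check of our own, not part of the authors' analysis.

def check_equilibrium(cg, cr, steps=50000, dt=0.3, tol=1e-3):
    """Iterate the coupled sketches with constant inputs and test conditions (21)-(25)."""
    for _ in range(steps):
        cr['RnI'] = 1 - cg['ScP']
        cg = caregiver_step(cg, dt=dt)
        cr = care_recipient_step(cr, received_support=cg['ScP'], dt=dt)
    near = lambda a, b: abs(a - b) < tol
    return {
        '(21) EsH = ExH': near(cg['EsH'], cg['ExH']),
        '(22) PgN = ScP': near(cg['PgN'], cg['ScP']),
        '(23) GlS = GsS, 0 or 1': near(cg['GlS'], cg['GsS']) or near(cg['GlS'], 0) or near(cg['GlS'], 1),
        '(24) ScP - GpS = EpN': near(cg['ScP'] - cg['GpS'], cg['EpN']),
        '(25) RlS = RsS, 0 or 1': near(cr['RlS'], cr['RsS']) or near(cr['RlS'], 0) or near(cr['RlS'], 1),
    }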
6 Formal Verification of the Model
This section addresses the analysis of the informal caregiving interactions model by specification and verification of properties expressing dynamic patterns that are expected to emerge. The purpose of this type of verification is to check whether the model behaves as it should, by running a large number of simulations and automatically verifying such properties against the simulation traces. A number of dynamic properties have been identified, formalized in the language TTL and automatically checked [2]. The language TTL is built on atoms state(γ, t) |= p denoting that p holds in trace γ (a trajectory of states over time). Dynamic properties are temporal predicate logic statements that can be formulated using such state atoms. Below, some of the dynamic properties that were identified for the informal caregiving interactions model are introduced, both in semi-formal and in informal notation. Note that the properties are all defined for a particular trace γ or a pair of traces γ1, γ2.
P1 – Stress level of cg
For all time points t1 and t2 in traces γ1 and γ2
if in trace γ1 at t1 the level of negative life events of agent cg is x1 and in trace γ2 at t1 the level of negative life events of agent cg is x2,
and in trace γ1 at t1 the level of personal resources of agent cg is y1 and in trace γ2 at t1 the level of personal resources of agent cg is y2,
and in trace γ1 at t2 the level of long term stress of agent cg is z1 and in trace γ2 at t2 the level of long term stress of agent cg is z2,
and x1 ≥ x2, and y1 ≤ y2, and t1 < t2,
then z1 ≥ z2.
∀γ1, γ2:TRACE, ∀t1, t2:TIME ∀x1,x2, y1, y2, z1, z2:REAL state(γ1, t1) |= negative_life_events(ag(cg), x1) & state(γ2, t1) |= negative_life_events(ag(cg), x2) & state(γ1, t1) |= personal_resources(ag(cg), y1) & state(γ2, t1) |= personal_resources (ag(cg), y2) & state(γ1, t2) |= long_term_stress(ag(cg), z1) & state(γ2, t2) |= long_term_stress (ag(cg), z2) & x1 ≥ x2 & y1 ≤y2 & t1 < t2 ⇒ z1 ≥ z2
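For readers who want to reproduce this kind of check outside the TTL tooling, the following is a much-simplified sketch of our own: a trace is taken to be a list of per-time-step dictionaries, and P1 is approximated by requiring the ordering of long term stress at every time point, given the antecedent on the inputs.

def holds_p1(trace1, trace2):
    """trace1: caregiver with >= negative life events and <= personal resources at every step.
    Returns True when its long term stress is never below that of trace2 (simplified P1)."""
    assert all(a['negative_life_events'] >= b['negative_life_events'] and
               a['personal_resources'] <= b['personal_resources']
               for a, b in zip(trace1, trace2)), "traces do not satisfy the antecedent of P1"
    return all(a['long_term_stress'] >= b['long_term_stress']
               for a, b in zip(trace1, trace2))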
Property P1 can be used to check whether caregivers with more stressful life events and fewer resources will experience a higher level of caregiver (long term) stress. The property succeeded when two traces were compared in which one caregiver had more (or equally many) negative life events and fewer personal resources than the caregiver in the other trace. In this situation the first caregiver experienced more long term stress than the caregiver with more personal resources and fewer negative life events. Notice that, since this property is checked for all time points in the traces, when the values for negative life events or personal resources change halfway through a simulation trace, the property succeeds for only part of the trace; this can be expressed by an additional condition stating that t1 is at time point 500 (halfway through our traces of 1000 time steps).
P2 – Stress buffering of cr
For all time points t1 and t2 in trace γ, If at t1 the level of received social support of agent cr is m1 and m1 ≥ 0.5 (high) and at time point t2 the level of the stress buffer of agent cr is m2 and t2≥ t1+d, then m2 ≥ 0.5 (high). ∀γ:TRACE, ∀t1, t2:TIME ∀m1, m2, d:REAL state(γ, t1) |= received_social_support(ag(cr), m1) & state(γ, t2) |= stress_buffer(ag(cr), m2) & m1 ≥ 0.5 & t2= t1+d ⇒ m2 ≥ 0.5
Property P2 can be used to check whether social support buffers the care recipient's stress. It is checked whether, if the received social support of agent cr is high (a value of 0.5 or higher), then the stress buffer of agent cr also has a high value (0.5 or higher) after some time. The property succeeded on the traces where the received social support was 0.5 or higher.
Relating positive recovery of the care recipient and social support from the caregiver. Property P3 can be used to check whether positive recovery shown by the care recipient will make the caregiver provide more social support at a later time point. This property P3 can be logically related to milestone properties P3a and P3b that together imply it: P3a & P3b ⇒ P3. Given this, the checker can be used to find out why a hierarchically higher-level property does not succeed. For example, when property P3 does not succeed on a trace, by the above implication it can be concluded that at least one of P3a and P3b cannot be satisfied, and the model checker can reveal whether it is property P3a and/or P3b that does not succeed. Properties P3a and P3b are introduced after property P3 below.
P3 – Positive recovery of cr leads to more social support from cg
For all time points t1 and t2 in trace γ, If at time point t1 the level of primary stressors of agent cg is d1 and at time point t2 the level of primary stressors of agent cg is d2
and at time point t1 the level of received support of agent cr is f1
and at time point t2 the level of received support of agent cr is f2
and d2 ≤ d1, and t1 < t2,
then f2 ≥ f1
∀γ:TRACE, ∀t1, t2:TIME ∀d1, d2, f1, f2:REAL state(γ, t1) |= primary_stressors(ag(cg), d1) & state(γ, t2) |= primary_stressors (ag(cg), d2) & state(γ, t1) |= received_social_support(ag(cr), f1) & state(γ, t2) |= received_social_support(ag(cr), f2) & d2 < d1 & t1< t2 ⇒ f2 ≥ f1
Property P3 succeeded in all generated simulation traces: when the primary stressors of the caregiver decreased, then at a later time point the received social support of the care recipient increased. In some simulation traces the property only succeeded on the first or second half of the trace. In these traces the primary stressors of the caregiver increased in the first part of the trace and then decreased in the second part of the trace. For this, a condition was added to the antecedent of the formal property, namely t1 = 500 or t2 = 500, so that the property is only checked on the second part or first part of the trace respectively. P3a – Positive recovery of cr leads to more personal gain in cg
For all time points t1 and t2 in trace γ, If at t1 the level of primary stressors of agent cg is d1 and at time point t2 the level of primary stressors of agent cg is d2 and at time point t1 the level of personal gain of agent cg is e1 and at time point t2 the level of personal gain of agent cg is e2 and d2 ≤ d1, and t1< t2 then e2 ≥ e1
∀γ:TRACE, ∀t1, t2:TIME ∀d1, d2, e1, e2:REAL state(γ, t1) |= primary_stressors(ag(cg), d1) & state(γ, t2) |= primary_stressors (ag(cg), d2) & state(γ, t1) |= expected_personal_gain(ag(cg), e1) & state(γ, t2) |= expected_personal_ gain (ag(cg), e2) & d2 < d1 & t1< t2 ⇒ e2 ≥ e1
Property P3a can be used to check whether the caregiver's expected personal gain will increase if the primary stressors of the caregiver decrease. This property succeeded on the simulation traces where the primary stressors of the caregiver indeed decreased.
P3b – Personal gain in cg motivates cg to provide more social support to cr
For all time points t1 and t2 in trace γ, If at time point t1 the level of personal gain of agent cg is e1 and at time point t2 the level of personal gain of agent cg is e2 and at t1 the level of received support of agent cr is f1 and at time point t2 the level of received support of agent cr is f2, and e2 ≥ e1, and t1< t2, then f2 ≥ f1
∀γ:TRACE, ∀t1, t2:TIME ∀e1, e2, f1, f2:REAL state(γ, t1) |= expected_personal_gain(ag(cg), e1) & state(γ, t2) |= expected_personal_gain(ag(cg), e2) & state(γ, t1) |= received_social_support(ag(cr), f1) & state(γ, t2) |= received_social_support(ag(cr), f2) & e2 > e1 & t1< t2 ⇒ f2 ≥ f1
Property P3b can be used to check whether the care recipient receives more social support if the expected personal gain of the caregiver increases. This property succeeded on the simulation traces where the expected personal gain indeed increased.
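As an illustration of how the milestone decomposition P3a & P3b ⇒ P3 can localize a failure, the sketch below (our own, drastically simplified) assumes a trace in which the caregiver's primary stressors decrease monotonically and then reads the three properties as simple monotonicity conditions on that trace.

def non_decreasing(xs):
    return all(b >= a for a, b in zip(xs, xs[1:]))

def non_increasing(xs):
    return all(b <= a for a, b in zip(xs, xs[1:]))

def diagnose_p3(trace):
    """trace: list of dicts with 'primary_stressors', 'expected_personal_gain'
    and 'received_social_support' per time step."""
    stressors = [s['primary_stressors'] for s in trace]
    gain = [s['expected_personal_gain'] for s in trace]
    support = [s['received_social_support'] for s in trace]
    assert non_increasing(stressors), "this simplified check assumes decreasing primary stressors"
    # if the simplified P3 fails, P3a/P3b show which link in the chain broke
    return {'P3a': non_decreasing(gain),      # recovery -> more expected personal gain
            'P3b': non_decreasing(support),   # more gain -> more received support
            'P3':  non_decreasing(support)}   # recovery -> more received support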
7 Conclusion
The challenge addressed in this paper is to provide a computational model that is capable of simulating the behaviour of an informal caregiver and care recipient in a caregiving process when dealing with negative events. The proposed model is based on several insights from psychology, specifically stress-coping theory and informal caregiving interactions; see [3], [4]. Simulation traces show interesting patterns that illustrate the relationship between personality attributes, support provision, and support receipt, and their effect on long term stress. A mathematical analysis indicates which types of equilibria occur for the model. Furthermore, using generated simulation traces, the model has been verified against a number of properties describing emerging patterns put forward in the literature. The resulting model can be useful to understand how certain concepts at a societal level (for example, personality attributes) may influence caregivers and recipients while coping with incoming stress. In addition, it could be used as a mechanism to develop assistive agents that are capable of supporting informal caregivers when they are facing stress during a caregiving process. As part of future work, it would be interesting to extend the proposed model to a social network of multiple caregivers and care recipients.
References
1. Aziz, A.A., Treur, J.: Modeling Dynamics of Social Support Networks for Mutual Support in Coping with Stress. In: Nguyen, N.T., Katarzyniak, R., Janiak, A. (eds.) Proc. of the First Int. Conference on Computational Collective Intelligence, ICCCI 2009, Part B. SCI, vol. 244, pp. 167–179. Springer, Heidelberg (2009)
2. Bosse, T., Jonker, C.M., van der Meij, L., Sharpanskykh, A., Treur, J.: Specification and Verification of Dynamics in Agent Models. Int. Journal of Cooperative Information Systems 18, 167–193 (2009)
3. Folkman, S.: Personal Control, Stress and Coping Processes: A Theoretical Analysis. Journal of Personality and Social Psychology 46, 839–852 (1984)
4. Kramer, B.J.: Expanding the Conceptualization of Caregiver Coping: The Importance of Relationship Focused Coping Strategies. J. of Family Relations 42(4), 383–391 (1993)
5. Musil, M.C., Morris, D.L., Warner, C., Saeid, H.: Issues in Caregivers' Stress and Providers' Support. Research on Aging 25(5), 505–526 (2003)
6. Sisk, R.J.: Caregiver Burden and Health Promotion. International Journal of Nursing Studies 37, 37–43 (2000)
7. Sherwood, P., Given, C., Given, B., Von Eye, A.: Caregiver Burden and Depressive Symptoms: Analysis of Common Outcomes in Caregivers of Elderly Patients. Journal of Aging and Health 17(2), 125–147 (2005)
8. Skaff, M.M., Pearlin, L.I.: Caregiving: Role Engulfment and the Loss of Self. Gerontologist 32(5), 656–664 (1992)
9. Ostwald, S.K.: Caregiver Exhaustion: Caring for the Hidden Patients. Adv. Practical Nursing 3, 29–35 (1997)
10. Whitlach, C.J., Feinberg, L.F., Sebesta, D.F.: Depression and Health in Family Caregivers: Adaptation over Time. Journal of Aging and Health 9, 22–43 (1997)
11. Yates, M.E., Tennstedt, S., Chang, B.H.: Contributions to and Mediators of Psychological Well-being for Informal Caregivers. J. of Gerontology 54, 12–22 (1999)
Computational Modeling and Analysis of Therapeutical Interventions for Depression Fiemke Both, Mark Hoogendoorn, Michel C.A. Klein, and Jan Treur VU University Amsterdam, Department of Artificial Intelligence De Boelelaan 1081, 1081HV Amsterdam, The Netherlands {fboth,mhoogen,mcaklein,treur}@cs.vu.nl http://www.few.vu.nl/~{fboth,mhoogen,mcaklein,treur}
Abstract. Depressions impose a huge burden on both the patient suffering from a depression and society in general. In order to make interventions for a depressed patient during a therapy more personalized and effective, a supporting personal software agent can be useful. Such an agent should then have a good idea of the current state of the person. A computational model for human mood regulation and depression has been developed in previous work, but in order for the agent to give optimal support during an intervention, it should also have knowledge of the precise functioning of the intervention in relation to mood regulation and depression. This paper therefore presents computational models of these interventions for different types of therapy. Simulation results are presented showing that mood regulation and depression indeed follow the expected patterns when these therapies are applied. The intervention models have been evaluated for a variety of patient types by simulation experiments and formal verification.
1 Introduction
Major depression is currently the fourth disorder worldwide in terms of disease burden, and is expected to be the disorder with the highest disease burden in high-income countries by the year 2030 (cf. [14]). Effective interventions for treating depressions are of utmost importance both for the patients suffering from a depression and for society in general. Supporting software agents can be very helpful in effectively treating a depression by providing personalized support for patients. The agent can, for example, provide feedback on the current situation, give tips, and give certain tasks or assignments. In order for such a personal assistant agent to function effectively, it requires a detailed computational model of the relevant human states and their interrelationships regarding the regulation of mood and depression. Such a model can also help to understand and analyze the basics behind a depression better. In [5] an example was shown of a computational model for mood regulation and depression based on literature on emotion and mood regulation. This model however does not explicitly address the functioning of interventions, such as activity scheduling [13] and cognitive restructuring [3]. Particularly for the domain of a personal assistant agent that supports patients during a major depression, knowledge about the functioning of these therapies is crucial to give effective support. In [6] a first attempt has
been made to create such a model that combines the concepts of mood, depression, and a single type of intervention, namely activity scheduling. This paper presents a computational model of the effect of interventions on mood regulation and depression for a number of frequently used interventions, such as activity scheduling, cognitive behavioral therapy, and other types of interventions aiming at enhancing coping skills. Within the model, the main principles of the interventions from the psychological literature have been incorporated. This computational model is an extension of the mood regulation and depression model presented in [5]. The model was used to simulate various patient types and the correctness of the behavior was analyzed using formal verification. The obtained model is suitable to be integrated within a personal assistant agent in order to provide effective support for the patient. In recent literature many contributions can be found about relations between mood regulation or depression and brain functioning; e.g., [1, 2, 7, 8, 9, 10, 11, 12, 15]. Much neurological support has been found for the processes of emotion and mood regulation, and in particular for modulation (down-regulation) of a negative mood in order to avoid or recover from a depression; e.g., [1, 2, 7, 12]. Capturing this process of down-regulation of negative moods has been a basic point of departure for the model designed. More specifically, the model presented in this paper addresses how this down-regulation process can be stimulated and improved by therapeutical interventions. This paper is organized as follows. In Section 2 the model for mood regulation and depression as taken from [5] is explained in more detail. The various interventions are integrated into the model in Section 3. Section 4 presents simulation results, whereas Section 5 verifies that these results indeed comply with existing theories within clinical psychology. Finally, Section 6 is a discussion.
2 A Model for Mood Regulation and Depression
In order to model mood regulation and depression an existing model has been adopted which is based on psychological and neurological literature on mood regulation (cf. [5]). In this section, this model is explained in more detail. The model as described here already incorporates the main influences of interventions upon the states in the model (as an extension to the existing model of [5]). The learning effects for each of the specific therapies will be described in Section 3. Figure 1 shows an overview of the relevant states within the model and the relations between the states. In the figure, the states that are depicted in grey represent states that have been added to model the points of impact of interventions. The same holds for the dashed lines.
States. In the model, a number of states are defined, whereby each state is represented by a number on the interval [0,1]. First, the states of the previous model will be explained. The state objective emotional value of situation represents the value of the situation a human is in (without any influence of the current state of mind of the human). The state appraisal represents the current judgment of the situation given the current state of mind (e.g. when you are feeling down, a pleasant situation might no longer be considered pleasant). The mood level represents the current mood of the human, whereas the thoughts level represents the current level of thoughts
Fig. 1. Model for mood and depression (dashed lines and gray states indicate the extensions compared to [5])
(i.e., the positivity of the thoughts). The long term prospected mood level expresses what mood level the human is striving for in the long term, whereas the short term prospected mood level represents the goal for mood on the shorter term (in case you are feeling very bad, your short term goal will not be to feel excellent immediately, but to feel somewhat better). The sensitivity indicates the ability to select situations in order to bring the mood level to the short term prospected mood level. Coping expresses the ability of a human to deal with negative moods and situations, whereas vulnerability expresses how vulnerable the human is to getting depressed. Finally, world event indicates an external situation which is imposed on the human (e.g., losing your job). In addition to the states mentioned above, a number of states have been added to the model. First, there is a state intervention, expressing that an intervention is taking place. The state reflection on negative thoughts expresses the therapeutic effect that the human is made aware of negative thinking about situations, whereas the appraisal effect models the immediate effect on the appraisal of the situation. The world influences state is used to represent the impact of a therapy aiming to improve the objective emotional value of situation. The openness for intervention is a state indicating how open the human is for therapy in general, which is made more specific for each specific influence of the therapy in the state openness for X. Finally, reflection represents the ability to reflect on the relationships between various states, and as a result learn something for the future.
Dynamics. The states explained above are causally related, as indicated by the arrows in Figure 1. These influences have been mathematically modeled. The first state to be discussed is the objective emotional value of situation (oevs). This represents the situation selection mechanism of the human. First, the change in situation as would be selected by the human is determined (referred to as action) as an intermediate step:
action(t) = oevs(t) + sensitivity(t)·(Neg(oevs(t)·(st_prosp_mood(t)-mood(t))) + Pos((1-oevs(t))·(st_prosp_mood(t)-mood(t))))
In the equation, Neg(X) evaluates to 0 in case X is positive and to X in case X is negative, while Pos(X) evaluates to X in case X is positive and to 0 in case X is negative. The formula expresses that the selected situation is more negative compared to the previous oevs in case the short term prospected mood is lower than the current mood, and more positive in the opposite case. Note that the whole result is multiplied by the sensitivity. The action in combination with the external influences now determines the new value for oevs:
oevs(t+Δt) = oevs(t) + (world_event(t)·(action(t) + openness(t)·world_influence(t)·(1 - action(t))) - oevs(t))·Δt
The above equation basically takes the value of action as derived before in combination with the external influences (i.e. world influence and world event). The second step is that the human starts to judge the situation (i.e. appraisal) based upon his/her own state of mind:
appraisal(t+Δt) = appraisal(t) + α (γ + openness_intervention(t)·reflect_neg_th(t) - appraisal(t)) Δt
where γ = (vulnerability·oevs(t)·thoughts(t) + coping·(1 - (1-oevs(t))·(1-thoughts(t))))
The value of appraisal is determined by the thoughts of the human in combination with the coping skills and vulnerability. In addition, the intervention-related state reflection on negative thoughts plays a role (i.e. being aware that you are judging the situation more negatively than a person without a depression would), in combination with the openness to this type of intervention. The state reflection on negative thoughts is calculated as follows:
reflect_neg_th(t) = (basic_reflection(t) + appraisal_effect(t)·openness_X(t))·(1-appraisal(t))
Hence, the value increases based upon the appraisal effect of the intervention in combination with the openness to this specific part of the intervention. Furthermore, a basic reflection is expressed, which is the reflection already present in the beginning. Therapy can also dynamically change this basic reflection, which can be seen as one of the permanent effects of therapy:
basic_reflection(t+Δt) = basic_reflection(t) + α intervention(t)·learning_factor·(1-basic_reflection(t))Δt
The value for mood depends on a combination of the current appraisal with the thoughts, whereby a positive influence (i.e. thoughts and appraisal are higher than mood) is determined by the coping and the negative influence by the vulnerability. mood(t+Δt) = mood(t) + α (Pos(coping·(ε - mood(t))) - Neg(vulnerability·(ε - mood(t)))) Δt
where ε = appraisal(t)·wappraisal_mood + thoughts(t)·wthoughts_mood
The thoughts level is a bit more complex and is expressed as follows:
thoughts(t+Δt) = thoughts(t) + α (ζ + (1 - (thoughts(t) + ζ)) · intervention(t)·wintervention(t))Δt
where:
ζ = Pos(coping·(appraisal(t)·wappraisal_thoughts + mood(t)·wmood_thoughts - thoughts(t))) - Neg(vulnerability·(appraisal(t)·wappraisal_thoughts + mood(t)·wmood_thoughts - thoughts(t)))
wintervention(t+Δt) = wintervention(t) + α (openness_X(t) - wintervention(t))Δt
This indicates that thoughts are positively influenced by the fact that you participate in an intervention (you start thinking a bit more positively about the situation, since you are in therapy). The weight of this contribution depends on the openness for the intervention at that time point. In addition, the thoughts can either be positively influenced by a higher combined level of mood and appraisal (again multiplied by the coping), or negatively influenced by the same combination (whereby the vulnerability plays a role). The sensitivity is calculated in a similar manner (without the influence of therapy, of course):
sensitivity(t+Δt) = sensitivity(t) + α (Pos(coping·(η - sensitivity(t))) - Neg(vulnerability·(η - sensitivity(t))))Δt
where η = mood(t)·wmood_sens + thoughts(t)·wthoughts_sens
Finally, the short term prospected mood is calculated as follows: st_prospmood(t+Δt) = st_prospmood(t) + α (vulnerability·(mood(t) - lt_prospmood) + coping·(lt_prospmood - st_prospmood(t)))Δt
3 Modeling Interventions for Mood Regulation and Depression
In this section it is shown how the influences of three types of therapies are modeled in the extended model presented in Section 2. First, activity scheduling (cf. [13]) will be discussed, followed by cognitive behavioral therapy (cf. [3]). The third model shows how an intervention that addresses coping skills and vulnerability directly could work.
3.1 Activity Scheduling Therapy
Activity scheduling, also called behavioral activation therapy, works according to two principles: the patient learns the relationship between the selection of a relatively positive activity and the level of mood (i.e., when you do fun things, you will start to feel better), and in order to learn this relationship again, the therapy imposes the selection of positive situations. In Figure 2 the main influences of this therapy are shown by means of the black arrows. Note that most of the influences have already been explained in the general overview in Section 2. One element of the therapy is that learning the relationship between mood and the objective emotional value of situation results in better coping (as the human can now better cope with a lower mood since he/she knows that an option is to select better situations). This is expressed as follows:
coping(t+Δt) = coping(t) + α reflection(t)·wreflection(t)·(1 - |oevs(t) - mood(t)|)·(1 - coping(t)) Δt
where wreflection(t+Δt) = wreflection(t) + α (openness_as(t) - wreflection(t)) Δt
This states that coping increases when the difference between the mood and oevs is perceived to be small (which makes it easy to see the relationship and improve coping). Furthermore, the openness for this specific therapy increases as the coping skills go up (since the human notices that the therapy works):
openness_as(t+Δt) = openness_as(t) + θ·α·((coping(t) - coping(t-Δt))/Δt)·Δt
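In code, the activity-scheduling learning effect can be sketched as follows (our own sketch; θ and α are illustrative, and the state dictionary reuses the naming of the mood-regulation sketch in Section 2).

def activity_scheduling_step(p, dt=1.0, alpha=0.1, theta=0.5):
    """Learning effect of activity scheduling: coping grows when oevs and mood match,
    and openness to the therapy grows with the observed improvement in coping."""
    coping_prev = p['coping']
    p['coping'] += alpha * p['reflection'] * p['w_refl'] * (1 - abs(p['oevs'] - p['mood'])) \
                   * (1 - p['coping']) * dt
    p['w_refl'] += alpha * (p['openness_as'] - p['w_refl']) * dt
    p['openness_as'] += theta * alpha * ((p['coping'] - coping_prev) / dt) * dt
    return p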
Fig. 2. Computational model for activity scheduling therapy
3.2 Cognitive Behavioral Therapy
Most negative situations occur without the person being able to control them. Since it is impossible to avoid all bad situations, it is wise to be able to deal with bad circumstances. The theory behind CBT assumes that emotions are determined by thoughts about a situation and not by the situation itself. In the mood regulation model, it is not the concept ‘thoughts level’ but the concept ‘appraisal’ that corresponds to thoughts in the CBT theory, because the thoughts in CBT are about a specific situation, like the state ‘appraisal’ in the mood regulation model, and do not represent thoughts in general. The intervention CBT consists of understanding (reflection) that thoughts about a situation determine your mood, and of detecting and transforming negative thoughts into positive thinking. The fact that you are doing something about your depression improves the thoughts level, which is an effect CBT shares with the other therapies. Figure 3 shows the relevant part of the model for CBT by means of the black arrows. In this case, the reflection is modeled by learning the relationship between appraisal and mood:
coping(t+Δt) = coping(t) + α reflection(t)·wreflection(t)·(1 - |appraisal(t)-mood(t)|)·(1 - coping(t)) Δt
In addition, the openness for CBT is increased by reflection in the same manner as the openness for AS.
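The CBT variant therefore differs from the activity-scheduling sketch above in a single term: coping is learned from the match between appraisal and mood rather than between the selected situation (oevs) and mood. A corresponding sketch (again our own, with illustrative parameters):

def cbt_step(p, dt=1.0, alpha=0.1, theta=0.5):
    """Learning effect of CBT: coping grows when appraisal and mood match."""
    coping_prev = p['coping']
    p['coping'] += alpha * p['reflection'] * p['w_refl'] * (1 - abs(p['appraisal'] - p['mood'])) \
                   * (1 - p['coping']) * dt
    p['w_refl'] += alpha * (p['openness_cbt'] - p['w_refl']) * dt
    p['openness_cbt'] += theta * alpha * ((p['coping'] - coping_prev) / dt) * dt
    return p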
Fig. 3. Computational Model for Cognitive Behavior Therapy
3.3 Intervention Directly Addressing Coping Skills and Vulnerability
The last type of intervention investigated is one which is assumed to affect coping skills and vulnerability directly. Such a type of intervention might be based, for example, on a belief that coping skills and vulnerability may be affected negatively by traumatic experiences in the past, and that these effects could be taken away or diminished by some form of therapy addressing them. For the moment ignoring questions such as whether existing therapies with such claims are effective, or would be possible at all, it can still be explored how such a type of therapy could work according to the computational model. This is shown in Figure 4. Here the impact of the therapy is modeled as a direct causal connection to coping skills and vulnerability.
Fig. 4. Computational Model for an Intervention Directly Addressing Coping Skills and Vulnerability
4 Simulation Results
In this section, simulation results are presented. Three different fictional persons are studied, with divergent values for coping and vulnerability. Furthermore, the value for openness is varied for each of these persons as well (0.2 and 0.3 for less and more openness respectively). These values are chosen to show the different influences of the therapies on different types of people and are in accordance with profiles of real persons who will follow the therapies in the future. Table 1 shows the initial values for the most important variables of the model for each person.
Table 1. Initial values for the simulation experiments
                                          person 1   person 2   person 3
coping                                    0.1        0.15       0.3
vulnerability                             0.9        0.85       0.7
oevs                                      0.925      0.907      0.84
appraisal, mood, thoughts, sensitivity,
short term prospected mood,
long term prospected mood                 0.6        0.65       0.7
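For illustration, the profiles of Table 1 can be written out as state dictionaries for the mood-regulation sketch in Section 2. This encoding is our own: basic_reflection, w_intervention and the openness values outside Table 1 are illustrative defaults, and the world_event encoding of the negative period follows our reading of the oevs equation.

def make_person(coping, vulnerability, oevs, rest, openness=0.2):
    """Encode one column of Table 1 as a state dictionary for mood_model_step."""
    return dict(coping=coping, vulnerability=vulnerability, oevs=oevs,
                appraisal=rest, mood=rest, thoughts=rest, sensitivity=rest,
                st_prosp_mood=rest, lt_prosp_mood=rest,
                openness=openness, openness_X=openness,
                basic_reflection=0.1, w_intervention=openness)

person1 = make_person(0.10, 0.90, 0.925, 0.60)
person2 = make_person(0.15, 0.85, 0.907, 0.65)
person3 = make_person(0.30, 0.70, 0.840, 0.70)

# example run: 3000 hours without therapy, with strongly negative events during the first 80 hours
for t in range(3000):
    event = 0.1 if t < 80 else 1.0
    person1 = mood_model_step(person1, world_event=event, dt=1.0)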
For the sake of brevity, this section will only discuss the results for person 1. First, the simulation without any form of therapy is shown. The person experiences very negative events during a substantial period (with value 0.1 during 80 hours). Since the person is highly vulnerable, a depression follows. Note that time is represented in hours.
Fig. 5. Person type 1 without therapy
The figure shows that a negative event of 0.1 is imposed on the person; this has a dramatic effect on all of the internal states of the patient: mood drops to a very low level and so do appraisal and the short term prospected mood. Eventually all states do start to increase again due to relatively good situations selected, but this goes very slowly. Figure 6 shows an example whereby the patient is receiving cognitive behavioral therapy. The patient does however have a relatively low openness of 0.2 for this type of therapy.
Fig. 6. Person type 1 following CBT with a lower openness
Fig. 7. Person type 1 following CBT with a higher openness
Fig. 8. Person type 1 following AS with a lower openness
oevs
0.2
appraisal mood st prospected mood
0
0
200
400
600
800
1000 1200 1400 1600 1800 2000 2200 2400 2600 2800 3000
hours
Fig. 9. Person type 1 following AS with a higher openness
Computational Modeling and Analysis of Therapeutical Interventions for Depression
283
For this case, it can be seen that the appraisal is increased via reflection on negative thoughts, pulling the other states up as well. It does however still take quite some time to get the mood level sufficiently up. The dip after the intervention stops (after 6 weeks) is the result of the fact that the person is no longer reminded about the correctness/importance of appraisal, resulting in a slight search for a new equilibrium. If the openness is increased, the person recovers more quickly, because reflection on negative thoughts increases faster (see Figure 7). For activity scheduling the same types of experiments have been conducted. Figure 8 shows an example of a person with a lower openness for this type of therapy. In this case, the world influence changes due to the therapy (since the therapy results in better situations being selected). This results in an increase of the objective emotional value of situation, pulling the rest of the states up as well. In case the person is more sensitive to the therapy, the oevs increases more quickly and therefore it takes less time for the person to recover (Figure 9). Finally, in Figure 10 and 11 the results for the direct intervention are shown for a person with a low and high openness respectively. The figures show a more rapid recovery in case of a higher openness. 1 0.8 0.6 0.4
Fig. 10. Person type 1 following a direct intervention with lower openness
Fig. 11. Person type 1 following a direct intervention with higher openness
5 Analysis of the Computational Model

In this section, an analysis of the model described above is presented. Two different types of analysis have been performed, with partly different purposes. First, in order to verify the patterns produced by the model, a number of temporal patterns have been
specified that reflect a number of general characteristics of the process of depression and its treatment. One example of such a characteristic is that the length of a depression should be shorter for persons who follow a therapy than for persons who do not. These properties have been automatically verified for different simulation traces of the model (Section 5.1). Second, the effect of specific therapies on the change of the values for the different variables in the model has been analyzed. This analysis is also useful for verification of the intended effect of a therapy, but can be used for a different purpose as well. Based on the order in which different model variables start changing in reaction to a specific therapy, it is possible to derive which type of therapy is given. Thus, this analysis forms a basis for a diagnostic process that can detect that a person follows some specific type of therapy, based on observations of values of variables that are present in the model (e.g., reports about the mood or an analysis of the objective emotional value of the situation). This part of the analysis is described in Section 5.2.

5.1 Verification

The following temporal properties, which reflect a number of general patterns and characteristics of the process of depression and the treatment, have been formulated. The properties were specified in the TTL language [4]. This predicate logical temporal language supports formal specification and analysis of dynamic properties, covering both qualitative and quantitative aspects. TTL is built on atoms referring to states of the world, time points and traces, i.e. trajectories of states over time. In addition, dynamic properties are temporal statements that can be formulated with respect to traces based on the state ontology Ont in the following manner. Given a trace γ over state ontology Ont, the state in γ at time point t is denoted by state(γ, t). These states can be related to state properties via the infix predicate |=, where state(γ, t) |= p denotes that state property p holds in trace γ at time t. Based on these statements, dynamic properties can be formulated in a sorted first-order predicate logic, using quantifiers over time and traces and the usual first-order logical connectives such as ¬, ∧, ∨, ⇒, ∀, ∃. For more details, see [4]. Automated tool support is also available that allows for verifying whether the properties hold in a set of simulation traces. A number of simulations (considering all the different types of persons mentioned in Section 4 in combination with different degrees of openness to therapy) have been used as the basis for the verification, and the properties were confirmed.

P1: Effectiveness of Therapy
Persons that follow a therapy are depressed for a shorter period than persons who do not.
∀γ1, γ2:TRACE, ∀t:TIME
[ [ [ state(γ1, t) |= intervention_CBT | state(γ1, t) |= intervention_AS ] &
    state(γ2, t) |= not intervention_AS & state(γ2, t) |= not intervention_CBT ]
  ⇒ ∃t2:TIME > t, R1,R2:REAL
    [ R1 < MIN_LEVEL & R2 > MIN_LEVEL &
      state(γ2, t2) |= has_value(mood, R1) & state(γ1, t2) |= has_value(mood, R2) ] ]
P2: Openness to therapy helps
Persons more open to therapy remain depressed for a shorter period than those less open.

∀γ1, γ2:TRACE, ∀R1,R2:REAL, t:TIME
[ [ state(γ1, t) |= has_value(openness, R1) & state(γ2, t) |= has_value(openness, R2) & R2 < R1 ]
  ⇒ ∃t2:TIME, R3,R4:REAL
    [ R3 < MIN_LEVEL & R4 > MIN_LEVEL &
      state(γ2, t2) |= has_value(mood, R3) & state(γ1, t2) |= has_value(mood, R4) ] ]
P3: Effect on coping skills
After a person has followed therapy for some time, the coping skills have improved.
∀γ:TRACE, t:TIME, R1:REAL
[ [ [ state(γ, t) |= intervention_CBT | state(γ, t) |= intervention_AS ] &
    state(γ, t) |= has_value(coping, R1) ]
  ⇒ ∃t2:TIME > t + MIN_DURATION, R2:REAL
    [ R2 > R1 + MIN_INCREASE & state(γ, t2) |= has_value(coping, R2) ] ]
P4: CBT results in higher appraisal than AS
After a person has followed CBT, appraisal is higher than after following AS.
∀γ1, γ2:TRACE, ∀A1,A2:REAL, t1, t2:TIME
[ [ state(γ1, t1) |= intervention_CBT & state(γ2, t1) |= intervention_AS &
    state(γ1, t2) |= has_value(appraisal, A1) & state(γ2, t2) |= has_value(appraisal, A2) &
    t2 > t1 + MIN_DUR ]
  ⇒ A1 > A2 ]
This latter property was confirmed for persons with the same openness to therapy; those following AS with a high openness may end up with a higher appraisal than those following CBT with a low openness.

5.2 Effects of Therapy Types

In order to analyze the effect of the different types of therapies on the model variables, it is useful to see when a specific model variable starts changing as a result of the therapy, and in particular which variable changes first. The order in which the different concepts start being influenced by the treatment is a characteristic of the therapy. For example, when following behavioral activation it is assumed that the objective emotional value of the situation will be affected before the mood itself changes. In contrast, cognitive behavioral therapy will first affect the reflection on negative thoughts. To detect the moment when an intervention affects a variable, we look at a sudden change in the increase or decrease of the value of a concept over time: a form of acceleration. Formally, this can be determined by looking at the relative second-order derivative of a variable over time: the second-order derivative divided by the first-order derivative. This can be calculated more easily by dividing the change of the value of a variable in the current time step (t + Δt) by the change of this value in the previous time step (t − Δt), as this is mathematically almost equivalent:

(y(t + Δt) − y(t)) / (y(t) − y(t − Δt)) − 1
  = [ (y(t + Δt) − y(t)) / Δt ] / [ (y(t) − y(t − Δt)) / Δt ] − 1
  ≈ y′(t) / y′(t − Δt) − 1
  = [ y′(t) − y′(t − Δt) ] / y′(t − Δt)
  = [ [ (y′(t) − y′(t − Δt)) / Δt ] / y′(t − Δt) ] Δt
  ≈ [ y″(t − Δt) / y′(t − Δt) ] Δt

So, to be precise, for mood this relative acceleration y″(t − Δt)/y′(t − Δt) can be measured by:

mood_acceleration(t) = [ (mood(t + Δt) − mood(t)) / (mood(t) − mood(t − Δt)) − 1 ] / Δt
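This relative acceleration can be computed directly from a sampled simulation trace. The following minimal Python sketch assumes the trace is available as an array sampled at fixed time steps; the onset threshold and the function names are illustrative assumptions, not part of the model itself.

import numpy as np

def relative_acceleration(y, dt=1.0):
    # [(y(t+dt) - y(t)) / (y(t) - y(t-dt)) - 1] / dt, as defined above
    y = np.asarray(y, dtype=float)
    forward = y[2:] - y[1:-1]        # y(t+dt) - y(t)
    backward = y[1:-1] - y[:-2]      # y(t) - y(t-dt)
    inner = np.full(len(y) - 2, np.nan)
    valid = backward != 0
    inner[valid] = (forward[valid] / backward[valid] - 1.0) / dt
    acc = np.full(len(y), np.nan)
    acc[1:-1] = inner
    return acc

def onset_of_change(y, t_start, threshold=0.5, dt=1.0):
    # first time step at or after t_start where the absolute relative acceleration
    # exceeds the threshold; an illustrative way to read off which concept reacts first
    acc = relative_acceleration(y, dt)
    for i in range(t_start, len(acc)):
        if np.isfinite(acc[i]) and abs(acc[i]) > threshold:
            return i
    return None

Applying onset_of_change to the traces of mood, the objective emotional value of the situation and the reflection on negative thoughts, and comparing the returned time steps, yields the kind of ordering that is read off from Figures 12 and 13 below.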
The acceleration values for the concepts mood, objective emotional value of the situation and reflection on negative thoughts can be calculated similarly. All acceleration values have been determined from 5 time steps before the start of the intervention until 15 time steps after the start. Figures 12 and 13 illustrate the order of change of the different variables for the different types of therapy. It can be seen that all therapies start having an effect at time point t = 0. Moreover, Figure 12 shows that AS indeed first affects the situation before the mood is affected. Similarly, CBT first affects the reflection on negative thoughts (Figure 13); however, this is a bit more difficult to see. At t = 0, the acceleration of reflection on negative thoughts is very
low (far below the bottom of the graph), because of the large increase of this concept at the start of the intervention. At t = 1 this value is almost zero (and therefore visible again in the graph), after which another dip follows at t = 2. This is because the concept stays at the high level for one time step and then starts dropping again, which can be seen in the left panel of Figure 13. However, the conclusion is that the reflection is influenced before the mood is affected.
Fig. 12. Original (left) and acceleration (right) of values for a patient following AS
Fig. 13. Original (left) and acceleration (right) of values for a patient following CBT
6 Discussion

In this paper, a computational model has been presented for the effect of three different types of therapies for depression. It extends a computational model for human mood regulation and depression that has been developed in previous work [5]. The simulation results presented have shown that mood regulation and depression indeed follow the expected patterns when these therapies are applied. The intervention models have been analyzed for a variety of patient types by simulation experiments and formal verification. This work is one of the first steps in the development of a software agent that supports patients, and the therapy they follow during a depression, in a personal manner. In future work these computational models will be integrated as a domain model within an agent model, in such a way that the agent is able to reason based on the domain model by causal deductive and abductive forms of reasoning. The aim is that in this way the agent can both analyze the state of the patient and generate appropriate (inter)actions to the patient in order to improve the patient’s state.
Acknowledgments. This research has been conducted as part of the FP7 ICT program of the European Commission under grant agreement No 248778 (ICT4Depression). Furthermore, the authors wish to thank Pim Cuijpers of the Department of Clinical Psychology at the VU University Amsterdam for the fruitful discussions.
References
1. Anand, A., Li, Y., Wang, Y., Wu, J., Gao, S., Bukhari, L., Mathews, V.P., Kalnin, A., Lowe, M.J.: Activity and connectivity of brain mood regulating circuit in depression: A functional magnetic resonance study. Biological Psychiatry 57, 1079–1088 (2005)
2. Beauregard, M., Paquette, V., Levesque, J.: Dysfunction in the neural circuitry of emotional self regulation in major depressive disorder. Learning and Memory 17, 843–846 (2006)
3. Beck, A.T.: Depression: Causes and Treatment. University of Pennsylvania Press, Philadelphia (1972)
4. Bosse, T., Jonker, C.M., van der Meij, L., Sharpanskykh, A., Treur, J.: Specification and Verification of Dynamics in Agent Models. International Journal of Cooperative Information Systems 18, 167–193 (2009)
5. Both, F., Hoogendoorn, M., Klein, M.A., Treur, J.: Formalizing Dynamics of Mood and Depression. In: Ghallab, M., Spyropoulos, C.D., Fakotakis, N., Avouris, N. (eds.) Proc. of the 18th European Conf. on Art. Int., ECAI 2008, pp. 266–270. IOS Press, Amsterdam (2008)
6. Both, F., Hoogendoorn, M., Klein, M.C.A., Treur, J.: Design and Analysis of an Ambient Intelligent System Supporting Depression Therapy. In: Azevedo, L., Londral, A.R. (eds.) Proc. of the Second International Conference on Health Informatics, HEALTHINF 2009, pp. 142–148. INSTICC Press (2009)
7. Davidson, R.J., Lewis, D.A., Alloy, L.B., Amaral, D.G., Bush, G., Cohen, J.D., Drevets, W.C., Farah, M.J., Kagan, J., McClelland, J.L., Nolen-Hoeksema, S., Peterson, B.S.: Neural and behavioral substrates of mood and mood regulation. Bio. Psychiatry 52, 478–502 (2002)
8. Drevets, W.C.: Orbitofrontal Cortex Function and Structure in Depression. Annals of the New York Academy of Sciences 1121, 499–527 (2007)
9. Drevets, W.C.: Neuroimaging abnormalities in the amygdala in mood disorders. Ann. N Y Acad. Sci. 985, 420–444 (2003)
10. Harrison, P.J.: The neuropathology of primary mood disorder. Brain 125, 1428–1449 (2002)
11. Konarski, J.Z., McIntyre, R.S., Kennedy, S.H., Rafi-Tari, S., Soczynska, J.K., Ketter, T.A.: Volumetric neuroimaging investigations in mood disorders: bipolar disorder versus major depressive disorder. Bipolar Disorder 10, 1–37 (2008)
12. Lévesque, J., Eugene, F., Joanette, Y., Paquette, V., Mensour, B., Beaudoin, G., Lerous, J.M., Bourgouin, P., Beauregard, M.: Neural circuitry underlying voluntary suppression of sadness. Biological Psychiatry 53, 502–510 (2003)
13. Lewinsohn, P.M., Youngren, M.A., Grosscup, S.J.: Reinforcement and depression. In: Dupue, R.A. (ed.) The psychobiology of depressive disorders: Implications for the effects of stress, pp. 291–316. Academic Press, New York (1979)
14. Mathers, C.D., Loncar, D.: Projections of global mortality and burden of disease from 2002 to 2030. PLoS Med. 3, e442 (2006)
15. Mayberg, H.S.: Modulating dysfunctional limbic-cortical circuits in depression: towards development of brain-based algorithms for diagnosis and optimized treatment. British Medical Bulletin 65, 193–207 (2003)
A Time Series Based Method for Analyzing and Predicting Personalized Medical Data

Qinwin Vivian Hu¹, Xiangji Jimmy Huang¹, William Melek², and C. Joseph Kurian²

¹ Information Retrieval and Knowledge Management Research Lab, York University, Toronto, Canada
² Alpha Global IT, Toronto, Canada
[email protected],
[email protected], {william,cjk}@alpha-it.com
Abstract. In this paper, we propose a time series based method for analyzing and predicting personal medical data. First, we introduce the auto-regressive integrated moving average model, which is suitable for all time series processes. Second, we describe how to identify a personalized time series model based on the patient’s history, followed by estimating the parameters in the model. Furthermore, a case study is presented to show how the proposed method works. In addition, we forecast the laboratory tests for the next twelve months and give the corresponding prediction limits. Finally, we summarize our contributions as conclusions.
1 Introduction and Motivation
Like many areas in medicine, medical tests are conducted on small samples collected from the human body and provide the information a doctor needs to evaluate a person’s health or to understand what is causing an illness. Sometimes, doctors need to order tests to find out more. With the development of health care theories, techniques and methods, all kinds of clinical laboratory tests are available. How to make good use of these large amounts of data and how to predict future laboratory tests are therefore important questions for health care systems [7, 12]. In this paper, we are motivated to analyze the personalized time series process of a patient in order to predict her/his laboratory tests in the future. The data are from a real research project, which will be introduced in Section 2. We have 79 monthly laboratory test records for each patient. Therefore, for each patient, we build up a time series process to predict the laboratory tests in the next Nth month or the next Nth year. First, we employ a general auto-regressive integrated moving average (ARIMA) model [3, 8, 9], which is suitable for any time series process. Then, according to the history data, we identify a personalized time series model for each patient by conducting transformations and calculating the sample auto-correlation function (ACF) [6, 10] and the sample partial auto-correlation function (PACF) [6, 10]. Third, we estimate the parameters in the personalized model, based on the modified stationary model for the patient. Later, we
present a case study to show how the proposed method works, forecasting the laboratory tests in the future and giving the 95% prediction interval. The remainder of this paper is organized as follows. First, we describe the data set in Section 2. Then, in Section 3, we introduce a personalized model identification for each patient: an ARIMA model, which is the most general model fitting every time series process, is shown, and the steps for setting up a model for a patient according to his/her unique time series are presented, followed by estimating the parameters in the model. After that, we present a case study with the experimental results, and discuss and analyze the influences of our work, in Section 4. Finally, we briefly summarize the contributions of this paper in Section 5.
2 Data Set Description
The datasets in our experiment are obtained from Alpha Global IT [1]. The Alpha Corporate Group is an organization that has been providing Medical Laboratory, Industrial/Pharmaceutical Laboratory, Diagnostic Imaging services and Managed Care Medical Clinic services, in addition to providing commercial Electronic Medical Record and Practice Management Software. The medical test datasets contain 78 monthly patients’ blood testing records. We first extract the data for each patient and rank the data according to time order. Then we apply a general time series model and identify a personalized stationary model for predictions. In order to understand the data set better, we present some sample data in Table 1. There are five attributes employed in this paper, in which SDTE stands for service date, PNUM for patient health card number, PSEX for patient gender, BDTE for patient date of birth and TSEQ for test sequence number. In particular, for the sake of privacy, the information in Table 1 is fabricated and only shows the format of a class of datasets.

Table 1. Sample format of the medical test data

SDTE      PNUM            PSEX    BDTE        TSEQ
20020101  patient number  female  mm/dd/yyyy  test1
...       ...             ...     ...         ...
20030201  patient number  female  mm/dd/yyyy  test9
...       ...             ...     ...         ...
20040101  patient number  female  mm/dd/yyyy  test5
...       ...             ...     ...         ...
20080601  patient number  female  mm/dd/yyyy  test1
...       ...             ...     ...         ...

3 Personalized Model Identification
In this section, we introduce a personalized model identification for each patient. First, we introduce an ARIMA model which is the most general model fitting for every time series process. Then, we describe the steps for how to set up a model for a patient according to his/her unique time series.
3.1 The General ARIMA Model
In statistics, and in particular in time series analysis, not all time series are stationary. A homogeneous non-stationary time series can be reduced to a stationary time series by taking a proper degree of differencing. The auto-regressive model, the moving average model and the auto-regressive moving average model are useful in describing stationary time series. The auto-regressive integrated moving average (ARIMA) model is then built, using differencing, as a large class of time series models that is useful in describing various homogeneous non-stationary time series. The general ARIMA model can be presented in Equation 1:

ARIMA(p, d, q):  φ_p(B)(1 − B)^d Z_t = θ_0 + θ_q(B) a_t    (1)
where B is the back shift operator [3]; φ_p(B) is the stationary AR operator [8, 9] with φ_p(B) = (1 − φ_1 B − ... − φ_p B^p); θ_q(B) is the invertible MA operator [11, 13] with θ_q(B) = (1 − θ_1 B − ... − θ_q B^q); φ_p(B) and θ_q(B) share no common factors; θ_0 is a parameter related to the mean of the process; and a_t is a white noise process [3, 9]. The parameter θ_0 plays very different roles for d = 0 and d > 0. When d = 0, the original process is stationary, and we get θ_0 = μ(1 − φ_1 − ... − φ_p). When d > 0, however, θ_0 is called the deterministic trend term and is often omitted from the model unless it is really needed.

3.2 Steps for Model Identification
To illustrate the model identification, we consider the general ARIMA(p, d, q) model introduced in Section 3.1. Model identification refers to the methodology of identifying the required transformations, the decision to include the deterministic parameter θ_0, and the proper order p of the AR operator and q of the MA operator. Given a time series of a patient, we use the following steps to identify a tentative model for predicting the lab tests in the future.

Step 1. Plot the time series data and choose proper transformations. In a time series analysis, plotting the time series data is always the first step. Through careful examination of the plot, we usually get a good idea about whether the series contains a trend, seasonality, outliers, non-constant means, non-constant variances, and other abnormal and non-stationary phenomena. This understanding often provides a basis for postulating a possible data transformation. Since we prefer to examine the plot automatically, there is more than one way to understand the series, such as simulating a distribution for the data. Differencing and variance-stabilizing transformations are two commonly used transformations in time series analysis. Because variance-stabilizing transformations such as the power transformation require non-negative values and differencing may create some negative values, we should always apply variance-stabilizing transformations before taking differencing. A series with non-constant variance
often needs a logarithmic transformation. More generally, we refer to the transformed data as the original series in the following discussion unless mentioned otherwise.

Step 2. Compute and examine the sample ACF and the sample PACF of the original series to further confirm a necessary degree of differencing so that the differenced series is stationary. We employ two rules as follows. First, if the sample ACF decays very slowly (the individual ACF may not be large) and the sample PACF cuts off after lag 1, then it indicates differencing is needed. In general, we try taking the first differencing (1 − B)Z_t. One can also use the unit root test proposed by Dickey and Fuller (1979) [4]. In a borderline case, differencing is recommended by Dickey, Bell and Miller (1986) [5]. Second, in order to remove non-stationarity, we may need to consider a higher order differencing (1 − B)^d Z_t for d > 1. In most cases, d is either 0, 1, or 2. Note that if (1 − B)^d Z_t is stationary, then (1 − B)^{d+i} Z_t for i = 1, 2, ... are also stationary.

Step 3. Compute and examine the sample ACF and the sample PACF of the properly transformed and differenced series to identify the orders of p and q, where we recall that p is the highest order in the AR polynomial (1 − φ_1 B − ... − φ_p B^p), and q is the highest order in the MA polynomial (1 − θ_1 B − ... − θ_q B^q). Usually, the needed orders of p and q are less than or equal to 3. Table 2 summarizes the important criteria for selecting p and q.

Table 2. Criteria of Theoretical ACF and PACF for Stationary Processes

Process      ACF                                                   PACF
AR(p)        Tails off as exponential decay or damped sine wave    Cuts off after lag p
MA(q)        Cuts off after lag q                                  Tails off as exponential decay or damped sine wave
ARMA(p, q)   Tails off after lag (q − p)                           Tails off after lag (p − q)
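As an illustration of Steps 2 and 3, the sample ACF and PACF together with an approximate 95% significance band can be computed as in the following Python sketch (using statsmodels); the lag count, the crude slow-decay check and the way candidate orders are read off are illustrative assumptions, not part of the proposed method.

import numpy as np
from statsmodels.tsa.stattools import acf, pacf

def sample_correlations(z, nlags=20):
    # sample ACF/PACF (lag 0 included) and an approximate 95% significance band
    z = np.asarray(z, dtype=float)
    band = 1.96 / np.sqrt(len(z))
    return acf(z, nlags=nlags, fft=False), pacf(z, nlags=nlags), band

def tentative_orders(z, nlags=20):
    # Step 2: difference once if the ACF decays very slowly;
    # Step 3: read candidate p and q off the significant PACF/ACF lags (compare Table 2)
    rho, phi_kk, band = sample_correlations(z, nlags)
    if np.all(np.abs(rho[1:11]) > band):          # crude "decays very slowly" check
        z = np.diff(z)                            # first differencing (1 - B)Z_t
        rho, phi_kk, band = sample_correlations(z, nlags)
    p_candidates = [k for k in range(1, len(phi_kk)) if abs(phi_kk[k]) > band]
    q_candidates = [k for k in range(1, len(rho)) if abs(rho[k]) > band]
    return p_candidates, q_candidates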
Step 4. Test the deterministic trend term θ_0 when d > 0. For a non-stationary model, φ_p(B)(1 − B)^d Z_t = θ_0 + θ_q(B) a_t, the parameter θ_0 is usually omitted so that the model is capable of representing series with random changes in the level, slope or trend. If there is reason to believe that the differenced series contains a deterministic trend mean, however, we can test for its inclusion by comparing the sample mean W̄ of the differenced series W_t = (1 − B)^d Z_t with its approximate standard error S_W̄. To derive S_W̄, we note that

lim_{n→∞} n Var(W̄) = Σ_{j=−∞}^{∞} γ_j    (2)

Hence, we get

σ_W̄² = (γ_0/n) Σ_{j=−∞}^{∞} ρ_j = (1/n) Σ_{j=−∞}^{∞} γ_j = (1/n) γ(1)    (3)
where γ(B) is the auto-covariance generating function and γ(1) is its value at B = 1. Thus, the variance and the standard error for W̄ are model dependent. For example, for the ARIMA(1, d, 0) model, (1 − φB)W_t = a_t, we have

γ(B) = σ_a² / [(1 − φB)(1 − φB⁻¹)]    (4)

so that

σ_W̄² = (1/n) σ_a²/(1 − φ)² = (σ_W²/n)(1 − φ²)/(1 − φ)² = (σ_W²/n)(1 + φ)/(1 − φ) = (σ_W²/n)(1 + ρ_1)/(1 − ρ_1)    (5)

where we note that σ_W² = σ_a²/(1 − φ²). The required standard error is

S_W̄ = [ (γ̂_0/n)(1 + ρ̂_1)/(1 − ρ̂_1) ]^{1/2}    (6)
Expressions of S_W̄ for other models can be derived similarly. At the model identification phase, however, because the underlying model is unknown, most available software use the approximation

S_W̄ = [ (γ̂_0/n)(1 + 2ρ̂_1 + 2ρ̂_2 + ... + 2ρ̂_k) ]^{1/2}    (7)

where γ̂_0 is the sample variance and ρ̂_1, ..., ρ̂_k are the first k significant sample ACFs of {W_t}. Under the null hypothesis ρ_k = 0 for k ≥ 1, Equation 7 reduces to

S_W̄ = (γ̂_0/n)^{1/2}    (8)

Alternatively, one can include θ_0 initially and discard it at the final model estimation if the preliminary estimation result is not significant.
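For example, the test of Step 4 can be carried out directly from Equations 7 and 8, as in the Python sketch below; the choice of k and the rule-of-thumb cut-off of roughly 2 for the resulting ratio are illustrative assumptions.

import numpy as np
from statsmodels.tsa.stattools import acf

def trend_term_check(w, k=5):
    # compare the sample mean of the differenced series W_t with its approximate
    # standard error from Equation 7; a ratio clearly above ~2 suggests keeping theta_0
    w = np.asarray(w, dtype=float)
    n = len(w)
    w_bar = w.mean()
    gamma0 = w.var()                              # sample variance gamma_hat_0
    rho = acf(w, nlags=k, fft=False)[1:]          # rho_hat_1 ... rho_hat_k
    s_wbar = np.sqrt(max(gamma0 * (1 + 2 * rho.sum()), 0.0) / n)
    ratio = abs(w_bar) / s_wbar if s_wbar > 0 else float("inf")
    return w_bar, s_wbar, ratio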
3.3 Parameter Estimation
After we identify a personalized model as in Section 3.2, we have to estimate the parameters in the model. In this section, we apply the method of moments for parameter estimation. The method of moments consists of substituting sample moments such as the sample mean Z̄, the sample variance γ̂_0 and the sample ACF ρ̂_i for their theoretical counterparts and solving the resultant equations to obtain estimates of the unknown parameters. For better understanding, we take an auto-regressive process AR(p) as an example; a moving average process MA(q) and an auto-regressive moving average process ARMA(p, q) can be dealt with in the same way. In an AR(p) process, we have

Ż_t = φ_1 Ż_{t−1} + φ_2 Ż_{t−2} + ... + φ_p Ż_{t−p} + a_t    (9)
The mean μ = E(Z_t) is estimated by Z̄. To estimate the φ_i, we first use ρ_k = φ_1 ρ_{k−1} + φ_2 ρ_{k−2} + ... + φ_p ρ_{k−p} for k ≥ 1 to obtain the following system of Yule-Walker [13] equations:

ρ_1 = φ_1 + φ_2 ρ_1 + φ_3 ρ_2 + ... + φ_p ρ_{p−1}
ρ_2 = φ_1 ρ_1 + φ_2 + φ_3 ρ_1 + ... + φ_p ρ_{p−2}
...
ρ_p = φ_1 ρ_{p−1} + φ_2 ρ_{p−2} + φ_3 ρ_{p−3} + ... + φ_p    (10)

Then, replacing ρ_k by ρ̂_k, we obtain the moment estimators φ̂_1, φ̂_2, ..., φ̂_p by solving the above linear system of equations. That is,

(φ̂_1, φ̂_2, ..., φ̂_p)ᵀ = P̂⁻¹ (ρ̂_1, ρ̂_2, ..., ρ̂_p)ᵀ    (11)

where P̂ is the p × p matrix whose (i, j) entry is ρ̂_{|i−j|} (with ρ̂_0 = 1).
These estimators are usually called Yule-Walker estimators [13]. Having obtained φ̂_1, φ̂_2, ..., φ̂_p, we use the result

γ_0 = E(Ż_t Ż_t) = E[Ż_t (φ_1 Ż_{t−1} + φ_2 Ż_{t−2} + ... + φ_p Ż_{t−p} + a_t)] = φ_1 γ_1 + φ_2 γ_2 + ... + φ_p γ_p + σ_a²    (12)

and obtain the moment estimator for σ_a² as

σ̂_a² = γ̂_0 (1 − φ̂_1 ρ̂_1 − φ̂_2 ρ̂_2 − ... − φ̂_p ρ̂_p)    (13)
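A minimal Python sketch of this moment estimation follows (pure NumPy; statsmodels also provides a yule_walker helper that can be used to cross-check the result):

import numpy as np

def yule_walker_moments(z, p):
    # Yule-Walker / method-of-moments estimates for an AR(p), following Equations 10-13
    z = np.asarray(z, dtype=float)
    zc = z - z.mean()                               # Z_t - Z_bar
    n = len(zc)
    gamma = np.array([np.dot(zc[:n - k], zc[k:]) / n for k in range(p + 1)])
    rho = gamma / gamma[0]                          # rho_hat_0 ... rho_hat_p
    P = np.array([[rho[abs(i - j)] for j in range(p)] for i in range(p)])
    phi = np.linalg.solve(P, rho[1:])               # Equation 11
    sigma2_a = gamma[0] * (1.0 - np.dot(phi, rho[1:]))   # Equation 13
    return phi, sigma2_a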
4 Case Study
In this section, we present an example to show our proposed personalized time series model. 4.1
Model Identification
The number of laboratory tests can be attractive for many health care systems. Figure 1 shows a time series for a female patient who had done her blooding tests from January, 2002 to July, 2008. In total, there are 79 records as the observations of consecutive months. From this figure, it indicates that the series is not stationary in the mean and variance. We compute the sample ACF and sample PACF of the time series Z are shown in Figure 2 and 3 for choosing transformations or differencing. In Figure 2 and 3, we can see that the sample ACF doesn’t decays very slowly, and the sample PACF does not cut off after lag 1. Therefore, we do not need to consider a degree of differencing to make the time series stationary. At the same time, to investigate the required transformation for variance stabilization, we apply the power transformation analysis [2] to suggest
Fig. 1. 79 Monthly Blood Testing Records: Z
Fig. 2. Sample ACF for Time Series Z
an optimal parameter of λ = 0.25. The power transformation is presented in Equation 14:

W_t = T(Z_t) = (Z_t^λ − 1) / λ    (14)
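A small Python sketch of this transformation and its inverse follows (Z_t is assumed to be positive; scipy.stats.boxcox can also estimate λ by maximum likelihood, which is one way such an optimal parameter can be obtained):

import numpy as np

def power_transform(z, lam=0.25):
    # Box-Cox power transformation of Equation 14; lam = 0 corresponds to log(z)
    z = np.asarray(z, dtype=float)
    return np.log(z) if lam == 0 else (z ** lam - 1.0) / lam

def inverse_power_transform(w, lam=0.25):
    # maps values on the W scale back to the original Z scale
    w = np.asarray(w, dtype=float)
    return np.exp(w) if lam == 0 else (lam * w + 1.0) ** (1.0 / lam)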
The transformed time series process W is plotted in Figure 4, in which we can see that W is stationary in the mean but may not be stationary in the
Fig. 3. Sample PACF for Time Series Z
Fig. 4. Transformed Time Series: W
variance. Hence, we further compute the sample ACF and sample PACF for the transformed series W, which are shown in Figures 5 and 6. The sample ACF shows a damped sine-cosine wave and the sample PACF has relatively large spikes at lags 1, 8 and 13, suggesting that a tentative model may be the AR(1) model in Equation 15:

(1 − φB)(W_t − μ) = a_t    (15)

where W_t = T(Z_t) = (Z_t^λ − 1)/λ as in Equation 14 with λ = 0.25.
Fig. 5. Sample ACF for Transformed Time Series W
Fig. 6. Sample PACF for Transformed Time Series W
4.2 Forecasting
We have identified an AR(1) model in Section 4.1 for the transformed series. In this section, we use this transformed series to forecast the laboratory tests for the next N months [3]. For this AR(1) model, we have

(1 − φB)(W_t − μ) = a_t    (16)
where φ = −0.1, μ = 1.2 and σ_a² = 0.1. In this case, we have 79 observations and want to forecast the next twelve months with their associated 95% forecast limits. First of all, we write the AR(1) model as

W_t − μ = φ(W_{t−1} − μ) + a_t    (17)

and the general form of the forecast equation is

Ŵ_t(l) = μ + φ(Ŵ_t(l − 1) − μ) = μ + φ^l (W_t − μ)    (18)
Thus, the predictions for the following twelve months are computed as in Equation 19 and the results are shown in Table 3:

Ŵ_79(1) = 1.2 + (−0.1)^1 ∗ ((2^0.25 − 1)/0.25 − 1.2)
Ŵ_79(2) = 1.2 + (−0.1)^2 ∗ ((2^0.25 − 1)/0.25 − 1.2)
...
Ŵ_79(12) = 1.2 + (−0.1)^12 ∗ ((2^0.25 − 1)/0.25 − 1.2)    (19)

Table 3. Forecasting for the Next Twelve Months

Predicted Month  1    2    3    4    5    6    7    8    9    10   11   12
Numbers          2.4  2.0  2.1  2.1  2.1  2.1  2.1  2.1  2.1  2.1  2.1  2.1

Table 4. 95% Forecasting Limits for the Next Twelve Months

Predicted Month  1            2            3            4            5            6
Intervals        [0.2, 12.1]  [0.1, 10.9]  [0.1, 11.0]  [0.1, 11.0]  [0.1, 11.0]  [0.1, 11.0]
Predicted Month  7            8            9            10           11           12
Intervals        [0.1, 11.0]  [0.1, 11.0]  [0.1, 11.0]  [0.1, 11.0]  [0.1, 11.0]  [0.1, 11.0]
Second, in order to obtain the forecast limits, we calculate the weights ψ from the relationship

(1 − φB)(1 + ψ_1 B + ψ_2 B² + ...) = 1    (20)

That is,

ψ_j = φ^j, ∀ j ≥ 0    (21)

Therefore, the 95% forecast limits for the forecasting results in Table 3 are computed as in Equation 22 and shown in Table 4:

Ŵ_79(l) ± 1.96 ∗ [ σ_a² (1 + ψ_1² + ... + ψ_{l−1}²) ]^{1/2}    (22)
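The forecasts of Equation 19 and the limits of Equation 22 can be reproduced in a few lines of Python. In the sketch below, the back-transformation to the original scale via the inverse of Equation 14 is our assumption about how the values in Tables 3 and 4 were obtained, and the last observed value used is the one appearing in Equation 19:

import numpy as np

def ar1_forecast(w_last, phi=-0.1, mu=1.2, sigma2_a=0.1, horizon=12, lam=0.25):
    # l-step forecasts and 95% limits for (1 - phi*B)(W_t - mu) = a_t
    l = np.arange(1, horizon + 1)
    w_hat = mu + phi ** l * (w_last - mu)                    # Equation 18
    psi_sq_sum = (1 - phi ** (2 * l)) / (1 - phi ** 2)       # 1 + psi_1^2 + ... + psi_{l-1}^2
    half = 1.96 * np.sqrt(sigma2_a * psi_sq_sum)             # Equation 22
    def to_z(w):                                             # inverse of Equation 14
        return (lam * w + 1.0) ** (1.0 / lam)
    return to_z(w_hat), to_z(w_hat - half), to_z(w_hat + half)

# last observed value on the transformed scale, W_79 = (2**0.25 - 1) / 0.25
point, lower, upper = ar1_forecast((2 ** 0.25 - 1) / 0.25)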
5 Conclusions
In this paper, we propose a time series method for identifying a personalized model based on the patient’s laboratory test records. After successfully building up the personalized model, we predict the laboratory tests in the future. In addition, we also give prediction limits for the forecasting, which is useful for many health care systems. The case study shows that the proposed method provides a good way for personalization analysis. In the future, we will continue working on personalization tools, such as turning this time series method into a GUI tool. Furthermore, we plan to work on group information for predictions.
Acknowledgements. This research is supported in part by an NSERC CRD project. We would also like to thank the three anonymous reviewers for their useful comments on this paper.
References
[1] Alpha Global IT, http://www.alpha-it.com/
[2] Box, G.E.P., Cox, D.R.: An Analysis of Transformations. Journal of the Royal Statistical Society, Series B 26(2), 211–252 (1964)
[3] Box, G.E.P., Jenkins, G.M.: Time Series Analysis: Forecasting and Control, 2nd edn. Holden-Day, San Francisco (1976)
[4] Dickey, D.A., Fuller, W.A.: Distribution of the Estimators for Autoregressive Time Series With a Unit Root. J. Amer. Statist. Assoc. 74, 427–431 (1979)
[5] Dickey, D.A., Bell, B., Miller, R.: Unit Roots in Time Series Models: Tests and Implications. The American Statistician 40(1), 12–26 (1986)
[6] Dunn, P.F.: Measurement and Data Analysis for Engineering and Science. McGraw-Hill, New York (2005) ISBN 0-07-282538-3
[7] Garg, A., Adhikari, N., McDonald, H., Rosas-Arellano, M., Devereaux, P., Beyene, J., Sam, J., Haynes, R.: Effects of Computerized Clinical Decision Support Systems on Practitioner Performance and Patient Outcomes: A Systematic Review. JAMA 293(10), 1223 (2005)
[8] Mills, T.C.: Time Series Techniques for Economists. Cambridge University Press, Cambridge (1990)
[9] Pandit, S.M., Wu, S.-M.: Time Series and System Analysis with Applications. John Wiley & Sons, Inc., Chichester (1983)
[10] Percival, D.B., Walden, A.T.: Spectral Analysis for Physical Applications: Multitaper and Conventional Univariate Techniques, pp. 190–195. Cambridge University Press, Cambridge (1993) ISBN 0-521-43541-2
[11] Slutzky, E.: The Summation of Random Causes as the Source of Cyclic Processes. Econometrica 5, 105–146 (1937); Translated from the earlier paper of the same title in Problems of Economic Conditions
[12] Stead, W.W., Garrett Jr., L.E., Hammond, W.E.: Practicing nephrology with a computerized medical record. Kidney Int. 24(4), 446–454 (1983)
[13] Yule, G.U.: On a Method of Investigating Periodicities in Disturbed Series with Special Reference to Wolfer’s Sunspot Numbers. Philosophical Transactions of the Royal Society of London, Series A, Containing Papers of a Mathematical or Physical Character 226, 267–298 (1927)
Language Analytics for Assessing Brain Health: Cognitive Impairment, Depression and Pre-symptomatic Alzheimer’s Disease

William L. Jarrold¹, Bart Peintner², Eric Yeh², Ruth Krasnow², Harold S. Javitz², and Gary E. Swan²

¹ UC-Davis, Davis, CA, 95616
[email protected]
² SRI International, Menlo Park, CA, 94025
[email protected]

Abstract. We present data demonstrating how brain health may be assessed by applying data-mining and text analytics to patient language. Three brain-based disorders are investigated - Alzheimer’s Disease, cognitive impairment and clinical depression. Prior studies identify particular language characteristics associated with these disorders. Our data show computer-based pattern recognition can distinguish language samples from individuals with and without these conditions. Binary classification accuracies range from 73% to 97% depending on details of the classification task. Text classification accuracy is known to improve substantially as training data approaches web-scale. Such a web scale dataset seems inevitable given the ubiquity of social computing and its language intensive nature. Given this context, we claim that the classification accuracy levels obtained in our experiments are significant findings for the fields of web intelligence and applied brain informatics.
1 Motivation

Computational analysis of language shows promise as a diagnostic. Word choice and other linguistic markers are heavily affected by many brain-based disorders. This increasingly substantiated phenomenon implies that language analysis can contribute to early diagnosis, and therefore to more timely and effective treatment. As individuals continue to increase the quantity and richness of their language-based interaction through the web, and as we increase the sophistication of automatic language analysis, there is a corresponding increase in the viability of near-continuous monitoring for early signs of brain-based disorders, an important capability for an intelligent web. Therefore, it is of substantial importance to investigate and identify particular language measures associated with each disorder and how this association varies with context. As reasonably accurate models have been proven, continuous language analysis can provide doctors with an objective, unobtrusive, and ecologically-valid measure of cognitive status. We present a machine learning-based methodology and architecture for identifying and testing language (and other) measures that serve as markers for brain-based
disorders. We evaluate its application to three disorders: pre-symptomatic Alzheimer’s Disease (Pre-AD), cognitive impairment, and depression. We show that the methodology independently discovers relationships previously reported in the literature and produces accurate diagnostic models. Finally, we demonstrate the importance of context when processing unstructured speech or language, and discuss this problem as a critical area for future work.
2 Methodology and Architecture

We describe a method that allows researchers to classify patients according to atypicalities in speech and language production. Using only speech samples labeled with the speakers’ clinical classification, plus a set of controls, the system determines the associations between the disorder and a large set of lexical measures, and produces a model that maps multiple measures to a prediction of the disorder. Here, we focus on the language elements of the process, but it works for acoustical features as well.
Fig. 1. Data flow for the speech and language data analysis architecture used to determine key measures and develop predictive models for a particular disorder
Figure 1 shows the process graphically. First, the raw audio is transcribed into text. In these studies, we used human transcribers for maximum accuracy, but automatic transcription has been similarly used ([1], [2]). Next, during lexical feature extraction, the transcript for each subject is fed into lexical analysis tools which extract a number of linguistic features, each of which may vary based on which disorder is present. The present work uses three lexical analyzers:

Part of Speech Tagger (POST). Part of speech frequency features are extracted from a text sample using this tool [3]. The result is a vector consisting of the percent frequency of nouns, adjectives, verbs, etc. present in the sample.

Linguistic Inquiry and Word Count (LIWC). This tool [4] computes the frequency of words from pre-defined lists based on categories such as positive emotion words, socially related words, first-person words, etc. There are approximately 80 such features in the output vector.

CPIDR. CPIDR [5] accurately measures propositional idea density, which intuitively is the density of distinct facts or notions contained in a text. Low idea density has been shown to presage AD decades before any overt signs of the disorder [6].

During feature selection (see Figure 1), we remove features that do not vary significantly based on the presence or absence of a particular disorder. The measures simply
need to be partitioned based on the label of the subject, e.g., ‘depressed’ or ‘control’. The last step in the process feeds the measures and the associated label for a group of subjects into machine learning tools, which induce models that classify arbitrary text samples via their lexical features. Diagnostic models are induced from a subset of the patients (i.e. the training set) and then evaluated via a test set. During training, each patient’s transcript is fed to lexical feature extraction, resulting in a vector of lexical feature values that characterize a given patient’s speech. Each vector used in the training set is paired with the corresponding patient’s diagnosis and input to ML. The learner outputs an executable classifier that should predict the diagnosis associated with a given text. In the test phase (not shown in Figure 1), a classifier is evaluated by presenting it with lexical feature vectors from patients in the held-out test set. The classifier outputs a diagnosis which is compared with the known diagnosis, resulting in a correct or incorrect score for that patient.
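A minimal sketch of this induce-and-evaluate loop in Python with scikit-learn follows, using simple bag-of-words counts as a stand-in for the POST/LIWC/CPIDR feature vectors; the classifier, the feature selector and the 70/30 split are illustrative assumptions, not the setup used in our experiments.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

def train_and_evaluate(transcripts, labels):
    # transcripts: one transcribed speech sample per subject; labels: e.g. 'depressed'/'control'
    X_train, X_test, y_train, y_test = train_test_split(
        transcripts, labels, test_size=0.3, stratify=labels, random_state=0)
    model = Pipeline([
        ("lexical_features", CountVectorizer()),            # stand-in for POST/LIWC/CPIDR measures
        ("feature_selection", SelectKBest(f_classif, k=50)),
        ("classifier", LogisticRegression(max_iter=1000)),
    ])
    model.fit(X_train, y_train)                              # induce the diagnostic model
    return model.score(X_test, y_test)                       # accuracy on held-out subjects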
3 AD and Cognitive Impairment (CI) Experiments

This section contains our results showing that induced models of linguistic features can detect current cognitive impairment and predict future onset of AD.

3.1 Background on Language Connections with AD and CI

The scientific basis of language-based markers of conditions such as AD and cognitive impairment is found across a variety of studies. Cognitive impairment (CI) was shown to be measurable via computer-based analysis of speech ([2], [7]). Condition-specific language features (e.g. [8]) can be detected and exploited using automatic speech recognition, text analytics and machine learning (ML) to classify different types of fronto-temporal lobar degeneration - a gerontological neurodegenerative condition distinct from AD [1]. Regarding AD, [6] has shown that a language characteristic known as low idea density in the autobiographical writings of nuns in their 20’s was a strong predictor of Alzheimer’s disease (AD) at time of death more than 50 years later. This finding provides the basis for our aim to detect preclinical AD. This is because, although there are no disease modifying treatments for AD, the consensus in the field is that when treatments are available it will be very important to start treatment long before clinically significant damage has occurred to the brain. These findings provide the basis for two main questions: (1) can measures and models be developed to distinguish Pre-AD subjects from age-matched healthy controls; and (2) can measures and models be developed to distinguish current cognitive impairment subjects from healthy controls? Regarding (1), we aimed to evaluate language markers that predict which WCGS 1980s interview participants eventually died with AD. This had two aspects. First, we aimed to replicate findings from the Nun Study [6] in which a language measure known as idea density was found to be associated with the development of AD decades later. Secondly, we aimed to evaluate the ability of machine learning to identify patterns across the entire feature set that are predictive of AD acquisition.
Regarding (2), we aimed to replicate findings of another study in which a computer tool called PCAD was shown to assess current cognitive impairment in spontaneous speech [2]. We evaluated the ability of machine learning to classify patients via features other than PCAD-based ones.

3.2 Data and Subject Selection

Patient speech and other clinical data were obtained from the Western Collaborative Group Study (WCGS), a 40+ year longitudinal study involving a wide-ranging array of demographic, personality, neuropsychological and cause of death data collected for the purpose of studying behavioral and neuropsychological variables associated with cardiovascular outcomes. Although cardiovascular outcome was not of interest in the present study, speech data were obtained from audio recordings of the 15-minute structured interview [9] administered to every WCGS participant circa 1988. We transcribed interviews from subsamples of the WCGS population. We applied the three lexical analyzers described in Section 2. The output of these analyses (e.g. measures of frequencies of various types of words and phrases) was compared to expectations from prior literature. The output vectors were fed to ML algorithms. The diagnostic accuracies of the resulting classifiers were evaluated. The following describes our method for selecting the sub-samples:

Pre-symptomatic AD vs Controls. A pre-symptomatic AD group was identifiable because of the 40+ year WCGS duration. We were able to select a subsample of 22 who were cognitively normal at the time of the 1988 interview but would eventually die with the cause of death listed as clinically verified AD (all ICD-9 = 331.0). Controls were an age-matched cognitively normal sub-sample of 23 men never diagnosed with dementia. Mean age at time of interview was 73.13 (SD 4.9; range 65-80). Cognitively normal was defined as scoring less than 0 on the IOWA Screening Battery for Mental Decline [10] (a neuropsychological measure of cognitive impairment) at time of interview.

CI vs Controls. Groups were formed via random selection of subjects with IOWA > 8 (Cognitively Impaired) and IOWA