Advances in Intelligent and Soft Computing 99

Editor-in-Chief
Prof. Janusz Kacprzyk
Systems Research Institute
Polish Academy of Sciences
ul. Newelska 6
01-447 Warsaw
Poland
E-mail: [email protected]

Further volumes of this series can be found on our homepage: springer.com

Vol. 87. E. Corchado, V. Snášel, J. Sedano, A.E. Hassanien, J.L. Calvo, and D. Ślęzak (Eds.)
Soft Computing Models in Industrial and Environmental Applications, 6th International Workshop SOCO 2011
ISBN 978-3-642-19643-0

Vol. 88. Y. Demazeau, M. Pěchouček, J.M. Corchado, and J.B. Pérez (Eds.)
Advances on Practical Applications of Agents and Multiagent Systems, 2011
ISBN 978-3-642-19874-8

Vol. 89. J.B. Pérez, J.M. Corchado, M.N. Moreno, V. Julián, P. Mathieu, J. Canada-Bago, A. Ortega, and A.F. Caballero (Eds.)
Highlights in Practical Applications of Agents and Multiagent Systems, 2011
ISBN 978-3-642-19916-5

Vol. 90. J.M. Corchado, J.B. Pérez, K. Hallenborg, P. Golinska, and R. Corchuelo (Eds.)
Trends in Practical Applications of Agents and Multiagent Systems, 2011
ISBN 978-3-642-19930-1

Vol. 91. A. Abraham, J.M. Corchado, S.R. González, and J.F. de Paz Santana (Eds.)
International Symposium on Distributed Computing and Artificial Intelligence, 2011
ISBN 978-3-642-19933-2

Vol. 92. P. Novais, D. Preuveneers, and J.M. Corchado (Eds.)
Ambient Intelligence – Software and Applications, 2011
ISBN 978-3-642-19936-3

Vol. 93. M.P. Rocha, J.M. Corchado, F. Fernández-Riverola, and A. Valencia (Eds.)
5th International Conference on Practical Applications of Computational Biology & Bioinformatics, 2011
ISBN 978-3-642-19913-4

Vol. 94. J.M. Molina, J.R. Casar Corredera, M.F. Cátedra Pérez, J. Ortega-García, and A.M. Bernardos Barbolla (Eds.)
User-Centric Technologies and Applications, 2011
ISBN 978-3-642-19907-3

Vol. 95. R. Burduk, M. Kurzyński, M. Woźniak, and A. Żołnierek (Eds.)
Computer Recognition Systems 4, 2011
ISBN 978-3-642-20319-0

Vol. 96. A. Gaspar-Cunha, R. Takahashi, G. Schaefer, and L. Costa (Eds.)
Soft Computing in Industrial Applications, 2011
ISBN 978-3-642-20504-0

Vol. 97. W. Zamojski, J. Kacprzyk, J. Mazurkiewicz, J. Sugier, and T. Walkowiak (Eds.)
Dependable Computer Systems, 2011
ISBN 978-3-642-21392-2

Vol. 98. Z.S. Hippe, J.L. Kulikowski, and T. Mroczek (Eds.)
Human – Computer Systems Interaction: Backgrounds and Applications 2, Part 1, 2012
ISBN 978-3-642-23186-5

Vol. 99. Z.S. Hippe, J.L. Kulikowski, and T. Mroczek (Eds.)
Human – Computer Systems Interaction: Backgrounds and Applications 2, Part 2, 2012
ISBN 978-3-642-23171-1
Zdzisław S. Hippe, Juliusz L. Kulikowski, and Teresa Mroczek (Eds.)
Human – Computer Systems Interaction: Backgrounds and Applications 2 Part 2
Editors

Dr. Teresa Mroczek
Department of Expert Systems and Artificial Intelligence, University of Information Technology and Management, 35-225 Rzeszów, Poland
E-mail: [email protected]

Dr. Zdzisław S. Hippe
Department of Expert Systems and Artificial Intelligence, University of Information Technology and Management, 35-225 Rzeszów, Poland
E-mail: [email protected]

Dr. Juliusz L. Kulikowski
Polish Academy of Sciences, M. Nalecz Institute of Biocybernetics and Biomedical Engineering, 4 Ks. Trojdena Str., 02-109 Warsaw, Poland
E-mail: [email protected]

ISBN 978-3-642-23171-1
e-ISBN 978-3-642-23172-8
DOI 10.1007/978-3-642-23172-8

Advances in Intelligent and Soft Computing
ISSN 1867-5662

Library of Congress Control Number: 2011936642

© 2012 Springer-Verlag Berlin Heidelberg

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typeset & Cover Design: Scientific Publishing Services Pvt. Ltd., Chennai, India

Printed on acid-free paper

springer.com
From the Editors
The history of human-system interaction is as long as that of human civilization. By natural evolution, human beings have been adapted to live in groups and to fight together for food and shelter against other groups or against natural forces. The outcome of this fight depended on two basic factors: the ability to communicate within collaborating groups or between persons, and the capability to understand and predict the principles and behavior of the opposing groups or forces. This, in fact, is also the main contemporary human-system interaction (H-SI) problem. A system is considered here – in a narrow sense – as one created on the basis of electronic, optoelectronic and/or computer technology in order to aid humans in reaching some of their vital goals. A system so defined is not merely a passive tool in human hands; it is rather an active partner equipped with a sort of artificial intelligence, having access to large information resources, able to adapt its behavior to human requirements and to collaborate with its human users in order to reach their goals. The area of such systems' applications covers most domains of human activity and is still expanding. Accordingly, the scientific and practical H-SI problems require a large variety of sophisticated solution methods. This is why H-SI problems have in recent decades become an important and extensively growing area of investigation. In this book some examples of H-SI problems and solution methods are presented. They can be roughly divided into the following groups: a) human decision supporting systems, b) distributed knowledge bases and WEB systems, c) systems aiding disabled persons, d) environment monitoring and robotic systems, e) diagnostic systems, f) educational systems, and g) general H-SI problems. As usual, some papers can be assigned to more than one class, so the classification serves only as a rough characterization of the book's contents. Human decision supporting systems are represented by papers concerning various application areas, e.g. enterprise management (A. Burda and Z.S. Hippe; T. Żabiński and T. Mączka; S. Cavalieri), healthcare (E. Zaitseva), agricultural products storage (W. Sieklicki, M. Kościuk and S. Sieklicki), visual design (E.J. Grabska), and sports training planning (J. Vales-Alonso, P. López-Matencio, J.J. Alcaraz, et al.). The papers by I. Rejer; J.L. Kulikowski; K. Harężlak and A. Werner; E. Nawarecki, S. Kluska-Nawarecka and K. Regulski; A. Grzech, A. Prusiewicz and M. Zięba; and A. Andrushevich, M. Fercu, J. Hopf, E. Portmann and A. Klapproth are devoted to various problems of data and knowledge base exploration in computer decision-aiding systems.
The WEB-based systems, including those built on distributed knowledge bases, are presented in the papers by N. Pham, B.M. Wilamowski and A. Malinowski, and by M. Hajder and T. Bartczak. K. Skabek, R. Winiarczyk and A. Sochan present a concept of a distributed virtual museum. An interesting concept of managing the process of intellectual capital creation is presented by A. Lewicki and R. Tadeusiewicz. A document-centric, instead of data-centric, distributed information processing paradigm is presented in a paper by B. Wiszniewski. New computer network technologies are described by K. Krzemiński and I. Jóźwiak and by P. Rożycki, J. Korniak and J. Kolbusz; the latter authors also present a model of malicious network traffic. Selected problems of distributed network resources organization and tagging are presented by A. Dattolo, F. Ferrara and C. Tasso, as well as by A. Chandramouli, S. Gauch and J. Eno. The next group of papers addresses various problems of aiding disabled persons by improving their communication with external systems. The papers by M. Porta and A. Ravarelli and by D. Chugo, H. Ozaki, S. Yokota and K. Takase are devoted to systems aiding physically disabled persons. Spatial orientation and navigation aiding problems are described by P. Strumillo; by A. Śluzek and M. Paradowski; and by M. Popa. A proposal of a ubiquitous health supervising system is presented by P. Augustyniak. Problems of hand posture or motion recognition for aiding disabled persons are described by R.S. Choraś and by T. Luhandjula, K. Djouani, Y. Hamam, B.J. van Wyk and Q. Williams, while similar problems supporting the therapy of children are presented by J. Marnik, S. Samolej, T. Kapuściński, M. Oszust and M. Wysocki. A paper by Mertens, C. Wacharamanotham, J. Hurtmanns, M. Kronenbuerger, P.H. Kraus, A. Hoffmann, C. Schlick and J. Borchers is devoted to the problem of improving communication through a touch screen. Some other problems of tactile communication are considered by L.M. Muñoz, P. Ponsa and A. Casals. J. Ruminski, M. Bajorek, J. Ruminska, J. Wtorek and A. Bujnowski present a computer-based method of aiding dichromats in correct color vision. In the papers by A. Roman-Gonzalez and by J.P. Rodrigues and A. Rosa, concepts of using EEG signals directly to aid persons with lost motor abilities are presented. Similarly, some basic problems and experimental results of direct brain-computer interaction are described by M. Byczuk, P. Poryzała and A. Materka. A group of papers by Y. Ota; P. Nauth; M. Kitani, T. Hara, H. Hanada and H. Sawada; D. Erol Barkana; and T. Sato, S. Sakaino and T. Yakoh describes several new robotic system constructions. The group concerning diagnostic systems consists of papers devoted mainly to medical applications (K. Przystalski, L. Nowak, M. Ogorzałek and G. Surówka; P. Cudek, J.W. Grzymała-Busse and Z.S. Hippe; A. Świtoński, R. Bieda and K. Wojciechowski; T. Mroczek, J.W. Grzymała-Busse, Z.S. Hippe and P. Jurczak; R. Pazzaglia, A. Ravarelli, A. Balestra, S. Orio and M.A. Zanetti; M. Jaszuk, G. Szostek and A. Walczak; J. Gomuła, W. Paja, K. Pancerz and J. Szkoła). Besides, an industrial diagnostic system is presented in a paper by R.E. Precup, S.V. Spătaru, M.B. Rădac, E.M. Petriu, S. Preitl, C.A. Dragoş and R.C. David. K. Adamczyk and A. Walczak present an algorithm for edge detection in images which can be used in various applications.
In the papers by L. Pyzik; C.A. Dragoş, S. Preitl, R.E. Precup and E.M. Petriu; and E. Noyes and L. Deligiannidis, examples of computer-aided educational systems are presented. K. Kaszuba and B. Kostek describe a neurophysiological approach to aiding learning processes. The group concerning general H-SI problems consists of the papers presented by T.T. Xie, H. Yu and B.M. Wilamowski; H. Yu and B.M. Wilamowski; and G. Drałus. General problems of rule formulation for automatic reasoning are described by A.P. Rotshtein and H.B. Rakytyanska, as well as by M. Pałasiński, B. Fryc and Z. Machnicka. Close to the former, S. Chojnacki and M.A. Kłopotek consider the problem of evaluating Boolean recommenders in decision systems. Various aspects of computer-aided decision making methods are presented in the papers by M.P. Dwulit and Z. Szymański; L. Bobrowski; and A. Pułka and A. Milik. A problem of ontology creation is described by A. Di Iorio, A. Musetti, S. Peroni and F. Vitali. Finally, B. Małysiak-Mrozek, S. Kozielski and D. Mrozek present a concept of a language for describing the structural similarity of proteins. This panorama of works conducted by a large number of scientists in numerous countries shows that H-SI is a wide and progressive area of investigation aimed at improving the conditions of human life. It also shows that new and interesting problems arise between different scientific disciplines and stimulate development on both sides of the borders.

Editors
Zdzisław S. Hippe
Juliusz L. Kulikowski
Teresa Mroczek
Contents
Part IV: Environment Monitoring and Robotic Systems

SSVEP-Based Brain-Computer Interface: On the Effect of Stimulus Parameters on VEPs Spectral Characteristics ...... 3
M. Byczuk, P. Poryzała, A. Materka

Design and Development of a Guideline for Ergonomic Haptic Interaction ...... 15
L.M. Muñoz, P. Ponsa, A. Casals

Partner Robots – From Development to Business Implementation ...... 31
Y. Ota

Goal Understanding and Self-generating Will for Autonomous Humanoid Robots ...... 41
P. Nauth

A Talking Robot and Its Singing Performance by the Mimicry of Human Vocalization ...... 57
M. Kitani, T. Hara, H. Hanada, H. Sawada

An Orthopedic Surgical Robotic System – OrthoRoby ...... 75
D. Erol Barkana

Methods for Reducing Operational Forces in Force-Sensorless Bilateral Control with Thrust Wires for Two-Degree-of-Freedom Remote Robots ...... 91
T. Sato, S. Sakaino, T. Yakoh

Part V: Diagnostic Systems

Applications of Neural Networks in Semantic Analysis of Skin Cancer Images ...... 111
K. Przystalski, L. Nowak, M. Ogorzałek, G. Surówka

Further Research on Automatic Estimation of Asymmetry of Melanocytic Skin Lesions ...... 125
P. Cudek, J.W. Grzymała-Busse, Z.S. Hippe

Multispectral Imaging for Supporting Colonoscopy and Gastroscopy Diagnoses ...... 131
A. Świtoński, R. Bieda, K. Wojciechowski

A Machine Learning Approach to Mining Brain Stroke Data ...... 147
T. Mroczek, J.W. Grzymała-Busse, Z.S. Hippe, P. Jurczak

Using Eye-Tracking to Study Reading Patterns and Processes in Autism with Hyperlexia Profile ...... 159
R. Pazzaglia, A. Ravarelli, A. Balestra, S. Orio, M.A. Zanetti

Ontology Design for Medical Diagnostic Knowledge ...... 175
M. Jaszuk, G. Szostek, A. Walczak

Rule-Based Analysis of MMPI Data Using the Copernicus System ...... 191
J. Gomuła, W. Paja, K. Pancerz, J. Szkoła

Application of 2D Anisotropic Wavelet Edge Extractors for Image Interpolation ...... 205
K. Adamczyk, A. Walczak

Experimental Results of Model-Based Fuzzy Control Solutions for a Laboratory Antilock Braking System ...... 223
R.E. Precup, S.V. Spătaru, M.B. Rădac, E.M. Petriu, S. Preitl, C.A. Dragoş, R.C. David

Part VI: Educational Systems

Remote Teaching and New Testing Method Applied in Higher Education ...... 237
L. Pyzik

Points of View on Magnetic Levitation System Laboratory-Based Control Education ...... 261
C.A. Dragoş, S. Preitl, R.E. Precup, E.M. Petriu

2D and 3D Visualizations of Creative Destruction for Entrepreneurship Education ...... 277
E. Noyes, L. Deligiannidis

Employing a Biofeedback Method Based on Hemispheric Synchronization in Effective Learning ...... 295
K. Kaszuba, B. Kostek

Part VII: General Problems

Comparison of Fuzzy and Neural Systems for Implementation of Nonlinear Control Surfaces ...... 313
T.T. Xie, H. Yu, B.M. Wilamowski

Hardware Implementation of Fuzzy Default Logic ...... 325
A. Pułka, A. Milik

Dwulit's Hull as Means of Optimization of kNN Algorithm ...... 345
M.P. Dwulit, Z. Szymański

OWiki: Enabling an Ontology-Led Creation of Semantic Data ...... 359
A. Di Iorio, A. Musetti, S. Peroni, F. Vitali

Fuzzy Genetic Object Identification: Multiple Inputs/Multiple Outputs Case ...... 375
A.P. Rotshtein, H.B. Rakytyanska

Server-Side Query Language for Protein Structure Similarity Searching ...... 395
B. Małysiak-Mrozek, S. Kozielski, D. Mrozek

A New Kinds of Rules for Approximate Reasoning Modeling ...... 417
M. Pałasiński, B. Fryc, Z. Machnicka

Technical Evaluation of Boolean Recommenders ...... 429
S. Chojnacki, M.A. Kłopotek

Interval Uncertainty in CPL Models for Computer Aided Prognosis ...... 443
L. Bobrowski

Neural Network Training with Second Order Algorithms ...... 463
H. Yu, B.M. Wilamowski

Complex Neural Models of Dynamic Complex Systems: Study of the Global Quality Criterion and Results ...... 477
G. Drałus

Author Index ...... 497

Subject Index ...... 499
SSVEP-Based Brain-Computer Interface: On the Effect of Stimulus Parameters on VEPs Spectral Characteristics

M. Byczuk, P. Poryzała, and A. Materka

Institute of Electronics, Technical University of Lodz, Łódź, Poland
{byczuk,poryzala,materka}@p.lodz.pl
Abstract. It is demonstrated that the spectral characteristics of steady-state visual evoked potentials (SSVEPs) in an SSVEP-based brain-computer interface (BCI) depend significantly on stimulus parameters, such as the color and frequency of its flashing light. We postulate that these dependencies can be used to improve BCI performance – by proper design, configuration and adjustment of the visual stimulator. Preliminary results of the conducted experiments also show that SSVEP characteristics are strongly affected by the biological diversity of subjects.
1 Introduction

A Brain-Computer Interface (BCI) is an alternative solution for communication between human and machine. In the case of traditional interfaces, the user is expected to make voluntary movements to control a machine (e.g. movements of hands and fingers are required to operate a keyboard). In contrast to commonly used human-machine interfaces, a BCI device allows commands to be sent from the brain to a computer directly, without using any of the brain's normal output pathways of peripheral nerves and muscles [Wolpaw et al. 2000]. This unique feature has contributed to great interest in the study of neural engineering, rehabilitation and brain science during the last 30-40 years. Currently available systems can be used to reestablish a communication channel for persons with severe motor disabilities, patients in a "locked-in" state or even completely paralyzed people. It is predicted that within the next few years BCI systems will be practically implemented.

A BCI device measures the subject's ongoing brain activity, usually electroencephalographic (EEG) signals, and tries to recognize mental states or voluntarily induced changes in the brain activity. Extracted and correctly classified EEG signal features are translated into appropriate commands which can be used for controlling a computer or a wheelchair, operating a virtual keyboard, etc. The various systems differ in the way the intention of the BCI user is extracted from her/his brain electrical activity. Among the approaches, two groups of techniques are most popular, based on:
• identifying changes of brain activity which are not externally triggered,
• detecting characteristic waveforms in the EEG, so-called Visual Evoked Potentials (VEPs), which are externally evoked by a visual stimulus.

The class of VEP-based BCI systems offers many advantages: easy system configuration, high speed, a large number of available commands, high reliability and little user training. Visually evoked potentials can be recorded over the primary visual cortex, which is located at the back part of the human brain. VEPs reflect the user's attention to a visual stimulus, which may take the form of short flashes or of light flickering at a certain frequency. VEPs elicited by brief stimuli are usually transient responses of the visual system and are analyzed in the time domain. VEPs elicited by a flickering stimulus are quasi-periodic signals, called Steady-State VEPs (SSVEPs), and are analyzed in the frequency domain. Fig. 1 shows the simplified block diagram of a typical VEP-based BCI system. Each target (a letter, a direction of cursor movement, etc.) in a VEP-based BCI is encoded by a unique stimulus sequence which in turn evokes a unique VEP pattern. A fixation target can thus be identified by analyzing the characteristics of the VEP: the time of appearance (for flash VEP detection) or the fundamental frequency (for SSVEP detection).
Fig. 1 A simplified block diagram of a typical VEP-based BCI system
2 SSVEP-Based BCI Systems

In the majority of VEP-based BCIs, frequency encoding is used (interface operation is based on SSVEP detection). The energy of SSVEP signals is concentrated in very narrow bands around the stimulation frequency and its harmonics, whereas the spontaneous EEG signal may be modeled as Gaussian noise whose energy is spread over the whole spectrum. Thus SSVEPs can be easily detected using feature extraction based on spectral analysis together with classification algorithms. Moreover, neither the system nor the user requires any training, since the EEG response to the stimulus is known in advance. This approach results in a minimal number of electrodes required for proper operation of the BCI, the ability of real-time operation, and low hardware cost. Therefore, steady-state visual evoked potentials give rise to a very promising paradigm in brain-computer interface design.

Currently, the development of BCI systems for real-life applications is emphasized. Research teams still encounter many problems in turning demonstrations of SSVEP-based BCIs into practically applicable systems [Wang et al. 2008]. Two major constraints are system capacity (the number of available targets or commands) and detection time. They are directly related to the speed and reliability of a BCI. The overall performance of a BCI system can be expressed numerically by the information transfer rate (ITR), which describes the amount of information transferred per unit time (usually a minute). The ITR is defined as [Wolpaw et al. 2000]:

    ITR = s · [ log₂N + P·log₂P + (1 − P)·log₂((1 − P)/(N − 1)) ],    (1)
where s is the number of detections per minute, N is the number of possible selections, and P is the probability that the desired selection will actually be detected. It is assumed that each selection has the same probability of being the one the user desires, and that each of the other selections has the same probability of being chosen. The ITR of currently available systems usually varies from 10 up to 50 bits/minute.

System capacity is limited by the stimulation frequency band (the number of available stimulation frequencies), which is directly related to brain electrophysiology and visual information processing mechanisms [Regan 1989]. Detection speed is limited by the signal-to-noise ratio (SNR), which may be decreased in subjects with strong spontaneous activity of the visual cortex. The limitations described above can be addressed with different approaches:

• Research on stimulation methods that increase interface capacity when using a limited number of stimulation frequencies: time, frequency or pseudorandom code modulated VEP stimulation [Bing et al. 2009], phase coding, multiple frequency stimulation methods [Mukesh et al. 2006], etc. Advanced methods of stimulation can be used to design an interface with more commands available, without a need to extend the stimulation frequency band.
• Research on lead selection for the purpose of SNR enhancement – the performance, or even the applicability, of an SSVEP-based system is limited due to biological differences between users [Wang et al. 2004]. For subjects with different SSVEP source locations, optimized electrode positions can help achieve a high signal-to-noise ratio and overcome SSVEP detection problems.
• Research on stimulation methods for the purpose of SNR enhancement – for example, the alternate half-field stimulation method described in [Materka and Byczuk 2006a].
3 Prototype BCI System

In our previous research we focused on SNR enhancement. The result of this work was a novel technique of alternate half-field stimulation. The method was practically implemented and tested in the prototype BCI system [Materka et al. 2007] designed in the Institute of Electronics at the Technical University of Lodz. The system can be classified as a noninvasive, SSVEP-based, frequency-encoded BCI. A simplified block diagram of the prototype interface is depicted in Fig. 2.
Fig. 2 A block diagram of the prototype SSVEP-based BCI system
The system was implemented as a virtual keypad. The visual stimulator consisted of 8 labeled targets (keys) flickering at different frequencies (Fig. 3). Each target contained three light-emitting diodes (LEDs): two LEDs for alternate stimulation (B) and an additional LED acting as a biofeedback indicator (A), which constantly provided real-time information about the amplitudes of the measured SSVEP signals.
Fig. 3 A view of stimulator targets: A – fixation point and biofeedback indicator, B – stimulating lights
Proper arrangement of the stimulation lights within a single symbol ensures that their images are positioned on the left and right halves of the visual field on the human retina. This leads to SSVEP responses (with opposite phases) in the right and left halves of the visual cortex, respectively. Differential measurement of the EEG signals from both halves of the visual cortex allows a significant SNR increase of the measured SSVEP signals.

System operation and usability were tested with the contribution of 10 volunteers. The tests showed that the system is much faster than conventional BCI devices based on SSVEPs. For the user who achieved the best results, the detection time was 1.5 s (40 detections per minute) with a 0% error rate. In this case the information transfer rate calculated according to formula (1) equals 120 bits/minute. The high transfer rate of the interface was obtained mainly due to short detection times (a direct result of SNR enhancement).

The communication speed of the designed system would be sufficient for most applications, but its limited capacity makes its usage as a full-alphabet keyboard difficult. Thus new methods for increasing the number of available commands must be developed in order to design a fully keyboard-compatible computer interface. Preliminary observations showed that the amplitudes of the detected SSVEP signals, and the frequency band in which strong SSVEPs can be observed, depend on some parameters of the stimulation, e.g. the color, size, intensity and layout of the stimulation lights, and their frequency. Further investigation of the influence of these parameters on the spectral properties of SSVEPs is the subject of our present research.
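As an illustrative sketch (not part of the original system), formula (1) can be checked numerically; the short Python function below, with a name of our choosing, reproduces the 120 bits/minute figure for s = 40 detections per minute, N = 8 targets and P = 1, and shows how quickly the rate drops once errors appear.

    import math

    def itr_bits_per_minute(s, n, p):
        """Information transfer rate according to formula (1) [Wolpaw et al. 2000].

        s -- number of detections per minute
        n -- number of possible selections (targets)
        p -- probability that the desired selection is detected
        """
        bits = math.log2(n)
        if 0 < p < 1:
            bits += p * math.log2(p) + (1 - p) * math.log2((1 - p) / (n - 1))
        # For p == 1 both correction terms vanish (in the limit p -> 1),
        # so the rate is simply s * log2(n).
        return s * bits

    print(itr_bits_per_minute(40, 8, 1.0))   # 120.0 -- the prototype's best case
    print(itr_bits_per_minute(40, 8, 0.95))  # ~102.9 -- same speed, 5% error rate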
4 Experimental Setup

Two experiments were carried out using the alternate half-field stimulation technique. The EEG signal was measured differentially using two electrodes located on the left and right sides of the occipital part of the scalp (positions O1 and O2 of the international 10-20 system of EEG electrode placement), with a reference electrode placed between them (position Oz). The amplified EEG signal was sampled at 200 Hz. The user was sitting on a comfortable ergonomic chair to minimize the activity of neck muscles, which might produce EMG artifacts.

Fig. 4 A side view of the stimulator (A); a view of the stimulating lights SL, SR and the fixation light F on the screen of the stimulator (B)
The visual stimulator used in the experiments consisted of three LEDs which projected the stimulus on a screen (Fig. 4) – to diffuse (blur) the image of the contrastive shape of the light-emitting semiconductor region in the LED devices. The stimulus took the form of two lights (left – SL, and right – SR) that flashed with the same frequency, alternately in time. An extra light source (F) was placed between the two stimulating lights, slightly above them. This light was used as a fixation point. Additionally, the intensity of the light F changed according to the calculated SSVEP amplitude, to provide feedback between the user and the system. This helped the user to concentrate his/her attention on the fixation light F.

Table 1 Stimulator parameters

  Parameter                        Experiment 1   Experiment 2
  Diameter (D)                     4 mm           6 mm
  Color of lights SL and SR        Green          Red
  Color of light F                 Red            Green
  Intensity of lights SL and SR    Low            High
The distance between the screen of the stimulator and the user's eyes was about 50 cm. The two experiments were carried out using the different sets of lights described in Table 1. All stimulation parameters were intentionally changed in experiment 2 compared to experiment 1, just to demonstrate that these parameters have a measurable influence on the SSVEP characteristics. A more comprehensive examination of the effect of systematic changes of the parameters on SSVEP BCI performance is currently under way in our laboratory.

In both experiments, the diameter of the fixation light F was 3 mm and the modulation depth of the stimulating lights was 100% (sinusoidal modulation). The stimulation frequency was changed every 5-10 seconds within the range 20-50 Hz with a fixed step of 0.78 Hz. Each experiment lasted about 5-7 minutes.
5 Results

For a rough comparison of SSVEP amplitudes in both experiments, the power spectral density (PSD) of the EEG signals was computed in a sliding window of 1.28 s duration (256 samples). This window corresponds to a frequency resolution of about 0.78 Hz, which was the frequency step of the stimulus. Prior to FFT calculation, the measured signals were filtered using comb filters to reduce the spectral leakage of the Fourier analysis [Materka and Byczuk 2006b]. The computed spectrograms are shown in Fig. 5 and Fig. 6 for experiments 1 and 2, respectively, carried out by the same user (Subject 1).
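For readers who wish to reproduce this processing step, a minimal sketch is given below. It is illustrative Python (NumPy/SciPy), not the authors' code, and it assumes the comb-filtering stage has already been applied; it computes the PSD in sliding 256-sample windows, which at the 200 Hz sampling rate gives the 1.28 s window and roughly 0.78 Hz resolution used here.

    import numpy as np
    from scipy.signal import spectrogram

    FS = 200      # sampling rate, Hz
    NWIN = 256    # 256 samples = 1.28 s -> FS / NWIN ~ 0.78 Hz resolution

    def eeg_spectrogram(eeg):
        """Sliding-window PSD of one (already comb-filtered) EEG channel."""
        freqs, times, psd = spectrogram(eeg, fs=FS, window='hann',
                                        nperseg=NWIN, noverlap=NWIN // 2,
                                        scaling='density')
        return freqs, times, psd

    # Usage with a surrogate signal standing in for a measured recording:
    eeg = np.random.randn(FS * 60)          # one minute of synthetic "EEG"
    freqs, times, psd = eeg_spectrogram(eeg)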
Fig. 5 A spectrogram of measured EEG signal in experiment 1
A comparison of the spectra illustrated in Fig. 5 and Fig. 6 demonstrates different frequency ranges with strong SSVEP components. In experiment 1 strong SSVEPs are visible in the range 20-40 Hz, whereas in experiment 2 SSVEP components may be observed at higher frequencies, in the range 30-50 Hz. It may seem that the evoked potentials are easier to detect in Fig. 5 (because they have higher amplitudes than the SSVEPs in Fig. 6), and hence that the stimulation settings used in experiment 1 are better. However, the responses in experiment 2 are not necessarily weaker in terms of the distance of the signal power from the noise power floor, as will be discussed below.
Fig. 6 A spectrogram of measured EEG signal in experiment 2
To compare both experiments more objectively, a signal-to-background ratio (SBR) was computed for each SSVEP component. The SBR coefficient for frequency f is defined here as the ratio of the PSD at frequency f to the mean PSD value of the signal components at N = 10 adjacent discrete frequencies:

    SBR(f) = N · PSD(f) / Σ_{k=1}^{N/2} [ PSD(f − k·Δf) + PSD(f + k·Δf) ],    (2)
where Δf = 0.78 Hz is the frequency resolution of the Fourier analysis applied for the PSD calculation. The maximum values of the SBR coefficients for each SSVEP frequency were collected, and frequency characteristics for each experiment were estimated using polynomial approximation. A comparison of the SBR characteristics for the two experiments carried out by the same subject is shown in Fig. 7.
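A direct implementation of formula (2) is straightforward; the sketch below is illustrative Python of our own (not the authors' code) and assumes a one-sided PSD vector sampled on the 0.78 Hz grid of the analysis above.

    import numpy as np

    def sbr(psd, i, n=10):
        """Signal-to-background ratio at bin i, formula (2).

        psd -- one-sided PSD sampled every delta_f = 0.78 Hz
        i   -- index of the frequency bin of interest
        n   -- number of adjacent bins forming the background (N = 10)
        """
        k = np.arange(1, n // 2 + 1)
        background = psd[i - k].sum() + psd[i + k].sum()
        return n * psd[i] / background

    # An SSVEP would be detected at bin i when sbr(psd, i) exceeds a
    # threshold, e.g. T = 30 as discussed below.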
Fig. 7 A comparison of SBR frequency characteristics measured in experiment 1 (dashed line) and in experiment 2 (solid line)
The SBR coefficients obtained in experiment 2 have a higher peak value than in experiment 1, although Fig. 6 shows smaller SSVEP amplitudes than are visible in Fig. 5. This means that in experiment 2 the EEG signal components other than SSVEPs (the so-called EEG noise) have much smaller amplitudes than the SSVEPs, which results in a higher signal-to-background ratio. The characteristics presented in Fig. 7 confirm the different frequency ranges of strong SSVEPs in the two experiments, shown in the spectrograms (Fig. 5 and Fig. 6, respectively). If SSVEP detection were done by comparing the SBR with a threshold value T = 30, the frequency range of detected SSVEPs would be about 27-40 Hz (13 Hz wide) in experiment 1 and 37-48 Hz (11 Hz wide) in experiment 2. Using different stimulation settings (e.g. in terms of the color of the stimulus light) in a BCI system for frequencies below and above 38 Hz (the crossing point of both characteristics in Fig. 7), it is possible to increase the stimulation frequency range to 27-48 Hz (21 Hz wide). This may lead to an increased number of available BCI commands.

Both experiments were repeated for another subject (Subject 2). Fig. 8 and Fig. 9 present the SBR characteristics of the EEG signals measured from Subject 2 in experiment 1 and experiment 2, respectively, calculated according to formula (2).
Fig. 8 SBR characteristics of the measured EEG signal for Subject 2 as a function of time in experiment 1
Fig. 9 SBR characteristics of the measured EEG signal for Subject 2 as a function of time in experiment 2
The frequency range was extended to 3-50 Hz in experiment 1 due to the presence of strong SSVEP responses for stimulation frequencies below 20 Hz (7-45 Hz) in the case of this subject (Fig. 8). Moreover, experiment 1 for Subject 2 shows a different nature of the SSVEPs. For frequencies of 18-24 Hz, the responses contain strong second harmonics and a very weak component at the fundamental frequency, whilst for Subject 1 the SSVEP signals contain only the fundamental harmonic for all frequencies. Fig. 9 shows a different stimulation frequency band with strong SSVEPs in experiment 2 (14-35 Hz, including second harmonic responses for stimulation frequencies of 18-24 Hz), when compared to Subject 1 (Fig. 6).
6 Conclusions

Steady-state visual evoked potentials give rise to a very promising paradigm in brain-computer interface design. SSVEP-based BCI systems are the most effective solution, in terms of speed and accuracy, when compared to other BCI devices. The experiments presented in this paper show that the characteristics of steady-state visual evoked potentials depend on the parameters of the visual stimulus. We therefore postulate that the performance of SSVEP-based BCI systems can be improved by proper construction, configuration and adjustment of the visual stimulator. Moreover, SSVEP characteristics depend on individual features of the subject's visual system. This suggests that the stimulation parameters and the SSVEP detection algorithm (tuned to the stimulus fundamental frequency or its second harmonic) should be individually adjusted for each subject. Further research will focus on distinguishing which parameters of the stimulus (color, size, shape, etc.) have the strongest influence on the SSVEP characteristics.
Acknowledgment This work is supported by Polish Ministry for Science and Higher Education grant NN515 520838.
References

[Bing et al. 2009] Bin, G., Gao, X., et al.: VEP-based brain-computer interfaces: Time, frequency and code modulations. IEEE Comput. Intell. Mag. 4(4), 22–26 (2009)
[Materka and Byczuk 2006a] Materka, A., Byczuk, M.: Alternate half-field stimulation technique for SSVEP-based brain-computer interfaces. Electron. Lett. 42(6), 321–322 (2006)
[Materka and Byczuk 2006b] Materka, A., Byczuk, M.: Using comb filter to enhance SSVEP for BCI application. In: 3rd International Conference on Advances in Medical, Signal and Information Processing MEDSIP 2006, IET Proceedings Series CP520Z (CD-ROM), Glasgow, United Kingdom, 4 p. (2006)
[Materka et al. 2007] Materka, A., Byczuk, M., Poryzała, P.: A virtual keypad based on alternate half-field stimulated visual evoked potentials. In: Int. Symp. on Information Technology Convergence, Jeonju, Republic of Korea, pp. 296–300 (2007)
[Mukesh et al. 2006] Mukesh, T.M.S., Jaganathan, V., Reddy, M.R.: A novel multiple frequency stimulation method for steady state VEP based brain computer interfaces. Physiol. Meas. 27(1), 61–71 (2006) [Regan 1989] Regan, D.: Human brain electrophysiology – evoked potentials and evoked magnetic fields in science and medicine. Elsevier, New York (1989) [Wang et al. 2004] Wang, Y., Zhang, Z., Gao, X., et al.: Lead selection for SSVEP-based brain-computer interface. In: Proc. 26th Int. IEEE EMBS Conf., pp. 4507–4510 (2004) [Wang et al. 2008] Wang, Y., Gao, X., Hong, B., et al.: Brain–computer interfaces based on visual evoked potentials, feasibility of practical system design. IEEE Eng. in Medicine and Biology 27(5), 64–71 (2008) [Wolpaw et al. 2000] Wolpaw, J.R., et al.: Brain-computer interface technology: A review of the first international meeting. IEEE Trans. Rehab. Eng. 8, 164–173 (2000)
Design and Development of a Guideline for Ergonomic Haptic Interaction

L.M. Muñoz¹, P. Ponsa¹, and A. Casals¹,²

¹ Department of Automatic Control, Universitat Politècnica de Catalunya, Barcelona Tech, Spain
  {luis.miguel.munoz,pedro.ponsa}@upc.edu
² Institute for Bioengineering of Catalonia, Barcelona, Spain
  [email protected]

Abstract. The main goal of this chapter is to propose a guideline for human-robot systems focused on ergonomic haptic interaction. With this aim, the proposed model has several main parts: a set of heuristic indicators to identify the attributes of haptic interaction; the relationship between the indicators, the human task and the haptic interface requirements; and finally an experimental task procedure and qualitative performance evaluation metrics for the use of haptic interfaces. The final goal of this work is the study of possible applications of haptics under regular laboratory conditions, in order to improve the analysis, design and evaluation of human tasks performed with haptic interfaces in telerobotic applications.
1 Introduction

Traditional human-machine interfaces are usually provided with visual displays and sometimes with auditory information (humans process most incoming information through the visual channel). Compared to vision and audition, our understanding of human haptics, which includes the sensory and motor systems of the human hand, is very limited. One of the reasons for this is the experimental difficulty of presenting controlled stimuli, due to the fact that haptic systems are bidirectional – they can simultaneously perceive and act upon their environment.

The interface is the element that permits users to perform a task efficiently and establishes a dialog between the human and the system. Interfaces with haptic feedback can enhance the realism of interactive systems through more intuitive interactions (involving other variables such as force, distance or speed). In such situations, the interaction is often bidirectional, providing some derived measures (mechanical impedance, the ratio of force to speed, and transparency).

With the development of interfacing technology, and due to the strong trend to include haptics in multimodal environments, a working group called TC159/SC4/WG9 has been created with the aim of developing specific guidelines in this domain. For instance, in complex systems, as in telesurgery applications, haptics is a topic of research. In such systems, the aim is to determine the feedback to be applied to
improve surgeons' performance in tasks such as suturing and knot-tying, which are very time consuming. Thus, the availability of guidelines and recommendations is highly appreciated. There are other situations in which the human visual and auditory channels are heavily loaded, or, on the contrary, in which visual and auditory information is limited (e.g. undersea navigation in zones with a high density of plankton, or teleoperation with a large amount of information being sent through the visual channel, as in a pilot training cockpit). In such cases, the availability of haptic channels can be very important, and the development of haptic interfaces can contribute to the progress of human-robot interaction.

In order to use haptic and tactile interfaces and study human-robot interaction, the following items are necessary: a) evaluating the levels of automation, b) evaluating the relationship between experts, c) creating and using an interface object reference model, d) solving the standardization problem, and finally e) studying the context of their use. The levels of automation include a set of operational modes (manual control, automatic control or shared control) between human and robot in each domain. Another complex aspect is the coordination between experts from different fields (robotics technicians, interface designers, experts on human factors, end users) and the relationship between humans and robots (task allocation).

Lynch and Mead advocated that user interface reference models should "provide a generic, abstract structure which describes the flow of data between the user and the application, its conversion into information, and the auxiliary support which is needed for an interactive dialogue" [Lynch and Mead 1986]. This model provides an understanding of the many facets involved in individual and grouped tactile/haptic interaction objects. However, psychophysical studies and ergonomic considerations of the interaction in human-computer systems are not taken into account. Later on, Carter used a reference model that can help to standardize the design and construction of tactile or haptic interaction objects (identity, description attributes, representation attributes, and operations) by ensuring that all relevant aspects of such interactions are taken into consideration. Recently, reference models have been used to define the major components of accessible icons, organizing ergonomic and user interface standardization. The engineering community's interest in haptic and tactile interaction and standardization has grown considerably on the basis of recent research [van Erp et al., 2010].

One of the main difficulties to be solved in this domain is related to adopting a human-centred design approach and studying effective human-robot tasks in complex environments. An ergonomic assessment can ensure that systems are designed with sufficient attention to interoperability, improving task effectiveness, avoiding human error, and enhancing the comfort and well-being of users. A guideline on ergonomic haptic interaction included inside a generic framework (analysis, design, evaluation) can be useful in the study of several human-robot systems, for example in assisted surgical applications. The main proposal of this work is the preparation of a guideline for ergonomic haptic interaction design
(GEHID), which provides an approach that relates human factors to robotics technology. The guideline is based on measures that characterize the haptic interfaces, the users' capabilities and the objects to be manipulated. In a human-robot scenario the most important part is the human sensory-motor activity, but it is also necessary to analyze the typology of the tasks to be performed and the context of use of the haptic interface.

In the next section we describe previous work on assessing the quality of the haptic interaction framework. Section 3 explains the functional description of the GEHID indicators. In Section 4 the characteristics of the task in haptic applications and the relationship between tasks and indicators are described. In Section 5, the performance evaluation method for human-robot systems and a development life cycle are presented in order to show the ergonomic validation of the proposed guideline. Finally, some conclusions and future work are presented.
2 Haptic Interaction Framework

In order to define a haptic interaction framework, it is necessary to understand that many researchers and developers use two concepts: haptic and tactile. There is no difference between haptic and tactile in most dictionary definitions; however, many researchers and developers use haptic to include all haptic sensations, while the use of tactile is related to the stimulation of the skin (mechanical, thermal, chemical, and/or electrical stimulation). A haptic/tactile interaction framework based on a human-centred design approach needs a standard methodology based on the study of human-robot tasks, a process model approach (analysis of requirements, guidance, performance evaluation), an ergonomic validation and a clear layout of the objects to manipulate.

Table 1 shows the efforts of the International Organization for Standardization (ISO) in this domain. ISO 9241-920, Ergonomics of human-system interaction, constitutes guidance for the design of tactile and haptic interactions [ISO 9241-920 2009]. Table 2 shows diverse guidelines in human-computer interaction, usability engineering and haptic/tactile interaction.

The guideline for ergonomic haptic interaction design, the GEHID guide, is a method that seeks to cover aspects of haptic interface design and the human-robot task in order to improve the performance of haptic teleoperation applications.
Table 1 ISO's work on tactile/haptic interaction. An adaptation of van Erp's work [van Erp et al., 2010]

  ISO Number                                                                                 State
  ISO 9241-900  Introduction to tactile and haptic interaction                               Not started
  ISO 9241-910  Framework, terms and definitions                                             Work in progress
  ISO 9241-920  Ergonomics of human-system interaction                                       Finished in 2009
  ISO 9241-930  Haptic/tactile interactions in multimodal environments                       Not started
  ISO 9241-940  Evaluation of tactile/haptic interactions                                    Work in progress
  ISO 9241-971  Accessibility and use of haptic/tactile interfaces in public environments    Not started
Table 2 Some guidelines in human-computer interaction and haptic/tactile interaction

  Guideline                                                                                          Domain
  Colwell et al., 1998: "guidelines for the design of haptic interfaces and virtual environments"    Haptic interface; blind people
  Miller and Zeleznik, 1999: "3D haptic interface widgets"                                           3D interaction; X Window System
  Challis and Edwards, 2000: "design principles for tactile interaction"                             Static tactile interaction; touchpad, tactile overlay
  Sjöström, 2002: "guidelines for haptic and tactile interfaces"                                     Non-visual haptic interaction design; blind people
The GEHID guide can offer recommendations and define requirements for the use of a newly created haptic interface, or can help to improve the technical features of commercial haptic interfaces. The guideline is structured into two parts. The first details a set of selected attributes following the heuristic methods proposed by experts in the human-computer interaction and haptic interaction domains. The second part is a task allocation: a clear relationship between attributes and basic haptic tasks. The next sections of the chapter explain the proposed haptic guideline in more detail.
3 GEHID Indicators

When an operator performs a task directly on an object, for example moving it (Fig. 1), a reaction force is generated by the object and perceived by the hand through different receptors. When the same task is performed by a teleoperated system (Fig. 2), firstly the interface device should be able to sense the actions of the operator's hand; secondly, the teleoperated device must have the ability to reproduce the actions of the operator; and thirdly, the reaction forces and movements measured on the object should be faithfully measured in order to be finally reproduced on the interface device.
Fig. 1 Reaction force perceived by the operator in his interaction with the objects of the environment
Fig. 2 Reaction force perceived by the operator in teleoperated interaction through a haptic interface (left: remote area; right: local area)
The aim of the indicators is to provide a quantitative and/or qualitative measure of the information perceived from the teleoperated environment, through a teleoperation interface, in order to characterize a task and assess the degree to which the task can be perceived by an operator. Depending on the nature of the task, one or more indicators should be taken into account in order to make the assessment. The indicators represent properties, characteristics or energies that the operator perceives during manual exploration or manipulation. Some of these indicators act mainly on the cutaneous receptors, others on the kinesthetic ones, and others on a combination of both.
Perception indicators are classified into groups in accordance with their physical properties or behavioral similarities. Although these indicators are magnitudes or physical properties, they are defined taking into account the operator's sensing and perception. The indicators considered here are:

Texture
Texture produces a variation in the perceived movement while exploring an object, as a consequence of a displacement on its surface. Superficial texture is characterized by the size, distance and slope of the elements belonging to the surface, and becomes a variation of the movement of the tip or object in contact. This variation can cause a vibration with a specific frequency, amplitude and waveform, or a change in the acting speed as a function of the exerted force, normally as a consequence of a variation in the coefficient of friction of the surface. Some of the properties that can be extracted by observing a superficial texture are:

• Rugosity: the presence of irregularities on the surface. It can be characterized by the depth or height of the irregularities with respect to the average surface. The order of magnitude is normally under a millimeter. In general, the variation of the movement takes place in the direction normal to the surface.
• Patterns: the presence of repetitive shapes (channels, grooves, undulations, etc.) or symbolic representations (hieroglyphics, Braille, etc.).
• Friction: the force opposing the movement of the surfaces in contact. In general, the variation of the movement is in the direction tangential to the surface.

Reaction Force/Moment
A reaction produces a variation in the force or moment perceived when contacting an object or exerting a force on it. Forces and moments have a vector nature, in which the module is constrained by the range of force values required by the task and by the direction of each degree of freedom. The range of forces, the resolution and the number of degrees of freedom of the interface device must be in accordance with those required by the task.

Pressure
Pressure produces a variation in the force perceived per unit of contact surface. The feeling of pressure is perceived through the cutaneous receptors. Pressure is thus always perceived from the interface device, and the perceived value depends directly on the contact force, the surface in contact being generally constant. In order to perceive pressure as a distinct magnitude, the interface device should be able to change the surface in contact with the operator's hand or fingers.

Compliance
The variation in the perceived position as a consequence of an exerted force, which is restored when the force disappears, constitutes the concept of compliance.
The behavior of compliance is governed by Hooke's law. Some of the magnitudes related to compliance are:

• Elasticity: a magnitude related directly to compliance. The perception of elasticity depends on the variation of the position x with respect to the exerted force F (F = k·x, k being the constant of elasticity). The resolution and range of forces and displacements of the interface device should be in accordance with the task.
• Rigidity: the absence of perceived displacement when a force is exerted. This happens when the constant of elasticity is very large. A short reaction time is required from the interface device in order to perceive the feeling of rigidity when an effort is applied at a given time.

Weight/Inertia
A resistance is perceived when an object is held statically or displaced freely. The cutaneous and kinesthetic receptors are involved in the production of this effect. The resolution and number of degrees of freedom of the interface device must be in accordance with those required by the task (the direction of movement of the object, whether the object needs to be oriented, etc.).

Impulse/Collision
Perception of the change of momentum (mass × velocity). It is perceived as a significant variation of the interface speed. This variation happens when colliding with objects in the environment or when there is a loss of mass in the objects (breaking or disassembling).

Vibration
A variation in the position perceived in a cyclic way. Vibration differs from texture in that the variation of the movement produced by vibrations does not appear during exploration, but is a movement generated by the manipulated object. The cutaneous receptors are mainly involved, especially when the amplitude is small (…
…in the range [0, r̄lk], where the minimal solutions R^J, J < I, are excluded from the search space. Let R(t) = [rlk(t)] be some t-th solution of optimization problem (8), that is, F(R(t)) = F(R⁰), since for all R ∈ S(μ̂^A, μ̂^B) we have the same value of criterion (8). While searching for the upper bounds r̄lk it is suggested that rlk(t) ≥ rlk(t − 1), and while searching for the lower bounds r̲lk that rlk(t) ≤ rlk(t − 1). The definition of the upper (lower) bounds follows the rule: if R(t) ≠ R(t − 1), then r̄lk (r̲lk) = rlk(t). If R(t) = R(t − 1), then the search for the interval solution [R̲^I, R̄^I] is stopped. The formation of intervals (10) goes on until the condition R^I ≠ R^J, J < I, has been satisfied. The chromosome needed in the real-coded genetic algorithm for solving the fuzzy relational equations (9) includes only the real codes of the parameters rlk, l = 1, …, N, k = 1, …, M. The crossover operation is carried out by exchanging genes inside each variable rlk. The parameters of the membership functions are defined simultaneously with the null solution.
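For orientation, the composition behind fuzzy relational equations of this kind is the standard max-min one. The sketch below is illustrative Python (NumPy) of our own, not the authors' code: it evaluates a candidate relation matrix R against observed membership matrices, which is the kind of fitness criterion the real-coded genetic algorithm minimizes.

    import numpy as np

    def max_min_compose(mu_a, r):
        """Max-min composition: mu_b[s, j] = max over l of min(mu_a[s, l], r[l, j])."""
        # mu_a: S x L matrix of input membership degrees, r: L x M relation matrix
        return np.max(np.minimum(mu_a[:, :, None], r[None, :, :]), axis=1)

    def fitness(r, mu_a, mu_b):
        """Squared distance between modeled and observed outputs (to be minimized)."""
        return float(np.sum((max_min_compose(mu_a, r) - mu_b) ** 2))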
5 Computer Experiment

The aim of the experiment is to generate a system of IF-THEN rules for the target "two inputs (x1, x2) – two outputs (y1, y2)" model presented in Fig. 1:

    y1 = ((2z − 0.9)(7z − 1)(17z − 19)(15z − 2)) / 10,
    y2 = −y1/2 + 1,

where z = ((x1 − 3.0)² + (x2 − 2.5)²) / 40. The training data in the form of interval values of the input and output variables is presented in Table 3.
Fig. 1 “Inputs-outputs” model-generator
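The target model is easy to reproduce numerically; the following sketch is our own Python rendering of the formulas above (not code from the paper) and can be used to regenerate the reference surface.

    def model(x1, x2):
        """Target 'two inputs - two outputs' model-generator."""
        z = ((x1 - 3.0) ** 2 + (x2 - 2.5) ** 2) / 40.0
        y1 = (2 * z - 0.9) * (7 * z - 1) * (17 * z - 19) * (15 * z - 2) / 10.0
        y2 = -y1 / 2 + 1
        return y1, y2

    # At the center of the input domain, z = 0:
    print(model(3.0, 2.5))  # (3.42, -0.71) -- the extremes of the output ranges in Table 3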
Table 3 Training data (X̂s, Ŷs)

  s    x1            x2            y1           y2
  1    [0.2, 1.2]    [0.3, 1.6]    [0, 1.0]     [0.5, 1.0]
  2    [0.2, 1.2]    [1.3, 4.0]    [0, 0.8]     [0.6, 1.0]
  3    [0.7, 3.0]    [0.3, 1.6]    [0, 2.3]     [−0.15, 1.0]
  4    [0.7, 3.0]    [1.3, 4.0]    [0, 3.4]     [−0.7, 1.0]
  5    [3.0, 5.3]    [0.3, 1.6]    [0, 2.3]     [−0.15, 1.0]
  6    [3.0, 5.3]    [1.3, 4.0]    [0, 3.4]     [−0.7, 1.0]
  7    [4.8, 5.8]    [0.3, 1.6]    [0, 1.0]     [0.5, 1.0]
  8    [4.8, 5.8]    [1.3, 4.0]    [0, 0.8]     [0.6, 1.0]
The total number of fuzzy terms for the input variables is limited to three. The total number of combinations of input terms is limited to six. The classes for evaluation of the output variables are formed as follows:
[y , y1] =[0, 0.2) ∪[0.2, 1.2) ∪[1.2, 3.4] , [ y , y2 ] = [−0.7, 0) ∪[0, 1.2] . 1 2 e11
e12
e13
e21
e22
The null solution R 0 presented in Table 4 together with the parameters of the knowledge matrix is obtained using the genetic algorithm. The obtained null solution allows us to arrange for the genetic search for the solution set of the system ˆ ) for the training data take the folˆ ) and μˆ B ( X (9), where the matrices μˆ A ( X s s lowing form: [0.16, 0.74] [0.21, 0.46] [0, 0.50] [0, 0.46] A μˆ = 0 0 0 0
[0.16, 0.52] 0 [0.33, 0.61] [0.28, 0.52] 0 [0.21, 0.46] 0 [0.35, 0.90] [0.28, 0.52] 0 [0.16, 0.74] 0 [0, 0.50] [0.33, 0.61] 0 [0.21, 0.46] 0 [0, 0.50] [0.37, 0.95] 0 ; [0.16, 0.74] [0, 0.50] 0 [0.33, 0.61] [0, 0.50] [0.21, 0.46] [0, 0.46] 0 [0.34, 0.95] [0, 0.50] [0.16, 0.52] [0.16, 0.74] 0 [0.28, 0.52] [0.33, 0.61] [0.21, 0.46] [0.21, 0.46] 0 [0.28, 0.52] [0.35, 0.90]
[0.33, 0.61] [0.35, 0.86] [0.21, 0.74] [0.21, 0.46] μˆ B = [0.21, 0.74] [0.21, 0.50] [0.33, 0.61] [0.35, 0.90]
[0.16, 0.74] [0.21, 0.46] [0.16, 0.50] [0.16, 0.46] [0.16, 0.50] [0.16, 0.46] [0.16, 0.74] [0.21, 0.46]
[0.30, 0.52] [0.30, 0.52] [0.33, 0.61] [0.37, 0.95] [0.33, 0.61] [0.34, 0.95] [0.30, 0.52] [0.30, 0.52]
[0.33, 0.61] [0.35, 0.80] [0.16, 0.74] [0.21, 0.50] [0.16, 0.74] [0.21, 0.50] [0.33, 0.61] [0.35, 0.75]
[0.30, 0.52] [0.30, 0.52] [0.33, 0.61] [0.37, 0.95] . [0.33, 0.61] [0.34, 0.95] [0.30, 0.52] [0.30, 0.52]
Fuzzy Genetic Object Identification: Multiple Inputs/Multiple Outputs Case
383
The complete solution set for the fuzzy relation matrix is presented in Table 5, where inputs x1 , x2 and outputs y1 , y2 are described by fuzzy terms Low (L), Average (A), High (H), higher than Low (hL), lower than Average (lA). The obtained solution provides the approximation of the object shown in Fig. 2. In the experiments, crossover and mutation ratios were set to 0.6 and 0.01, respectively. Beginning with the ten initial rules sets the genetic algorithm could then reach the null solution of optimization problem (8) after 5000 generations. About 1000 generations were required to grow complete solution set for fuzzy relational equations (9) (45 min on Intel Core 2 Duo P7350 2.0 GHz). The resulting solution can be linguistically interpreted as the set of the four possible rules bases (see Table 6), which differ in the fuzzy terms describing output y2 in rule 1 and rule 3 with overlapping weights. Table 4 Fuzzy relational matrix (null solution) IF inputs
THEN outputs
x1
x2
y1
y2
e11
e12
e13
e21
e22
C1
(0.03, 0.72)
(0.01, 1.10)
0.15
0.78
0.24
0.52
0.48
C2
(3.00, 1.77)
(0.02, 1.14)
0.85
0.16
0.02
0.76
0.15
C3
(5.96, 0.71)
(0.04, 0.99)
0.10
0.92
0.27
0.50
0.43
C4
(0.00, 0.75)
(2.99, 2.07)
0.86
0.04
0.30
0.80
0.30
C5
(3.02, 1.80)
(2.97, 2.11)
0.21
0.11
0.10
0.15
0.97
C6
(5.99, 0.74)
(3.02, 2.10)
0.94
0.08
0.30
0.75
0.30
Table 5 Fuzzy relational matrix (complete solution set) IF inputs
THEN outputs
x1
x2
y1
y2
hL
lA
H
lA
L
C1
L
L
[0, 0.21]
[0.74, 1.0]
[0, 0.30]
[0.33, 0.61]
[0, 0.52]
C2
A
L
[0.74, 1.0]
[0, 0.16] ∪ 0.16
[0, 0.30]
[0.74, 1.0]
[0, 0.30]
C3
H
L
[0, 0.21]
[0.74, 1.0]
[0, 0.30]
[0.33, 0.61]
[0, 0.52]
C4
L
H
0.86
[0, 0.16]
0.30
0.80
0.30
C5
A
H
0.21
0.16 ∪ [0, 0.16]
[0.95, 1]
[0, 0.16]
[0.97, 1]
C6
H
H
[0.90, 1.0]
[0, 0.16]
0.30
0.75
0.30
384
A.P. Rotshtein and H.B. Rakytyanska Table 6 System of IF-THEN rules Rule
IF inputs
x1
x2
THEN outputs
y1
y2
1
L
L
lA
lA or L
2
A
L
hL
lA
3
H
L
lA
lA or L
4
L
H
hL
lA
5
A
H
H
L
6
H
H
hL
lA
Fig. 2 “Inputs-outputs” model extracted from data
6 Diagnosis of Heart Diseases The aim is to generate the system of IF-THEN rules for diagnosis of heart diseases. Input parameters are: x1 – aortic valve size (0.75–2.5 cm2); x2 – mitral valve size (1–2 cm2); x3 – tricuspid valve size (0.5–2.7 cm2); x4 – lung artery pressure (65–100 mm Hg). Output parameters are: y1 – left ventricle size (11–14 mm); y2 – left auricle size (40–70 mm); y3 – right ventricle size (36–41 mm); y4 – right auricle size (38–45 mm). The training data obtained in the Vinnica clinic of cardiology is represented in Table 7.
Fuzzy Genetic Object Identification: Multiple Inputs/Multiple Outputs Case
385
Table 7 Training data Input parameters
Output parameters
s
x1
1
0.75-2
2
2
65-69
12-14
41-44
36
38
2
2.0-2.5
2
2
65-69
11-13
40-41
36
38
3
2.0-2.5
1-2
2
71-80
11
40
38-40
40-45 38-40
x2
x3
x4
y1
y2
y3
y4
4
2.0-2.5
2
2
71-80
11
50-70
37-38
5
2.0-2.5
2
0.5-2
72-90
11-12
60-70
40-41
40-45
6
2.0-2.5
1-2
2-2.7
80-90
11-12
40
40-41
38
7
2.0-2.5
2
2
80-100
11
50-60
36
38
8
2.0-2.5
1-2
2-2.7
80-100
11
40
40-41
38-40
In clinical practice, the number of combined heart diseases (aortic-mitral, mitral-tricuspid etc.) is limited to six ( N = 6 ). The classes for output variables evaluation are formed as follows: [ y , y1 ] = [11, 12) ∪ [13, 14] , [ y , y 2 ] = [41, 50) ∪ [50, 70] , 2
1
e11
e12
e21
e22
[ y , y3 ] = [36, 38) ∪ [38, 41] , [ y , y 4 ] = [38, 40) ∪ [40, 45]. 3 4 e31
e32
e41
e42
These classes correspond to the types of diagnoses e j1 low inflation and ej2 dilation of heart sections y1 ÷ y4 . The aim of the diagnosis is to translate a set of specific parameters x1 ÷ x4 into decision e jp for each output y1 ÷ y4 .The null solution R 0 presented in Table 8 together with the parameters of the knowledge matrix is obtained using the genetic algorithm. The obtained null solution allows us to arrange for the genetic search for the solution set of the system (9), where the ˆ ) for the training data take the following form: ˆ ) and μˆ B ( X matrices μˆ A ( X s s [0.62, 0.94] [0.35, 0.62] [0.21, 0.54] [0.21, 0.54] A μˆ = [0.10, 0.54] [0.10, 0.21] [0, 0.21] [0, 0.21]
[0.32, 0.74] [0.74, 0.90] [0.20, 0.52] [0.20, 0.52] [0.08, 0.52] [0.08, 0.21] [0, 0.21] [0, 0.21]
[0.30, 0.40] 0.40 [0.22, 0.56] [0.22, 0.40] [0.07, 0.56] [0.07, 0.22] [0, 0.22] [0, 0.22]
[0.09, 0.31] [0.07, 0.35] [0.08, 0.29] [0.09, 0.31] [0.07, 0.35] [0.08, 0.29] [0.31, 0.72] 0.35 [0.29, 0.77] [0.31, 0.72] 0.35 [0.29, 0.41] ; [0.31, 0.86] [0.35, 0.89] [0.29, 0.41] [0.72, 0.86] [0, 0.35] [0.41, 0.85] [0.72, 0.90] 0.35 0.41 [0.72, 0.90] [0, 0.35] [0.41, 1.0]
386
A.P. Rotshtein and H.B. Rakytyanska
[0.32, 0.40] 0.40 [0.35, 0.77] [0.35, 0.72] B μˆ = [0.35, 0.89] [0.72, 0.86] [0.72, 0.90] [0.72, 0.90]
[0.62,0.94] 0.63 [0.21,0.54] [0.21,0.54] [0.10,0.54] 0.37 0.37 0.37
[0.62, 0.76] [0.74, 0.90] [0.29, 0.76] [0.29, 0.54] [0.29, 0.56] [0.41,0.76] 0.41 [0.41,0.76]
[0.16, 0.35] [0.16, 0.35] [0.35, 0.59] [0.35, 0.59] [0.35, 0.89] 0.59 0.59 0.59
[0.62, 0.94] [0.74, 0.90] [0.31, 0.55] [0.31, 0.55] [0.31, 0.55] 0.55 0.55 0.55
[0.30,0.40] 0.40 [0.35,0.77] [0.35,0.41] [0.35,0.89] [0.41,0.85] 0.41 [0.41,0.88]
[0.62,0.90] [0.74,0.85] [0.31,0.75] [0.31,0.64] [0.31,0.64] [0.64,0.75] 0.64 [0.64,0.75]
[0.30,0.40] 0.40 [0.35,0.56] [0.35,0.40] . [0.35,0.89] [0.26,0.35] 0.35 [0.26,0.35]
Table 8 Fuzzy relational matrix (null solution) IF inputs
x1
THEN outputs
x2
x3
x4
y1
y2
y3
y4
e11
e12
e21
e 22
e31
e32
e41
e 42
(0.75, 1.30)
(2.00, 0.63)
(2.35, 0.92)
(65.54, 8.81)
0.21
0.95
0.76
0.16
0.95
0.10
0.90
0.10
(2.50, 0.95)
(2.00, 0.65)
(2.44, 1.15)
(64.90, 9.57)
0.40
0.63
0.93
0.15
0.90
0.12
0.85
0.06
(2.52, 1.04)
(1.00, 0.82)
(2.32, 0.88)
(69.32, 10.23)
0.92
0.20
0.86
0.08
0.31
0.75
0.14
0.82
(2.55, 0.98)
(2.00, 0.72)
(2.36, 0.90)
(95.07, 21.94)
0.90
0.15
0.24
0.59
0.55
0.02
0.64
0.26
(2.51, 1.10)
(1.92, 0.75)
(0.50, 0.90)
(100.48, 0.85 26.14)
0.18
0.12
0.95
0.10
0.90
0.21
0.93
(2.55, 0.96)
(1.00, 0.94)
(2.30, 1.20)
(95.24, 22.46)
0.37
0.76
0.31
0.22
0.88
0.75
0.14
0.80
The complete solution set for the fuzzy relation matrix is presented in Table 9, where the valve sizes x1 ÷ x3 are described by fuzzy terms stenosis (S) and insufficiency (I); pressure x4 is described by fuzzy terms normal (N) and lung hypertension (H). The obtained solution provides the results of diagnosis presented in Table 10 for 57 patients. Heart diseases diagnosis obtained an average accuracy rate of 90% after 10000 iterations of the genetic algorithm (100 min on Intel Core 2 Duo P7350 2.0 GHz). The resulting solution can be linguistically interpreted as the set of the four possible rules bases (see Table 11), which differ in the fuzzy terms describing outputs y1 and y3 in rule 3 with overlapping weights.
Fuzzy Genetic Object Identification: Multiple Inputs/Multiple Outputs Case
387
Table 9 Fuzzy relational matrix (complete solution set) IF inputs
THEN outputs
x1 x2 x3 x4 y1
y2 L
y3
y4
L
D
D
L
S I
I
N
[0, 0.4]
[0.94, 1] 0.76
0.16
[0.94, 1] [0, 0.3]
I
I
I
N
0.4
0.63
[0, 0.35] [0.9, 1]
I
S
I
N
[0.4, 1]
[0, 0.54] [0.56, 1] [0, 0.35] [0, 0.55] [0.4, 1]
[0.9, 1]
D [0, 0.3]
L
D
0.9
[0, 0.3]
0.85
[0, 0.3]
[0, 0.31] [0.56, 1] 0.26 ∪ [0, 0.26]
I
I
I
H
[0.9, 1]
[0, 0.37] [0, 0.41] 0.59 ∪ 0.37
I
I
S
H
[0.89, 1]
[0, 0.54] [0, 0.56] [0.89, 1] [0, 0.55] [0.89, 1] [0, 0.31] [0.89, 1]
I
S
I
H
[0.77, 0.9]
0.37 ∪ 0.76 [0, 0.37]
0.55
[0, 0.41] 0.64
[0, 0.59] [0, 0.55] [0.85, 1] 0.75
[0, 0.26] ∪ 0.26
Table 10 Genetic algorithm efficiency characteristics Output
Type
Number
Probability
parameter
of diagnose
of cases
of the correct diagnose
y1
e11 ( e12 )
20 ( 37)
17/20=0.85 (34/37=0.92)
y2
e21 ( e22 )
26 (31)
23/26=0.88 (28/31=0.90)
y3
e31 ( e32 )
28 (29)
25/28=0.89 (27/29=0.93)
y4
e41 ( e42 )
40 (17)
37/40=0.92 (15/ 17=0.88)
Table 11 System of IF-THEN rules Rule
IF inputs
THEN outputs
x1
x2
x3
x4
y1
y2
y3
y4
1
S
I
I
N
D
L
L
L
2
I
I
I
N
D
L
L
L
3
I
S
I
N
L or D
L
L or D
D
4
I
I
I
H
L
D
L
L
5
I
I
S
H
L
D
D
D
6
I
S
I
H
L
L
D
L
7 Prediction of Diseases Evolution The aim is to generate the system of IF-THEN rules for prediction of the number of diseases. We consider information on the incidence of appendicular peritonitis disease according to the data of the Vinnitsa clinic of children’s surgery in 19822009 presented in Table 12.
388
A.P. Rotshtein and H.B. Rakytyanska Table 12 Distribution of the diseases number Four-year cycle
Year
Four-year cycle
1982
1983
1984
1985
1986
1987
1988
1989
Number of diseases
109
143
161
136
161
163
213
220
Year
1990
1991
1992
1993
1994
1995
1996
1997
Number of diseases
162
194
164
196
245
252
240
225
Year
1998
1999
2000
2001
2002
2003
2004
2005
237
258
245
230
Number of diseases
160
185
174
207
Year
2006
2007
2008
2009
Number of diseases
145
189
152
186
Analyzing the disease dynamics in Fig. 3, it is easy to observe the presence of four-year cycles the third position of which is occupied by the leap year. These cycles will be denoted as follows: ... x4i −1}{x1i x2i x3i x4i }{x1i +1... , where i is the number of a four-year cycle, x1i is the number of diseases during two years prior to a leap year, x2i is the number of diseases during one year prior to a leap year, x3i is the number of diseases during a leap year, x4i is the diseases number during the year following the leap year.
Fig. 3 Disease dynamics
The network of relations in Fig. 4 shows that it is possible to predict the situation for the next four years: for the last two years of the i-th cycle and for the first two years of the succeeding (i+1)-th cycle using the data of the first two years of the i-th cycle.
Fuzzy Genetic Object Identification: Multiple Inputs/Multiple Outputs Case
389
Fig. 4 A network of relations for prediction
It is necessary to find such knowledge matrices F1 ÷ F 3 , which satisfy the limitations imposed on knowledge base volume and provide the least distance between theoretical and experimental number of diseases: L
L
L −1
L −1
i =1
i =1
i =1
i =1
i i 2 i i 2 i +1 i +1 2 i +1 i +1 2 ( x3 − xˆ 3 ) + ( x4 − xˆ 4 ) + ( x1 − xˆ1 ) + ( x2 − xˆ 2 ) =
min
R1−3 ,Β1−3 ,Ω1−3
,
where x3i , x4i , x1i +1 , x 2i +1 are predicted numbers of diseases depending on parameters Β1− 3 and Ω1− 3 of membership functions and rules weights R1− 3 ; xˆ 3i , xˆ 4i , xˆ1i +1 , xˆ 2i +1 are experimental numbers of diseases; L is the number of fouryear cycles used to extract the model. The total number of fuzzy terms for the sickness rate is limited to five. The total number of combinations of input and output terms is limited to four. The null solutions R10 ÷ R 30 presented in Tables 13-15 together with the parameters of the knowledge matrices F1 ÷ F 3 are obtained using the genetic algorithm. The obtained null solutions allow us to arrange for the genetic search for ˆ ) and μˆ B ( X ˆ ) the solution set of the relations F1 ÷ F 3 . The matrices μˆ1A−3 ( X s 1−3 s formed for each relation F1 − 3 on the basis of the observations during L = 6 fouryear cycles in 1982-2005 take the following form: 0.91 0 0 μˆ1A = 0 0 0
0.20 0 0 0 0 0 0 0.75 0.85 0 0 0.74 0 0 0.99 0.79 0.75 0 0.77 0.84 0 0 0 0.70 0 0 0.74 , μˆ 2A = , μˆ 3A = ; 0 0 0.75 0.67 0.63 0.82 0 0 0 0 0 0.52 0.64 0 0.22 0.99 0 0.91 0 0 0.85 0 0 . 33 0 . 50 0 0 . 78 0 0 0 0 0 0 0 0.96 0.82 0.99 0 0.77 0 0.80 0 0.25 0 0.81 0 0.35 0 0.75 0 0 0.84 0 0 0.80 μˆ1B = , μˆ 2B = , μˆ 3B = . 0.64 0 0.80 0 0.70 0 0 0.87 0 0.27 0 0 0 0.68 0 0.89 0 0.95 0 0 . 50 0 0 . 94 0 . 61 0 0 0 . 98 0
390
A.P. Rotshtein and H.B. Rakytyanska Table 13 Fuzzy relational matrix (null solution) for F1
IF inputs
THEN outputs
x3i
(165.22, 21.15)
(223.64, 21.58)
(169.64, 10.17)
(250.69, 21.92)
x 4i
(138.84, 41.75)
(221.12, 15.82)
(201.04, 8.80)
(235.18, 24.89)
(150.17, 20.81)
0.97
0
0
0
(154.35, 22.68)
(179.89, 28.51)
0.31
0.82
0.75
0.25
(152.63, 21.08)
(191.57, 8.74)
0.17
0
0.69
0
(248.27, 26.92)
(257.64, 9.81)
0
0.57
0
0.87
x1i
x 2i
(103.06, 18.55)
Table 14 Fuzzy relational matrix (null solution) for F 2 IF inputs
THEN outputs
x1i +1
x 4i
(155.08, 12.72)
(240.56, 10.21)
(130.25, 9.86)
0.91
0
(220.11, 6.98)
0.69
0
(209.27, 20.56)
0.77
0.89
Table 15 Fuzzy relational matrix (null solution) for F 3 IF inputs
THEN outputs
x 4i
x1i +1
x2i +1 (162.78, 6.09) (190.20, 7.86)
(135.24, 6.85)
(156.84, 10.07)
0.86
0
0
(222.10, 14.78)
(152.38, 16.54)
0
0.78
0
(203.45, 12.57)
(241.18, 13.26)
0
0
0.94
(256.04, 8.21)
The complete solution sets for the relation matrices F1 ÷ F 3 are presented in Tables 16-18, where the sickness rate is described by fuzzy terms Low (L), lower than Average (lA), Average (A), higher than Average (hA), High (H).
Fuzzy Genetic Object Identification: Multiple Inputs/Multiple Outputs Case
391
Table 16 Fuzzy relational matrix (complete solution set) for F1 IF inputs
THEN outputs
x3i
lA
hA
lA
H
x 4i
L
hA
A
hA
lA
[0.91, 1.0]
0
0
0
x1i L
x 2i
lA
lA
0.31 ∪ [0, 0.31]
[0.74, 1.0]
0.75 ∪ [0, 0.75]
0.25
lA
A
[0, 0.31] ∪ 0.31
0
[0.64, 0.75] ∪ 0.75
0
H
H
0
0.57
0
[0.85, 1.0]
Table 17 Fuzzy relational matrix (complete solution set) for F 2 IF inputs
THEN outputs
x 4i
x1i +1
lA
H
[0.75, 1.0]
0
hA
0.77 ∪ [0.67, 0.77]
0
A
[0.50, 0.77] ∪ 0.77
0.89
L
Table 18 Fuzzy relational matrix (complete solution set) for F 3 IF inputs
THEN outputs
x 4i
x1i +1
x2i +1
lA
A
H
L
lA
[0.85, 1.0]
0
0
hA
lA
0
0.78
0
A
H
0
0
[0.91, 1.0]
The obtained solution provides the results of up to 2013 prediction presented in Fig. 5. Since experimental values of the numbers of appendicular peritonitis diseases in 2006-2009 have not been used for fuzzy rules extraction, the proximity of the theoretical and experimental results for these years demonstrates the sufficient quality of the constructed prediction model from the practical viewpoint. A comparison of the results of simulation with the experimental data is presented in Table 19. About 8500 generations were required to grow the complete solution set for fuzzy relations F1 ÷ F 3 (90 min on Intel Core 2 Duo P7350 2.0 GHz).
392
A.P. Rotshtein and H.B. Rakytyanska
Fig. 5 Comparison of the experimental data and the extracted fuzzy model Table 19 Prediction of the number of diseases Four-year cycle Year
1982
1983
Experiment
109
143
Theory Error Year
1990
1991
Four-year cycle 1984
1985
1986
1987
1988
1989
161
136
161
163
213
220
170
140
168
175
220
229
9
4
7
12
7
9
1992
1993
1994
1995
1996
1997
Experiment
162
194
164
196
245
252
240
225
Theory
174
205
170
183
236
244
255
211
Error
12
11
6
13
9
8
15
14
Year
1998
1999
2000
2001
2002
2003
2004
2005
Experiment
160
185
174
207
237
258
245
230
Theory
147
180
147
190
250
238
234
215
Error
13
5
27
17
13
20
11
15
Year
2006
2007
2008
2009
2010
2011
2012
2013
Experiment
145
189
152
186
Theory
172
200
161
200
239
247
252
216
Error
27
11
9
14
The resulting solution for relation F1 can be translated as the set of the two possible rules bases (see Table 20), which differ in the combinations of the fuzzy terms describing outputs x3i and x4i in rule 2 with overlapping weights. To provide more reliable forecast, the resulting solution for relation F 2 can also be translated as the set of the two rules bases (see Table 21), which differ in the fuzzy terms describing output x1i +1 in rule 3 with sufficiently high weights.
Fuzzy Genetic Object Identification: Multiple Inputs/Multiple Outputs Case
393
Table 20 System of IF-THEN rules for F1 F1
IF inputs
THEN outputs
Rule
x1i
x2i
x3i
x4i
1
L
lA
lA
L
2
lA
lA
hA
hA
lA
A
or 3
lA
A
lA
A
4
H
H
H
hA
Table 21 System of IF-THEN rules for F 2 and F 3 F2
IF input
THEN output
F3
IF inputs
Rule
x 4i
x1i +1
Rule
x 4i
1
L
lA
1
L
lA
lA
2
hA
lA
2
hA
lA
A
3
A
lA or H
3
A
H
H
x1i +1
THEN output
x2i +1
8 Conclusions This paper proposes a method based on fuzzy relational equations and genetic algorithms to identify MIMO systems. In experimental data analysis rule generation combined with solving fuzzy relational equations is a promising technique to restore and identify the relational matrix together with a rule based explanation. The method proposed is focused on generating accurate and interpretable fuzzy rulebased systems. The results obtained by the application of the genetic algorithm depend on the randomness of the training data initialization, e.g., on the generation of the training intervals during the execution. It may be the case that the model has the highest rule performance only with the special test and training data partition that was used to build and test the model. In the course of the computer experiment the training intervals are generated artificially. For the practical applications the intervals can be derived directly from the problem. Although the work presented here shows good practical results, some future investigations are still needed. While the theoretical foundations of the fuzzy relational equations are well developed, they still call for more efficient and diversified schemes of solution finding. The issue of adaptation of the resulting solution while the samples of experimental data (training intervals) are changing remains unresolved. The genetically guided global optimization should be augmented by more refined gradient-based adaptation mechanisms to provide the invariability of the generated fuzzy rule-based systems. Such an adaptive approach envisages the development of a hybrid genetic and neuro algorithm for solving fuzzy relational
394
A.P. Rotshtein and H.B. Rakytyanska
equations. By our new hybrid approach it will be possible to avoid random effects caused by different partitions of training and test data by detecting a representative set of rules bases.
References [Bourke and Fisher 2000] Bourke, M.M., Fisher, D.G.: Identification algorithms for fuzzy relational matrices. Part 2: Optimizing algorithms. Fuzzy Sets Syst. 109(3), 321–341 (2000) [Branco and Dente 2000] Branco, P.J., Dente, J.A.: A fuzzy relational identification algorithm and its application to predict the behaviour of a motor drive system. Fuzzy Sets Syst. 109(3), 343–354 (2000) [Di Nola et al. 1989] Di Nola, A., Sessa, S., Pedrycz, W., Sancez, E.: Fuzzy relation equations and their applications to knowledge engineering. Kluwer Academic Press, Dordrecht (1989) [Higashi and Klir 1984] Higashi, M., Klir, G.J.: Identification of fuzzy relation systems. IEEE Trans. on Syst. Man Cybern. 14, 349–355 (1984) [Peeva and Kyosev 2004] Peeva, K., Kyosev, Y.: Fuzzy relational calculus theory Applications and software. World Scientific, New York (2004) [Pedrycz 1984] Pedrycz, W.: An identification algorithm in fuzzy relational systems. Fuzzy Sets Syst. 13, 153–167 (1984) [Pedrycz 1988] Pedrycz, W.: Approximate solutions of fuzzy relational equations. Fuzzy Sets Syst. 28(2), 183–202 (1988) [Rotshtein 1998] Rotshtein, A.: Design and tuning of fuzzy rule-based systems for medical diagnosis. In: Teodorescu, N.H., Kandel, A., Gain, L. (eds.) Fuzzy and Neuro-fuzzy Systems in Medicine, pp. 243–289. CRC Press, Boca Raton (1998) [Rotshtein et al. 2006] Rotshtein, A., Posner, M., Rakytyanska, H.: Cause and effect analysis by fuzzy relational equations and a genetic algorithm. Reliab. Eng. Syst. Saf. 91(9), 1095–1101 (2006) [Rotshtein and Rakytyanska 2008] Rotshtein, A., Rakytyanska, H.: Diagnosis problem solving using fuzzy relations. IEEE Trans. Fuzzy Syst. 16(3), 664–675 (2008)
Server-Side Query Language for Protein Structure Similarity Searching B. Małysiak-Mrozek, S. Kozielski, and D. Mrozek Institute of Informatics, Silesian University of Technology, Gliwice, Poland {bozena.malysiak,stanislaw.kozielski,dariusz.mrozek}@polsl.pl
Abstract. Protein structure similarity searching is a complex process, which is usually carried out through comparison of the given protein structure to a set of protein structures from a database. Since existing database management systems do not offer integrated exploration methods for querying protein structures, the structural similarity searching is usually performed by external tools. This often lengthens the processing time and requires additional processing steps, like adaptation of input and output data formats. In the paper, we present our extension to the SQL language, which allows to formulate queries against a database in order to find proteins having secondary structures similar to the structural pattern specified by a user. Presented query language is integrated with the relational database management system and it simplifies the manipulation of biological data.
1 Introduction Proteins are biological molecules that play very important role in all biological reactions in living cells. They are involved in many processes, like: reaction catalysis, energy storage, signal transmission, maintaining of cell structure, immune response, transport of small biomolecules, regulation of cell growth and division. 1.1 Basic Concepts and Definitions Analyzing the general construction of proteins, they are macromolecules built with amino acids (usually more than 100 amino acids, aa), which are linked to each other by peptide bonds forming a kind of linear chain. Formally, in the construction of proteins we can distinguish four description or representation levels: primary structure, secondary structure, tertiary structure and quaternary structure. Primary structure is defined by amino acid sequence in protein linear chain. There are 20 standard amino acids found in most living organisms. Examples of amino acid sequences of myoglobin and hemoglobin molecules are presented in Fig. 1. Each letter in a sequence corresponds to one amino acid in the protein chain.
Z.S. Hippe et al. (Eds.): Human – Computer Systems Interaction, AISC 99, Part II, pp. 395–415. springerlink.com © Springer-Verlag Berlin Heidelberg 2012
396
B. Małysiak-Mrozek, S. Kozielski, and D. Mrozek
>1MBN:A|PDBID|CHAIN|SEQUENCE VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALG AILKKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRHPGDFGADAQGAMNKALELFRKDIAAKY KELGYQG >4HHB:A|PDBID|CHAIN|SEQUENCE VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHV DDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
Fig. 1 Protein primary structures (amino acid sequences) in FASTA format for two protein molecules: myoglobin (1MBN) and hemoglobin (4HHB, chain A)
Secondary structure describes spatial arrangement of amino acids located closely in the sequence. This description level distinguishes in the spatial structure some characteristic, regularly folded substructures. Examples of the secondary structures are α-helices, β-sheets and loops. Spiral shapes of α-helices are visible in tertiary structures presented in Fig. 2. Tertiary structure (Fig. 2a, Fig. 2b) refers to spatial relationships and mutual arrangement of amino acids located closely and distantly in the protein sequence. Tertiary structure describes the configuration of a protein structure caused by additional, internal forces, like: hydrogen bonds, disulfide bridges, attractions between positive and negative charges, and hydrophobic and hydrophilic forces. This description level characterizes the biologically active spatial conformation of proteins. Quaternary structure refers to proteins made up of more than one amino amid chain. This level describes the arrangement of subunits and the type of their contact, which can be covalent or not covalent (Fig. 2c). The last three representation levels define protein conformation or protein spatial structure, which is determined by location of atoms in the 3D space. The biochemical analysis of proteins is usually carried on one of the description levels and depends on the purpose of the analysis.
a)
b)
c)
Fig. 2 Protein spatial structures represented by secondary structure elements: a) tertiary structure of myoglobin (1MBN), b) tertiary structure of hemoglobin (4HHB, chain A), c) quaternary structure of hemoglobin (4HHB, all chains)
Server-Side Query Language for Protein Structure Similarity Searching
397
1.2 Scope of the Paper In the paper, we concentrate on secondary structures, which are valuable source of information regarding the construction of protein molecules. Secondary structures provide more information about the spatial construction of proteins than primary structures. On the other hand, they are so straightforward that allow the analysis of proteins at a general level, which is often used in protein similarity search tools. This organization level of protein structure allows studying a general shape of proteins and formation of amino acid chain caused by local hydrogen interactions [Allen 2008]. Visualizing protein spatial structures by secondary structure elements, as it is presented in Fig. 2, allows to reveal and discover what types of characteristic spatial elements are present in the protein conformation (whether there are only α-helices or only β-strands in the structure, or maybe both) and what is their arrangement (whether they are heavily segregated or appear alternately). Secondary structure representation of proteins became very important in the analysis of protein constructions and functions. Therefore, it is frequently used in the protein structure similarity searching performed by various algorithms, like these presented in [Shapiro et al. 2004, Can and Wang 2003, Yang 2008]. If we compare amino acid sequences of myoglobin and hemoglobin in Fig. 1, we can conclude they are not very similar. However, if we compare their tertiary structures in Fig. 2a and Fig. 2b represented by secondary structure elements, we can see they are structurally similar. Since we know these two molecules have similar function, which is oxygen transport, we can confirm the thesis that structural similarity often implies the functional similarity. Moreover, functionally similar molecules do not have to posses similar amino acid sequences. This simple example shows how important secondary structures are. 1.3 Goal of Our Work For scientists that study the structure and function of proteins, it is very important to have the ability to search for structures similar to the construction of a given structure. This is usually hindered by several factors: • Data describing protein structures are managed by database management systems (DBMSs), which work excellent in commercial uses. However, they are not dedicated for storing and processing biological data. They do not provide the native support for processing biological data with the use of the SQL language, which is a fundamental, declarative way of data manipulation in most database systems. • Processing must be performed by external tools and applications, which is a big disadvantage. • Results are returned in different formats, like: table-form data sets, TXT or XML files, and users must adopt them. • Secondary processing of the data is difficult and requires additional tools.
398
B. Małysiak-Mrozek, S. Kozielski, and D. Mrozek
For these reasons, we have decided to develop an effective, dedicated language for querying protein structures on the secondary structure level. Protein Secondary Structure – Structured Query Language (PSS-SQL) that we have designed and developed supports searching the database against proteins having their structure similar to the structure specified in a user’s query. Moreover, the PSS-SQL extends standard syntax of the SQL language and becomes a declarative method of protein similarity searching, which is integrated with the database server. With the use of the PSS-SQL users are able to retrieve data from bio-databases in a standardized manner by formulating appropriate PSS-SQL queries and receive results in a uniform, tabular form (Fig. 3).
User application User web site
PSSPSS-SQL queries
TableTable-format results
PSS-SQL extension
BIO database
User tool Fig. 3 Exploring bio-databases using PSS-SQL language
2 Effective Storage of Secondary Structures in Database Retrieving protein structures by formulating queries in PSS-SQL requires specific storage format for data describing protein secondary structures. In our solution, we store protein structures as sequences of secondary structure elements (SSE). Each SSE corresponds to one amino acid in the primary structure. In Fig. 4 we show the amino acid sequence of the 4'-phosphopantetheinyl transferase acpT in the Salmonella typhimurium and the corresponding sequence of SSEs. Particular elements have the following meaning: H denotes α-helix, E denotes β-strand, C (or L) stands for loop, turn or coil. Such a representation of protein structure is very simple in terms of storing the structure in a database and allows effective processing and searching. Data describing types and location of SSEs in the protein structure may come from different sources – they can be extracted directly from the Protein Data Bank [Berman et al. 2000], taken from systems that classify protein structures, like SCOP [Murzin et al. 1995] or CATH [Orengo et al. 1997], or generated using programs that predict secondary structures on the basis of primary structures.
Server-Side Query Language for Protein Structure Similarity Searching
399
Q8ZLE2 ACPT_SALTY 4'-phosphopantetheinyl transferase acpT OS=Salmonella typhimurium GN=acpT PE=3 SV=1 MYQVVLGKVSTLSAGQLPDALIAQAPQGVRRASWLAGRVLLSRALSPLPEMVYGEQGKPAFSAGAPLWFNLSHSGDTIALLLS DEGEVGCDIEVIRPRDNWRSLANAVFSLGEHAEMEAERPEQQLAAFWRIWTRKEAIVKQRGGSAWQIVSVDSTLPSALSVSQC QLDTLSLAVCTPTPFTLTPQTITKAL CCCEEEECEEECCCCCCCCCEEECCCCCCHHHHHHHHHHHHHHHCCCCCCEEEECCCCCCCCCCCCCEEEEECCCCEEEEEEC CCCCCEEEEEEECCCCCHHHHHHHHHCCCHHHHHHHHCCCHHHHHHHHHHHHHHHHHHHCCCCCEEEEEECCCCCCCCCCCCC CCCEEEEEEECCCCCCCCCCCCCCCC
Fig. 4 Sample amino acid sequence of the protein 4'-phosphopantetheinyl transferase acpT in the Salmonella typhimurium with the corresponding sequence of secondary structure elements
Nevertheless, they should be represented in the common format as a sequence of H, E, C/L symbols. In our research, we store sequences of SSEs in the ProteinTbl table of the Proteins database. The schema of the table is presented in Fig. 5. This table will be used in all examples presented in following chapters. id ---799 800 808 809 810
protID -----------ABCX_GUITH 1A02_GORGO 1A110_ARATH 1A111_ARATH ABCX_PORPU
protAC -----O78474 P30376 Q9LQ10 Q9S9U6 P51241
name ---------------Probable ATP-... Class I histo... Probable amin... 1-aminocyclop... Probable ATP-...
length -----253 365 557 460 251
primary ---------------MKKKILEVTNLHA... MAVMAPRTLLLLL... MTRTEPNRSRSSN... MLSSKVVGDSHGQ... MSDYILEIKDLHA...
secondary ---------------CCCCEEECCCHHH... CCCCHHHHHHHHH... CCCCCCCCCCCCC... CCCEEEECCCCCC... CCCHHHHHHHHHH...
Fig. 5 Schema of the table storing protein sequences of SSEs
Detailed description of particular fields of the ProteinTbl is presented in Table 1. Table 1 Description of particular fields of the ProteinTbl database table Field
Description
id
internal identifier of protein in a database
protAC
protein Accession Number
protID
protein IDentification in the popular SwissProt database
name
protein name and description
length
protein length in amino acids
primary
primary structure of a protein (amino acid sequence)
secondary
sequence of secondary structure elements of a protein
3 Alignment Method for Sequences of Secondary Structure Elements In our approach, we assume that the similarity searching is performed by a pairwise comparison of the query sequence of SSEs to a candidate sequence from database. The PSS-SQL language makes use of the alignment procedure in order to
400
B. Małysiak-Mrozek, S. Kozielski, and D. Mrozek
match two sequences of SSEs. In the section, we describe how the alignment method works. Suppose we have two proteins A and B, one of which represents the given pattern and the other a candidate protein from the database. We represent primary structures of proteins A and B in the following form: P A = p1A p2A ... pnA and P B = p1B p2B ... pmB , where: n is a length of the protein A (in amino acids), m is a
length of the protein B, pi ∈ P , and P is a set of 20 common types of amino acids. We represent secondary structures of proteins A and B in the following form: S A = s1A s2A ...snA and S B = s1B s2B ...smB , where: si ∈ S is a single secondary structure
element (SSE), which corresponds to the i-th amino acid pi, S = {H , E , C , ?} is a set of 3 types of the secondary structures: H denotes α-helix, E denotes β-strand, C stands for loop, turn or coil, the ? symbol corresponds to any of the mentioned SSEs. The alignment is carried out using the Smith-Waterman method [Smith and Waterman 1981]. The method was originally intended to align two input nucleotide sequences of DNA/RNA and amino acid sequences of proteins. However, we modified the Smith-Waterman method to align two sequences of SSEs – one of the sequences is the query sequence, and the second one is a candidate sequence from a database. Moreover, the modified version of the Smith-Waterman method returns more than one optimal solution, by reason of the approximate character of the specified query pattern. In PSS-SQL queries, the pattern is represented by a sequence of segments, where each segment can be defined precisely or by an interval (details concerning the definition of query patterns are described in chapter 4.2). For example, in the pattern h(4),e(2;5), c(2;4) we can distinguish an α-helix containing exactly 4 elements, followed by β-strand of the length 2 to 5 elements, and loop of the length between 2 and 4 elements. During the alignment phase the pattern is expanded to the full possible length, e.g. for the given pattern, it takes the following form HHHHEEEEECCCC. In this form it may take part in comparison to a candidate SSEs sequences from the database. In the alignment process we build the similarity matrix D according to the following rules – for 0 ≤ i ≤ n and 0 ≤ j ≤ m :
Di , 0 = D0, j = 0 ,
(1)
Di(,1j) = Di −1, j −1 + δ ( siA , s Bj ) ,
(2)
= max{Di −k , j − ωk } ,
(3)
( 2) i, j
D
1≤ k ≤n
Di(,3j) = max{Di , j −l − ωl } ,
(4)
Di(,4j) = 0 ,
(5)
Di , j = max{Di(,vj) } ,
(6)
1≤l ≤ m
v =1..4
Server-Side Query Language for Protein Structure Similarity Searching
where: δ ( siA , s Bj ) is an award
δ + , if two SSEs from proteins A and B match to
each other, or a penalty for a mismatch
1 if δ (s , s ) = − 1 if A i
ωk
B j
401
δ − , if they do not match:
siA = s Bj , siA ≠ s Bj
(7)
is a penalty for a gap of the length k:
ω k = ωO + k × ω E ,
(8)
where: ωO = 3 is a penalty for opening a gap, ω E = 0.5 is a penalty for a gap extension. In Fig. 6 we show the scoring matrix for particular pairs of SSEs. This scoring system, with such values of gap penalties, promotes longer alignments, without gaps. Although occurrence of gaps is still possible in the run of algorithm, we assume users can determine places of possible gaps by specifying optional segments in a query pattern.
Fig. 6 Scoring system for compared pairs of secondary structure elements
Filled similarity matrix D consists of many possible paths how two sequences of SSEs can be aligned. In the set of possible paths the modified Smith-Waterman method finds and joins these paths that give the best alignment. Backtracking from the highest scoring matrix cell and going along until a cell with score zero is encountered gives the highest scoring alignment path (Fig. 7). However, in the modified version of the alignment method that we have developed, we find many possible alignments by searching consecutive maxima in the similarity matrix D. This is necessary, since the pattern is usually not defined precisely, contains ranges of SSEs or undefined elements. Therefore, there can be many regions in a protein structure that fit the pattern. In the process of finding alternative alignment paths, the modified Smith-Waterman method follows the value of the internal parameter MPE (Minimum Path End), which defines the stop criterion. We find alignment paths until the next maximum in the similarity matrix D is lower than the value of the MPE parameter. The value of the MPE depends on the specified pattern, according to the following formula.
MPE = ( MPL × δ + ) + ( NoIS × δ − ) ,
(9)
where: MPL is a minimum pattern length, NoIS is a number of imprecise segments, i.e. segments, for which minimum length is different than maximum
402
B. Małysiak-Mrozek, S. Kozielski, and D. Mrozek
length. E.g. for the structural pattern h(10;20),e(1;10),c(5),e(5;20) containing αhelix of the length 10 to 20 elements, β-strand of the length 1 to 10 elements, loop of the length 5 elements, and β-strand of the length 5 to 20 elements, the MPL=21 (10 elements of the type h, 1 element of the type e, 5 elements of the type c, and 5 elements of the type e), the NoIS=3 (first, second, and fourth segment), and therefore, MPE=18.
Fig. 7 Similarity matrix D showing one of possible alignment paths
The Score similarity measure is calculated for each of possible alignment paths and it totals all similarity awards
ωk
δ + , mismatch penalties δ −
and gap penalties
according to the following formula:
Score = δ + + δ − − ωk .
(10)
Server-Side Query Language for Protein Structure Similarity Searching
403
4 Protein Secondary Structure – Structured Query Language Protein Secondary Structure – Structured Query Language (PSS-SQL) extends the standard syntax of the SQL language providing additional functions that allow to search protein similarities on secondary structures. We disclose two important functions to this purpose: containSequence and sequencePosition, which will be presented in this chapter. However, PSS-SQL covers also a series of supplementary procedures and functions, which are used implicitly, e.g. for extracting segments of particular types of SSEs, building additional segment tables, indexing SSEs sequences, processing these sequences, aligning the target structures from a database to the pattern, validating patterns, and many other operations. The PSSSQL extension was developed in the C# programming language. All procedures were gathered in the form of the ProteinLibrary DLL file and registered for the Microsoft SQL Server 2005/2008 (Fig. 8). DBMS Microsoft SQL Server PSS-SQL Extension
Protein database
Users applications ProteinLibrary DLL
Fig. 8 General architecture of the system with the PSS-SQL extension
4.1 Indexing Sequences of SSEs PSS-SQL benefits from additional indexing structures that should be set on columns storing sequences of SSEs. These indexing structures are not required, but strongly recommended as they accelerate the searching. Calling appropriate procedure usp_indexSSE causes the creation of additional segment table, which is stored in the structure of B-Tree clustered index. EXEC dbo.usp_indexSS @columnName = 'secondary', @indexName = 'TIDX_Secondary';
The @ColumnName parameter indicates which column contains the indexed sequence of SSEs. Execution of a procedure creates a segment table, which name is specified in the @indexName parameter. The segment table (Fig. 9) contains extracted information regarding consecutive segments of particular types of SSEs (type), their lengths (length) and positions (startPos). The information accelerates the process of similarity searching through the preliminary filtering of protein structures that are not similar to the query pattern. In the filtering, we extract the most characteristic features of the query pattern and, on the basis of the information in the index, we eliminate proteins that do not meet the similarity criteria. In the next phase, proteins that pass the preselection are aligned to the query pattern.
404
B. Małysiak-Mrozek, S. Kozielski, and D. Mrozek id ----67 68 69 70 71 72
protID -----3 3 3 3 3 3
type ---C H C H C E
startPos -------0 3 26 34 46 49
length -----3 23 8 12 3 3
Fig. 9 Part of the segment table
Metadata describing the relationship between particular segment table and the column describing secondary structures are stored in the MappingTbl table, which structure is presented in Fig. 10. id columnName indexName maxLength ----- ------------ --------------- --------3 secondary TIDX_Secondary 290
Fig. 10 Part of the segment table
4.2 Representation of Structural Pattern in PSS-SQL Queries While searching protein similarities on secondary structures, we need to pass the query structure (query pattern) as a parameter of the search process. Similarly to the storage format, in PSS-SQL queries the pattern is represented as a sequence of SSEs. However, the form of the sequence is slightly different. During the development of the PSS-SQL functionality we assumed the new extensions should allow users to formulate a large number of various query types with different degrees of complexity. Moreover, the form of these extensions should be as simple as possible and should not cause any syntax difficulties. Therefore, we have defined the corresponding grammar in order to help constructing the query pattern. In PSS-SQL queries, the sequence of SSEs is represented by blocks of segments. Each segment is determined by its type and length. The segment length can be represented precisely or as an interval. It is possible to define segments, for which the type is not important or undefined (wildcard symbol ‘?’), and for which the end value of the interval is not defined (wildcard symbol ‘*’). The grammar for defining patterns written in the Chomsky notation has the following form. The grammar is formally defined as the ordered quad-tuple : Gpss = , where the symbols respectively mean: Npss – a finite set of nonterminal symbols, Σpss – a finite set of terminal symbols, Ppss – a finite set P of production rules, Spss – a distinguished symbol S ∈ Npss that is the start symbol.
Server-Side Query Language for Protein Structure Similarity Searching
405
Σpss = {c, h, e, ?, *, N+} Npss = { <sequence>, , <segment>, , , <end>, , <whole_number_greater_than_zero_and_zero>, } Ppss = { <sequence> ::= ::= <segment> | <segment>, <segment> ::= (; <end>) | () ::= <whole_number_greater_than_zero_or_zero> <end> ::= <whole_number_greater_than_zero_or_zero> | ::= <whole_number_greater_than_zero_or_zero> ::= c | h | e | ? <whole_number_greater_than_zero_or_zero> ::= N+ | 0 ::= * } Spss = <sequence>
Assumption: The following terms are compliant with the defined grammar Gpss: • h(1;10) – representing α-helix of the length 1 to 10 elements • e(2;5),h(10;*),c(1;20) – representing β-strand of the length 2 to 5 elements, followed by α-helix of the length at least 10 elements, and loop of the length 1 to 20 elements • e(10;15),?(5;20),h(35) – representing β-strand of the length 10 to 15 elements, followed by any element of the length 5 to 20, and α-helix of the exact length 35 elements With such a representation of the query pattern, we can start the search process using the containSequence and sequencePosition functions described in next sections. 4.3 Cheking for Presence of Query Pattern in Protein Structures The containSequence function allows to check if a particular protein or set of proteins from a database contain the structural pattern specified as a sequence of SSEs. This function returns Boolean value 1, if the protein from a database contains specified pattern, or 0, if the protein does not include the particular pattern. The header of the containSequence function is as follows: FUNCTION containSequence ( @proteinId int, @columnSSeq text, @pattern varchar(4000) ) RETURNS bit
Input arguments of the containSequence function are described in Table 2. Table 2 Arguments of the containSequence function Argument
Description
@proteinId
unique identifier of protein in the table that contains sequences of SSEs (e.g. the id field in case of the ProteinTbl)
@columnSSeq
database field containing sequences of SSEs of proteins (e.g. secondary)
@pattern
pattern that defines the query SSEs sequence represented by a set of segments, e.g. h(2;10), c(1;5),?(2;*)
406
B. Małysiak-Mrozek, S. Kozielski, and D. Mrozek
The containSequence function can be used both in SELECT and WHERE phrases of the SQL SELECT statement. In Fig. 11 we schematically show how to construct PSS-SQL query with the containSequence function. id protAC protID ----- ------- -----------3294 P01903 2DRA_HUMAN 3295 Q30631 2DRA_MACMU 3296 P11887 2ENR_CLOTY 1.3297 DB table storing Q01284 2NPD_NEUCR
name --------------HLA class II... HLA class II... 2-enoate red... 2-nitropropa...
length -----254 254 30 378
primary --------------MAISGVPVLGFF... MAESGVPVLGFF... MKNKSLFEVIKI... MHFPGHSSKKEE...
secondary structures
secondary --------------CCCCCEEECCCE... CCCCCEEECCCE... CCCCCCEEEEEC... CCCCCCCCCHHH...
2. Processed field
from DB table SELECT id, protID, protAC, name FROM ProteinTbl WHERE containSequence(id,'secondary','c(10;20),e(7;20),c(1;20)') AND name like '%Arabidopsis thaliana%' 3. Pattern -- Results: id protID ---- -----------175 A494_ARATH 443 AAH_ARATH 522 AASS_ARATH 553 AAT1_ARATH 560 AAT2_ARATH ...
protAC -------P43295 O49434 Q9SMZ4 P46643 P46645
4. Additional filtering criteria
name ------------------------------------Probable cysteine proteinase A494 OS= Allantoate deiminase, chloroplastic O Alpha-aminoadipic semialdehyde syntha Aspartate aminotransferase, mitochond Aspartate aminotransferase, cytoplasm
5. Results
Fig. 11 Construction of PSS-SQL queries with containSequence function
Using the function in the SELECT statement allows to display information, whether the protein or set of proteins contain a specified pattern. Below, we present an example of using the containSequence function in order to verify, whether the structure of the Q9FHY1 protein has the structural region containing β-strand of the length 7 to 20 elements, surrounded by two loops, one of the length 10 to 20 elements, and second of the length of 1 to 20 elements – pattern c(10;20),e(7;20), c(1;20). SELECT id, protID, protAC, name, containSequence(id,'secondary','c(10;20),e(7;20),c(1;20)') AS containSeq FROM ProteinTbl WHERE protAC='Q9FHY1'
Results of the verification are shown in Fig. 12. id protID protAC name containSeq ---- ------------ -------- ------------------------------------- ---------964 ABIL4_ARATH Q9FHY1 Protein ABIL4 OS=Arabidopsis thaliana 0
Fig. 12 Result of the verification for the protein Q9FHY1
The following query shows an example of using the containSequence function in order to display, whether proteins from the Arabidopsis thaliana species contain the given pattern (containSeq=1) or not (containSeq=0). Structural pattern is the same as in previous example.
Server-Side Query Language for Protein Structure Similarity Searching
407
SELECT id, protID, protAC, name, containSequence(id,'secondary','c(10;20),e(7;20),c(1;20)') AS containSeq FROM ProteinTbl WHERE name like '%Arabidopsis thaliana%'
Results of the search process are shown in Fig. 13. id ---175 244 443 522 553 560 ...
protID -----------A494_ARATH A9_ARATH AAH_ARATH AASS_ARATH AAT1_ARATH AAT2_ARATH
protAC -------P43295 Q00762 O49434 Q9SMZ4 P46643 P46645
name ------------------------------------Probable cysteine proteinase A494 OS= Tapetum-specific protein A9 OS=Arabid Allantoate deiminase, chloroplastic O Alpha-aminoadipic semialdehyde syntha Aspartate aminotransferase, mitochond Aspartate aminotransferase, cytoplasm
containSeq ---------1 0 1 1 1 1
Fig. 13 Partial result of the search process for proteins from the Arabidopsis thaliana species
Using the containSequence function in the WHERE clause allows to find proteins that contain the specified pattern. Below is an example of using the function for searching proteins from the Escherichia coli that contain the pattern h(5;15),c(3),?(6),c(1;*). SELECT id, protID, protAC, name, primary, secondary FROM ProteinTbl WHERE containSequence(id,'secondary','h(5;15),c(3),?(6),c(1;*)')=1 and name like '%Escherichia coli%'
Results of the searching process are shown in Fig. 14. id ---1294 1295 1296 1297 1298 1299 1300 1301
protID -----------ACCA_ECO24 ACCA_ECO57 ACCA_ECOHS ACCA_ECOK1 ACCA_ECOL5 ACCA_ECOL6 ACCA_ECOLI ACCA_ECOUT
protAC -------A7ZHS5 P0ABD6 A7ZWD1 A1A7M9 Q0TLE8 Q8FL03 P0ABD5 Q1RG04
name ----------------------Acetyl-coenzyme A ca... Acetyl-coenzyme A ca... Acetyl-coenzyme A ca... Acetyl-coenzyme A ca... Acetyl-coenzyme A ca... Acetyl-coenzyme A ca... Acetyl-coenzyme A ca... Acetyl-coenzyme A ca...
primary ----------------------MSLNFLDFEQPIAELEAKID... MSLNFLDFEQPIAELEAKID... MSLNFLDFEQPIAELEAKID... MSLNFLDFEQPIAELEAKID... MSLNFLDFEQPIAELEAKID... MSLNFLDFEQPIAELEAKID... MSLNFLDFEQPIAELEAKID... MSLNFLDFEQPIAELEAKID...
secondary --------------------------CCCCCCCCHHHHHHHHHHHHHCCH... CCCCCCCCHHHHHHHHHHHHHCCH... CCCCCCCCHHHHHHHHHHHHHCCH... CCCCCCCCHHHHHHHHHHHHHCCH... CCCCCCCCHHHHHHHHHHHHHCCH... CCCCCCCCHHHHHHHHHHHHHCCH... CCCCCCCCHHHHHHHHHHHHHHHH... CCCCCCCCHHHHHHHHHHHHHHHH...
Fig. 14 Partial result of searching process for proteins from the Escherichia coli having given structural pattern h(5;15),c(3),?(6),c(1;*)
4.4 Locating Patterns in Protein Structures The sequencePosition function allows to locate the specified pattern in the structure of a protein or group of proteins in a database. Pattern searching is performed with the use of segment table and through alignment of protein secondary structures. For this purpose, we have adapted the Smith-Waterman alignment method.
408
B. Małysiak-Mrozek, S. Kozielski, and D. Mrozek
The header of the sequencePosition function is as follows: FUNCTION sequencePosition ( @columnSSeq text, @pattern varchar(4000), @predicate varchar(4000) ) RETURNS @resultTable table ( proteinId int, startPos int, endPos int, length int, gapsCount int, sequence text )
Input arguments of the sequencePosition function are described in Table 3. Table 3 Arguments of the sequencePosition function Argument
Description
@columnSSeq
database field that contains sequences of SSEs, e.g. secondary
@pattern
pattern that defines the query SSEs sequence represented by a set of segments, e.g.: h(2;10), c(1;5),?(2;*)
@predicate
an optional, simple or complex criteria that allow to limit the list of proteins that will be processed during the search, e.g.: name LIKE '%phosphogluconolactonase%'
The sequnecePosition function returns a table containing information about the location of query pattern in the structure of each database protein. Fields of the output table is described in Table 4. Table 4 Output table of the sequencePosition function Field
Description
proteinId
unique identifier of protein that contains specified pattern; using the identifier we can join resultant table with data from other tables
startPos
position, where the pattern starts in the target protein from a database
endPos
position, where the pattern ends in the target protein from a database
length
length of the segment that matches to the given pattern
sequence
sequence of SSEs, which matches to the pattern defined in the query
The sequencePosition function is used in the FROM clause of the SELECT statement. The resultant table is treated as one of source tables used in query execution. In Fig. 15 we schematically show how to construct PSS-SQL query with the sequencePosition function.
Server-Side Query Language for Protein Structure Similarity Searching SELECT p.id, p.name, s.startPos, s.endPos, s.sequence as [matched sequence], p.secondary FROM ProteinTbl p JOIN sequencePosition('secondary', 'h(5;20),c(0;*),e(1;*),c(0;*),e(1;*)','') AS s ON p.id = s.proteinId WHERE p.name LIKE '%PE=4%' AND p.length > 100
3. Additional filtering criteria id ---3298 3298 3298 3918 3918 3918 ...
name -----------------2-nitropropane ... 2-nitropropane ... 2-nitropropane ... Acetoin utiliza... Acetoin utiliza... Acetoin utiliza...
startPos -------330 244 110 115 60 149
409
1. Processed field from DB table
2. Pattern
5. Matched sequence endPos -----350 309 148 146 86 187
matched sequence -------------------------------------------------hhhhhccccccccceeeeee hhhhhhhhhhhhhhhhhhhhccccccccccccccccccccccccccc... hhhhhhhhhhccccccccccccceeecccccceeeeee hhhhhhhhhhhhhhcccccccccceeeeeee hhhhhhhhhhhcccccccccceeeee hhhhhhhhhhhheeeeeeeeeeecccccceeeeeeeee
secondary ---------------------CCCHHHHHHHEEEEECCCC... CCCHHHHHHHEEEEECCCC... CCCHHHHHHHEEEEECCCC... CCCHHHHHHEEEEEECCCH... CCCHHHHHHEEEEEECCCH... CCCHHHHHHEEEEEECCCH...
4. Results
Fig. 15 Construction of PSS-SQL queries with sequencePosition function
Below, we show an example of using the function to locate pattern that contains a β-strand of the length from 1 to 10 elements, optional loop up to 5 elements, α-helix of the length at least 5 elements, optional loop up to 5 elements and β-strand of any length – pattern e(1;10),c(0;5),h(5;*), c(0;5),e(1;*). The pattern is searched only in proteins with the length exceeding 150 amino acids, which secondary structure was predicted (predicate PE=4). SELECT p.protAC AS AC, p.name, s.startPos AS start, s.endPos AS end, sequence AS [matched sequence], p.secondary sequencePosition('secondary', FROM ProteinTbl AS p JOIN 'e(1;10),c(0;5),h(5;*),c(0;5),e(1;*)','') AS s ON p.id = s.proteinId WHERE p.name LIKE '%PE=4%' AND p.length > 150
The query produces results as shown in Fig. 16. It should be noted that there may be many ways how the pattern can be aligned to the protein structure from a database. The modified Smith-Waterman method returns a number of possible alignments based on a value of the internal MPE parameter. As a result, in the table shown in Fig. 16 the same protein may appear several times with different alignment parameters. AC -------P75747 P75747 P75747 P75747 P75747 P75747 Q54GC8 P32104 P32104 P32104 ...
name -----------------Protein abrB OS... Protein abrB OS... Protein abrB OS... Protein abrB OS... Protein abrB OS... Protein abrB OS... Acyl-CoA-bindin... Transcriptional... Transcriptional... Transcriptional...
start ----72 222 136 172 4 22 172 185 120 98
end ---107 245 158 202 32 43 197 212 144 123
matched sequence -----------------------------------eeeeeeeeehhhhhhhhhhhhhhhhhheeeeeeee eeeeehhhhhhhhhhhhhhheee eeeeehhhhhhhhhhhcceeee eeeeccccchhhhhhhhhhhhhccceeeee eeeeehhhhhhhhhhhheeeeeeeeeee eeeeeeeeeecchhhhheeee eeeeeccchhhhhhhhhcccceeee eeeecccchhhhhhhhhhhheeeeeee eeeccccchhhhhhhccccceeee eeeecccchhhhhhhhhhhhhheee
secondary ------------------------CCCEEEEEHHHHHHHHHHHHEE... CCCEEEEEHHHHHHHHHHHHEE... CCCEEEEEHHHHHHHHHHHHEE... CCCEEEEEHHHHHHHHHHHHEE... CCCEEEEEHHHHHHHHHHHHEE... CCCEEEEEHHHHHHHHHHHHEE... CCCHHHHHHHHHHHHHHHHCCC... CCCCCHHHHHHHHHHHHHHHHH... CCCCCHHHHHHHHHHHHHHHHH... CCCCCHHHHHHHHHHHHHHHHH...
Fig. 16 Partial result of the search process for the given structural pattern e(1;10),c(0;5),h(5;*),c(0;5),e(1;*)
410
B. Małysiak-Mrozek, S. Kozielski, and D. Mrozek
Predicates that filter the set of rows can be defined in the WHERE clause of the SELECT statement or can be passed as the @predicate argument of the sequencePosition function. However, regarding the query performance, it is better to pass them directly as the @predicate argument, when we call the function. This small extension forces the query processor to filter the set of proteins before creating the resultant table and before executing the Smith-Waterman method. Therefore, we do not waste time for time-consuming alignments that are not necessary in some cases. Sample query with filtering criteria specified in the function call, is shown below. SELECT p.protAC AS AC, p.name, s.startPos AS start, s.endPos AS end, sequence AS [matched sequence], p.secondary sequencePosition('secondary', FROM ProteinTbl AS p JOIN 'e(1;10),c(0;5),h(5;*),c(0;5),e(1;*)', ' p.name LIKE ''%PE=4%'' AND p.length > 150') AS s ON p.id = s.proteinId
5 Effectiveness and Efficiency of PSS-SQL Queries We have performed a set of tests in order to verify the effectiveness and efficiency of PSS-SQL queries containing different patterns. The effectiveness of PSS-SQL queries was successfully confirmed by testing both functions – containSequence and sequencePosition – in a set of 6 230 protein structures stored in the ProteinTbl table. Tests were performed for more than one hundred different SSE patterns, having different complexity, containing various numbers of segments, described precisely and rough, including SSEs of different types – defined explicitly or using wildcards. We also made a set of tests in order to examine the efficiency of PSS-SQL queries. These tests were performed on the PC computer with the processor Intel® 3.2 GHz Core Duo and 2GB of memory, working under the Microsoft Windows XP operating system. Similarly to effectiveness tests, efficiency was tested against the Proteins database containing data describing 6 230 primary and secondary structures of proteins, as well as some additional information. Primary structures and description of proteins were downloaded from the popular SwissProt database [Apweiler et al. 2004]. Secondary structures were predicted on the basis of primary structures with the use of the Predator program [Frishman and Argos 1996]. The execution time of PSS-SQL queries calling the sequencePosition function, which localizes patterns in protein structures, takes from single seconds up to several minutes. It depends on the pattern specified in the query. In Fig. 17 we show execution times for queries containing sample patterns:
• SSE1: h(38),c(3;10),e(25;30),c(3;10),h(1;10),c(1;5),e(5;10)
• SSE2: e(4;20),c(3;10),e(4;20),c(3;10),e(15),c(3;10),e(1;10)
• SSE3: h(30;40),c(1;5),?(50;60),c(5;10),h(29),c(1;5),h(20;25)
• SSE4: h(10;20),c(1;10),h(243),c(1;10),h(5;10),c(1;10),h(10;15)
• SSE5: e(1;10),c(1;5),e(27),h(1;10),e(1;10),c(1;10),e(5;20)
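To make the pattern syntax concrete, the short sketch below translates such a segment specification into a regular expression over a secondary-structure string. This regex reading (with greedy matching) is only an illustration of the pattern semantics; it is not the Smith-Waterman-based matching that PSS-SQL actually performs.

import re

def pattern_to_regex(pattern):
    # Translate an SSE pattern such as 'e(1;10),c(0;5),h(5;*)' into a
    # regular expression; '?' matches any SSE type, '*' means unbounded.
    parts = []
    for seg in pattern.split(','):
        m = re.fullmatch(r'([hec?])\((\d+);(\d+|\*)\)', seg)
        sse, lo, hi = m.group(1), m.group(2), m.group(3)
        char = '[hec]' if sse == '?' else sse
        parts.append('%s{%s,%s}' % (char, lo, '' if hi == '*' else hi))
    return re.compile(''.join(parts))

rx = pattern_to_regex('e(1;10),c(0;5),h(5;*),c(0;5),e(1;*)')
match = rx.search('ccceeeeecchhhhhhhhhcceeeecc')
if match:
    print(match.start(), match.end(), match.group())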
The SSE1 pattern represents protein structures with alternating α-helices and β-strands joined by loops. The SSE2 pattern represents protein structures built only of β-strands connected by loops. The SSE3 pattern contains an undefined segment of SSEs (? – wildcard). Patterns SSE4 and SSE5 each have one unique region – h(243) and e(27), respectively. We have observed that the execution time depends strongly on the uniqueness of the pattern. The more unique the pattern, the more proteins are filtered out based on the segment table, the fewer proteins are aligned by the Smith-Waterman method, and the less time is needed to obtain results. This can be seen clearly in Fig. 17 for patterns SSE4 and SSE5, which have the precisely defined, unique regions h(243) and e(27). For universal patterns, for which many fitting proteins or multiple alignments can be found, we observe longer execution times of PSS-SQL queries. In such cases, the length of the pattern influences the alignment time – for longer patterns we experience longer response times. We have not observed any dependency between the type of the SSE and the response time. However, specifying wildcards in the pattern increases the waiting period (sometimes up to several minutes).
Fig. 17 Execution times of PSS-SQL queries containing different patterns SSE1-SSE5
This is typical for standard SQL queries in database systems, where execution times depend highly on the selectivity of the queries and the amount of data in the database. In Fig. 18 we present histograms of segment lengths for the particular secondary structure elements (H, E, C). These histograms show which segment lengths occur most and least frequently in the protein structures in our Proteins database.
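Such histograms are easy to derive from the stored secondary structures. The sketch below is a minimal illustration, assuming (as in the patterns above) that structures are kept as strings of h/e/c characters; this encoding is our assumption for the example.

from collections import Counter
from itertools import groupby

def segment_length_histograms(structures):
    # Count segment lengths per SSE type over secondary-structure strings.
    hist = {'h': Counter(), 'e': Counter(), 'c': Counter()}
    for s in structures:
        for sse, run in groupby(s):
            hist[sse][len(list(run))] += 1
    return hist

hist = segment_length_histograms(['ccchhhhheecc', 'eeeecchhhh'])
print(hist['h'])  # Counter({5: 1, 4: 1})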
[Fig. 18 comprises three histograms of the number of regions versus segment length: (a) for the α-helix SSE, (b) for the β-strand SSE and (c) for the loop SSE.]
Fig. 18 Histograms of segment lengths for particular secondary structure elements (H, E, C)
Additional filtering criteria, which are commonly used in SQL queries, also decrease the execution time. In the case of the containSequence function, additional filtering criteria can be specified only in the WHERE clause. In the case of the sequencePosition function, they can be placed in the WHERE clause or passed as the @predicate parameter of the function. However, passing the criteria as parameters is better for the performance of PSS-SQL queries. The reason is that filtering criteria in the WHERE clause are applied to the resultant table of the sequencePosition function after it is constructed and populated, whereas criteria passed as the @predicate parameter are applied before the construction of the resultant table. In Fig. 19 we present execution times for PSS-SQL queries using the sequencePosition function to search for the structural pattern SSE1: h(38),c(3;10),e(25;30),c(3;10),h(1;10),c(1;5),e(5;10), with additional filtering predicates defined as the @predicate parameter of the function (BUILT-IN) and in the WHERE clause:
• predicate P1: p.name like ''%Homo sapiens%''
• predicate P2: p.name like ''%Homo sapiens%PE=1%''
• predicate P3: p.name like ''%Homo sapiens%PE=1%SV=4%''
• predicate P4: p.primary like ''%NHSAAYRVDQGVLN%''
Additional predicate P1 causes the pattern to be compared only to proteins that occur in the Homo sapiens organism. In predicate P2 we added the condition that the candidate proteins must have the Protein existence attribute set to Evidence at protein level (PE=1). In predicate P3 we provided an additional filter for the sequence version SV=4. Finally, predicate P4 sets a simple filter on the primary structure of proteins (the amino acid sequence). Analyzing the execution times of queries with additional predicates in Fig. 19 (BUILT-IN) and comparing them to the execution time of the query containing the SSE1 pattern in Fig. 17, we can notice that appropriately formulated filtering criteria significantly increase the performance of the search process and reduce the search time from several minutes down to several seconds (P3 and P4). It is also worth noting that for the analyzed pattern SSE1 we benefit from specifying additional filtering criteria as a parameter of the sequencePosition function; the WHERE clause is not as efficient in this case.
Fig. 19 Execution times of PSS-SQL queries containing only the SSE1 pattern and various filtering predicates P1-P4 passed as a parameter (BUILT-IN) or in the WHERE clause
6 Concluding Remarks

The PSS-SQL language provides ready-to-use and easy search mechanisms that allow searching for protein similarities at the secondary structure level. The syntax of PSS-SQL is transparent to users and flexible in defining query patterns. The pattern defined in a query does not have to be specified strictly:

• Segments in the pattern can be specified as intervals and can have undefined lengths (users can use the wildcard '*' symbol).
• PSS-SQL allows users to specify patterns with undefined types of SSE (using the SSE type wildcard '?' symbol) or patterns in which some SSE segments may occur optionally. Therefore, the search process has an approximate character, covering various possible options for segment matching.
• The possibility of defining patterns that include optional segments allows users to specify gaps in a particular place.

For programmers and scientists involved in data processing it is not surprising that integrating methods of protein similarity searching with a database
management system makes it easy to manipulate biological data without the need for external data mining applications. The SQL extension presented in this paper is an example of such integration. The proposed extension has many advantages:

• The entire logic of data processing is removed from the user application and moved to the database server. The advanced analysis of biological data is then performed while retrieving data from the database with the use of PSS-SQL queries. Therefore, the amount of data returned to the user, and the network traffic between the server and the user application, are much reduced.
• Users familiar with the SQL syntax will easily manage to formulate PSS-SQL queries. We have designed a simple and understandable SQL extension and, in consequence, a very clear query language for protein structures. This gives the PSS-SQL language an advantage over other known solutions. However, many implicit operations hide behind this simplicity and transparency, such as the alignment using the modified Smith-Waterman method, which belongs to the class of dynamic programming algorithms.
• As a result of PSS-SQL queries, users obtain pre-processed data. These data can then be used in further processing, e.g. users can treat the results as strictly selected proteins, which meet specified criteria regarding their construction, and which will be analyzed in more detail.

In our research, we use the presented extension in the similarity searching of protein tertiary structures. In this process, PSS-SQL queries allow us to roughly preselect proteins on the basis of their secondary structures. Future work will cover further development of the PSS-SQL language. In particular, we plan to focus on improving the efficiency of PSS-SQL queries through the use of intelligent heuristics.
Acknowledgment

Scientific research supported by the Ministry of Science and Higher Education, Poland, in the years 2008–2010, Grant No. N N516 265835: Protein Structure Similarity Searching in Distributed Multi Agent System.
References [Allen 2008] Allen, J.P.: Biophysical chemistry. Wiley-Blackwell, London (2008) [Apweiler et al. 2004] Apweiler, R., Bairoch, A., Wu, C.H., et al.: Uniprot: the Universal Protein knowledgebase. Nucleic Acids Res. (Database issue) D115–119 (2004) [Berman et al. 2000] Berman, H.M., Westbrook, J., Feng, Z., et al.: The protein data bank. Nucleic Acids Res. 28, 235–242 (2000) [Can and Wang 2003] Can, T., Wang, Y.F.: CTSS: a robust and efficient method for protein structure alignment based on local geometrical and biological features. In: Proc. 2003 IEEE Bioinf. Conf., Stanford, CA, pp. 169–179 (2003)
[Frishman and Argos 1996] Frishman, D., Argos, P.: Incorporation of non-local interactions in protein secondary structure prediction from the amino acid sequence. Protein Eng. 9(2), 133–142 (1996) [Murzin et al. 1995] Murzin, A.G., Brenner, S.E., Hubbard, T., et al.: SCOP: A structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536–540 (1995) [Orengo et al. 1997] Orengo, C.A., Michie, A.D., Jones, S., et al.: CATH – A hierarchic classification of protein domain structures. Structure 5(8), 1093–1108 (1997) [Shapiro et al. 2004] Shapiro, J., Brutlag, D.: FoldMiner and LOCK 2: protein structure comparison and motif discovery on the web. Nucleic Acids Res. 32, 536–541 (2004) [Smith and Waterman 1981] Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197 (1981) [Yang 2008] Yang, J.: Comprehensive description of protein structures using protein folding shape code. Proteins 71(3), 1497–1518 (2008)
A New Kind of Rules for Approximate Reasoning Modeling M. Pałasiński, B. Fryc, and Z. Machnicka University of Information Technology and Management, Rzeszów, Poland {mpalasinski,bfryc,zmachnicka}@wsiz.rzeszow.pl
Abstract. In this paper we prove some properties of a new kind of rules, called generalized rules.
1 Introduction

The notions of information systems and decision systems are basic notions of the theory of rough sets [Pawlak 1991]. This theory is an effective methodology for extracting rules from information and decision tables. Different rule-generation algorithms produce different sets of rules, and the problem is that we often get too many rules. In this paper we introduce a new kind of rules – generalized rules – and present some of their properties. We also show how to use this new kind of rules for modeling approximate reasoning.
2 Basic Definitions

2.1 Information and Decision Systems

An information system is a pair S = (U, A), where U is a nonempty finite set of objects called the universe and A is a nonempty finite set of attributes such that a : U → Va for a ∈ A, where Va is called the value set of a. The set V = ∪a∈A Va is said to be the domain of A. A decision system is any information system of the form S = (U, A ∪ {d}), where d ∉ A is a distinguished attribute called a decision. Elements of A are called conditional attributes (conditions). Any decision system S = (U, A ∪ {d}) can be represented by a data table with the number of rows equal to the cardinality of the universe U and the number of columns equal to the cardinality of the set A ∪ {d}. The value a(u) appears in the position corresponding to the row u and the column a.
Let S = (U, A′) be an information system, where A′ = A ∪ D, and let V′ be the domain of A′. Pairs (a, v), where a ∈ A′, v ∈ V′, are called descriptors over A′ and V′ (or over S, in short). Instead of (a, v) we also write a = v. The set of terms over A′ and V′ is the least set containing the descriptors (over A′ and V′) and closed with respect to the classical propositional connectives NOT (negation), OR (disjunction) and AND (conjunction), i.e., if τ, τ′ are terms over A′ and V′, then (NOT τ), (τ AND τ′), (τ OR τ′) are terms over A′ and V′. The meaning ||τ||S (or ||τ||, in short) of a term τ in S is defined inductively as follows: if τ is of the form a = v, then

||τ|| = {u ∈ U : a(u) = v},
||τ OR τ′|| = ||τ|| ∪ ||τ′||,
||τ AND τ′|| = ||τ|| ∩ ||τ′||,
||NOT τ|| = U − ||τ||.
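A minimal sketch of how these meanings can be evaluated over an information system follows; the nested-tuple representation of terms and the tiny example system are illustrative assumptions, not part of the formal definition.

# A small information system: object -> {attribute: value}.
S = {'u1': {'a': 1, 'b': 0}, 'u2': {'a': 1, 'b': 1}, 'u3': {'a': 0, 'b': 1}}

def meaning(term, system):
    # Evaluate ||term|| inductively; a term is ('=', a, v),
    # ('NOT', t), ('AND', t1, t2) or ('OR', t1, t2).
    op = term[0]
    if op == '=':
        return {u for u, vals in system.items() if vals[term[1]] == term[2]}
    if op == 'NOT':
        return set(system) - meaning(term[1], system)
    if op == 'AND':
        return meaning(term[1], system) & meaning(term[2], system)
    if op == 'OR':
        return meaning(term[1], system) | meaning(term[2], system)

print(meaning(('AND', ('=', 'a', 1), ('NOT', ('=', 'b', 0))), S))  # {'u2'}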
2.2 Example 1

Let us consider an example of a decision system S = (U, A ∪ D) representing some data about 6 patients, as shown in Table 1.

Table 1 Decision system S

U/A∪{Flu}  Headache  Muscle-pain  Temperature  Flu
u1         Yes       Yes          Normal       No
u2         Yes       Yes          High         Yes
u3         Yes       Yes          Very-high    Yes
u4         No        Yes          Normal       No
u5         No        No           High         No
u6         No        Yes          Very-high    Yes
In the decision system S, the universe U = {u1, u2, …, u6}, where each object represents one patient. Each object is described by three conditional attributes: Headache, Muscle-pain and Temperature. The decision attribute is denoted by Flu. The sets of possible values of the attributes of S are as follows: VHeadache = VMuscle-pain = VFlu = {Yes, No}, and VTemperature = {Normal, High, Very-high}. For simplicity of presentation, let us consider the encoded decision system S from Table 2. In this system, the attributes H, M, T, F correspond to the attributes Headache, Muscle-pain, Temperature, Flu from the system S, respectively. The values 1 and 2 of the attributes H, M, F encode Yes and No, while the values 1, 2, 3 of the attribute T encode Normal, High and Very-high, respectively.
Table 2 Coded decision system S

U/A∪{Flu}  H  M  T  F
u1         1  1  1  2
u2         1  1  2  1
u3         1  1  3  1
u4         2  1  1  2
u5         2  2  2  2
u6         2  1  3  1
3 Two Valued Information and Decision Systems

Here we present a procedure for converting any information system S = (U, A) into a uniquely determined information system S″ = (U, A″) satisfying the condition that, for each attribute a″ ∈ A″, the value set of a″ in the system S″ is a two-element set. Starting from the information system S = (U, A), we define the set of attributes A″ in the following way: A″ = ∪a∈A ({a} × Va). Now, if v ∈ Va, we write av instead of (a, v). We put (av)(ui) = 1 if and only if a(ui) = v. Below we show a procedure for generating the two valued information system S″ on the basis of the system S.

Procedure:
Input: An information system S = (U, A), where V = ∪a∈A Va.
Output: Two valued information system S″ = (U, A″).
Begin
  Create an empty set of attributes A″.
  Create an empty set of values V″.
  Copy the set of objects U to the new set U″.
  For each vk ∈ V, create a new attribute a″ = vk and add a″ to A″.
  For each object u and each attribute a″ ∈ A″ corresponding to a descriptor a = v:
    if u(a) = v then u(a″) = 1 else u(a″) = 0.
End

A two valued decision system is any two valued information system of the form S″ = (U, A″ ∪ D″), where D″ ∉ A″ is a set of distinguished attributes called decisions.
3.1 Example 2

Let us consider the decision system S = (U, A ∪ D) presented in Table 2, in which:

• the set of objects U = {u1, u2, u3, u4, u5, u6},
• the set of conditional attributes A = {H, M, T},
• the set of decision attributes D = {F},
• the sets of values of attributes: VH = VM = {1, 2}, VT = {1, 2, 3}, VF = {1, 2}.
The system from Table 2 gives us the two valued system S″ = (U, A″ ∪ D″) presented in Table 3, in which:

• the set of all attributes A″ ∪ D″ = {H1, H2, M1, M2, T1, T2, T3, F1, F2},
• the sets of values of all attributes are equal to {0, 1}.

Table 3 Two valued decision system

U/A″  H1  H2  M1  M2  T1  T2  T3  F1  F2
u1    1   0   1   0   1   0   0   0   1
u2    1   0   1   0   0   1   0   1   0
u3    1   0   1   0   0   0   1   1   0
u4    0   1   1   0   1   0   0   0   1
u5    0   1   0   1   0   1   0   0   1
u6    0   1   1   0   0   0   1   1   0
4 Rules in Decision Systems

4.1 Rules in Standard Decision Systems

Rules express some of the relations between values of attributes in a decision system. This subsection contains the definitions of the different kinds of rules considered in the paper, as well as other related concepts. Let S = (U, A′) be a decision system, where A′ = A ∪ D, and let V′ be the domain of A′. Any expression r of the form IF φ THEN ψ, where φ and ψ are terms over A′ and V′, is called a rule in S. φ is referred to as the predecessor of r and denoted by Pred(r); ψ is referred to as the successor of r and denoted by Succ(r). We extract two types of rules from a decision table using rough set methods. The first type, called decision rules, represents the relations between the values of conditional attributes and the decision. The second type, called conditional rules, represents relations between the values of conditional attributes. We additionally assume that each of the two types of rules mentioned above can be deterministic or non-deterministic. A numerical factor called the certainty
factor, associated with a given rule, determines the kind of rule: deterministic or non-deterministic. Let S = (U, A′) be a decision system, where A′ = A ∪ D, and let IF φ THEN ψ be a rule in S. The number

CF = card(||φ|| ∩ ||ψ||) / card(||φ||)    (1)

is called the certainty factor (CF) of the given rule. It is easy to see that CF ∈ [0, 1]. If CF = 1, we say that the rule is deterministic; otherwise (i.e. CF < 1) we say that it is non-deterministic. In order to generate the set of deterministic rules we can use the standard rough set methods proposed, among others, by [Skowron 1993].
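To make formula (1) concrete, the sketch below evaluates certainty factors directly over the encoded table of Table 2; the dictionary encoding of the table is our own illustrative choice.

from fractions import Fraction

# The encoded decision table from Table 2.
S = {
    'u1': {'H': 1, 'M': 1, 'T': 1, 'F': 2},
    'u2': {'H': 1, 'M': 1, 'T': 2, 'F': 1},
    'u3': {'H': 1, 'M': 1, 'T': 3, 'F': 1},
    'u4': {'H': 2, 'M': 1, 'T': 1, 'F': 2},
    'u5': {'H': 2, 'M': 2, 'T': 2, 'F': 2},
    'u6': {'H': 2, 'M': 1, 'T': 3, 'F': 1},
}

def cf(pred, succ):
    # CF = card(||pred|| intersected with ||succ||) / card(||pred||).
    phi = {u for u, v in S.items() if all(v[a] == x for a, x in pred.items())}
    psi = {u for u, v in S.items() if all(v[a] == x for a, x in succ.items())}
    return Fraction(len(phi & psi), len(phi))

print(cf({'T': 2}, {'F': 1}))          # 1/2 -> non-deterministic
print(cf({'T': 2, 'H': 1}, {'F': 1}))  # 1   -> deterministic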
4.2 Example 3

Let us consider the encoded decision system S from Example 1. We can compute the following rules for this system:

Deterministic decision rules:
R1: IF T = 1 THEN F = 2, CF = 1.0;
R2: IF T = 2 AND H = 1 THEN F = 1, CF = 1.0;
R3: IF T = 2 AND M = 1 THEN F = 1, CF = 1.0;
R4: IF T = 3 THEN F = 1, CF = 1.0;
R5: IF M = 2 THEN F = 2, CF = 1.0;
R6: IF T = 2 AND H = 2 THEN F = 2, CF = 1.0.

4.3 Rules in Two Valued Decision Systems
To describe the relation between rules for S and for the associated two valued system S″, we need to describe the translation of a rule of S into a rule of S″. Let S = (U, A), where A = (a1, …, an) and the set of values of the attribute ai is (0, …, ki−1), for i = 1, …, n. In the two valued system S″ associated with S we have attributes (a1,0, …, a1,k1−1, …, an,kn−1), each one assuming two values, 0 or 1. For any rule r for S we define the corresponding rule r″ for S″ by replacing each expression of the form ai = j by aij = 1 and, in case the rule is an inhibitory one, each expression ai ≠ j by aij = 0. It is not difficult to see that:

Lemma 1. For any rule r, if r is true and realizable in S, then r″ is true and realizable in S″.
The translation the other way goes as follows. For any rule r″ for S″, we replace each expression aij = 1 by ai = j, and each expression aij = 0 is replaced by ai ≠ j. The rule obtained this way we denote by r.

Lemma 2. For any rule r″, if r″ is true and realizable in S″, then r is true and realizable in S.

Theorem. For all two valued decision systems S1 = (U1, A) and S2 = (U2, A) and the sets Rul1 and Rul2 of valid and realizable rules for S1 and S2, respectively, with Rul1 ≠ ∅ and Rul2 ≠ ∅: S1 = S2 if and only if Rul1 = Rul2.

The two valued decision system associated with the information system S allows one to consider additional types of rules (for S) of the form φ → ψ, where φ is a conjunction of a finite number of formulas aij = bij and ψ is a formula of the form aij = bij or ¬(aij = bij). In this paper we consider so-called generalized rules, defined as follows:

Definition 1. By a generalized rule we mean any implication of the form φ → ψ, where φ is a conjunction of formulas of two types:
1. aij = bij,
2. ¬(aij = bij),
and ψ is a formula of the form aij = bij or ¬(aij = bij).
Remark. If we assume that the set of values is finite, then any subformula of φ (under the notation as above) containing as factors all formulas of the form ¬(aij = bij) for a fixed i can be replaced by a disjunction of formulas of the form ai = bik, where bik does not appear in any factor ¬(ai = bij). Using this remark, in the sequel we will use the following equivalent definition of a generalized rule.

Definition 2. By a generalized rule we mean any implication of the form φ → ψ, where φ is a conjunction of formulas of two types:
1. aij = bij,
2. a disjunction of formulas of the form ai = bik,
where each attribute aj can appear in only one factor of type 1 or type 2, and ψ is a formula of the form aij = bij or ¬(aij = bij).

The definitions of a generalized rule true in u, where u is an object of the system S, and of a generalized rule true in the system S, are exactly the same as in the case of "ordinary" rules, as both are implications.
Definition 3. We say that a generalized rule r : φ → ψ is a minimal generalized rule of the system S if and only if r is true in S and, after removing any factor of φ or any summand of any disjunction appearing in φ, the resulting rule is not true in S.

Now, for any generalized rule

r : φ1 ∧ … ∧ φk−1 ∧ ((ak = vk1) ∨ … ∨ (ak = vkp)) ∧ φk+1 ∧ … ∧ φn → ψ,  with p > 1,

we define, for j = 1, …, p:

rk1 : φ1 ∧ … ∧ φk−1 ∧ (ak = vk1) ∧ φk+1 ∧ … ∧ φn → ψ,
…
rkp : φ1 ∧ … ∧ φk−1 ∧ (ak = vkp) ∧ φk+1 ∧ … ∧ φn → ψ.
Let us note the following:

Fact 1. If r is any generalized rule true in S, then for j = 1, …, p, rkj is true in S.

Proof by contradiction. Let us suppose that for some t, 1 ≤ t ≤ p, rkt is not true in S. It means that there is an object u in S at which rkt is not true, i.e. the predecessor of rkt is true in u and the successor of rkt is not true in u. But then the predecessor of r is true in u and the successor of r is not true in u. It means that r is not true in u – a contradiction.

Fact 2. If each of the generalized rules rk1, …, rkp is true in S, then r is true in S.

Proof by contradiction. If r is not true in S, then there is an object u from S in which r is not true, i.e. the predecessor of r is true in u and the successor of r is not true in u. So all the formulas φ1, …, φk−1, (ak = vk1) ∨ … ∨ (ak = vkp), φk+1, …, φn are true in u and ψ is not true in u. The former implies that at least one of the formulas ak = vk1, …, ak = vkp is true in u. Thus we get that at least one of the rules rk1, …, rkp is not true in u, so it is not true in S.

As a conclusion we get:

Corollary. A generalized rule r is true in S if and only if for all j = 1, …, p, rkj is true in S.

Let us state the obvious.
Fact 3. If for all j = 1, …, p, rkj is realizable and true in S, then the generalized rule r is realizable and true in S.

Fact 4. If r is a minimal generalized rule in S, then for all j = 1, …, p, rkj is realizable and true in S.

Proof. Here we prove only realizability, as the second part is obvious. Let us suppose, to the contrary, that there is j, 1 ≤ j ≤ p, such that rkj is not realizable in S. Take U′ := {u ∈ S : u |= φ1 ∧ … ∧ φk−1 ∧ φk+1 ∧ … ∧ φn}. Each u ∈ U′ has its k-th coordinate different from vkj. So the rule r′ is true in S, where r′ is the rule obtained from r by removing the summand ak = vkj from φk. This means that r is not minimal.

The last fact concerns the rules of the two valued information system S″ associated with the given system S. In such a system we do not consider generalized rules; however, as the notion of a generalized rule appeared while dealing with a system S and the associated two valued information system S″, it is worth presenting. Let us end with a fact partly describing rules in S″.

Fact 5. If a rule r : (ai1 = q1) ∧ … ∧ (ain = qn) → (aij = q), where q, q1, …, qn ∈ {0, 1}, is a minimal and realizable rule in S″, and in the predecessor of r we have ajk = 1 for some k, then there is no factor ajt = 0 for t ≠ k.

Proof by contradiction. Let us suppose that we have a rule r : … ∧ (ajk = 1) ∧ … ∧ (ajt = 0) ∧ … → ψ, k ≠ t, which is minimal and realizable in S″. Removing ajt = 0 from the predecessor of r gives us a rule r′ which is not true in S″ (this follows from the minimality of r). It means that there is an object u of S″ in which the predecessor of r′ is true and the successor of r′ is not true. Adding the factor ajt = 0 back to the predecessor of r′, we get a rule true in S″, which means that the predecessor of r has logical value 0 at u. It means that u |≠ (ajt = 0), i.e. u |= (ajt = 1). But then u |≠ (ajk = 1), which is not possible.
4.4 Example 4
Let us consider the two valued decision system S″ from Example 2. The set of rules Rul″ computed for this system is as follows:

R1: IF M1 = 1 ∧ T1 = 1 THEN F2 = 1
R2: IF H1 = 1 ∧ M1 = 1 ∧ (T2 = 1 ∨ T3 = 1) THEN F2 = 1
R3: IF M1 = 1 ∧ T3 = 1 THEN F1 = 1
R4: IF H2 = 1 ∧ (T1 = 1 ∨ T2 = 1) THEN F2 = 1.

Each rule from Rul″ can be transformed into a rule valid and realizable in S. For example:

R1: IF ¬(M = 2) ∧ ¬(T = 2) ∧ ¬(T = 3) THEN F = 2 is obtained from rule R1,
R2: IF ¬(H = 2) ∧ ¬(M = 2) ∧ ¬(T = 1) THEN F = 2 is obtained from rule R2,
R3: IF ¬(M = 2) ∧ ¬(T = 1) ∧ ¬(T = 2) THEN F = 1 is obtained from rule R3,
R4: IF ¬(H = 1) ∧ ¬(T = 3) THEN F = 2 is obtained from rule R4.
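The back-translation described in Section 4.3 is mechanical for rules without disjunctions; the sketch below illustrates it using a made-up tuple encoding of rules, with a hypothetical example rule.

def translate(rule):
    # Translate a rule of S'' back into a rule of S:
    # a factor a_ij = 1 becomes a_i = j, and a_ij = 0 becomes a_i != j.
    pred, succ = rule
    def lit(attr, value, bit):
        return f'{attr} = {value}' if bit == 1 else f'{attr} != {value}'
    return 'IF %s THEN %s' % (' AND '.join(lit(*p) for p in pred), lit(*succ))

# Hypothetical rule in S'': IF H2 = 1 AND T3 = 0 THEN F2 = 1
print(translate(([('H', 2, 1), ('T', 3, 0)], ('F', 2, 1))))
# IF H = 2 AND T != 3 THEN F = 2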
5 Approximate Petri Nets

In this section, approximate Petri nets are recalled. The formal definition of AP-nets and their dynamic properties are presented in [Fryc et al. 2004].

5.1 General Description
AP-nets are high-level nets. The structure of an AP-net is a directed graph with two kinds of nodes: places (drawn as ellipses) and transitions (drawn as rectangles), interconnected by arcs in such a way that each arc connects two different kinds of nodes (i.e., a place and a transition). The places and their tokens represent states, while the transitions represent state changes. The data value attached to a given token is referred to as the token colour. The declaration of the net tells us about colour sets and variables. Each place has a colour set attached to it, which means that each token residing on that place must have a colour which is a member of the colour set. Each net inscription is attached to a place, transition or arc. Places have four different kinds of inscriptions: names, colour sets, initialization expressions and current markings. Transitions have four kinds of inscriptions: names, guards, thresholds and certainty values, while arcs have only one kind of inscription: arc expressions. The initialization expression of a place must evaluate to a fuzzy set over the corresponding colour set. The guard of a transition is a Boolean expression which must be fulfilled before the transition can occur. The arc expression (like the guard) may contain variables, constants, functions and operations that are defined in the declarations. When the variables of an arc expression are bound, the arc expression must evaluate to a fuzzy set over a colour that belongs to the colour set attached to the place of the arc. A distribution of tokens (on the places) is
called a marking and is denoted by M. The initial marking M0 is the marking determined by evaluating the initialization expressions. A pair whose first element is a transition and whose second element is a binding of that transition is called an occurrence element. If an occurrence element is enabled in a given marking, then we can talk about the next marking, which is reached by the occurrence of the occurrence element in the given marking. The formal definition of AP-nets is given below. An approximate Petri net is a tuple

APN = (Σ, P, T, A, Nin, Nout, C, G, Ein, Eout, I, f)    (2)

satisfying the following requirements:

• Σ is a nonempty, finite set of types, which are called colour sets,
• P is a finite set of places,
• T is a finite set of transitions,
• A is a finite set of arcs,
• Nin : A → (P × T) is an input node function,
• Nout : A → (T × P) is an output node function,
• C : P → Σ is a colour function,
• G is a guard function,
• Ein is an input arc expression function,
• Eout is an output arc expression function,
• I is an initialization function,
• f : T → [0, 1] is a certainty factor function.

In the next example we show how to use the new kind of rules for creating an approximate Petri net.

5.2 Example 5
Let us consider the set of rules for the system S″ computed in Example 4. For this set of rules we can create an AP-net as a graphical model of approximate reasoning (see Fig. 1). In the AP-net from Figure 1, the places pH, pM, pT represent the conditional attributes H, M, T from S, respectively, while the place pF represents the decision F. The colour sets (types) are as follows: H = {H1, H2}, M = {M1, M2}, T = {T1, T2, T3}, F = {F1, F2}. The transitions t1, …, t4 represent the rules R1, …, R4, respectively. For example, transition
t4 represents the decision rule IF ¬(H = 1) ∧ ¬(T = 3) THEN F = 2. The input arc expressions are the following: e41 = {xH}, e42 = {xT}, where xH, xT are variables of types H and T, respectively. The output arc expression has the form e43 = CF · max(μ(xH), μ(xT))/yF, where yF is a variable of the type F. The guard expression for t4 is the following: g4 = [¬(xH = H1) ∧ ¬(xT = T3) ∧ (xF = F2)]. Moreover, CF4 = 1 (because the rule is deterministic). Analogously we can describe the other transitions and arcs.
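To make the firing semantics concrete, the sketch below imitates transition t4: it checks the guard on the bound tokens and emits an output token whose membership degree is CF times the maximum of the input memberships. The token representation and function name are illustrative assumptions, not the formal AP-net semantics of [Fryc et al. 2004].

def fire_t4(x_h, x_t, cf=1.0):
    # x_h and x_t are (colour, membership) tokens from places pH and pT.
    # Guard g4: the H token is not H1 and the T token is not T3.
    (h_col, h_mu), (t_col, t_mu) = x_h, x_t
    if h_col == 'H1' or t_col == 'T3':
        return None                      # guard not satisfied
    return ('F2', cf * max(h_mu, t_mu))  # output token for place pF

print(fire_t4(('H2', 0.7), ('T1', 0.4)))  # ('F2', 0.7)
print(fire_t4(('H1', 0.9), ('T1', 0.4)))  # None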
Fig. 1 Approximate Petri net corresponding to decision system S”
6 Conclusions

In this paper we considered a new type of rules and used them for modeling approximate reasoning. These rules can also be used for reasoning in such nets, for modeling concurrent systems [Pancerz 2008], in classification systems, and so on. In a future paper we will compare these rules with other kinds of rules in order to determine their characteristics.
References [Delimata et al. 2008] Delimata, P., Moshkov, M., Skowron, A., Suraj, Z.: Inhibitory rules in data analysis a rough set approach. Springer, Heidelberg (2008) [Fryc et al. 2004] Fryc, B., Pancerz, K., Suraj, Z.: Approximate petri nets for rule-based decision making. In: Tsumoto, S., Słowiński, R., Komorowski, J., Grzymała-Busse, J.W. (eds.) RSCTC 2004. LNCS (LNAI), vol. 3066, pp. 733–742. Springer, Heidelberg (2004) [Fryc et al. 2010] Fryc, B., Machnicka, Z., Pałasiński, M.: Remarks on two valued information systems. In: Pardela, T., Wilamowski, B. (eds.) Proc. the 3rd International Conference on Human System Interaction (HSI 2010), Rzeszow, Poland, pp. 775–778 (2010)
[Pancerz 2008] Pancerz, K., Suraj, Z.: Rough sets for discovering concurrent system models from data tables. In: Hassanien, A.E., Suraj, Z., Ślęzak, D., Lingras, P. (eds.) Rough Computing, Theories, Technologies and Applications, Information Science Reference, Hershey, pp. 239–268 (2008) [Pawlak 1991] Pawlak, Z.: Rough sets - theoretical aspects of reasoning about data. Kluwer Academic Publishers, Dordrecht (1991) [Skowron 1993] Skowron, A.: Boolean reasoning for decision rules generation. In: Komorowski, J., Raś, Z.W. (eds.) Methodologies for Intelligent Systems, pp. 295–305. Springer, Heidelberg (1993)
Technical Evaluation of Boolean Recommenders S. Chojnacki and M.A. Kłopotek Institute of Computer Science, Polish Academy of Sciences
[email protected] Abstract. The purpose of this paper is to describe a new methodology dedicated to the analysis of boolean recommenders. The aim of most recommender systems is to suggest interesting items to a given user. The most common criteria utilized to evaluate a system are its statistical correctness and completeness. The two can be measured by accuracy and recall indices. In this paper we argue that technical performance is an important step in the process of a recommender system's evaluation. We focus on four real-life characteristics, i.e. the time required to build a model, the memory consumption of the built model, the expected latency of creating a recommendation for a random user and, finally, the time required to retrain the model with new ratings. We adapt a recently developed evaluation technique, which is based on an iterative generation of bipartite graphs. In this paper we concentrate on the case when preferences are boolean, as opposed to value-based ratings.
1 Introduction

Recommender systems are an important component of the Intelligent Web. The systems make information retrieval easier and push users from typing queries towards clicking on suggested links. We experience real-life recommender systems when browsing for books, movies, news or music. The engines are an essential part of such websites as Amazon, MovieLens or Last.fm. Recommender systems are used to deal with tasks that are typical for statistical classification methods. They fit especially the scenarios in which the number of attributes, classes or missing values is large. Classic data-mining techniques like logistic regression or decision trees are well suited to predict which category of news is the most interesting for a particular customer; recommender systems are used to output more fine-grained results and point at concrete stories. In recent years we have observed a surge of interest of the research community in recommender systems. One of the events responsible for this phenomenon was the Netflix Prize challenge. The competition was organized by a large DVD retailer in the US. The prize of 1 million dollars was awarded to the team that managed to improve the RMSE (root mean squared error) of the retailer's Cinematch algorithm by more than 10%. The lesson we learned during the Netflix Prize is that the difference in quality between simple methods and sophisticated ones is not as significant as we could have expected. Moreover, in order to lower
the RMSE, an ensemble of complex and computationally intensive methods has to be used. Even though the organizers made much effort to deliver realistic and huge data, the setting did not envision the problems that we need to face in diverse real-life recommender system applications, such as:

• the cold start problem, i.e. the arrival of new users with a short history,
• the instant creation of new items (e.g. news, auction items or photos),
• real-time feedback from users about our performance.

These drawbacks were overcome during the Online Task of the Discovery Challenge organized as part of ECML 2009 (European Conference on Machine Learning). The owners of the www.BibSonomy.org bookmarking portal opened its interfaces to the recommender systems taking part in the evaluation. Whenever a user of BibSonomy bookmarked a digital resource (a publication or a website), a query was sent to all the systems, and the tag recommendation of a randomly selected one was displayed to the user. After the action, feedback with the user's actions was sent to all systems. The systems could be maintained during the challenge, because they were configured as web services. The results showed that all of the teams found it difficult to deliver the majority of their recommendations within the time constraint of 1 000 milliseconds. Our research was motivated by the above result and by the observation that the development of recommender systems is limited by the fact that there are not enough possibilities to test the algorithms with various datasets. The data structure used by recommender systems is a sparse user–item matrix with ratings. It is a hard exercise to generate such matrices randomly. We have challenged this problem recently [Chojnacki and Kłopotek 2010a]. We proposed to look at the matrix with ratings as if it were a bipartite graph, with nodes of the two modalities representing users and items respectively. A rating in the matrix is mapped onto an edge in the bigraph. We proposed an algorithm in which we can control not only simple statistics like the numbers of users, items or rankings, but also obtain skewed distributions and correlations among users or items. Moreover, the asymptotic properties of our random bigraph generator were verified by means of formal and numerical tools, and we can add users or items to the graph without losing the properties of the original datasets. In this paper we apply the generator to produce several random bigraphs with various properties and evaluate how these properties impinge on the performance of the analyzed recommender systems. We analyze four features of the systems that in our opinion are responsible for the success of an algorithm in a real-life setting: (1) the time required to build a model from scratch, (2) the memory consumption of the trained model, (3) the latency of creating a recommendation and (4) the time of updating the model with new ratings. We focus our attention on the situation when users' preferences are boolean. This situation occurs when we only possess information on whether a user expressed an interest in an item or not. When we have access to information about the strength of a preference, we say that preferences are value-based. The magnitude of a preference (or ranking) is often expressed by the number of stars given to an item. It is sometimes advisable to build a boolean model if the quality of rankings
is low. We proposed to utilize random graphs for the analysis of value-based recommenders in [Chojnacki and Kłopotek 2010b]. In this paper we compare the performance of recommender systems in both boolean and value-based settings. We considered four algorithms during the tests: UserBased, SlopeOne, KnnItem and SVD. We used the high-performance implementations of the algorithms delivered in the Mahout system [Owen et al. 2010]. The rest of the article is organized as follows. In Section 2 we describe in detail the differences between value-based and boolean recommenders. In Section 3 we outline the details of the applied random bigraph generator. The fourth section contains the results of extensive experiments. The last, fifth section is dedicated to the concluding remarks.
2 Value-Based vs Boolean Recommenders

Recommender algorithms are generic and can be used with both value-based and boolean preferences. At the abstract level, the preferences of a particular user are represented by an n-dimensional vector, where n is the number of items in the system. The fields of the vector represent the values of the user's preferences for the items. Virtually any measure can be utilized to measure the distance between any two users or items. However, when we consider an optimal implementation of such an abstract structure, the difference between value-based and boolean preferences becomes clear. In the case of value-based implementations, HashMaps are utilized to store the vectors. In the case of boolean implementations, it is reasonable to use a more memory-efficient structure, and HashSets are utilized. The selection of a distance measure in the value-based scenario is not constrained; in the boolean scenario, however, only set-based measures are allowed. Pearson, Euclidean and Spearman are examples of measures that can be used only in the value-based setting. The LogLikelihood, Jaccard and Tanimoto measures can be used in both settings. In our experiments we compare the technical performance of three variants of implementation:

1. vectors are represented by HashMaps and the distance is calculated with the Pearson similarity,
2. vectors are represented by HashMaps and the distance is calculated by means of the LogLikelihood similarity,
3. vectors are represented by HashSets and the LogLikelihood similarity.
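As a rough illustration of the storage difference between these variants, the sketch below contrasts a map-based profile with a set-based one and computes the Tanimoto (Jaccard) coefficient, one of the set-based measures named above; the data are made up.

# Value-based profile: item -> rating (HashMap-style storage).
alice_ratings = {'item1': 4, 'item2': 5, 'item7': 2}

# Boolean profile: only the set of rated items is kept (HashSet-style),
# which drops the rating values and saves memory.
alice_items = {'item1', 'item2', 'item7'}
bob_items = {'item2', 'item7', 'item9'}

def tanimoto(a, b):
    # Set-based similarity usable for boolean preferences.
    return len(a & b) / len(a | b)

print(tanimoto(alice_items, bob_items))  # 0.5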
3 Bipartite Random Graph Generator

In this section we describe the algorithm used to generate random bigraphs. The algorithm was introduced and described in detail in [Chojnacki and Kłopotek 2010a]. The generative procedure consists of three steps: (1) new node creation, (2) edge attachment type selection and (3) running the bouncing mechanism. The
steps are run after an initialization of the bigraph. The procedure requires specifying eight parameters.

Table 1 The parameters of the random graph generative procedure

Parameter  Interpretation
m          the number of initial loose edges with a user and an item at the ends
T          the number of iterations
p          the probability that a new node is a user
1-p        the probability that a new node is an item
u          the number of edges created by each new user
v          the number of edges created by each new item
alpha      the probability that a new user's edge is connected to an item with the preferential attachment mechanism
1-alpha    the probability that a new user's edge is connected to an item with the random attachment mechanism
beta       the probability that a new item's edge is connected to a user with the preferential attachment mechanism
1-beta     the probability that a new item's edge is connected to a user with the random attachment mechanism
b          the fraction of preferentially attached edges that are created via the bouncing mechanism
In the preferential attachment mechanism, the probability that a node is drawn is linearly proportional to its degree. Opposite to preferential attachment is random attachment, in which the probability of selection is equal for all nodes. The model is based on an iterative repetition of three steps.

Step 1. With probability p create a new user with u loose edges; otherwise create a new item with v loose edges.

Step 2. For each edge, decide whether to join it to a node of the second modality randomly or with preferential attachment. The probability of selecting preferential attachment is alpha for a new user and beta for a new item.

Step 3. For each edge that is supposed to be created with preferential attachment, decide if it should also be generated via the bouncing mechanism. Bouncing is performed in three micro steps: (1) a random node is drawn from the nodes that are already joined with the new node, (2) a random neighbor of the drawn node is chosen, (3) a random neighbor of that neighbor is selected for joining with the new node.

The bouncing mechanism was injected into the model in order to parameterize the level of transitivity in the graph. Transitivity is a
feature of real datasets and, in terms of recommender systems, represents the correlations between items ranked by different users. In unipartite graphs, transitivity is measured by the local clustering coefficient, which is calculated for each node as the number of edges among the direct neighbors of the node divided by the number of all possible pairs of the neighbors. In bipartite graphs this coefficient is always zero. Hence it is substituted by the bipartite local clustering coefficient (BLCC). The bipartite local clustering coefficient of node j takes the value of one minus the proportion of the node's second neighbors to the potential number of second neighbors of the node. The steps of the generator are depicted in Fig. 1.
Fig. 1 For each edge of a new node that is to be connected with an existing node in accordance with the preferential attachment mechanism, a decision is made whether to create it via the bouncing mechanism. In the case of attaching a new user node, u new edges are created. On average, u·alpha edge endings are drawn preferentially, and a fraction b of them are obtained via bouncing from the nodes that are already selected
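A small sketch of how the BLCC defined above can be computed from an adjacency list follows; reading the "potential number of second neighbors" as the sum of (degree − 1) over direct neighbors is our assumption.

def blcc(node, adj):
    # Bipartite local clustering coefficient of `node` in a bigraph
    # given as an adjacency list {node: set_of_neighbors}.
    second = set()
    potential = 0
    for nb in adj[node]:
        second |= adj[nb] - {node}
        potential += len(adj[nb]) - 1
    return 1.0 - len(second) / potential if potential else 0.0

# Two users sharing both items -> u1 has one distinct second neighbor
# out of two potential ones.
adj = {'u1': {'i1', 'i2'}, 'u2': {'i1', 'i2'},
       'i1': {'u1', 'u2'}, 'i2': {'u1', 'u2'}}
print(blcc('u1', adj))  # 0.5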
One can see that after t iterations the bigraph consists of U(t) = m + pt users, I(t) = m + (1−p)t items, and E(t) = m + t(pu + (1−p)v) edges. It can be shown that:

• as alpha/beta grows, the item/user degree distribution becomes more power-law-like than exponential-like,
• as the bouncing parameter b grows, the average BLCC grows,
• both alpha and beta impinge on the average number of second neighbors in the bigraph,
• the influence of alpha and beta on the above quantity is opposite.

The above observations can be used to show that two features of data structures derived from the social networks domain have an impact on the technical performance of recommender systems. The features are a heavy-tailed node degree distribution and positive clustering. It is worth mentioning that the formal tools used to analyze the algorithms are based only on the numbers of users, items and ratings [Jahrer et al. 2010].
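The sketch below implements the three-step generative loop under simplifying assumptions stated in the comments (the bouncing mechanism of Step 3 is omitted, so the parameter b is only accepted); it is an illustration, not the authors' generator, but it reproduces the closed-form counts U(t), I(t) and E(t) given above.

import random

def generate_bigraph(m, T, p, u, v, alpha, beta, b):
    # Iterative bipartite graph generator following Steps 1-2;
    # b (bouncing, Step 3) is accepted but not implemented here.
    users = [f'u{i}' for i in range(m)]
    items = [f'i{i}' for i in range(m)]
    edges = [(f'u{i}', f'i{i}') for i in range(m)]  # m initial loose edges

    def attach(targets, pref_prob):
        # Preferential attachment: drawing an endpoint of a random edge
        # selects a node with probability proportional to its degree.
        if random.random() < pref_prob:
            side = 1 if targets is items else 0
            return random.choice(edges)[side]
        return random.choice(targets)

    for _ in range(T):
        if random.random() < p:            # Step 1: new user
            node, k, targets, pref = f'u{len(users)}', u, items, alpha
            users.append(node)
        else:                              # Step 1: new item
            node, k, targets, pref = f'i{len(items)}', v, users, beta
            items.append(node)
        for _ in range(k):                 # Step 2: attach each loose edge
            other = attach(targets, pref)
            edges.append((node, other) if node[0] == 'u' else (other, node))
    return users, items, edges

users, items, edges = generate_bigraph(100, 10000, 0.9, 7, 7, 0.5, 0.5, 0)
print(len(users), len(items), len(edges))  # about 9 100, 1 100 and exactly 70 100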
4 Experiments

In order to evaluate the performance of the analyzed algorithms, we generated 83 artificial bipartite graphs. The statistics describing the graphs are contained in Table 2. In the case of the HashMap representation, each graph's edge was augmented with a random integer from the set of possible rankings {0, 1, 2, 3, 4, 5}. After the last iteration (usually T = 10 000), one hundred more edges were created by running 100 further steps for each graph with unchanged parameters. This enabled us to preserve the asymptotic properties of the graphs within the set of rankings used for batch updates of the models. The experiments were run in-memory, within separate threads, on a 64-bit Fedora operating system with four 2.66 GHz Intel(R) Core(TM) i5 CPUs.

4.1 Evaluated Systems

We evaluated four recommender algorithms implemented in the Mahout Java library. Mahout contains highly efficient open-source implementations of machine-learning algorithms maintained by a vibrant community. It powers several portals, e.g. SpeedDate, Yahoo! Mail, AOL or Mippin. The algorithms are: GenericUserBasedRecommender [Herlocker et al. 1999], SlopeOneRecommender [Lemire and Maclachlan 2005], KnnItemBasedRecommender [Bell and Koren 2007] and SVDRecommender [Zhang et al. 2005]. The algorithms cover a wide spectrum of approaches to the problems of collaborative filtering.

4.2 Building Models

We measured the time required to build a model as the number of milliseconds needed to load the whole bigraph from a text file and train the model; after this period of time the model is ready to create recommendations. We measured the memory consumption of the built model in megabytes. The times and memory requirements needed to train the four considered recommenders are depicted in Fig. 2. There exists a strong relationship between time and memory. However, we do not observe major changes in behavior between the three analyzed variants: (1) HashMap and Pearson similarity, (2) HashMap and LogLikelihood similarity and (3) HashSet and LogLikelihood similarity. The fact that the UserBased and KnnItem models are trained immediately in the second variant raises our concern. This observation shows that random graphs can be used not only to compare various algorithms, but also to identify potential bugs in their implementations. The fact that memory consumption is usually lower in variant three than in the first two variants is consistent with our expectations. There is one surprising result, i.e. the memory consumption of the SVD recommender is the highest in variant three.
Fig. 2 Time of building models and memory consumption of built models. Top row contains the first variant with HashMap data structure and Pearson similarity. Middle row contains the second variant with HashMap data structure and LogLikelihood similarity. Bottom row contains the third variant with HashSet data structure and LogLikelihood similarity. Left column contains times of building the models. Right column contains memory requirements
4.3 Creating Recommendations

The time required to create a list of top recommended items for a random user is the most important technical criterion in many settings. We measured this latency as the average time in milliseconds required to output the five best recommendations for a sample of 500 users. We can see in Fig. 3 that the longer it took to train a model, the faster recommendations can be expected. The shortest latency is obtained in the first implementation variant, the longest in the second variant. We do not observe in Fig. 3 any
qualitative changes of behavior as we proceed from variant one to variant two and variant three.
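A minimal sketch of the latency measurement just described follows; recommender.recommend is a stand-in for whichever engine is being timed (a hypothetical interface, not Mahout's API).

import random
import time

def mean_latency_ms(recommender, user_ids, sample=500, top_n=5):
    # Average wall-clock time (ms) to produce top-N recommendations
    # for a random sample of users; user_ids is a list of known users.
    picked = random.sample(user_ids, min(sample, len(user_ids)))
    start = time.perf_counter()
    for uid in picked:
        recommender.recommend(uid, top_n)
    return 1000 * (time.perf_counter() - start) / len(picked)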
Fig. 3 Expected latency of creating a recommendation differentiated by size of the dataset and proportion of the number of users to the number of items. Top row contains the first variant with HashMap data structure and Pearson similarity. Middle row contains the second variant with HashMap data structure and LogLikelihood similarity. Bottom row contains the third variant with HashSet data structure and LogLikelihood similarity
The only qualitative difference between the three variants that we managed to identify is drawn in Fig. 4. The SlopeOne recommender was consistently slower than UserBased in the first variant, but UserBased slows down significantly in the second and the third variant. The results in Fig. 5 suggest that the regions of low and high latency that are visible for various configurations of alpha and beta are preserved among the variants for each algorithm.
Fig. 4 Latency dimensioned by the skewness of node degree distributions. Top row contains the first variant with HashMap data structure and Pearson similarity. Middle row contains the second variant with HashMap data structure and LogLikelihood similarity. Bottom row contains the third variant with HashSet data structure and LogLikelihood similarity
Fig. 5 Latency of recommender algorithms for various values of alpha and beta. The bottom left corner of each figure represents the variant with alpha=beta=0. The upper right corner of each figure represents the variant with alpha=beta=1. The values of alpha and beta change gradually along the horizontal and vertical axes. Left column contains the first variant with HashMap data structure and Pearson similarity. Middle column contains the second variant with HashMap data structure and LogLikelihood similarity. Right column contains the third variant with HashSet data structure and LogLikelihood similarity
4.4 Additional Analyses

We performed several additional analyses to see what happens as we switch from value-based to boolean data structures and similarity measures. We checked the influence of the density of the graph and the level of clustering. In both cases the behavior of the algorithms in the second and the third variants was qualitatively consistent with the first variant. We also confirmed this fact by evaluating the time required to update the models with new users and items.
5 Discussion and Conclusions

In the paper we identified two factors that may impinge on the technical performance of recommender systems when we switch from value-based to boolean preferences. The factors are the data structure implementation and the similarity measure. We proposed three settings to compare value-based and boolean recommenders. In the first two variants, datasets are implemented with a HashMap, which is characteristic for value-based models; however, only in the first case does the similarity measure utilize the values of the ratings. In the third variant, both the data structure implementation and the similarity measure are optimal for boolean preferences. Our observations can be summarized in four points:

• recommender systems based on a HashSet data structure require less memory than HashMap-based implementations,
• the time required to create a recommendation is longer for purely boolean recommenders than for purely value-based ones; the longest time is needed for mixed implementations (i.e. the second variant),
• random datasets enable us to identify potential bugs in implemented algorithms,
• the only qualitative difference in the behavior of the algorithms was observed for the UserBased model, which slows down faster than e.g. SlopeOne in the boolean similarity variants.

The second point is in our opinion the most surprising result. It shows that in the case of recommender systems the time needed to process a smaller amount of information may be longer than the time required to process enriched information. This is because, even though value-based models have access to more information than boolean ones, they can utilize fast vector similarity measures. In the case of boolean recommenders, only set-based distances between vectors can be calculated.
Acknowledgment

This work was partially supported by the Polish state budget funds for scientific research within the research project Analysis and visualization of structure and dynamics of social networks using nature inspired methods, Grant No. N516 443038.
References

[Bell and Koren 2007] Bell, R.M., Koren, Y.: Scalable collaborative filtering with jointly derived neighborhood interpolation weights. In: Proc. of ICDM, pp. 43–52. IEEE Computer Society, Los Alamitos (2007) [Chojnacki and Kłopotek 2010a] Chojnacki, S., Kłopotek, M.A.: Random graph generator for bipartite networks modeling (2011), http://arxiv.org/abs/1010.5943 [Chojnacki and Kłopotek 2010b] Chojnacki, S., Kłopotek, M.A.: Random graphs for performance evaluation of recommender systems (2011), http://arxiv.org/abs/1010.5954 [Herlocker et al. 1999] Herlocker, J.L., Konstan, J.A., Borchers, A., et al.: An algorithmic framework for performing collaborative filtering. In: Proc. of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 230–237. ACM Press, New York (1999) [Jahrer et al. 2010] Jahrer, M., Toscher, A., Legenstein, R.: Combining predictions for accurate recommender systems. In: KDD, pp. 693–702. ACM Press, New York (2010) [Lemire and Maclachlan 2005] Lemire, D., Maclachlan, A.: Slope one predictors for online rating-based collaborative filtering. In: Proc. of SIAM Data Mining (2005) [Owen et al. 2010] Owen, S., Anil, R., Dunning, T., et al.: Mahout in Action, Manning (2010) [Zhang et al. 2005] Zhang, S., Wang, W., Ford, J., et al.: Using singular value decomposition approximation for collaborative filtering. In: Proc. of the 7th IEEE Conf. on E-Commerce, pp. 257–264 (2005)
Appendix

Table 2 Synthetic bigraphs used in experiments

     Parameters of bigraph generator               Properties of obtained graphs
Lp.  m    T        p    u  v  alpha  beta  b       users   items   edges
1    100  10 000   0,9  7  7  0,5    0,5   0       9 086   1 114   70 100
2    100  10 000   0,8  7  7  0,5    0,5   0       8 159   2 041   70 100
3    100  10 000   0,7  7  7  0,5    0,5   0       7 102   3 098   70 100
4    100  10 000   0,6  7  7  0,5    0,5   0       6 185   4 015   70 100
5    100  10 000   0,5  7  7  0,5    0,5   0       5 122   5 078   70 100
6    100  10 000   0,4  7  7  0,5    0,5   0       4 098   6 102   70 100
7    100  10 000   0,3  7  7  0,5    0,5   0       3 120   7 080   70 100
8    100  10 000   0,2  7  7  0,5    0,5   0       2 083   8 117   70 100
9    100  10 000   0,1  7  7  0,5    0,5   0       1 107   9 093   70 100
10   100  1 000    0,9  7  7  0,1    0,1   0       1 008   192     7 100
11   100  2 000    0,9  7  7  0,1    0,1   0       1 888   312     14 100
12   100  3 000    0,9  7  7  0,1    0,1   0       2 788   412     21 100
13   100  4 000    0,9  7  7  0,1    0,1   0       3 699   501     28 100
14   100  5 000    0,9  7  7  0,1    0,1   0       4 568   632     35 100
15   100  6 000    0,9  7  7  0,1    0,1   0       5 511   689     42 100
16   100  7 000    0,9  7  7  0,1    0,1   0       6 419   781     49 100
17   100  8 000    0,9  7  7  0,1    0,1   0       7 306   894     56 100
18   100  9 000    0,9  7  7  0,1    0,1   0       8 178   1 022   63 100
19   100  10 000   0,9  7  7  0,1    0,1   0       9 119   1 081   70 100
20   100  25 000   0,9  7  7  0,1    0,1   0       22 576  2 624   175 100
21   100  50 000   0,9  7  7  0,1    0,1   0       45 172  5 028   350 100
22   100  100 000  0,9  7  7  0,1    0,1   0       90 211  9 989   700 100
48   100  10 000   0,5  7  7  1      1     0       5 081   5 119   70 100
49   100  10 000   0,5  7  7  1      0,8   0       5 078   5 122   70 100
50   100  10 000   0,5  7  7  1      0,6   0       5 083   5 117   70 100
…    …    …        …    …  …  …      …     …       …       …       …
83   100  10 000   0,5  7  7  0      0     0       4 985   5 215   70 100
Interval Uncertainty in CPL Models for Computer Aided Prognosis L. Bobrowski Faculty of Computer Science, Białystok Technical University Institute of Biocybernetics and Biomedical Engineering, PAS, Warsaw, Poland
[email protected] Abstract. Multivariate regression models are often used for the purpose of prognosis. The parameters of such models are estimated on the basis of learning sets, where feature vectors (independent variables) are combined with values of a response (target) variable. In some important applications, the values of the response variable can be determined only with some uncertainty. For example, in survival analysis the values of the response variable are often censored and can be represented as intervals. The interval regression approach has been proposed for designing prognostic tools in circumstances of this type of uncertainty. The possibility of using convex and piecewise linear (CPL) functions in designing linear prognostic models on the basis of interval learning sets is examined in the paper.
1 Introduction Multivariate regression models are widely used in statistics, pattern recognition or data mining context [Johnson and, Wichern 1991; Duda et al. 2001]. The most important applications of regression models are linked to prognosis (prediction) goals. The value of dependent (target) variable should be predicted on the basis of independent variables values. The main role is played here by linear regression models, when the dependent variable is a linear combination of independent variables. Linear regression models can be designed by using different methods depending on the structure of learning data sets. In accordance with the classical last-square approach, the parameters of the linear regression models are estimated on the basis of learning sequence in the form of feature vectors combined with exact values of dependent (target) variable [Johnson and, Wichern 1991]. The exact value of target variable represents additional knowledge about a particular object represented by given feature vectors. The logistic regression is typically used when the target variable is categorical. If the target variable is a binary one, the regression model is based on a linear division of feature vectors into two groups [Duda et al. 2001].
Z.S. Hippe et al. (Eds.): Human – Computer Systems Interaction, AISC 99, Part II, pp. 443–461. © Springer-Verlag Berlin Heidelberg 2012 springerlink.com
444
L. Bobrowski
The ranked regression models are designed on the basis of a set of feature vectors with additional knowledge in the form of ordering relation inside selected pairs of these vectors [Bobrowski 2009]. Linear ranked models can be designed through the minimization of the convex and piecewise linear (CPL) criterion function defined on differences of feature vectors. Special regression methods known as interval regression are developed for the case when values of target values are uncertain and can be represented in the form of intervals [Buckley. and James 1979], [Gomez et al. 2003]. Uncertainty means in this case some missing information about the exact values of target variable. Cox proportional hazards model developed in the context of survival analysis is commonly applied in the case of censored data [Klein and Moeschberger 1997]. The algorithms based on the Expectation Maximization (EM) principle have been developed and used for the purpose of estimation parameters of the interval regression models. Designing linear regression models on the basis of interval learning data by using the convex and piecewise linear (CPL) criterion functions has been proposed and analyzed in the paper [Bobrowski 2010]. This paper describes a different approach to designing interval regression models with using a different CPL criterion functions. This approach is referring to the concept of linear separability of data sets [Duda et al. 2001].
2 Linear Regression Models The pattern recognition terminology is used throughout this paper [2]. We are considering a set of m feature vectors xj[n] = [xj1,…,xjn]T belonging to a given n dimensional feature space F[n] (xj[n] ∈ F[n]). Feature vectors xj[n] represent a family of m objects (events, patients) Oj (j = 1,..., m). Components xji of the vector xj[n] could be treated as the numerical results of n standardized examinations of the given object Oj (xji ∈ {0,1} or xji ∈ R1). Each vector xj[n] can be also treated as a point of the n-dimensional feature space F[n]. Linear regression models have a form of linear (affine) transformations of ndimensional feature vectors x[n] (x[n]∈ F[n]) on the points y of the line (y∈R1): y(x) = w[n]Tx[n] + θ
(1)
where w[n] = [w1,…, wn] ∈ R is the parameters (weight) vector and θ is the threshold (θ ∈ R1). Properties of the model (1) depend on the choice of the parameters w[n] and θ. The weights wi and the threshold θ are usually computed on the basis of the data (learning) sets. In the classical regression analysis the learning sets have the below structure [Johnson and, Wichern 1991]: T
n
Cm′ = {xj[n]; yj} = {xj1,…., xjn,; yj}, where j = 1,….., m
(2)
Interval Uncertainty in CPL Models for Computer Aided Prognosis
445
Each of m objects Oj is characterized in the set Cm′ by values xji of n independent variables (features) xi, and by the observed value yj (yj ∈ R1) of the dependent (target) variable Y. In the case of classical regression, the parameters w[n] and θ] are chosen in such a manner that the sum of the squared differences (yj - yj^)2 between the observed target variable yj and the modeled variable yj^ = w[n]Txj[n] + θ (1) is minimal [Johnson and, Wichern 1991]. In the case of interval regression, additional knowledge about particular objects Oj is represented by the intervals [yj-, yj+] (yj- < yj+) instead of the exact values yj (2) [Buckley. and James 1979], [Gomez et al. 2003]: Cm = {xj[n], [yj-, yj+]}, where j = 1,….., m
(3)
where yj- is the lower bound (yj-∈ R1) and yj+ is the upper bound (yj+∈ R1) of unknown value of the target variable Y (yj- < yj+). Let us remark that the classical learning set Cm′ (2) can be transformed into the interval learning set Cm (3) by introducing the boundary values yj-= yj - ε and yj+ = yj + ε, where ε is a small positive parameter (ε > 0). Imprecise measurements of dependent variable y can be represented in such a manner. The transformation (1) constitutes the interval regression model if the below linear inequalities are fulfilled in the best way possible for elements of the set Cm (3): (∀j ∈ {1,…., m})
yj- < w[n]Txj[n] + θ < yj+
(4)
The formula (4) can be used among others for representation of the survival analysis problems as it is shown in the below example: Example 1: Traditionally, the survival analysis data sets Cs have the below structure [Klein and Moeschberger 1997]: Cs = {xj[n], tj , δj} ( j = 1,.......,m)
(5)
where tj is the observed survival time between the entry of the j-th Oj patient into the study and the end of the observation, δj is an indicator of failure of this patient (δj ∈{0,1}): δj= 1 - means the end of observation in the event of interest (failure), δj = 0 - means that the follow-up on the j-th patient ended before the event (the right censored observation). In this case (δj = 0) information about survival time tj is not complete. The real survival time Tj can be defined in the below manner on the basis of the set Cs (5): (∀ j = 1,.......,m) if δj = 1, then Tj = tj, and if δj = 0, then Tj > tj
(6)
446
L. Bobrowski
The right censoring can mean that an unknown survival time Tj is greater than some lower bound tj- (Tj > tj-). Similarly, the left censoring can mean that an unknown survival time Tj of the j-th patient Oj is less than some upper bound tj+ (Tj < tj+). We can use the below inequalities (4) for the purpose of designing the linear prognostic model T = w[n]Tx[n] + θ (1) from the censored data Tj: if Tj is right censored, then w[n]Txj[n] + θ > tj-
(7)
if Tj is left censored, then w[n] xj[n] + θ
0 w[n]Txj[n] - yj+ + θ < 0
(9)
Two types of the augmented feature vectors xj+[n+2] and xj-[n+2] and the augmented weight vector v[n+2] (v[n+2] ∈ Rn+2) can be linked to the above inequalities: (∀j ∈ {1,…., m}) (10) if (yj- > - ∞), then xj+[n+2] = [xj[n]T, 1, -yj-]T else xj+[n+2] = 0, and if (yj+ < + ∞), then xj-[n+2] = [xj[n]T, 1, -yj+]T else xj-[n+2] = 0
Interval Uncertainty in CPL Models for Computer Aided Prognosis
447
and v[n+2] = [v1,…,vn+2]T = [w[n]T, θ, β]T
(11)
where β is the interval weight (β ∈ R1). The inequalities (9) can by represented by using the symbols xj+[n+2], xj-[n+2] and v[n+2]: (∀j ∈ {1,…., m})
(12)
(∀xj+[n+2] ≠ 0) v[n+2]T xj+[n+2] > 0, and (∀xj [n+2] ≠ 0) v[n+2]T xj-[n+2] < 0
The above inequalities can be linked to the demand of the linear separability of the sets R+ and R-. The positive set R+ is composed of m+ augmented vectors xj+[n+2] (10) which are different from zero (xj+[n+2] ≠ 0) and the negative set R- is composed of m- augmented vectors xj-[n+2] (10) which are different from zero (xj[n+2] ≠ 0): R+ = {xj+[n+2]} and R- = {xj-[n+2]}
(13)
We will examine the possibility of the sets R+ and R- separation by a such hyperplane H(v[n+2]) in the (n+2) – dimensional feature space F[n+2] which passes through the point 0 (origin) of this space [Bobrowski 2005]: H(v[n+2]) = {x[n+2]∈F[n+2]: v[n+2]Tx[n+2] = 0}
(14)
Definition 1: The sets R+ and R- (8) are linearly separable in the feature space F[n+2] if and only if exists such augmented weight vector v′[n+2] (11), that the below inequalities hold for the all non-zero vectors xj+[n+2] and xj-[n+2] (10): (∃ v′[n+2] )
(15)
(∀xj+[n+2] ≠ 0) v′[n+2]T xj+[n+2] > 0. and (∀xj-[n+2] ≠ 0) v′[n+2]T xj-[n+2] < 0 If the inequalities (15) hold then all the non-zero elements xj+[n+2] of the set R+ (13) are situated on the positive side of the hyperplane H(v′[n+2]) (14) and all the non-zero elements xj-[n+2] of the set R- are situated on the negative side of this hyperplane (Fig. 1). Example 2: Let us take an example of seven values xj of dependent variable x (xj ∈ F[1] - one dimensional feature space) and the dependent variable y characterised by the intervals [yj-, yj+] (3).
448
L. Bobrowski
Table 1 An example of interval data set (3) with seven elements xj (m+ = 6 and m- = 5) j
xj
yj-
yj+
1
-1.0
-3.5
-1.0
2
-0.5
-3.0
+∞
3
1.0
2.0
2.5
4
1.5
-3.0
+∞
5
2.5
1.5
2.5
6
3.5
-∞
3.0
7
4.5
2.0
4.0
We can remark that in the above data set that the values y2 and y4 are right censored and the value y6 is left censored. The equation y = x - 1 fulfils all the inequalities (3) resulting from the Table 1 and can be treated as the interval regression model (1) of this data (Fig. 1). The parameters of the model (1) are equal in this case w[1] = 1 and θ = 1.
yy +
y3 y5+
2
-2 y + 1 -
y1- y2
y3 y5
y7+
-
y =x-1
y74
2 -
y6
+
x
y4-
Fig. 1 An illustration of the data set from the Table 1 and the interval regression model y = x – 1 (3)
Remark 1: If the augmented vector v′[n+2] (11) linearly separates (15) the sets R+ and R- (13), then the interval weight β′ (11) is greater than zero (β′ > 0). This Remark results directly from the definition (10) of the vectors xj+[n+2] and xj [n+2], the relation (9), and the inequality yj+ > yj-. Lemma 1: If the hyperplane H(v′[n+2]) (14) with the interval weight β′ (11) equal to one (v′[n+2] = [w′[n]T, θ′,1]) separates (15) the sets R+ and R- (13), then the linear model yj = w′[n]Txj[n] + θ′ (1) fulfils all the inequalities (4). This Lemma results directly from the definition of the augmented feature vectors xj+[n+2] and xj-[n+2] (10) and the augmented weight vector v[n+2] (11).
Interval Uncertainty in CPL Models for Computer Aided Prognosis
449
Lemma 2: All the inequalities yj- < w′[n]Txj[n] + θ′ < yj+ (4) can be fulfilled by some parameters vector v′[n+2] = [w′[n]T, θ′,1] (11) if and only if the sets R+ and R- (13) are linearly separable (15). Proof: If there exists such weight vector v′[n+2] (11) that all the inequalities yj- < w′[n]Txj[n] + θ′< yj+ (4) are fulfilled, then the sets R+ and R- (13) are linearly separable (15) with the interval weight β′ (11) equal to one (β′ = 1). This property results directly from the definition (10) of the vectors xj-[n+2] and xj+[n+2]. If the sets R+ and R- (13) are linearly separable (15), then there exists such hyperplane H(v′[n+2]) (14) which separates these sets. The separation (10) of the sets R+ and R- (8) by the hyperplane H(v′[n+2]) (14) means that the interval weight β′ is greater than zero (β′ > 0) (Remark 1). In this case, the sets R+ and R- (13) are also separated by the hyperplane H(v′′[n+1]) (14) with the marginal weight β′′ equal to one (β′′= 1) as it results from the below inequalities (15): (∃ w′[n], θ′, and β′ > 0) (∀j ∈ {1,…., m}) w′[n]Txj[n] + θ′ - β′ yj- > 0 and w′[n]Txj[n] + θ′ - β′ yj+ < 0
(16)
By dividing the above inequalities by β′ (β′ > 0) we obtain the inequalities (4). The inequalities (16) can be represented equivalently in the augmented manner [Bobrowski 2005]: (∃ v′[n+2] = [w′[n]T, θ′, β′]T, where β′ > 0 (∀j ∈ {1,…., m}) v′[n+2]T xj+[n+2] ≥ 1 and v′[n+2]T xj-[n+2] ≤ -1
(17)
Such representation is used further in the definition of the CPL penalty functions.
4 Convex and Piecewise Linear (CPL) Penalty and Criterion Functions The inequalities (17) can be a guideline in the definition of the CPL penalty φj+(v[n+2]) and φj-(v[n+2]) [Bobrowski 2005]. The upper penalty functions φj+(v[n+2]) are defined by different from zero feature vectors xj+[n+2] (10): (∀xj+[n+2] ≠ 0) φj+(v[n+2]) =
(18) 1 - v[n+2]Txj+[n+2]
if
v[n+2]Txj+[n+2] < 1
0
if
v[n+2]T xj+[n+2] ≥ 1
450
L. Bobrowski
Similarly, the lower penalty functions φj-(v[n+2]) are defined by the augmented feature vectors xj-[n+2] (10): (∀xj-[n+2] ≠ 0)
(19) 1 + v[n+2]Txj-[n+2]
if
v[n+2]Txj-[n+2] > -1
0
if
v[n+2]Txj-[n+2]
φj-(v[n+2]) =
≤ -1
The perceptron criterion function Φ(v[n+2]) is defined as the sum of the penalty functions φj+(v[n+2]) (18) and φj-(v[n+2]) (19) [7]: Φ(v[n+2]) = Σ αj φj+(v[n+2]) + Σ αj φj-(v[n+2]) j
(20)
j
where nonnegative parameters αj (αj ≥ 0) determine an importance (price) of the particular feature vectors xj[n] (3). The function Φ(v[n+2]) (20) is convex and piecewise-linear (CPL) as the sum of such type penalty functions. Designing the interval regression models (1) can be based on finding of the minimal value Φ(v*[n+2]) and the optimal vector v*[n+2] of the criterion function Φ(v[n+2]) (20) []: (∀v[n+2]) Φ(v[n+2]) ≥ Φ(v*[n+2]) = Φ* ≥ 0
(21)
where v [n+2] = [w [n] , θ , β ] , and w [n] = [w1 ,…., wn ] (1). The basis exchange algorithms, which are similar to the linear programming, allow one to find the minimum of the CPL function Φ(v[n+2]) (20) and the optimal parameters vector v*[n+2] (21) efficiently, even in the case of large, multidimensional data sets [Bobrowski 1991]. The below theorems can be proved: *
*
T
*
* T
*
*
* T
Theorem 1: The minimal value Φ* = Φ(v*[n+2]) (21) of the non-negative criterion function Φ(v[n+2]) (20) is equal to zero (Φ* = 0) if and only if the sets R+ and R(13) are linearly separable (15). In this case, the hyperplane H(v*[n+2]) (14) defined by the optimal vector v*[n+2] (21) exactly separates the sets R+ and R-. The proof of the similar theorem has been given in the author′s earlier works [7]. Theorem 2: If there exists such weight vector w′[n] and the threshold θ′, that all the inequalities yj- < w′[n]Txj[n] + θ′ < yj+ (4) are fulfilled for all feature vectors xj[n] (3), then the minimal value Φ(v*[n+2]) (21) of the criterion function Φ(v[n+2]) (20) is equal to zero (Φ(v*[n+2]) = 0).
Interval Uncertainty in CPL Models for Computer Aided Prognosis
451
The minimal value Φ′(v*[n+2]) (21) of the criterion function Φ(v[n+2]) (20) is greater then zero, if not all the inequalities (4) can be fulfilled. If the interval weight β* in the optimal weight vector v*[n+2] = [w*[n]T,θ*, β*]T (21) is greater than zero (β* > 0), then the below linear transformation of the feature vectors xj[n] on the line y (1) can be defined: (∀j ∈ {1,…., m})
yj^ = (w*[n] / β*)T xj[n] + θ*/ β*
(22)
We can infer from the Theorem 2, that the minimal value Φ(v*[n+2]) (21) of the criterion function Φ′(v[n+2]) (20) is equal to zero if the interval regression model (22) with β* > 0 fulfils all the constraints (4). Theorem 3: If the sets R+ and R- (13) are linearly separated (15) by the optimal vector v*[n+2] = [w*[n]T,θ*, β*]T (21) and the interval weight β* is greater than zero (β* > 0), then the model (22) fulfills all the inequalities yj- < (w*[n] / * T * * + β ) xj[n] + θ / β < yj (4). This theorem can be proved on the basis of the Theorem 1 and the equation (11).
5 Feature Selection for the CPL Interval Regression Designing the linear regression model yj^ = (w*[n] / β*)T xj[n] + θ*/ β* (22) can be based on the minimization of the CPL criterion function Φ(v[n+2]) (20) defined on the interval data set Cm (3). In practice, the interval data set Cm (3) often contain a small number m of multidimensional feature vectors xj[n] (m 0), i = 1,..., n, and the cost functions φi(v[n+2]) are defined by the unit vectors ei[n+2] = [0,…,1,…,0]T: (∀i ∈ {1,…, n}) φi(v[n+2]) = | wi | =
-e i[n+2]Tv[n+2] if ei[n+2]Tv[n+2]< 0
(25)
e i[n+2]Tv[n+2] if ei[n+2]Tv[n+2]≥ 0 The cost function φi(v[n+2]) is linked to the feature xi and it is aimed at reducing (23) of this feature. Let us remark that in accordance with the above equations, the cost functions φi(v[n+2]) are related only to n real features xi (xj[n] = [x1,…, xn]T (1)). The cost function φn+1(v[n+2]) can be defined in a similar manner: φn+1(v[n+2]) = | θ | =
-e n+1[n+2]Tv[n+2] if en+1[n+2]Tv[n+2]< 0
(25)
e n+1[n+2]Tv[n+2] if en+1[n+2]Tv[n+2]≥ 0 The cost function φn+1(v[n+2]) is aimed at diminishing the threshold θ value to zero. We can remark that in some applications the reducing of the threshold θ value is not required. Such effect can be achieved by using a very small value of the parameter γn+1 (γn+1 > 0). The criterion function Ψλ(v[n+2]) (24) contains an additional cost function γn+2 |vn+2 - 1| = γn+2 |β - 1|. This cost function can serve as reinforcement of the condition β′ = 1 (11) that the interval weight β′ should be equal to one (Lemma 1). The non-negative parameter γn+2 (γn+2 ≥ 0) allows to regulate the level of this reinforcement. In accordance with the RLS method of feature selection, the reduction (23) of unimportant features xi in the cost sensitive manner is based on the minimization of the modified CPL criterion function Ψλ(v[n+2]) (24) with different values of the cost level λ [4]. The criterion function Ψλ(v[n+2]) (24) is the convex and piecewise linear (CPL) as the sum of the CPL functions Φ(v[n+2]) (20) and the
Interval Uncertainty in CPL Models for Computer Aided Prognosis
453
CPL functions λ γi φi(v[n+2]) (25). The basis exchange algorithms allow to find the optimal vector vλ*[n+2] which constitutes the minimal value of the criterion function Ψλ(v[n+2]) (24): (∃vλ*[n+2]) (∀v[n+2]) Ψλ(v[n+2]) ≥ Ψλ(vλ*[n+2]) = Ψλ*
(26)
Remark 2: The minimal value Ψλ* (26) of the non-negative criterion function Ψλ(v[n+2]) (24) with the cost level λ equal to zero (λ = 0) is equal to zero (Ψλ* = 0) if and only if the sets R+ and R- (13) are linearly separable (15). The above Remark can be linked to the Theorem 1. The CPL cost function φi(v[n+2]) (25) allows to reinforce the conditions wi = 0 and tends to reduce the feature xi (23) as a result of the function Ψλ(v[n+2]) (24) minimization. An influence of the cost function φi(v[n+2]) (25) on the feature xi reduction increases with the value of the parameters γi and λ (24). An increase of the cost level λ leads to reducing greater number of features xi in result of the criterion function Ψλ(v[n+2]) (24) minimization. Successive increase of the value of parameter λ in the criterion function Ψλ(v[n+2]) (24) allow to generates the descended sequence of feature subspaces Fk[nk]: F[n] ⊃ F1[n1] ⊃ F2[n2] ⊃… ⊃ Fk′[nk′]
(27)
where nk > nk+1. Each step Fk[nk] → Fk+1[nk+1] in the above sequence can be realized in the deterministic manner by an adequate increase λk → λk+1 = λk + Δkλ of the cost level λ in the criterion function Ψλ(v[n+2]) (24). The minimization of the criterion function Ψλ(v[n+2]) (24) with the parameter λk+1 results in the feature subspace Fk+1[nk+1]. The quality of particular feature subspaces Fk[nk] in the sequence (27) should be evaluated during the feature selection process. In the RLS approach, the quality of the feature subspace Fk[nk] was evaluated on the basis of the optimal linear classifier designed in this subspace. For this purpose, the perceptron criterion function Φk(v[nk+2]) (20) was defined by using the feature vectors xj[nk] from the subspace Fk[nk] (xj[nk] ∈ Fk[nk]). Two types of the augmented feature vectors xj+[nk+2] and xj-[nk+2] and the augmented weight vector v[nk+2] was defined in accordance with the rules (10) and (11): (∀j ∈ {1,…., m}) (28) if (yj- > - ∞), then xj+[nk+2] = [xj[nk]T, 1, -yj-]T else xj+[nk+2] = 0, and if (yj+ < + ∞), then xj-[nk+2] = [xj[nk]T, 1, -yj+]T
454
L. Bobrowski
and v[nk+2] = [w[nk]T, θ, β]T, where v[nk+2]
∈ V[nk+2].
(29) vk*[nk+2]
The basis exchange algorithm allows to find the optimal vector which constitutes the minimum (21) of the function Φk(v[nk+2]) (20) in the weight subspace V[nk+2]. vk*[nk+2] = [wk*[nk]T, θk*, βk*]T
(30)
vk*[nk+2]
The optimal vector (30) allows to define both the interval regression model (22) as well as the following decision rule of the optimal linear classifier in the subspace Fk [nk+2]: if vk*[nk+2]Tx[nk+2] ≥ 0, then x[nk+2] is allocated to the category ω+ * if vk [nk+2]Tx[nk+2] ≥ 0, then x[nk+2] is allocated to the category ω-
(31)
In accordance with the above rule, the augmented feature vector x[nk+2] (x[nk+2] ∈ Fk[nk+2]) is allocated to the positive category ω+, if the scalar product vk*[nk+2]Tx[nk+2] is not negative. In the other case, the vector x[nk+2] is allocated to the negative category ω+. We are considering the linear separability (10) of the sets Rk+ and Rk- (13) containing the augmented feature vectors xj+[nk+2] and xj-[nk+2] (28). Remark 3: If the sets Rk+ and Rk- (13) are linearly separable (15) in the feature space Fk[nk+2], then the decision rule (31) based on the optimal vector vk*[nk+2] (30) allocates all the non-zero elements xj+[nk+2] of the set Rk+ in the positive category ω+, and all the non-zero elements xj-[nk+2] of the set Rk- in the negative category ω-. The above Remark can be justified by using the Theorem 2. In accordance with the RLS method of feature selection, the quality of the feature subspace Fk[nk] (27) is evaluated on the basis of evaluation of the optimal linear classifier (31) defined in this subspace by a such parameters vector vk*[nk+2] = [wk*[nk]T, θk*, βk*]T (30), which constitutes the minimum of the criterion function Ψλ(v[nk+2]) (24). The quality of the linear classifier (31) can be evaluated by using the error estimator (apparent error rate) ea(vk*[nk+2]) as the fraction of wrongly classified non-zero elements xj+[nk+2] and xj-[nk+2] (28) of the sets Rk+ and Rk- (8) [Duda et al. 2001]: ea(vk*[nk+2]) = ma(vk*[nk+2]) / (m+ + m-)
(32)
Interval Uncertainty in CPL Models for Computer Aided Prognosis
455
where m+ is the number of the non-zero elements xj+[nk+2] (28) in the set Rk+ (13), m- is the number of the non-zero elements xj-[nk+2] in the set Rk-, and ma(vk*[nk+2]) is the number of such elements from these sets which are wrongly allocated by the rule (31). Wrong allocation happens, when the augmented feature vector xj+[nk+2] (28) is allocated to the negative category ω-, or the vector xj-[nk+2] is allocated to the positive category ω+. Because the same data xj[nk] was used for classifier (31) designing and for classifier evaluation, the evaluation result (32) is too optimistic (biased) [2]. The error rate ea(vk*[nk+2]) (32) evaluated on the elements xj+[nk+2] and xj-[nk+2] (28) of the learning sets Rk+ and Rk- (13) is called the apparent error (AE). In accordance with the Remark 2, if the sets Rk+ and Rk- (13) are linearly separable (15) in the feature subspace Fk[nk+2], then the apparent error ea(vk*[nk+2]) (32) is equal to zero. But it is typically found in practical applications that the error rate of classifier (31) evaluated on vectors xj[nk+2] (28) that do not belong to the learning sets Rk+ and Rk- (8) is higher than zero. For the purpose of reducing the classifier bias, the cross validation procedures can be applied [2]. The term p-fold cross validation means that the data sets Rk+ and Rk- (13) have been divided into p parts Pi, where i = 1,…, p. The vectors xj+[nk+2] and xj-[nk+2] (28) contained in p – 1 parts Pi are used for the definition of the criterion function Φk(v[nk+2]) (20) and in the computation of optimal parameters vk*[nk+2] (30). The remaining vectors xj+[nk+2] and xj-[nk+2] (28) are used as a test set (one p-part Pi′) for the evaluation of the error rate ei′(vk*[nk+2]) (32). This evaluation is repeated p times, and during each time different p-part Pi′ is used as the test set. After this, the mean value ec(vk*[nk+2]) of the errors rates ei′(vk*[nk+2]) (32) on the elements of the test sets Pi′ is computed. The cross validation procedure allows to use different vectors for designing of classifier (31) and its evaluation, and, as a result, to reduce the bias of the error rate estimation (32). The error rate eCVE(vk*[nk+2]) (32) estimated during the cross validation procedure is called the cross-validation error (CVE). A special case of the p-fold cross validation method is the leave-one out procedure. In the case of the leave-one out procedure, the number p of the parts Pi is equal to the number of the non-zero elements xj+[nk+2] and xj-[nk+2] (28) of the sets Rk+ and Rk- (13). In accordance with the RLS method of feature selection, such feature subspace Fk*[nk] in the sequence (27) is selected as the optimal one which is linked to the smallest value of the cross-validation error rate eCVE(vk*[nk+2]) (32) of the linear classifier (31) [Bobrowski and Łukaszuk 2009].
6 Hyperplanes and Vertices in the Parameter Space V[n + 2] Each non–zero feature vector xj+[n+2] (10) defines the hyperplane hj+ in the parameter space V[n+2]:
456
(∀j ∈ {1,…, m})
L. Bobrowski
if xj+[n+2] ≠ 0, then hj+ = {v[n+2]: xj+[n+2]Tv[n+2] = 1} (33)
Similarly, feature vectors xj-[n+2] (10) define the hyperplanes hj-: (∀j ∈ {1,…, m})
if xj-[nk+2] ≠ 0, then hj- = {v[nk+2]: xj-[nk+2]Tv[nk+2] = -1}
(34)
The unit vectors ei[n+2] = [0,…,1,…,0]T define the hyperplanes hi0 in the (n +2) - dimensional parameter space V[n +2]: (∀i ∈ {1,…, n +2}) h i0 = {v[n +2]: ei[n +2]Tv[n +2] = 0}
(35)
The hyperplanes hj+ (33), hj- (34) or hi0 (35) intersect in some points vr[n + 2] (vr[n + 2] ∈ V[n + 2]), which are called as vertices. Each vertex vr[n + 2] in the (n + 2) - dimensional parameter space V[n + 2] is the geometrical place of intersection at least n + 2 hyperplanes hj+ (33), hj- (34) or hi0 (35). Each vertex vr[n+2] can be defined by the set of n + 2 linear equations: xj+[n + 2]Tvr[n + 2] = 1 (33) or xj-[n + 2]Tvr[n + 2] = -1 (34) or ei0[n + 2]Tv r[n + 2] = 0 which can be represented in the below matrix form: Br[n + 2]Tvr[n + 2] = δr[n + 2]
(36)
where Br[n + 2] is the nonsingular matrix (basis) with the columns constituted by n + 2 linearly independent feature vectors xj+[n + 2], xj-[n + 2] (28) or unit vectors ei[n + 2] and δr[n + 2] is the margin vector with the components δri equal 1, -1, or 0 adequately to the type of the vector which constitutes the i-th row of the matrix Br[n + 2] (xj+[n + 2], xj-[n + 2] or ei[n + 2]). The vertex vr[n + 2] can be computed in accordance with the below formula on the basis of the equation (36): vr[n + 2] = (Br[n + 2]T)-1 δr[n + 2]
(37)
It can be proved that the minimum (26) of the modified CPL criterion function Ψλ(v[n + 2]) (24) defined on the feature vectors xj+[n + 2] and xj-[n + 2] (10) can be located in one of the vertices vr[n + 2] (36): (∃v r *[n +2]) (∀v[n + 2]) Ψλ(v[n + 2]) ≥ Ψλ(vr*[n + 2])
(38)
The minimization (26) of the modified CPL criterion function Ψλ(v[n + 2]) (24) allows also to find the basis Br*[n + 2] (36) related to the optimal vertex vr*[n + 2] (38).
Interval Uncertainty in CPL Models for Computer Aided Prognosis
457
Remark 4: If the i-th (i = 1,…, n) unit vector ei[n + 2] = [0,…,1,…,0]T constitutes one of the rows of the basis Br*[n+2] (36) related to the optimal vertex vr*[n+2] = [wr*[n]T, θr*, βr*]T (38), where wr*[n] = [wr,1*,…, wr,n+1*]T (22), then the weight wr,i* (28) linked to the i-th feature xi is equal to zero (wr,i* = 0). In accordance with the implication (23), the i-th feature xi can be reduced (neglected) in this case. The Remark 4 can be summarized in the below manner by using the implication (23): (The i-th unit vector ei[n+2] (i = 1,…, n) is in the basis Br*[n+2]) (The i-th feature xi can be reduced)
(39)
Remark 5: A sufficiently large increase of the cost level λ (λ ≥ 0) in the CPL criterion function Ψλ(v[n+2]) (24) leads to an increase of the number n0 of unit vectors ei[n+2] in the basis Br*[n+2] (36) related to the optimal vertex vr*[n+2] (38). In result, n0 features xi can be reduced from the feature space F[n]. The dimensionality n of the feature space F[n] can be reduced arbitrarily by an adequate increase of the parameter λ in the criterion function Ψλ(v[n+2]) (24). For example, the value λ = 0 means that the optimal vertex vr*[n+2] (38) constitutes also the minimum of the perceptron criterion function Φ(v[n+2]) (20) defined in the full feature space F[n]. On the other hand, sufficiently large value of the parameter λ results in the optimal vertex vr*[n+2] (38) equal to zero (vr*[n+2] = 0). Such solution is not constructive, because it means that all the features xi have been reduced (23) and the separating hyperplane H(vr*[n+2]) (14) cannot be defined. The basis exchange algorithms allow to find efficiently the parameters (vertex) vr*[n+2] constituting the minimum of the CPL criterion function, even in the case of large sets R+ and R- (13) of high dimensional vectors xj+[n+2] and xj-[n+2] (10) [Bobrowski 1991].
7 Examples Example 3: Let us consider the case of two-dimensional feature space F[2] with only one feature vector x1[2] = [x11, x12]T = [3, 2]T∈ F[2] and with only one constraint (3): the upper bound y1+ = -2. In this case, the linear model (1) y(x[2]) = w[2]Tx[2] + θ should fulfil only one inequality (4): 3w1 + 2 w2 + θ < -2. The augmented vector xj-[nik+2] = [xj[nk]T, 1, -yj+]T (28) is equal in this case to x1-[4] = [3, 2, 1, 2]T. The augmented inequality (17) v[n+2]Txj-[n+2] ≤ -1 takes the form v[4]Tx1-[4] ≤ -1, where v[4] = [v1, v2, v3, v4]T = [w1, w2, θ, β]T (11). The feature vector x1-[4] = [3, 2, 1, 2]T (10) defines the below hyperplane h1- (34) in the parameter space V[4]:
458
L. Bobrowski
h1- = {v[4]: x1-[4]Tv[4] = -1} = {v[4]: 3v1+ 2v2 + v3+ 2 v4 = -1}
(40)
The unit vectors ei[4] define the zero basis B0[0] = [e1[4],e2[4],e3[4],e4[4]] (36) and four hyperplanes hi0 with the margin equal to zero in the parameter space V[4]: (∀i ∈ {1,2, 3,4}) h i0 = {v[4]: ei[4]Tv[4] = 0} T
(41) 1
The unit vector e4[4] = [0, 0, 0, 1] defines also the hyperplane h4 with the margin equal to one: h41 = {v[4]: e4[4]Tv[4] = 1} = {v[4]: v4 = 1} =
(42)
The hyperplane h41 has been used for representation of the condition β = 1 in the vector v[4] = [v1, v2, v3, v4]T = [w1, w2, θ, β]T (Lemma 1). The four hyperplanes hi0 intersect the zero vertex v0[4] = [0, 0, 0, 0]T. (37). The zero vertex v0[4] should be excluded from further considerations because this vector does not fulfill the assumed inequality (4): 3w1 + 2 w2 + θ < - 2. Let us take into considerations such vertices vr[4] (37) in the parameter space V[4] which are the points of intersection of the hyperplane hi-(40) with the hyperplane h31 (42) and with two hyperplanes hi0 (41). Each such vertex vr[4] can be defined by the linear equation: x1-[4]Tvr[4] = -1 (34), by the equation e3[4]Tv[4] = 1 (42) and by two of four equations ei[4]Tvr [4] = 0, where i < 4 (41). The vertex vr[4] can be represented in the matrix form (36): Br[4] Tvr[4] = δr[4]
(43)
where v[4] = [v1, v2, v3, v4] = [w1, w2, θ, β] (11). The nonsingular matrix (basis) Br[4] is constituted by the vector x1-[4] = [3, 2, 1, 2]T and by three unit vectors ei(k)[4]. The margin vector δr[4] has components equal to -1, 0 or 1 adequately to the hyperplanes h101- (40), hi0 (41), or h41 (42). The solution of the equation (43) is given by: T
T
vr[4] = (Br[4]T)-1 δr[4]
(44)
The below rules can be obtained from the equation (44): • If the basis Br[4] (43) is equal to B1[4] = [x1-[4], e2[4], e3[4], e4[4]], then δ1[4] = [-1, 0, 0, 1]T and the vertex vr[4] is equal to v1[4] = [-1, 0, 0, 1]T. • If the basis Br[4] (43) is equal to B2[4] = [e1[4], x1-[4], e3[4], e4[4]], then δ2[4] = [0, -1, 0,1]T and the vertex vr[4] is equal to v2[4] = [0, -3/2, 0, 1]T. • If the basis Br[4] (43) is equal to B4[4] = [e1[4], e2[4], e3[4], x1-[4]], then δ4[4] = [0, 0, -1, 1]T and the vertex vr[4] is equal to v4[4] = [0, 0, -3, 1]T.
Interval Uncertainty in CPL Models for Computer Aided Prognosis
459
We can remark that in this case, the values Φ(vr[4]) of the perceptron criterion function Φ(v[4]) (20) are equal to zero for each of these points vr[4]: Φ(v1[4]) = Φ(v2[4]) = Φ(v4[4]) = 0
(45)
The values Ψ1(vr[4]) of the modified criterion function Ψ1(v[4]) (24) with λ = 1 and γ1= γ2 = γ3 = 1 are equal to: Ψ1(v1[4]) = 1.0, Ψ1(v2[4]) = 1.5, Ψ1(v4[4]) = 3.0
(46)
The modified criterion function Ψ1(v[4]) (24) has the lowest value equal to zero in the vertex v1[4] (Ψ1(v1[4]) = 0). In accordance with the relation (26), the optimal vertex v1*[4] is equal in this case to v1[4] = [-1, 0, 0, 1]T (v1*[4] = v1[4]) and Ψ1* = 1.0. Example 4: Let us consider, similarly as in Example 3, two dimensional feature space F[2]. We will take into consideration three feature vectors x1[2] = [x11, x12]T = [3, 2]T, x2[2] = [x21, x22]T = [2, -1]T, and x3[2] = [x31, x32]T = [1, -1]T. The upper bounds y1+ = -2 and y2+ = 2 has been related to the vectors x1[2] and x2[2]. The lower bound y3- = 1 has been related to the vector x3[2]. These constraints are described by the below inequalities: 3w1 + 2 w2 + θ < -2 2w1 - w2 + θ < 2 w1 - w2 + θ > 1
(47)
The augmented vector (10) can be linked to each of this inequality: x1 [4] = [3, 2, 1, 2]T x2-[4] = [2, -1, 1, -2]T x3+[4] = [1, -1, 1, -1]T -
(48)
In this case, the margin vector δr[4] (36) has the following components δri: δr[4] = [δr1, δr2, δr3, δr4]T = [-1, -1, 1, 1]T
(49)
The augmented vectors (48) can be used in the matrix (basis) Br[4] (36): Br[4] = [x1-[4], x2-[4], x3+[4], e4[4]]
(50)
T -1
The inverse matrix (Br[4] ) (37) is equal to: (Br[4]T)-1 = [r1[4], r2[4], r3[4], r4[4]]
(51)
where r1[4] = [0, 1/3, 1/3, 0]T r2[4] = [1, -2/3, -5/3, 0]T r3[4] = [-1, 1/3, 7/3, 0]T r4[4] = [1, -5/3, -5/3, 1]T
vr[4] = [-1, -1, 2, 1]T
(52)
We can compute the vertex vr[4] (37) by taking into account (49) and (52): vr[4] = [-1, -1, 2, 1]T
(53)
460
L. Bobrowski
The scalar products values vr[4]Txj[4] for the augmented feature vectors (48) are equal to: vr[4]T x1-[4] = [-1, -1, 2, 1]T[3, 2, 1, 2] = - 1 vr[4]T x2-[4] = [-1, -1, 2, 1]T[2, -1, 1, -2]T = - 1 vr [4]T x3+[4] = [-1, -1, 2, 1]T[1, -1, 1, -1]T = 1
(54)
The values of the penalty functions φj+ (v[4]) (18) and φj-(v[4]) (19) are equal to zero in the point vr[4]. In result, the value Φ(vr[4]) of the perceptron criterion function Φ(v[4]) (20) in the point vr[4] is also equal to zero (Φ(vr[4]) = 0). The vector of parameters vr[4] = [w1, w2, θ, β]T = [-1, -1, 2, 1]T (53) defines the regression model (1): y(x) = w1 x1 + w2 x2 + θ = - x1 - x2 + θ
(55)
This model fulfils all the constraints (47): y(x1) < y1 (-5 < -2), y(x2) < y2+ (1 < 2), and y(x3) > y3+ (2 > 1) document. +
8 Concluding Remarks The problem of designing the prognostic linear models (1) on the basis of data sets Cm (3) with an uncertainty of target variable in the form of intervals has been analyzed in the paper. In accordance with the proposed approach, this problem has been transformed into the problem of the linear separability (15) of the sets R+ and R- (13). The problem of the linear separability (15) means here the search for such hyperplane H(v*[n+2]) (14) which separates the sets R+ and R- (13) in the best possible way. The parameters v*[n+2] of the optimal hyperplane H(v*[n+2]) (14) can be found through the minimization (21) of the convex and piecewise linear (CPL) criterion function Φ(v[n+2]) (20) or the modified criterion function Ψλ(w[n+1]) (36). The basis exchange algorithms, similarly to linear programming, allow to find efficiently the minimum of each of this function [12]. The modified CPL criterion function Ψλ(w[n+1]) (36), which takes into account the features xi costs γI, allows to combine the designing interval regression models with the feature selection process. As a result, the most influential subsets of features (risk patterns) xi can be identified in accordance with the relaxed linear separability (RLS) method [11]. The described approach to designing interval prognostic models allows to take into account also the censored data sets. Important and widely used examples of censored data sets can be found in survival analysis applications [7]. The interval censored data represented by intervals [yj-, yj+] (3) can be treated as a kind of generalization of survival analysis data - even the case, when the data set Cs (5) contains only censored survival times tj, can be analyzed in this manner. Such possibility opens the way for applying interval regression modeling to many important problems where the dependent quantity cannot be measured exactly. This approach allows for designing prognostic models on the basis of imprecise measurements of the dependent variable. Such circumstances are commonly met in practice.
Interval Uncertainty in CPL Models for Computer Aided Prognosis
461
Acknowledgment This work was supported by the by the NCBiR project N R13 0014 04, and partially financed by the project S/WI/2/2011 from the Białystok University of Technology, and by the project 16/St/2011 from the Institute of Biocybernetics and Biomedical Engineering PAS.
References [Bobrowski 1991] Bobrowski, L.: Design of piecewise linear classifiers from formal neurons by some basis exchange technique. Pattern Recognition 24(9), 863–870 (1991) [Bobrowski 2005] Bobrowski, L.: Eksploracja danych oparta na wypukłych i odcinkowoliniowych funkcjach kryterialnych (Data mining based on convex and piecewise linear (CPL) criterion functions), Technical University Białystok (2005) (in Polish) [Bobrowski 2009] Bobrowski, L.: Ranked linear models and sequential patterns recognition. Pattern Analysis & Applications 12(1), 1–7 (2009) [Bobrowski and Łukaszuk 2009] Bobrowski, L., Łukaszuk, T.: Feature selection based on relaxed linear separabilty. Biocybernetics and Biomedcal Engineering 29(2), 43–59 (2009) [Bobrowski 2010] Bobrowski, L.: Linear prognostic models based on interval regression with CPL functions. Symulacja w Badaniach i Rozwoju 1, 109–117 (2010) (in Polish) [Buckley. and James 1979] Buckley, J., James, I.: Linear regression with censored data. Biometrika 66, 429–436 (1979) [Duda et al. 2001] Duda, O.R., Hart, P.E., Stork, D.G.: Pattern Classification. Wiley, New York (2001) [Gomez et al. 2003] Gomez, G., Espinal, A., Lagakos, S.: Inference for a linear regression model with an interval-censored covariate. Statistics in Medicine 22, 409–425 (2003) [Johnson and, Wichern 1991] Johnson, R.A., Wichern, D.W.: Applied Multivariate Statistical Analysis. Prentice-Hall Inc., Englewood Cliffs (1991) [Klein and Moeschberger 1997] Klein, J.P., Moeschberger, M.L.: Survival Analysis, Techniques for Censored and Truncated Data. Springer, NY (1997)
Neural Network Training with Second Order Algorithms H. Yu and B.M. Wilamowski Department of Electrical and Computer Engineering, Auburn University, Auburn, AL, USA
[email protected],
[email protected] Abstract. Second order algorithms are very efficient for neural network training because of their fast convergence. In traditional Implementations of second order algorithms [Hagan and Menhaj 1994], Jacobian matrix is calculated and stored, which may cause memory limitation problems when training large-sized patterns. In this paper, the proposed computation is introduced to solve the memory limitation problem in second order algorithms. The proposed method calculates gradient vector and Hessian matrix directly, without Jacobian matrix storage and multiplication. Memory cost for training is significantly reduced by replacing matrix operations with vector operations. At the same time, training speed is also improved due to the memory reduction. The proposed implementation of second order algorithms can be applied to train basically an unlimited number of patterns.
1 Introduction As an efficient way of modeling the linear/nonlinear relationships between stimulus and responses, artificial neural networks are broadly used in industries, such as nonlinear control, data classification and system diagnosis. The error back propagation (EBP) algorithm [Rumelhart et al. 1986] dispersed the dark clouds on the field of artificial neural networks and could be regarded as one of the most significant breakthroughs in neural network training. Still, EBP algorithm is widely used today; however, it is also known as an inefficient algorithm because of its slow convergence. Many improvements have been made to overcome the disadvantages of EBP algorithm and some of them, such as momentum and RPROP algorithm, work relatively well. But as long as the first order algorithms are used, improvements are not dramatic. Second order algorithms, such as Newton algorithm and Levenberg Marquardt (LM) algorithm, use Hessian matrix to perform better estimations on both step sizes and directions, so that they can converge much faster than first order algorithms. By combining the training speed of Newton algorithm and the stability of EBP algorithm, LM algorithm is regarded as one of the most efficient algorithms for training small and medium sized patterns.
Z.S. Hippe et al. (Eds.): Human – Computer Systems Interaction, AISC 99, Part II, pp. 463–476. springerlink.com © Springer-Verlag Berlin Heidelberg 2012
464
H. Yu and B.M. Wilamowski
Table 1 shows the training statistic results of two-spiral problem using both EBP algorithm and LM algorithm. In both cases, fully connected cascade (FCC) networks were used for training and the desired sum square error was 0.01. For EBP algorithm, the learning constant was 0.005 (largest possible avoiding oscillation), momentum was 0.5 and iteration limit was 1,000,000; for LM algorithm, the maximum number of iteration was 1,000. One may notice that EBP algorithm not only requires much more time than LM algorithm, but also is not able to solve the problem unless excessive number of neurons is used. EBP algorithm requires at least 12 neurons and the LM algorithm can solve it in only 8 neurons. Table 1 Training results of two-spiral problem Neurons
Success Rate
Average Iteration
Average Time (s)
EBP
LM
EBP
LM
EBP
LM
8
0%
13%
/
287.7
/
0.88
9
0%
24%
/
261.4
/
0.98
10
0%
40%
/
243.9
/
1.57
11
0%
69%
/
231.8
/
1.62
12
63%
80%
410,254
175.1
633.91
1.70
13
85%
89%
335,531
159.7
620.30
2.09
14
92%
92%
266,237
137.3
605.32
2.40
15
96%
96%
216,064
127.7
601.08
2.89
16
98%
99%
194,041
112.0
585.74
3.82
Even having such a powerful training ability, LM algorithm is not welcomed by engineers because of its complex computation and several limitations: 1. Network architecture limitation The traditional implementation of LM algorithm by Hagan and Menhaj in their paper was developed only for multilayer perceptron (MLP) neural networks. Therefore, much more powerful neural networks [Hohil et al. 1999; Wilamowski 2009], such as fully connected cascade (FCC) or bridged multilayer perceptron (BMLP) architectures cannot be trained. 2. Network size limitation The LM algorithm requires the inversion of Hessian matrix (size: nw×nw) in every iteration, where nw is the number of weights. Because of the necessity of matrix inversion in every iteration, the speed advantage of LM algorithm over the EBP algorithm is less evident as the network size increases. 3. Memory limitation LM algorithm cannot be used for the problems with many training patterns because the Jacobian matrix becomes prohibitively too large. Fortunately, the network architecture limitation was solved by recently developed neuron-by-neuron (NBN) computation in papers [Wilamowski et al. 2008;
Neural Network Training with Second Order Algorithms
465
Wilamowski et al. 2010]. The NBN algorithm can be applied to train arbitrarily connected neural networks. The network size limitation still remains unsolved, so that the LM algorithm can be used only for small and medium size neural networks. In this paper, the memory limitation problem of the traditional LM algorithm is addressed and the proposed method of computation is going to solve this problem by removing Jacobian matrix storage and multiplication. In this case, second order algorithms can be applied to train very large-sized patterns [Wilamowski and Yu 2010]. The paper is organized as follows: Section 2 introduces the computational fundamentals of LM algorithm and addresses the memory limitation problem. Section 3 describes the improved computation for both quasi Hessian matrix and gradient vector in details. Section 4 implements the proposed computation on a simple parity-3 problem. Section 5 gives some experimental results on memory and training speed comparison between traditional Hagan and Menhaj LM algorithm and the improved LM algorithm.
2 Computational Fundamentals Before the derivation, let us introduce some indices which will be used in the paper: • p is the index of patterns, from 1 to np, where np is the number of training patterns; • m is the index of outputs, from 1 to no, where no is the number of outputs; • i and j are the indices of weights, from 1 to nw, where nw is the number of weights. • k is the index of iterations and n is the index of neurons. Other indices will be explained in related places. The sum square error (SSE) E is defined to evaluate the training process. For all patterns and outputs, it is calculated as: E=
1 np no 2 e pm 2 p =1 m=1
(1)
where: epm is the error at output m when training pattern p, defined as e pm = o pm − d pm
(2)
where: dpm and opm are desired output and actual output, respectively, at output m for training pattern p. The update rule of LM algorithm is: Δw k = ( H k + μ I) −1 g k
(3)
where: μ is the combination coefficient, I is the identity matrix, g is the gradient vector and H is the Hessian matrix.
466
H. Yu and B.M. Wilamowski
The gradient vector g and Hessian matrix H are de fined as: ∂E ∂w 1 ∂E g = ∂w 2 " ∂E ∂wnw ∂2E 2 ∂w1 ∂2E H = ∂w ∂w 2 1 " ∂2E ∂w nw ∂w1
(4)
∂2E ∂w1∂w 2
"
∂2E ∂w 22 " ∂2E ∂wnw ∂w 2
" " "
∂2E ∂w1∂w nw ∂2E ∂w 2 ∂w nw " ∂2E 2 ∂w nw
(5)
As one may notice, in order to perform the update rule (3), second order derivatives of E in (5) has to be calculated, which makes the computation very complex. In the Hagan and Menhaj implementation of LM algorithm, Jacobian matrix J was introduced to avoid the calculation of second order derivatives. The Jacobian matrix has the format: ∂e11 ∂w 1 ∂e12 ∂w1 " ∂e 1no ∂w1 J = " ∂e np1 ∂w1 ∂e np 2 ∂w1 " ∂e npno ∂w1
∂e11 ∂w 2 ∂e12 ∂w 2 " ∂e1no ∂w 2 " ∂e np1 ∂w 2 ∂e np 2 ∂w 2 " ∂e npno ∂w 2
" " " " " " " " "
∂e11 ∂wnw ∂e12 ∂wnw " ∂e1no ∂wnw " ∂e np1
∂wnw ∂e np 2 ∂wnw " ∂e npno ∂wnw
(6)
By combining (1) and (4), the elements of gradient vector can be calculated as: np no ∂e pm ∂E e pm = ∂wi p =1 m =1 ∂wi
(7)
So the relationship between gradient vector and Jacobian matrix can be presented by
g = JTe
(8)
Neural Network Training with Second Order Algorithms
467
By combining (1) and (5), the elements of Hessian matrix can be calculated as np no ∂e pm ∂e pm np no ∂e pm ∂e pm ∂ 2 e pm ∂2E e pm ≈ + = p =1 m =1 ∂wi ∂wi ∂wi ∂w j ∂wi ∂w j p =1 m =1 ∂wi ∂wi
(9)
The relationship between Hessian matrix and Jacobian matrix can be described by H ≈ JT J = Q
(10)
where: matrix Q is the approximated Hessian matrix, called quasi Hessian matrix. By integrating equations (3) and (8), (10), the implementation of LM update rule becomes Δw k = ( J kT J k + μ I ) −1 J kT e k
(11)
where: e is the error vector. Equation (11) is used as the traditional implementation of LM algorithm. Jacobian matrix J has to be calculated and stored at first; then matrix multiplications (8) and (10) are performed for further weight updating. According to the definition of Jacobian matrix J in (6), there are np×no×nw elements needed to be stored. It may work smoothly for problems with small and medium sized training patterns; however, for large-sized patterns, the memory limitation problem could be triggered. For example, the MNIST pattern recognition problem [Cao et al. 2006] consists of 60,000 training patterns, 784 inputs and 10 outputs. Using the simplest possible neural network (one neuron per each output), the memory cost for entire Jacobian matrix storage is nearly 35 gigabytes which would be quite an expensive memory cost for real programming.
3 Improved Computation The key issue leading to the memory limitation in traditional computation is that the entire Jacobian matrix has to be stored for further matrix multiplication. One may think that if both gradient vector and Hessian matrix could be obtained directly, without Jacobian matrix multiplication, there is no need to store all the elements of Jacobian matrix so that the problem can be solved. 3.1 Matrix Algebra for Jacobian Matrix Elimination There are two ways of matrix multiplication. If the row of the first matrix is multiplied by the column of the second matrix, then a scalar is obtained, as shown in Fig. 1a. If the column of the first matrix is multiplied by the row of the second matrix, then a partial matrix q is obtained, as shown in Fig. 1b. The number of scalars is nw×nw, while the number of partial matrices q, which later have to be summed, is np×no.
468
H. Yu and B.M. Wilamowski
JT
×
J
=
Q
=
q
(a)
JT
×
J
(b) Fig. 1 Two ways of matrix multiplication: (a) row-column multiplication results in a scalar; (b) column-row multiplication results in a partial matrix q
When JT is multiplied by J using the routine shown in Fig. 1b, partial matrices q (size: nw×nw) need to be calculated np×no times, then all of the np×no matrices q must be summed together. The routine of Fig. 1b seems complicated; therefore, almost all matrix multiplication processes use the routine of Fig. 1a, where only one element of the resulted matrix is calculated and stored each time. Even the routine of Fig. 1b seems to be more complicated than the routine in Fig. 1a; after detailed analysis (Table 2), one may conclude that the computation cost for both methods of matrix multiplication are basically the same. Table 2 Computation analysis between the two methods of matrix multiplication Multiplication Methods
Addition
Multiplication
Row-column (Fig. 1a)
(np × no) × nw × nw
(np × no) × nw × nw
Column-row (Fig. 1b)
nw × nw × (np × no)
nw × nw × (np × no)
In the specific case of neural network training, only one row of Jacobian matrix J (column of JT) is known for each training pattern, and there is no relationship among training patterns. So if the routine in Fig. 1b is used, then the process of creation of quasi Hessian matrix can be started sooner without necessity of computing and storing the entire Jacobian matrix for all patterns and all outputs. Table 3 roughly estimates the memory cost in two multiplication methods separately.
Neural Network Training with Second Order Algorithms
469
Table 3 Memory cost analysis between two methods of matrix multiplication Multiplication Methods
Elements for storage
Row-column (Fig. 1a)
(np × no) × nw + nw × nw + nw
Column-row (Fig. 1b)
nw × nw + nw
Difference
(np × no) × nw
Notice that the column-row multiplication (Fig. 1b) can save a lot of memory. 3.2 Improved Gradient Vector Computation Let us introduce gradient sub vector ηpm (size: nw×1):
η pm
∂e pm ∂e pm e pm ∂w1 ∂w1 ∂e pm ∂e pm = ∂w e pm = ∂w × e pm 2 2 " " ∂e pm ∂e pm ∂w e pm ∂w N N
(12)
By combining (7), (8) and (12), gradient vector g can be calculated as the sum of gradient sub vectors ηpm np no
g = η pm
(13)
p =1 m =1
By introducing vector jpm (size: 1×nw) ∂e pm j pm = ∂w1
∂e pm ∂w 2
"
∂e pm ∂wnw
(14)
sub vectors ηpm in (12) can be also written in the vector form η pm = j Tpm e pm
(15)
One may notice that for the computation of sub vector ηpm, only nw elements of vector jpm need to be calculated and stored. All the sub vectors can be calculated for each pattern p and output m separately, and summed together, so as to obtain the gradient vector g. Considering the independence among all training patterns and outputs, there is no need to store all the sub vector ηpm. Each sub vector can be summed to a temporary vector after its computation. Therefore, during the direct computation of gradient vector g using (13), only memory for jpm (nw elements) and epm (1 element) is required, instead of the whole Jacobian matrix (np×no×nw elements) and error vector (np×no elements).
470
H. Yu and B.M. Wilamowski
3.3 Improved Quasi Hessian Matrix Computation Quasi Hessian sub matrix qpm (size: nw×nw) is introduced as
q pm
∂e 2 pm ∂w1 ∂e pm ∂e pm = ∂w ∂w 2 1 " ∂e pm ∂e pm ∂w nw ∂w1
∂e pm ∂e pm ∂w1 ∂w 2
"
2
∂e pm ∂w 2 " ∂e pm ∂e pm ∂w nw ∂w2
" " "
∂e pm ∂e pm ∂w1 ∂w nw ∂e pm ∂e pm ∂w2 ∂w nw " 2 ∂e pm ∂w nw
(16)
By combining (9), (10) and (16), quasi Hessian matrix Q can be calculated as the sum of quasi Hessian sub matrix qpm np no
Q = q pm
(17)
p =1 m =1
Using the same vector jpm defined in (14), quasi Hessian sub matrix can be calculated as
q pm = j Tpm j pm
(18)
Similarly, quasi Hessian sub matrix qpm can be calculated for each pattern and output separately, and summed to a temporary matrix. Since the same vector jpm is calculated during the gradient vector computation above, no extra memory is required. With the improved computation, both gradient vector g and quasi Hessian matrix Q can be computed directly, without Jacobian matrix storage and multiplication. During this process, only a temporary vector jpm with N elements needs to be stored; in other words, the memory cost for Jacobian matrix storage is reduced by np×no times. In the MINST problem mentioned in section 2, the memory cost for the storage of Jacobian elements could be reduced from more than 35 gigabytes to nearly 30.7 kilobytes. From (16), one may also notice that all the sub matrix qpm are symmetrical. With this property, only upper or lower triangular elements of those sub matrices need to be calculated. Therefore, during the improved quasi Hessian matrix Q computation, multiplication operations in (18) and sum operations in (17) can be both reduced by half approximately. 3.4 Simplified ∂epm/∂wi Computation For the improved computation of gradient vector g and quasi Hessian matrix Q above, the key point is to calculate vector jpm (defined in (14)) for each training pattern and each output. This vector is equivalent of one row of Jacobian matrix J.
Neural Network Training with Second Order Algorithms
471
By combining (2) and (14), the element of vector jpm can be computed by ∂e pm ∂wi
=
∂ (o pm − d pm ) ∂wi
=
∂o pm ∂net pn ∂net pn
(19)
∂wi
where: netpn is the sum of weighted inputs at neuron n, calculated by net pn = x pi wi
(20)
where: xpi and wi are the inputs and related weights respectively at neuron n. Inserting (19) and (20) into (14), the vector jpm can be calculated by ∂o pm [ x p11 j pm = ∂net p1
" x p1i
"] "
∂o pm ∂net pn
[ x pn1
" x pni
"] "
(21)
where: xpni is the i-th input of neuron n, when training pattern p. Using the neuron by neuron (NBN) computation, in (21), xpni can be calculated in the forward computation, while ∂opm/∂netpn is obtained in the backward computation. Again, since only one vector jpm needs to be stored for each pattern and output in the improved computation above, the memory cost for all those temporary parameters can be reduced by np×no times. All matrix operations are simplified to vector operations.
4 Implementation For a better illustration of the improved computation, let us use the parity-3 problem as an example.Parity-3 problem has 8 patterns, each of which is made up of 3 inputs and 1 output, as shown in Fig. 2.
Fig. 2 Parity-3 problem: 8 patterns, 2 inputs and 1 output
w
3
The structure, 2 neurons in FCC network (Fig. 3), is used to train parity-3 patterns.
Fig. 3 Two neurons in fully connected cascade network
472
H. Yu and B.M. Wilamowski
In Fig. 3, all weights are initialed by w={w1,w2,w3,w4,w5,w6,w7,w8,w9}. Also, all elements in both gradient vector and quasi Hessian matrix are set to “0”. Applying the first training pattern (-1, -1, -1, -1), the forward computation is organized from inputs to output, as 1. 2. 3. 4. 5.
net11=1×w1+(-1) ×w2+(-1) ×w3+(-1) ×w4 o11= f(net11), where f() is the activation function for neurons net12=1×w5+(-1) ×w6+(-1) ×w7+(-1) ×w8+o11×w9 o12=f(net12) e11=-1-o12
Then, the backward computation, from output to inputs, does the calculation of ∂e11/∂net11 and ∂e11/∂net12 in the following steps: 6. Using the results from steps 4) and 5), it could be obtained ∂e11 ∂ (−1 − o12 ) ∂f (net12 ) = =− ∂net12 ∂net12 ∂net12
(22)
7. Using the results from steps 1), 2) and 3), and the chain-rule in differential, one can obtain that: ∂e11 ∂(−1 − o12 ) ∂f (net12 ) ∂net12 ∂o11 ∂f (net12 ) ∂f (net11 ) = =− =− × w9 × ∂net11 ∂net11 ∂net12 ∂o11 ∂net11 ∂net12 ∂net11
(23)
Using equation (21), the elements in j11 can be calculated as ∂o j 11 = 11 [1 − 1 − 1 − 1] ∂net11
∂o11 [1 − 1 − 1 − 1 o11 ] ∂net12
(24)
By combining equations (15) and (24), the first sub-vector η11 can be obtained as

\eta_{11} = \left[ s_1 \; -s_1 \; -s_1 \; -s_1 \; s_2 \; -s_2 \; -s_2 \; -s_2 \; s_2 o_{11} \right] \times e_{11}    (25)
where s1 = ∂e11/∂net11 and s2 = ∂e11/∂net12. By combining equations (18) and (24), the first quasi-Hessian submatrix q11 can be calculated as

q_{11} =
\begin{bmatrix}
s_1^2 & -s_1^2 & -s_1^2 & -s_1^2 & s_1 s_2 & -s_1 s_2 & -s_1 s_2 & -s_1 s_2 & s_1 s_2 o_{11} \\
 & s_1^2 & s_1^2 & s_1^2 & -s_1 s_2 & s_1 s_2 & s_1 s_2 & s_1 s_2 & -s_1 s_2 o_{11} \\
 & & s_1^2 & s_1^2 & -s_1 s_2 & s_1 s_2 & s_1 s_2 & s_1 s_2 & -s_1 s_2 o_{11} \\
 & & & s_1^2 & -s_1 s_2 & s_1 s_2 & s_1 s_2 & s_1 s_2 & -s_1 s_2 o_{11} \\
 & & & & s_2^2 & -s_2^2 & -s_2^2 & -s_2^2 & s_2^2 o_{11} \\
 & & & & & s_2^2 & s_2^2 & s_2^2 & -s_2^2 o_{11} \\
 & & & & & & s_2^2 & s_2^2 & -s_2^2 o_{11} \\
 & & & & & & & s_2^2 & -s_2^2 o_{11} \\
 & & & & & & & & s_2^2 o_{11}^2
\end{bmatrix}    (26)
One may notice that in (26) only the upper triangular elements of the submatrix q11 are calculated, since all quasi-Hessian submatrices are symmetric (as analyzed in section 3.3). This further simplifies the computation.
So far, the first sub gradient vector η11 and the first quasi-Hessian submatrix q11 have been calculated as in equations (25) and (26), respectively. The last step for training the pattern (−1, −1, −1, −1) is to add the vector η11 and the matrix q11 to the gradient vector g and the quasi-Hessian matrix Q, respectively. After this sum operation, all temporary storage used in the computation, such as j11, η11 and q11, can be released.
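The vector and matrix operations of Eqs. (24)-(26) then reduce to a scaling and an outer product. Continuing the sketch above (s1, s2, o11 and e11 as computed there, up to the chapter's sign convention for the derivative factors), and exploiting symmetry so that only the upper triangle of Q is accumulated:

import numpy as np

j11 = np.array([s1, -s1, -s1, -s1, s2, -s2, -s2, -s2, s2 * o11])   # row of J, Eq. (24)

eta11 = j11 * e11              # sub gradient vector, Eq. (25)
q11 = np.outer(j11, j11)       # quasi-Hessian submatrix, Eq. (26)

# Accumulation into the global g and Q (Eqs. (13) and (17));
# g and Q are zero-initialized once, before the pattern loop.
g = np.zeros(9)
Q = np.zeros((9, 9))
g += eta11
iu = np.triu_indices(9)        # symmetry: only the upper triangle is accumulated
Q[iu] += q11[iu]               # the lower triangle can be mirrored once, after all patterns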
% Initialization
Q = 0; g = 0
% Improved computation
for p = 1:np                      % number of patterns
    % Forward computation
    ...
    for m = 1:no                  % number of outputs
        % Backward computation
        ...
        calculate vector jpm;     % Eq. (21)
        calculate sub vector ηpm; % Eq. (15)
        calculate sub matrix qpm; % Eq. (18)
        g = g + ηpm;              % Eq. (13)
        Q = Q + qpm;              % Eq. (17)
    end;
end;
Fig. 4 Pseudo code of the improved computation
The computation above covers only the first pattern of the parity-3 problem. For the other 7 patterns the process is the same, except that different input and output values are applied. During the whole computation there is no Jacobian matrix storage or multiplication; only derivatives and outputs of the activation functions need to be computed. All temporary parameters are stored in vectors whose sizes do not depend on the number of patterns or outputs. Generally, for a problem with np training patterns and no outputs, the improved computation can be organized as the pseudo code shown in Fig. 4.
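Once Q and g have been accumulated over all patterns, one weight update of the standard Levenberg Marquardt rule [Hagan and Menhaj 1994] can be applied. A hedged sketch follows; the damping-parameter schedule is an assumption, as this chapter does not restate it:

import numpy as np

def lm_update(w, Q, g, mu):
    # One Levenberg-Marquardt step: w_new = w - (Q + mu*I)^(-1) g,
    # with Q and g accumulated as in Fig. 4; mu is the damping parameter,
    # adjusted between iterations.
    N = w.size
    Q_full = np.triu(Q) + np.triu(Q, 1).T   # mirror the upper triangle, if only
                                            # that half of Q was accumulated
    dw = np.linalg.solve(Q_full + mu * np.eye(N), g)
    return w - dw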
5 Experimental Results

The experiments are designed to test the memory and training-time efficiency of the improved computation compared with the traditional computation. They are divided into two parts: memory comparison and time comparison.

5.1 Memory Comparison

Three problems, each with a huge number of patterns, are selected to test the memory cost of both the traditional and the improved computation. The LM algorithm is used for training, and the test results are shown in the tables below. The actual memory costs are measured with the Windows Task Manager.
Table 4 Memory comparison for parity-14 and parity-16 problems

Problems                 Parity-14     Parity-16
Patterns                 16,384        65,536
Structures*              15 neurons    17 neurons
Jacobian matrix sizes    20.6 Mb       106.3 Mb
Weight vector sizes      1.3 Kb        1.7 Kb
Average iteration        99.2          166.4
Success rate             13%           9%

Algorithms               Actual memory cost
Traditional LM           87.6 Mb       396.47 Mb
Improved LM              11.8 Mb       15.90 Mb

*All neurons are in fully connected neural networks.
From the test results in Tables 4 and 5 it is clear that the memory cost for training is significantly reduced by the improved computation. Notice that, in the MNIST pattern recognition problem, an even higher memory efficiency of the improved computation becomes visible if the memory cost for storing the training patterns themselves is excluded.

Table 5 Memory comparison for the MNIST pattern recognition problem

Problems                 MNIST problem
Patterns                 60,000
Structures               784-1 single layer network*
Jacobian matrix sizes    179.7 Mb
Weight vector sizes      3.07 Kb

Algorithms               Actual memory cost
Traditional LM           572.8 Mb
Improved LM              202.8 Mb

*In order to perform efficient matrix inversion during training, only one digit is classified at a time.
5.2 Time Comparison

Parity-9, parity-11 and parity-13 problems are trained to compare the training time of the traditional and the improved computation, using the LM algorithm. In all cases fully connected cascade networks are used, and for each case the initial weights and training parameters are exactly the same.

Table 6 Time comparison for parity-9, parity-11 and parity-13 problems

Problems             Parity-9    Parity-11    Parity-13
Patterns             512         2,048        8,192
Neurons              8           10           15
Weights              108         165          315
Average iterations   35.1        58.1         88.2
Success rate         38%         17%          21%

Algorithms           Averaged training time (ms)
Traditional LM       2,226       73,563       2,868,344
Improved LM          1,078       19,990       331,531
From Table 6, one may notice that the improved computation not only handles much larger problems, it also computes much faster than the traditional one, especially for training with large pattern sets; the larger the pattern set, the more time-efficient the improved computation becomes. As analyzed above, both the simplified quasi-Hessian matrix computation and the reduced memory traffic contribute to the significantly improved training speed reported in Table 6. From the comparisons above, one may conclude that the improved computation is much more efficient than the traditional computation for training with the Levenberg Marquardt algorithm, not only in memory requirements but also in training time.
6 Conclusion

In this paper, an improved computation is introduced to increase the training efficiency of the Levenberg Marquardt algorithm. Instead of storing the entire Jacobian matrix for later computation, the proposed method uses only one row of the Jacobian matrix at a time to build up both the gradient vector and the quasi-Hessian matrix. In this way, the corresponding memory requirement is decreased approximately by a factor of np×no, where np is the number of training patterns and no is the number of outputs, and the memory limitation problem in Levenberg Marquardt training is eliminated. Based on the proposed method, the computation of the quasi-Hessian matrix is further simplified using its symmetry. Therefore, the training speed of the improved Levenberg Marquardt algorithm becomes much faster than that of the traditional one, by reducing both the memory cost and the number of multiplication operations in the quasi-Hessian matrix computation. From the experimental results presented in section 5, one can conclude that the improved computation is much more efficient than the traditional computation, not only in memory requirement but also in training time. The method was implemented in the neural network trainer NBN 2.10 [Yu and Wilamowski 2009; Yu et al. 2009]; the software can be downloaded from http://www.eng.auburn.edu/users/wilambm/nnt/
References

[Cao et al. 2006] Cao, L.J., Keerthi, S.S., Ong, C.J., Zhang, J.Q., Periyathamby, U., Fu, X.J., Lee, H.P.: Parallel sequential minimal optimization for the training of support vector machines. IEEE Trans. on Neural Networks 17(4), 1039–1049 (2006)
[Hagan and Menhaj 1994] Hagan, M.T., Menhaj, M.B.: Training feedforward networks with the Marquardt algorithm. IEEE Trans. on Neural Networks 5(6), 989–993 (1994)
[Hohil et al. 1999] Hohil, M.E., Liu, D., Smith, S.H.: Solving the N-bit parity problem using neural networks. Neural Networks 12, 1321–1323 (1999)
[Rumelhart et al. 1986] Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323, 533–536 (1986)
[Wilamowski 2009] Wilamowski, B.M.: Neural network architectures and learning algorithms: How not to be frustrated with neural networks. IEEE Industrial Electronics Magazine 3(4), 56–63 (2009)
[Wilamowski et al. 2008] Wilamowski, B.M., Cotton, N.J., Kaynak, O., Dundar, G.: Computing gradient vector and Jacobian matrix in arbitrarily connected neural networks. IEEE Trans. on Industrial Electronics 55(10), 3784–3790 (2008)
[Wilamowski et al. 2010] Yu, H., Wilamowski, B.M.: Neural network learning without backpropagation. IEEE Trans. on Neural Networks 21(11) (2010)
[Wilamowski and Yu 2010] Yu, H., Wilamowski, B.M.: Improved computation for Levenberg Marquardt training. IEEE Trans. on Neural Networks 21(6), 930–937 (2010)
[Yu and Wilamowski 2009] Yu, H., Wilamowski, B.M.: Efficient and reliable training of neural networks. In: Proc. 2nd IEEE Human System Interaction Conf. HSI 2009, Catania, Italy, pp. 109–115 (2009)
[Yu et al. 2009] Yu, H., Wilamowski, B.M.: C++ implementation of neural networks trainer. In: Proc. 13th Int. Conf. on Intelligent Engineering Systems, INES 2009, Barbados (2009)
Complex Neural Models of Dynamic Complex Systems: Study of the Global Quality Criterion and Results G. Drałus Department of Electrical Engineering Fundamentals, Rzeszow University of Technology, Rzeszow, Poland
[email protected]

Abstract. In this paper, dynamic global models of input-output complex systems are discussed. A dynamic complex system consisting of two nonlinear discrete-time sub-systems is considered. Multilayer neural networks in a dynamic structure are used as the global model, which is composed of two sub-models corresponding to the structure of the complex system. The quality criterion of the global model contains coefficients which define the participation of the sub-models in the global model. The main contribution of this work is a study of the influence of these coefficients on the global model quality. That influence is examined for different backpropagation learning algorithms for complex neural networks.
1 Introduction

Complex system is a broad term that can refer to systems of very different natures (technical, economic, biological). Roughly, we may take into consideration interactions between units of the system, large dimensionality, a large number of interacting entities, or unusual behavior. In this paper, a complex system means a dynamic input-output complex system of a technical nature. In such a complex system, elementary processes or elementary objects having inputs and outputs can be distinguished, and connections between these processes or objects can be pointed out; a connection means that the outputs of some objects are inputs of another object. Many examples of such complex systems can be found in the chemical industry, for example the chemical process of sulphuric acid production [Osowski 2007] or the ammonium nitrite production process [Drałus and Świątek 2009]. Many mathematical methods allow us to model simple plants. However, the modeling of dynamic complex systems is a very important and difficult problem which so far has not been solved well enough.
In complex systems we deal not with isolated simple plants, but with static or dynamic simple plants which are interconnected into a complex system. Additionally, there exist numerous interactions between the units of a complex system. One of the basic problems is taking into account the quality of the system model as a whole while providing a suitable approximation quality for the particular subsystems.

Neural networks are investigated here in application to the identification and modeling of complex processes exhibiting nonlinearities and typical disturbances. Neural networks offer a flexible structure that can map arbitrary nonlinear functions, making them ideally suited for the modeling and control of complex, nonlinear systems [Hunt et al. 1992]. They are particularly appropriate for multivariable applications, where they can readily characterize the interactions between different inputs and outputs. A further benefit is that the neural architecture is inherently parallel and can be applied in real-time implementations. Nowadays neural networks have many applications in science, particularly in the modeling and control of systems. Neural networks can be used for the modeling and identification of simple static and dynamic plants owing to their ability to approximate nonlinear functions [Hornik 1989]. However, modeling only the simple objects that are parts of a complex system is inadequate for modern science; accurate and detailed models of complex systems as a whole are strongly expected today. Neural networks are one of the tools suitable for modeling complex systems [Narendra and Parthasarathy 1990; Dahleh and Venkatesh 1997; Drapała and Świątek 2006; Drałus and Świątek 2009].

In this paper, the modeling of dynamic complex systems by multilayer neural networks is discussed. To model an input-output complex system, an adequate complex neural model was built. The complex neural model is a non-typical multilayer feedforward neural network whose structure corresponds to the complex system; this complex model is the global model. In the global model, parts corresponding to the simple plants of the complex system can be indicated; these parts of the model are called dynamic sub-models. The influence of the quality of the sub-models on the global model quality is discussed.
2 Models of Dynamic Complex Systems

2.1 Description of Dynamic Complex Systems

Naturally, there are many structures of complex systems. A very important case is the cascade complex system (a series connection of the units of a complex system), in which each unit is a dynamic simple plant. A well-known method of modeling a dynamic simple plant is the series-parallel model of identification [Narendra and Parthasarathy 1990]. In this paper, the idea of the series-parallel identification model of simple dynamic plants is extended to the identification and modeling of complex dynamic systems.
In the series-parallel model, the present and past input and output signals are transmitted via tapped delay lines (TDL). In this case there is no feedback from the network but from the plant: the multilayer neural networks work in the series-parallel configuration (learning mode). Because at this stage the model is a multilayer feedforward neural network, the networks can be trained with static backpropagation learning algorithms. After the learning stage, if the modeling errors are sufficiently small, the series-parallel model is replaced by the parallel model with a feedback loop from the output of the model (work mode). In this way, dynamic models are obtained.

Let us consider a complex system which consists of R simple plants (see Fig. 1). A global model of the complex system also consists of R sub-models, denoted M1, ..., MR, respectively. The global model is composed in a way that matches the complex system: the output of the previous sub-model is the input to the next sub-model.
Fig. 1 Dynamic complex system and its dynamic global model using TDL (series-parallel model)
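The distinction between the two modes can be made concrete with a small sketch (schematic only; model and plant are hypothetical callables, and the delay orders are illustrative): in the series-parallel (learning) mode the TDL taps are fed from the plant, while in the parallel (work) mode they are fed from the model itself.

def predict_series_parallel(model, u, y_plant):
    # Learning mode: the TDL taps take past outputs from the real plant
    y_hat = []
    for k in range(2, len(u)):
        y_hat.append(model(u[k], u[k - 1], y_plant[k - 1], y_plant[k - 2]))
    return y_hat

def predict_parallel(model, u, y_init):
    # Work mode: the TDL taps take past outputs from the model itself
    y_hat = list(y_init)                 # initial conditions
    for k in range(2, len(u)):
        y_hat.append(model(u[k], u[k - 1], y_hat[-1], y_hat[-2]))
    return y_hat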
The output of the r-th sub-model is calculated by

\hat{y}^{(r)}(k+r) = f_r\!\left(\hat{y}^{(r-1)}(k+r-1),\, w^{(r)}\right) = f_r\!\left(f_{r-1}\!\left(\cdots f_1(u(k), w^{(1)}), \cdots, w^{(r-1)}\right), w^{(r)}\right) = \hat{f}_r\!\left(u(k), w^{(1)}, \ldots, w^{(r)}\right)    (1)
where u(k) is the external input of the global model and of the complex system, and w^{(r)} are the parameters (weights) of the r-th sub-model.

The error at the k-th step of the input signal for the r-th sub-model is the difference between the output \hat{y}^{(r)}(k+r) of the r-th sub-model (Mr) and the output y^{(r)}(k+r) of the corresponding simple plant (Pr) of the complex system; it is defined as follows:
e^{(r)}(k+r) = \hat{y}^{(r)}(k+r) - y^{(r)}(k+r)    (2)
The performance index for each dynamic sub-model is defined as the sum of squared errors over all K discrete time samples:

Q_d^{(r)}(w^{(r)}) = \frac{1}{2} \sum_{k=1}^{K} \left(e^{(r)}(k+r)\right)^{T} e^{(r)}(k+r) = \frac{1}{2} \sum_{k=1}^{K} \sum_{j=1}^{J_r} \left(\hat{y}_j^{(r)}(k+r) - y_j^{(r)}(k+r)\right)^2    (3)

where K is the number of discrete time samples, J_r is the number of outputs of the r-th plant, w^{(r)} is the set of parameters of the r-th sub-model, \hat{y}^{(r)} is the output of the r-th sub-model, and y^{(r)} is the output of the r-th simple plant.
The global quality assessment criterion for the global dynamic model is the weighted sum of the performance indices (3) of all dynamic sub-models:

Q_d(W) = \sum_{r=1}^{R} \beta_r Q_d^{(r)}(w^{(r)}) = \frac{1}{2} \sum_{k=1}^{K} \sum_{r=1}^{R} \beta_r \sum_{j=1}^{J_r} \left(\hat{y}_j^{(r)}(k+r) - y_j^{(r)}(k+r)\right)^2    (4)

where W = [w^{(1)}, w^{(2)}, \ldots, w^{(R)}] is the set of parameters (weights) of the global model divided into the subsets of the sub-models, and \beta_r \in [0,1] are weight coefficients with \sum_{r=1}^{R} \beta_r = 1.
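Criterion (4) translates into a few lines of code; a minimal sketch with illustrative names:

import numpy as np

def global_criterion(y_hat, y, beta):
    # Qd(W) = sum_r beta_r * Qd^(r), Eqs. (3) and (4).
    # y_hat, y: one (K x J_r) array per sub-model r; beta: coefficients, sum(beta) == 1
    Qd_sub = [0.5 * np.sum((yh - yr) ** 2) for yh, yr in zip(y_hat, y)]
    Qd = sum(b * q for b, q in zip(beta, Qd_sub))
    return Qd, Qd_sub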
The coefficients β determine the impact of the particular sub-models on the global model quality. The influence of these coefficients on the global model quality will be investigated. In the particular case considered in this paper, the complex system consists of two simple plants (R = 2); thus, the influence of the two weight coefficients β1 and β2 on the quality of the global model will be investigated.

2.2 Motivation
The global quality criterion Qd (formula (4)) will be used to develop learning algorithms for complex neural models. The learning algorithms allow us to obtain the desired parameters of the complex neural model, i.e., the weights of the neural sub-models. The global quality criterion Qd contains the β coefficients, which determine the influence of the sub-model quality indices (3) on the global quality criterion. The remaining question is how to select the β coefficients properly. When the complex system consists of two simple plants, there are only two coefficients, β1 and β2, in the global criterion (4). Should we select, for instance, β1 = 0.5 and β2 = 0.5, which would mean an equal influence of both sub-models on the global model? Or perhaps it is better to select, say, β1 = 0.25 and β2 = 0.75, which means that the second sub-model has a greater impact on the global model than the first sub-model.
The basic question is what influence these coefficients have on the global model quality, i.e., how the sub-model quality influences the global model quality, and also what influence they have on the quality of the sub-models themselves. The study is carried out in such a way that, for a fixed neural architecture of the global model, under the same initial conditions and with a constant number of learning steps, model parameter selection is conducted in each particular case for varying participation of β1 and β2 in the global criterion (4), ranging from 0.001 to 0.999 (with β1 + β2 = 1). Many results were obtained during the simulations; they show sufficiently clearly the influence of the β coefficients on the global model quality as well as on the sub-model quality. Knowing how the β coefficients influence the quality of the model, the values of β1 and β2 can be selected consciously to obtain an optimal model. Under such conscious selection we may better control the learning process, and thus achieve the required quality of the global model faster and more reliably.

2.3 The Complex Gradient Backpropagation Learning Algorithm for Dynamic Complex Models
To develop the learning algorithm for the multilayer neural networks from which the dynamic global model is constructed, it was necessary to modify and adapt the common gradient backpropagation algorithm [Gupta et al. 2003]. This modification must take into account the fact that the complex model consists of static neural networks with tapped delay lines, with a feedback loop from the complex system during the learning stage and a feedback loop from the complex model when the model works. The preparation of the learning data using past samples of the input and output signals must be done using TDL. To minimize the global quality criterion (4), the complex backpropagation learning algorithm for multilayer neural networks was developed [Drałus 2004]. The changes of the global model parameters, i.e., the weights of the neural networks, are obtained by gradient computation as follows:
\Delta w_{ji} = -\eta \frac{\partial Q_d(W)}{\partial w_{ji}}    (5)
The changes of the weights in the particular layers, after the gradient calculation of formula (5), are computed as follows:
• for the output layer:

\Delta w_{ji}^{(R),M} = -\eta \sum_{k=1}^{K} f'(z_j^{M,k+R}) \, \beta_R \left(\hat{y}_j^{(R)}(k+R) - y_j^{(R)}(k+R)\right) u_i^{M-1}(k+R-1)    (6)
• for the hidden layers:

\Delta w_{ji}^{(r),m} = -\eta \sum_{k=1}^{K} f'(z_j^{m,k+r}) \sum_{l=1}^{I_{m+1}} \delta_l^{(r),m+1,k+r} w_{lj}^{(r),m+1} \, u_i^{m-1}(k+r-1)    (7)
• for the „binding" hidden layers (i.e., the output layers of the sub-models in the global complex model):

\Delta w_{ji}^{(r),m} = -\eta \sum_{k=1}^{K} f'(z_j^{m,k}) \left[ \sum_{l=1}^{I_{m+1}} \delta_l^{(r+1),m+1,k} w_{lj}^{(r+1),m+1} + \beta_r \left(\hat{y}_j^{(r)}(k+r) - y_j^{(r)}(k+r)\right) \right] u_i^{m-1}(k+r-1)    (8)
The global model parameters W are updated, with a constant learning rate η, in the particular layers according to

w_{ji}^{(r),m}(k+1) = w_{ji}^{(r),m}(k) + \eta \Delta w_{ji}^{(r),m}(k)    (9)
The obtained formulas (6)-(9), which minimize the quality criterion (4), are called the complex gradient backpropagation learning algorithm. This algorithm is very slow; however, it is the basis for other learning algorithms which may be used for adjusting the parameters of the neural global model in the future.

2.4 The Complex Delta-Bar-Delta Learning Algorithm
Learning algorithms should be convergent and fast, and fast learning algorithms should also be developed for complex neural networks. There are many fast learning algorithms for simple neural networks that can be used to develop algorithms for complex networks; one of them is the Delta-Bar-Delta learning algorithm [Jacobs 1988]. On the basis of the Delta-Bar-Delta (DBD) algorithm and the global quality criterion Qd, a new complex Delta-Bar-Delta (complex DBD) learning algorithm for complex neural models was developed [Drałus 2010]. In this algorithm the learning rate is adaptive, so learning is fast.
In the complex DBD algorithm, the w parameters in the m-th layer of the neural model for the (k+1)-th learning step are given by the following formula:

w_{ji}^{m}(k+1) = (1-\mu) \cdot \eta_{ji}^{m}(k) \, \Delta w_{ji}^{(r),m}(k) + \mu \cdot w_{ji}^{m}(k)    (10)
where μ is a momentum term over the interval 0-1, and the weight changes \Delta w_{ji}^{(r),m} are calculated according to formulas (6)-(8). The adaptive learning rate η for the m-th layer in the (k+1)-th learning step is calculated as follows:
\eta_{ji}^{m}(k+1) = \eta_{ji}^{m}(k) + \Delta \eta_{ji}^{m}(k)    (11)
The change of the learning rate Δη is given by

\Delta \eta_{ji}^{m}(k) =
\begin{cases}
a & \text{if } S_{ji}^{m}(k-1) \cdot D_{ji}^{m}(k) > 0 \\
-b \cdot \eta_{ji}^{m}(k-1) & \text{if } S_{ji}^{m}(k-1) \cdot D_{ji}^{m}(k) < 0 \\
0 & \text{if } S_{ji}^{m}(k-1) \cdot D_{ji}^{m}(k) = 0
\end{cases}    (12)
The component S_{ji}^{m}(k) in formula (12) is calculated by

S_{ji}^{m}(k) = (1-\gamma) D_{ji}^{m}(k) + \gamma S_{ji}^{m}(k-1)    (13)

where D_{ji}^{m} = \partial Q_d(W(k)) / \partial w_{ji}; the coefficient γ in formula (13) takes a value over the interval 0-1 (in the simulations γ = 0.75), and the coefficients a and b in formula (12) are equal to a = 0.002 and b = 0.2, respectively. Since the learning rate is adaptive, this algorithm allows us to find the proper parameters W of the global complex neural model much faster than the complex gradient algorithm.
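The adaptation rules (11)-(13) act elementwise on the learning rates; a minimal sketch (array shapes and names are illustrative):

import numpy as np

def dbd_learning_rate_update(eta, S_prev, D, a=0.002, b=0.2, gamma=0.75):
    # Delta-Bar-Delta adaptation, Eqs. (11)-(13), elementwise over all weights.
    # eta: current learning rates; S_prev: exponential average of past gradients;
    # D: current gradient components dQd/dw_ji
    prod = S_prev * D
    delta_eta = np.where(prod > 0, a,                  # consistent sign: grow linearly
                np.where(prod < 0, -b * eta, 0.0))     # sign flip: shrink geometrically
    eta_new = eta + delta_eta                          # Eq. (11)
    S_new = (1 - gamma) * D + gamma * S_prev           # Eq. (13)
    return eta_new, S_new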
3 Simulation Study

Let us consider a dynamic nonlinear complex system which consists of two dynamic nonlinear simple plants connected in series. Both simple plants (denoted P1 and P2, see Fig. 2) of the complex system are described by second-order nonlinear difference equations.
Fig. 2 Dynamic discrete time complex system
The output of the first simple plant P1 (see Fig. 2) is described by the following difference equation [Narendra and Parthasarathy 1990]:

y^{(1)}(k+1) = f_1\!\left(y^{(1)}(k), y^{(1)}(k-1), y^{(1)}(k-2), u(k), u(k-1)\right)    (14)
The output of the second simple plant P2 is described by

y^{(2)}(k+2) = f_2\!\left(y^{(2)}(k+1), y^{(2)}(k), y^{(2)}(k-1), y^{(1)}(k+1), y^{(1)}(k)\right)    (15)
The nonlinear function f1 of the first simple plant P1 is given by

f_1(x_1, x_2, x_3, x_4, x_5) = \frac{x_1 x_2 x_3 x_5 (x_3 - 1) + x_4}{1 + x_2^2 + x_3^2}    (16)
The second nonlinear function f2 is given by

f_2(v_1, v_2, v_3, v_4, v_5) = \frac{v_2 v_3 v_5 (v_1 - 1) + v_4}{1 + 2 v_1^2}    (17)
As the global model of the considered complex system, a 6-layer feedforward neural network was used (see Fig. 3). The global model, which is the complex neural network, has the following structure: one external input and 5 input neurons; 20 neurons (with the hyperbolic tangent transfer function) in the first hidden layer; 10 neurons in the second hidden layer; 1 linear neuron in the third layer, called the "binding" layer; 20 and 10 neurons in the fourth and fifth layers, respectively; and 1 linear neuron in the sixth (output) layer of the complex model (shortly, 1(5)-20-10-1(5)-20-10-1). The left part of the global model is the first sub-model (see Fig. 3) and the right part is the second sub-model. In the complex neural model there exist non-typical hidden layers called "binding" hidden layers; a "binding" layer is a layer which connects sub-models in the complex model, and its output corresponds to the output of the respective simple plant. In this model the "binding" hidden layer is the third layer. The architecture of the model in learning mode allows us to use the complex learning algorithms developed above for the multilayer neural networks.
Fig. 3 The neural network as the global model of the complex system (series-parallel model in learning mode)
Fig. 4 The neural network as the global model of the complex system (parallel model in work mode)
After the learning stage (after adjusting the model parameters), the global model is switched to the work mode (the parallel model, see Fig. 4). This model utilizes one delay element in the external input line and three delay elements in the output line. In this kind of architecture, the model inputs depend on the delayed values of the neural model output, which allows the model to approximate the true dynamics of the complex system.
The learning data, containing 500 data points, were distributed uniformly at random in the interval [−1, 1]. The initial weights were randomly generated according to the Nguyen–Widrow rule [Gupta et al. 2003]. For the simulation study, three learning algorithms were used: the complex gradient, the complex DBD, and DCRprop [Drałus and Świątek 2009]. DCRprop is a heuristic algorithm and the fastest of those mentioned. Three complex networks of the same architecture were trained, for 1000 epochs with the complex DBD and DCRprop algorithms and for 2000 epochs with the complex gradient algorithm. All algorithms start from the same initial parameters of the model, i.e., from the same initial weights of the complex network. For the system and model inputs, the testing signal is given by the following formula:
u(k) =
\begin{cases}
\sin(2\pi k / 250) & \text{for } k \le 250 \\
0.8 \sin(2\pi k / 250) + 0.2 \sin(2\pi k / 25) & \text{for } 250 < k \le 500
\end{cases}    (18)
The weights of the neural network were adjusted in one learning step after the presentation of all 500 discrete time samples. The momentum μ in formula (10) is equal to 0.02 and η = 0.002 in formula (9). All simulations were made with a self-developed neural simulation tool. The results of the simulations for the three learning algorithms are presented in numerical form (tables) and in graphic form (figures), for both the learning and the working mode of the neural models. The values of the performance indices Qd(1) and Qd(2) and of the global quality criterion Qd are shown as functions of the coefficient β1 (with β1 + β2 = 1). The performance index Qd depends directly on the β1 and β2 coefficients, whereas the β coefficients influence the indices Qd(1) and Qd(2) indirectly, through the model parameters W.

3.1 Simulation Results
The global criterion Qd and the performance indices Qd(1) and Qd(2) after 2000 epochs of learning with the complex gradient learning algorithm are shown in Table 1, for the learning data (the series-parallel model, feedback from the simple plants) and for the testing data (the parallel model, feedback from the sub-models). The values of the quality criteria for the complex DBD learning algorithm after 1000 learning epochs are shown in Table 2, and those for the DCRprop algorithm after 1000 epochs in Table 3. All quality indicators for the learning data were calculated with the model in learning mode, whereas the indicators for the test data were calculated with the model in work mode.
Table 1 Values of performance indices for learning and testing data after 2000 epochs for the complex gradient learning algorithm

        data for learning           data for testing
β1      Qd(1)   Qd(2)   Qd          Qd(1)   Qd(2)   Qd
0.01    16.96   1.796   1.921       35.77   2.077   2.414
0.1     3.428   1.802   1.965       3.786   2.479   2.610
0.3     2.835   2.156   2.360       2.086   1.633   1.769
0.5     2.306   2.438   2.372       2.675   1.234   1.954
0.7     1.592   2.125   1.752       4.012   0.896   3.077
0.9     9.080   7.580   8.937       14.99   3.591   13.85
0.99    1871    7194    1854        1770    107     1760
Table 2 Values of performance indices for learning and testing data after 1000 epochs for the complex DBD learning algorithm

        data for learning               data for testing
β1      Qd(1)    Qd(2)    Qd           Qd(1)   Qd(2)   Qd
0.001   126.0    0.0504   0.1770       812.0   5.820   6.630
0.01    67.69    0.2260   0.9010       565.0   1.886   7.530
0.1     0.2480   0.0517   0.0714       4.365   0.298   0.704
0.2     0.1590   0.0739   0.0909       2.595   0.695   1.075
0.3     0.1180   0.0714   0.0853       2.750   0.366   1.080
0.4     0.0631   0.0315   0.0442       2.800   0.388   1.353
0.5     0.0406   0.0228   0.0317       3.520   0.365   1.942
0.6     0.0349   0.0262   0.0314       3.807   0.362   2.429
0.7     0.0347   0.0536   0.0404       3.001   0.673   2.302
0.8     0.0290   0.0637   0.0400       3.456   0.920   2.949
0.9     0.0250   0.1020   0.0326       1.485   0.920   1.429
0.99    0.0175   0.3260   0.0206       4.282   1.013   4.249
0.999   0.0170   0.6300   0.0177       4.904   2.689   4.902
Table 3 Values of performance indices for learning and testing data after 1000 epochs for the DCRprop learning algorithm

        data for learning              data for testing
β1      Qd(1)   Qd(2)    Qd           Qd(1)   Qd(2)    Qd
0.001   299     0.155    0.451        433     9.22     9.56
0.01    142     0.302    1.722        247     51.71    53.6
0.1     0.386   0.0627   0.0951       0.803   0.0850   0.156
0.2     0.214   0.0613   0.0919       1.06    0.0999   0.293
0.3     0.196   0.0738   0.111        1.021   0.113    0.385
0.4     0.174   0.0847   0.120        0.316   0.0266   0.142
0.5     0.150   0.0910   0.120        0.488   0.0591   0.247
0.6     0.132   0.0866   0.114        0.520   0.0512   0.332
0.7     0.114   0.0842   0.105        0.919   0.0968   0.672
0.8     0.142   0.0903   0.132        0.788   0.0859   0.648
0.9     0.109   0.0835   0.106        0.398   0.0340   0.362
0.99    0.091   0.0735   0.091        0.395   0.0211   0.392
0.999   0.142   0.0925   0.142        0.577   0.1067   0.576
A relative percentage error RPE is introduced as an additional performance index of modeling, for each output of the r-th sub-model:

RPE^{(r)} = \frac{\sum_{k=1}^{K} \left| \hat{y}^{(r)}(k) - y^{(r)}(k) \right|}{\sum_{k=1}^{K} \left| y^{(r)}(k) \right|} \cdot 100\%    (19)
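A direct implementation of (19) for one sub-model output (a minimal sketch):

import numpy as np

def rpe(y_hat, y):
    # Relative percentage error, Eq. (19), for one sub-model output
    return 100.0 * np.sum(np.abs(y_hat - y)) / np.sum(np.abs(y))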
The values of the RPE errors after 2000 epochs of learning for the model trained with the complex gradient algorithm are presented in Table 4. The values of the RPE errors after 1000 epochs for the models trained with the complex DBD and the DCRprop learning algorithms are presented in Table 5 and Table 6, respectively.

Table 4 RPE errors in the global complex model for learning and testing data after 2000 epochs for the complex gradient learning algorithm

        data for learning           data for testing
β1      RPE(1) [%]   RPE(2) [%]     RPE(1) [%]   RPE(2) [%]
0.01    38.0         49.0           16.2         17.0
0.1     16.8         14.4           16.1         19.0
0.3     16.2         12.3           18.9         15.2
0.5     14.5         13.3           18.1         12.5
0.7     11.6         14.0           17.5         12.3
0.9     29.5         34.2           33.3         23.3
0.99    2000         1200           234          130
Table 5 RPE errors in the global complex model for learning and testing data after 1000 epochs for the complex DBD learning algorithm

        data for learning           data for testing
β1      RPE(1) [%]   RPE(2) [%]     RPE(1) [%]   RPE(2) [%]
0.001   257          260            5.70         21.0
0.010   188          221            11.5         18.2
0.1     10.5         14.2           5.66         7.05
0.2     7.80         10.3           6.51         10.6
0.3     6.75         11.5           6.00         8.05
0.4     5.20         12.5           4.27         8.25
0.5     4.15         13.6           3.80         7.57
0.6     3.82         16.8           3.89         13.4
0.7     3.80         12.3           5.75         10.1
0.8     3.60         13.8           6.08         12.0
0.9     3.31         9.60           7.62         11.2
0.99    2.73         14.7           14.1         12.4
0.999   2.69         15.3           33.2         18.0
Table 6 RPE errors in the global complex model for learning and testing data after 1000 epochs for the DCRprop learning algorithm

        data for learning           data for testing
β1      RPE(1) [%]   RPE(2) [%]     RPE(1) [%]   RPE(2) [%]
0.001   171          4.50           177          38
0.010   121          6.45           140          95
0.1     5.50         2.85           4.70         2.55
0.2     4.20         2.80           4.64         2.47
0.3     3.92         3.02           5.32         2.93
0.4     3.80         3.25           3.47         2.04
0.5     3.53         3.35           3.80         2.80
0.6     3.35         3.36           3.77         2.66
0.7     2.97         3.15           4.60         3.40
0.8     3.40         3.40           4.07         3.33
0.9     2.98         3.24           3.04         2.19
0.99    2.80         3.11           3.19         1.67
0.999   3.36         3.43           4.28         5.16
Fig. 5 Values of performance indices Qd(1), Qd(2) and Qd for the complex gradient learning algorithm: a) for learning data; b) for testing data
Fig. 6 Values of performance indices Qd(1), Qd(2) and Qd for the complex DBD learning algorithm: a) for learning data; b) for testing data
Fig. 7 Values of performance indices Qd(1), Qd(2) and Qd for the DCRprop learning algorithm: a) for learning data; b) for testing data
Fig. 8 Values of RPE(1) and RPE(2) errors for the complex gradient learning algorithm: a) for learning data; b) for testing data
Fig. 9 Values of RPE(1) and RPE(2) errors for the complex DBD learning algorithm: a) for learning data; b) for testing data

Fig. 10 Values of RPE(1) and RPE(2) errors for the DCRprop learning algorithm: a) for learning data; b) for testing data
Fig. 11a shows the signals at the outputs of the first sub-model and of the first simple plant for the testing data. The output signals of the first sub-model and of the first simple plant are the input signals to the second sub-model and to the second simple plant, respectively. Fig. 11b shows the signals at the outputs of the second sub-model and of the second simple plant, also for the testing data. The output signals of the second sub-model are at the same time the output signals of the global model of the complex system.
Fig. 11 Outputs of the first simple plant and the first sub-model (a), and of the second simple plant and the second sub-model (b), for testing data, in the complex model trained with the DCRprop learning algorithm
3.2 Analysis of the Impact of β Coefficients on the Complex Neural Model Quality for the Complex Gradient Learning Algorithm
The lowest values of the Qd(1) index are achieved over a wide middle range of the β1 coefficient; at the ends of the β1 range the Qd(1) index increases rapidly, for both learning and testing data (see Fig. 5 and Table 1). For the learning data, the quality index Qd(2) is lowest at the beginning of the range and increases as β1 grows; for the testing data, Qd(2) has its minimum at β1 = 0.7. The Qd index achieves its lowest values at the beginning of the β1 range and around β1 = 0.7. The shapes of the RPE error curves are similar to the shapes of the performance indices for both testing and learning data (see Fig. 8). The values of the RPE errors and performance indices are much higher for the complex gradient learning algorithm than for the other algorithms; in this algorithm the learning rate η is constant, and therefore it converges very slowly.

3.3 Analysis of the Impact of β Coefficients on the Complex Neural Model Quality for the DBD Learning Algorithm
An increase of the β1 coefficient (i.e., a decrease of β2) causes a monotonic decrease of the quality index Qd(1) for the learning data. The smallest value of the Qd(1) index is achieved for the maximum value of the β1 coefficient (β1 = 0.999, see Fig. 6a and Table 2). The quality index Qd(2) has its minimum in the middle of the β1 range (i.e., around β1 = β2 = 0.5); as β1 increases from 0.6 to 0.999, the quality index Qd(2) increases to its maximum value.
The global quality index Qd has minima when β1 is in the middle of the range (β1 = 0.5-0.6), and it takes its largest values at the extremes of the range (e.g., β1 = 0.999 for the testing data). For very small values of β1, all quality indices grow to very high values, so it is not possible to adjust the parameters of the first sub-model, and a good global model is not reached either. Thus the quality of the sub-models has an influence on the global model quality.

The plots of all performance indices have different shapes for testing and learning data (see Fig. 6). For the testing data, the quality indices Qd(2) and Qd have their minima at β1 = 0.1 (β2 = 0.9), while the quality index Qd(1) has its minimum at β1 = 0.9 (β2 = 0.1, see Table 2).

Fig. 9 shows the values of the RPE errors for learning and testing data. In the middle range of the β coefficients, the RPE errors for both sub-models are on average about 5 percent for the learning data and about 10 percent for the testing data. The RPE errors for the learning data are calculated with the model in series-parallel mode, whereas the errors for the testing data are calculated with the model in work mode; therefore the RPE errors are about twice as high for the testing data as for the learning data.

3.4 Analysis of the Impact of β Coefficients on the Complex Neural Model Quality for the DCRprop Learning Algorithm
For the learning data, the quality index Qd(1) is largest for the lowest value of β1 (0.001) and lowest for the largest values of β1 (β1 = 0.99-0.999). The Qd(2) and Qd indices have their minima at β1 = 0.2 and β1 = 0.99, respectively (see Table 3 and Fig. 7). For the testing data, however, the Qd(1) and Qd indices achieve their global minima in the middle of the range (β1 = 0.4), while the Qd(2) performance index has a local minimum at β1 = 0.4 and its global minimum at β1 = 0.9 (i.e., β2 = 0.1). The plots of the RPE errors show the quality of the sub-models: the average levels of the RPE errors for learning and testing data are similar (about 3 percent, see Fig. 10). At the beginning of the β range, the RPE errors for both sub-models are larger than in the middle and near the end of the range; however, the RPE errors increase again at the very end (β1 = 0.999).
4 Summary

The three types of learning algorithms used in the simulations gave different results. The worst results were obtained with the complex gradient algorithm: the average level of the RPE errors is about 15 percent, and the best obtained RPE errors are equal to about 11.6 percent for the learning data and about 12.4 percent for the testing data (see Fig. 8). With the complex DBD algorithm, which uses an adaptive learning rate, the results were much better: the average RPE error was around 6 percent for the learning data and around 10 percent for the testing data. For the testing data, the lowest RPE(1) error of the first sub-model is 9.6 percent and the lowest RPE(2) error of the second sub-model is 7.05 percent (see Fig. 9).
However, the best results were obtained with the DCRprop learning algorithm. The average RPE error rate was approximately 3 percent for both learning and testing data. The lowest value of the RPE(1) error is equal to 2.80 percent, and the lowest RPE(2) error is also equal to 2.80 percent, though for different values of the β1 coefficient (for the learning data, Fig. 10a). Similarly, for the testing data, the lowest value of the RPE(1) error is equal to 3.04 percent, while the lowest RPE(2) error is equal to 1.67 percent (Fig. 10b).

For the DCRprop algorithm, the best values of the quality index Qd(1) were obtained for high values of β1 for the learning data, and for β1 slightly below one half (β1 = 0.4) for the testing data. Similarly, the quality index of the second sub-model is low for large values of β2 (small β1) for the learning data, and for the testing data it is low for small values of β2. The minimum values of the index Qd(2) for learning and testing data are similar, and the shapes of the Qd(2) curves are also similar; the shape and minimum values of the quality index Qd(1) likewise agree well for learning and testing data. The overall global quality index Qd is similar to the Qd(2) index for small β1 and to Qd(1) for large β1, in accordance with formula (4).

For the complex DBD algorithm, in contrast, the course of the Qd(1) index for the learning data differs significantly from its course for the testing data, while the course of the Qd(2) index is similar for both. The levels of the quality indices for the testing data are significantly higher than for the learning data, and the model trained with the complex DBD algorithm is sensitive to the type of data and to changes of the model configuration. For the model trained with the complex gradient algorithm (which is not fully trained), the quality indicators and their shapes for the learning data are quite similar to those for the testing data; unfortunately, this algorithm is not very efficient. Thus, in terms of learning speed and model quality, the DCRprop learning algorithm is the most effective.

As is apparent from the obtained results, the quality of a neural model depends not only on the β weight coefficients but also on the applied learning algorithm. Of course, the quality of the model also depends on the number of training steps and the representativeness of the training set, but these issues are not dealt with in this article.
5 Conclusions

The global model of a dynamic input-output complex system was presented. A dynamic complex system consisting of two nonlinear discrete-time sub-systems was considered, and multilayer neural networks with a non-typical structure were introduced as models of complex systems. The influence of the β coefficients of the global performance index on the global model quality was investigated, and the results were shown for three learning algorithms and for both learning and testing data. The β coefficients directly influence the global quality criterion according to formula (4). The obtained results show that the β coefficients also have an impact on
the quality of the sub-models of the simple plants. After these simulations, the courses of the quality criteria as functions of the β weight factors are known. The quality of the global model depends on the performance indices of the sub-models and on the β weight coefficients according to formula (4), which is a weighted sum of the sub-model quality indices with the weighting factors; the obtained results confirm this formula. Knowledge of the courses of the quality criteria allows us to choose the values of the β coefficients so as to obtain an optimal global model as well as better sub-model quality. On the other hand, we can focus on the quality of one or two sub-models relative to the global model quality. These algorithms and this architecture of the global neural models allow us to adjust the parameters of the global model effectively. The presented approach to modeling is useful for computer control systems of complex systems.
References

[Dahleh and Venkatesh 1997] Dahleh, M.A., Venkatesh, S.: System identification of complex systems; problem formulation and results. In: Proc. of 36th Conf. on Decision & Control, San Diego, CA, pp. 2441–2446 (1997)
[Drałus 2004] Drałus, G.: Modeling of dynamic nonlinear complex systems using neural networks. In: Proc. of the 15th International Conference on Systems Science, Wroclaw, Poland, vol. III, pp. 87–96 (2004)
[Drałus and Świątek 2009] Drałus, G., Świątek, J.: Static and dynamic complex models: comparison and application to chemical systems. Kybernetes: The Int. J. of Systems & Cybernetics 38(7/8) (2009)
[Drałus 2010] Drałus, G.: Study on quality of complex models of dynamic complex systems. In: Proc. 3rd Conference on Human System Interaction, pp. 169–174 (2010), doi:10.1109/HSI.2010.5514570
[Drapała and Świątek 2006] Drapała, J., Świątek, J.: Modeling of dynamic complex systems by neural networks. In: Proc. of 18th Int. Conf. on Systems Engineering, Coventry University, UK, pp. 109–112 (2006)
[Gupta et al. 2003] Gupta, M.M., Jin, L., Homma, N.: Static and Dynamic Neural Networks – From Fundamentals to Advanced Theory. John Wiley & Sons, Inc., Chichester (2003)
[Hornik 1989] Hornik, K.: Multilayer feedforward networks are universal approximators. Neural Networks 2, 359–366 (1989)
[Hunt et al. 1992] Hunt, K.J., Sbarbaro, D., Zbikowski, R., Gawthrop, P.J.: Neural networks for control systems – A survey. Automatica 28(8), 1083–1112 (1992)
[Jacobs 1988] Jacobs, R.A.: Increased rates of convergence through learning rate adaptation. Neural Networks 1, 295–307 (1988)
[Narendra and Parthasarathy 1990] Narendra, K.S., Parthasarathy, K.: Identification and control of dynamic systems using neural networks. IEEE Trans. on Neural Networks 1(1), 4–27 (1990)
[Osowski 2007] Osowski, S.: Modeling and Simulation of Dynamic Systems and Processes. Warsaw University of Technology Publishing House (2007)
Author Index

Adamczyk, K. 205
Balestra, A. 159
Barkana, D. Erol 75
Bieda, R. 131
Bobrowski, L. 443
Byczuk, M. 3
Casals, A. 15
Chojnacki, S. 429
Cudek, P. 125
David, R.C. 223
Deligiannidis, L. 277
Di Iorio, A. 359
Dragoş, C.A. 223, 261
Drałus, G. 477
Dwulit, M.P. 345
Fryc, B. 417
Gomuła, J. 191
Grzymała-Busse, J.W. 125, 147
Hanada, H. 57
Hara, T. 57
Hippe, Z.S. 125, 147
Jaszuk, M. 175
Jurczak, P. 147
Kaszuba, K. 295
Kitani, M. 57
Kłopotek, M.A. 429
Kostek, B. 295
Kozielski, S. 395
Machnicka, Z. 417
Małysiak-Mrozek, B. 395
Materka, A. 3
Milik, A. 325
Mroczek, T. 147
Mrozek, D. 395
Muñoz, L.M. 15
Musetti, A. 359
Nauth, P. 41
Nowak, L. 111
Noyes, E. 277
Ogorzałek, M. 111
Orio, S. 159
Ota, Y. 31
Paja, W. 191
Pałasiński, M. 417
Pancerz, K. 191
Pazzaglia, R. 159
Peroni, S. 359
Petriu, E.M. 223, 261
Ponsa, P. 15
Poryzała, P. 3
Precup, R.E. 223, 261
Preitl, S. 223, 261
Przystalski, K. 111
Pułka, A. 325
Pyzik, L. 237
Rădac, M.B. 223
Rakytyanska, H.B. 375
Ravarelli, A. 159
Rotshtein, A.P. 375
Sakaino, S. 91
Sato, T. 91
Sawada, H. 57
Spătaru, S.V. 223
Surówka, G. 111
Świtoński, A. 131
Szkoła, J. 191
Szostek, G. 175
Szymański, Z. 345
Vitali, F. 359
Walczak, A. 175, 205
Wilamowski, B.M. 313, 463
Wojciechowski, K. 131
Xie, T.T. 313
Yakoh, T. 91
Yu, H. 313, 463
Zanetti, M.A. 159
Subject Index

A
ABCD rule 125
Analyze medical texts 175
Antilock braking system 223
Approximate reasoning modeling 417
Artificial neural networks 111, 477
Asymmetry 739
Autistic spectrum disorders (ASD) 159
Autonomous humanoid robots 41

B
Back propagation learning algorithms 477
BCI performance 3
Biofeedback method 295
Bipartite graphs 429
BlackBoard platform 237
Boolean recommenders 429
Brain
  stroke 147
  computer interface 3

C
Cancer detection 131
Colonoscopy diagnostician 131
Complex neural networks 477
Computer assisted robotic systems 75
Convex and piecewise linear (CPL) 443
Copernicus system 191
CPL models 443

D
Data mining system 148
Dermatoscopic images 111
Digital images 125, 205
DWT (discrete wavelet transform) 295
Dwulit's hull 345
Dynamic complex systems 477

E
Edge-directed interpolation 205
Education in control engineering 261
EEG (electroencephalography) 295
Effective learning 295
E-learning platforms 237
Endoscopy diagnosis 131
Entrepreneurship education 277
EOG (electrooculogram) 295
Ergonomic haptic interaction 15
Eye tracking 159

F
FDL model 325
Flash cards 295
Force-sensorless bilateral control 91
FPGA Xilinx Virtex5 device 325
Fuzzy
  control systems 223
  default logic (FDL) 325
  If-Then rules 377
  model 223
  relational calculus 375
  relational equations 378
  relational identification 375
  relational matrix 377
  systems 313

G
Gastroscopy diagnostician 131
Generalized rules 417
Genetic algorithm 379
Glasgow outcome scale (GOS) 147
Goal understanding 41
Gradient vector 463

H
Haptic interface 15
Healthcare support 31
Hemispherical synchronization 295
Hessian matrix 463
Human
  computer interaction 277
  machine interaction 159
  robot systems 15
Hyperlexia profile 159

I
ILIAS platform 237
Image interpolation 205
Intelligent
  control architecture 75
  robot 41
Interpolation method 205
Interval
  regression approach 443
  uncertainty 443
Inverse problem 376

J
JSEG image segmentation 111

K
kNN algorithm 345
Knowledge extraction 375

L
Language impairments 159
Linear quadratic regulator (LQR) 223

M
Magnetic levitation system 261
Malignant melanoma 111
Manufacturing support 31
Medical diagnostic knowledge 175
Melanocytic skin lesions 125
Melanoma diagnosis 125
Mental diseases 191
Mimicking human vocalization 57
MIMO object identification 376
Mind map 295
Minnesota multiphasic personality inventory (MMPI) 191
ML2EM laboratory equipment 261
Modeling of approximate reasoning 417
Modified Rankin scale (mRS) 147
Moodle platform 237
Multilayer neural networks 477
Multispectral
  imaging 131
  objects detection 131

N
Nearest neighbor (NN) classification 345
Neural
  models 477
  networks 111, 477
  networks training 375
  systems 313
Neuro-fuzzy systems 313
NGTS 148
Non-linear
  control surfaces 313
  notes 295
Not-natively-semantic wikis 359
Null solution 380
Nursing and healthcare support 31

O
Ontology
  design 175
  led creation 359
Optimization problem 379
Orthopedic robotic system 75
Orthoroby 75
oWiki 359

P
Partner robots 31
Photodynamic diagnosis 131
Pigment skin lesions 111
Production deficits 159
Protein structure similarity searching 395

R
Real-life characteristics 429
Real-time experiments 261
Remote
  robots 91
  teaching 237
Rule-based analysis 191

S
Second order algorithms 463
Self generating will 41
Semantic
  analysis 111
  data 359
  model 175
Server-side query language 395
Short-distance personal mobility 31
Singing robot 57
Skin cancer images 111
Solution set 380
Solving fuzzy relational equations 378
Spectral pixel signatures 131
Spectrum estimation 131
Speech recognition 41
SQL language 395
Steady-state visual evoked potentials (SSVEPs) 3
Stimulus parameters 3
Stolz strategy 125
Support vector machines 111

T
Takagi-Sugeno (T-S) fuzzy models 223
Talking robot 57
Telerobotic applications 15
Text comprehension 159
Thrust wires 91
Two electromagnets (MLS2EM) 261
Two-degree-of-freedom (two-DOF) 91

V
VEPs spectral 3
Vocal
  cords 57
  tract 57

Z
ZSZN platform 237

2
2D anisotropic wavelet edge extractors 205
2-D visualizations 277

3
3-D visualizations 277