SOFTWARE ENGINEERING AND DEVELOPMENT
ENRIQUE A. BELINI EDITOR
Nova Science Publishers, Inc. New York
Copyright © 2009 by Nova Science Publishers, Inc.

All rights reserved. No part of this book may be reproduced, stored in a retrieval system or transmitted in any form or by any means: electronic, electrostatic, magnetic, tape, mechanical photocopying, recording or otherwise without the written permission of the Publisher.

For permission to use material from this book please contact us:
Telephone 631-231-7269; Fax 631-231-8175
Web Site: http://www.novapublishers.com

NOTICE TO THE READER

The Publisher has taken reasonable care in the preparation of this book, but makes no expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of information contained in this book. The Publisher shall not be liable for any special, consequential, or exemplary damages resulting, in whole or in part, from the readers' use of, or reliance upon, this material. Any parts of this book based on government reports are so indicated and copyright is claimed for those parts to the extent applicable to compilations of such works.

Independent verification should be sought for any data, advice or recommendations contained in this book. In addition, no responsibility is assumed by the publisher for any injury and/or damage to persons or property arising from any methods, products, instructions, ideas or otherwise contained in this publication.

This publication is designed to provide accurate and authoritative information with regard to the subject matter covered herein. It is sold with the clear understanding that the Publisher is not engaged in rendering legal or any other professional services. If legal or any other expert assistance is required, the services of a competent person should be sought. FROM A DECLARATION OF PARTICIPANTS JOINTLY ADOPTED BY A COMMITTEE OF THE AMERICAN BAR ASSOCIATION AND A COMMITTEE OF PUBLISHERS.

LIBRARY OF CONGRESS CATALOGING-IN-PUBLICATION DATA

Software engineering and development / Enrique A. Belini.
p. cm.
Includes index.
ISBN 978-1-61668-289-7 (E-Book)
1. Software engineering. 2. Computer software--Development. I. Belini, Enrique A.
QA76.758.S64557 2009
005.1--dc22
2009014731
Published by Nova Science Publishers, Inc.
New York
CONTENTS

Preface  vii

Expert Commentaries  1
A. Succinct Representation of Bit Vectors Supporting Efficient rank and select Queries
   Jesper Jansson and Kunihiko Sadakane  3
B. Heterogeneity as a Corner Stone of Software Development in Robotics
   Juan-Antonio Fernández-Madrigal, Ana Cruz-Martín, Cipriano Galindo and Javier González  13

Short Communications  23
A. Embedding Domain-Specific Languages in General-Purpose Programming Languages
   Zoltán Ádám Mann  25
B. Studying Knowledge Flows in Software Processes
   Oscar M. Rodríguez-Elias, Aurora Vizcaíno, Ana I. Martínez-García, Jesús Favela and Mario Piattini  37
C. Software Product Line Engineering: The Future Research Directions
   Faheem Ahmed, Luiz Fernando Capretz and Muhammad Ali Babar  69
D. Software Development for Inverse Determination of Constitutive Model Parameters
   A. Andrade-Campos, P. Pilvin, J. Simões and F. Teixeira-Dias  93
E. Design of Molecular Visualization Educational Software for Chemistry Learning
   L.D. Antonoglou, N.D. Charistos and M.P. Sigalas  125
F. Software Components for Large Scale Super and Grid Computing Applications
   Muthu Ramachandran  151
G. Principles and Practical Aspects of Educational Software Evaluation
   Quynh Lê and Thao Lê  175

Research and Review Studies  185
Chapter 1. Testing Event-driven Software – the Next QA Challenge?
   Atif M. Memon  187
Chapter 2. Debugging Concurrent Programs Using Metaheuristics
   Francisco Chicano and Enrique Alba  193

Index  223
PREFACE

Software engineering is one of the most knowledge-intensive jobs. Thus, having a good knowledge management (KM) strategy in software organizations is very important. This book examines software processes from a knowledge flow perspective, in order to identify the particular knowledge needs of such processes and thus be in a better position to propose systems or strategies to address those needs. The possible benefits are illustrated through the results of a study of a software maintenance process within a small software organization. Furthermore, software product line architecture is regarded as one of the crucial entities in software product lines. The authors of this book discuss the state of the art of software product line engineering from the perspectives of business, architecture, process and organization.

In recent years, domain-specific languages have been proposed for modelling applications on a high level of abstraction. Although the usage of domain-specific languages offers clear advantages, their design is a highly complex task. This book presents a pragmatic way of designing and using domain-specific languages. Other chapters in this book examine the development of numerical methodologies for the inverse determination of material constitutive model parameters, discuss some of the reasons for the irrelevancy of software engineering to the robotics community, review the evolution of robotic software over time, and propose the use of Ant Colony Optimization, a kind of metaheuristic algorithm, to find general property violations in concurrent systems using an explicit state model checker.

In the design of succinct data structures, the main objective is to represent an object compactly while still allowing a number of fundamental operations to be performed efficiently. In Expert Commentary A, the authors consider succinct data structures for storing a bit vector B of length n. More precisely, in this setting, one needs to represent B using n + o(n) bits so that rank and select queries can be answered in O(1) time, where for any i ∈ {1, 2, . . . , n}, rank_0(B, i) is the number of 0s in the first i positions of B, select_0(B, i) is the position in B of the ith 0 (assuming B contains at least i 0s), and rank_1(B, i) and select_1(B, i) are defined analogously. These operations are useful because bit vectors supporting rank and select queries are employed as a building block for many other more complex succinct data structures. The authors first describe two succinct indexing data structures for supporting rank and select queries on B in which B is stored explicitly together with some auxiliary information. The authors then present some matching lower bounds. Finally, the authors discuss generalizations and related open problems for supporting rank and select queries efficiently on strings over non-binary alphabets.
In recent years, the complexity of robotic applications has raised important problems, particularly in large and/or long-term robotic projects. Software engineering (SE) seems the obvious key for breaking that barrier, providing good maintenance and reuse, coping with the exponential growth of programming effort, and integrating diverse components with guarantees. Surprisingly, SE has never been very relevant within the robotics community. In Expert Commentary B the authors briefly describe some causes for that, review the evolution of robotic software over time, and provide some insights from their most recent contributions. They have found that many problems arising from the conflicts raised by robotic complexity can be well addressed from a SE perspective as long as the focus is, at all levels, on the heterogeneity of components and methodologies. Therefore the authors propose heterogeneity as one of the corner stones of robotic software at present.

In recent years, domain-specific languages have been proposed for modelling applications on a high level of abstraction. Although the usage of domain-specific languages offers clear advantages, their design is a highly complex task. Moreover, developing a compiler or interpreter for these languages that can fulfil the requirements of industrial application is hard. Existing tools for the generation of compilers or interpreters for domain-specific languages are still in an early stage and not yet appropriate for usage in an industrial setting. Short Communication A presents a pragmatic way of designing and using domain-specific languages. In this approach, the domain-specific language is defined on the basis of a general-purpose programming language. Thus, general programming mechanisms such as arithmetic, string manipulation, basic data structures etc. are automatically available in the domain-specific language. Additionally, the designer of the domain-specific language can define further domain-specific constructs, both data types and operations. These are defined without breaching the syntax of the underlying general-purpose language. Finally, a library has to be created which provides the implementation of the necessary domain-specific data types and operations. This way, there is no need to create a compiler for the new language, because a program written in the domain-specific language can be compiled directly with a compiler for the underlying general-purpose programming language. Therefore, this approach leverages the advantages of domain-specific languages while minimizing the effort necessary for the design and implementation of such a language. The practical applicability of this methodology is demonstrated in a case study, in which test cases for testing electronic control units are developed. The test cases are written in a new domain-specific language, which in turn is defined on the basis of Java. The pros and cons of the presented approach are examined in detail on the basis of this case study. In particular, it is shown how the presented methodology automatically leads to a clean software architecture.

Many authors have observed the importance of knowledge for software processes. As a result, ever more researchers and practitioners are initiating efforts to apply knowledge management in software processes.
Unfortunately, many of these efforts are oriented solely toward aiding big software companies, or toward using existing knowledge management systems or strategies that have not been developed following the specific and particular knowledge needs of the process in which they are included. As a consequence, such efforts often do not really help the people who should benefit from using them. In this chapter the authors state that one way to address this problem is to first study software processes from a knowledge flow perspective, in order to identify the particular knowledge needs of such processes and then be in a better position to propose systems or strategies to address those
needs. Short Communication B presents an approach which has been used to accomplish this objective. Its possible benefits are illustrated through the results of a study of a software maintenance process within a small software organization.

The recent trend of switching from single software product development to lines of software products in the software industry has made the software product line concept a viable and widely accepted methodology. Some of the potential benefits of this approach include cost reduction, improvement in quality and a decrease in product development time. Many organizations that deal in wide areas of operation, from consumer electronics, telecommunications, and avionics to information technology, are adopting software product line practice because it deals with effective utilization of software assets and provides numerous benefits. Software product line engineering is an inter-disciplinary concept. It spans the dimensions of business, architecture, process and organization. The business dimension of software product lines deals with managing a strong coordination between product line engineering and the business aspects of the product line. Software product line architecture is regarded as one of the crucial entities in software product lines. All the resulting products share this common architecture. Organizational theories, behavior and management play a critical role in the process of institutionalization of software product line engineering in an organization. The objective of Short Communication C is to discuss the state of the art of software product line engineering from the perspectives of business, architecture, organizational management and software engineering process. This work also highlights and discusses future research directions in this area, thus providing an opportunity for researchers and practitioners to better understand the future trends and requirements.

Computer simulation software using finite element analysis (FEA) has nowadays reached reasonable maturity. FEA software is used in such diverse fields as structural engineering, sheet metal forming, the mould industry, biomechanics, fluid dynamics, etc. This type of engineering software uses an increasingly large number of sophisticated geometrical and material models. The quality of the results relies on the input data, which are not always readily available. The aim of inverse problem software, which will be considered here, is to determine one or more of the input data relating to FEA numerical simulations. The development of numerical methodologies for the inverse determination of material constitutive model parameters will be addressed in Short Communication D. Inverse problems for parameter identification involve estimating the parameters for material constitutive models, leading to more accurate results with respect to physical experiments, i.e. minimizing the difference between experimental results and simulations subject to a limited number of physical constraints. These problems can involve both hyperelastic and hypoelastic material constitutive models. The complexity of the process with which material parameters are evaluated increases with the complexity of the material model itself. In order to determine the best-suited material parameter set in the least computationally expensive way, different approaches and different optimization methods can be used.
The most widespread optimization methods are gradient-based methods; genetic, evolutionary and nature-inspired algorithms; immune algorithms; and methods based on neural networks and artificial intelligence. By far the best-performing methods are gradient-based, but their performance is known to be highly dependent on the starting set of parameters and their results are often inconsistent. Nature-inspired techniques provide a better way to determine an optimized set of parameters (the overall minimum). Therefore, the difficulty associated with
choosing a starting set of parameters for this process is minor. However, these methods have proved to be computationally more expensive than gradient-based methods. Optimization methods present advantages and disadvantages, and their performance is highly dependent on the constitutive model itself. There is no single algorithm robust enough to deal with every possible situation, but the use of sequential multiple methods can lead to the global optimum. The aim of this strategy is to take advantage of the strengths of each selected algorithm. This strategy, using gradient-based methods and evolutionary algorithms, is demonstrated for an elastic-plastic model with non-linear hardening, for seven distinct hyperelastic models (Humphrey, Martins, Mooney-Rivlin, Neo-Hookean, Ogden, Veronda-Westmann and Yeoh) and for one thermoelastic-viscoplastic hypoelastic model. The performance of the described strategy is also evaluated through an analytical approach.

An important goal for chemical education is students' acquisition of key concepts and principles regarding molecular structure. Chemists have developed a rich symbolic language that helps them create and manipulate mental and external representations that describe spatial relations of aperceptual particles in order to investigate and communicate chemical concepts. High school and college students face significant difficulties in understanding these concepts, mastering the symbolic language and making connections and transformations between symbolic, microscopic and macroscopic representations of chemical phenomena. Over the past decade the development of molecular visualization tools has changed the nature of chemistry research and created promising prospects for their integration in chemistry education, which could help students overcome these difficulties. In Short Communication E the authors examine the case of molecular visualization in chemistry education and describe a number of educational packages that utilize new molecular visualization tools they developed to support the learning of chemistry concepts in secondary and tertiary education.

Software development for large and complex systems remains a costly affair. The complexity of supercomputing applications that require high-speed and high-precision systems grows exponentially. Short Communication F provides an approach to the design and development of supercomputing applications based on software components, which have the potential to minimize the cost and time for complex and high-dependability systems. Software components are meant to provide a self-contained entity that can be adapted to the required environment quickly and easily. However, this definition needs to be extended for the large-scale supercomputing paradigm. This will be a considerable shift from the Component-Based Software Engineering (CBSE) paradigm that exists today. The main criteria for supercomputing and grid applications include flexibility, reusability, scalability, high concurrency, parallel and multi-threaded operation, security, distribution and data intensity. This chapter defines a new CBSE paradigm for supercomputing applications; the design of large-scale software components is therefore its major emphasis.

As explained in Short Communication G, software is not just a product for all-purpose use. Generally, software is produced for a specific purpose in a domain. Some software products appeal to a wide range of users, such as word processing, drawing, and editing tools.
However, most software is developed to cater to the demands of targeted users. For example, the Statistical Package for the Social Sciences (SPSS) is a statistical analysis tool for analyzing quantitative data in research. In education, the main aim of software is to enhance teaching and learning. It is important to evaluate educational software to determine its effectiveness. There are a number of issues concerning the evaluation of educational software, such as users' and evaluators' perspectives on teaching and learning, and the translation of theory into practice.
A particular class of software that is fast becoming ubiquitous is event-driven software (EDS). All EDS share a common event-driven model – they take sequences of events (e.g., messages, mouse-clicks) as input, change their state, and (sometimes) output an event sequence. Examples include web applications, graphical user interfaces (GUIs), network protocols, device drivers, and embedded software. Quality assurance tasks such as testing have become important for EDS since they are being used in critical applications. Numerous researchers have shown that existing testing techniques do not apply directly to EDS because of the new challenges that EDS present. Chapter 1 lists some of these challenges and emphasizes the need to develop new techniques (or enhance existing ones) to test EDS.

Model checking is a well-known and fully automatic technique for checking software properties, usually given as temporal logic formulae on the program variables. Some examples of properties are the absence of deadlocks, the absence of starvation, the fulfilment of an invariant, etc. The use of this technique is a must when developing software that controls critical systems, such as an airplane or a spacecraft. Most model checkers found in the literature use exact deterministic algorithms to check the properties. The memory required for verification with these algorithms usually grows exponentially with the size of the system to verify. This fact is known as the state explosion problem and limits the size of the system that a model checker can verify. When the search for errors with a low amount of computational resources (memory and time) is a priority (for example, in the first stages of the implementation of a program), non-exhaustive algorithms using heuristic information can be used. Non-exhaustive algorithms can find errors in programs using fewer computational resources than exhaustive algorithms, but they cannot be used for verifying a property: when no error is found using a non-exhaustive algorithm, one still cannot ensure that no error exists. In Chapter 2 the authors propose the use of Ant Colony Optimization, a kind of metaheuristic algorithm, to find general property violations in concurrent systems using an explicit state model checker. Metaheuristic algorithms are a well-known set of techniques used for finding near-optimal solutions to NP-hard optimization problems in which exact techniques are unviable. Their proposal, called ACOhg-mc, also takes into account the structure of the property to check in order to improve the efficacy and efficiency of the search. In addition to the description of the algorithm, the authors have performed a set of experiments using the experimental model checker HSF-SPIN and a subset of models from the BEEM benchmark for explicit model checking. The results show that ACOhg-mc finds optimal or near-optimal error trails in faulty concurrent systems with a reduced amount of resources, outperforming in most cases the results of algorithms that are widely used in model checking, like Nested Depth First Search or Improved Nested Depth First Search. This fact makes the proposal suitable for checking properties in large faulty concurrent programs, in which traditional techniques fail to find counterexamples because of the model size. In addition, the authors show that ACOhg-mc can also be combined with techniques for reducing the state explosion, such as partial order reduction, and they analyze the performance of this combination.
EXPERT COMMENTARIES
In: Software Engineering and Development Editor: Enrique A. Belini, pp. 3-12
ISBN 978-1-60692-146-3 © 2009 Nova Science Publishers, Inc.
Expert Commentary A
SUCCINCT REPRESENTATION OF BIT VECTORS SUPPORTING EFFICIENT rank AND select QUERIES

Jesper Jansson 1,∗,† and Kunihiko Sadakane 2,‡
1 Ochanomizu University, 2-1-1 Otsuka, Bunkyo-ku, Tokyo 112-8610, Japan
2 Department of Computer Science and Communication Engineering, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka 819-0395, Japan
Abstract

In the design of succinct data structures, the main objective is to represent an object compactly while still allowing a number of fundamental operations to be performed efficiently. In this commentary, we consider succinct data structures for storing a bit vector B of length n. More precisely, in this setting, one needs to represent B using n + o(n) bits so that rank and select queries can be answered in O(1) time, where for any i ∈ {1, 2, . . . , n}, rank_0(B, i) is the number of 0s in the first i positions of B, select_0(B, i) is the position in B of the ith 0 (assuming B contains at least i 0s), and rank_1(B, i) and select_1(B, i) are defined analogously. These operations are useful because bit vectors supporting rank and select queries are employed as a building block for many other more complex succinct data structures. We first describe two succinct indexing data structures for supporting rank and select queries on B in which B is stored explicitly together with some auxiliary information. We then present some matching lower bounds. Finally, we discuss generalizations and related open problems for supporting rank and select queries efficiently on strings over non-binary alphabets.
∗ E-mail address: [email protected]
† Funded by the Special Coordination Funds for Promoting Science and Technology, Japan.
‡ E-mail address: [email protected]

1. Introduction

Let B ∈ {0, 1}^n be a bit vector of length n. For any i ∈ {1, 2, . . . , n}, let B[i] denote the value of B at position i, and for any i, j ∈ {1, 2, . . . , n} with i ≤ j, let B[i..j] be the bit
vector consisting of B[i], B[i + 1], . . . , B[j]. (If i > j then B[i..j] is defined to be ∅.) Next, define the following operations:

• rank_0(B, i) – Return the number of 0s in B[1..i].
• rank_1(B, i) – Return the number of 1s in B[1..i].
• select_0(B, i) – Return the position in B of the ith 0.
• select_1(B, i) – Return the position in B of the ith 1.

In this commentary, we consider the problem of constructing a data structure for storing any given B such that rank_0(B, i), rank_1(B, i), select_0(B, i), and select_1(B, i) queries can be carried out efficiently. We focus on indexing data structures for B, where B is stored verbatim in n bits and one is allowed to use o(n) extra bits of storage (called the index) to efficiently support rank and select queries on B. We assume the word-RAM model of computation with word length w = ⌈log n⌉ bits¹ in order to handle pointers to the data structure in constant time. In the word-RAM model, the CPU can perform logical operations such as AND and OR, and arithmetic operations such as addition, subtraction, multiplication, and division between two integers in the interval [0, 2^w − 1] (w-bit integers) in constant time. The CPU can also read/write a w-bit integer from/to a specific memory cell in constant time; in other words, if B is a stored bit vector of length n, then for any given i ∈ {0, 1, . . . , n − w}, B[(i + 1)..(i + w)] can be obtained in O(1) time.

The commentary is organized as follows: In Section 2., we outline how to construct in O(n) time an index for B of size O(n log log n/ log n) = o(n) bits which allows each subsequent rank or select query to be answered in O(1) time. The presentation in Section 2. is based on [20] for rank and [28] for select. Next, in Section 3., we state some lower bounds from [11] and [19] which match the upper bounds given in Section 2. Then, in Section 4., we discuss generalizations to non-indexing data structures as well as generalizations to non-binary vectors, and finally, in Section 5., we provide examples of other data structures that depend on efficient rank and select data structures for bit vectors and non-binary vectors, and mention some directions for further research.
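Before turning to the constructions, the semantics of the four operations defined above can be pinned down with a brute-force Python reference (an illustrative sketch only: each query here takes linear time, whereas the indexing structures of Section 2. achieve O(1); positions are 1-based as in the text):

    def rank(B, b, i):
        """rank_b(B, i): number of occurrences of bit b among the first i positions."""
        return sum(1 for q in range(i) if B[q] == b)

    def select(B, b, i):
        """select_b(B, i): 1-based position of the ith occurrence of bit b in B."""
        count = 0
        for q, bit in enumerate(B):
            count += (bit == b)
            if count == i:
                return q + 1
        raise ValueError("B contains fewer than i occurrences of b")

    B = [0, 1, 1, 0, 1, 0, 0, 1]
    assert rank(B, 1, 5) == 3      # three 1s in B[1..5]
    assert select(B, 0, 3) == 6    # the third 0 is at position 6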
2. Upper Bounds for Indexing Data Structures
Jacobson [15] presented a space-efficient indexing data structure for B which allows rank and select queries on B to be answered in O(1) and O(log n) time, respectively, while requiring only O(n log log n/ log n) bits for the index. A series of improvements to Jacobson's data structure were made by Clark [4], Munro [20], Munro et al. [23], and Raman et al. [28], reducing the time needed to answer each select query to O(1) while using an index of size O(n log log n/ log n) bits.

Below, we describe two simplified indexing data structures for B, based on [20] for rank and [28] for select. To make the presentation more readable, we omit "⌈", "⌉", "⌊", and "⌋" symbols where obvious. Also, we allow the last block in any partition into blocks to be smaller than the specified block size.
¹ Throughout this commentary, "log" denotes the base-2 logarithm and "log_σ" denotes the base-σ logarithm.
2.1. An Indexing Data Structure for rank Queries (based on [20])
Conceptually divide the bit vector B into blocks of length ℓ = log² n each, and call each such block a large block. Next, divide every large block into small blocks of length s = (1/2) log n each. Create auxiliary data structures for storing the values of rank_1 for the boundaries of these blocks as follows: Use an integer array R_ℓ[0..n/ℓ] in which every entry R_ℓ[x] stores the number of 1's in B[1..xℓ], and an integer array R_s[0..n/s] in which every entry R_s[y] stores the number of 1's in B[(⌊ys/ℓ⌋ · ℓ + 1)..ys], i.e., the number of 1's in the yth small block plus the total number of 1's in all small blocks which belong to the same large block as small block y and which occur before small block y. The space needed to store R_ℓ is O((n/ℓ) · log n) = O(n/ log n) bits because each of its entries occupies O(log n) bits, and the space needed to store R_s is O((n/s) · log(ℓ + 1)) = O(n log log n/ log n) bits because all of its entries are between 0 and ℓ.

To answer the query rank_1(B, i) for any given i ∈ {1, 2, . . . , n}, compute x = ⌊i/ℓ⌋, y = ⌊i/s⌋, and z = i − ys, and use the relation rank_1(B, i) = rank_1(B, ys) + Σ_{q=1..z} B[ys + q] = R_ℓ[x] + R_s[y] + Σ_{q=1..z} B[ys + q], where the first two terms are directly available from R_ℓ and R_s. To compute the third term in constant time, the following table lookup technique can be applied: In advance, construct a table T_r[0..(2^s − 1), 1..s] in which each entry T_r[i, j] stores the number of 1's in the first j bits of the binary representation of i. Then, whenever one needs to compute Σ_{q=1..z} B[ys + q], first read the memory cell storing B[(ys + 1)..(ys + s)] (because s < w, this can be done in constant time), interpret this s-bit vector as an integer p, where p ∈ {0, 1, . . . , 2^s − 1}, and find the value T_r[p, z] in the table. Hence, rank_1(B, ys + z) can be computed in constant time. The size of the table T_r is 2^s · s · log(s + 1) = O(√n · log n · log log n) = o(n) bits, and all of the auxiliary data structures R_ℓ, R_s, T_r may be constructed in O(n) time.

To compute rank_0(B, i), no additional data structures are necessary because rank_0(B, i) = i − rank_1(B, i). Therefore we have:
Theorem 1. Given a bit vector of length n, after O(n) time preprocessing and using an index of size O(n log log n/ log n) bits, each rank_1 and rank_0 query can be answered in O(1) time.
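The construction above can be made concrete with a short Python sketch (an illustration under simplifying assumptions, not the exact structure of [20]: the class name and block-length constants are ours, and a popcount over the final few bits stands in for the lookup table T_r, which does not change the O(1) word-RAM cost):

    import math

    class RankIndex:
        """Two-level rank directory sketch: R_large plays the role of R_l,
        R_small the role of R_s; B is 0-indexed here."""

        def __init__(self, B):
            self.B = B
            n = max(4, len(B))
            self.s = max(1, int(math.log2(n)) // 2)    # small block ~ (1/2) log n
            self.l = self.s * 2 * int(math.log2(n))    # large block ~ log^2 n, a multiple of s
            pref = [0]                                 # construction-only prefix counts
            for bit in B:
                pref.append(pref[-1] + bit)
            # R_large[x] = number of 1s in B[0 : x*l] (absolute, O(log n) bits each)
            self.R_large = [pref[x * self.l] for x in range(len(B) // self.l + 1)]
            # R_small[y] = 1s from the enclosing large block's start up to y*s
            # (relative counts, so only O(log log n) bits each in the real structure)
            self.R_small = [pref[y * self.s] - pref[(y * self.s // self.l) * self.l]
                            for y in range(len(B) // self.s + 1)]

        def rank1(self, i):
            """Number of 1s among the first i positions, i.e., in B[0 : i]."""
            y = i // self.s
            # the final sum covers fewer than s bits; a T_r lookup in the real structure
            return self.R_large[i // self.l] + self.R_small[y] + sum(self.B[y * self.s : i])

        def rank0(self, i):
            return i - self.rank1(i)

    B = [1, 0, 1, 1, 0, 0, 1, 0] * 100
    idx = RankIndex(B)
    assert all(idx.rank1(i) == sum(B[:i]) for i in range(len(B) + 1))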
2.2. An Indexing Data Structure for select Queries (based on [28])
Define ℓ = log² n and construct an array storing the position of the (iℓ)th occurrence of a 1 in B for all i = 1, 2, . . . , n/ℓ. Regions in B between two consecutive positions stored in the array are called upper blocks. If the length of an upper block is at least log⁴ n, it is sparse. For every sparse upper block, store the positions of all its 1's explicitly in sorted order. Since the number of such blocks is at most n/log⁴ n, the space required for storing the positions of all 1's in all sparse upper blocks is at most (n/log⁴ n) · log² n · log n = n/log n bits.

For every non-sparse upper block U, further divide it into lower blocks of length s = (1/2) log n each and construct a complete tree for U with branching factor √log n whose leaves are in one-to-one correspondence with the lower blocks in U. The height of the tree is at most 7, i.e., at most a constant, because the number of leaves is at most log⁴ n / s = 2 log³ n. For each non-leaf node v of the tree, let C_v be an array of √log n integers such that C_v[i] equals the number of 1's in the subtree rooted at the ith child of v. (All C_v-arrays can be computed in O(n) time preprocessing.) The entire bit vector B contains at most n/s = 2n/log n lower blocks, so the total number
of nodes in all trees representing all the upper blocks is O(n/log n) and furthermore, the total number of entries in all C_v-arrays is at most this much. Since the number of 1's in any tree is at most log² n, every entry in a C_v-array can be stored in O(log log n) bits. Therefore, the total space needed to store all trees (including all the C_v-arrays) is O((n/log n) · log log n) bits.

To answer the select_1(B, i) query in constant time, first divide i by ℓ to find the upper block U that contains the ith 1, and check whether U is sparse or not. If U is sparse, the answer to the select_1 query is stored explicitly and can be retrieved directly. If U is not sparse, start at the root of the tree that represents U and do a search to reach the leaf that corresponds to the lower block with the jth 1, where j equals i modulo ℓ. At each step, it is easy to determine which subtree contains the jth 1 in O(1) time by a table lookup using the C_v array for the current node v, and then adjust j and continue to the next step. (For the lookup, use a (√log n + 1)-dimensional table T such that entry T[c_1, c_2, . . . , c_{√log n}, j] = x if and only if the first subtree contains exactly c_1 1's, the second subtree contains exactly c_2 1's, etc., and the jth 1 belongs to the xth subtree. The space needed to store T is o(n) bits because the index of T is encoded in (√log n + 1) · 2 log log n ≤ 0.5 log n bits for large enough n, so T has O(2^{0.5 log n}) = O(n^{0.5}) entries which each need log log n bits.) Finally, after reaching a leaf and identifying the corresponding lower block, find the relative position of the jth 1 inside that lower block by consulting a global table of size 2^{(1/2) log n} · (1/2) log n · log log n = O(√n log n log log n) bits which stores the relative position of the qth 1 inside a lower block for every possible binary string of length (1/2) log n and every possible query q in {1, 2, . . . , (1/2) log n}.

To answer select_0 queries, construct data structures analogous to those for select_1 described above. We obtain the following.

Theorem 2. Given a bit vector of length n, after O(n) time preprocessing and using an index of size O(n log log n/ log n) bits, each select_1 and select_0 query can be answered in O(1) time.
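The following Python sketch illustrates the sparse/dense distinction at the heart of this structure (again an illustration, not the structure of [28]: for brevity it scans within a non-sparse upper block instead of descending the √log n-ary tree, so only the sampling and the explicit storage for sparse blocks are faithful to the description above):

    import math

    class SelectIndex:
        """Simplified sketch of the select_1 structure (sparse/dense upper blocks)."""

        def __init__(self, B):
            self.B = B
            n = max(4, len(B))
            self.l = int(math.log2(n)) ** 2              # one sample per l ones
            sparse_len = int(math.log2(n)) ** 4          # "sparse" threshold ~ log^4 n
            ones = [p for p, bit in enumerate(B) if bit] # construction only
            self.samples = ones[:: self.l]               # position of the (x*l + 1)-th 1
            self.sparse = {}                             # sparse block -> all its 1-positions
            for x, start in enumerate(self.samples):
                end = self.samples[x + 1] if x + 1 < len(self.samples) else len(B)
                if end - start >= sparse_len:
                    self.sparse[x] = ones[x * self.l : (x + 1) * self.l]

        def select1(self, i):
            """0-based position of the ith 1 (i >= 1; assumes B has at least i 1s)."""
            x, j = divmod(i - 1, self.l)                 # upper block and offset inside it
            if x in self.sparse:                         # sparse: answer stored explicitly
                return self.sparse[x][j]
            p = self.samples[x]                          # dense: short block; the real
            while j:                                     # structure walks a tree in O(1)
                p += 1
                j -= self.B[p]
            return p

    B = [0, 1, 1, 0, 1, 0, 0, 1] * 200
    sel = SelectIndex(B)
    ones = [p for p, bit in enumerate(B) if bit]
    assert all(sel.select1(i + 1) == ones[i] for i in range(len(ones)))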
3. Lower Bounds for Indexing Data Structures
By applying two different techniques, one consisting of a reduction from a vector addition problem and the other one a direct information-theoretical argument involving reconstructing B from any given indexing data structure for B together with an appropriately defined binary string, Miltersen [19] proved the following theorem. (Recall that B is assumed to be stored explicitly in addition to the bits used by the indexing data structure.)

Theorem 3. [19] It holds that:
1. Any indexing data structure for rank queries on B using word size w, index size r bits, and query time t must satisfy 2(2r + log(w + 1))tw ≥ n log(w + 1).
2. Any indexing data structure for select queries on B using word size w, index size r bits, and query time t must satisfy 3(r + 2)(tw + 1) ≥ n.

In particular, for the case t = O(1) and w = O(log n), Theorem 3 immediately implies the lower bounds r = Ω(n log log n/ log n) for rank indexing data structures and r = Ω(n/ log n) for select indexing data structures. Using a counting argument based on binary choices trees, these lower bounds were strengthened by Golynski [11] as follows:
Theorem 4. [11] If there exists an algorithm for either rank or select queries on B which reads O(log n) different positions of B, has unlimited access to an index of size r bits, and is allowed to use unlimited computation power, then r = Ω(n log log n/ log n).

Hence, the upper bounds given in Theorems 1 and 2 are asymptotically optimal. Note that Theorem 4 is very general; it does not impose any restrictions on the running time or require the read positions of B to be consecutive for the lower bound to hold.

Theorem 5. [11] Suppose that B has exactly m positions set to 1 for some integer m. If there exists an algorithm for either rank or select queries on B which reads at most t different positions of B, has unlimited access to an index of size r bits, and is allowed to use unlimited computation power, then r = Ω((m/t) · log t).
4. Generalizations
The indexing data structures in Sections 2. and 3. assume that the bit vector B is always stored explicitly. However, the space used by this type of encoding is far from optimal if the number of 1's in B is much smaller than n, or close to n. This is because the number of bit vectors of length n having m 1's is C(n, m) ≈ 2^{nH_0}, where H_0 = (m/n) log(n/m) + ((n − m)/n) log(n/(n − m)) is the 0th order entropy of the bit vector, which may be much less than 2^n, the number of distinct bit vectors of length n. In fact, there exist data structures for rank/select using only nH_0 + O(n log log n/ log n) bits to store B such that any consecutive O(log n) bits of B can still be retrieved in constant time²:

Theorem 6. [28] For a bit vector B of length n with m 1's, after O(n) time preprocessing and using nH_0 + O(n log log n/ log n) bits, where H_0 = (m/n) log(n/m) + ((n − m)/n) log(n/(n − m)), each rank_1, rank_0, select_1, and select_0 query can be answered in O(1) time. Moreover, any consecutive O(log n) bits of B can be retrieved in O(1) time.

The rank/select data structures can be extended to non-binary vectors. A string S of length n over an alphabet A is a vector S[1..n] such that S[i] ∈ A for 1 ≤ i ≤ n. Let σ be the alphabet size, i.e., σ = |A|. We assume that A is an integer alphabet of the form {0, 1, . . . , σ − 1} and that σ ≤ n. (Without loss of generality, we further assume that σ is a power of 2.) Below, we consider succinct data structures for S supporting the following operations for any i ∈ {1, 2, . . . , n} and c ∈ {0, 1, . . . , σ − 1}:

• access(S, i) – Return S[i].
• rank_c(S, i) – Return the number of occurrences of c in S[1..i].
• select_c(S, i) – Return the position of the ith c in S.

S may be encoded in n log σ bits by the obvious representation using log σ bits for each position S[i], but there exist other encodings which improve the rank and select query time complexities at the cost of increasing the space complexity and the time needed to retrieve S[i] for any given i ∈ {1, 2, . . . , n}.
² Observe that these data structures do not store B directly, so to retrieve O(log n) consecutive bits of B in O(1) time is no longer trivial.
Hence, there is a trade-off between the size of a data structure and the access/rank/select query times. Table 1 lists the performance of two straightforward data structures D1 and D2 (explained below) and three improved data structures proposed in [1, 12, 13].

Table 1. The trade-off between the size (in bits) and the time needed to answer each access, rank_c, and select_c query for various data structures. |S| denotes the number of bits to encode S, H_0 is the 0th order entropy of S, and α = log log σ · log log log σ.

Reference          Size of data structure         access time    rank_c time       select_c time
D1 in Section 4.   n(H_0 + log e) + σ · o(n)      O(σ)           O(1)              O(1)
D2 in Section 4.   |S| + (σ + 1) · o(n)           O(1)           O(log σ)          O(log σ)
[13]               nH_0 + log σ · o(n)            O(log σ)       O(log σ)          O(log σ)
[12]               n log σ + n · o(log σ)         O(log log σ)   O(log log σ)      O(1)
[1]                |S| + n · o(log σ)             O(1)           O(α log log σ)    O(α)
The first straightforward data structure D1 stores σ bit vectors V_0, V_1, . . . , V_{σ−1} of length n such that V_c[i] = 1 if and only if S[i] = c, along with rank_1 and select_1 indexing data structures for these bit vectors. Then rank_c(S, i) = rank_1(V_c, i) and select_c(S, i) = select_1(V_c, i), and therefore they can be obtained in constant time. On the other hand, access requires O(σ) time because it must examine all of V_0[i], V_1[i], . . . , V_{σ−1}[i]. Each bit vector V_c can be encoded in log C(n, m_c) ≈ m_c(log e + log(n/m_c)) bits by Theorem 6, where m_c denotes the number of c's in S. In total, the space is Σ_c {m_c(log e + log(n/m_c)) + O(n log log n/ log n)} = n(H_0 + log e) + σ · O(n log log n/ log n) bits.
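As a concrete rendering of D1, consider this Python sketch, reusing the RankIndex and SelectIndex classes sketched earlier (the class and method names are ours; with constant-time bit-vector indexes, rank_c and select_c are O(1) while access must probe up to σ vectors):

    class StringRankSelect:
        """Sketch of D1: one indicator vector V_c per symbol, each indexed."""

        def __init__(self, S, sigma):
            self.columns = []
            for c in range(sigma):
                Vc = [1 if sym == c else 0 for sym in S]
                self.columns.append((Vc, RankIndex(Vc), SelectIndex(Vc)))

        def rank(self, c, i):                  # rank_c(S, i)
            return self.columns[c][1].rank1(i)

        def select(self, c, i):                # select_c(S, i), 0-based position
            return self.columns[c][2].select1(i)

        def access(self, i):                   # O(sigma): probes every V_c at position i
            for c, (Vc, _, _) in enumerate(self.columns):
                if Vc[i]:
                    return c

    S = [0, 3, 1, 3, 2, 0, 3, 1] * 50
    d1 = StringRankSelect(S, sigma=4)
    assert d1.rank(3, 8) == 3                  # three 3s among the first 8 symbols
    assert d1.access(4) == 2
    assert d1.select(3, 2) == 3                # the second 3 sits at 0-based position 3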
The second straightforward data structure D2 stores S explicitly in n log σ bits. In addition, it stores a rank_1 and select_1 indexing data structure for each of the bit vectors V_0, V_1, . . . , V_{σ−1} of D1. The bit vectors V_0, V_1, . . . , V_{σ−1} are not stored explicitly, so to answer rank_c and select_c queries, D2 must have a method to compute any consecutive log n bits of V_c that are required by the indexing data structure for V_c. This can be done in O(log σ) time by repeating the following steps 2 log σ times, each time obtaining (1/2) log_σ n bits of V_c: In O(1) time, read (1/2) log n consecutive bits from S and put them in a bit vector r. To find the (1/2) log_σ n bits of V_c that correspond to r, let s be the bit vector of length (1/2) log n consisting of (1/2) log_σ n copies of the length-(log σ) pattern 000. . . 01, let t be s multiplied by c, and let u be the bitwise exclusive-or between r and t. Note that for any nonnegative integer i, the length-(log σ) pattern of u starting at position i · log σ equals 000. . . 00 if and only if the corresponding position in S contains the symbol c. Finally, look up entry u in a table having 2^{(1/2) log n} = √n entries to obtain a bit vector of size (1/2) log_σ n containing a 1 in position i if and only if u[(i · log σ)..((i + 1) · log σ − 1)] = 000. . . 00. Thus, rank_c and select_c take O(log σ) time. The access query takes constant time because S is explicitly stored. The total space is that of storing S plus σ · O(n log log n/ log n) bits for the rank_1 and select_1 indexing data structures, plus the size of the lookup table, which is √n · (1/2) log_σ n = o(n) bits.

In Table 1, |S| denotes the number of bits to encode S. It is n log σ if S is not compressed; however, it can be reduced by applying a compression algorithm which supports instant decompression of short substrings [8]:

Theorem 7. [8] There exists a succinct data structure for storing a string S[1..n] over an alphabet A = {0, 1, . . . , σ − 1} in nH_k + O(n(log log_σ n + k log σ)/ log_σ n) bits for any k ≥ 0, where H_k is the kth order empirical entropy of S, such that any substring of the form S[i . . . i + O(log_σ n)] with i ∈ {1, 2, . . . , n} can be decoded in O(1) time on the word-RAM.

By using this theorem, we can compress S into nH_k + o(n log σ) bits. Furthermore, we can regard the compressed data as an uncompressed string. Therefore the query time in Table 1 does not change.
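Returning to the bit-manipulation step inside D2 (computing a chunk of V_c from the packed string S), the following toy Python sketch shows the XOR trick on one machine word; the field width and chunk contents are hypothetical, and the final loop stands in for the single table lookup used by D2:

    # XOR the packed chunk of S against the symbol c replicated into every
    # field; a field of u is all zeros exactly where S holds c.
    log_sigma = 2                                  # sigma = 4, two bits per symbol
    chunk = [3, 1, 3, 0, 3]                        # hypothetical symbols, packed below
    c = 3
    r = t = 0
    for sym in chunk:
        r = (r << log_sigma) | sym                 # the chunk as one machine word
        t = (t << log_sigma) | c                   # the pattern "s multiplied by c"
    u = r ^ t
    mask = (1 << log_sigma) - 1
    vc_bits = [int((u >> (log_sigma * k)) & mask == 0)
               for k in reversed(range(len(chunk)))]
    assert vc_bits == [1, 0, 1, 0, 1]              # the V_c bits for this chunk of S
    # In D2, the whole list vc_bits comes from one lookup in a sqrt(n)-entry table.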
5. Concluding Remarks
Succinct data structures that support efficient rank and select queries on bit vectors and non-binary vectors are important because they form the basis of several other more complex data structures. Some examples include succinct data structures for representing trees [2, 6, 10, 16, 17, 22, 23], graphs [3, 15], permutations and functions [21, 24], text indexes [7, 14, 29, 30], prefix or range sums [26], and polynomials and others [9]. In these data structures, a typical use of rank and select queries on bit vectors is to encode pointers to blocks of data. For example, suppose that to compress some data we partition it into blocks, compress each block independently into a variable number of bits, and concatenate the result into a bit vector C. Then we can use another bit vector B[1..|C|] such that B[i] = 1 if and only if the ith bit of C is the starting position of a block, and apply select_1 queries on B to find the correct starting and ending positions in C when decompressing the data corresponding to a particular block.

Some directions for further research include dynamization to support operations that allow B to be modified online [27], proving lower bounds on the size of succinct data structures [5, 11, 19] (note that the lower bounds shown in Section 3. hold only if the bit vector is stored explicitly using n bits, and thus do not hold for bit vectors stored in a compressed form), and practical implementation [25]. Although the sizes of the known indexing data structures for bit vectors are asymptotically optimal, the o(n) additional space needed by an index is often too large for real data and cannot be ignored. Therefore, for practical applications, it is crucial to develop other implementations of succinct data structures. Another open problem involves access/rank/select operations on non-binary vectors. No single data structure listed in Table 1 supports constant-time access, rank and select queries. What are the best possible lower and upper bounds on the number of bits required to achieve this? Finally, a related topic is compressed suffix arrays [14], which are data structures for efficient substring searches. The suffix array [18] uses n log n bits for a string of length n with alphabet size σ, while the compressed suffix array uses only O(n log σ) bits, which is linear in the string size. On the other hand, the compressed suffix array does not support constant time retrieval of an element of the suffix array. An important open problem is to establish whether there exists a data structure using linear space and supporting constant time retrieval.
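As a small worked example of the block-addressing idea described above (a sketch reusing the SelectIndex class from the Section 2.2. sketch; the "compressed" blocks here are hypothetical):

    # Concatenate variable-length blocks into C and mark each block's first
    # bit in Bv; select_1 on Bv then locates any block inside C.
    blocks = ["1101", "01", "111000"]           # hypothetical compressed blocks
    C = "".join(blocks)
    Bv = [0] * len(C)
    offset = 0
    for blk in blocks:
        Bv[offset] = 1                          # a 1 marks the start of a block
        offset += len(blk)
    sel = SelectIndex(Bv)
    k = 2                                       # recover the kth block (1-based)
    start = sel.select1(k)
    end = sel.select1(k + 1) if k < len(blocks) else len(C)
    assert C[start:end] == blocks[k - 1]        # "01"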
References

[1] J. Barbay, M. He, J. I. Munro, and S. S. Rao. Succinct indexes for strings, binary relations and multi-labeled trees. In Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2007), pages 680–689, 2007.
[2] D. Benoit, E. D. Demaine, J. I. Munro, R. Raman, V. Raman, and S. S. Rao. Representing Trees of Higher Degree. Algorithmica, 43(4):275–292, 2005.
[3] D. K. Blandford, G. E. Blelloch, and I. A. Kash. Compact representations of separable graphs. In Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2003), pages 679–688, 2003.
[4] D. Clark. Compact Pat Trees. PhD thesis, The University of Waterloo, Canada, 1996.
[5] E. D. Demaine and A. López-Ortiz. A Linear Lower Bound on Index Size for Text Retrieval. Journal of Algorithms, 48(1):2–15, 2003.
[6] P. Ferragina, F. Luccio, G. Manzini, and S. Muthukrishnan. Structuring labeled trees for optimal succinctness, and beyond. In Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2005), pages 184–196, 2005.
[7] P. Ferragina and G. Manzini. Indexing compressed texts. Journal of the ACM, 52(4):552–581, 2005.
[8] P. Ferragina and R. Venturini. A simple storage scheme for strings achieving entropy bounds. Theoretical Computer Science, 372(1):115–121, 2007.
[9] A. Gál and P. B. Miltersen. The cell probe complexity of succinct data structures. In Proceedings of the 30th International Colloquium on Automata, Languages and Programming (ICALP 2003), volume 2719 of Lecture Notes in Computer Science, pages 332–344. Springer-Verlag, 2003.
[10] R. F. Geary, N. Rahman, R. Raman, and V. Raman. A simple optimal representation for balanced parentheses. In Proceedings of the 15th Annual Symposium on Combinatorial Pattern Matching (CPM 2004), volume 3109 of Lecture Notes in Computer Science, pages 159–172. Springer-Verlag, 2004.
[11] A. Golynski. Optimal lower bounds for rank and select indexes. Theoretical Computer Science, 387(3):348–359, 2007.
[12] A. Golynski, J. I. Munro, and S. S. Rao. Rank/select operations on large alphabets: a tool for text indexing. In Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2006), pages 368–373, 2006.
[13] R. Grossi, A. Gupta, and J. S. Vitter. High-order entropy-compressed text indexes. In Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2003), pages 841–850, 2003.
[14] R. Grossi and J. S. Vitter. Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching. SIAM Journal on Computing, 35(2):378–407, 2005.
[15] G. Jacobson. Space-efficient static trees and graphs. In Proceedings of the 30th Annual Symposium on Foundations of Computer Science (FOCS 1989), pages 549–554, 1989.
[16] J. Jansson, K. Sadakane, and W.-K. Sung. Ultra-succinct Representation of Ordered Trees. In Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2007), pages 575–584, 2007.
[17] H.-I. Lu and C.-C. Yeh. Balanced Parentheses Strike Back. To appear in ACM Transactions on Algorithms, 2008.
[18] U. Manber and G. Myers. Suffix arrays: A New Method for On-Line String Searches. SIAM Journal on Computing, 22(5):935–948, October 1993.
[19] P. B. Miltersen. Lower bounds on the size of selection and rank indexes. In Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2005), pages 11–12, 2005.
[20] J. I. Munro. Tables. In Proceedings of the 16th Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 1996), volume 1180 of Lecture Notes in Computer Science, pages 37–42. Springer-Verlag, 1996.
[21] J. I. Munro, R. Raman, V. Raman, and S. S. Rao. Succinct representations of permutations. In Proceedings of the 30th International Colloquium on Automata, Languages and Programming (ICALP 2003), volume 2719 of Lecture Notes in Computer Science, pages 345–356. Springer-Verlag, 2003.
[22] J. I. Munro and V. Raman. Succinct representation of balanced parentheses and static trees. SIAM Journal on Computing, 31(3):762–776, 2001.
[23] J. I. Munro, V. Raman, and S. S. Rao. Space efficient suffix trees. Journal of Algorithms, 39(2):205–222, 2001.
[24] J. I. Munro and S. S. Rao. Succinct Representations of Functions. In Proceedings of the 31st International Colloquium on Automata, Languages and Programming (ICALP 2004), volume 3142 of Lecture Notes in Computer Science, pages 1006–1015. Springer-Verlag, 2004.
[25] D. Okanohara and K. Sadakane. Practical Entropy-Compressed Rank/Select Dictionary. In Proceedings of the Workshop on Algorithm Engineering and Experiments (ALENEX 2007), 2007.
[26] C. K. Poon and W. K. Yiu. Opportunistic Data Structures for Range Queries. In Proceedings of Computing and Combinatorics, 11th Annual International Conference (COCOON 2005), volume 3595 of Lecture Notes in Computer Science, pages 560–569. Springer-Verlag, 2005.
[27] R. Raman, V. Raman, and S. S. Rao. Succinct dynamic data structures. In Proceedings of Algorithms and Data Structures, 7th International Workshop (WADS 2001), volume 2125 of Lecture Notes in Computer Science, pages 426–437. Springer-Verlag, 2001.
[28] R. Raman, V. Raman, and S. S. Rao. Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets. ACM Transactions on Algorithms, 3(4):Article 43, 2007.
[29] K. Sadakane. New Text Indexing Functionalities of the Compressed Suffix Arrays. Journal of Algorithms, 48(2):294–313, 2003.
[30] K. Sadakane. Compressed Suffix Trees with Full Functionality. Theory of Computing Systems, 41(4):589–607, 2007.
In: Software Engineering and Development Editor: Enrique A. Belini, pp. 13-22
ISBN: 978-1-60692-146-3 © 2009 Nova Science Publishers, Inc.
Expert Commentary B
HETEROGENEITY AS A CORNER STONE OF SOFTWARE DEVELOPMENT IN ROBOTICS

Juan-Antonio Fernández-Madrigal (a), Ana Cruz-Martín (b), Cipriano Galindo (c) and Javier González (d)
System Engineering and Automation Department, University of Málaga (Spain)

(a) E-mail address: [email protected]
(b) E-mail address: [email protected]
(c) E-mail address: [email protected]
(d) E-mail address: [email protected]
Abstract In the last years the complexity of robotic applications has raised important problems, particularly in large and/or long-term robotic projects. Software engineering (SE) seems the obvious key for breaking that barrier, providing good maintenance and reusing, coping with exponential growth of programming effort, and integrating diverse components with guarantees. Suprisingly, SE has never been very relevant within the robotic community. In this text we briefly describe some causes for that, review the evolution of robotic software over time, and provide some insights from our most recent contributions. We have found that many problems arising from the conflicts raised by robotic complexity can be well addressed from a SE perspective as long as the focus is, at all levels, on the heterogeneity of components and methodologies. Therefore we propose heterogeneity as one of the corner stones of robotic software at present.
1. Introduction Robots are mechatronic systems [2], therefore they integrate electromechanical devices and software. Although software has usually contributed to robotics with its plasticity when compared to implementations on bared hardware, some of its limits, mostly computational
complexity [19] and its intrinsic nature as a manipulator of pre-existing symbols, have also limited the finding of practical solutions to some robotic problems.+

In the last decades, a new limit of robotic software has become more and more evident as robotic projects have grown large (for example, the development of complete robot architectures [18] or multirobot systems [3]) and/or long-term. This kind of project necessarily has to cope with the integration of very diverse components, ontologies, and methods in a guaranteed, maintainable and extensible way. It is clear at present that, under those requirements, sophisticated software methodologies are unavoidable for breaking the barrier of complexity without sacrificing robotic dependability. In particular, software engineering (SE) seems the obvious key for helping in these issues, namely by providing good maintenance and reuse and by integrating diverse components with guarantees. Surprisingly, SE has never been considered very relevant within the robotics community, as demonstrated by the lack of specialized journals in the area and the reduced number of workshops on SE organized by robotics people. A few reasons could be: the small scale of most robotic applications until more or less one decade ago, the strongly reductionist methodology used in typical robotic problems, and, in some cases, the improper belief that SE has nothing to do with "pure" robotics. Only a few research groups within robotics have proposed a number of tools more or less inspired by SE methodologies during the last twenty years (see Section 2). These tools have set the basis for a deeper knowledge of the limits and characteristics of robotic software, but they have not yet formed a consistent solution that covers all aspects of such software.

Recently, we have proposed a characteristic that serves to differentiate robotic software: heterogeneity ([10], [12], [13]). We have found that many problems arising from the conflicts between robotic complexity and dependability can be well addressed from a SE perspective as long as heterogeneity is included, as a core feature, at all levels of development, instead of forcing the use of a common solution and set of methodologies. Therefore, we have constructed a minimalistic framework covering the main stages of heterogeneous software development: design, implementation, validation, debugging, and maintenance. This framework, called BABEL, has demonstrated that allowing a high level of heterogeneity at all levels of development facilitates the achievement of modern robotic goals.

In the following we explore the evolution of robotic software (Section 2) and show, through BABEL, the role of heterogeneity as one of the corner stones of robotic software (Sections 3 and 4). Since particular and detailed examples and results have been reported elsewhere during the last years, we focus here on the essential aspects of our approach.
+ For instance, efficient solutions to the Simultaneous Localization and Mapping problem suffer from software complexity [17]. Also, the autonomous acquisition of symbols from sub-symbolic information [8] is still an open issue.

2. The Main Stages in the Evolution of Robotic Software

As the robotics realm itself, robotic software has continually evolved, running in parallel to the hardware and software technologies available at the moment. To summarize the main trends, we identify here three different stages in time, each characterized by a particular software issue that received certain effort from the robotics research community in order to solve the robotic problems of the time. Nevertheless, these stages should be
understood as a convenient discretization of a continuous process; thus it is not rare that the works mentioned here can be considered to belong to more than one stage.

The first stage we can identify in the evolution of robotic software, which we could call raw robotic programming, covered from the late sixties until the late eighties of the XX century. In that period robot programming was limited to solutions based on direct hardware implementation of algorithms [4] or ad-hoc programming of concrete hardware platforms [23], [32]. Most of the development of programming languages for robots during that period was focused on industrial manipulators [25], although those languages were of a very low level (close to assembler).

In a second stage, which we could call middleware robotic programming and which extended around the early nineties of the XX century, the goal for robotic software developers shifted to providing pre-defined software platforms to control the physical devices of the robot (actuators and sensors); in spite of this software being tightly coupled to specific hardware, it alleviated the hitherto heavy task of programming complex robots. In this period, some robotic software was in fact real-time operating systems, like ALBATROSS [36], Harmony [20], or Chimera II [33]. But this stage did not stop there: these platforms led to the ability of more complex processing, and, accordingly, the notion of robotic control architecture (a set of software elements or modules that work together in order to achieve a robotic task) also received attention from the robotics community.+ Thus, the first years of the 1990s also offered interesting architectures like TCA [29] or NASREM [1]. Since then, architectural solutions have been continuously released to the robotics arena: e.g., new robotics fields (for example, multirobots) demanded their own architectural approaches ([24]).

Finally, we can distinguish a last stage of robotic software that embraces from the mid nineties to the present and can be called robotics software engineering. The key point at this stage is that some SE aspects are considered when programming robots, mainly due to the still increasing complexity of robotic applications. Now, the goal is not to produce a closed or static architecture, but a framework that allows the developer to produce the architectural solution he/she may need in his/her particular situation.∗ Examples of free, commercial, and/or academic frameworks are ORCCAD [30], Cimetrix's CODE [7], RTI's ControlShell [28], GeNoM [16], NEXUS [9] (a previous instance of our current BABEL development system), OSACA [31], OROCOS [35], Player/Stage [26], CARMEN [37], MARIE [38], RobotFlow [25], CLARAty [13], or Microsoft Robotics Studio [22]. Different SE concepts - like object-oriented programming, software lifecycle, software validation, reusability, CASE tools, or automatic code generation - are being progressively included in these frameworks. However, not all of them focus on SE in the same manner or intensity. In particular, it is very common that they are not able to deal with heterogeneity in a desirable way, which is our aim with BABEL.
+ Notice that a robot software architecture can be conceptually seen nowadays as a robotic middleware. ∗ For more details on the current state of the art of robotic software, you can consult [5].
3. Towards a Heterogeneity-Based Robotic Software Development System

Currently, large and/or long-term robotic projects involve many different researchers with very different programming needs and areas of research, using a variety of hardware and software that must be integrated efficiently (i.e., with a low development cost) in order to construct applications that satisfy not only classic robotic requirements (fault-tolerance, real-time specifications, intensive access to hardware, etc.) but also software engineering aspects (reusability, maintainability, etc.). This indicates three main sources of heterogeneity: hardware, software, and methodological. They appear with different strength at each stage of the robotic application lifecycle: analysis, design, verification, implementation, validation, debugging, maintenance, etc.

Our aim with the identification and inclusion of heterogeneity as one of the pervasive features of robotic applications is to set the basis for a comprehensive software development framework, in the sense that it covers all the stages of the robotic software lifecycle. Up to now, we have reported, in the context of our BABEL development system, tools and methodologies for the stages that are present in the most common models of software development (Waterfall, Iterative, etc. [27]), which are described in the following.
3.1. Robotic Software Design

Software design consists of finding a conceptual solution to a given problem in terms of software components and methodologies. From the heterogeneity perspective, the design of a robotic application should be the foundation for integrating diverse elements while guaranteeing certain requirements (produced by a previous stage of analysis, not covered here). The problem in robotic systems is that there is no wide standardization of components, and thus forcing the use of one standard or of some unique framework is difficult to achieve. Our philosophy is the opposite: we consider heterogeneity in components as a core feature of the design framework, one that is to be preserved. Thus, we keep the framework to a minimum, establishing the smallest structural and behavioral ontologies of design that allow us to express the most important requirements of a robotic application without sacrificing diversity.

BABEL provides a heterogeneity-based specification for the design of robotic applications, currently called Aracne. Aracne is based on three design ontologies: structural, behavioral, and the structural-behavioral link. Each of these contains a minimalistic set of specifications into which most of the heterogeneity present in a complex robotic application can fit. Aracne is currently being extended as a specification language, called H [12]. The main features of Aracne/H are:

• Clear identification of both the software components that are portable and those that are tied to some platform. The former have a low "heterogeneity level", while the latter comprise most of the heterogeneity present in the application.
• The structural ontology is based on active objects or components [34], called modules, that provide certain services to other modules and maintain an internal status (see the sketch after this list). This ontology is different from that of the object-oriented paradigm, since ours includes execution and intercommunication models explicitly (concurrency, synchronization, client-server/subscriber-publisher behavior, etc.).
• The inclusion of an execution model in the structural ontology links it with the behavioral ontology, where the logic of services is specified. Aracne/H permits us to design the logic (code) of the modules in different programming languages or visual methodologies, in a way that isolates and highlights heterogeneity.
• The structural-behavioral link makes up a complete design for the application. We currently include in the ontology for this link basic fault-tolerance mechanisms in the form of software replication [21].
• The three ontologies of Aracne/H include the explicit specification of the most important robotic requirements, namely: hardware dependencies, real-time, and fault-tolerance.
• Aracne/H allows us to design, with the same specification, both distributed and non-networked applications; hard, soft, and mixed real-time systems; modules with different programming paradigms; etc.
• Finally, due to its intrinsically minimalistic nature, the specification is open to cope with most off-the-shelf components and with their evolution in time, important concerns in the component-based software engineering (CBSE) field [34].
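To make the module concept above more concrete, the following Java sketch illustrates, by analogy only (it is not Aracne/H syntax, and all names are invented), what an active object with an explicit execution model and client-server intercommunication might look like:

    // Illustrative analogy of a module as an "active object": it owns its
    // own thread of control, keeps an internal status, and serves requests
    // posted by other modules (client-server style). A publisher-subscriber
    // scheme could be added analogously with a list of subscriber modules.
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    abstract class Module implements Runnable {
        private final BlockingQueue<Runnable> inbox = new LinkedBlockingQueue<>();
        private volatile boolean running = true;

        public void requestService(Runnable request) {  // called by client modules
            inbox.add(request);
        }

        public void run() {                             // explicit execution model
            while (running) {
                try {
                    inbox.take().run();                 // serve one request at a time
                } catch (InterruptedException e) {
                    running = false;                    // cooperative shutdown
                }
            }
        }
    }

The point of the analogy is that concurrency and intercommunication are part of the module's structure itself, not an afterthought of its implementation.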
Currently we have a visual design tool, called the Module Designer, that implements the Aracne specification and allows us to compose the design of an application easily, integrating different programming languages and platforms. We have developed a number of robotic applications over the last decade using Aracne (see for example [14], [15], [13], [12]), obtaining important benefits, as summarized in section 4.
3.2. Robotic Software Implementation

One of the most relevant differences of BABEL with respect to other approaches is that it incorporates into the design all the elements of the application, including behavior, i.e., code. That allows us to develop tools for generating implementations almost directly from the design. The Module Designer of the BABEL development system is a CASE tool that not only facilitates heterogeneous design, but is also able to transform that design automatically into an implementation, which can be supported by heterogeneous execution platforms. Its main features are:

• It provides a user-friendly integrated development environment for visually designing modules according to the Aracne/H specification, specifying their public interfaces, services, codifications, and dependencies.
• The tool automatically generates the software for converting that design into a complete executable program and for integrating this program into a (possibly distributed) robotic application composed of other modules.
• It can generate implementations for a given set of particular platforms, and is extensible to platforms not covered yet. Examples of the platforms and languages supported are reported elsewhere [13].
The Module Designer also includes logging and debugging facilities that can be placed at critical paths in the logic (code) of modules. This links the implementation to the verification and validation stages.
3.3. Robotic Software Verification and Validation

The goal of software verification is to guarantee that a given design/implementation satisfies all its requirements. In a design made with the Aracne specification it is easy to check for a number of possible pitfalls during design, and for conflicts between dependencies during implementation, independently of the highly heterogeneous nature of the components that make up the application. This includes:

• Checking the possibility of satisfying the real-time requirements of the different components (currently simplified to WCETs, Worst-Case Execution Times).
• Checking whether all the platforms needed for the satisfaction of the requirements are present in the deployment (otherwise, the Module Designer adapts the implementation for reduced requirement satisfaction).
• Checking for some limited kinds of dependency cycles that could end in deadlocks.
• Carrying out scheduling analysis when the application needs hard real-time guarantees, since the information present in the design is sufficient for this (see the sketch at the end of this subsection).

On the other hand, the goal of software validation is to check whether a given design/implementation satisfies the intended application in practice. A debugger tool is included in BABEL that retrieves execution information for carrying out off-line analysis of the real-time performance in the cases where that information reflects robotic goals (for example, in navigation modules that need to react to the environment at predefined times). Sometimes this tool also serves for verification, discovering errors or faults in programming.
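As an illustration of the kind of schedulability check that declared WCETs and periods make possible, the following is a generic sketch using the classic Liu and Layland utilization bound for rate-monotonic scheduling; it is not necessarily the analysis implemented in BABEL:

    // Utilization-based schedulability test for n periodic tasks under
    // rate-monotonic scheduling: if the total CPU utilization does not
    // exceed n*(2^(1/n)-1), the task set is guaranteed schedulable;
    // exceeding the bound is inconclusive, and an exact response-time
    // analysis would then be needed.
    static boolean passesRateMonotonicBound(double[] wcet, double[] period) {
        int n = wcet.length;
        double utilization = 0.0;
        for (int i = 0; i < n; i++) {
            utilization += wcet[i] / period[i];  // CPU fraction demanded by task i
        }
        return utilization <= n * (Math.pow(2.0, 1.0 / n) - 1.0);
    }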
3.4. Robotic Software Maintenance

Large and/or long-term robotic projects cannot be carried out efficiently without software bootstrapping, that is, without reuse. However, the trend of repeating the same programming effort time after time for different platforms or control architectures is still evident in robotics. As long as this way of working persists, the development of complex robotic applications will be severely handicapped.

The Aracne design specification, and in particular its most recent extension H, is aimed at reuse by including some characteristics of the object-oriented paradigm. For instance, in H, inheritance has been appropriately adapted to the heterogeneity of the design: the structural
design of a module can be inherited separately from the behavioral design, allowing us to maintain a repository of logical interfaces together with a set of behavioral logics (code) that fit into them. In summary, inheritance allows us to specialize previous developments to new necessities, while the separation between the structural and the codification design isolates the changes that have to be made due to the evolution of hardware or software (a rough analogy is sketched at the end of this subsection).

To cope with the considerable amount of information that this approach generates (which must be appropriately stored in any large robotic project), BABEL also includes a tool for maintenance. This tool is a web site under development [11] that holds all the designs produced up to now, classified through a simple versioning system, and accessible to the members of our group and their collaborators, since the data held there belongs to particular research projects. Nevertheless, some of the modules and results, and all the documentation and the most tested configuration of BABEL, are freely available from the site.
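A rough Java analogy of this separate inheritance of structural and behavioral designs is shown below; the names are hypothetical, and H itself is a dedicated specification language [12], not Java:

    // The structural design (the logical interface of a module) is kept in
    // a repository and reused unchanged, while several behavioral designs
    // (code) fit into it; a hardware change replaces only the behavior.
    interface RangeSensor {                              // structural design
        double[] readScan();
    }

    class VendorLaserLogic implements RangeSensor {      // one behavioral design
        public double[] readScan() {
            // vendor-specific hardware access would go here
            return new double[181];
        }
    }

    class SimulatedLaserLogic implements RangeSensor {   // another, for simulation
        public double[] readScan() {
            return new double[181];                      // synthetic scan for testing
        }
    }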
4. Results, Conclusions, and Future Work

Throughout this text we have presented one of the most important characteristics of robotic software, one that differentiates it from other kinds of software applications: heterogeneity. This heterogeneity is to be understood as a pervasive characteristic: every stage and every level of detail in the development of a robotic application should deal appropriately with diverse components and methodologies. Based on this idea we have described our approach to the treatment of heterogeneity from a software engineering perspective. Our solution is a minimalistic software development system, called BABEL, that is aimed at covering the most important stages of the robotic software lifecycle.

We have used BABEL for our robotic projects during the last decade. The benefits have been evident:

• It has made it possible to break the complexity barrier in current robotic applications, mainly by enforcing the reuse of software and eliminating the complete reprogramming of algorithms for each new platform.
• It has allowed us to guarantee the most relevant requirements of robotic applications, mostly regarding dependability.
• The time and effort dedicated to robotic programming has, in general, been transformed from exponential into linear [13].
• BABEL has allowed us to develop very different research projects with very different requirements, hardware components (we have a number of different robots in our laboratories), and software (operating systems, libraries, etc.), without slowing down development due to component evolution and diversity.
• Our system has enabled the integration of people with very different skills into interdisciplinary research groups (from undergraduate students to professors).

Part of the effort that is no longer spent on re-programming existing applications for new robots is now devoted to maintaining the BABEL system. Thus, we are working on including new hardware and software components as we acquire new equipment, and also on continuously adjusting the ontologies of the Aracne/H specification to cover as much diversity as possible while keeping them as minimalistic as possible.
However, this kind of effort has important drawbacks for us as robotics researchers. In an interesting article, Bruyninckx [6] mentions some of them:

• The limited interest that good software practice holds for the robotics researcher, since it cannot be translated into citation indices or other tangible results.
• Funding is currently centered on fundamental research rather than on middleware projects which, though they do not offer new or original results, are the basis for well-supported robotics research in the long term.
• Robotics experts are often not really interested in the advances that software engineering could offer to the developers of robotic software.

From an optimistic point of view, it is clear that interest in software engineering within robotics has been increasing in recent years. Although there is much work to do to consolidate this trend, the unavoidable necessity of breaking the complexity barrier in software development in order to build the robotic applications of the present and the future should by itself be enough to change the status of robotic software within the community.
References

[1] Albus J.S., Quintero R., Lumia R. "An overview of NASREM - The NASA/NBS standard reference model for telerobot control system architecture". NISTIR 5412, National Institute of Standards and Technology, Gaithersburg, Md. 1994.
[2] Appukutam K.K., "Introduction to Mechatronics", Oxford University Press, ISBN 978-0195687811, 2007.
[3] Balch T., Parker L.E. (eds), "Robot Teams: from Diversity to Polymorphism", AK Peters, ISBN 1-56881-155-1, 2002.
[4] Brooks R.A. "A Robust Layered Control System for a Mobile Robot". IEEE Journal of Robotics and Automation, Vol. RA-2, no. 1. 1986.
[5] Brugali D. "Software Engineering for Experimental Robotics". Springer - STAR. 2007.
[6] Bruyninckx, H. "Robotics Software: The Future Should Be Open". IEEE Robotics and Automation Magazine, March 2008, pp. 9-11. 2008.
[7] Cimetrix CODE. http://www.cimetrix.com/code.cfm. 2008.
[8] Coradeschi S., Saffiotti A., "An Introduction to the Anchoring Problem", Robotics and Autonomous Systems, vol. 43, no. 2-3, pp. 85-96, 2003.
[9] Fernández J.A., González J. "The NEXUS Open System for Integrating Robotic Software". Robotics and Computer-Integrated Manufacturing, Vol. 15(6). 1999.
[10] Fernández-Madrigal J.A., Galindo C., González J., "Integrating Heterogeneous Robotic Software", 13th IEEE Mediterranean Electrotechnical Conference (MELECON), Benalmádena-Málaga (Spain), May 16-19, 2006.
[11] Fernández-Madrigal J.A., Cruz-Martín E., "The BABEL Development Site", http://babel.isa.uma.es/babel2, 2008.
[12] Fernández-Madrigal J.A., Galindo C., Cruz A., González J., "A Software Framework for Coping with Heterogeneity in the Shop-Floor", Assembly Automation no. 4, vol. 27, pp. 333-342, ISSN 0144-5154, 2007.
[13] Fernández-Madrigal J.A., Galindo C., González J., Cruz E., and Cruz A., "A Software Engineering Approach for the Development of Heterogeneous Robotic Applications", Robotics and Computer-Integrated Manufacturing, vol. 24, no. 1, pp. 150-166, ISSN 0736-5845, 2008.
[14] Fernández-Madrigal J.A., González J., "A Visual Tool for Robot Programming", 15th IFAC World Congress on Automatic Control, Barcelona, Spain, July 2002.
[15] Fernández-Madrigal J.A., González J., "NEXUS: A Flexible, Efficient and Robust Framework for Integrating the Software Components of a Robotic System", IEEE International Conference on Robotics and Automation (ICRA'98), Leuven, Belgium, May 1998.
[16] Fleury S., Herrb M., Chatila R. "GenoM: A Tool for the Specification and the Implementation of Operating Modules in a Distributed Robot Architecture". Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'97). 1997.
[17] Frese U., Larsson U., Duckett T., "A Multilevel Relaxation Algorithm for Simultaneous Localization and Mapping", IEEE Transactions on Robotics, vol. 21, no. 2, pp. 196-207, 2005.
[18] Galindo C., González J., Fernández-Madrigal J.A., "Control Architecture for Human-Robot Integration. Application to a Robotic Wheelchair", IEEE Transactions on Systems, Man, and Cybernetics part B, vol. 36, no. 5, pp. 1053-1067, 2006.
[19] Garey M.R., Johnson D.S., "Computers and Intractability: A Guide to the Theory of NP-Completeness", Freeman (ed.), ISBN 978-0716710455, 1979.
[20] Gentleman W.M., MacKay S.A., Stewart D.A., Wein M. "An Introduction to the Harmony Realtime Operating System". Newsletter of the IEEE Computer Society Technical Committee on Operating Systems. 1988.
[21] Guerraoui R., Schiper A., "Software-Based Replication for Fault Tolerance", IEEE Computer, Vol. 30, no. 4, 1997.
[22] Microsoft Robotics Studio. http://msdn.microsoft.com/en-us/robotics/default.aspx. 2008.
[23] Mitchell T.M. "Becoming Increasingly Reactive". Proceedings of the AAAI Conference. 1990.
[24] Parker L.E. "ALLIANCE: An Architecture for Fault Tolerant Multi-Robot Cooperation". IEEE Transactions on Robotics and Automation, 14 (2). 1998.
[25] Paul R.P., "Robot Manipulators: Mathematics, Programming and Control", MIT Press, ISBN 0-262-16082-X, 1981.
[26] Player Project. http://playerstage.sourceforge.net/. 2008.
[27] Pressman R.S., "Software Engineering. A Practitioner's Approach", 6th edition, McGraw-Hill, ISBN 978-0073019338, 2004.
[28] Schneider S.A., Ullman M.A., Chen V.W. "ControlShell: a real-time software framework". IEEE International Conference on Systems Engineering. 1991.
[29] Simmons R., Lin L.-J., Fedor C. "Autonomous Task Control for Mobile Robots (TCA)". Fifth IEEE International Symposium on Intelligent Control. 1990.
[30] Simon D., Espiau B., Castillo E., Kapellos K. "Computer-Aided Design of a Generic Robot Controller Handling Reactivity and Real-Time Control Issues". Rapports de Recherche no. 1801, Programme 4: Robotique, Image et Vision, INRIA. 1992.
[31] Sperling W., Lutz P. "Enabling Open Control Systems - An Introduction to the OSACA System Platform". Robotics and Manufacturing, Vol. 6, ASME Press, New York. 1996.
[32] SRI: Shakey the Robot. http://www.sri.com/about/timeline/shakey.html. 2008.
[33] Stewart D.B., Schmitz D.E., Khosla P.K. "The Chimera II Real-Time Operating System for Advanced Sensor-Based Control Applications". IEEE Transactions on Systems, Man and Cybernetics, vol. 22, no. 6, Nov/Dec 1992.
[34] Szyperski C., Gruntz D., Murer S., "Component Software: Beyond Object-Oriented Programming". Boston, Ma., 2nd edition, Addison-Wesley, ISBN 0201745720, 2002.
[35] The Orocos Project. http://www.orocos.org/. 2008.
[36] Von Puttkamer E., Zimmer U.R. "ALBATROSS: An Operating-System under Realtime Constraints". Real-Time Magazine, Vol. 5, no. 3, 91/3. 1991.
[37] CARMEN. http://carmen.sourceforge.net, 2007.
[38] MARIE. http://marie.sourceforge.net, 2007.
[39] RobotFlow. http://robotflow.sourceforge.net, 2005.
[40] CLARAty. http://claraty.jpl.nasa.gov, 2008.
SHORT COMMUNICATIONS
In: Software Engineering and Development Editor: Enrique A. Belini, pp. 25-35
ISBN: 978-1-60692-146-3 © 2009 Nova Science Publishers, Inc.
Short Communication A
EMBEDDING DOMAIN-SPECIFIC LANGUAGES IN GENERAL-PURPOSE PROGRAMMING LANGUAGES

Zoltán Ádám Mann
AAM Consulting Ltd.; Budapest University of Technology and Economics
Abstract

In recent years, domain-specific languages have been proposed for modelling applications on a high level of abstraction. Although the usage of domain-specific languages offers clear advantages, their design is a highly complex task. Moreover, developing a compiler or interpreter for these languages that can fulfil the requirements of industrial application is hard. Existing tools for the generation of compilers or interpreters for domain-specific languages are still at an early stage and not yet appropriate for use in an industrial setting.

This paper presents a pragmatic way of designing and using domain-specific languages. In this approach, the domain-specific language is defined on the basis of a general-purpose programming language. Thus, general programming mechanisms such as arithmetic, string manipulation, basic data structures etc. are automatically available in the domain-specific language. Additionally, the designer of the domain-specific language can define further domain-specific constructs, both data types and operations. These are defined without breaching the syntax of the underlying general-purpose language. Finally, a library has to be created which provides the implementation of the necessary domain-specific data types and operations. This way, there is no need to create a compiler for the new language, because a program written in the domain-specific language can be compiled directly with a compiler for the underlying general-purpose programming language. Therefore, this approach leverages the advantages of domain-specific languages while minimizing the effort necessary for the design and implementation of such a language.

The practical applicability of this methodology is demonstrated through a case study, in which test cases for testing electronic control units are developed. The test cases are written in a new domain-specific language, which in turn is defined on the basis of Java. The pros and cons of the presented approach are examined in detail on the basis of this case study. In particular, it is shown how the presented methodology automatically leads to a clean software architecture.
1. Introduction

In the last decades, the requirements toward software have become tougher and tougher. The complexity of the problems that are solved by software is growing, while at the same time the expectations concerning numerous other, non-functional aspects (for instance, maintainability, usability, fault-tolerance, parallelism, throughput etc.) have also increased significantly. Moreover, in today's highly competitive software market, it is crucial to minimize time-to-market for software, to be able to quickly add fixes or new features to products. Since the human brain has not evolved significantly in this time, the only way to create more complex software more quickly is to raise the level of abstraction for software development. Just imagine how it would be to develop software that should fulfil today's requirements, if you had to keep in mind which piece of data is in which register of the processor! In order to cope with increasing complexity, the profession moved from machine code to assembler, from assembler to high-level programming languages, then to object orientation, to component orientation etc. Today, we think in terms of high-level programming abstractions, such as components, threads, GUI elements etc., and not in terms of what the hardware can offer (registers, memory addresses, interrupts).

Despite all this development, the requirements are still ahead of what we can deliver safely with our current software development practices. So, what will be the next quantum leap in increasing the level of abstraction? Many researchers agree that the destination of this journey will be some kind of model orientation [6]. Software development will mean creating an abstract, logical model of what the software is supposed to do, without technical details on how it will fulfil those aims. As formulated by Brooks in his seminal paper "No silver bullet," the essence of software development is the construction of abstract, conceptual structures; the difficulties arising from the representation of these structures within the framework of a programming language are just accidental and are decreasing with scientific progress [1].

There are some debates in the research community on what the future model-oriented software development process will look like∗:

• One possibility is to define a universal modelling language that can be used for the development of any software application. Most notably, the Object Management Group (OMG) follows this path with the Unified Modelling Language (UML) [12]. In contrast, others argue that modelling at a really high level of abstraction is only possible with domain-specific concepts, which can best be accomplished by a domain-specific language (DSL). In recent years, this latter approach has gained tremendously in popularity [2] and is also the topic of this paper. More on DSLs can be found in Section 2.
• Another question is how to bridge the gap between the abstract model and the real features of the available platform. Two main approaches can be distinguished, similar to compiled vs. interpreted programming languages. The first approach consists of generating (possibly in more than one step) program code from the model, after which the code can be executed using traditional mechanisms. For instance, the OMG's Model-Driven Architecture (MDA) paradigm falls into this category [11]. The other approach consists of executing the model itself with a suitable model interpreter. As an example, the Executable UML (xUML) approach belongs to this category [9].
• When hearing the word 'model,' one tends to think of a graphical representation, like a UML model. However, graphical modelling has its limitations. Not only is a graphical representation less appropriate for machine processing, but also for the human reader it is quite hard to understand hundreds (or more) of pages of graphical models. Usually, a textual model is more concise and can therefore scale better in model size where readability is concerned. Thus, textual modelling languages have become more popular in recent years [5].

∗ Also, there are minor differences in the terminology, e.g. model-based vs. model-driven vs. model-oriented.
In the rest of the paper, textual domain-specific languages are considered. The issue of generating code from the model vs. interpreting the model itself will be discussed in more detail.
1.1. Paper Organization

The rest of the paper is organized as follows. In Section 2, the concept of DSLs is described in more detail, with special emphasis on the challenges associated with the development of a DSL. Section 3 contains a case study, introducing the domain of testing electronic control units. In this domain, there is a need for a DSL for the specification of test cases. Section 4 describes the proposed pragmatic way of defining a DSL based on a general-purpose language in principle, followed by the second part of the case study in Section 5, in which the practical applicability of the proposed approach is presented for specifying test cases for electronic control units. Section 6 contains a discussion of the lessons learned in the application of the proposed methodology, while Section 7 concludes the paper.
2. DSLs

2.1. General Properties of DSLs

A DSL is a language for the description of programs, or of models of programs∗, in a specific field of application (called a domain). Since the language is tailored to one domain, complex constructs and abstractions of the domain can be supported directly by the language. A number of benefits are expected from this clear focus on one domain, such as:

• Concise representation of complex issues;
• Gain in productivity;
• Improved maintainability;
• Better communication between IT and functional departments;
• Efficient development of variants and of software product lines.

∗ From a theoretical point of view, the distinction between a program and a model of the program is artificial, since a model can be defined as an abstract representation of a system, and thus the program itself can also be regarded as a model.

The idea of domain-specific languages is not new. There are several languages that are already widely used and can be regarded as DSLs, for instance:

• SQL (Structured Query Language) for the definition of database queries and manipulations;
• XSLT (eXtensible Stylesheet Language Transformation) for the definition of transformations between XML (eXtensible Markup Language) files;
• sed scripts for string manipulations;
• make scripts for directing the software build process.

As can be seen from this list, these widely used DSLs are usually tailored to a technical domain. For functional∗ domains, the idea of DSLs can also be leveraged; however, by the nature of functional domains, these languages are usually known and used only by a limited set of experts. Examples include:

• CPL (Call Processing Language) for the definition of Internet telephony services;
• BPMN (Business Process Modeling Notation) for the definition of business processes;
• OWL (Web Ontology Language) for the definition of ontologies for the Semantic Web;
• VoiceXML for the definition of interactive voice dialogues between a human and a computer.

∗ In this context, the distinction between technical and functional is as follows. Functional issues are those intrinsic properties of the system which result directly from the end user's functional requirements. In contrast, technical issues are related to the implementation of the system with a specific technology. Accordingly, a technical domain is relevant for the IT expert, whereas a functional domain may also be relevant to the end user.
2.2. Creating DSLs

Developing a DSL and the supporting tool chain is a time-consuming process requiring much care and deep expertise [10]. The process can be divided into five phases: decision, analysis, design, implementation, and deployment [5]. Of these, the analysis phase and the implementation phase are especially challenging.

In the analysis phase, the constructs of the domain that should be integrated into the language have to be identified and formalized. Although there are several methodologies for this kind of domain engineering, this phase is definitely time-consuming and requires special expertise.

In the implementation phase, the necessary tool chain must be developed for the language: editor, compiler / interpreter, debugger, profiler etc. Parts of this can be automated (e.g., parser generation), and there are also language workbenches [3] for facilitating the whole process (e.g., Eclipse Modeling Framework, Microsoft Visual Studio DSL Tools). However, not all the steps can be fully automated, so that creating efficient tools for a non-trivial DSL remains a difficult process with a lot of manual work. In particular, developing a compiler or interpreter for the language that can fulfil the requirements of industrial application is hard.
3. Case Study - Part 1

As a case study for a domain-specific language, the domain of testing electronic control units (ECUs) in vehicles is considered. An ECU is an embedded computer system with a specific control function within a bigger mechatronic system. For instance, a high-end car nowadays contains up to 80 ECUs (e.g., ABS, tuner, night vision camera control, airbag control etc.). The ECUs within a car are interconnected so that they can exchange messages. For the interconnection of ECUs several bus technologies are in use, of which two are the most common: Controller Area Network (CAN) and Media Oriented Systems Transport (MOST). CAN supports the transmission of 8-byte messages with a data rate of up to 500 kilobit/sec and a non-negligible error rate. MOST is a more expensive technology, supporting the safe transmission of messages of up to 4 kilobytes in length and a data rate of up to 23 megabit/sec. Moreover, the two technologies differ significantly in their addressing schemes.

Car manufacturers spend huge amounts of resources on testing whether every ECU obeys its specification in every possible combination and under all imaginable circumstances. Testing ECUs has several flavours and there are several methodologies. In this paper, we will focus on the testing of generic system functions (e.g. power management, security, personalization) that have to be implemented in every ECU according to the same logical specification, but with different technical details, e.g. depending on the bus technology used by the ECU (CAN/MOST) [4]. It should also be noted that, in order to find errors at the earliest possible stage, these generic system functions are usually first implemented and tested in the form of a software simulation on a PC.

The testing of these functions basically consists of sending different sequences of messages to them and comparing the replies from the ECU with the expected behaviour (whether there was a reply at all; whether timing requirements were met; whether the data in the reply were as expected etc.).

Now the challenge is the following. The test cases for testing (a) the PC simulation of the function; (b) the implementation of the function in a CAN ECU; and (c) the implementation of the function in a MOST ECU are almost the same at the logical level. However, at the level of the communication technology, the three cases are quite different. The aim is to define the test cases only once, at a sufficiently high level of abstraction, and use them in all three cases (see Figure 1). Thus, the goal is to define a DSL with the following main concepts:

• Sending of messages with defined content to the System Under Test (SUT);
• Waiting for messages from the SUT with given timing constraints;
• Comparing the contents of a received message with a predefined pattern.
[Figure 1 diagram: a model of a test case is executed, respectively, on a simulated boardnet with simulated ECUs, on a CAN ECU, and on a MOST ECU.]
Figure 1. The same logical test cases should be executed for different ECU implementations.
The DSL should be free of any references to the specific communication technology; however, it should be possible to run the test cases without modification on any of the supported technology platforms. The resulting DSL is described in Section 5.
4. A Pragmatic Approach to DSL Development

In light of the challenges associated with the development of a DSL (see Section 2), we suggest that DSLs should be developed from scratch only if (a) there are some specific requirements concerning the tool chain that are otherwise hard to fulfil (e.g., hard requirements concerning performance may require a very specific optimizing compiler) and (b) the foreseen wide-spread usage of the DSL justifies the effort. Otherwise, we propose using a pragmatic approach in order to leverage the benefits of DSLs even in projects with very limited budget, as follows.

The DSL should be defined on the basis of an existing general-purpose programming language (GPL). Thus, general mechanisms such as arithmetic, string manipulations, basic data structures etc. are automatically available in the DSL. Additionally, the designer of the DSL will of course define further, domain-specific constructs. These can be categorized as data types and operations. Both can be defined without breaching the syntax of the underlying GPL, as data types and operations in the GPL. Finally, a library has to be created which provides an implementation in the GPL for the defined domain-specific data types and operations.

In other words: the DSL is nothing but a GPL enriched with domain-specific data types and operations, which are defined in the GPL themselves. A program written in the DSL is thus at the same time also a program in the GPL (a toy sketch of this pattern is given at the end of this section).

The representation of the DSL within the GPL is possible because there are so many degrees of freedom in the design of the DSL. Usually, the requirements concerning the future
DSL are very high-level: what kinds of domain constructs should be available in the language and what kinds of operations should be possible on these constructs (see for instance the requirements formulated in Section 3 in connection with the DSL for ECU test case specification). There are usually no strict constraints on the syntax of the language, so any logical, readable, and coherent syntax can be used. Thus, the syntax of a GPL is usually applicable.

This approach has several major advantages. First of all, since a program in the DSL is at the same time also a program in the GPL, the whole tool chain of the GPL can be used for programs written in the DSL. This way, the effort involved in the creation of the DSL is drastically reduced. Moreover, it is safe to assume that the tool chain of a GPL is significantly more mature (concerning comprehensiveness, correctness, documentation, etc.) than the tools that would be created for the sake of the DSL. Furthermore, many useful features of the DSL can simply be inherited from the GPL "free of charge," such as macros, inheritance, etc.: features that you might not bother to include in the language if developing it from scratch.

Of course, this approach also has some limitations. If, for some reason, there are very specific requirements concerning the syntax of the DSL that cannot be found in any available GPL, then this approach cannot be applied. Also, this approach does not yield a clear separation between code and model, which can be a problem if some team members are supposed to work on the model only.

In any case, since the presented approach allows for the quick and easy construction of DSLs, it can be used as a rapid prototyping methodology. Suppose for instance that for a given domain DSL1 is created using the above methodology. The language can be tried in practice and fine-tuned based on the experience in an early stage of the design. Afterwards, a second language DSL2 can be created which is semantically equivalent to DSL1, but whose syntax is closer to the logical syntax of the domain instead of the syntax of the GPL. Then, only a DSL2-to-DSL1 compiler must be created in order to have a full-fledged DSL (namely, DSL2) with moderate effort.
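As a toy illustration of this embedding idea (all names below are invented for the example; the paper's actual DSL follows in Section 5), a "DSL program" is simply a class that inherits the domain vocabulary from a library class:

    // Minimal sketch of the embedding pattern with an invented toy domain:
    // the domain-specific operations are ordinary Java members of a library
    // class, so the "DSL program" compiles with a plain Java compiler.
    abstract class TurtleOps {                  // hypothetical domain library
        protected void moveForward(double cm) { /* drawing logic would go here */ }
        protected void turnLeft(double degrees) { /* ... */ }
        abstract void program();                // the DSL program itself
    }

    class DrawSquare extends TurtleOps {        // a program written in the "DSL"
        void program() {
            for (int i = 0; i < 4; i++) {       // GPL features (loops) come free
                moveForward(10);
                turnLeft(90);
            }
        }
    }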
5. Case Study - Part 2

We have applied the presented approach to the ECU test case specification domain presented in Section 3. That is, we developed a DSL for the black-box testing of ECUs, based on Java as the underlying GPL.
5.1. Design of the DSL

The building blocks of test cases consist of the sending of messages to the ECU and the waiting for the reply from the ECU. The most common scenario is that the test case sends a message to the ECU and expects a reply within a given time frame. Such a building block can have a variety of parameters, such as:

• The message that should be sent to the ECU;
• Criteria for the acceptance of the reply (e.g. bit 12 must be 0 in the reply);
• Minimum and maximum reply time.
There are other, similar building blocks. For instance, it is possible to specify that, after having sent a given message to the ECU, no reply must arrive within a given time frame. From such building blocks, complex test cases can be compiled, as shown in Figure 2.

    //Inquiry + 1st reply
    messageToSend=new MessageInquiry(TEST_ID);
    idOfExpectedReply=TEST_ID;
    minimalWaitTime=0;
    maximalWaitTime=100;
    sendMessageAndWaitForReply();

    //2nd reply
    minimalWaitTime=175;
    maximalWaitTime=225;
    waitForReply();

    //3rd reply
    waitForReply();

    //wait one more period; no further reply should arrive
    waitNoReply();
Figure 2. Example test case.
This test case tests that an inquiry sent to the ECU results in exactly 3 replies, of which the first must arrive within 100 milliseconds after sending the inquiry, and the next two with a gap of approximately 200 milliseconds between them.

The grammar of the DSL is specified in EBNF notation in Figure 3.

     := *
     := <param-spec> 
     := <param-spec-opt>
    <param-spec> := + <min-time-spec> <max-time-spec>
    <param-spec-opt> := ? * <min-time-spec>? <max-time-spec>?
     := "messageToSend=new MessageInquiry(" ");"
     := | …
     := "idOfExpectedReply=" ";" | …
    <min-time-spec> := "minimalWaitTime=" ";"
    <max-time-spec> := "maximalWaitTime=" ";"
     := "sendMessageAndWaitForReply();" | "sendMessageNoReply();" | "waitForReply();" | "waitNoReply();"
Figure 3. Grammar of the DSL for ECU testing.
5.2. Implementation of the DSL

The domain-specific constructs of the language (the attributes minimalWaitTime, messageToSend etc., as well as the operations waitForReply() etc.) are specified in an abstract Java class called ECUTest. All test cases are Java classes that inherit from this abstract class, so that these attributes and operations can be used in all the test cases (see Figure 4). Of course, the exact behaviour of these operations depends on the technology used (calling Java routines vs. using CAN messages vs. using MOST messages). Hence, the operations in the class ECUTest do nothing but delegate the work to an adapter (sketched at the end of this subsection). The class ECUTest can be parameterized with different adapters according to the technology used. All details concerning the technology are encapsulated in the relevant adapter. The test cases themselves are free of any technology-related details. The domain-specific language constructs are defined in the parent class.
[Figure 4 diagram: test classes Test1 … Testn inherit from the abstract class ECUTest, which delegates via AdapterJava to the simulated boardnet, via AdapterCan to a CAN ECU, and via AdapterMost to a MOST ECU.]
Figure 4. Embedding the DSL into Java via inheritance.
This way, the challenge described in Section 3 is met: the test cases are specified at a logical level, only once, but can be used without any modifications with the different technologies.
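The following sketch shows how this delegation might look; the names ECUTest, the attributes, and the operations come from the paper, while the adapter interface and its method signatures are assumptions made for the example:

    // Sketch of the delegation pattern: ECUTest holds the DSL vocabulary
    // and forwards all work to a technology-specific adapter.
    interface TechnologyAdapter {                       // hypothetical interface
        void send(Object message);
        boolean awaitReply(int replyId, int minMs, int maxMs);
    }

    abstract class ECUTest {
        protected Object messageToSend;
        protected int idOfExpectedReply;
        protected int minimalWaitTime, maximalWaitTime;
        private final TechnologyAdapter adapter;        // AdapterJava, AdapterCan or AdapterMost

        protected ECUTest(TechnologyAdapter adapter) { this.adapter = adapter; }

        // The domain operations do nothing but delegate to the adapter,
        // so test cases stay free of technology-related details.
        protected void sendMessageAndWaitForReply() {
            adapter.send(messageToSend);
            adapter.awaitReply(idOfExpectedReply, minimalWaitTime, maximalWaitTime);
        }

        protected void waitForReply() {
            adapter.awaitReply(idOfExpectedReply, minimalWaitTime, maximalWaitTime);
        }
    }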
6. Discussion

Based on the presented case study, two issues are discussed:

• The consequences of using an underlying GPL on the DSL;
• The resulting software architecture.
6.1. Consequences on the Language

As can be seen from Figure 2 and Figure 3, the syntax of the language is sufficiently simple, and contains only constructs of the given domain. Thus, expressing test cases in the DSL is really simple and easy to understand. In particular, it is much simpler than its implementation in Java, especially because the latter involves at least two threads: one for receiving the asynchronous incoming messages, the other for checking the elapsed time and interrupting the first one after the specified amount of time (a condensed sketch is shown at the end of this subsection). The chosen syntax elegantly hides this complexity from the user, who can thus focus on the logic of the test case instead of the difficulties of multithreaded Java programming.

It should also be noted that, although embedding the DSL into Java imposes some constraints on the syntax of the DSL (e.g., every command must be followed by a semicolon), these restrictions are not disturbing at all. What is more, the embedding in Java provides a lot of powerful features free of charge. For instance, comments can be added to the test cases according to Java syntax, although this was not explicitly defined in the language grammar. More importantly, when judging the acceptability of the data contained in an incoming message, the full power of Java can be used to perform sophisticated computations (e.g. take a substring of the data field, interpret it as a hexadecimal number, compute a formula based on this number etc.). Defining these features from scratch, without relying on the existing features of Java, would be a quite tedious and time-consuming task.
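For illustration, here is a condensed sketch of the logic that an implementation of waitForReply() has to hide (assumed code, not the paper's; a blocking queue with timeout stands in for the explicit watchdog thread described above):

    // Hidden complexity behind waitForReply(): block on asynchronous incoming
    // messages, fail if none arrives before maximalWaitTime, and reject
    // replies that arrive before minimalWaitTime has elapsed.
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.TimeUnit;

    static boolean waitForReplySketch(BlockingQueue<Object> incoming,
                                      int minimalWaitTime, int maximalWaitTime)
            throws InterruptedException {
        long start = System.currentTimeMillis();
        Object reply = incoming.poll(maximalWaitTime, TimeUnit.MILLISECONDS);
        if (reply == null) {
            return false;                              // timeout: no reply at all
        }
        long elapsed = System.currentTimeMillis() - start;
        return elapsed >= minimalWaitTime;             // reply must not be too early
    }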
6.2. The Resulting Architecture

When looking only at the result in Figure 4, one could argue that this is a pure Java system, without any use of a DSL. In a way, this is indeed the case: through the embedding in Java, in the end all artefacts are in Java, and the DSL is not visible at all.

However, when assessing Figure 4 thoroughly, one can also state that the result is a really clean architecture in which technology-related code and functional (i.e., test-related) code are successfully separated, in the sense of separation of concerns. It should also be noted that this feature is guaranteed automatically by the usage of the DSL, since the DSL only contains constructs of the domain, and no technology-related issues. Therefore we can conclude that even if the DSL is not visible in the final product, its use is justified by the final product as well, because the consistent use of the DSL automatically leads to the presented clean software architecture. Moreover, as mentioned in Section 4, the option is still available to extract the test cases from the Java program into separate non-Java files, and to transform them in an automated way to Java or interpret them on-the-fly, if a looser coupling is needed.
7. Conclusion

This paper has presented a pragmatic approach for the development of DSLs, in which the DSL is not created from scratch, but rather on top of an existing GPL. The GPL is extended with domain-specific constructs that are defined and implemented as data structures
and operations in the GPL. This way, the tools for the GPL can be used directly also in connection with the DSL, which drastically reduces the effort of implementing the DSL.

The presented case study showed how this approach can be applied in practice. A DSL has been devised for the black-box testing of ECUs on the basis of Java. With the presented approach, it is possible to specify the test cases at a high level of abstraction, without making any reference to the underlying technology (whether CAN, MOST, or direct Java function calls). The resulting DSL is simple and easy to use; moreover, its use automatically leads to a clean software architecture.

To sum up: the presented approach helps to leverage the power of DSLs even in small projects in which creating a full-fledged DSL from scratch would not be feasible.
References

[1] Brooks, F. P., Jr.: No Silver Bullet - Essence and Accidents of Software Engineering, Computer, 1987.
[2] Cook, S.; Jones, G.; Kent, S.; Wills, A. C.: Domain-Specific Development with Visual Studio DSL Tools, Addison-Wesley, 2007.
[3] Fowler, M.: Language workbenches - the killer app for domain specific languages?, http://www.martinfowler.com/articles/languageWorkbench.html
[4] Heider, A.; Mann, Z. Á.; Staudacher, B.: Verteiltes System, Automobil Elektronik, 03/2006.
[5] Karlsch, M.: A model-driven framework for domain specific languages, Master's thesis, Hasso-Plattner-Institute of Software Systems Engineering, 2007.
[6] Kempa, M.; Mann, Z. Á.: Aktuelles Schlagwort: Model Driven Architecture, Informatik Spektrum, August 2005.
[7] Ludwig, F.; Salger, F.: Werkzeuge zur domänenspezifischen Modellierung, OBJEKTspektrum, 03/2006.
[8] Luoma, J.; Kelly, S.; Tolvanen, J.: Defining Domain-Specific Modeling Languages - Collected Experiences, Proceedings of the 4th OOPSLA Workshop on Domain-Specific Modeling, 2004.
[9] Mellor, S.; Balcer, M.: Executable UML - A foundation for model-driven architecture, Addison-Wesley, 2002.
[10] Mernik, M.; Heering, J.; Sloane, A. M.: When and how to develop domain-specific languages, ACM Computing Surveys, volume 37, issue 4, pages 316-344, 2005.
[11] Object Management Group: Model Driven Architecture, http://www.omg.org/mda/
[12] Object Management Group: Unified Modeling Language, http://www.uml.org/
In: Software Engineering and Development Editor: Enrique A. Belini, pp. 37-68
ISBN: 978-1-60692-146-3 © 2009 Nova Science Publishers, Inc.
Short Communication B
STUDYING KNOWLEDGE FLOWS IN SOFTWARE PROCESSES

Oscar M. Rodríguez-Elias1,a, Aurora Vizcaíno2,b, Ana I. Martínez-García3,c, Jesús Favela3,d and Mario Piattini2,e

1 University of Sonora, Mathematics Department, Hermosillo, Son., Mexico
2 Universidad de Castilla-La Mancha, Escuela Superior de Informática, Paseo de la Universidad No. 4, Ciudad Real, Spain
3 CICESE Research Center, Computer Science Department, Ensenada, B.C., Mexico
Abstract

Many authors have observed the importance of knowledge for software processes. This fact has led more and more researchers and practitioners to initiate efforts to apply knowledge management in software processes. Unfortunately, many of these efforts are oriented only toward aiding big software companies, and toward using existing knowledge management systems or strategies that have not been developed following the specific and particular knowledge needs of the processes in which they are included. As a consequence, such efforts often do not really help the people who should benefit from using them. In this chapter we argue that one way to address this problem is to first study software processes from a knowledge flow perspective, in order to identify the particular knowledge needs of such processes, and thereby be in a better position to propose systems or strategies that address those needs. This chapter presents an approach which has been used to accomplish this objective. Its possible benefits are illustrated through the results of a study of a software maintenance process within a small software organization.
a E-mail address: [email protected]
b E-mail address: [email protected]
c E-mail address: [email protected]
d E-mail address: [email protected]
e E-mail address: [email protected]
Introduction

Software engineering is one of the most knowledge-intensive jobs (Robillard, 1999). Thus, having a good knowledge management (KM) strategy in these organizations is very important (Aurum et al., 2003). Nevertheless, there are factors which in many cases negatively affect the use of traditional KM systems (KMS) in software companies (Desouza, 2003). Some reasons for this are related to the fact that many KMS are not designed taking into account the real knowledge needs of the knowledge workers of organizations (Stewart, 2002). Developing KMS that consider the daily work done in organizational processes is important to get them accepted by their potential users. This is of particular relevance in small and medium-sized organizations, since it is often difficult for those organizations to have enough resources to perform big changes to their processes in order to include KMS or strategies (Sparrow, 2001). In fact, small organizations should be capable of identifying the role that KM plays in their activities and current systems before developing or acquiring new tools (Wong, 2005). They should search for ways to integrate these current systems as part of the KM strategies designed, in such a way that these are aligned with the everyday work carried out by their employees.

In this chapter, we present a methodological approach which has been useful in achieving this objective in a software process. The methodology is oriented toward aiding the study of organizational processes to obtain information useful for proposing the design of KMS and strategies which consider the real knowledge needs of the knowledge workers, also taking into account the tools they use to carry out their jobs' activities. The methodology is composed of a set of steps that pursue three general goals: 1) to model organizational processes from the point of view of the knowledge that flows through them, 2) to analyze the knowledge, its sources, their flow, and the manner in which all these interact with the activities of the processes, and 3) to identify the tools or systems that support the process and that can play an important role as knowledge flow enablers. The information obtained by accomplishing these goals should finally be useful for proposing KM systems or strategies focused on improving the knowledge flow within the studied processes. The methodology was put into practice in the domain of software processes, to evaluate its possible benefits and limitations. The main results of this case study are also presented in this chapter.

The remainder of this chapter is organized as follows: the next section, titled Background, presents a general view of the use of knowledge management in software engineering, its importance, the main approaches reported in the literature, and some of the main open issues we have observed. We are particularly interested in small software organizations, so we also include in this section a subsection focused on addressing this issue, to finally discuss the integration of KM into everyday work processes and introduce the need for an approach such as the one described in this chapter. In the third section we present the methodological approach we have followed to study software processes with the goal of designing KMS or strategies. Later, section four presents examples of the application of this approach and the results obtained from its use in a real setting. After this, a discussion of the results and observations from this case study is presented, to finally conclude and present some directions for further work.
Background

Three main areas have motivated the present work: the improvement of knowledge flow in software engineering processes, the state of such an approach in small software development companies, and the use of process-oriented KM to improve work processes in organizations. Next we present a discussion of these fields, in order to position the context of the present work.
Software Engineering and Knowledge Flow

One of the main assets of a software organization is its organizational knowledge (Aurum et al., 2003). A KM strategy (Haggie & Kingston, 2003) can bring many benefits to software organizations, including time and cost reduction, quality increases, and improvement of the work environment or of software developers' performance; see for instance (Dingsøyr & Conradi, 2002; Rus & Lindvall, 2002; Tiwana, 2004). Improving knowledge flow is perhaps the main goal of KM initiatives (Borghoff & Pareschi, 1997). Consequently, in order to provide KM support in an organization, one important step is to understand how knowledge flows through it (Nissen, 2002).

The flow of knowledge in software companies may be affected by many factors, such as nonexistent or outdated documentation, frequent personnel turnover, inexperienced employees, etc. These types of problems make KM an important concern for processes such as software maintenance (Anquetil et al., 2007), since they can also be related to a low maturity level of such processes, causing, for instance, dependence on source code and on the personal memory of employees (Singer, 1998). On the other hand, some characteristics of maintenance organizations with a high maturity level can be related to the use of KM activities, such as updated documentation or the management of expert skills (Chapin, 2003). Therefore, carrying out a process analysis focused on knowledge flow has the potential of bringing various advantages to software companies, as it can help to:

• Identify knowledge-related problems. A formal representation of a process can be used as a communication channel between the actors of the process and the process analyzers. This can enable brainstorming to identify problems such as knowledge bottlenecks, and a search for possible solutions to solve them (Hansen & Kautz, 2004).
• Increase the information about the knowledge and knowledge sources involved in the process. The explicit representation of elements in process models facilitates the analysis of those elements (Conradi & Jaccheri, 1999). People analyzing models with explicit representations of knowledge and its sources can be induced to think about those elements and, as a consequence, provide more information about them.
• Identify tools that can be integrated within the KM initiative. An important part of successful KM initiatives is to integrate the organizational technical infrastructure (Jennex & Olfman, 2005), and to connect the KM support to the real work of the organization (Davenport, 2007). Many tools often used in software development may be powerful knowledge sources or KM systems (Lindvall & Rus, 2003). However, those systems might not be used to their full potential as KM systems. Analyzing software processes with a special focus upon the way in which the tools used in the process can support the knowledge flow may facilitate their integration as a part of the KM initiative.
• Identify requirements in order to acquire or develop new tools through which to improve the knowledge flow. Once the problems affecting the knowledge flow are identified, it is easier to define requirements through which to modify the tools currently used, or to acquire or develop new tools to solve those problems.
• Analyze the effects of including KM strategies in the process. Process models which consider the knowledge flowing through the process can be used to analyze the effects caused by the inclusion of changes in the knowledge flow (Nissen & Levitt, 2004); this can, for instance, be carried out by modifying the process models of the current process to reflect the changes, and then comparing both models in order to analyze how these changes may affect the process.
• Improve the assignment of human resources. Assigning people appropriate roles in the development process is crucial to the attainment of a productive development team (Acuña et al., 2006). One of the main steps for this is to identify the profile required for each role, which includes the identification of the knowledge and skills that are required of the people who will play a specific role. Software process models that consider the knowledge and skills required in the activities performed by the different roles can be used to obtain information with which to define those profiles.
KM in Small Software Organizations

Traditional approaches for managing knowledge in software organizations require staff who are in charge of packaging knowledge or experiences, maintaining an experience base, and supporting software projects in identifying and using appropriate experiences (V. R. Basili et al., 1994). Since such staff must be separate from the development staff, this may entail resources that are not easily available to small companies, although such approaches have been successfully applied in large software companies; see for instance (Dingsøyr & Conradi, 2002; Schneider et al., 2002). This has motivated some researchers to propose lightweight approaches through which to help small organizations adopt experience reuse. For instance, some approaches focus upon the use of postmortem analysis to capture experiences, either at the end of development projects (Dingsøyr, 2005; Dingsøyr et al., 2001), or during maintenance projects (Anquetil et al., 2007; de Sousa et al., 2004). It is clear that small software companies differ from large ones in many respects (Richardson & von Wangenheim, 2007): they have fewer resources, which are often insufficient for them to engage in a novel KM approach that requires new tools, processes, training, etc., or that requires staff in charge of creating, maintaining, and capturing a base of experiences or knowledge (Laporte et al., 2008). Mature KM initiatives for software organizations require a well-established software process with a measurement program that enables the company to obtain data useful for measuring how they are working and how they can improve the quality of both the product and the process (Victor R. Basili & Caldiera, 1995). Unfortunately, many small software companies have not only not adopted standard processes and measurement programs (Laporte et al., 2008), but have also not defined their own processes well (García et al., 2006), and these can sometimes seem chaotic (Harris et al., 2007). All of this makes it harder for such organizations to adopt traditional KM practices. Therefore, it is important to study means of helping small software companies to include KM activities according to their particular processes and needs. The importance of this is even greater if we consider that most software organizations worldwide are considered to be small or very small companies (Laporte et al., 2008; Richardson & von Wangenheim, 2007). On the other hand, as has been observed by Land et al. (Land et al., 2001), KM efforts in software organizations commonly focus upon managing explicit knowledge outside the development process, while important knowledge involved in the development processes is not explicitly managed. Some studies have shown that even in software companies which do not have explicit KM practices, software engineers implicitly apply some kind of KM in their daily work; see for instance (Aurum et al., 2008; Meehan & Richardson, 2002; Ward & Aurum, 2004). Therefore, before engaging in a costly KM effort, small companies should first become aware of the implications of KM for their current systems (Sparrow, 2001) by, for instance, studying their processes to identify and understand the role that knowledge flow plays in them. Once this has been accomplished, they will be in a better position to propose strategies to integrate KM into their software processes.
Process-Oriented Knowledge Management

In recent years, KM experts have noted that KM approaches should be better integrated into the real work processes of organizations (Scholl et al., 2004). From this observation, developing means to facilitate the analysis of work processes from a KM point of view has become an important concern (Maier & Remus, 2002). Some works reported in the literature have applied process analysis methods to study software processes from knowledge flow or knowledge management points of view. Hansen and Kautz (Hansen & Kautz, 2004) have used Rich Pictures (Monk & Howard, 1998) to study knowledge flows in a software process. Rich Pictures is a flexible Process Modeling Language (PML) which can easily be adapted to interleave different perspectives in a process model. In this study, the authors analyzed the process in order to define a knowledge map, which was then used to find problematic and missing knowledge flows. We have performed a similar study (Rodríguez et al., 2004a) in which the knowledge flows of a software maintenance process were analyzed by modeling them using Rich Pictures. From this study, some requirements for a KM tool emerged (Rodríguez et al., 2004b). Woitsch and Karagiannis (Woitsch & Karagiannis, 2003) have proposed a tool and a modeling approach which focus upon facilitating the analysis, documentation and implementation of enterprise KM systems. They use various types of models to represent different elements of the process, such as activities, people, etc. They illustrate the approach with a case study in a software organization and show how those models can be used to design an organizational memory. Nissen and Levitt (Nissen & Levitt, 2004) have used a tool for designing virtual teams to analyze knowledge flows in a software process. Zhuge (Zhuge, 2002) has proposed an approach for analyzing knowledge flows in software organizations by decomposing the processes into tasks and knowledge nodes, and defining the workflow of tasks and the sequence of knowledge transfers between knowledge nodes, based on the inputs and outputs of tasks and knowledge nodes (for instance, the knowledge consumed or generated in a node). Finally, Strohmaier and Tochtermann (Strohmaier & Tochtermann, 2005) propose a framework with which to define a knowledge process and a tool to facilitate the development of a knowledge infrastructure; this is exemplified by the development of a knowledge portal for a software company. However, this approach focuses upon the development of KM systems and not upon the analysis of software processes. From the review of these works, we can state that studying knowledge flows can be a good starting point for identifying and understanding the main knowledge needs in a process and, as a consequence, for obtaining information useful for defining or designing KM systems or strategies focused on addressing those needs. In the work presented in this chapter, we have been using a Knowledge Flow Identification methodology called KoFI (Rodríguez-Elias et al., 2007a; Rodríguez-Elias et al., 2007b) to study knowledge flows in software processes. The advantage we see in this methodology is that it does not require the use of special tools, as is the case with some of the methods we have found in the literature. For instance, it does not require a specific process modeling approach; instead, the methodology proposes some recommendations for selecting a PML. On the other hand, many of the methods we have found are oriented towards the development of specific types of KM systems. In our case, we are more interested in identifying and understanding the knowledge flows in the processes before deciding on a specific type of KM system. Moreover, we are interested in identifying how to integrate the tools currently used in the software processes into the KM strategies and, perhaps, in basing such strategies on those tools. The KoFI methodology was therefore designed with these objectives in mind (Rodríguez-Elias et al., 2007b). Next we describe the KoFI methodology, and later present the manner in which it was applied to study a software maintenance process.
A General View of the KoFI Methodology

The KoFI methodology was designed to aid in the analysis of software processes from a knowledge flow perspective (Rodríguez-Elias et al., 2007a). It was defined to assist in three main areas: 1) identifying, structuring, and classifying the knowledge that exists in the process studied; 2) identifying the technological infrastructure which supports the process and affects the knowledge flow; and 3) identifying ways in which to improve the knowledge flow in the process. KoFI is oriented towards helping to analyze specific work processes. Therefore, it is necessary to define the specific process and model it. The process models are later analyzed following a four-stage process, to finally identify and describe the tools which, positively or negatively, affect the flow of knowledge. Figure 1 presents a schematic view of the KoFI methodology. The process followed to apply the methodology is iterative, since each stage may provide information useful for the preceding stages. The process models may also evolve while they are being analyzed in the different stages of KoFI.
[Figure 1 depicts KoFI as three linked phases: a Process Modeling Phase (specifying the process to be analyzed and knowledge-focused process modeling), a Process Analysis Phase (identifying knowledge sources, knowledge topics, knowledge flows, and knowledge flow problems), and a KM Tools Analysis Phase (analysis of the tools involved in the flow of knowledge).]
Figure 1. General view of the KoFI methodology.
As depicted in Figure 1, KoFI has three main phases: knowledge-focused process modeling; analysis of the process (which includes the identification of knowledge sources, topics, flows, and knowledge flow problems); and analysis of the tools affecting the knowledge flow. To describe and illustrate the different phases of KoFI, we will use a sample case, which is described next; the phases of KoFI are then described in turn.
A Sample Case

In this sample case, we will analyze the process followed to create a software development plan in a small software company. The project plan is written by the head of the development department (HD), together with the project manager (PM). The project plan must contain information for contacting the client who requested the system, the system's description, the main expected functionality, references to the main related documents (principally the requirements specification), a detailed description of the main activities required to fulfill the project, the estimated time and cost, the required personnel and resources, a detailed description of the sequence of activities, and the personnel assigned to them.
Process Modeling Phase

In this phase, the process is modeled so that the resulting models can support the analysis. It is important that the models facilitate the identification of issues related to the knowledge involved in the process. Traditional PMLs can be used to identify issues related to implicit knowledge flows, such as the information sources which are required, generated, or modified by an activity (Abdullah et al., 2002). However, it is important that a PML used to analyze knowledge flow provides an explicit representation of the knowledge consumed or generated in activities, the knowledge required by the roles participating in those activities, the sources of that knowledge, and knowledge dependencies. Because there are not many PMLs with these characteristics (Bera et al., 2005), one option is to adapt existing PMLs to integrate the representation of knowledge. It is recommended to model the process at different levels of abstraction. First, a general view of the process can be defined with a general and flexible process modeling technique. To perform a detailed analysis, a more formally constrained language should be used. It may also be helpful to use a PML which has been designed for the type of process being analyzed, since such a language should provide primitives to represent the specific elements involved in that type of process, and the explicit representation of those elements will facilitate their analysis. In our case, we have used an adaptation of the Software Process Engineering Metamodel (SPEM) (Rodríguez-Elias et al., in press). The latter was chosen because our main domain area is software processes. SPEM is a UML-based metamodel which has been specifically designed for software process modeling (OMG, 2002). There are various advantages to using SPEM. First, UML is the most widespread modeling language for software development; this may facilitate the adoption of SPEM as a standard software PML. Secondly, it is possible to use any UML modeling tool supporting UML profiles to produce SPEM models (Bézivin & Breton, 2004), and there are many software modeling tools which enable the use of UML as a modeling language: commercial, open source, and free tools. Finally, SPEM enables the modeling of a software process from different views, at different levels of abstraction, and with a formally constrained language. SPEM considers three main elements in software processes: roles, activities, and work products. A work product is anything generated or consumed in the activities of the process. The adaptation of SPEM used in our work consists of defining special types of work products to represent the knowledge and knowledge sources involved in the processes. In this manner, it is possible to specify the knowledge generated or used in an activity, the knowledge contained in a specific source, and the role responsible for specific knowledge or sources, among other things. Since the focus of this chapter is not on the modeling language, further details of this aspect are not presented; a detailed description of the SPEM adaptation can be found in (Rodríguez-Elias et al., in press). Here we shall limit ourselves to presenting some examples as we present the model of the sample process.
Modeling the Sample Case

Four main elements are used to represent knowledge-related issues in the models: 1) knowledge topics (first icon in Figure 2), used to represent specific knowledge topics, subjects, or areas; 2) knowledge sources (second icon in Figure 2), used to represent the main documents or other sources of knowledge involved in the process; 3) knowledge packages (third icon in Figure 2), used to group related knowledge topics or sources; and 4) knowledge transfers (last icon in Figure 2), a multiple relationship used to represent transfers of knowledge between roles and/or sources. These elements are used in the SPEM diagrams to represent knowledge issues, as described next.
[Figure 2 shows the four notation icons of the adaptation: KTopic, KSource, GroupedKnowledge, and KnowledgeTransfer.]
Figure 2. Notation icons used in the adaptation of SPEM.
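To make the adapted notation more concrete for readers who prefer code to diagrams, the following minimal sketch (our own illustration, not part of SPEM or its UML profile; all class and field names are hypothetical) represents the four elements as simple Python data structures:

from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class KTopic:
    """A specific knowledge topic, subject, or area."""
    name: str
    description: str = ""

@dataclass
class KSource:
    """A document or other source of knowledge involved in the process."""
    name: str
    responsible_role: str = ""

@dataclass
class KnowledgePackage:
    """Groups related knowledge topics or sources (may nest)."""
    name: str
    members: List[Union[KTopic, KSource, "KnowledgePackage"]] = field(default_factory=list)

@dataclass
class KnowledgeTransfer:
    """A multiple relationship: a transfer of knowledge between roles and/or sources."""
    topic: KTopic
    origin: Union[str, KSource]       # a role name or a source
    destination: Union[str, KSource]  # a role name or a source

# Example: the project manager's tacit knowledge, grouped as a package.
pm_knowledge = KnowledgePackage(
    "Project Manager's knowledge",
    [KTopic("Time and cost estimation methods")])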
The first step in the modeling phase is to identify the main activities and knowledge packages involved in the process. This is done by creating use cases of the process. A regular use case in SPEM shows the relationships between the roles of the process and the workflows in which they participate. In our case, we use the use cases to represent the workflows and the main knowledge packages involved in them. Figure 3 shows a use case of the sample process. As can be observed, three main workflows have been identified: 1) to write the document (the project plan), 2) to estimate the cost and time of the project, and 3) to define and assign the activities. Related to these workflows, four main knowledge packages have been identified: 1) the head of the department's knowledge, 2) the information contained in the project plan, 3) the project manager's knowledge, and 4) the software engineers' knowledge. The packages are related to the workflows through dependencies. The direction of an arrow therefore means that a workflow or activity depends on the knowledge package pointed to, or that the knowledge package depends on an activity or workflow, perhaps because the knowledge is generated by that activity. The knowledge packages of Figure 3 highlight the fact that most of the knowledge used or generated in the modeled process is tacit; that is, it resides in the heads of the people who perform the process.
[Figure 3 relates the three workflows of the project plan development process (to write the document, time and cost estimation, and to define and assign activities) to the four knowledge packages (the head of the department's knowledge, the project manager's knowledge, the project plan information, and the software engineers' knowledge).]
Figure 3. Use case of the sample process.
Once the main knowledge packages and workflows have been identified, we model the specific activities of each workflow in order to identify the roles participating in them, and the knowledge sources and packages required or generated in those activities. To this end, an adaptation of the activity diagrams is used. Figure 4 presents an example in which the “writing the document” workflow is modeled. This model illustrates the main packages of knowledge or information required or generated in the activities. In this manner, it is possible to explicitly show:

• Knowledge requirements. The knowledge required by a role to accomplish its activities, or the knowledge required to accomplish a specific activity.
• Knowledge dependencies. Activities that require knowledge generated in previous ones.
• Knowledge losses, or knowledge that is difficult to find. Important knowledge that may not be captured, or that is difficult to find, perhaps because it is distributed across different sources or because the sources are unknown to the employees. To highlight this fact, the knowledge source icon can be used to represent the medium in which the knowledge used or generated resides, and knowledge packages can be used to represent knowledge that is not being captured or that may be difficult to find.
[Figure 4 shows the “writing the document” workflow, involving the Project Manager and the Head of the Department in the activities “to create the document”, “to fill the document”, “to analyze the plan”, and “to develop the business summary”. The activities draw on knowledge about the text edition tool, knowledge about formatting the document's content, the project plan information, and the head of the department's experience, and produce the summarized project plan.]
Figure 4. Activity diagram of one of the workflows of the sample process.
To describe the specific knowledge subjects contained in a knowledge package, we use an adaptation of a class diagram which we have called the knowledge package diagram. These diagrams help to decompose the knowledge packages into knowledge areas or specific topics or subjects. In this manner, we can start classifying the knowledge involved in the process, as illustrated in Figure 5, in which we show the decomposition of the information contained in the project plan.
[Figure 5 decomposes the project plan information into the project description, expected functionality, estimated time and cost, information about the client (name of the person responsible, address), information about related documents (document's name, description, and location), activities to be done, time and cost restrictions, resources assignment, assigned personnel, information about the activities (activities' description, assigned engineers, estimated time, dependencies with other activities), and the summarized project plan (summarized project description, main functionality, estimated time and cost, assigned personnel).]
Figure 5. Example of a knowledge package diagram, used to decompose specific knowledge areas or subjects grouped in a package.
Finally, it is important to identify the specific sources from which the different knowledge subjects identified can be obtained. To accomplish this, we use an adaptation of class diagrams in which the sources and the knowledge they contain are modeled, together with the roles responsible for the sources. A relationship called knowsAbout is used to specify the knowledge that a specific source contains. Figure 6 presents an example of this type of diagram. In the example, we specify the sources related to the information of the project plan. As can be seen, three main sources are identified: the project plan itself, the requirement specification document, and a summary of the project plan. The diagram also shows dependencies between sources; for instance, the content of the project plan depends on the information contained in the requirement specification. Additionally, these diagrams can also be used to start classifying the different types of sources. For example, the requirement specification and the project plan have been classified into a group called system documentation.
[Figure 6 relates three sources to their knowledge and responsible roles: the project plan (responsible: project manager) knows about the project plan information and the expected functionality; the summarized project plan (responsible: head of the department) knows about the summarized information about the project plan and the information about the client; and the requirement specification (responsible: requirements engineer) also knows about the expected functionality. The project plan and the requirement specification are grouped as system documentation, and the project plan depends on the requirement specification.]
Figure 6. Example of an adapted class diagram, used to illustrate the relationships between knowledge sources, topics, and roles.
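As a rough illustration of how the knowsAbout relationships of Figure 6 could be recorded for later analysis, the following sketch uses a plain dictionary; the structure is our own, and the specific links are read loosely from the figure and the surrounding text:

# knowsAbout: which knowledge each source contains, plus the responsible role.
# A minimal tabular sketch of the relationships shown in Figure 6.
knows_about = {
    "Project plan": {
        "responsible": "Project Manager",
        "knows_about": ["Project plan information", "Expected functionality"],
        "depends_on": ["Requirement specification"],
        "category": "System documentation",
    },
    "Requirement specification": {
        "responsible": "Requirements engineer",
        "knows_about": ["Expected functionality"],
        "depends_on": [],
        "category": "System documentation",
    },
    "Summarized project plan": {
        "responsible": "Head of the department",
        "knows_about": ["Summarized information about the project plan",
                        "Information about the client"],
        "depends_on": ["Project plan"],
        "category": None,
    },
}

# Example query: which sources contain knowledge about a given topic?
def sources_for(topic: str):
    return [s for s, info in knows_about.items() if topic in info["knows_about"]]

print(sources_for("Expected functionality"))
# -> ['Project plan', 'Requirement specification']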
The Process Analysis Phase

Once an initial model of the process has been obtained, we can use it to analyze the process. However, the models may change during this phase as more details of the process are obtained. As shown in Figure 1, the analysis of the process is carried out in four stages, described in the following subsections.
Identifying Knowledge Sources

The first stage of the process analysis phase of KoFI is to identify the main sources of knowledge involved in the process. This stage also includes the organization and classification of the sources found. This is a first step towards the definition of a taxonomy of sources and topics of knowledge, which is one of the first steps in the development of KMSs (Rao, 2005). Additionally, an ontology can be developed from this taxonomy to help define the relationships between the sources and the other elements of the process. This type of ontology can later be used to structure the knowledge base of the process (O'Leary, 1998). The main issues that should be considered during this stage are:

• Documents used or generated during the process.
• Products of the process. In software organizations, the software produced, source code for instance, is in many cases one of the main sources of knowledge (Seaman, 2002).
• People involved in the process, such as staff members, external consultants, etc.
• Information systems used by the actors of the process, such as databases, organizational memories, knowledge bases, organizational portals, etc.
• Other tools which may be used to obtain or generate information or knowledge. For instance, software development environments can provide tools that help to obtain knowledge about the structure of specific pieces of source code or the history of a software system, to simulate operation scenarios, etc.
From the example we have been following in previous sections, we can observe the existence of different sources. First, there are two types of documents: the project plan and the associated documents, such as the summary of the plan and the requirements specification. Secondly, the people involved are the head of the department and the project manager; however, since some of the information required by these people is obtained from the software engineers, the engineers are also an important source. The scenario contains no description of the use of tools or information systems. However, let us suppose that, in order to select the personnel to be in charge of the project, the company uses a human resource management system which holds information about the work assigned to each employee and their experience profiles, in such a way that this information can be used to make an initial assignment of personnel to the project. This system should be classified as an information system. On the other hand, let us now suppose that we identify that the project manager uses a tool which helps him/her to estimate the time or cost that the project will consume, for instance by making use of previous project data; this tool can then be classified in the support tools category.
Figure 7. A metamodel for the description and classification of knowledge sources.
To classify and describe each knowledge source, KoFI proposes a metamodel, shown in Figure 7, which consists of a classification schema composed of categories (KSourceCategory) and types (KSourceKind), plus other elements useful for providing more information about the sources. Following this metamodel, for each source identified we should describe its location (the place in which it can be found or through which it can be consulted), its physical support where applicable (i.e. electronic or paper), and the specific format of the source (for instance, if the source is an electronic file, it could be free text, PDF, etc.). The LocationKind property defines predefined types of locations, for example a database, an email, a physical address, etc. The KConcept element refers to a knowledge concept, which can be either a knowledge source or a knowledge topic. Using the classification schema of the metamodel, Figure 8 shows an example of the classification of the sources described in the previous paragraphs.
Figure 8. An example of a classification schema of the knowledge sources identified in the sample process.
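A hypothetical sketch of how a source description following the metamodel might be recorded is shown below; the field names mirror the metamodel's properties, while the concrete values (for example, the repository location) are assumptions made for the sample case:

from dataclasses import dataclass
from typing import Optional

@dataclass
class KnowledgeSourceDescription:
    """Describes a knowledge source per the KoFI metamodel (illustrative fields)."""
    name: str
    category: str            # KSourceCategory, e.g. "Document", "Person", "Information system"
    kind: str                # KSourceKind, a more specific type within the category
    location: str            # where it can be found or consulted
    location_kind: str       # LocationKind, e.g. "database", "email", "physical address"
    support: Optional[str]   # physical support, if applicable: "electronic" or "paper"
    fmt: Optional[str]       # specific format: "free text", "pdf", etc.

project_plan = KnowledgeSourceDescription(
    name="Project plan",
    category="Document",
    kind="System documentation",
    location="Central document repository",  # assumed location for the example
    location_kind="database",
    support="electronic",
    fmt="pdf",
)

hr_system = KnowledgeSourceDescription(
    name="Human resource management system",
    category="Information system",
    kind="Personnel database",   # assumed kind for the example
    location="Company intranet",
    location_kind="database",
    support="electronic",
    fmt=None,
)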
Finally, it is important to note that, during the identification of the knowledge sources, we should also start identifying the main knowledge topics that can be obtained from them.
Identifying Knowledge Topics

In this stage, the topics of knowledge that can be obtained from the sources found in the previous stage are defined, together with the knowledge that the people involved in the process may have or require. As in the previous stage, a taxonomy and an ontology can be used to classify the types of knowledge and define their relationships with other elements of the process. The ontology must define means of relating the knowledge sources to the knowledge areas or topics that can be found in them. Each knowledge topic should be described by a name and a brief description; additionally, it is important to identify alternative names and dependencies on other topics.
The main issues that should be taken into account are:

• Knowledge about the organization, such as its structure, norms, culture, etc.
• Knowledge about the organizational processes, such as their activities, the people involved, etc.
• Activity-dependent knowledge, such as the procedures or tools used to accomplish the activities.
• Knowledge about the domain of the work done in the company; in our case, software development.
• Other types of knowledge, such as speaking a foreign language, or other knowledge or skills which are not part of the daily work but could be important.
In the sample process, we identified that the knowledge required to create the project plan includes: the information that the document should contain, the format that the document should follow, the sources from which the information captured in the document is obtained, the methods to follow for estimating the time and cost of the project, and information about the activities to be performed and about the personnel assigned to the project.
Figure 9. A metamodel for the classification of knowledge topics.
To classify the knowledge topics, we also follow the metamodel proposed by KoFI. Figure 9 shows the part of the metamodel focused on the classification schema of the knowledge topics. As can be observed, the classification is based on a hierarchy of knowledge categories (KCategory), knowledge areas (KArea), and knowledge subjects (KSubject). A knowledge category is a structural element at a high level of abstraction used to group related knowledge areas. Knowledge areas are subdivisions of the knowledge categories which are logically related to the category by, for instance, inheritance, aggregation, or composition. Knowledge subjects are subdivisions of the knowledge areas that represent basic concepts which have an explicit and well-defined description; subjects define knowledge elements that can be defined as a unit. Of course, deciding what constitutes a knowledge area and what constitutes a subject can be a subjective matter; thus, this decision should be made depending on each particular case. All three elements of the hierarchy are considered knowledge topics since, in this manner, it is possible to refer to a knowledge category, area, or subject when specifying the knowledge required or generated in the process. In some cases it may be necessary to specify that, in order to solve a problem, it is important to know about the subjects grouped in a category or in an area; in other cases, a specific subject is needed. For instance, we could decide to search for a person who knows about programming languages, about a specific programming language (e.g. Java), or about specific issues of a programming language (e.g. accessing databases with Java). Using the classification schema of the metamodel, Figure 10 shows an example of the classification of the knowledge topics. In this example, we highlight the fact that there are predefined methods for estimating the cost and time of a project. Additionally, the example illustrates that this topic is part of two areas: one grouping the knowledge related to the creation of the project plan, and another grouping estimating methods, which is in turn part of a bigger area called methods and tools. Finally, all the knowledge areas mentioned are part of a category grouping the knowledge related to the development process activities.
Figure 10. An example of a classification schema of the knowledge topics identified in the sample process.
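The category/area/subject hierarchy can be pictured as nested structures. The sketch below is our own illustration: the programming-language entries come from the example in the text, the estimating-methods entries paraphrase Figure 10, and the flattening of nested areas into a single level is a simplification:

# KCategory -> KArea -> [KSubject] hierarchy (simplified to two levels of grouping).
taxonomy = {
    "Development process activities": {          # KCategory, from Figure 10
        "Project plan creation": [                # KArea
            "Cost and time estimating methods",   # KSubject, shared with the area below
        ],
        "Methods and tools / Estimating methods": [
            "Cost and time estimating methods",
        ],
    },
    "Technical knowledge": {                      # hypothetical extra category
        "Programming languages": [
            "Java",
            "Accessing databases with Java",
        ],
    },
}

# A topic can be referenced at any level: category, area, or subject.
def find_topic(topic: str):
    """Return (category, area) pairs under which a topic appears."""
    return [(cat, area)
            for cat, areas in taxonomy.items()
            for area, subjects in areas.items()
            if topic == cat or topic == area or topic in subjects]

print(find_topic("Cost and time estimating methods"))  # appears under two areas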
Identifying Knowledge Flows

In this stage, the process model is used to identify the way in which the knowledge and sources are involved in the activities performed in the process. The main activities in the process must be identified, along with the decisions that the people performing those activities have to make. Figure 11 presents the part of the metamodel proposed in KoFI which represents the relationships between knowledge concepts and work definitions. A work definition is an element of SPEM; it refers to an operation or a set of operations that must be performed to accomplish a process (OMG, 2002). Each work definition must therefore have a goal; that is, there must be a purpose for which the operation is performed. A work definition can be a simple task or a decision to make, or a group of them, such as a complete activity, workflow, or sub-process. These work definitions are related to the knowledge concepts either because the latter are required to accomplish them, or because they are generated by them. Finally, the metamodel also establishes that each knowledge source can have a certain level of knowledge about other knowledge concepts, whether knowledge topics or other sources.
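In the same spirit, the required/generated relationships between work definitions and knowledge concepts can be tabulated, which makes simple flow queries possible. This is our own sketch, with activity and concept names adapted from the sample case:

# Each work definition lists the knowledge concepts it requires and generates.
work_definitions = {
    "To create the document": {
        "requires": ["Knowledge about the text edition tool"],
        "generates": ["Project plan document"],
    },
    "To fill the document": {
        "requires": ["Project plan document", "Project plan information",
                     "Knowledge about formatting the document's content"],
        "generates": ["Filled project plan"],
    },
    "To develop the business summary": {
        "requires": ["Filled project plan", "Head of the department's experience"],
        "generates": ["Summarized project plan"],
    },
}

# Example flow query: which activities depend on knowledge generated elsewhere?
def knowledge_dependencies(wds):
    generated = {k for wd in wds.values() for k in wd["generates"]}
    return {name: [k for k in wd["requires"] if k in generated]
            for name, wd in wds.items()}

print(knowledge_dependencies(work_definitions))
# 'To fill the document' depends on the document created in the previous activity,
# and 'To develop the business summary' depends on the filled project plan.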
Figure 11. Metamodel of the relationships between work definitions and knowledge concepts.
[Figure 12 traces how the project plan information is required by or obtained from the activities of the workflow (to create the document, to fill the document, to analyze the plan, to develop the business summary), distinguishing the general data (project's name, project's description, assigned personnel) from the details (information about the client, expected functionality, estimated time and cost, information about related documents, activities to be done, resources assignment, information about the activities).]
Figure 12. Knowledge transfer diagram of the project plan information of the sample process.
Following the metamodel, the process models are analyzed to identify the relationships between knowledge topics, sources, and work definitions. It should then be easier to identify the way knowledge flows through the process while the people involved perform their activities; for example, the sources which may be acting as knowledge transfer mechanisms between activities or people. It is important to identify both flows of knowledge between activities and flows between sources; for example, the transfer of knowledge from a person to a document. The main diagrams that can help to perform this analysis are the activity diagrams, for identifying the knowledge or sources required or generated in the activities of the process, and the class diagrams, for identifying the knowledge contained in each source. For instance, in Figure 4 it can be observed that the project plan document is the main source used to transfer knowledge between activities and between actors of the process. If more detail is required to analyze the manner in which the knowledge contained in a source is used or generated in the different activities in which it participates, the source's knowledge transfer diagram proposed by Rodríguez-Elias et al. (Rodríguez-Elias et al., in press) can be used. This type of diagram shows the sequence in which the knowledge contained in a source is captured in or extracted from that source, and the activities in which this takes place. Figure 12 presents an example that shows the information contained in the project plan.
Identifying Knowledge Flow Problems

The last stage of the process analysis phase is to discover the problems that may be affecting the knowledge flow: for example, information generated in an activity that is not captured, or sources that might assist in the performance of certain activities but are not consulted by the people in charge of them. To do this, KoFI proposes using problem scenarios, which are stories describing a problem taking place in the process being analyzed (Rodríguez-Elias et al., 2007a). The story must show how the problems detected are affecting the knowledge flow. Then, one or more alternative scenarios must be defined to illustrate possible solutions, and the manner in which those alternative solutions may improve the flow of knowledge. These possible solutions are then used as the basis for proposing the KM strategy or system. The problem scenario approach is a combination of the problem frames technique (Jackson, 2005) and the scenario technique (Carroll & Rosson, 1992). Both of these techniques have proven useful for obtaining requirements when designing software systems (Chin et al., 1997; Cox et al., 2005). The knowledge flow problems identified should be classified, and a sample case should then be selected to illustrate the specific type of problem in each category. A problem scenario is described by a name and the type of problem it refers to. The problem is explained through a story that shows the way it appears in practice. The possible solutions to the problem are explained with alternative scenarios that show the manner in which we expect the problem could be solved or reduced by the proposed solution. In the sample process, we found a system in which all the projects are captured. This system allows the definition of the activities of a project and the people in charge of them. When the activities are defined, the system sends a message with the information about each activity to the engineer who is to accomplish it. The engineers can thus keep a record of the activities that are done, in progress, or pending. Although the engineers can modify the information of the activities assigned to them (i.e. status, time required to accomplish them, etc.), we found that this information is not reflected in the main project plan, since it is modified only locally in the workspace of each engineer. Therefore, it is difficult for the project manager to know exactly what the status of the entire project is; to know this, the project manager must consult each engineer. After analyzing the problem, we might propose modifying the system so that the changes made by the engineers to each activity are reflected in the main project plan. This problem could be described by a problem scenario such as the one shown in Table I.

Table I. An example of a problem scenario for describing a knowledge flow problem

Problem: Inconsistent information in the activities performed
Type: Lost information
Scenario description: John, the person responsible for one of the projects of the informatics department, must define the activities of the project and estimate the time needed for each one, in order to specify the total time needed to accomplish the project. John knows that the time for each activity is partially dependent on the engineer in charge of it. Thus, John decides to assign each activity to the engineer who will perform it before estimating the time. John performs this assignment drawing on his personal knowledge of each engineer's expertise, assigning the most suitable engineer to each activity and trying to reduce the time that each activity could need. After this, John consults each engineer, and together they define the time each activity could take. When John calls one of the engineers, the engineer informs John about other activities he has to perform in another project; thus John must reassign the activities and perform the whole process again.
Alternative scenario: When John starts the definition of the activities to be done, he uses the project management system to identify similar activities and estimate the average time they consumed. If this time seems to be independent of the engineer in charge, John may assume that it will not be different this time. If that is not the case, John can analyze whether the variation depends on the engineer in charge, and consult the engineers to determine the cause of the variation. In this manner, John can make an initial estimate that is independent of the engineers in charge of each activity. Additionally, once John decides to assign the activities to the engineers, he uses the system to identify which engineers have other pending activities, and considers this additional situation when assigning an engineer to each activity of the project.
The problem scenarios can be used to identify requirements for designing applications focused on solving or reducing the knowledge flow problems identified. However, there may be cases in which the proposed solution is the modification of the existing technical infrastructure, as in the case presented here. In these cases, it is first necessary to analyze the manner in which the existing systems participate in the flow of knowledge, in order to then define how to modify them to improve the flow. The final phase of KoFI, described next, is focused on aiding in this objective.
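Where the scenarios are to be catalogued systematically, a minimal record per scenario might look as follows; this is our own sketch, and the field values paraphrase Table I:

from dataclasses import dataclass
from typing import List

@dataclass
class ProblemScenario:
    """A knowledge flow problem scenario, as proposed by KoFI (illustrative)."""
    name: str
    problem_type: str                 # the category the problem was classified into
    story: str                        # how the problem appears in practice
    alternative_scenarios: List[str]  # proposed solutions and their expected effect

scenario = ProblemScenario(
    name="Inconsistent information in the activities performed",
    problem_type="Lost information",
    story=("Engineers update the status of their activities locally, so the "
           "changes are not reflected in the main project plan and the project "
           "manager must consult each engineer to know the project status."),
    alternative_scenarios=[
        "Modify the system so that the engineers' updates are reflected in the "
        "main project plan, giving the manager a consolidated view.",
    ],
)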
The Tool Analysis Phase

The final phase of the methodology focuses on analyzing the manner in which the tools used to support the activities of the process affect the flow of knowledge. To this end, we have used the framework of Rodríguez-Elias et al. (Rodríguez-Elias et al., 2008). The objectives of this framework include:

• To identify the role that the tools used in the process play as knowledge flow enablers.
• To identify requirements to improve the use of the tools as knowledge flow enablers.
• To identify the possible effects on the knowledge flow of possible modifications to the tools.
• To identify possible uses of the tools in other processes or activities of the organization.
• To compare different tools that could support similar activities, from a knowledge flow perspective.
To accomplish these objectives, the framework proposes four main steps for analyzing information systems as knowledge flow enablers:

1. The first step is to define the application domain of the system. This includes: a) identifying the use of the knowledge, that is, whether the knowledge managed by the tool is useful for the business strategy, for the organization's managers, for the organization's daily life, for the work processes domain, or for technical activities; b) identifying the scope of the knowledge, which can be single persons, an intra- or extra-organizational group, the entire organization, multiple organizations, or an entire sector of the industry; and c) identifying the domain of the knowledge managed, which can be knowledge about the business strategy, the organization, the management, the products or services provided by the organization, or technical activities.
2. The second step consists of identifying the structure of the knowledge, that is, whether the knowledge managed is tacit or explicit, and, if it is explicit, how well structured or formalized it is.
3. The third step focuses on defining the KM activities supported by the tool, such as creation, application, capture, storage, retrieval, transfer, etc.
4. The fourth step consists of defining the main technical aspects considered important for the tools, particularly whether the tool performs some activities automatically, and whether it manages distributed or local knowledge.

As an example of the use of the framework, we will use the project management system described in the previous section. From the description of that system, we observe that it is being used to facilitate the transfer of knowledge from the project manager to the software engineers, since the system allows the engineers to know which activities are assigned to them. Based on this, we might conclude that the project management system's application domain is the work processes, particularly the project management process and the development process. The scope is an intra-organizational group, since it is used by the people of the development department (project managers and software engineers). The domain of the knowledge managed could be project planning and scheduling. The type of knowledge managed is explicit; however, how formalized it is will depend on the form in which the information is stored in the system. For instance, if it is managed mainly as free text, it will be unstructured; on the other hand, if it is managed as records in a well structured and defined database, it will be structured.
The knowledge management activities supported will depend on the functionality provided by the system. First, we might conclude that it aids in the knowledge capture and storage phases, and in the transfer of knowledge from the project manager to the software engineers. Additionally, if the tool provides a well-defined structure for capturing the knowledge, for instance by specifying the type of information and the manner in which it should be captured, then it will also be aiding the knowledge formalization phase. Similarly, if the system provides means for organizing the stored information, for instance to facilitate the identification of specific projects (perhaps past projects with specific characteristics), then it will also be aiding in the knowledge retrieval stage. Finally, in certain cases technical aspects are responsible for facilitating or blocking the acceptance of KM systems. These technical aspects can be related to, for instance, security, work overload, or the distribution of the knowledge. One problematic technical aspect of the project management system could be the fact that the modifications made by the engineers to the status of the activities are not reflected in the main project plan. In this case, the knowledge of such status is distributed across the personal workspaces of the engineers, and is not being managed by the tool. As can be observed, by analyzing the functionality of the tools being used in the process, we can start to identify the manner in which they support specific KM activities. Additionally, this analysis can be used to start defining proposals by which to improve the KM support of the tools, or to decide whether a tool is facilitating or blocking the flow of knowledge. All this information can later be used to make decisions about, for instance, whether a tool should be modified, left as it is, replaced by another, or eliminated from the process. To support these statements, in the following section we describe some of the results of applying the described methodology to study a software maintenance process.
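To summarize, the four steps of the framework can be read as a fixed set of assessment dimensions. The sketch below is our own summary (not an artifact of the framework itself), filled in for the project management system discussed above:

# The four steps of the tool analysis framework as assessment dimensions,
# filled in for the project management system of the sample case.
tool_assessment = {
    "tool": "Project management system",
    # Step 1: application domain of the system
    "use_of_knowledge": "work processes (project management and development)",
    "scope": "intra-organizational group (the development department)",
    "knowledge_domain": "project planning and scheduling",
    # Step 2: structure of the knowledge managed
    "knowledge_structure": ("explicit; structured if stored as database records, "
                            "unstructured if stored as free text"),
    # Step 3: KM activities supported
    "km_activities": ["capture", "storage", "transfer (manager to engineers)"],
    # Step 4: relevant technical aspects
    "technical_aspects": ["engineers' local updates are not reflected in the "
                          "main project plan (distributed, unmanaged knowledge)"],
}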
Applying the KoFI Methodology

The goal of this section is to describe the results we have obtained by applying the KoFI methodology in real settings. In particular, we focus on the results of a case study carried out in a software development group to study its software maintenance process. The case study is presented as follows: first, we describe the context of the study; we then present the manner in which the analysis of the process was carried out, together with some of the results of this analysis; finally, we describe the main observations and lessons learnt from the study.
The Context of the Study

The group of software developers who took part in the case study was the Informatics Department (ID) of a research centre, which is in charge of maintaining the information systems used in that centre. At the time of the study, the group was composed of fourteen people: the head of the department (HD), a secretary, six software engineers (SEs), and six assistant programmers (APs). One of the SEs was a novice, with only 2 months' experience in the job; the other SEs had between 5 and 11 years' experience; all of the APs had spent less than 1 year in their jobs. The group was in charge of maintaining the applications in 5 domain areas: finances, human and material resources, academic productivity, and student services. All of the applications were about 10 to 13 years old, if we take into account the development of their first versions. Most of them used the Oracle database management system and programming tools. Normally, these applications require only one or two people for their maintenance, and can be considered medium-sized applications (between 50 and 100 different modules per application, taking into account reports, views, forms, menus, etc.; the ID does not have an exact measurement of their current size). Most of the maintainers did not participate as original developers of the systems. APs were assigned to the modification activities; the particular people involved in these tasks change often. The ID does not follow standard software processes. Nevertheless, its maintenance process is similar to others we have studied in the literature (e.g. (Chapin et al., 2001; Polo et al., 2003a)). The work of the ID is organized into projects triggered by maintenance requests, which are of two types: modification requests (MRs), made by the users of the systems to improve or add new features, or to adapt a system to changes in the environment (the operating system, the process, etc.); and problem reports (PRs), which describe problems the users have experienced with the systems. Each SE is responsible for the requests∗ of the systems assigned to him/her, even though some of the changes may eventually be made by another SE or by an AP. All the requests are stored in a logbook. The logbook is used by the HD and the users who made the requests to track the status of each request. The SEs use the logbook to track the requests assigned to them, and to consult information about the requests. The data for the process was obtained from interviews with the members of the ID, direct non-participatory observation, and the analysis of documents and information systems of the ID. We first developed an initial model of the process, which was then modified based on observations and comments made by the members of the ID during the period of our final interviews. We used these models as a mechanism by which to analyze the process in the final interviews, until the SEs agreed with the models.

∗ We use requests to refer to both MRs and PRs.
The Analysis Phase

The results of the analysis phase have been classified into three main categories: the identification and classification of knowledge sources and topics; the identification of aspects related to knowledge flows (knowledge flow problems); and the tools being used as knowledge flow enablers.
Identification and Classification of Knowledge Sources and Topics

One of the main contributions of the study was that it helped us to initiate the identification and classification of the main knowledge topics required by the ID and the sources from which such knowledge can be obtained. These are the first two steps of the methodology. First, we identified the main general topics of knowledge required by the maintainers to perform modifications to the systems they are in charge of. These general topics were classified into knowledge packages of related topics, which in turn have sub-packages of more specialized topics. The main knowledge needs of a maintainer were: 1) knowledge related to the system being maintained, which includes knowledge of the structure of the system, the correct functioning of the system, and so on; 2) the application domain of the system, which includes topics related to the specific concepts used in that application domain, and the process being supported by the system; and 3) technical knowledge, which includes topics related to the programming language and the development environment being used. One important observation we made was that the knowledge requirements of the maintainers were directly related to their experience with the maintenance process carried out by the organization, the applications being maintained, and the technical infrastructure used during the process. We observed that novice engineers were more concerned with acquiring knowledge about the technology they must use to accomplish their activities, such as the programming language or the development tools. Maintainers with medium experience were more interested in knowing about the application they must maintain, such as its structure, the dependencies between the modules of the system, etc.; while the most experienced maintainers were more concerned with knowing about the application domain of the system being maintained, such as the details of the processes supported by the system, the way the users use the system, and the main problems they have while using it. On the other hand, with respect to the knowledge sources, we confirmed what other authors have reported about the documentation problems in software maintenance (Lethbridge et al., 2003; Lientz, 1983; Singer et al., 1997). First, the ID does not have much formal documentation for the systems it maintains; in fact, some of the systems have no documentation at all. Additionally, most of the documentation the ID has is not up to date and, making the problem bigger, although the few documents the ID has are stored in a central repository to make them available to all the engineers, most of those documents are unknown to them. Moreover, even though some engineers know that there are documents that could be useful to them, they do not use them, because finding them is difficult: the engineers do not know what the documents' names are, or in which specific directory they reside. Thus, believing that they will spend too much time trying to find a document, they simply do not use the available documentation. All these problems cause the engineers' heads to be the main knowledge source in the ID; this means that most of the knowledge of the ID remains tacit. Nevertheless, we made two important observations about the use of knowledge sources in the group studied. First, we found that some of the engineers are actually making their own KM effort, trying to capture some of the knowledge they often require in a formal manner, although this causes them a work overload. The engineers are aware that this extra job will help them to reduce the time and effort they spend on common activities. For instance, some of the engineers have created documents that describe the internal structure of the systems they are in charge of, in such a way that these documents help them to know the source files of a specific module of the system, or the database tables or other modules related to it.
However, these documents are known only to the engineers who created them, since they remain in the engineers' workspaces. The second observation we made was that many of the sources consulted by the engineers are outside the ID. In particular, the engineers often consult the users to obtain knowledge about the application domain of the system. However, we also found that part of the information provided by the users is also stored in documents available in an organizational portal of the research centre.
Identification of Knowledge Flow Problems and Knowledge Flow Support

The main knowledge flowing in the maintenance process was that associated with a maintenance request, and the main mechanism used in the ID as a knowledge flow channel is the logbook in which the maintenance requests are stored. However, only a small part of such knowledge is stored in the requests themselves, and not all the maintainers record the same information. A request may frequently contain only initial data, such as the client who made it, the person responsible for solving it, and a brief description of the problem. The study helped us to identify other sources that could also hold information related to a request. First, there were documents containing descriptions of the requirements of changes. These documents were very diverse: sometimes they were memos, at other times emails, and in other cases just notes in a maintainer's personal notebook. Additionally, we found there were emails containing information related to the files changed and the modifications made to the databases. The latter was observed once we had defined all the specific topics of knowledge included in a package containing the information related to a request, and had then identified the different sources which could be useful for obtaining each of those topics (see Rodríguez-Elias et al., in press). From this exercise, we identified all the sources related to the information of a maintenance request. We were then able to recommend that the ID modify its logbook to allow the inclusion of links to the documents associated with each request. The goal is to enable the maintainers to access all the associated documents and information related to a request from the request itself, thus improving the use of the logbook as a knowledge flow facilitator.
Practical Contribution of the Case Study

As described in the previous section, the case study gave us useful information for identifying some of the aspects to be considered when defining KM strategies or systems for a group with characteristics similar to those of the one we studied. In addition to these results, however, the case study also made some important contributions to the Informatics Department and its personnel. Next we provide a list of what we consider to be the main contributions.

• The members of the ID are now aware of some of the problems they face in their maintenance process and are consequently developing tools to address some of these problems. They have, for instance, developed a web portal in which all the documents and information of the systems being maintained will be easily accessible.
• The logbook is being used more often as an important source of knowledge. When we conducted the study, not all the engineers used the logbook on a regular basis, but it is now a key part of their process. This might help the ID to maintain better control over the changes they make.
• The study has improved the sharing of knowledge among members of the ID. Through the analysis of the models, the members of the ID have become aware of sources of knowledge that they did not previously know existed, and these can be used to obtain important knowledge for their activities.
• The ID personnel are now interested in formally describing their processes as a step towards the adoption of a standard software process model. They are interested in taking on a Mexican standard for small-to-medium-sized software development organizations which was approved by the Mexican government as a measure towards promoting the Mexican software industry (Oktaba, 2005). This process model considers the integration of a knowledge base an important part of the software process. The ID did not previously have a formal description of their processes, and they discovered that the analysis of the process models we made might be a useful aid in the formal description of those processes. The representation of knowledge and its sources in the models is of particular interest in aiding the definition of the knowledge base.
Related Work

The identification of the knowledge required by people performing software engineering activities is an important research area. For instance, Lethbridge (Lethbridge, 2000) and Kitchenham et al. (B. Kitchenham et al., 2005) present studies aimed at identifying the knowledge that software professionals require in practice. It can be observed that much of this knowledge is obtained during practice, and it therefore depends on the context of the organization in which it is applied. These works provide an appropriate reference through which to identify the main knowledge areas related to software engineering. However, they are too broad to be practical when classifying the knowledge involved in specific small software organizations. In spite of the importance of software maintenance (Polo et al., 2003b), there are few studies dealing with software maintenance knowledge needs. Most of these works can be classified in the field of program comprehension (Ko et al., 2006; Mayrhauser & Vans, 1995). This may be because source code is the main source of information for software maintainers (Singer, 1998). However, maintainers frequently also consult other sources (Seaman, 2002), and it is important to identify how these are related to their knowledge needs. Two types of studies address this issue:
• Koskinen et al. (Koskinen et al., 2004) have proposed a program comprehension tool with which to facilitate software maintainers' access to information sources, in order to fulfill certain information needs defined from a review of the literature. Our research group (Rodríguez et al., 2004a; Vizcaíno et al., 2003), on the other hand, has proposed the use of software agents (Nwana, 1996) to facilitate access to knowledge sources in software maintenance. This latter approach is not focused upon facilitating code comprehension, but upon facilitating software maintainers' access to sources which will be useful for obtaining information from other domains, such as the application domain, users' behavior, and previous maintenance requests, amongst others.
• Oliveira et al. (Oliveira et al., 2003) have performed a study through which to identify the knowledge involved in software maintenance, from which an ontology was developed (Dias et al., 2003). Oliveira et al.'s ontology is similar to that of Ruiz et al. (Ruiz et al., 2004), since both ontologies identify the main elements involved in the maintenance process and the relationships between them. Additionally, both ontologies are based on Kitchenham et al.'s ontology of software maintenance (B. A. Kitchenham et al., 1999). Anquetil et al. (Anquetil et al., 2007) illustrate how such ontologies can be used as frameworks for obtaining knowledge about software maintenance projects. They used their ontology to discover what to request during postmortem analysis in maintenance projects.
The works of Koskinen et al. (Koskinen et al., 2004) and Rodríguez et al. (Rodríguez et al., 2004a) show that relating knowledge sources to specific knowledge needs can help software maintainers access the knowledge sources that might be useful for specific activities. On the other hand, Oliveira and her colleagues (Anquetil et al., 2007; Oliveira et al., 2003) illustrate that having a model of the tasks to be carried out, and of the elements involved in those tasks, such as knowledge and sources, can facilitate the identification of such knowledge needs and of the knowledge sources that might be related to them. However, the works described do not provide methods for studying specific software processes in order to identify and understand knowledge needs and knowledge flow problems. Other related works are those conducted to identify KM practices in software processes. Meehan and Richardson (Meehan & Richardson, 2002) investigated the software process of three small Irish software companies in order to identify their KM practices. From the study, the authors extracted recommendations for accomplishing KM activities such as knowledge creation and storage. Aurum et al. (Aurum et al., 2008; Ward & Aurum, 2004) have also investigated KM practices in software companies, but they studied large Australian companies. Among the main findings of these studies was the identification of some enablers of the KM process in the organizations studied, principally leadership, technology and culture. Although the above works provide important results about KM practices in software companies, they do not provide a method by which to obtain those findings for specific software companies, as we have done in this chapter. Finally, Lutters and Seaman (Lutters & Seaman, 2007) have proposed the use of "war stories" as a methodological approach with which to perform empirical software engineering research. They used this method to study documentation usage practices in a set of software maintenance environments. A war story is a type of storytelling technique useful for obtaining data in interview settings. The characteristic of a war story is that it is a detailed description of a problematic situation in a specific setting. Since war stories can be very long, detailed, and open (the interviewee is free to tell whatever s/he wants), transcribing and analyzing them becomes a hard and time-consuming job. War stories, like other storytelling techniques, are useful for exploratory studies in which the interest lies in developing theories from the observed data. In our case, we are interested in more practical and focused information, particularly the knowledge requirements and knowledge flow problems of specific software processes. Nevertheless, through the use of the KoFI methodology in our study we observed some of the same situations reported by Lutters and Seaman: that maintainers consult people when no formal sources are available, that the most useful sources are those written by the maintainers themselves, that unofficial sources become very valuable in some cases, and that the location of knowledge sources is a relevant issue, since sources that are difficult to find are not used.
Conclusion

In this chapter, we have described the use of the KoFI methodology as a means to study knowledge flows in software processes. The application of the methodology in real settings was also described, along with the results obtained from these applications. As described in this chapter, the methodology assisted us in the identification of certain knowledge problems in a software process, and in the proposal of alternatives with which to solve them. Furthermore, we consider that one of the main contributions of the study was that of making the members of the studied group aware of the importance of having a KM strategy. They became conscious of some of the KM activities that they were actually carrying out, and of the tools which can facilitate knowledge management in their process. Therefore, we can argue that the usage of KoFI can facilitate the implementation of KM strategies in software processes, particularly if current practices and tools are taken into account when developing those strategies. The main difference between the present work and others that we have found in the literature is its focus on attempting to integrate the current working tools as part of the KM strategy, and perhaps as the basis of it. Doing this could reduce the cost of adopting KM strategies or systems, mainly in small or medium enterprises, which are often unable to meet large expenses that, moreover, only render a profit in the long term. As future work we intend to continue evaluating the benefits and limitations of the KoFI methodology in order to measure how it can help to improve the knowledge flow of software processes. Additionally, we are also interested in analyzing the benefits and results of the methodology in settings other than software engineering, since we think that the general character of this methodology allows its usage in different domains.
Acknowledgements

This work is partially supported by CONACYT, Mexico; the ESFINGE project (TIN2006-15175-C05-05), Ministerio de Educación y Ciencia (Dirección General de Investigación)/Fondos Europeos de Desarrollo Regional (FEDER); and the MELISA project (PAC08-0142-3315), Junta de Comunidades de Castilla-La Mancha, Consejería de Educación y Ciencia, the latter two from Spain.
References

[1] Abdullah, M. S., Benest, I., Evans, A., & Kimble, C. (2002, September). Knowledge modelling techniques for developing knowledge management systems. Paper presented at the European Conference on Knowledge Management, Dublin, Ireland.
[2] Acuña, S. T., Juristo, N., & Moreno, A. M. (2006). Emphasizing human capabilities in software development. IEEE Software, 23(2), 94-101.
[3] Anquetil, N., de Oliveira, K. M., de Sousa, K. D., & Batista Dias, M. G. (2007). Software maintenance seen as a knowledge management issue. Information and Software Technology, 49(5), 515-529.
[4] Aurum, A., Daneshgar, F., & Ward, J. (2008). Investigating knowledge management practices in software development organizations - an Australian experience. Information and Software Technology, 50(6), 511-533.
[5] Aurum, A., Jeffery, R., Wohlin, C., & Handzic, M. (Eds.). (2003). Managing software engineering knowledge. Berlin, Germany: Springer.
[6] Basili, V. R., & Caldiera, G. (1995). Improve software quality by reusing knowledge & experience. Sloan Management Review, 37(1), 55-64.
[7] Basili, V. R., Caldiera, G., & Rombach, H. D. (1994). The experience factory. In J. J. Marciniak (Ed.), Encyclopedia of software engineering (pp. 469-476). John Wiley & Sons.
[8] Bera, P., Nevo, D., & Wand, Y. (2005). Unravelling knowledge requirements through business process analysis. Communications of the Association for Information Systems, 16, 814-830.
[9] Bézivin, J., & Breton, E. (2004). Applying the basic principles of model engineering to the field of process engineering. UPGRADE, V(5), 27-33.
[10] Borghoff, U. M., & Pareschi, R. (1997). Information technology for knowledge management. Journal of Universal Computer Science, 3(8), 835-842.
[11] Carroll, J. M., & Rosson, M. B. (1992). Getting around the task-artifact cycle: How to make claims and design by scenario. ACM Transactions on Information Systems, 10(2), 181-212.
[12] Chapin, N. (2003). Software maintenance and organizational health and fitness. In M. Polo, M. Piattini & F. Ruiz (Eds.), Advances in software maintenance management: Technologies and solutions (pp. 1-31). Hershey, PA, USA: Idea Group Inc.
[13] Chapin, N., Hale, J. E., Khan, K. M., Ramil, J. F., & Tan, W.-G. (2001). Types of software evolution and software maintenance. Journal of Software Maintenance and Evolution: Research and Practice, 13(1), 3-30.
[14] Chin, G. J., Rosson, M. B., & Carroll, J. M. (1997). Participatory analysis: Shared development of requirements from scenarios. Paper presented at the Conference on Human Factors in Computing Systems (CHI97), Atlanta, GA, USA.
[15] Conradi, R., & Jaccheri, L. (1999). Process modelling languages. In J. C. Derniame, B. A. Kaba & D. Wastell (Eds.), Software process (Vol. LNCS 1500, pp. 27-52). Berlin: Springer.
[16] Cox, K., Phalp, K. T., Bleistein, S. J., & Verner, J. M. (2005). Deriving requirements from process models via the problem frames approach. Information and Software Technology, 47(5), 319-337.
[17] Davenport, T. H. (2007). Information technologies for knowledge management. In K. Ichijo & I. Nonaka (Eds.), Knowledge creation and management: New challenges for managers (pp. 97-117). New York, NY: Oxford University Press.
[18] de Sousa, K. D., Anquetil, N., & Oliveira, K. M. d. (2004, 20-21 June). Learning software maintenance organizations. Paper presented at the 6th International Workshop on Learning Software Organization (LSO 2004), Banff, Canada.
[19] Desouza, K. C. (2003). Barriers to effective use of knowledge management systems in software engineering. Communications of the ACM, 46(1), 99-101.
[20] Dias, M. G. B., Anquetil, N., & Oliveira, K. M. d. (2003). Organizing the knowledge used in software maintenance. Journal of Universal Computer Science, 9(7), 641-658.
[21] Dingsøyr, T. (2005). Postmortem reviews: Purpose and approaches in software engineering. Information and Software Technology, 47(5), 293-303.
[22] Dingsøyr, T., & Conradi, R. (2002). A survey of case studies of the use of knowledge management in software engineering. International Journal of Software Engineering and Knowledge Engineering, 12(4), 391-414.
[23] Dingsøyr, T., Moe, N. B., & Nytrø, Ø. (2001). Augmenting experience reports with lightweight postmortem reviews. Lecture Notes in Computer Science, 2188, 167-181.
[24] García, S., Graettinger, C., & Kost, K. (2006). Proceedings of the first international workshop for process improvement in small settings, 2005 (Special Report No. CMU/SEI-2006-SR-001). Pittsburgh, PA: Carnegie Mellon, Software Engineering Institute.
[25] Haggie, K., & Kingston, J. (2003). Choosing your knowledge management strategy. Electronic Journal of Knowledge Management Practice, 4(June), Article 5.
[26] Hansen, B. H., & Kautz, K. (2004, November 10-12). Knowledge mapping: A technique for identifying knowledge flows in software organizations. Paper presented at the European Conference on Software Process Improvement (EuroSPI 2004), Trondheim, Norway.
[27] Harris, M., Aebischer, K., & Klaus, T. (2007). The whitewater process: Software product development in small IT businesses. Communications of the ACM, 50(5), 89-93.
[28] Jackson, M. (2005). Problem frames and software engineering. Information and Software Technology, 47(14), 903-912.
[29] Jennex, M. E., & Olfman, L. (2005). Assessing knowledge management success. International Journal of Knowledge Management, 1(2), 33-49.
[30] Kitchenham, B., Budgen, D., Brereton, P., & Woodall, P. (2005). An investigation of software engineering curricula. The Journal of Systems and Software, 74(3), 325-335.
[31] Kitchenham, B. A., Travassos, G. H., Mayrhauser, A., Niessink, F., Schneidewind, N. F., Singer, J., et al. (1999). Towards an ontology of software maintenance. Journal of Software Maintenance: Research and Practice, 11, 365-389.
[32] Ko, A. J., Myers, B. A., Coblenz, M. J., & Aung, H. H. (2006). An exploratory study of how developers seek, relate, and collect relevant information during software maintenance tasks. IEEE Transactions on Software Engineering, 32(12), 971-987.
[33] Koskinen, J., Salminen, A., & Paakki, J. (2004). Hypertext support for the information needs of software maintainers. Journal of Software Maintenance and Evolution: Research and Practice, 16(3), 187-215.
[34] Land, L. P. W., Aurum, A., & Handzic, M. (2001). Capturing implicit software engineering knowledge. Paper presented at the 13th Australian Software Engineering Conference (ASWEC'01), Sydney, NSW, Australia.
[35] Laporte, C. Y., Alexandre, S., & Renault, A. (2008). Developing international standards for very small enterprises. IEEE Computer, 41(3), 98-101.
[36] Lethbridge, T. C. (2000). What knowledge is important to a software professional? IEEE Computer, 33(5), 44-50.
[37] Lethbridge, T. C., Singer, J., & Forward, A. (2003). How software engineers use documentation: The state of the practice. IEEE Software, 20(6), 35-39.
[38] Lientz, B. P. (1983). Issues in software maintenance. Computing Surveys, 15(3), 271-278.
[39] Lindvall, M., & Rus, I. (2003). Knowledge management for software organizations. In A. Aurum, R. Jeffery, C. Wohlin & M. Handzic (Eds.), Managing software engineering knowledge (pp. 73-94). Berlin: Springer.
[40] Lutters, W. G., & Seaman, C. B. (2007). Revealing actual documentation usage in software maintenance through war stories. Information and Software Technology, 49(6), 576-587.
[41] Maier, R., & Remus, U. (2002). Defining process-oriented knowledge management strategies. Knowledge and Process Management, 9(2), 103-118.
[42] Mayrhauser, A. v., & Vans, A. M. (1995). Program comprehension during software maintenance and evolution. IEEE Computer, 28(8), 44-55.
[43] Meehan, B., & Richardson, R. (2002). Identification of software process knowledge management. Software Process Improvement and Practice, 7, 47-55.
[44] Monk, A., & Howard, S. (1998). The rich picture: A tool for reasoning about work context. Interactions, 5(2), 21-30.
[45] Nissen, M. E. (2002). An extended model of knowledge-flow dynamics. Communications of the Association for Information Systems, 8, 251-266.
[46] Nissen, M. E., & Levitt, R. E. (2004). Agent-based modeling of knowledge flows: Illustration from the domain of information systems design. Paper presented at the Hawaii International Conference on System Science (HICSS 2004), Big Island, HI, USA.
[47] Nwana, H. S. (1996). Software agents: An overview. Knowledge Engineering Review, 11(3), 205-244.
[48] O'Leary, D. E. (1998). Using AI in knowledge management: Knowledge bases and ontologies. IEEE Intelligent Systems, 13(3), 34-39.
[49] Oktaba, H. (2005, October 19-20). MoProSoft: A software process model for small enterprises. Paper presented at the First Int. Research Workshop for Process Improvement in Small Settings, Pittsburgh, Pennsylvania.
[50] Oliveira, K. M., Anquetil, N., Dias, M. G., Ramal, M., & Meneses, R. (2003). Knowledge for software maintenance. Paper presented at the Fifteenth International Conference on Software Engineering and Knowledge Engineering (SEKE'03), San Francisco, CA.
[51] OMG. (2002). Software process engineering metamodel specification (SPEM). Retrieved October 29, 2004, from http://www.omg.org/technology/documents/formal/spem.htm
[52] Polo, M., Piattini, M., & Ruiz, F. (2003a). A methodology for software maintenance. In M. Polo, M. Piattini & F. Ruiz (Eds.), Advances in software maintenance management: Technologies and solutions (pp. 228-254). Hershey: Idea Group Inc.
[53] Polo, M., Piattini, M., & Ruiz, F. (Eds.). (2003b). Advances in software maintenance management: Technologies and solutions. Hershey, PA, USA: Idea Group Inc.
[54] Rao, M. (Ed.). (2005). Knowledge management tools and techniques: Practitioners and experts evaluate KM solutions. Amsterdam: Elsevier.
[55] Richardson, I., & von Wangenheim, C. G. (2007). Why are small software organizations different? IEEE Software, 24(1), 18-22.
[56] Robillard, P. N. (1999). The role of knowledge in software development. Communications of the ACM, 42(1), 87-92.
[57] Rodríguez-Elias, O. M., Martínez-García, A. I., Vizcaíno, A., Favela, J., & Piattini, M. (2007a). Identifying knowledge flows in communities of practice. In M. E. Jennex (Ed.), Knowledge management: Concepts, methodologies, tools, and applications (Vol. 2, pp. 841-849). Hershey, PA, USA: Idea Group Press.
[58] Rodríguez-Elias, O. M., Martínez-García, A. I., Vizcaíno, A., Favela, J., & Piattini, M. (2008). A framework to analyze information systems as knowledge flow facilitators. Information and Software Technology, 50(6), 481-498.
[59] Rodríguez-Elias, O. M., Martínez-García, A. I., Vizcaíno, A., Favela, J., & Piattini, M. (in press). Modeling and analysis of knowledge flows in software processes through the extension of the software process engineering metamodel. International Journal of Software Engineering and Knowledge Engineering.
[60] Rodríguez-Elias, O. M., Martínez-García, A. I., Vizcaíno, A., Favela, J., & Soto, J. P. (2007b, June 16). Knowledge flow analysis to identify knowledge needs for the design of knowledge management systems and strategies: A methodological approach. Paper presented at the 9th Intl. Conf. on Enterprise Information Systems (ICEIS 2007), Funchal, Madeira, Portugal.
[61] Rodríguez, O. M., Martínez, A. I., Favela, J., Vizcaíno, A., & Piattini, M. (2004a). Understanding and supporting knowledge flows in a community of software developers. Lecture Notes in Computer Science, 3198, 52-66.
[62] Rodríguez, O. M., Martínez, A. I., Vizcaíno, A., Favela, J., & Piattini, M. (2004b, 20-24 Sep.). Identifying knowledge management needs in software maintenance groups: A qualitative approach. Paper presented at the Fifth Mexican International Conference on Computer Science (ENC'2004), Colima, México.
[63] Ruiz, F., Vizcaíno, A., Piattini, M., & García, F. (2004). An ontology for the management of software maintenance projects. International Journal of Software Engineering and Knowledge Engineering, 14(3), 323-349.
[64] Rus, I., & Lindvall, M. (2002). Knowledge management in software engineering. IEEE Software, 19(3), 26-38.
[65] Schneider, K., von Hunnius, J.-P., & Basili, V. R. (2002). Experience in implementing a learning software organization. IEEE Software, 19(3), 46-49.
[66] Scholl, W., König, C., Meyer, B., & Heisig, P. (2004). The future of knowledge management: An international delphi study. Journal of Knowledge Management, 8(2), 19-35.
[67] Seaman, C. (2002, October). The information gathering strategies of software maintainers. Paper presented at the International Conference on Software Maintenance (ICSM'2002), Montreal, Canada.
[68] Singer, J. (1998, 16-19 November). Practices of software maintenance. Paper presented at the International Conference on Software Maintenance, Bethesda, Maryland, USA.
[69] Singer, J., Lethbridge, T. C., Vinson, N., & Anquetil, N. (1997). An examination of software engineering work practices. Paper presented at CASCON'97, Toronto, Ontario, Canada.
[70] Sparrow, J. (2001). Knowledge management in small firms. Knowledge and Process Management, 8(1), 3-16.
[71] Stewart, T. A. (2002). The case against knowledge management. Business 2.0, 3(February), 80.
[72] Strohmaier, M., & Tochtermann, K. (2005). B-KIDE: A framework and a tool for business process-oriented knowledge infrastructure development. Journal of Knowledge and Process Management, 12(3), 171-189.
[73] Tiwana, A. (2004). An empirical study of the effect of knowledge integration on software development performance. Information and Software Technology, 46(13), 899-906.
[74] Vizcaíno, A., Favela, J., Piattini, M., & García, F. (2003). Supporting software maintenance in web repositories through a multi-agent system. Paper presented at the International Atlantic Web Intelligence Conference (AWIC'2003).
[75] Ward, J., & Aurum, A. (2004). Knowledge management in software engineering - describing the process. Paper presented at the 15th Australian Software Engineering Conference (ASWEC 2004), Melbourne, Australia.
[76] Woitsch, R., & Karagiannis, D. (2003). Process-oriented knowledge management systems based on KM-services: The PROMOTE approach. International Journal of Intelligent Systems in Accounting, Finance & Management, 11, 253-267.
[77] Wong, K. Y. (2005). Critical success factors for implementing knowledge management in small and medium enterprises. Industrial Management & Data Systems, 105(3), 261-279.
[78] Zhuge, H. (2002). Knowledge flow management for distributed team software development. Knowledge-Based Systems, 15(8), 465-471.
In: Software Engineering and Development Editor: Enrique A. Belini, pp. 69-92
ISBN: 978-1-60692-146-3 © 2009 Nova Science Publishers, Inc.
Short Communication C
SOFTWARE PRODUCT LINE ENGINEERING: THE FUTURE RESEARCH DIRECTIONS
Faheem Ahmed1,a, Luiz Fernando Capretz2,b and Muhammad Ali Babar3,c

1 College of Information Technology, United Arab Emirates University, Al Ain, United Arab Emirates
2 Department of Electrical & Computer Engineering, Faculty of Engineering, University of Western Ontario, London, Ont., Canada
3 Lero, University of Limerick, Ireland
Abstract

The recent trend of switching from single software product development to lines of software products in the software industry has made the software product line concept a viable and widely accepted methodology for the future. Some of the potential benefits of this approach include cost reduction, improvement in quality and a decrease in product development time. Many organizations that deal in wide areas of operation, from consumer electronics, telecommunications, and avionics to information technology, are using software product line practice because it deals with effective utilization of software assets and provides numerous benefits. Software product line engineering is an inter-disciplinary concept. It spans the dimensions of business, architecture, process and organization. The business dimension of software product lines deals with managing a strong coordination between product line engineering and the business aspects of the product line. Software product line architecture is regarded as one of the crucial entities in software product lines; all the resulting products share this common architecture. Organizational theories, behavior and management play a critical role in the process of institutionalization of software product line engineering in an organization. The objective of this chapter is to discuss the state of the art of software product line engineering from the perspectives of business, architecture, organizational management and software engineering process. This work also highlights and discusses the future research directions in this area, thus providing an opportunity for researchers and practitioners to better understand future trends and requirements.

a E-mail address: [email protected]. b E-mail address: [email protected]. c E-mail address: [email protected].
1.1. Introduction

In today's digitized economy, organizations endeavor, to the best of their abilities, to capture a major portion of the market share in order to be profitable. Software organizations are continuously innovating and improving business operations such as technology, administration, and the product development process. One of their major concerns is the effective use of software assets, which considerably reduces the development time and cost of software products and helps them capture market segments. For many organizations that deal in wide areas of operation, from consumer electronics, telecommunications, and avionics to information technology, the future of software development is in software product lines. Software product lines are promising, with the potential to substantially increase the productivity of the software development process, and they are emerging as an attractive phenomenon within organizations that deal with software development. Software product lines involve assembling products from existing core assets, and then growing those core assets continuously as production proceeds. The software industry has shown a growing interest in the concept.

Clements et al. [1] report that software product line engineering is a growing software engineering sub-discipline, and many organizations, including Philips, Hewlett-Packard, Nokia, Raytheon, and Cummins, are using it to achieve extraordinary gains in productivity, time to market, and product quality. Clements [2] defines the term "software product line" as a set of software-intensive systems sharing a common, managed set of features that satisfy the specific needs of a particular market segment or mission, and that are developed from a common set of core assets in a prescribed way. Some other terminologies for "software product line" that have been widely used in Europe are "product families," "product population," and "system families". The concept of a software product line is a comprehensive model for an organization building applications based on common architectures and other core assets [3]. The Software Engineering Institute (SEI) proposes the Product Line Technical Probe (PLTP) [4], which is aimed at discovering an organization's ability to adopt and succeed with the software product line approach. The framework is divided into three essential activities: product development, core asset development, and management. van der Linden [5] pointed out that in 1995 the Architectural Reasoning for Embedded Systems (ARES) project began in Europe to provide architectural support for developing product families. In the overview of another European project, Engineering Software Architecture, Processes and Platforms for System-Families (ESAPS) [6], a system family is defined as a group of systems sharing a common, managed set of features that satisfy core needs of a scoped domain. The main objectives of system families are to reduce development efforts and to handle the impact of growing system complexity. Ommering [7] introduced another term, "product population", which refers to a collection of related systems based on similar technology but with many differences among them.

Northrop [8] stresses that fielding a product line involves core asset development and product development using core assets, under the aegis of technical as well as organizational management. The essential activities of the software product line process are thus core asset development, product development, and management. All three activities are linked and are highly iterative in nature. The links among the activities establish a communication path that provides feedback. There is no fixed order of execution of these activities; they can be performed in any order, and each gives feedback to the others, which is used to accommodate changes and modify the process. The software product line process can be described in terms of four simple concepts that interact with each other: software assets, a decision model for products, a production mechanism and process, and the output products that result from the software product line activity. Krueger [9] states that the software asset inputs are a collection of software assets that can be configured and assembled in different ways to create all of the products in a product line. The decision model for products elaborates the requirements of the products within a product line. The production mechanism and process defines the procedures for assembling and configuring products from the existing software assets. The software products are the collection of all products that can be produced from the product line. Associated processes are performed with these four basic concepts for the establishment of a software product line.

Core assets in a software product line may include architecture, reusable software components, domain models, requirement statements, documentation, schedules, budgets, test plans, test cases, process descriptions, modeling diagrams, and other relevant items used for product development. There is no specific definition of core asset inclusion, except that it is an entity used for development purposes. The goal of core asset development is to establish the production capability of developing products [2]. The major inputs to the core asset development activity are: product constraints, styles, patterns, frameworks, production constraints, production strategy, and the inventory of pre-existing assets. The outputs of core asset development are the software product line scope, the core assets and the production plan. The software product line scope describes the characteristics of the products to be developed. The production plan gives an in-depth picture of how products will be developed from core assets. Core assets are those entities that may be used in product development; their collection is termed the core asset repository, and its initial state depends upon the type of approach being used to adopt the software product line approach within an organization. In the product development activity, products are physically developed from the core assets, based on the production plan, in order to satisfy the requirements of the software product line.
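As a toy illustration of how these four concepts interact, the sketch below assembles products from a shared core asset repository according to a decision model. This is a minimal sketch: the feature names, asset paths and selection logic are invented for illustration and are not taken from the chapter or from any cited methodology.

```python
from typing import Dict, List

# Software asset inputs: reusable core assets keyed by feature name.
CORE_ASSETS: Dict[str, str] = {
    "ui_basic": "components/ui_basic.so",
    "ui_touch": "components/ui_touch.so",
    "net_stack": "components/net_stack.so",
    "diagnostics": "components/diagnostics.so",
}

# Decision model: the feature choices that define each product in the line.
DECISION_MODEL: Dict[str, List[str]] = {
    "entry_model": ["ui_basic", "net_stack"],
    "premium_model": ["ui_touch", "net_stack", "diagnostics"],
}

def produce(product: str) -> List[str]:
    """Production mechanism: configure and assemble a product from the
    core assets selected by the decision model."""
    return [CORE_ASSETS[feature] for feature in DECISION_MODEL[product]]

# Output products: everything the product line can produce.
for name in DECISION_MODEL:
    print(name, "->", produce(name))
```

In this picture, the iterative feedback loop described above would amount to adding new entries to CORE_ASSETS and DECISION_MODEL as the product line progresses.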
The essential inputs of the product development activity are the requirements, the product line scope, the core assets and the production plan. The requirements describe the purpose of the product line along with the functionalities and characteristics of the products to be developed. The product line scope describes the qualification criteria for a product to be included in or excluded from the software product line, based on functional and non-functional characteristics. The production plan describes a strategy to use the core assets to assemble products. A product line can produce any number of products depending upon the scope and requirements of the software product line. The product development activity iteratively communicates with the core asset activity and adds new core assets as products are produced and the software product line progresses.

Management plays a vital role in successfully institutionalizing the software product line within an organization, because it provides and coordinates the required infrastructure. The management activity involves essential processes carried out at technical and organizational levels to support the software product line process, and it ensures that the necessary resources are available and well coordinated. The objective of technical management is to oversee the core asset and product development activities by ensuring that the groups who build core assets and the groups who build products are engaged in the required activities, and are following the processes defined for the product line [4]. Technical management plays a critical role in decision-making about the scope of the software product line based on requirements, and it handles the associated processes of software development. Northrop [8] summarized the responsibilities of organizational management as: structuring the organization, resource management and scheduling, cost control, and communication. Organizational management provides a funding model for the software product line in order to handle the cost constraints associated with the project. It ensures a viable and accurate communication and operational path between the essential activities of software product line development, because the overall process is highly iterative in nature. The fundamental goal of organizational management is to establish an adoption plan, which completely describes a strategy to achieve the goals of the software product line within the organization. A major responsibility of management is to ensure proper training of people so that they become familiar with software product line concepts and principles. Management deals with external interfaces to keep the product line running smoothly and successfully, and performs market analysis of internal and external factors to determine the success factors of the software product line. Management also performs organizational and technical risk analysis and continues tracking critical risks throughout software product line development.

van der Linden [5] reports that the term "product family" or "system family" is used in Europe, whereas in the United States the term "software product line" is commonly used. As Europeans were working on product family engineering, researchers in the United States founded the SEI's product line initiative; the major reason for this is that, until 1996, the United States and European communities in this field worked independently. The objective of the software product line is to address the specific needs of a given business. Krueger [9] considers that the objective of a software product line is to reduce the overall engineering effort required to produce a collection of similar systems by capitalizing on the commonality among the systems and by formally managing the variation among them. A software product line provides an excellent opportunity to establish a production facility for software products based on a common architecture.
To capture various market segments, it provides a means for the reuse of assets, thus reducing development time and the cost of software products. The software product line increases the quality and reliability of successive products, thereby gaining the confidence of customers. According to van der Linden [5], whenever an organization wants to establish product family development it must keep a number of things under consideration. In Europe, the acronym BAPO [5] is very popular for defining the process components associated with software product lines. BAPO is considered critical because of its consideration of how products resulting from software product lines make a profit. Software engineering, business, management and organizational sciences provide foundations for the concept of software product line engineering, and thus it has become an inter-disciplinary concept.
1.2. Business of Software Product Line Engineering

Today, all businesses are experiencing greater competition, and customers' expectations continuously increase as technology advances at an unprecedented rate of growth. The rapid and continual changes common to the present business environment not only affect business itself but also have a profound impact on production. Software is perhaps the most crucial piece of a business entity in this modern marketplace, where important decisions need to be made immediately. Organizations that fail to respond appropriately do not survive long. The keys to success are in continuously monitoring customers and competitors and in making improvement plans based on observations and measurements. Business is perhaps the most crucial dimension of the software product line process, mainly due to the necessities of long-term strategic planning, initial investment, a longer payback period, and retention of market presence. The "Business" in BAPO is considered critical because it deals with the way the products resulting from software product lines make profits. Bayer et al. [10] at the Fraunhofer Institute for Experimental Software Engineering (IESE) developed a methodology called PuLSE (Product Line Software Engineering) for the purpose of enabling the conception and deployment of software product lines within a large variety of enterprise contexts. PuLSE-Eco, a part of the PuLSE methodology, deals with defining the scope of software product lines in terms of business factors. PuLSE-Eco identifies various activities which directly address the business needs of software product lines, such as: system information, stakeholder information, business objectives and benefit analysis. van der Linden et al. [11] identify some main factors in evaluating the business dimension of a software product line, such as: identity, vision, objectives and strategic planning. They classified the business maturity of software product lines into five levels, in ascending order: reactive, awareness, extrapolate, proactive and strategic. Clements and Northrop [4] highlight customer interface management, market analysis, funding, and business case engineering as important activities from the perspective of managing the business of a software product line. Kang et al. [12] present a marketing plan for software product lines that includes market analysis and marketing strategy. The market analysis covers needs analysis, user profiling, business opportunity, time to market and product pricing; the marketing strategy discusses product delivery methods. Toft et al. [13] propose the "Owen molecule model", consisting of the social, technology and business dimensions; the business dimension deals with setting up business goals and analyzing the commercial environment. Fritsch and Hahn [14] introduce Product Line Potential Analysis (PLPA), which aims at examining the product line potential of a business unit through discussions with managers of the business unit, because in their opinion they know the market requirements, product information and business goals of the organization. Schmid and Verlage [15] discuss a successful case study of setting up a software product line at Market Maker and highlight market and competitor analysis, a vision of the potential market segment, and products as significantly important activities. Ebert and Smouts [16] rate marketing as one of the major external success factors of the product line approach and further conclude that forecasting, ways to influence the market, and strong coordination between marketing and engineering activities are required for gaining benefits from the product line approach.

Strategic plans are the focus of an organization's endeavors to accomplish the desired level of achievement in a particular area. Strategic planning starts with elaborating strategic objectives. Niemelä [17] highlighted eight different strategies for adopting software product lines in an organization, among them: minimizing risk, extending market share, maximizing end-user satisfaction, balancing cost and potential, balancing cost, customer satisfaction and potential, and maximizing potential. Niemelä [17] further concluded that a company has to evaluate the current status of its business, architecture, process, and organizational issues before choosing one of these strategies in order to achieve the desired benefits. The software product line process needs resources, which must be delegated in strategic plans. Strategic planning must clearly outline what is to be developed from the software product line in order to gain competitive advantages and capture market segments to achieve strategic targets. Strategic plans are required to maintain organization-wide efforts to identify and exploit attractive long-range business opportunities by having the software product line in practice. The benefits of being first in the market have long been recognized in the business sector; pioneers often gain a sustainable competitive advantage over followers because, initially, they are the only solution-providers in a particular market segment. Thus, they usually capture a bigger portion of the market because they were first. It becomes very difficult for successors to gain a share of the market segment, especially in the case of software, where migration to other software is relatively uncommon. The timing for technology-based products entering the market is even more critical for the profitability and competitive position of an organization. The right product at the right time has a high potential for success. Order of market entry is perceived as a crucial business decision, with a long-lasting and profound impact on the performance of an organization in capturing and retaining the market. Appropriate timing in launching a software product into the market is even more essential for software development organizations. Timing is essential in launching a new product from the software product line in order to capture major shares of the market. The order of entry to the market depicts the delivery schedule for the software product family and provides guidelines to developers about development schedules.

Organizations consider brand name a crucial catalyst of business success. A brand is regarded as both a promise of quality to customers and a point of comparison with other products or services. Bennett [18] defined a brand as a name, term, sign, symbol, design, or any combination of these concepts that is used to identify the goods and services of a seller. Brand name products generally have high potential for increasing an organization's business. Branded products serve as an interface between customers and the organization, and loyalty to a brand is a kind of word-of-mouth advertisement from customers.
Brand name strategy has also been successfully adopted in software development. Many successful brands in software, such as Windows®, AutoCAD®, and MATLAB®, successfully retain a significant number of customers, thus capturing a major portion of the market segment. But currently there is a gap between software product line engineering and brand name strategy; many different products not originating from one software product line can be grouped under one marketed product line. Windows® is a working example of this scenario. Despite this fact, there are successful cases of using brand name strategy with the software product line concept. The product line of the Symbian operating system for mobile phones is an example of this scenario: a long range of products under this brand name is currently successfully installed in the handsets of Nokia, Sony Ericsson, Samsung, Panasonic and others. Jaasksi [19] presented a case study of developing a software product line of mobile browsers under the brand name "Nokia Mobile Browser" at Nokia, which is another example of the current use of brand name strategy in software product lines.

The concept of market orientation provides an advantage over competitors by identifying what customers want, and then offering products that are different and superior to those offered by competitors. Market orientation deals with the acquisition, sharing, interpretation, and use of information about customers and competitors, both of which have a significant impact on the performance of the business. Birk et al. [20] define market orientation in the context of software product lines as whether the organization targets a specific market segment without a specific customer in mind or addresses individual customer projects. The software product line deals with developing a considerable number of products to capture various market segments, thus providing justification for a product line. Market orientation provides imperative information about the concerns and requirements of customers, which needs to be accommodated in the successive products from a product line. PuLSE-Eco [21] illustrates various activities associated with market orientation for the successful adoption of the software product line concept in an organization; it considers that collecting and analyzing stakeholders' information is helpful in defining the product line scope. Business success is highly dependent on the extent to which customers are satisfied with an organization's products and services, as well as on how organizations win the loyalty of customers by improving relationship management. Relationship management plays a significant role in successful software product line development. Excellent working relationships with customers allow the developers to improve the performance and functionalities of successive products from the product line by better understanding the customers' requirements and learning about market trends from the end users.

The software product line can play a significant role in the business vision because it tends to produce long-term benefits for the organization. A clear statement of the business vision will guide practitioners of the software product line in establishing a production facility that meets the future goals of the organization. By including the software product line in the business vision, an organization can streamline its business operations in order to capitalize on its market audience for a profitable venture. Wijnstra [22] concluded that a complete business roadmap is needed to describe what is expected from the software product line in the years to come and how it will fit in the plan for the release of new products. The key to a successful business in today's competitive environment is innovation. Organizations are continuously adopting innovations in major areas of business operations, such as technology, administration, and production processes. Organizations with designs on capturing a major share of the market, in order to increase business, spend heavily on research and development.
Business objectives influence research and development efforts because the order of a product's entry into the market can make a significant difference in achieving strategic goals. Thus, research and development in technology, administration, processes, and products produce enduring results. The software product line is a relatively new concept, and a lot of research and development in process definition and development methodology is in progress. Research is occurring at various levels of industry and academia to improve the process and product development activity of the software product line for the successful industrialization of this valuable concept. Organizations are trying to institutionalize this concept in innovative ways to make the most effective use of it. Böckle [23] highlighted some measures of innovation management in software product line organizations, which include a planned innovation process and a clear definition of roles and responsibilities in the innovation management structure. Böckle [23] further stressed that the evolution of the product portfolio, platform, variability model, and reference architecture shall be planned with further innovations in mind.

The business of software product line engineering has a profound impact on the long-term planning and vision of the organization in the marketplace. The significance of the business factor in product line engineering requires a better understanding of various non-software-development factors which originate from business theory. This makes software product line engineering a multi-disciplinary paradigm which needs contributions from many experts in different areas of knowledge and expertise. Although business has always been highlighted as one of the critical success factors in product line engineering, it has been given the least attention by the product line engineering community in streamlining the concept and integrating it with software development efforts. Some of the leading areas of core research in software product line engineering and business factors are as follows:
• Development of a business case and a methodology to evaluate its significance in terms of cost and benefits for an organization.
• An organization-wide economic model for developing and managing software product line engineering, emphasizing return on investment.
• A methodology to develop a production plan for the resulting products and to allocate resources.
• The issues of translating the business requirements into product line requirements, which involves moving from the non-technical (business group) to the technical (architecture group).
• The role and impact of strategic planning of the organization in developing and managing a software product line, and how the organization can achieve its strategic goals using the product line approach.
• Decision planning and implementation in allocating and committing resources to achieve long-range business goals.
• Marketing plans to identify and exploit attractive long-range business opportunities.
• How market orientation provides imperative information about the concerns and requirements of customers, which needs to be accommodated in successive products from a product line.
• How customer orientation enables an organization to develop customer-centered products, and how this information assists in the domain- and application-engineering activities of software product line development to capture market segments.
• Evaluation of the appropriate timing to launch a software product into the market from the product line in order to maximize profit.
• The role of business vision in managing and developing a software product line.
• Knowledge management of customers, marketing and competitors.
• A methodology to evaluate the business performance of an organization dealing with software product line engineering.
• A methodology to evaluate the practice of various key business factors in the organization.
1.3. Institutionalization of Software Product Line Engineering

The "Organization" in BAPO is considered critical because it deals with the way the organization responds to, adopts and institutionalizes this concept. Institutionalization is the process by which a significantly new structure or practice is incorporated into a system of existing structures and practices [24]. Clements and Northrop [4] elaborate the institutionalization of a software product line in an organization from the perspectives of product development and core asset development. Institutionalizing a software product line from the perspective of the product development process treats product development as a routine and predictable activity in an organization, undertaken to achieve the product line goals of the organization. Clements and Northrop [4] emphasize that institutionalizing a software product line from the perspective of managing and developing a core asset repository involves improving the processes that are associated with building, maintaining, and evolving the core assets, and making those processes a part of standard organizational practice. In short, institutionalization of software product lines refers to the wide acceptance of the concept in the roots of the organization. It involves integrating or improving the processes within the organization that are associated with a product line infrastructure, and introducing those processes as part of the organizational character. The whole institutionalization process involves an organizational-level culture and strong commitments in acquiring the knowledge, skills and motivation to effectively initiate, launch and manage software product lines. Institutionalization of software product lines requires that the concept be entrenched at all levels of the organization, and that it be supported with the necessary infrastructure of organization-wide guidelines, required training, and required resources. Successful institutionalization of a software product line in an organization has a profound impact on the product development behavior of the organization: it changes the mindset of the organization from single-system development to a family of software products. Organizational theory focuses on the design and structures of the organization dealing in software product lines, while organizational behavior aims at understanding the behavior, attitude and performance of the people. The software product line requires enriching this concept within the roots of the overall organizational behavior. Organizational management plays a vital role in successfully institutionalizing a software product line within an organization because it provides and coordinates the infrastructure required. Initiating and launching a software product line within an organization is not, by itself, sufficient to gain the benefits of this approach; the alignment of organizational theory, organizational management, and organizational behavior is required in the process of institutionalizing a software product line. Thus, organizational factors play a key role in institutionalizing software product lines within an organization. The software product line is an inter-disciplinary concept, which has its roots in software engineering, business, management and organizational sciences.
An organization in the business of software product lines has to deal with multiple organizational factors, in addition to its software development efforts, in order to institutionalize the software product line, which in turn has the potential to yield the maximum benefits of this approach. The organizational dimension is perhaps the least addressed area in software product line research, because it is a relatively new concept in software engineering paradigms; most of the effort has been spent on the process, architecture and business aspects of the product line. Some scenarios of organizational structure for software product lines have nevertheless been presented.
Researchers generally highlight that, from an organizational structure standpoint, a domain engineering unit and several application engineering units are required. Bosch [25] presents four organizational models for software product lines: development department, business units, domain engineering units, and hierarchical domain engineering units. Bosch [25] also points out a number of factors that influence the organizational model, such as geographical distribution, project management maturity, organizational culture and the type of systems. Macala et al. [26] report that a software product line demands careful strategic planning, a mature development process, and the ability to overcome organizational resistance. Dikel et al. [27] share their experiences of initiating and maintaining software product lines at Nortel and discuss organizational, management and staffing issues grouped into a set of six organizational principles which they believe are critical to the long-term success of a software product line. Jacobsen et al. [28] focus on the roles and responsibilities of personnel within organizations dealing with software product lines. Mannion [29] elaborates that management issues, organizational structure, culture and learning need close attention in the context of successfully adopting software product line engineering. Koh and Kim [30] conclude that all members of an organization should experience and share their own success stories under the existing processes and organizational structure in order to successfully adopt the software product line approach. Clements and Northrop [4] discuss organizational issues of software product lines and identify four functional groups: the architecture group, the component-engineering group, the product line support group and the product development group. The organizational dimension of software product lines deals with the way the organization is able to handle complex relationships and many responsibilities [11]. Toft et al. [13] propose the “Owen molecule model”, consisting of three dimensions: organization, technology and business. The organizational dimension of the Owen molecule model deals with team hierarchy, individual roles, operational models, and individual interaction and communication. Introducing software product line practice to an organization significantly impacts the entire organization by fundamentally changing development practices, organizational structures, and task assignments [13]. Bayer et al. [32] at the Fraunhofer Institute for Experimental Software Engineering (IESE) developed a methodology called PuLSE (Product Line Software Engineering) to enable the conception and deployment of software product lines within a large variety of enterprise contexts. PuLSE-BC is a technical component of the PuLSE methodology; it deals with ways to baseline an organization and customize the PuLSE methodology to the specific needs of that organization. One of the support components of PuLSE addresses organizational issues, providing guidelines to set up and maintain the right organizational structure for developing and managing product lines. According to Birk et al. [31], introducing product line development to an organization can fundamentally change development practices, organizational structures, and task assignments. These changes can in turn impact team collaboration and work satisfaction.
Verlage and Kiesgen [33] report a case study of the successful adoption of a software product line and conclude that organizational structure and change management are significantly important areas of concern.

Organizational culture refers to the working environment in an organization. Some of the key process activities of software product line engineering, including domain engineering, software product line requirements engineering, commonality and variability management and business case engineering, require a great deal of team effort, group discussion and innovation. Studies of organizational culture highlight two types of culture: closed and
open. In closed organizational cultures, decisions are made at the higher levels and are directly dictated to the lower levels without considering the views and observations of most employees. In contrast, open organizational cultures make decisions on the basis of discussion and employee involvement. Software product line engineering requires a culture of openness, where employees have the chance to participate in discussions and have the power to express their views. For example, variability management is one of the critical process elements that require active involvement from various parts of the organization, such as the business unit and the development unit, to specify areas for expansion in the product line architecture and to introduce product-specific functionality. An organizational culture that supports teamwork, sharing of experiences, innovation and learning has relatively greater potential for institutionalizing software product line engineering. In particular, an organization with a culture that supports the reusability of software assets is more likely to succeed in moving from single product development to a systematic line of products.

Organizational commitment concerns the willingness of individuals and groups to achieve the long-term strategic objectives of an organization. The payback period of software product line engineering is relatively longer than that of the single product development approach. Consequently, this transitional period requires a strong commitment from individuals, groups and management to adopt the software product line engineering concept and to exercise patience with its development process. Organizational policies such as the business vision and strategic planning must highlight the concept of software product line engineering as a priority in order to reflect the organizational commitment. Furthermore, these policies must be well communicated to employees so that they understand the significance of this approach in achieving the organizational goals. Likewise, the success of any long-term strategy in an organization necessitates the commitment of its employees. Management has to create a positive working environment in order to increase the level of employee commitment. Such an environment can be achieved through well-defined job placement, a promotion strategy, an appreciation and reward system, job security and competitive compensation.

An organization is the planned coordination of the activities of a number of people for the achievement of some common, explicit purpose or goal, through a division of labor and function, and through a hierarchy of authority and responsibility [34]. Organizational theories provide guidelines for developing organizational structures in order to accomplish the goals of a company. Wilson and Rosenfeld [35] define organizational structure as the established pattern of relationships between the parts of an organization, outlining communication as well as control and authority. According to Gordon [36], organizational structure refers to the delineation of jobs and reporting relationships in an organization and coordinates the work behavior of employees in accomplishing the organization’s goals. The structure of an organization is generally not a static phenomenon, since organizations tend to change their structures under the circumstances of changing goals or technologies.
The rapid and continual changes common to the present technological environment necessitate that organizations adopt changes through a well-defined change management plan. Beckhard and Harris [37] consider organizational change as a movement from the present state of the organization to some future or target state. Furthermore, Todd [38] defines change management as a structured and systematic approach, which provides a conceptual framework that encompasses strategy, politics, people and process. Cao et al. [39] observe that organizational change shows the diversity of an organization, and it also illustrates the
integration of technical and human activities that have interrelated functions in the organization. The successful implementation of any process methodology ultimately depends on how people perceive the change. A certain degree of resistance is quite normal when a new technology is introduced to an organization. However, this resistance will disappear if people understand that the change is positive and is in their best interest as well as that of the organization. Effective change management therefore depends partly on how the strategy is communicated to the people responsible for its implementation.

When people interact with each other, the potential for conflict is present. This potential exists in different areas, as conflict can be either personal or task related. Walls and Callister [40] maintain that conflict is a process in which one party perceives that its interests are being opposed or negatively affected by another party. Conflict management consists of diagnostic processes, interpersonal styles, negotiating strategies, and other interventions that are designed to avoid unnecessary conflict and to reduce or resolve excessive conflict [41]. Hellriegel et al. [42] introduce four basic forms of conflict in an organization: goal, cognitive, affective, and procedural. Moreover, Jehn [43] distinguishes between two kinds of intra-group conflict: task conflict and relationship conflict. Task conflict is a perception of disagreement among group members or individuals regarding the content of their decisions; it involves differences in viewpoints, ideas and opinions. Relationship conflict is a perception of interpersonal incompatibility and includes annoyance and animosity among individuals [44].

In software product line engineering, organizational learning can be classified into two domains: external and internal. External learning involves the necessary knowledge about customers, competitors, external environments and market segments. This knowledge is necessary in order to effectively utilize the product line by exploiting product characteristics. Domain engineering, product line requirements engineering and business case engineering require that the organization has established procedures and means to acquire external learning. Overall, this type of learning helps an organization to capture a major market share. Internal learning, on the other hand, requires acquiring, transferring and sharing a software product line methodology, ideas for process improvement and an understanding of the cross-functional requirements of product lines among individuals, groups and the organization. Learning is a continuous process, especially for organizations that attempt to institutionalize software product lines. In particular, learning from experience and mistakes further facilitates improvement in the software product line engineering process. One of the major concerns of software development organizations is the effective utilization of software assets, which has the potential to considerably reduce the time and cost of developing software products. Software is perhaps the most crucial business asset in a modern marketplace where important decisions need to be made immediately. Studies in organizational behavior help in understanding how people, as individuals and groups, deal with managing and developing product line engineering in an organization.
The relatively long payback period of software product line engineering requires consistency in organizational behavior in order to achieve the strategic objectives of the organization. Establishing a software product line requires setting up the internal structure of the organization and other supporting mechanisms, such as coordination and communication. The concept of a software product line entails a structure of overlapping processes rather than fixed and static ones. The theoretical foundations of this concept divide the overall engineering process into two broad areas, application engineering and domain engineering, and involve
stronger coordination and communication between them. The identification and mapping of roles to the engineering processes requires interpretation and action from management. Verlage and Kiesgen [45] present a case study documenting the successful implementation of software product lines in their organization. They report that the roles, and the mapping of roles to processes, are not fixed; rather, they are interchangeable or, more precisely, dynamic. Organizations that have well-defined structures incorporating clearly identified roles of individuals, in addition to strong coordination and communication, are more likely to institutionalize a software product line than organizations whose structures do not support coordination and communication.

The process of evolving from single product development to a line of products is a significant change for an organization. During this process, almost every software development activity needs to be changed. For example, in the case of requirements engineering, an organization has to deal with product-specific requirements engineering as well as product line requirements engineering. Product-specific requirements engineering involves identifying the variability among products, whereas product line requirements engineering entails detecting the commonality among products (a minimal sketch of this commonality/variability split is given after the list below). Furthermore, there is a need to introduce trade-off analysis for commonality and variability management. Introducing a new practice such as a product line is relatively difficult in the existing setup of an organization, especially if it is not introduced with a proper change management plan. Even the best strategy is bound to fail if there is consistent resistance to innovation and new technology from within the organization. Organizations that communicate the importance of this change via clear guidelines and the establishment of a road map for their employees are more successful in institutionalizing software product lines. Although the organization has always been highlighted as one of the critical dimensions of product line engineering, it has received the least attention from the product line engineering community in terms of streamlining the concept and integrating it with software development efforts. Some of the leading areas of core research in software product line engineering along the organizational dimension are as follows:
- Conflict management planning to resolve and handle conflict in an organization adopting the product line approach.
- Organizational structures that define specific roles and responsibilities; many traditional organizational structures have been studied for application in a product line environment, but they need to be enhanced to accommodate the product line concept.
- Organizational learning procedures and guidelines for adopting the product line approach and switching from traditional single-system development to a line of products.
- Strategies to incorporate and monitor the organizational communication process.
- Change management plans and implementation procedures.
- Effective utilization of core assets and their management.
- Plans to start and maintain an infrastructure for product line development.
- Knowledge management in the organization for the effective use and dissemination of knowledge across organizational boundaries.
- Human resource management across the organization to provide the necessary resources for the product line infrastructure.
- Inter-group trust management to enhance the productivity of the product line process.
- A methodology to assess the organizational dimension of the software product line process and to define improvement plans.
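To make the commonality/variability split referred to above concrete, the following minimal sketch partitions per-product requirement sets into a common (product line level) part and variable (product level) parts. It is purely illustrative: the product and requirement names are invented and do not come from any product line tool.

```python
def partition_requirements(products):
    """Split requirements into a common set (shared by all products)
    and variable sets (specific to individual products)."""
    req_sets = [set(reqs) for reqs in products.values()]
    common = set.intersection(*req_sets)          # product line level
    variable = {name: set(reqs) - common          # individual product level
                for name, reqs in products.items()}
    return common, variable

# Hypothetical example products and their requirements
products = {
    "phone_basic": {"calls", "sms", "contacts"},
    "phone_pro":   {"calls", "sms", "contacts", "camera", "gps"},
}
common, variable = partition_requirements(products)
print(common)    # commonality, e.g. {'calls', 'sms', 'contacts'}
print(variable)  # variability, e.g. {'phone_basic': set(), 'phone_pro': {'camera', 'gps'}}
```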
1.4. Software Product Line Architecture

Software architecture has been a key area of concern in the software industry due to its profound impact on the productivity and quality of software products. This is even more crucial in the case of software product lines, which deal with the development of a line of products sharing a common architecture and having controlled variability. Software architecture has a history of evolution, and over the past decade the software industry has been observing and reporting refinements and advancements. The trend in software architecture for single product development has now turned into software product line architecture for a line of resulting products. Software architecture is the structure of the components of a program or system, their interrelationships, and the principles and guidelines governing their design and evolution [46]. In software product line architecture the concern is not the development of a single product; rather, the focus is on the development of multiple products that share the same architecture. Pronk [47] defines software product line architecture as the ultimate form of reuse, in which the same software is reused for an entire class of products with only minimal variations to support the diversity of individual product family members. According to Jazayeri et al. [48], software product line architecture defines the concepts, structure, and texture necessary to achieve variation in the features of variant products while achieving maximum sharing of parts in the implementation. Mika and Tommi [49] further elaborate that software product line architecture can be produced in three different ways: from scratch, from an existing product group, or from a single existing product. Software product line architecture is a powerful way to control the risks and take advantage of the opportunities of complex customer requirements, business constraints, and technology, but its success depends on more than technical excellence [50]. The software product line architecture captures the central design of all products and allows for the expression of the variability and commonalities of the product instances; the products are instantiated by configuring the architecture and customizing components in an asset library [51]. The “Architecture” in BAPO is considered critical because it deals with the technical means to build an architecture that is intended to be shared by a number of products from the same family. Van der Linden et al. [11] identify some main factors for evaluating the architecture dimension of a software product line, such as software product family architecture, product quality, reuse levels and software variability management, and classify the architecture maturity of software product lines into five levels in ascending order: independent product development, standardized infrastructure, software platform, variant products and self-configurable products. Birk et al. [31] conclude that explicit documentation of the software product line architecture, platform features, and generic interfaces is important for the product teams to understand the reusable assets.
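As an illustration of instantiating products by configuring a shared architecture and customizing components from an asset library, consider the following minimal sketch. All component and variation point names are invented for illustration and are not drawn from any published method.

```python
# Shared assets: each variation point maps variant names to components.
ASSET_LIBRARY = {
    "ui":      {"touch": "TouchUI", "keypad": "KeypadUI"},
    "storage": {"local": "LocalStore", "cloud": "CloudStore"},
}

COMMON_COMPONENTS = ["Kernel", "NetworkStack"]  # commonality: shared by all products

def instantiate_product(name, bindings):
    """Derive one product by binding every variation point to a variant."""
    components = list(COMMON_COMPONENTS)
    for point, variant in bindings.items():
        components.append(ASSET_LIBRARY[point][variant])
    return {"product": name, "components": components}

# Two products derived from the same product line architecture
print(instantiate_product("phone_basic", {"ui": "keypad", "storage": "local"}))
print(instantiate_product("phone_pro",   {"ui": "touch",  "storage": "cloud"}))
```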
The methodologies developed for software product line development, whether general or specific to a particular application domain, consider domain engineering an integral activity of the overall product line process, one that has a profound impact on building the architecture for the product line. Bayer et al. [52] at the Fraunhofer Institute for Experimental Software Engineering (IESE) developed the PuLSE (Product Line Software Engineering) methodology to enable the conception and deployment of software product lines within a large variety of enterprise contexts. PuLSE-DSSA is the part of the PuLSE methodology that deals with developing the reference architecture for a software product line. Knauber et al. [53] further elaborate that the basic idea of PuLSE-DSSA is to incrementally develop a reference architecture guided by generic scenarios that are applied in decreasing order of architectural significance. Researchers at Philips® have developed the Component-Oriented Platform Architecting (COPA) method [54] for the software product lines of electronics products. COPA assumes a strong correlation among facts, stakeholder expectations, any existing architecture and intuitions about possible architectures in developing a software product line architecture. Weiss and Lai [55] discuss the development of the Family-Oriented Abstraction, Specification and Translation (FAST) method for the software product line process and its successful use at Lucent Technologies®. The FAST method covers a full software product line engineering process with specific activities and targeted artifacts; it divides the overall software product line process into three major steps: domain qualification, domain engineering and application engineering. Researchers at IESE developed another methodology, called KobrA [56], which defines a software product line engineering process with activities and artifacts. In KobrA the process is divided into framework engineering and application engineering, each with its own sub-steps; these steps cover the implementation, release, inspection and testing aspects of the product line engineering process. Kang et al. [57] propose the Feature-Oriented Reuse Method (FORM), an extension of the Feature-Oriented Domain Analysis (FODA) method that covers the aspects of software product lines. FORM provides a methodology for using feature models in developing domain architectures and reusable components. Researchers at the VTT Technical Research Centre of Finland have developed the Quality-driven Architecture Design and Quality Analysis (QADA) method for developing and evaluating software architectures with an emphasis on product line architecture. Matinlassi [58] reports a comparison of the software product line architecture design methods COPA, FAST, FORM, KobrA and QADA, and concludes that these methods do not seem to compete with each other, because each of them has a special goal or ideology.

The concepts of commonality and variability management, which inherently belong to domain engineering, are gaining popularity over time due to their central role in the software product line concept. According to Coplien et al. [59], commonality and variability analysis gives software engineers a systematic way of thinking about and identifying the product family they are creating. Commonality management deals with the features and characteristics that are common across the products of a product line, whereas variability management is the other way round.
Variability management handles the way variable features and characteristics are managed in the different products of a product line. A software product line requires systematic approaches to handling commonality and variability, and the core of successful software product line management relies largely on effective commonality and variability management. Kang et al. [57] discuss the use of feature models
to manage commonality and variability in software product lines. Lam [60] presents a variability management process based on variability templates and a variability hierarchy. Thompson and Heimdahl [61] propose a set-based approach to structuring commonalities and variability in software product lines. Kim and Park [62] describe a goal- and scenario-driven approach for managing commonality and variability in software product lines. Van Ommering [63] observes that the commonalities are embodied in an overall architecture of the software product line, while the differences result in the specification of variation points; by filling those variation points, individual products can be derived. Other researchers [64][55][4] have stressed that the software architecture for a product family must address the variability and commonality of the entire set of products.

Requirements modeling has always been a key architectural concern in software development, because it provides a better understanding of the requirements on the architecture and allows the interconnection of the various sub-units to be visualized. Since the rise of object-oriented design, the Unified Modeling Language (UML) has become an industry standard, and many researchers have attempted to apply UML to the visual modeling of software product line architecture by proposing enhancements to its current state. Birk et al. [31] stress that an organization dealing with software product line architecture should describe the architecture using well-established notations such as UML, and that the architecture description should cover all relevant architectural views and use clearly defined semantics. Gomma and Shin [65] describe a multiple-view meta-modeling approach for software product lines using the UML notation, which defines the different aspects of a software product line, such as the use case model, static model, collaboration model, statechart model, and feature model. Zuo et al. [66] present the use of problem frames for product line engineering modeling and requirements analysis, and demonstrate some additional notation to support requirements management and variability issues in product line problem frames. Dobrica and Niemelä [67] discuss how standard UML concepts can be extended to address the challenges of variability management in software product line architecture and introduce some extensions to the UML standard specification for the explicit representation of variations and their locations in software product line architectures; this work is based on the previously mentioned QADA methodology. Eriksson et al. [68] describe a product line use case modeling approach named PLUSS (Product Line Use case modeling for Systems and Software engineering) and conclude that, in their industrial context, PLUSS performs better than modeling according to the styles and guidelines specified by the Rational Unified Process (RUP).

Software architecture evaluation techniques are generally divided into two groups: qualitative evaluation and quantitative evaluation. Qualitative techniques include scenarios, questionnaires, checklists, etc.; quantitative techniques cover simulations, prototypes, experiments, mathematical models, etc. Etxeberria and Sagardui [69] highlight the issues that can arise when evaluating product line architecture as opposed to single system architecture, including classifications of the attributes relevant to product line architecture evaluation and new evaluation techniques.
Graaf et al. [70] present a scenario-based software product line evaluation technique, which provides guidelines for adapting scenario-based assessment to the software product line context. Using a quantitative software architecture evaluation technique, van der Hoek et al. [71] put forward service utilization metrics to assess quality attributes of software product line architecture. Zhang et al. [72] study the impact of variants on quality attributes using a Bayesian Belief Network (BBN) and design a
methodology applicable to software product line architecture evaluation. Lange and Kang [73] propose a product line architecture prototyping approach using a network technique to assess issues related to software product line architecture evaluation. Gannod and Lutz [74] define an approach to evaluating the quality and functional requirements of software product line architecture. Niemelä et al. [75] discuss the basic issues of product family architecture development and present an evaluation model of software product families in an industrial setting.

Domain engineering has a pivotal role in the software product line process. The inception phase of a software product line starts with comprehensive domain engineering to define and narrow down the scope of the product line, which identifies the characteristics of the product line and the products that comprise it. Product line engineering organizes domain engineering into a set of three activities: domain analysis, domain design and domain implementation. Domain analysis concentrates on understanding the domain and providing a foundation for domain design, which is an early sketch of the architecture of the product line. Domain analysis not only defines the boundaries of the software product line scope but also helps in performing the commonality and variability analysis for the product line. Domain implementation further helps in developing the core architecture of the software product line by specifying components and their interconnections. The activities of domain engineering invariably help in carrying out commonality and variability analysis: domain engineering helps to define the common and variable parts of the software product line requirements, thus explicitly identifying the commonality and variability of the envisioned products. A software product line requires strong coordination between domain engineering and application engineering: domain engineering establishes an infrastructure for the software product line, and application engineering uses that infrastructure to develop products from the core assets.

Requirements modeling provides the facility to model requirements graphically so that they can easily be understood by the various stakeholders; it helps in understanding the requirements of the products and in further elaborating the functionalities and trade-offs. A software product line needs to elaborate requirements at two levels: the product line level and the individual product level. Product line level requirements capture the commonality among products, whereas individual product level requirements represent the variability. Modeling requirements in the context of software product line architecture helps in identifying and specifying the extension points, called variation points; it decomposes and specifies the architecture as a set of features with their dependencies. Requirements models translate the requirements of the targeted market segment and specify the implementation views of the business case. Much of the work on requirements modeling for software product lines has concentrated on extending currently available modeling techniques such as UML and feature diagrams. Product requirements in a software product line are composed of a constant part and a variable part. The constant part comes from the product line requirements and deals with features common to all the products belonging to a family.
The variable part represents those functionalities that can be changed to differentiate one product from another. This underlines the significance of commonality and variability management in software product lines. Commonality among the products of a software product line is an essential and integral characteristic of the product line approach that paves the way to maximizing reusability. The products share the common architecture and are developed from common core assets. Commonality management takes much of its input from domain engineering, and those inputs are further
elaborated and clearly specified using requirements modeling approaches. The extent of commonality among products is a design decision based on business case engineering and the targeted market segment. In order to maximize the reusability of software assets, it is generally recommended to have as much commonality as possible. Variability among the products of a software product line is necessary because it makes each of them a separate business entity. The products of a software product line may vary from each other in quality, reliability, functionality, performance and so on; but since they share a common architecture, the variation should not be so large that products fall outside the scope of a single product line. Those variations must be handled systematically to accommodate changes in the various versions of the products. The objective of variability management is to identify, specify and document the variability among products in the applications of the product line. Software product line architecture represents variability by specifying variation points, which can be exploited at the application engineering level by accommodating design decisions based on the business case. The variability in products usually results from internal and external factors: internal factors have their roots in refining the architecture, whereas external factors accommodate market and customer expectations. The introduction of variable features in a product from a software product line is a strategic decision based on the market segment. The introduction of variable features in the successive products of a product line also provides a justification for setting up the product line in the organization, because it helps in attracting new customers and retaining current ones. Fitting a component into a product without tailoring it is the easiest task, but sometimes certain changes must be made to a component to meet the requirements of a particular product. Every component present in the core assets must therefore clearly define the variability mechanism to be used to tailor it for reuse. The significance of commonality and variability management in software product line architecture, and the overall performance of the software product line, require tool support, which needs the attention of researchers.

Software artifact management plays a significant role in the development, maintenance and reuse of software. The software product line architecture is one of the critical artifacts of the software product line approach, and all the resulting products share this common architecture. The architectural artifacts provide in-depth knowledge about the various views, levels of abstraction, variation points, component identification, component behavior and component interconnections. It has been a general trend in the software industry to represent and document architecture using notations and languages such as architecture description languages (ADLs); software product lines currently lack an architecture description language capable of representing a software product line architecture in the large. Documentation such as domain analysis, domain design, domain testing and requirements modeling provides inputs to the software product line architecture. Configuration management of software product line artifacts is imperative, as a product line deals with a number of resulting products with different versions and releases, as well as a number of core assets with different versions.
The concept of configuration management currently used in the software industry deals with a single project or, more precisely, with a single product, whereas a software product line deals with a set of products. Therefore, a multi-dimensional approach to configuration management should be adopted to cope with this issue. Configuration management of software product lines is a research area where not much work has been done, and it requires the immediate attention of researchers.
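A minimal sketch of what such a multi-dimensional approach could look like is given below: baselines are keyed by product and release, and each baseline records the versions of the shared core assets it was built from. The class and all names are hypothetical, invented purely for illustration.

```python
from collections import defaultdict

class ProductLineCM:
    """Toy multi-dimensional configuration management store."""

    def __init__(self):
        # (product, release) -> {core_asset: asset_version}
        self.baselines = defaultdict(dict)

    def record(self, product, release, asset, version):
        self.baselines[(product, release)][asset] = version

    def products_using(self, asset, version):
        """Product releases affected by a change to this core asset version."""
        return [key for key, assets in self.baselines.items()
                if assets.get(asset) == version]

cm = ProductLineCM()
cm.record("phone_basic", "1.0", "NetworkStack", "2.1")
cm.record("phone_pro",   "1.0", "NetworkStack", "2.1")
cm.record("phone_pro",   "1.1", "NetworkStack", "2.2")
print(cm.products_using("NetworkStack", "2.1"))
# -> [('phone_basic', '1.0'), ('phone_pro', '1.0')]
```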
Quality is a major issue for a family of products. As with a single product, software quality is fundamental to the success of a family of products. The core and product architectures of a family of products are expected to help achieve the required quality attributes. However, one of the key challenges in designing software architectures for the core and for individual products with respect to the desired levels of different quality attributes is that quality attributes have been found very hard to define, describe and understand; this aspect is open to strongly subjective interpretation. That is why it is vital to systematically elicit and precisely define the quality aspects of a family of products in order to help design appropriate architectures. There are a number of classifications of quality attributes. McCall listed a number of classifications of quality attributes developed by software engineering researchers, including himself [76]; a later classification of software quality is provided in [77]. However, none of them has been proven sufficient to define, specify, and model the different levels of quality attributes required in the different products of a family. There is a vital need to develop appropriate approaches to eliciting, specifying, and modeling the quality attributes to be supported by the software architectures of a family of products.

Designing and evaluating the software architectures of a family of systems involves complex and knowledge-intensive tasks. The complexity lies in the fact that trade-offs need to be made to satisfy the current and future requirements of a potentially large set of stakeholders, who may have competing vested interests in architectural decisions. The knowledge required to make suitable architectural choices is broad, complex, and evolving, and can be beyond the capabilities of any single architect [78]. Due to the recognition of the importance and far-reaching influence of architectural decisions, several approaches have been developed to support architecting processes. Examples are the Generic Model for architecture design [79], Attribute-Driven Design [80], the Architecture Tradeoff Analysis Method (ATAM) [81], 4+1 views [82], the Rational Unified Process (RUP) [83] and architecture-based development [84]. While these approaches help to manage complexity by using systematic approaches to reason about various design decisions, they provide very little guidance or support for capturing and maintaining the details on which design decisions are based, along with explanations of the use of certain types of design constructs (such as patterns, styles, or tactics). Such information represents architecture knowledge, which can be valuable throughout the software development lifecycle [85]. We assert that the lack of a systematic approach to capturing and sharing architectural knowledge may preclude organizations from growing their architecture capability and reusing architectural assets. Moreover, the knowledge concerning the domain analysis, the architectural patterns used, the design alternatives evaluated and the design decisions made is implicitly embedded in the architecture and/or becomes tacit knowledge of the architect [86]. Hence, one of the key challenges in successfully developing and evolving software architectures is the provision of a suitable infrastructure for capturing, maintaining, and sharing architectural knowledge and the rationale underpinning key architectural design decisions.
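One lightweight way to capture such knowledge, sketched below under the assumption that a simple structured record is sufficient, is to store each architectural decision together with its rationale and the alternatives considered. The fields and example values are invented for illustration and do not represent any specific knowledge management tool.

```python
from dataclasses import dataclass, field

@dataclass
class DesignDecision:
    issue: str                 # the design problem being addressed
    decision: str              # the chosen construct (pattern, style, tactic)
    rationale: str             # why it was chosen
    alternatives: list = field(default_factory=list)   # options evaluated
    quality_attributes: list = field(default_factory=list)

knowledge_repository = [
    DesignDecision(
        issue="Persistence varies across products of the family",
        decision="Plug-in storage interface bound at product derivation time",
        rationale="Keeps the core architecture common while isolating variability",
        alternatives=["Runtime configuration", "Per-product code forks"],
        quality_attributes=["modifiability", "reusability"],
    ),
]
print(knowledge_repository[0].rationale)
```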
Apart from the challenge of devising optimal architectural solutions, specifying the architecture and interfaces of a component-based family of systems is a difficult task, which poses several kinds of challenges. Consider, for example, industries heavily dependent upon component-based software engineering, such as the automotive industry. Usually, OEMs
(Original Equipment Manufacturers) have to provide an overall architecture of the automotive systems in their cars and distribute it to the potential suppliers of systems and components who do the implementation. The AUTOSAR standard is a move to establish an open standard for automotive embedded electronic architecture; AUTOSAR tries to achieve modularity, scalability, transferability and reusability of functions. However, even if the architecture and components are specified using AUTOSAR, there is still no conformance checking or conformance validation. We assert that there is a need for specific methods and tools to validate that the implementations actually conform to the specifications and that the combination of the various implementations conforms to the OEMs’ specifications. Architecture and interface specification is another big challenge in software product line engineering in general, and in software product line engineering for automotive systems in particular. There is a general lack of suitable and reliable methods to accurately and sufficiently provide interface specifications. This is also one of the key research challenges in the context of the increasing trend towards global software development.
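The following toy sketch illustrates the kind of conformance check meant here: verifying that a supplied component offers every operation promised by its interface specification. It bears no relation to actual AUTOSAR tooling; the interface, component and operation names are all hypothetical.

```python
# Hypothetical OEM-provided interface specification
INTERFACE_SPEC = {
    "BrakeController": {"apply_brake", "release_brake", "self_test"},
}

class SupplierBrakeController:
    """Hypothetical supplier implementation (self_test is missing)."""
    def apply_brake(self, force): ...
    def release_brake(self): ...

def missing_operations(impl_cls, interface):
    """Operations required by the spec but not provided by the implementation."""
    required = INTERFACE_SPEC[interface]
    provided = {m for m in dir(impl_cls) if not m.startswith("_")}
    return required - provided   # empty set means the component conforms

print(missing_operations(SupplierBrakeController, "BrakeController"))
# -> {'self_test'}: the implementation does not conform
```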
References

[1] P.C. Clements, L.G. Jones, L.M. Northrop and J.D. McGregor, Project management in a software product line organization, IEEE Software 22 (5) (2005) 54-62.
[2] P.C. Clements, On the importance of product line scope, in: Proceedings of the 4th International Workshop on Software Product Family Engineering, 2001, pp. 69-77.
[3] T. Wappler, Remember the basics: key success factors for launching and institutionalizing a software product line, in: Proceedings of the 1st Software Product Line Conference, 2000, pp. 73-84.
[4] P.C. Clements and L.M. Northrop, Software product lines: practices and patterns, Addison-Wesley, 2002.
[5] F. van der Linden, Software product families in Europe: the Esaps & Café projects, IEEE Software 19 (4) (2002) 41-49.
[6] ESAPS Project (1996), available from: http://www.esi.es/en/Projects/esaps/overview.html
[7] R. van Ommering, Beyond product families: building a product population, in: Proceedings of the Conference on Software Architectures for Product Families, 2000, pp. 187-198.
[8] L.M. Northrop, SEI's software product line tenets, IEEE Software 19 (4) (2002) 32-40.
[9] C.W. Krueger (2004), Basic software product line concepts, available from: http://www.softwareproductlines.com/introduction/concepts.html
[10] J. Bayer, O. Flege, P. Knauber, R. Laqua, D. Muthig, K. Schmid, T. Widen and J.M. DeBaud, PuLSE: a methodology to develop software product lines, in: Proceedings of the 5th ACM SIGSOFT Symposium on Software Reusability, 1999, pp. 122-131.
[11] F. van der Linden, J. Bosch, E. Kamsties, K. Känsälä and H. Obbink, Software product family evaluation, in: Proceedings of the 3rd International Conference on Software Product Lines, 2004, pp. 110-129.
[12] K.C. Kang, P. Donohoe, E. Koh, J. Lee and K. Lee, Using a marketing and product plan as a key driver for product line asset development, in: Proceedings of the 2nd International Conference on Software Product Lines, 2002, pp. 366-382.
[13] P. Toft, D. Coleman and J. Ohta, A cooperative model for cross-divisional product development for a software product line, in: Proceedings of the 1st International Conference on Software Product Lines, 2000, pp. 111-132.
[14] C. Fritsch and R. Hahn, Product line potential analysis, in: Proceedings of the 3rd International Conference on Software Product Lines, 2004, pp. 228-237.
[15] K. Schmid and M. Verlage, The economic impact of product line adoption and evolution, IEEE Software 19 (4) (2002) 50-57.
[16] C. Ebert and M. Smouts, Tricks and traps of initiating a product line concept in existing products, in: Proceedings of the 25th International Conference on Software Engineering, 2003, pp. 520-525.
[17] E. Niemelä, Strategies of product family architecture development, in: Proceedings of the 9th International Conference on Software Product Lines, 2005, pp. 186-197.
[18] P.D. Bennett, Dictionary of marketing terms, American Marketing Association, 1988.
[19] A. Jaaksi, Developing mobile browsers in a product line, IEEE Software 19 (4) (2002) 73-80.
[20] A. Birk, G. Heller, I. John, K. Schmid, T. von der Massen and K. Müller, Product line engineering, the state of the practice, IEEE Software 20 (6) (2003) 52-60.
[21] P. Knauber, D. Muthig, K. Schmid and T. Widen, Applying product line concepts in small and medium-sized companies, IEEE Software 17 (5) (2000) 88-95.
[22] J.G. Wijnstra, Critical factors for a successful platform-based product family approach, in: Proceedings of the 2nd International Conference on Software Product Lines, 2002, pp. 68-89.
[23] G. Böckle, Innovation management for product line engineering organizations, in: Proceedings of the 9th International Conference on Software Product Lines, 2005, pp. 124-134.
[24] W.R. Scott, Institutions and organizations, Sage Publications, CA, 1995.
[25] J. Bosch, Software product lines: organizational alternatives, in: Proceedings of the 23rd International Conference on Software Engineering, 2001, pp. 91-100.
[26] R.R. Macala, L.D. Stuckey Jr. and D.C. Gross, Managing domain-specific, product-line development, IEEE Software 13 (3) (1996) 57-67.
[27] D. Dikel, D. Kane, S. Ornburn, W. Loftus and J. Wilson, Applying software product-line architecture, IEEE Computer 30 (8) (1997) 49-55.
[28] I. Jacobsen, M. Griss and P. Jonsson, Software reuse: architecture, process and organization for business success, Addison-Wesley, 1997.
[29] M. Mannion, Organizing for software product line engineering, in: Proceedings of the 10th International Workshop on Software Technology and Engineering Practice, 2002, pp. 55-61.
[30] E. Koh and S. Kim, Issues on adopting software product line, in: Proceedings of the 11th Asia-Pacific Conference on Software Engineering, 2004, p. 589.
[31] A. Birk, G. Heller, I. John, K. Schmid, T. von der Massen and K. Müller, Product line engineering, the state of the practice, IEEE Software 20 (6) (2003) 52-60.
[32] J. Bayer, O. Flege, P. Knauber, R. Laqua, D. Muthig, K. Schmid, T. Widen and J.M. DeBaud, PuLSE: a methodology to develop software product lines, in: Proceedings of the 5th ACM SIGSOFT Symposium on Software Reusability, 1999, pp. 122-131.
[33] M. Verlage and T. Kiesgen, Five years of product line engineering in a small company, in: Proceedings of the 27th International Conference on Software Engineering, 2005, pp. 534-543.
[34] E.H. Schein, Organizational psychology, Prentice Hall, 1988.
[35] D.C. Wilson and R.H. Rosenfeld, Managing organizations, McGraw-Hill, 1990.
[36] J.R. Gordon, Organizational behavior: a diagnostic approach, Prentice Hall, New Jersey, 2002.
[37] R. Beckhard and R.T. Harris, Organizational transitions: managing complex change, Addison-Wesley, 1987.
[38] A. Todd, Managing radical change, Long Range Planning 32 (2) (1999) 237-244.
[39] G. Cao, S. Clarke and B. Lehaney, A systematic view of organizational change and TQM, The TQM Magazine 12 (3) (2000) 186-193.
[40] J.A. Walls and R.R. Callister, Conflict and its management, Journal of Management 21 (3) (1995) 515-558.
[41] J. Kottler, Beyond blame: a new way of resolving conflicts in relationships, Jossey-Bass, San Francisco, 1994.
[42] D. Hellriegel, J.W. Slocum Jr., R.W. Woodman and N.S. Bruning, Organizational behavior, ITP Nelson, Canada, 1998.
[43] K.A. Jehn, A multi-method examination of the benefits and detriments of intra-group conflict, Administrative Science Quarterly 40 (1995) 256-282.
[44] F.J. Medina, L. Munduate, M.A. Dorado and I. Martínez, Types of intra-group conflict and affective reactions, Journal of Managerial Psychology 20 (3/4) (2005) 219-230.
[45] M. Verlage and T. Kiesgen, Five years of product line engineering in a small company, in: Proceedings of the 27th International Conference on Software Engineering, 2005, pp. 534-543.
[46] D. Garlan and D. Perry, Introduction to the special issue on software architecture, IEEE Transactions on Software Engineering 21 (4) (1995) 269-274.
[47] B.J. Pronk, An interface-based platform approach, in: Proceedings of the 1st Software Product Lines Conference, 2000, pp. 331-352.
[48] M. Jazayeri, A. Ran and F. van der Linden, Software architecture for product families: principles and practice, Addison-Wesley, 2000.
[49] K. Mika and M. Tommi, Assessing systems adaptability to a product family, Journal of Systems Architecture 50 (2004) 383-392.
[50] D. Dikel, D. Kane, S. Ornburn, W. Loftus and J. Wilson, Applying software product-line architecture, IEEE Computer 30 (8) (1997) 49-55.
[51] M. Verlage and T. Kiesgen, Five years of product line engineering in a small company, in: Proceedings of the 27th International Conference on Software Engineering, 2005, pp. 534-543.
[52] J. Bayer, O. Flege, P. Knauber, R. Laqua, D. Muthig, K. Schmid, T. Widen and J.M. DeBaud, PuLSE: a methodology to develop software product lines, in: Proceedings of the 5th ACM SIGSOFT Symposium on Software Reusability, 1999, pp. 122-131.
[53] P. Knauber, D. Muthig, K. Schmid and T. Widen, Applying product line concepts in small and medium-sized companies, IEEE Software 17 (5) (2000) 88-95.
[54] P. America, H. Obbink, R. van Ommering and F. van der Linden, COPA: a component-oriented platform architecting method family for product family engineering, in: Proceedings of the 1st Software Product Line Engineering Conference, 2000, pp. 167-180.
[55] D.M. Weiss and C.T.R. Lai, Software product line engineering: a family-based software development process, Addison-Wesley, 1999.
[56] C. Atkinson, J. Bayer and D. Muthig, Component-based product line development: the KobrA approach, in: Proceedings of the 1st Software Product Lines Conference, 2000, pp. 289-309.
[57] K.C. Kang, S. Kim, J. Lee, K. Kim, E. Shin and M. Huh, FORM: a feature-oriented reuse method with domain-specific reference architectures, Annals of Software Engineering 5 (1998) 143-168.
[58] M. Matinlassi, Comparison of software product line architecture design methods: COPA, FAST, FORM, KobrA and QADA, in: Proceedings of the 26th International Conference on Software Engineering, 2004, pp. 127-136.
[59] J. Coplien, D. Hoffman and D. Weiss, Commonality and variability in software engineering, IEEE Software 15 (6) (1998) 37-45.
[60] W. Lam, Creating reusable architectures: an experience report, ACM Software Engineering Notes 22 (4) (1997) 39-43.
[61] J.M. Thompson and M.P.E. Heimdahl, Structuring product family requirements for n-dimensional and hierarchical product lines, Requirements Engineering Journal 8 (1) (2003) 42-54.
[62] M. Kim and S. Park, Goal and scenario driven product line development, in: Proceedings of the 11th Asia-Pacific Conference on Software Engineering, 2004, pp. 584-585.
[63] R. van Ommering, Software reuse in product populations, IEEE Transactions on Software Engineering 31 (7) (2005) 537-550.
[64] R.R. Macala, L.D. Stuckey Jr. and D.C. Gross, Managing domain-specific, product-line development, IEEE Software 13 (3) (1996) 57-67.
[65] H. Gomma and M.E. Shin, Multiple-view meta modeling of software product lines, in: Proceedings of the 8th IEEE International Conference on Engineering of Complex Computer Systems, 2002, pp. 238-246.
[66] H. Zuo, M. Mannion, D. Sellier and R. Foley, An extension of problem frame notation for software product lines, in: Proceedings of the 12th Asia-Pacific Conference on Software Engineering, 2005, pp. 499-505.
[67] L. Dobrica and E. Niemelä, UML notation extensions for product line architectures modeling, in: Proceedings of the 5th Australasian Workshop on Software and System Architectures, 2004, pp. 44-51.
[68] M. Eriksson, J. Börstler and K. Borg, The PLUSS approach - domain modeling with features, use cases and use case realizations, in: Proceedings of the 9th International Conference on Software Product Lines, 2005, pp. 33-44.
[69] L. Etxeberria and G. Sagardui, Product line architecture: new issues for evaluation, in: Proceedings of the 9th International Conference on Software Product Lines, 2005, pp. 174-185.
[70] B. Graaf, H. van Dijk and A. van Deursen, Evaluating an embedded software reference architecture - industrial experience report, in: Proceedings of the 9th European Conference on Software Maintenance and Reengineering, 2005, pp. 354-363.
[71] A. van der Hoek, E. Dincel and N. Medvidovic, Using service utilization metrics to assess the structure of product line architectures, in: Proceedings of the 9th International Software Metrics Symposium, 2003, pp. 298-308.
[72] H. Zhang, S. Jarzabek and B. Yang, Quality prediction and assessment for product lines, in: Proceedings of the 15th International Conference on Advanced Information Systems Engineering, 2003, pp. 681-695.
[73] F. De Lange and J. Kang, Architecture true prototyping of product lines, in: Proceedings of the 5th International Workshop on Software Product Family Engineering, 2004, pp. 445-453.
[74] G.C. Gannod and R.R. Lutz, An approach to architectural analysis of product lines, in: Proceedings of the 22nd International Conference on Software Engineering, 2000, pp. 548-557.
[75] E. Niemelä, M. Matinlassi and A. Taulavuori, Practical evaluation of software product family architectures, in: Proceedings of the 3rd International Conference on Software Product Lines, 2004, pp. 130-145.
[76] J.A. McCall, Quality factors, in: J.J. Marciniak (Ed.), Encyclopedia of Software Engineering, John Wiley, New York, 1994, pp. 958-971.
[77] ISO/IEC, Information technology - Software product quality: quality model, ISO/IEC FDIS 9126-1:2000(E).
[78] M. Ali-Babar and I. Gorton, A tool for managing software architecture knowledge, in: Proceedings of the 2nd Workshop on Sharing and Reusing Architectural Knowledge - Architecture, Rationale, and Design Intent (SHARK/ADI 2007), collocated with ICSE 2007, 2007.
[79] C. Hofmeister et al., A general model of software architecture design derived from five industrial approaches, in: Proceedings of the 5th Working IEEE/IFIP Conference on Software Architecture (WICSA 05), Pittsburgh, PA, USA, 2005.
[80] L. Bass, M. Klein and F. Bachmann, Quality attribute design primitives and the attribute driven design method, in: Proceedings of the 4th International Workshop on Product Family Engineering, 2001.
[81] P. Clements, R. Kazman and M. Klein, Evaluating software architectures: methods and case studies, Addison-Wesley, 2002.
[82] P. Kruchten, The 4+1 view model of architecture, IEEE Software 12 (6) (1995) 42-50.
[83] P. Kruchten, The Rational Unified Process: an introduction, 2nd ed., Addison-Wesley, 2000.
[84] L. Bass and R. Kazman, Architecture-based development, Tech Report CMU/SEI-99-TR-007, Software Engineering Institute (SEI), Carnegie Mellon University, Pittsburgh, USA, 1999.
[85] M. Ali-Babar, I. Gorton and B. Kitchenham, A framework for supporting architecture knowledge and rationale management, in: A.H. Dutoit et al. (Eds.), Rationale Management in Software Engineering, Springer, 2006, pp. 237-254.
[86] M. Ali-Babar, I. Gorton and R. Jeffery, Capturing and using software architecture knowledge for architecture-based software development, in: Proceedings of the 5th International Conference on Quality Software, 2005.
In: Software Engineering and Development Editor: Enrique A. Belini, pp. 93-123
ISBN: 978-1-60692-146-3 © 2009 Nova Science Publishers, Inc.
Short Communication D
SOFTWARE DEVELOPMENT FOR INVERSE DETERMINATION OF CONSTITUTIVE MODEL PARAMETERS

A. Andrade-Campos1,a, P. Pilvin2,b, J. Simões1,3,c and F. Teixeira-Dias1,d

1 Departamento de Engenharia Mecânica, Universidade de Aveiro, Campus Universitário de Santiago, Portugal
2 Laboratoire d'Ingénierie des Matériaux de Bretagne, Université de Bretagne-Sud, Rue de Saint-Maudé, France
3 Escola Superior de Artes e Design, Avenida Calouste Gulbenkian, Matosinhos, Portugal
a E-mail address: [email protected]. b E-mail address: [email protected]. c E-mail address: josesimoes@ua. d E-mail address: [email protected].

Abstract

Computer simulation software using finite element analysis (FEA) has, nowadays, reached reasonable maturity. FEA software is used in such diverse fields as structural engineering, sheet metal forming, the mould industry, biomechanics, fluid dynamics, etc. This type of engineering software uses an increasingly large number of sophisticated geometrical and material models. The quality of the results relies on the input data, which are not always readily available. The aim of inverse problem software, which will be considered here, is to determine one or more of the input data of FEA numerical simulations. The development of numerical methodologies for the inverse determination of material constitutive model parameters will be addressed in this chapter. Inverse problems for parameter identification involve estimating the parameters of material constitutive models so as to obtain more accurate results with respect to physical experiments, i.e. minimizing the difference between experimental results and simulations subject to a limited number of physical constraints. These problems can involve both hyperelastic and hypoelastic material constitutive models. The complexity of the process with which material parameters are
evaluated increases with the complexity of the material model itself. In order to determine the best-suited material parameter set in the least computationally expensive way, different approaches and different optimization methods can be used. The most widespread optimization methods are the gradient-based methods; the genetic, evolutionary and nature-inspired algorithms; the immune algorithms; and the methods based on neural networks and artificial intelligence. By far, the best-performing methods are gradient-based, but their performance is known to be highly dependent on the starting set of parameters and their results are often inconsistent. Nature-inspired techniques provide a better way to determine an optimized set of parameters (the overall minimum); therefore, the difficulty associated with choosing a starting set of parameters for this process is minor. However, they have proved to be computationally more expensive than gradient-based methods. Optimization methods present advantages and disadvantages and their performance is highly dependent on the constitutive model itself. There is no unique algorithm robust enough to deal with every possible situation, but the use of sequential multiple methods can lead to the global optimum. The aim of this strategy is to take advantage of the strengths of each selected algorithm. This strategy, using gradient-based methods and evolutionary algorithms, is demonstrated for an elastic-plastic model with non-linear hardening, for seven distinct hyperelastic models (Humphrey, Martins, Mooney-Rivlin, Neo-Hookean, Ogden, Veronda-Westmann and Yeoh) and for one thermoelastic-viscoplastic hypoelastic model. The performance of the described strategy is also evaluated through an analytical approach.
Keywords: Engineering software, inverse problems, material parameter identification, optimization, gradient-based, evolutionary algorithms.
Introduction

Presently, numerical simulation software assumes an important role in the design and technological development of new materials and structures. The industrial and scientific communities are already familiar with the use of simulation software and are becoming more demanding about the precision of the results. For this reason, more complex techniques have been developed to simulate, with increasing accuracy, the behaviour of different materials. Consequently, different constitutive behaviour models have also been presented to characterize materials in a wider field of applications. However, many of these constitutive behaviour models demand the determination of a large number of parameters adjusted to the material whose behaviour is to be simulated.

The problem of parameter identification for mathematical models, created to describe with accuracy the behaviour of physical systems, is a common problem in science and engineering. The complexity of the models (and the number of material parameters) increases with the complexity of the physical system. The determination of parameters should always be performed by confronting mathematical and experimental results. However, as the number of experimental tests and parameters increases, it becomes impracticable to identify the parameters in an exploratory way [1][2]. In these cases, it is necessary to solve the problem using inverse formulations. This approach often leads to the resolution of non-linear optimization problems [3]. Different approaches and different optimization methods can be used to solve non-linear optimization problems with lower computational costs and to efficiently determine the best-suited material parameter set. The most common optimization methods are the gradient-based methods, followed by the genetic, evolutionary and nature-inspired algorithms. The immune
algorithms and the methods based on neural networks and artificial intelligence are also often used. Without doubt, the best-performing methods are gradient-based, but their performance is known to be dependent on the starting set of parameters. Nature-inspired techniques provide a different way to determine an optimized set of parameters; therefore, the difficulty of choosing a starting set of parameters for this process is minor. However, these have proved to be computationally more expensive than the gradient-based methods. Optimization methods have advantages and disadvantages and their performance is highly dependent on the constitutive model itself. There is no single algorithm robust enough to deal with every possible situation, but the use of multiple methods can lead to the global optimum.

The robustness of an optimization algorithm can be improved by introducing the cascade methodology, as considered in [4][5] and also adopted in [6][7]. The cascade methodology tries to solve a problem sequentially using a number of autonomous optimization stages, which makes it a very versatile and robust algorithm. Its robustness and efficiency are expected to be superior to those of the optimization methods used individually in each stage.

The mathematical equations that characterize the behaviour of each material are designated as constitutive models. They describe the macroscopic behaviour that results from the internal constitution of the material. Three classes of behaviour can be distinguished, which can be combined in many ways [8]: (i) elastic – the material recovers its deformation when the loads are removed; if this relation is linear and homogeneous, the elasticity is called linear; (ii) plastic – characterized by permanent deformations, which remain even after the loads are removed, provided the loads are high enough to exceed some threshold; if the deformation before the plastic threshold is negligible, the behaviour is called rigid-perfectly plastic; (iii) viscous – the response of the material depends on the speed with which the loads are applied. The constitutive model equations of solid materials such as, for example, metals try to represent these physical behaviours. Nonlinear constitutive models can be divided into three types:
Hookean model – intended to extend the linear elastic model to situations where stretches remain small but rotations may be large.

Hypoelastic model – intended to model nonlinear stress-strain behaviour but restricted to infinitesimal strains. Its main application is as an approximate theory of plasticity, and it is therefore used when elastic strains remain very small.

Hyperelastic model – intended to model materials that respond elastically up to very large strains, accounting both for nonlinear material behaviour and for nonlinear kinematics. Usually, these models are formulated with the aid of a strain energy function and strain invariants.
The main applications considered in this chapter will be seven distinct hyperelastic models, one elastic-plastic model and one thermoelastic-viscoplastic hypoelastic model, used for different types of materials. The hyperelastic models are employed to simulate the mechanical behaviour of biological tissues. Human and animal tissues are known to show nonlinear elastic behaviour up to very large strains, demanding the use of hyperelastic models such as the ones presented by Humphrey [9], Martins [10], Mooney-Rivlin [11][12], Neo-Hookean, Ogden [13], Veronda-Westmann [14] and Yeoh [15]. The neo-Hookean model
depends on a single parameter. However, the Ogden model has 6 parameters that need to be identified. The hypoelastic model used in this work combines the three classes of behaviour mentioned above and was developed to characterize metallic materials at room, medium and high temperatures [16]. The model is a unified constitutive model in the sense that plasticity and creep are defined simultaneously, i.e., they are described by the same set of flow rules. In this model, the microstructural state of the material is tracked via a set of variables that evolve both with the deformation and with the temperature. These internal variables may represent a specific material property, such as the resistance to plastic deformation, but are not necessarily physically measurable quantities [9][17][18]. The internal variable thermoelastic-viscoplastic constitutive model has 12 material parameters that need to be determined for each material.
Parameter Identification Inverse Problems

Computer simulation software using the finite element method (FEM) has proven its value and efficacy [19]. FEM software can be used in such diverse fields as structural engineering, sheet metal forming, the mould industry, biomechanics, fluid dynamics, etc. As with any simulation method, FEM simulations require the introduction of several forms of input data, such as the initial geometry, the domain discretization (finite element mesh), boundary conditions, material constitutive models, etc. The quality of the results relies on the quality of the input data, which are not always readily available. The simulation of a physical process leading to its final results can be defined as a direct problem (see Figure 1). In this kind of problem there is total knowledge of the numerical model implemented in the simulation software and of the required input data. The goal of this problem is the final set of results which, for structural analysis software, can be the deformed geometry, strains, stresses and other final properties.
[Figure 1. Direct problem scheme: the input data (initial geometry, finite element mesh, boundary conditions, process parameters, constitutive model parameters) is fed to the simulation software, which produces the output data (deformed geometry, strains and stresses, process evolution properties).]
Considering that direct problems are well established, it is possible to solve more difficult problems, namely inverse problems. The goal of inverse problems is to determine one or a set of the direct problem input data. Generally, it consists in providing information to the numerical model from the full knowledge of the physical process. In a first approach, the analysis of the experimental observations leads to the choice of a mathematical model to be implemented in the simulation software. These experimental values can then be used to determine the different parameters and coefficients
needed for the mathematical formulation of the model. This can be accomplished by solving an inverse problem, which consists of searching for a set of parameter values for which the comparison between the experimental reality and the simulation may be considered satisfactory. This methodology is schematically described in Figure 2. The comparison between the physical system data and the numerical data results in a cost function that must be evaluated. Occasionally, it is possible to obtain several distinct sets of parameters for which the comparison is acceptable. When this is the case, it is the role of the user to evaluate the results obtained, considering the physical definition and meaning of each parameter. The stages of decision and evolution towards a new set of parameters, considering the value of the cost function and the parameters of the previous steps, are carried out using optimization algorithms and software. In order to improve the process of finding a suitable set of parameters in acceptable CPU time, these stages must be accomplished with efficient optimization algorithms.
[Figure 2. Inverse problem scheme: the simulation software produces numerical output data (deformed geometry, strains and stresses, process evolution properties) from the input data (initial geometry, finite element mesh, boundary conditions, process parameters, initial constitutive model parameters); the optimization software compares the numerical data with the experimental data and feeds new input data (new constitutive model parameters) back to the simulation.]
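To make the loop of Figure 2 concrete, the following is a minimal sketch of such an identification process in Python; the solver wrapper simulate(), the Ludwik-type hardening law and the synthetic experimental curve are illustrative assumptions, not the software described in this chapter.

    # A minimal sketch of the identification loop of Figure 2 (hypothetical
    # names: simulate(), the hardening law and the synthetic data are
    # assumptions made for this example).
    import numpy as np
    from scipy.optimize import minimize

    strains_exp = np.linspace(0.01, 0.5, 50)
    stresses_exp = 200.0 + 300.0 * strains_exp**0.4   # synthetic "experimental" curve

    def simulate(params, strains):
        # Stand-in for the FEA solver: sigma = A + B * eps^n (Ludwik-type hardening).
        A, B, n = params
        return A + B * strains**n

    def cost(params):
        # Cost function: sum of squared stress differences (cf. Eq. (1) with D_ij = 1).
        residual = simulate(params, strains_exp) - stresses_exp
        return float(np.sum(residual**2))

    # One optimization stage; a derivative-free method is used because the
    # derivatives of a real FEA solver are usually unavailable.
    result = minimize(cost, x0=[100.0, 100.0, 0.5], method="Nelder-Mead")
    print("identified parameters:", result.x)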
The development of software for the determination of constitutive model parameters started in the beginning of the 1990s using straightforward curve-fitting techniques (see, e.g., [19][20]). The coupling between engineering simulation software and optimization software was introduced some years later; the works of Chaboche [21], Schnur and Zabaras [22], Cailletaud and Pilvin [3][23], Mahnken and Stein [24][25], and Gélin and Ghouati [26][27] are examples. Presently, there are FEM packages that offer modules to determine constitutive model parameters when experimental input data are given (e.g. Abaqus [28] and Nastran [29], among others).
Definition of the Objective Function

To efficiently determine model parameters, the definition of an error function between the experimental and numerical results is required. This function will be the optimization
objective function and should be able to efficiently drive the optimization process. A poor definition of this function will compromise the whole optimization process and, therefore, the determination of the constitutive model parameters. An objective function (OF) should respect the following criteria [30]:
• The errors of the involved experimental data should be removed.
• All the experimental data points on a single curve should be considered in the optimization and have an equal opportunity to be optimized.
• All experimental curves should have an equal opportunity to be optimized, and the identification process should be independent of the number of data points in each experimental curve.
• An objective function should be able to deal with multi-sub-objective problems, in which the units of the sub-objectives may be different; all the sub-objectives should have an equal opportunity to be optimized, and different units and/or numbers of curves in each sub-objective should not affect the overall performance of the fitting.
• The weighting factors used in the objective function should be found automatically.
It should be noted that the experimental data consist of several discrete values representing measured points, leading to an experimental curve. In the specific case of structural material models, the experimental data is a set of stress-strain points, defining a stress-strain curve. One of the most widely used objective error functions in engineering software is the sum of the squares of the stress differences at various strains. This function can be mathematically expressed by

S(A) = Σ_{j=1}^{M} Σ_{i=1}^{N_j} [σ_ij^sim(A) − σ_ij^exp]² / D_ij ,    (1)
where A is the vector containing the values of the material parameters, M is the number of curves, N_j is the number of experimental points in each curve, and D is a matrix whose coefficients are chosen taking into account the uncertain nature of some observed variables. σ^sim and σ^exp are the numerical and experimental values, respectively. Variations of Equation (1) were proposed by Mahnken and Stein [24] and by Ghouati and Gélin [26], adding a second term related to stabilization terms and constraints. Gavrus et al. [31] proposed a similar objective function but with an automatic experimental error matrix D, defined as
D_i = (Σ_{k=1}^{N} σ_k^exp)²   if the experimental errors are constant,
D_i = (σ_i^exp)²               if the errors are proportional to σ_i^exp.    (2)
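As an illustration of how Eqs. (1) and (2) could be evaluated in practice, the following Python sketch computes S(A) for a list of curves; the list-of-arrays data layout is an assumption made for this example.

    # A direct transcription of Eqs. (1) and (2); the data layout is assumed.
    import numpy as np

    def error_matrix(sigma_exp, proportional):
        # D_i of Eq. (2) for one curve.
        if proportional:
            return sigma_exp**2                               # errors proportional to sigma_exp
        return np.full_like(sigma_exp, np.sum(sigma_exp)**2)  # constant errors

    def S(curves, proportional=True):
        # Eq. (1): curves is a list of (sigma_sim, sigma_exp) array pairs.
        total = 0.0
        for sigma_sim, sigma_exp in curves:
            D = error_matrix(sigma_exp, proportional)
            total += np.sum((sigma_sim - sigma_exp)**2 / D)
        return total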
Ponthot and Kleinermann [19] use a relative objective function for the determination of metal forming constitutive models. These authors have shown that a process using relative objective functions reaches the optimum value faster than a process using absolute objective functions. They used the following function:
S(A) = Σ_{j=1}^{M} (1/N_j) Σ_{i=1}^{N_j} (ω_i/Ω) [(σ_ij^sim(A) − σ_ij^exp) ⊘ σ_ij^exp]² ,    (3)

where ω_i is the weight given to the ith experimental point, Ω = Σ_{i=1}^{N} ω_i is the sum of all attributed weights, and ⊘ denotes a protected division: a ⊘ b = a/b if b > 10⁻¹² and a ⊘ b = a otherwise. For ε^exp > 0.25, Ponthot and Kleinermann's OF matches the OF proposed by Cao and Lin. Both OF define relative errors. Therefore, when the values of ε^exp increase, the absolute differences are minimized, leading to smaller relative errors. This can influence the optimization process, considering that some data points will not have an equal opportunity to be optimized. The
data points with smaller ε^exp will have greater weight in the OF and, consequently, in the optimization process.

[Figure 3. Comparison of different objective functions (OF), plotting the experimental and simulation stress-strain data (σ versus ε) against the value of each OF. Final values of the objective functions: (i) square of the differences = 2117.12; (ii) Ponthot and Kleinermann = 2.79; (iii) Cao and Lin = 0.622; (iv) RQUAD = 0.997956; Bias = 3.59.]
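The relative OF of Eq. (3) could be implemented as in the following sketch, where the protected division ⊘ is reproduced explicitly and uniform weights ω_i = 1 are assumed by default.

    # A sketch of the relative OF of Eq. (3); weights default to 1.
    import numpy as np

    def protected_div(a, b, eps=1e-12):
        safe = np.where(np.abs(b) > eps, b, 1.0)      # avoid warnings for tiny b
        return np.where(np.abs(b) > eps, a / safe, a)

    def S_relative(curves, weights=None):
        total = 0.0
        for sigma_sim, sigma_exp in curves:
            n = len(sigma_exp)
            w = np.ones(n) if weights is None else weights
            rel = protected_div(sigma_sim - sigma_exp, sigma_exp)
            total += np.sum((w / np.sum(w)) * rel**2) / n
        return total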
The OF defined in Equation (1) has the huge advantage of simplicity and complies with the second criterion. However, if multiple sub-objectives are used, it is necessary to add a coefficient to level the different units of each sub-objective. In economics, statistics and spreadsheet software, the RQUAD function is generally used to correlate different curves. The RQUAD function defines the square of the Pearson product-moment correlation coefficient and can be written as
RQUAD = 1 − Σ_{i=1}^{N} [σ_i^sim(A) − σ_i^exp]² / ( Σ_{i=1}^{N} (σ_i^exp)² − (1/N)(Σ_{i=1}^{N} σ_i^exp)² )    (7)
This OF is often used in engineering software and it is a good indicator of correlation. Note that 0 ≤ RQUAD ≤ 1.
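For reference, a direct transcription of Eq. (7) might look as follows; NumPy arrays over the same strain points are assumed.

    # A direct transcription of Eq. (7).
    import numpy as np

    def rquad(sigma_sim, sigma_exp):
        n = len(sigma_exp)
        ss_res = np.sum((sigma_sim - sigma_exp)**2)               # numerator
        ss_tot = np.sum(sigma_exp**2) - np.sum(sigma_exp)**2 / n  # denominator
        return 1.0 - ss_res / ss_tot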
Debugging Concurrent Programs Using Metaheuristics
Francisco Chicano and Enrique Alba

One example is the LTL formula □(x > 3). This property specifies
an invariant: for a concurrent system to fulfil this property, the variable x must always be greater than 3. This property consists of all the executions σ in which the variable x is greater than 3 in all the states of σ. Two other temporal operators are ♦ (read “eventually”) and U (read “until”). A concurrent system satisfies the formula ♦p if for each execution the atomic proposition p is true in some state (eventually in the future p is true). A concurrent system satisfies the formula p U q if for each execution there exists one state s in which q is true and for each state before s the atomic proposition p is true.
2.2. Property Automaton and Checking
Checking whether a concurrent system M satisfies a given property P is equivalent to checking the validity of the following proposition:

∀σ ∈ S^ω : σ ∈ M → σ ∈ P ,    (3)

which is equivalent to the following one:

¬∃σ ∈ S^ω : σ ∈ M ∧ σ ∉ P .    (4)
In explicit state model checking the concurrent system M and the property P are represented by finite state ω-automata, A(M) and A(P) respectively, that accept the executions they contain. In the context of ω-automata, an execution is accepting if there is an accepting state that occurs infinitely often in the execution. Using A(M) and A(P), the check of Equation (4) translates into checking whether the intersection automaton A(M) ∩ A(P) accepts no execution. In HSF-SPIN (and SPIN) the automaton A(P), which captures the violations of the property, is called the never claim. When a property P is expressed with an LTL formula ϕ, the never claim A(P) is the same as the automaton A(¬P) associated to the LTL formula ¬ϕ. This never claim can be automatically computed from ¬ϕ using a translation algorithm [21]. In order to find a violation of a given LTL property, HSF-SPIN explores the intersection (or synchronous product) of the concurrent model and the never claim, A(M) ∩ A(P), also called the Büchi automaton. The intersection automaton is computed on-the-fly as the exploration progresses. One state of the intersection automaton is a pair (s, t), where s is a state of the automaton associated to the concurrent system, A(M), and t is a state of the never claim, A(P). In the intersection automaton, a state (s′, t′) is a successor of (s, t) if s′ is a successor of s in A(M), t′ is a successor of t in A(P), and the propositional formula on the arc (t, t′) of A(P) is true when it is evaluated using the values of the variables in the state s. As an illustration, in Fig. 1 we show the automaton of a simple concurrent system (left box), the never claim used to check the LTL formula □(p → ♦q) (which means that an occurrence of p is always followed by an occurrence of q, not necessarily in the next state), and the synchronous product of these two automata. HSF-SPIN searches the Büchi automaton for an execution σ = αβ^ω composed of a partial execution α ∈ S^∗ and a cycle of states β ∈ S^∗ containing an accepting state. If such an execution is found, it violates the liveness component of the property and, thus, the whole property. During the search, it is also possible to find a state in which the end state of the never claim is reached (if any). This means that an execution has been found that
violates the safety component of the property, and the partial execution α ∈ S^∗ that leads the model to that state violates the property². In the HSF-SPIN and SPIN model checkers the search can be done using the Nested Depth First Search (NDFS) algorithm [25]. However, if the property is a safety one (the liveness component is true), the problem of finding a property violation reduces to finding a partial execution α ∈ S^∗, i.e., it is not required to find an additional cycle containing the accepting state. In this case, classical graph exploration algorithms such as Breadth First Search (BFS) or Depth First Search (DFS) can be used for finding property violations. These classical algorithms cannot be used when we are searching for general property violations (as we do in this chapter) because they are not designed to find the cycle of states β mentioned above. A special case is that of deadlocks. The absence of deadlock is a safety property; however, it cannot be expressed with an LTL formula [19]. Thus, when the objective of a search is to check the absence of deadlocks, no never claim is used in HSF-SPIN. Instead, a state with no successors is searched for using DFS.

[Figure 1. Synchronous product of a simple concurrent system and a never claim: a four-state concurrent system (states 1-4, labelled with {p}, {p}, {}, and {p, q}), a two-state never claim (states a and b), and the resulting synchronous product (states 1,a to 4,b).]

² A deeper explanation of the foundations of automata-based model checking can be found in [15] and [26].
2.3. Strongly Connected Components
In order to improve the search for property violations it is possible to take into account the structure of the never claim. The idea is based on the fact that a cycle of states in the Büchi automaton entails a cycle in the never claim (and in the concurrent system). For an illustration, let us focus again on Fig. 1. The cycle (1a, 2a, 4a, 1a) of the Büchi automaton is mapped into the cycle (a, a) of the never claim and the cycle (1, 2, 4, 1) of the concurrent system. We can observe that it is not possible to find a cycle in the Büchi automaton in which the letters alternate between a and b, because there is no such cycle in the never claim. To improve the search we first need to compute the strongly connected components (SCCs) of the never claim. A strongly connected subgraph G = (V, A) of a directed graph is a subgraph in which for all pairs of different nodes u, v ∈ V there exist two paths: one from u to v and another one from v to u. The strongly connected components of a directed graph are its maximal strongly connected subgraphs. Once we have the SCCs of the never claim we have to classify them into three categories depending on the accepting cycles they include. By an N-SCC, we denote an SCC in which no cycle is accepting. A P-SCC is an SCC in which there exists at least one accepting cycle and at least one non-accepting cycle.
Finally, an F-SCC is an SCC in which all the cycles are accepting [20]. As an illustration, we show in Fig. 2 a never claim with three SCCs, one of each kind.
[Figure 2. A never claim with an N-SCC, a P-SCC, and an F-SCC. The labels associated to the arcs are omitted for clarity.]

As we mentioned above, all the cycles found in the Büchi automaton have an associated cycle in the never claim and, according to the definition of SCC, this cycle is included in one SCC of the never claim. Furthermore, if the cycle is accepting (which is the objective of the search), this SCC is necessarily a P-SCC or an F-SCC. The classification of the SCCs of the never claim can be used to improve the search for property violations. In particular, the accepting states in an N-SCC can be ignored³, and the cycles found inside an F-SCC can be considered as accepting. In HSF-SPIN, there is an implementation of an improved version of NDFS called Improved Nested Depth First Search (INDFS) that takes into account the classification of the SCCs of the never claim [20].

³ An accepting state can be part of an N-SCC if the state has no loop and it is the only state of the N-SCC.
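As a rough illustration of this preprocessing step, the sketch below computes the SCCs of a small never-claim graph with Tarjan's algorithm; the dict-based graph encoding is an assumption of the example, and the subsequent classification into N-, P- and F-SCCs (which requires analysing the accepting cycles inside each SCC) is not shown.

    # Tarjan's algorithm over a dict-based graph (state -> successor list).
    def tarjan_sccs(graph):
        index, lowlink, on_stack = {}, {}, set()
        stack, sccs, counter = [], [], [0]

        def strongconnect(v):
            index[v] = lowlink[v] = counter[0]
            counter[0] += 1
            stack.append(v)
            on_stack.add(v)
            for w in graph.get(v, ()):
                if w not in index:
                    strongconnect(w)
                    lowlink[v] = min(lowlink[v], lowlink[w])
                elif w in on_stack:
                    lowlink[v] = min(lowlink[v], index[w])
            if lowlink[v] == index[v]:        # v is the root of an SCC
                scc = set()
                while True:
                    w = stack.pop()
                    on_stack.discard(w)
                    scc.add(w)
                    if w == v:
                        break
                sccs.append(scc)

        for v in list(graph):
            if v not in index:
                strongconnect(v)
        return sccs

    never_claim = {"a": ["a", "b"], "b": ["b", "c"], "c": ["b"]}
    print(tarjan_sccs(never_claim))           # e.g. [{'b', 'c'}, {'a'}]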
2.4. Partial Order Reduction
Partial order reduction (POR) is a method that exploits the commutativity of asynchronous systems in order to reduce the size of the state space. The interleaving model of concurrent systems imposes an arbitrary ordering between concurrent events. When the automaton of the concurrent system is built, the events are interleaved in all possible ways. The ordering between independent concurrent instructions is meaningless. Hence, we can consider just one ordering for checking a given property, since the other orderings are equivalent. This fact can be used to construct a reduced state graph that is hopefully much easier to explore than the full state graph (the original automaton). We use here a POR proposal based on ample sets [27]. Before giving more details on POR, we need to introduce some terminology. We call a partial function γ : S → S a transition. Intuitively, a transition corresponds to one instruction in the program code of the concurrent model. The set of all the transitions that are defined for state s is denoted with enabled(s). According to these definitions, the set of successors of s must be T(s) = {γ(s) | γ ∈ enabled(s)}. In short, we say that two transitions γ and δ are independent when
they do not disable one another and executing them in either order results in the same state. That is, for all s, if γ, δ ∈ enabled(s) it holds that:

1. γ ∈ enabled(δ(s)) and δ ∈ enabled(γ(s)),
2. γ(δ(s)) = δ(γ(s)).

Let L : S → 2^AP be a function that labels each state of the system automaton A(M) with a set of atomic propositions from AP. In the automaton of a concurrent system, this function assigns to each state s the set of propositions appearing in the LTL formula that are true in s. A transition γ is invisible with respect to a set of propositions AP′ ⊆ AP when its execution from any state does not change the value of the propositional variables in AP′, that is, for each state s in which γ is defined, L(s) ∩ AP′ = L(γ(s)) ∩ AP′.

The main idea of ample sets is to explore only a subset ample(s) ⊆ enabled(s) of the enabled transitions of each state s such that the reduced state space is equivalent to the full state space. This reduction of the state space is performed on-the-fly while the graph is generated. In order to keep the equivalence between the complete and the reduced automaton, the reduced set of transitions must fulfil the following conditions [15]:

• C0: for each state s, ample(s) = ∅ if and only if enabled(s) = ∅.
• C1: for each state s and each path in the full state graph that starts at s, a transition γ that is dependent on a transition δ ∈ ample(s) cannot be executed without a transition in ample(s) occurring previously.
• C2: for each state s, if enabled(s) ≠ ample(s) then each transition γ ∈ ample(s) is invisible with respect to the atomic propositions of the LTL formula being verified.
• C3: a cycle is not allowed if it contains a state in which some transition γ is enabled but never included in ample(s) for any state s of the cycle.

The first three conditions are not related to the particular search algorithm being used. However, the way of ensuring C3 depends on the search algorithm. In [27] three alternatives for ensuring that C3 is fulfilled were proposed. Of these, the only one that can be applied to any possible exploration algorithm is the so-called C3static, and this is the one we use in our experiments. In order to fulfil condition C3static, the structure of the processes of the model is statically analyzed and at least one transition in each local cycle is marked as sticky. Condition C3static requires that states s containing a sticky transition in enabled(s) be fully expanded: ample(s) = enabled(s). This condition is also called c2s in a later work by Bošnački et al. [7].
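The independence conditions can be checked mechanically at a given state, as in the following toy sketch in which transitions are modelled as functions returning None when disabled; this per-state check is illustrative only, since true independence must hold for every state.

    # A toy, per-state check of the two independence conditions.
    def independent_at(gamma, delta, s):
        if gamma(s) is None or delta(s) is None:
            return True                        # both must be enabled at s
        # Condition 1: neither transition disables the other.
        if gamma(delta(s)) is None or delta(gamma(s)) is None:
            return False
        # Condition 2: both execution orders reach the same state.
        return gamma(delta(s)) == delta(gamma(s))

    inc_x = lambda s: (s[0] + 1, s[1])         # increments on disjoint variables
    inc_y = lambda s: (s[0], s[1] + 1)         # commute, hence independent here
    print(independent_at(inc_x, inc_y, (0, 0)))  # True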
2.5. Using Heuristic Information
In order to guide the search towards the accepting states, a heuristic value is associated to each state of the transition graph of the model. Different kinds of heuristic functions have been defined in the past to better guide exhaustive algorithms. In [23], structural heuristics are introduced that attempt to exploit the structure of a program in a way conducive to finding errors. One example of this kind of heuristic information is code coverage, a well-known
metric in the software testing domain. Another example is thread interleaving, in which states yielding a thread scheduling with many context changes are rewarded. Unlike structural heuristics, property-specific heuristics [23] rely on features of the particular property checked. Formula-based heuristics, for example, are based on the expression of the LTL formula checked [19]. Using the logic expression that must be false in an accepting state, these heuristics estimate the number of transitions required to reach such an accepting state from the current one. Given a logic formula ϕ, the heuristic function for that formula, Hϕ, is defined using its subformulae. In this work we use the formula-based heuristic shown in Table 1, which is defined in [19].

Table 1. Formula-based heuristic function

ϕ          Hϕ(s)                               H̄ϕ(s)
true       0                                   ∞
false      ∞                                   0
p          if p then 0 else 1                  if p then 1 else 0
a ⊗ b      if a ⊗ b then 0 else 1              if a ⊗ b then 1 else 0
¬ψ         H̄ψ(s)                               Hψ(s)
ψ ∨ ξ      min{Hψ(s), Hξ(s)}                   H̄ψ(s) + H̄ξ(s)
ψ ∧ ξ      Hψ(s) + Hξ(s)                       min{H̄ψ(s), H̄ξ(s)}
full(q)    capa(q) − len(q)                    if full(q) then 1 else 0
empty(q)   len(q)                              if empty(q) then 1 else 0
q?[t]      minimal prefix of q without t       if head(q) ≠ t then 0 else maximal prefix of t's
i@s        Di(pci, s)                          if pci = s then 1 else 0

ψ, ξ: formulae without temporal operators; p: logic proposition; a, b: variables or constants; ⊗: relational operator (=, ≠, <, ≤, >, ≥); q: queue; capa(q): capacity of queue q; len(q): length of queue q; head(q): message at the head of queue q; t: tag of a message; i: process; s: state of a process automaton; pci: current state of process i in its corresponding automaton; Di(u, v): minimum number of transitions for reaching v from u in the local automaton of process i; Hϕ(s): heuristic function for formula ϕ; H̄ϕ(s): heuristic function for formula ¬ϕ.
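The propositional core of Table 1 translates almost directly into a pair of mutually recursive functions, as in the following sketch; the tuple encoding of formulas is an assumption of the example, and the queue- and process-related rows of the table are omitted.

    # The propositional rows of Table 1 as two mutually recursive functions.
    INF = float("inf")

    def H(phi, state):
        op = phi[0]
        if op == "true":  return 0
        if op == "false": return INF
        if op == "prop":  return 0 if state[phi[1]] else 1
        if op == "not":   return Hbar(phi[1], state)
        if op == "or":    return min(H(phi[1], state), H(phi[2], state))
        if op == "and":   return H(phi[1], state) + H(phi[2], state)
        raise ValueError(op)

    def Hbar(phi, state):                      # heuristic for the negation
        op = phi[0]
        if op == "true":  return INF
        if op == "false": return 0
        if op == "prop":  return 1 if state[phi[1]] else 0
        if op == "not":   return H(phi[1], state)
        if op == "or":    return Hbar(phi[1], state) + Hbar(phi[2], state)
        if op == "and":   return min(Hbar(phi[1], state), Hbar(phi[2], state))
        raise ValueError(op)

    # Distance to satisfying p AND (NOT q) when p is false and q is true:
    phi = ("and", ("prop", "p"), ("not", ("prop", "q")))
    print(H(phi, {"p": False, "q": True}))     # 2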
For searching for deadlocks, several heuristic functions can be used. On the one hand, the number of active processes can be used as the heuristic value of a state; we denote this heuristic with Hap. On the other hand, the number of executable (enabled) transitions in a state can also be used as a heuristic value, denoted with Hex. Another option consists in approximating the deadlock situation with a logic predicate and deriving the heuristic function of that predicate using the rules of Table 1 (see [19]). There is another group of heuristic functions, called state-based heuristics, that can be used when the objective state is known. From this group we can highlight the Hamming distance, Hham, and the distance of finite state machines, Hfsm. In the first case, the heuristic value is computed as the Hamming distance between the binary representations of the current and the objective states. In the latter, the heuristic value is the sum of the minimum
number of transitions required to reach the objective state from the current one in the local automaton of each process. We will explain this in more detail. Each process of the concurrent system has an associated local automaton. Given one state of the complete system, each process is in one state of its associated automaton. Before the execution of any search algorithm, the minimum number of transitions required to reach one state from another is computed for each pair of states in each local automaton. With this information, the Hfsm heuristic function consults, for each process, the minimum number of transitions needed to reach the objective state from the current state in the local automaton. The value returned by Hfsm is the sum of these minimum transition counts over all the processes.
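A possible realization of Hfsm is sketched below: the pairwise distances Di are precomputed per local automaton with BFS, then summed over the processes. The dict-based automaton encoding is an assumption of the example.

    # A sketch of H_fsm with BFS-precomputed local distances.
    from collections import deque

    def bfs_distances(automaton):
        # automaton: dict state -> list of successor states.
        dist = {}
        for src in automaton:
            dist[(src, src)] = 0
            queue = deque([src])
            while queue:
                u = queue.popleft()
                for v in automaton.get(u, ()):
                    if (src, v) not in dist:
                        dist[(src, v)] = dist[(src, u)] + 1
                        queue.append(v)
        return dist

    def h_fsm(current, objective, distances):
        # current/objective: tuples of local states, one entry per process.
        return sum(distances[i].get((c, o), float("inf"))
                   for i, (c, o) in enumerate(zip(current, objective)))

    proc = {0: [1], 1: [2], 2: [0]}            # a 3-state local automaton
    d = bfs_distances(proc)
    print(h_fsm((0, 0), (2, 1), [d, d]))       # 2 + 1 = 3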
3. Problem Formalization
In this chapter we tackle the problem of searching for general property violations in concurrent systems. As we previously mentioned, this problem can be translated into the search for a path⁴ in a graph (the Büchi automaton) starting in the initial state and ending in an objective node (accepting state), plus an additional cycle involving the objective node; or a path that leads to the end state of the never claim or to a deadlock state (if the property checked is the absence of deadlocks). We formalize the problem as follows. Let G = (S, T) be a directed graph, where S is the set of nodes and T ⊆ S × S is the set of arcs. Let q ∈ S be the initial node of the graph, F ⊆ S a set of distinguished nodes that we call accepting nodes, and E ⊆ S a set of nodes that we call end nodes. We denote with T(s) the set of successors of node s. A finite path over the graph is a sequence of nodes π = π1π2...πn where πi ∈ S for i = 1, 2, ..., n and πi ∈ T(πi−1) for i = 2, ..., n. We denote with πi the ith node of the sequence and we use |π| to refer to the length of the path, that is, the number of nodes of π. We say that a path π is a starting path if the first node of the path is the initial node of the graph, that is, π1 = q. We will use π∗ to refer to the last node of the sequence π, that is, π∗ = π|π|. We say that a path π is a cycle if the first and the last nodes of the path are the same, that is, π1 = π∗.

Given a directed graph G, the problem at hand consists in finding a starting path π (π1 = q) for which one of the following propositions holds:

• π ends in an accepting node and there exists a cycle ν containing the accepting node. That is, π∗ ∈ F ∧ π∗ = ν1 = ν∗.
• π ends in an end node. That is, π∗ ∈ E.

The graph G used in the problem is derived from the Büchi automaton B (the synchronous product of the concurrent system and the never claim, if any). The set of nodes S in G is the set of states in B, the set of arcs T in G is the set of transitions in B, the initial node q in G is the initial state in B, the set of accepting nodes F in G is the set of accepting states in B, and the set of end nodes E in G is the set of nodes that map into end states of the never claim associated to B. In the following, we will also use the words state, transition, accepting state, and end state to refer to the elements in S, T, F, and E, respectively. If the property to check is the absence of deadlocks, G is equal to B, F is empty, and E is the set of deadlock states.

⁴ We use here the traditional meaning of path. That is, a path is an alternating sequence of vertices and arcs, beginning and ending with a vertex, in which each vertex is incident to the two arcs that precede and follow it in the sequence, and the vertices that precede and follow an arc are the end vertices of that arc. We use in this chapter the word path instead of the nowadays widely used term walk with the above indicated meaning.
4. Algorithmic Proposal
In order to solve the previously defined problem we propose here an algorithm that we call ACOhg-mc. This algorithm is based on ACOhg, a new variant of ACO that has been applied to the search for safety errors in concurrent systems [2]. We describe ACOhg in the next section and ACOhg-mc in Section 4.2. Finally, we describe how the improvement based on SCCs is applied to ACOhg-mc.
4.1. ACOhg Algorithm
ACOhg is a new kind of Ant Colony Optimization algorithm, proposed in [2], that can deal with construction graphs of unknown size, or graphs that are too large to fit into the computer memory. Actually, this new model was proposed for applying an ACO-like algorithm to the problem of searching for safety property violations in very large concurrent systems; however, it is general enough to deal with other problems with similar features. In this section we give a general description of ACOhg that is independent of the problem solved, and we specify the problem-dependent elements of ACOhg for our problem in the next section. The objective of ACOhg is to find a path from the initial node to an objective node from a set O in a very large exploration graph. We denote with f a function that maps the paths of the graph into real numbers. This function must be designed to reach minimum values when the shortest path to an objective node is found. ACOhg minimizes this objective function. The ACO metaheuristic [17] is a global optimization algorithm inspired by the foraging behaviour of real ants. The main idea consists in simulating the ants' behaviour in a graph, called the construction graph, in order to search for the shortest path from an initial node to an objective one. The cooperation among the different simulated ants is a key factor in the search, which is performed indirectly by means of pheromone trails, a model of the chemicals real ants use for their communication. The main procedures of an ACO algorithm are the construction phase and the pheromone update. These two procedures are scheduled during the execution of ACO until a given stopping criterion is fulfilled. In the construction phase, each artificial ant follows a path in the construction graph. In the pheromone update, the pheromone trails of the arcs are modified. In [2], two techniques were proposed for dealing with huge graphs in ACOhg: the expansion technique and the missionary technique. In this chapter we focus on the missionary technique because it requires less memory and the quality of the results is similar for both techniques (see [1]). Thus, in the following we will describe ACOhg using the missionary technique, but we will use the name ACOhg alone. In short, the two main differences between ACOhg and the traditional ACO variants are the following. First, the length of the paths traversed by ants (the number of arcs in the path) in the construction phase is limited: when the path of an ant reaches a given maximum length λant, the ant is stopped. Second, the ants start the path construction
from different nodes during the search. At the beginning, the ants are placed on the initial node of the graph, and the algorithm is executed during a given number of steps σs (called a stage). If no objective node is found, the last nodes of the best paths constructed by the ants are used as starting nodes for the ants in the next stage. In this way, during the next stage the ants try to go further in the graph (see [2] for more details). In Algorithm 1 we show the pseudocode of ACOhg.

Algorithm 1. ACOhg (missionary technique)
 1: init = {initial node}
 2: next_init = ∅
 3: τ = initializePheromone()
 4: step = 1
 5: stage = 1
 6: while step ≤ msteps do
 7:   // Ant operations
 8:   for k = 1 to colsize do
 9:     a^k = ∅; a^k_1 = selectInitNodeRandomly(init)
10:     while |a^k| < λant ∧ T(a^k_*) − a^k ≠ ∅ ∧ a^k_* ∉ O do
11:       node = selectSuccessor(a^k_*, T(a^k_*), τ, η)
12:       a^k = a^k + node
13:       τ = localPheromoneUpdate(τ, ξ, node)
14:     end while
15:     next_init = selectBestPaths(init, next_init, a^k)
16:     if f(a^k) < f(a^best) then
17:       a^best = a^k
18:     end if
19:   end for
20:   τ = pheromoneEvaporation(τ, ρ)
21:   τ = pheromoneUpdate(τ, a^best)
22:   if step ≡ 0 mod σs then
23:     init = next_init
24:     next_init = ∅
25:     stage = stage + 1
26:     τ = pheromoneReset()
27:   end if
28:   step = step + 1
29: end while

Algorithm 2. selectInitNodeRandomly(init)
r = uniformRandom(0,1) · Σ_{i=1}^{|init|} 1/f(init[i])
p = 0
for i = 1 to |init| do
  p = p + 1/f(init[i])
  if r ≤ p then return init[i]_*
end for

Algorithm 3. selectBestPaths(init, next_init, a^k)
if a^k_* is not the last node of any starting path of next_init then
  if |next_init| < ι then
    let π be the only starting path of init with π_* = a^k_1
    next_init = next_init ∪ {π + a^k}
  else
    w = the path of next_init with the highest objective value
    if f(a^k) < f(w) then
      let π be the only starting path of init with π_* = a^k_1
      next_init = (next_init − {w}) ∪ {π + a^k}
    end if
  end if
end if
return next_init

In the following we describe the algorithm, but first we clarify some issues related to the notation used in Algorithm 1. In the pseudocode, the path traversed by the kth artificial ant is denoted with a^k. For this reason we use the same notation as in Section 3 to refer to the length of the path (|a^k|), the jth node of the path (a^k_j), and the last node of the path (a^k_*). We use the operator + to refer to the concatenation of two paths. In line 10, we use the expression T(a^k_*) − a^k to refer to the elements of T(a^k_*) that are not in the sequence a^k; that is, in that expression we interpret a^k as a set of nodes.

The algorithm works as follows. At the beginning, the variables are initialized (lines 1-5). All the pheromone trails are initialized with the same value: a random number between τ0^min and τ0^max. In the init set (the initial nodes for the ants' construction), a starting path containing only the initial node is inserted (line 1). This way, all the ants of the first stage begin the construction of their path at the initial node. After the initialization, the algorithm enters a loop that is executed until a given maximum number of steps (msteps) set by the user is performed (line 6). In the loop, each ant builds a path starting at the final node of a previous path (line 9). This path is randomly selected from the init set using the procedure shown in Algorithm 2. For the construction of the path, the ants enter a loop (lines 10-14) in which each ant k stochastically selects the next node according to the pheromone (τij) and the heuristic value (ηij) associated to each arc (a^k_*, j) with j ∈ T(a^k_*) (line 11). In particular, if the last node of the kth ant path is i = a^k_*, then the ant selects the next node j ∈ T(i) with probability [17]

p^k_ij = [τij]^α [ηij]^β / Σ_{s∈T(i)} [τis]^α [ηis]^β ,  for j ∈ T(i),    (5)
where α and β are two parameters of the algorithm determining the relative influence of the pheromone trail and the heuristic value on the path construction, respectively (see Figure 3).
According to the previous expression, artificial ants prefer paths with a higher concentration of pheromone, like real ants in the real world. When an ant has to select a node, the last node of the current ant path is expanded. Then the ant selects one successor node and the remaining ones are removed from memory. This way, the amount of memory required in the path construction is small. The heuristic value ηij of an arc is a problem-specific value determined by the designer in order to guide the search towards promising regions of the graph.
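The stochastic selection rule of Eq. (5) amounts to a roulette-wheel draw over the successors, as in the following sketch; the dict-based pheromone and heuristic stores are assumptions of the example.

    # A roulette-wheel draw implementing Eq. (5).
    import random

    def select_successor(successors, tau, eta, alpha=1.0, beta=2.0):
        weights = [(tau[j] ** alpha) * (eta[j] ** beta) for j in successors]
        r = random.uniform(0.0, sum(weights))
        acc = 0.0
        for j, w in zip(successors, weights):
            acc += w
            if r <= acc:
                return j
        return successors[-1]                  # guard against rounding error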
[Figure 3. An ant during the construction phase: from the last node i of its path, the ant k chooses among the successor nodes in T(i) using the pheromone trail τij and the heuristic value ηij of each arc.]

After the movement of an ant from a node to the next one, the pheromone trail associated to the arc traversed is updated as in Ant Colony Systems (ACS) [16] using the expression τij ← (1 − ξ)τij (line 13), where ξ, with 0 < ξ < 1, controls the evaporation of the pheromone during the construction phase. This mechanism increases the exploration of the algorithm, since it reduces the probability that an ant follows the path of a previous ant in the same step. The construction process is iterated until the ant reaches the maximum length λant, it finds an objective node, or all the successors of the last node of the current path, T(a^k_*), have already been visited by the ant during the construction phase. This last condition prevents the ants from constructing cycles in their paths.

After the construction phase, the ant is used to update the next_init set (line 15), which will be the init set in the next stage. In next_init, only starting paths are allowed and all the paths must have different last nodes. This rule is ensured by selectBestPaths, shown in Algorithm 3, which proceeds as explained in the following lines. A path a^k is inserted in the next_init set if its last node is not one of the last nodes of a starting path π already included in the set. Before the inclusion, the path must be concatenated with the corresponding starting path of init, that is, the starting path π with π_* = a^k_1 (this path exists and it is unique). This way, only starting paths are stored in the next_init set. The cardinality of next_init is bounded by a given parameter ι. When this limit is reached and a new path must be included in the set, the starting path with the highest objective value is removed from the set.
When all the ants have built their paths, a pheromone update phase is performed. First, all the pheromone trails are reduced according to the expression τij ← (1 − ρ)τij (line 20), where ρ is the pheromone evaporation rate, with 0 < ρ ≤ 1. Then, the pheromone trails associated to the arcs traversed by the best-so-far ant (a^best) are increased using the expression τij ← τij + 1/f(a^best), ∀(i, j) ∈ a^best (line 21). This way, the best path found is awarded an extra amount of pheromone and the ants will follow that path with higher probability in the next step, as in the real world. We use here the mechanism introduced in Max-Min Ant Systems (MMAS) [34] for keeping the value of the pheromone trails in a given interval [τmin, τmax] in order to maintain the probability of selecting one node above a given threshold. The values of the trail limits are τmax = 1/(ρ f(a^best)) and τmin = τmax/a, where the parameter a controls the size of the interval. Finally, with a frequency of σs steps, a new stage starts. The init set is replaced by next_init and all the pheromone trails are removed from memory (lines 22-27). In addition to the pheromone trails, the arcs to which the removed pheromone trails are associated are also discarded (unless they also belong to a path in next_init). This removal step allows the algorithm to reduce the amount of memory required to a minimum value. This minimum amount of memory is the amount used for storing the best paths found in one stage (the next_init set).
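A compact sketch of this update phase, including the MMAS-style trail limits, could look as follows; the arc-keyed dict for τ is an assumption of the example, while the defaults follow the configuration discussed in this chapter.

    # Evaporation, best-path reinforcement and MMAS clamping in one step.
    def pheromone_update(tau, best_path, f_best, rho=0.2, a=5.0):
        tau_max = 1.0 / (rho * f_best)
        tau_min = tau_max / a
        for arc in tau:                        # tau <- (1 - rho) * tau
            tau[arc] *= (1.0 - rho)
        for arc in zip(best_path, best_path[1:]):   # reward the best path
            tau[arc] = tau.get(arc, 0.0) + 1.0 / f_best
        for arc in tau:                        # keep trails in [tau_min, tau_max]
            tau[arc] = min(max(tau[arc], tau_min), tau_max)
        return tau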
4.2. ACOhg-mc
In this section we present ACOhg-mc, an algorithm based on ACOhg for searching for general property violations in concurrent systems. In Algorithm 4 we show a high-level object-oriented pseudocode of ACOhg-mc. We assume that acohg1 and acohg2 are two instances of a class implementing ACOhg. In order to complete the description of the algorithm we need to specify the objective function, the heuristic values, and the objective nodes used in acohg1 and acohg2. We will do this later in this section, but first we give a high-level description of ACOhg-mc.

The search that ACOhg-mc performs is composed of two different phases (see Fig. 4). In the first one, ACOhg is used for finding accepting states in the Büchi automaton (line 2 in Algorithm 4). In this phase, the search of ACOhg starts in the initial node of the graph q and the set of objective nodes O is empty. That is, although the algorithm searches for accepting states, there is no preference for a specific set of them. If the algorithm finds accepting states, in a second phase a new search is performed using ACOhg again for each accepting state discovered (lines 3 to 8). In this second search the objective is to find a cycle involving the accepting state. The search starts in one accepting state and the algorithm searches for the same state in order to find a cycle. That is, the initial node of the search and the only objective node are the same: the accepting state. If a cycle is found, ACOhg-mc returns the complete accepting path (line 6). If no cycle is found for any of the accepting states, ACOhg-mc runs the first phase again after including the accepting states in a tabu list (line 9). This tabu list prevents the algorithm from searching again for cycles containing the just-explored accepting states. If one of the accepting states in the tabu list is reached, it will not be included in the list of accepting states to be explored in the second phase. ACOhg-mc alternates between the two phases until no accepting state is found in the first one (line 10).
Algorithm 4. ACOhg-mc
 1: repeat
 2:   accpt = acohg1.findAcceptingStates()   // First phase
 3:   for node in accpt do
 4:     acohg2.findCycle(node)               // Second phase
 5:     if acohg2.cycleFound() then
 6:       return acohg2.acceptingPath()
 7:     end if
 8:   end for
 9:   acohg1.insertTabu(accpt)
10: until empty(accpt)
11: return null
[Figure 4. An illustration of the search that ACOhg-mc performs in the first phase (searching for accepting states from the initial node) and the second phase (searching for cycles that start and end at an accepting state).]
The algorithm can also stop its search for another reason: an end state has been found. That is, when an end state is found, either in the first or the second phase of the search, the algorithm stops and returns the path from the initial state to that end state. If this happens, an execution of the concurrent system has been found that violates the safety component of the checked property. When the property to check is the absence of deadlocks, only the first phase of the search is required. In this case, ACOhg-mc searches for deadlock states (states with no successors) instead of accepting states. When a deadlock state is found, the algorithm stops, returning the path from the initial state to that deadlock state. The second phase of the search, the objective of which is to find an accepting cycle, is never run in this situation.

Now we are going to give the details of the ACOhg algorithms used inside ACOhg-mc. First of all, we use in this chapter a node-based pheromone model, that is, the pheromone trails are associated to the nodes instead of the arcs. This means that all the values τxj associated to the arcs whose head is node j are in fact the same value, which is associated to node j. At this point we must discuss the two heuristic functions presented before: η and H. The heuristic function η depends on each arc of the construction graph and is defined in the context of ACO algorithms. It is a non-negative function used by ACO algorithms for guiding the search: the higher the value of ηij, the higher the probability of selecting arc (i, j) during the construction phase of the ants. The second heuristic, H, depends on each state of the Büchi automaton and is defined in the context of the problem (heuristic model checking). We use here the notation H to refer to a general heuristic function, but in practice this function will be one of the functions defined in Section 2.5. (e.g. Hfsm or Hϕ). This heuristic H is a non-negative function designed to be minimized, that is, the lower the value of H(j), the higher the preference to explore node j. In our proposal we must derive η from H. The exact expression we use is ηij = 1/(1 + H(j)). This way, ηij increases when H(j) decreases (high preference to explore node j).
Finally, the objective function f to be minimized is defined as

f(a^k) = |π + a^k|                                                            if a^k_* ∈ O,
f(a^k) = |π + a^k| + H(a^k_*) + p_p + p_c (λant − |a^k|)/(λant − 1)           if a^k_* ∉ O,    (6)
where π is the starting path in init whose last node is the first one of a^k, and p_p and p_c are penalty values that are added when the ant does not end in an objective node and when a^k contains a cycle, respectively. The last term in the second row of Eq. (6) makes the penalty higher for shorter cycles (see [9] for more details).

The configuration of the ACOhg algorithms executed inside ACOhg-mc is, in general, different in the two phases, since they tackle different objectives. We highlight this fact by using different variables to refer to the two algorithms in Algorithm 4: acohg1 and acohg2. For example, in the first phase (acohg1) a more exploratory search is required in order to find a diverse set of accepting states. In addition, the accepting states are not known in advance and no state-based heuristic can be used; a formula-based heuristic or a deadlock heuristic must be used instead. On the other hand, in the second phase (acohg2) the search must be guided towards one concrete state and, in this case, a state-based heuristic like the Hamming distance or the finite state machine distance is more suitable.
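Eq. (6) can be transcribed almost literally, as in the following sketch; the predicate contains_cycle and the argument layout are assumptions of the example.

    # An almost literal transcription of Eq. (6).
    def fitness(a_len, pi_len, h_last, in_objective, contains_cycle,
                lam_ant=40, p_p=1000.0, p_c=1000.0):
        length = pi_len + a_len                # |pi + a^k|
        if in_objective:
            return length
        penalty = p_p                          # path did not reach an objective node
        if contains_cycle:
            penalty += p_c * (lam_ant - a_len) / (lam_ant - 1)  # shorter cycles cost more
        return length + h_last + penalty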
4.3. Improvement Using SCCs
When we are searching for an execution that violates an LTL property, we can make the search more efficient by taking into account the classification of the never claim SCCs. The improvements are located in two places of ACOhg-mc. During the first phase, in which accepting states are searched for, those accepting states that belong to an N-SCC of the never claim are ignored. The reason is that an accepting state in an N-SCC cannot be part of an accepting cycle. This way, we reduce the number of accepting states to be explored in the second phase. The second improvement is located in the computation of the successors of a state (line 10 in Algorithm 1), in both the first and the second phase of ACOhg-mc. When the successors are computed, ACOhg checks whether they are included in the path that the ant has traversed up to that moment. If they are, the state is not considered as the next node to visit, since the ant would build a cycle. The improvement consists in checking whether this cycle lies in an F-SCC. This can be easily checked by determining whether the state that closes the cycle is in an F-SCC of the never claim. If it is, then an accepting cycle has been found and the global search stops. The advantages of these improvements depend on the structure of the LTL formula and the model to check. We may notice no advantage in some cases, especially when the number of N-SCCs and F-SCCs is small. However, the computational cost of the improvements is negligible, since it is possible to check the kind of SCC associated to a state in constant time.
5. Experiments
In this section we present some results obtained with our ACOhg-mc algorithm. For the experiments we have selected ten models from the BEEM benchmark [31] that are
presented in the following section. After that, we discuss the parameters of the algorithm used in the experiments in Section 5.2. In Section 5.3. we compare the results obtained with ACOhg-mc against NDFS and INDFS. Next, in Section 5.4. we study the influence of the partial order reduction technique on the results.
5.1. Promela Models
BEEM (BEnchmarks for Explicit Model checking) is a benchmark of models by Radek Pelánek [31] that can be found at http://anna.fi.muni.cz/models. The benchmark includes more than 50 parameterized models (a total of 300 concrete models) together with their correctness properties. The main goal of BEEM is to serve as a standard for the comparison of different model checking algorithms. In order to perform high quality experimental evaluation, researchers need tools in which they can implement model checking techniques (such as HSF-SPIN, which we use) and benchmark sets of models which can be used for comparisons (this is the contribution of BEEM). The models of BEEM are specified in the DVE language [30] (the language used in the DiVinE model checker), but Promela versions of some models can be found in the benchmark. These Promela specifications were automatically generated from the DVE models. This translation keeps the semantics of the models but, as stated on the BEEM web site, the state space is not exactly the same as in the DVE models. We have selected 10 out of the 300 concrete models of the benchmark. For this selection we have considered only those models for which the complete state space has not been explored due to memory constraints (this information is available on the web site). We proceed in this way because ACOhg-mc has been designed to work on large models in which an exhaustive search is not applicable due to the large amount of required memory. There are 49 models for which the state space exploration is incomplete, and we have selected 10 of them. All these models violate a property. In Table 2 we present the models with some information about them: lines of code, number of processes, the kind of property violated, and the abbreviation we will use below.

Table 2. Promela models used in the experiments (from the BEEM benchmark).

Model                Abbrv.   LoC    Processes   Property
driving phils.5      dri      142    4           ♦p
iprotocol.7          ipr      188    6           ((□♦p) ∧ (□♦q)) → (□♦t)
leader filters.6     lea      222    5           ♦p
peterson.7           pet      155    5           □(!p → ♦q)
public subscribe.5   pub      618    11          □(p → ♦(q ∨ t))
train-gate.7         tra      307    11          □(p → ♦q)
elevator.5           ele      277    7           □(p → ♦q)
firewire link.6      fir      1229   11          deadlock
schedule world.3     sch      125    1           deadlock
phils.8              phi      323    16          deadlock
The first model, driving phils.5, is composed of several processes accessing several resources. iprotocol.7 is an optimized sliding window protocol. leader filters.6 is a leader election algorithm based on filters. The fourth model, peterson.7, is Peterson's mutual exclusion algorithm for more than two processes (five in this case). public subscribe.5 is a publish/subscribe notification protocol. train-gate.7 is a simple controller of a train gate. elevator.5 is an elevator controller for a building with six floors. firewire link.6 is the link layer protocol of the IEEE-1394 standard. schedule world.3 is a scheduler for machines. Finally, phils.8 is a model of Dijkstra's dining philosophers problem with 16 philosophers.
5.2. Configuration of the Experiments
The parameters used in the experiments for ACOhg-mc in the two phases are shown in Table 3. These parameters have been selected according to the recommendations in [1]. As mentioned in Section 4., in the first phase we use an explorative configuration (ξ = 0.7, λant = 40), while in the second phase the configuration is adjusted to search in the region near the accepting state found (intensification). These parameters could probably be tuned to improve the efficiency and the efficacy of the search. However, tuning requires time, and this time must be taken into account when the algorithm is applied, especially if we want to compare it against parameter-free algorithms like NDFS or INDFS. In this work we do not tune the parameters of the algorithm; we simply run it with the parameters recommended in the literature for ACOhg. This way, the tuning time is zero.

Table 3. Parameters for ACOhg-mc

Parameter   First phase       Second phase
msteps      100               100
colsize     10                20
λant        40                4
σs          4                 4
ι           10                10
ξ           0.7               0.5
a           5                 5
ρ           0.2               0.2
τ0^min      0.1               0.1
τ0^max      10.0              10.0
τmax        1/(ρ·f(abest))    1/(ρ·f(abest))
τmin        τmax/a            τmax/a
α           1.0               1.0
β           2.0               2.0
pp          1000              1000
pc          1000              1000
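For illustration only, the two columns of Table 3 can be thought of as two configuration records that share most values; the following sketch encodes them as Python dictionaries (the parameter names are transliterations of the paper's symbols, not identifiers from the actual implementation).

```python
# Shared parameter values of Table 3.
BASE = dict(msteps=100, sigma_s=4, iota=10, a=5, rho=0.2,
            tau0_min=0.1, tau0_max=10.0, alpha=1.0, beta=2.0,
            pp=1000, pc=1000)

PHASE1 = dict(BASE, colsize=10, lambda_ant=40, xi=0.7)  # exploration
PHASE2 = dict(BASE, colsize=20, lambda_ant=4,  xi=0.5)  # intensification

def pheromone_bounds(rho, f_best, a):
    """Bounds from Table 3: tau_max = 1/(rho*f(a_best)), tau_min = tau_max/a."""
    tau_max = 1.0 / (rho * f_best)
    return tau_max, tau_max / a
```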
With respect to the heuristic information, in the first phase of the search we use Hϕ (the formula-based heuristic) when the objective is to find accepting states, and Hap when the objective is to find a deadlock state. In the second phase we use the distance of finite state machines, Hfsm. ACOhg-mc has been implemented inside the HSF-SPIN model checker [18]. In this way, we can use the HSF-SPIN implementation of the aforementioned heuristic functions and, at the same time, all the existing machinery for parsing and interpreting Promela models. We need to perform several independent runs in order to obtain quantitative information about the behaviour of the algorithm. We perform 100 independent runs of ACOhg-mc to obtain high statistical confidence, and we report the mean and the standard deviation of these runs. The machine used in the experiments is a Pentium IV at 2.8 GHz with 512 MB of RAM running Linux with kernel version 2.4.19-4GB. In all the experiments the maximum memory assigned to the algorithms is 512 MB: when a process exceeds this memory, it is automatically stopped. We do this in order to avoid a large amount of data flowing from/to secondary memory, which could significantly distort the CPU time required for the search.
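The selection of the heuristic can be summarized in a few lines; in this sketch Hphi, Hap, and Hfsm are placeholders for the formula-based, deadlock-oriented, and finite state machine distance heuristics mentioned above, not actual HSF-SPIN identifiers.

```python
def heuristic_for(phase, objective, Hphi, Hap, Hfsm):
    """Pick the heuristic used to guide the ants in each phase."""
    if phase == 1:
        # First phase: search for accepting states (or a deadlock).
        return Hap if objective == "deadlock" else Hphi
    # Second phase: guide the ants back to the accepting state found.
    return Hfsm
```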
5.3. Comparison against Exhaustive Techniques
In this first experiment we compare the results obtained with ACOhg-mc against a classical algorithm used for finding errors in concurrent systems, NDFS, and an improved version of NDFS that takes into account the SCCs of the never claim, INDFS. Both algorithms are deterministic, so we perform a single run of each. For the models in which the objective is to find a deadlock we only use NDFS, since there is no never claim whose SCCs INDFS could exploit. In Table 4 we show the hit rate, the length of the error trails, the memory required (in Kilobytes), and the run time (in milliseconds) of the three algorithms. For ACOhg-mc we show the average and the standard deviation of the results obtained in the 100 independent runs. The best results are the maximum values for the hit rate and the minimum values for the rest of the measures. We also show the results of a statistical test (with level of significance α = 0.05) in order to check whether there exist statistically significant differences between ACOhg-mc and the exhaustive algorithms. A plus sign means that the difference is significant and a minus sign means that it is not. For the hit rate we use a Westlake-Schuirmann test of equivalence of two independent proportions; for the rest of the measures we use the one-sample Wilcoxon signed rank test, because we compare one sample (the results of ACOhg-mc) with one single value (the results of NDFS and INDFS) [33].
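As an illustration of the numerical comparison, a one-sample Wilcoxon signed rank test can be performed by shifting the 100 ACOhg-mc measurements by the single deterministic value; this hedged sketch assumes SciPy is available (scipy.stats.kruskal would play the analogous role for the two-sample comparison of Section 5.4.).

```python
from scipy.stats import wilcoxon

def differs_significantly(acohg_samples, deterministic_value, alpha=0.05):
    """One-sample Wilcoxon signed rank test against a single value.

    Returns True ('+' in Table 4) when the difference between the
    ACOhg-mc sample and the NDFS/INDFS value is significant at 'alpha'.
    """
    shifted = [x - deterministic_value for x in acohg_samples]
    _, p_value = wilcoxon(shifted)
    return p_value < alpha
```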
Table 4. Comparison of ACOhg-mc, NDFS, and INDFS

Model  Measure      ACOhg-mc Avg. (Std.)    NDFS        Test   INDFS       Test
dri    Hit rate     100/100                 0/1         +      0/1         +
       Length       37.10 (1.45)            •           •      •           •
       Mem. (KB)    1967.04 (19.16)         •           •      •           •
       CPU (ms)     1307.80 (379.64)        •           •      •           •
ipr    Hit rate     0/100                   1/1         +      1/1         +
       Length       •                       2657.00     •      2657.00     •
       Mem. (KB)    •                       104448.00   •      104448.00   •
       CPU (ms)     •                       17470.00    •      17430.00    •
lea    Hit rate     100/100                 0/1         +      0/1         +
       Length       21.30 (1.45)            •           •      •           •
       Mem. (KB)    2057.28 (16.32)         •           •      •           •
       CPU (ms)     8255.90 (743.28)        •           •      •           •
pet    Hit rate     100/100                 1/1         −      1/1         −
       Length       113.28 (70.82)          9996.00     +      124.00      +
       Mem. (KB)    2401.96 (70.27)         6577.00     +      1889.00     +
       CPU (ms)     52686.00 (76471.00)     60.00       +      10.00       +
pub    Hit rate     100/100                 1/1         −      1/1         −
       Length       62.65 (59.52)           2101.00     +      2053.00     +
       Mem. (KB)    2744.84 (110.38)        5105.00     +      4981.00     +
       CPU (ms)     2029.40 (685.59)        60.00       +      50.00       +
tra    Hit rate     97/100                  1/1         +      1/1         +
       Length       11.26 (2.68)            131.00      +      131.00      +
       Mem. (KB)    1986.53 (12.55)         2137.00     +      2133.00     +
       CPU (ms)     71.24 (20.97)           10.00       +      20.00       +
ele    Hit rate     100/100                 1/1         −      1/1         −
       Length       10.11 (7.95)            5553.00     +      5553.00     +
       Mem. (KB)    2513.32 (313.04)        53248.00    +      53248.00    +
       CPU (ms)     5882.10 (7527.91)       610.00      +      630.00      +
fir    Hit rate     100/100                 1/1         −      n/ap.       n/ap.
       Length       30.18 (6.88)            22.00       +      n/ap.       n/ap.
       Mem. (KB)    2077.00 (0.00)          1885.00     +      n/ap.       n/ap.
       CPU (ms)     9.10 (3.49)             0.00        +      n/ap.       n/ap.
sch    Hit rate     100/100                 1/1         −      n/ap.       n/ap.
       Length       8.90 (3.67)             6.00        +      n/ap.       n/ap.
       Mem. (KB)    2376.52 (338.07)        1753.00     +      n/ap.       n/ap.
       CPU (ms)     67.90 (47.84)           0.00        +      n/ap.       n/ap.
phi    Hit rate     100/100                 0/1         +      n/ap.       n/ap.
       Length       28.96 (7.88)            •           •      n/ap.       n/ap.
       Mem. (KB)    2483.92 (221.83)        •           •      n/ap.       n/ap.
       CPU (ms)     138.00 (140.25)         •           •      n/ap.       n/ap.
Concerning the hit rate, we can observe that ACOhg-mc is the only algorithm that is able to find error paths in dri, lea, and phi. NDFS and INDFS are not able to find error paths in these models because they require more memory than is available. This is a relevant result, since NDFS is a very popular algorithm in the formal methods community for checking properties with an explicit state model checker. If we focus on the remaining models, we observe that ACOhg-mc fails to find an error path in ipr (while NDFS and INDFS are able to find one) and that its hit rate in tra is 97%. In general, the algorithm with the highest total hit rate is ACOhg-mc.

With respect to the length of the error paths, we observe that ACOhg-mc obtains shorter error paths than NDFS and INDFS with statistical significance. The only exceptions are fir and sch. The largest differences are those of ele (around 500 times shorter) and pet (around 100 times shorter). Furthermore, we limited the exploration depth of NDFS and INDFS to 10,000 in order to avoid stack overflow problems. If we allowed these algorithms to explore deeper regions, we would obtain even longer error paths with them. In fact, we ran NDFS using a depth limit of 50,000 in pet and we got an error path of 50,000 states. This means that the length of the error path shown in Table 4 for NDFS in pet is in fact a lower bound of the length that NDFS would obtain in theory. In general, ACOhg-mc obtains error paths that are shorter (with a large difference in some cases) than the ones obtained with NDFS and INDFS. This is a very important result, since short error paths are preferred: they enable the programmers to find out faster what is wrong in the concurrent system.
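For readers unfamiliar with the exhaustive baseline, the following is a hedged, self-contained sketch of nested depth-first search [25] with the depth bound discussed above; successors and accepting are assumed interfaces to the synchronous product of the model and the never claim, not HSF-SPIN calls.

```python
def ndfs(initial, successors, accepting, depth_limit=10_000):
    """Return True iff an accepting cycle is reachable from 'initial'."""
    blue, red = set(), set()

    def outer(s, depth):
        blue.add(s)
        if depth < depth_limit:        # the bound that avoids stack overflow
            for t in successors(s):
                if t not in blue and outer(t, depth + 1):
                    return True
        # Post-order: launch the nested search when backtracking from an
        # accepting state; a path back to 's' closes an accepting cycle.
        return accepting(s) and inner(s, s)

    def inner(seed, s):
        red.add(s)
        for t in successors(s):
            if t == seed:
                return True
            if t not in red and inner(seed, t):
                return True
        return False

    return outer(initial, 0)
```

A production implementation would use an explicit stack rather than Python recursion; the depth bound plays, in this sketch, the same stack-overflow-avoiding role described in the text.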
If we focus on the computational resources, we observe that ACOhg-mc requires between 2 MB and 3 MB of memory in all the models (the maximum is 2.7 MB in pub). This behaviour contrasts with NDFS and INDFS, for which the memory required depends strongly on the model being checked; in some cases the available memory is not enough to complete the search, as happens in dri, lea, and phi. We must clarify two issues here. First, the failure of ACOhg-mc in ipr is not due to memory constraints (the memory required in ipr was 1953 KB) but to its stochastic nature. Second, NDFS and INDFS do not permanently store the visited states in memory (as other algorithms like Breadth First Search or A∗ do); they only store the states belonging to a branch of the search as they are needed. In fact, a usual property of DFS-like algorithms is that they require a low amount of memory but obtain a long path to the objective state. Even so, in Table 4 we can see that ACOhg-mc outperforms the memory requirements of both NDFS and INDFS.

With respect to the time required for the search, NDFS and INDFS are faster than ACOhg-mc in all the models in which they find an error. The mechanisms included in ACOhg-mc in order to find short error paths with a high hit rate and a low amount of memory extend the time required for the search. The maximum difference is around 50 seconds (in pet), which is not large from the tester's point of view if we take into account that the error path obtained is much shorter (one hundred times). In summary, ACOhg-mc is able to find shorter error paths than NDFS and INDFS using less memory. Furthermore, the results shown in this section suggest that ACOhg-mc is more effective than NDFS and INDFS (higher hit rate). However, in order to clearly support this last claim, more experimentation is required.
5.4. ACOhg-mc and Partial Order Reduction
In our second experiment we analyze the influence of the POR technique on the results of ACOhg-mc. In [9] the POR technique was used in combination with ACOhg to find safety property violations in concurrent systems, and the results showed that the combination is beneficial for the search. The results of this section extend those of [9] in two ways. First, here we use POR for general properties (not only safety properties). Second, we do not know the structure of the selected BEEM models and, thus, we do not know whether their state spaces can be reduced or not. In Table 5 we present the results of applying ACOhg-mc with and without the POR technique to the selected BEEM models (we have omitted ipr here because it is not relevant for the discussion). The information reported is the same as in the previous section.
We also show the results of a statistical test (with level of significance α = 0.05) in order to check whether there exist statistically significant differences (last column). A plus sign means that the difference is significant and a minus sign means that it is not. For the hit rate we use, as before, a Westlake-Schuirmann test of equivalence of two independent proportions; for the rest of the measures we use this time a Kruskal-Wallis test, because we are comparing two samples [33].

Table 5. Influence of POR on the results of ACOhg-mc

Model  Measure      ACOhg-mc Avg. (Std.)    ACOhg-mc+POR Avg. (Std.)   Test
dri    Hit rate     100/100                 100/100                    −
       Length       37.10 (1.45)            37.24 (1.49)               −
       Mem. (KB)    1967.04 (19.16)         2012.08 (35.25)            +
       CPU (ms)     1307.80 (379.64)        1344.80 (370.73)           +
lea    Hit rate     100/100                 100/100                    −
       Length       21.30 (1.45)            21.60 (1.26)               +
       Mem. (KB)    2057.28 (16.32)         2110.04 (54.77)            +
       CPU (ms)     8255.90 (743.28)        7843.10 (599.94)           +
pet    Hit rate     100/100                 100/100                    −
       Length       113.28 (70.82)          80.52 (5.17)               +
       Mem. (KB)    2401.96 (70.27)         2386.68 (12.73)            +
       CPU (ms)     52686.00 (76471.00)     1772.70 (15789.89)         +
pub    Hit rate     100/100                 100/100                    −
       Length       62.65 (59.52)           53.26 (24.99)              +
       Mem. (KB)    2744.84 (110.38)        2301.88 (149.78)           +
       CPU (ms)     2029.40 (685.59)        44.20 (23.71)              +
tra    Hit rate     97/100                  98/100                     −
       Length       11.26 (2.68)            11.79 (3.83)               −
       Mem. (KB)    1986.53 (12.55)         2024.20 (26.77)            +
       CPU (ms)     71.24 (20.97)           67.86 (25.32)              −
ele    Hit rate     100/100                 100/100                    −
       Length       10.11 (7.95)            11.79 (14.34)              −
       Mem. (KB)    2513.32 (313.04)        2592.72 (348.88)           +
       CPU (ms)     5882.10 (7527.91)       6558.90 (8361.86)          −
fir    Hit rate     100/100                 100/100                    −
       Length       30.18 (6.88)            30.50 (6.30)               −
       Mem. (KB)    2077.00 (0.00)          2413.00 (0.00)             +
       CPU (ms)     9.10 (3.49)             13.80 (4.85)               +
sch    Hit rate     100/100                 100/100                    −
       Length       8.90 (3.67)             9.00 (5.77)                −
       Mem. (KB)    2376.52 (338.07)        2294.04 (327.34)           +
       CPU (ms)     67.90 (47.84)           65.40 (45.86)              −
phi    Hit rate     100/100                 100/100                    −
       Length       28.96 (7.88)            27.64 (8.96)               −
       Mem. (KB)    2483.92 (221.83)        2458.24 (196.01)           +
       CPU (ms)     138.00 (140.25)         158.30 (156.41)            −
With respect to the hit rate, we can observe that the POR technique has no influence on the effectiveness of the algorithm. The only model in which the results differ is tra, but the difference is not significant.

If we focus on the length of the error trails, we observe only three statistically significant differences, in lea, pet, and pub. The first one is a small difference (and the test yielded only marginal significance, with a p-value of 0.0433), but the other two are large differences, and they suggest that the length of the error trails can be reduced using POR. This is an interesting result since, in general, the reduction of the construction graph performed by POR does not preserve the optimal paths. That is, states belonging to the optimal error paths can be removed by POR and, thus, the optimal error path in the reduced model can be longer than that of the original model. However, when the POR technique does not remove states belonging to an optimal path, the reduction of the exploration graph can help the algorithm to find a shorter path (the toy sketch at the end of this section illustrates how POR removes intermediate states).

Concerning the memory required for the search, we cannot observe large differences. In some models, like dri and lea, ACOhg-mc requires more memory using POR (with statistical significance), but in others, like pet and pub, less memory is required. Why is memory not always reduced? The objective of POR is to reduce the state space; with this reduction, model checkers can always explore the complete state space using less memory, so we expect a reduction in memory when POR is used. However, if the model does not fulfil the property, a counterexample must exist, and the model checker stops when this counterexample is found. We measure here the memory required up to the discovery of this counterexample, and this memory might not be reduced; in fact, it can even increase, as can be observed in dri. This is not a particular behaviour of ACOhg-mc; it can also be observed in NDFS and INDFS, as shown in Table 6. Especially interesting is the pet model, for which POR is beneficial if ACOhg-mc is used and very damaging if NDFS is used instead.

Finally, with respect to the time required for the search, the statistically significant differences in Table 5 suggest that an error path is found earlier when POR is used. The large differences found in pet and pub and the small one in lea support this proposition. In contrast, dri and fir support the opposite hypothesis, but in these cases the differences are smaller. Thus, the general conclusion is that the execution time is reduced with POR when ACOhg-mc is used.

In summary, we have observed in this section that POR changes the state space in a way that, in general, is beneficial when ACOhg-mc is used to search for property violations. Memory, computation time, and error path length can be noticeably reduced using POR. However, for some models a small increase in these measures can be observed.
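The toy sketch below (all names illustrative, unrelated to HSF-SPIN's POR implementation) shows the effect exploited by POR: two independent transitions commute, so exploring a single interleaving reaches the same final state while the intermediate state of the pruned interleaving disappears, which is precisely how states on an optimal error path can be lost, or how a smaller graph can become easier to search.

```python
from itertools import permutations

def reachable(start, transitions, reduce_commuting=False):
    """States visited when firing each transition exactly once.

    With reduce_commuting=True only one interleaving is explored,
    mimicking the effect of a partial order reduction on two
    independent transitions.
    """
    seen = {start}
    orders = [tuple(transitions)] if reduce_commuting else permutations(transitions)
    for order in orders:
        state = start
        for fire in order:
            state = fire(state)
            seen.add(state)
    return seen

inc_x = lambda s: (s[0] + 1, s[1])  # the two transitions touch different
inc_y = lambda s: (s[0], s[1] + 1)  # variables, hence they are independent

full = reachable((0, 0), [inc_x, inc_y])                            # 4 states
reduced = reachable((0, 0), [inc_x, inc_y], reduce_commuting=True)  # 3 states
assert (0, 1) in full and (0, 1) not in reduced  # pruned intermediate state
```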
6. Discussion
In this section we discuss the utility of our proposal from the software engineer's point of view, giving some guidelines that could help practitioners decide when to use ACOhg-mc for searching for errors in concurrent systems.

First of all, it must be clear that ACOhg-mc can find short counterexamples in faulty concurrent systems, but it cannot be used to completely ensure that a concurrent system satisfies a given property. Thus, ACOhg-mc should be used in the first and middle stages of software development and after any maintenance modification made to the concurrent system. In these phases, errors are expected to exist in any concurrent software.

In spite of the previous considerations, ACOhg-mc can also be used to assert with high probability that the software satisfies a given desirable property (perhaps obtained from its specification). In this case, it can be used at the end of the software life cycle. This is similar to stating that the software is "probably correct" after a testing phase in which the software has been run on a set of test cases.
Table 6. Influence of POR on the results of NDFS and INDFS

Model  Measure      NDFS      NDFS+POR   INDFS     INDFS+POR
ipr    Hit rate     1/1       1/1        1/1       1/1
       Length       2657      2657       2657      2657
       Mem. (KB)    104448    60416      104448    60416
       CPU (ms)     17470     7700       17430     7670
pet    Hit rate     1/1       1/1        1/1       1/1
       Length       9996      9914       124       132
       Mem. (KB)    6577      448512     1889      1888
       CPU (ms)     60        39110      10        0
pub    Hit rate     1/1       1/1        1/1       1/1
       Length       2101      2386       2053      2363
       Mem. (KB)    5105      6571       4981      6351
       CPU (ms)     60        100        50        110
tra    Hit rate     1/1       1/1        1/1       1/1
       Length       131       131        131       131
       Mem. (KB)    2137      2134       2133      2134
       CPU (ms)     10        10         20        10
ele    Hit rate     1/1       1/1        1/1       1/1
       Length       5553      5553       5553      5553
       Mem. (KB)    53248     53248      53248     53248
       CPU (ms)     610       950        630       950
fir    Hit rate     1/1       1/1        n/ap.     n/ap.
       Length       22        22         n/ap.     n/ap.
       Mem. (KB)    1885      2281       n/ap.     n/ap.
       CPU (ms)     0         10         n/ap.     n/ap.
sch    Hit rate     1/1       1/1        n/ap.     n/ap.
       Length       6         6          n/ap.     n/ap.
       Mem. (KB)    1753      1753       n/ap.     n/ap.
       CPU (ms)     0         0          n/ap.     n/ap.
Unlike this, in critical systems (like airplane controller software) an exhaustive algorithm must be used in the final stages to verify that the software really satisfies the specification.

We have stated in the experimental section what the main advantages of using ACOhg-mc over exhaustive techniques (such as NDFS) in the search for property violations are: shorter error paths can be obtained with a higher probability and less memory. But what about the drawbacks? The main drawback we have found from the point of view of the applicability of ACOhg-mc is the large number of parameters of the algorithm. On the one hand, these parameters make ACOhg-mc more flexible, since it is possible to tackle models with different features by changing the parameterization. On the other hand, software practitioners have no time to adjust the parameters of an algorithm; they want a robust algorithm that works well in most situations with minimum cost. In this sense, we believe that a parameterization study must be a priority in the following steps of this research, although general non-parameterized solvers for a problem cannot be expected, either in practice or in theory. In fact, from the experiments performed for this and previous work we have outlined a set of rules for assigning values to the parameters (some of them are published in [1]). We also have some clues as to how the number of parameters of ACOhg-mc could be largely reduced (work in progress).
7. Conclusion
We have presented a proposal based on ant colony optimization for finding property violations in concurrent systems. This problem is of capital importance in the development of software for critical systems. Our proposal, called ACOhg-mc, is a stochastic algorithm that uses heuristic information based on the property to check in order to guide the search. We have shown the performance of the proposal on a set of ten models from the BEEM benchmark. We have compared ACOhg-mc against two algorithms used in explicit model checking, NDFS and INDFS. The results show that ACOhg-mc is able to outperform both algorithms in efficacy and efficiency for most of the models: it requires a very low amount of memory and it is able to find short error trails. We have also analyzed the influence of the POR technique on the results obtained by ACOhg-mc. The results show that, in general, the POR technique can reduce the memory, the computation time, and the length of the error trails found by ACOhg-mc.

As future work we plan to combine ACOhg-mc with other techniques for reducing the state space, such as symmetry reduction. We have also observed in a preliminary (unpublished) study that a version of ACOhg-mc that does not use pheromone trails to guide the search obtains competitive results and requires even less memory. An additional advantage of such an algorithm is that it has fewer parameters than a version using pheromone trails. We will study the advantages and limitations of this alternative.
Acknowledgements

We would like to thank Walter J. Gutjahr for his revision of this work and his constructive and helpful comments and suggestions. This work has been partially funded by the Spanish Ministry of Education and Science and FEDER under contract TIN2005-08818-C04-01 (the OPLINK project). It has also been partially funded by the Spanish Ministry of Industry under contract FIT-330225-2007-1 (the European EUREKA-CELTIC project CARLINK).
References

[1] Enrique Alba and Francisco Chicano. ACOhg: Dealing with huge graphs. In Proc. of GECCO, pages 10–17, 2007.
[2] Enrique Alba and Francisco Chicano. Finding safety errors with ACO. In Proc. of GECCO, pages 1066–1073, 2007.
[3] Bowen Alpern and Fred B. Schneider. Defining liveness. Inform. Proc. Letters, 21:181–185, 1985.
[4] P. Ammann, P. Black, and W. Majurski. Using model checking to generate tests from specifications. In Proceedings of the 2nd IEEE International Conference on Formal Engineering Methods, pages 46–54, Brisbane, Australia, December 1998. IEEE Computer Society Press.
[5] Saddek Bensalem, Vijay Ganesh, Yassine Lakhnech, César Muñoz, Sam Owre, Harald Rueß, John Rushby, Vlad Rusu, Hassen Saïdi, N. Shankar, Eli Singerman, and Ashish Tiwari. An overview of SAL. In C. Michael Holloway, editor, Fifth NASA Langley Formal Methods Workshop, pages 187–196, Hampton, VA, 2000.
[6] C. Blum and A. Roli. Metaheuristics in combinatorial optimization: Overview and conceptual comparison. ACM Computing Surveys, 35(3):268–308, 2003.
[7] Dragan Bošnački, Stefan Leue, and Alberto Lluch-Lafuente. Partial-order reduction for general state exploring algorithms. In SPIN 2006, volume 3925 of Lecture Notes in Computer Science, pages 271–287, 2006.
[8] Jerry R. Burch, Edmund M. Clarke, David E. Long, Kenneth L. McMillan, and David L. Dill. Symbolic model checking for sequential circuit verification. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 13(4), April 1994.
[9] Francisco Chicano and Enrique Alba. Ant colony optimization with partial order reduction for discovering safety property violations in concurrent models. Information Processing Letters, 2007. (to appear).
[10] Francisco Chicano and Enrique Alba. Finding liveness errors with ACO. In Proceedings of the Conference on Evolutionary Computation, 2008.
[11] Francisco Chicano and Enrique Alba. Searching for liveness property violations in concurrent systems with ACO. In Proceedings of the Genetic and Evolutionary Computation Conference, 2008.
[12] E. Clarke, R. Enders, T. Filkorn, and S. Jha. Exploiting symmetry in temporal logic model checking. Formal Methods in System Design, 9(1-2):77–104, August 1996.
[13] E. M. Clarke, E. A. Emerson, and A. P. Sistla. Automatic verification of finite-state concurrent systems using temporal logic specifications. ACM Trans. Program. Lang. Syst., 8(2):244–263, 1986.
[14] Edmund M. Clarke and E. Allen Emerson. Design and synthesis of synchronization skeletons using branching-time temporal logic. In Logic of Programs, Workshop, pages 52–71, London, UK, 1982. Springer-Verlag.
[15] Edmund M. Clarke, Orna Grumberg, and Doron A. Peled. Model Checking. The MIT Press, January 2000.
[16] M. Dorigo and L. M. Gambardella. Ant colony system: A cooperative learning approach to the traveling salesman problem. IEEE Transactions on Evolutionary Computation, 6(4):317–365.
[17] Marco Dorigo and Thomas Stützle. Ant Colony Optimization. The MIT Press, 2004.
[18] Stefan Edelkamp, Alberto Lluch-Lafuente, and Stefan Leue. Directed explicit model checking with HSF-SPIN. In Lecture Notes in Computer Science, 2057, pages 57–79. Springer, 2001.
[19] Stefan Edelkamp, Alberto Lluch-Lafuente, and Stefan Leue. Protocol verification with heuristic search. In AAAI-Spring Symposium on Model-based Validation Intelligence, pages 75–83, 2001.
[20] Stefan Edelkamp, Stefan Leue, and Alberto Lluch-Lafuente. Directed explicit-state model checking in the validation of communication protocols. Intl. Jnl. of Soft. Tools for Tech. Transfer, 5:247–267, 2004.
[21] R. Gerth, D. Peled, M.Y. Vardi, and P. Wolper. Simple on-the-fly automatic verification of linear temporal logic. In Proceedings of the IFIP/WG6.1 Symposium on Protocol Specification, Testing, and Verification (PSTV95), pages 3–18, Warsaw, Poland, June 1995.
[22] Patrice Godefroid and Sarfraz Khurshid. Exploring very large state spaces using genetic algorithms. Intl. Jnl. on Soft. Tools for Tech. Transfer, 6(2):117–127, 2004.
[23] A. Groce and W. Visser. Heuristics for model checking Java programs. Intl. Jnl. on Software Tools for Technology Transfer, 6(4):260–276, 2004.
[24] Charles Anthony Richard Hoare. An axiomatic basis for computer programming. Communications of the ACM, 12(10):576–580, 1969.
[25] G. J. Holzmann, D. Peled, and M. Yannakakis. On nested depth first search. In Proc. Second SPIN Workshop, pages 23–32, 1996.
[26] Gerald J. Holzmann. The SPIN Model Checker. Addison-Wesley, 2004.
[27] Alberto Lluch-Lafuente, Stefan Leue, and Stefan Edelkamp. Partial order reduction in directed model checking. In 9th International SPIN Workshop on Model Checking Software, Grenoble, April 2002. Springer.
[28] Kenneth L. McMillan. Symbolic Model Checking. An Approach to the State Explosion Problem. PhD thesis, Carnegie Mellon University, 1992.
[29] Christoph C. Michael, Gary McGraw, and Michael A. Schatz. Generating software test data by evolution. IEEE Trans. on Soft. Eng., 27(12):1085–1110, 2001.
[30] Radek Pelánek. Web portal for benchmarking explicit model checkers. Technical Report FIMU-RS-2006-03, Faculty of Informatics, Masaryk University Brno, 2006.
[31] Radek Pelánek. BEEM: Benchmarks for explicit model checkers. In Proceedings of the SPIN Workshop, volume 4595 of Lecture Notes in Computer Science, pages 263–267. Springer, 2007.
[32] Doron Peled. Combining partial order reductions with on-the-fly model-checking. Formal Methods in System Design, 8(1):39–64, January 1996.
[33] David J. Sheskin. Handbook of Parametric and Nonparametric Statistical Procedures. Chapman & Hall/CRC, 2007.
[34] T. Stützle and H. H. Hoos. MAX-MIN ant system. Future Generation Computer Systems, 16(8):889–914, 2000.

Reviewed by Walter J. Gutjahr (University of Vienna)
INDEX A absolute zero, 141 absorption, 141 academic, 58, 176, 181, 182 acceptors, 128 access, 7, 8, 9, 16, 60, 61, 62, 182 accidental, 26 accounting, 166 accuracy, 94, 118 achievement, 14, 74, 79, 176, 179 ACM, 10, 11, 12, 35, 64, 65, 67, 88, 90, 91, 172, 173, 191, 219, 220 ACS, 206 activation, 115, 116 activation energy, 115, 116 actuators, 15 ad hoc, 158, 177 adaptability, 90 adaptation, 44, 45, 46, 47 administration, 70, 75 advertisement, 74 aerospace, 153 affective reactions, 90 age, 82, 127, 180 agents, 61, 66, 187, 188 aggregation, 51 aid, viii, 37, 38, 42, 57, 61, 95, 101, 191 aiding, 55, 57, 61, 62 air-traffic, 188 algorithm, vii, x, xi, 7, 8, 94, 95, 102, 103, 104, 105, 106, 107, 108, 109, 114, 115, 120, 122, 123, 191, 193, 195, 197, 198, 200, 202, 203, 204, 205, 206, 207, 208, 210, 211, 212, 213, 215, 217, 218 alloys, 122 alpha, 154 alphabets, vii, 10 alternative, 50, 54, 135, 137, 194, 218 alternatives, 63, 87, 89, 200 aluminium, 116, 122 aluminium alloys, 122 amplitude, 141
Amsterdam, 66 animal tissues, 95 animations, 137, 140 anisotropy, 113 annealing, 103 ants, 203, 204, 205, 206, 207, 209 application, viii, 16, 17, 18, 19, 26, 27, 29, 38, 56, 57, 58, 59, 61, 63, 78, 80, 81, 83, 85, 86, 95, 105, 110, 121, 126, 139, 140, 152, 154, 156, 159, 161, 164, 165, 166, 170, 183, 195 architecture design, 83, 87, 91 argument, 6 Aristotle, 125 arithmetic, 4 artificial intelligence, ix, 94, 95, 102, 103 Asia, 91 assessment, 84, 92, 155, 179 assets, ix, 39, 69, 70, 71, 72, 77, 79, 80, 81, 82, 85, 86, 87, 158 assignment, 40, 47, 53 assimilation, 44 assumptions, 127, 177, 178 asymptotically, 7 asynchronous, 34, 199 Atlantic, 68 atoms, 130, 134, 138, 141, 142, 143, 144, 145 attitudes, 176 Aurora, 37, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68 Australia, 65, 68, 175, 183, 218 authentication, 166 authority, 79 automata, 196, 197 automation, 157 availability, 156 awareness, 73, 176, 179
B barrier, viii, 13, 14, 19, 20, 132 barriers, 126 Bayesian, 84 behavior, ix, 17, 61, 69, 77, 79, 80, 86, 90, 189, 191
224
Index
behaviours, 95 Belgium, 21, 22 beliefs, 178 benchmark, xi, 106, 152, 193, 195, 210, 211, 218 benchmarking, 220 bending, 122 benefits, vii, ix, 17, 19, 27, 30, 37, 38, 39, 63, 69, 74, 75, 76, 77, 90 biomechanics, ix, 93, 96, 121 biomolecular, 152 bit vectors, vii, 3, 4, 7, 8, 9 black-box, 31, 35, 169 blame, 90 blocks, 4, 5, 6, 9, 145, 160 bonds, 145 bonus, 159 Boston, 22 bottlenecks, 39 bottom-up, 153 bounds, vii, 3, 4, 7, 9, 10, 11, 170 brainstorming, 39 branching, 5, 196 Brno, 220 building blocks, 31, 32, 145, 161 buildings, 188 business environment, 73 business model, 158 bust, x, 94, 95, 103, 120 buttons, 142
C CAD, 157 CAM, 157 Canada, 10, 64, 67, 69, 90 capacity, 127, 152, 201 carbon, 143, 144 carbon atoms, 143 case study, viii, 25, 27, 29, 33, 35, 38, 41, 57, 60, 73, 75, 78, 81, 172, 173 catalyst, 74 category b, 51 cell, 5, 10, 187 channels, 127, 128 chemical bonds, 134 chemical engineering, 152 chemicals, 203 children, 178, 180, 181 chiral, 143, 144 chiral center, 143 chirality, 136, 144 chromosome, 105 classes, 33, 95, 96, 101, 104, 161, 162, 164, 166, 168, 187, 188, 189, 190 classical, 101, 104, 105, 136, 137, 139, 198, 212 classification, 48, 49, 50, 51, 52, 58, 87, 153, 166, 199, 209 classroom, 135, 150, 175, 177
classrooms, 129 clients, 154 Co, 21 code generation, 15 coding, 134, 188 cognition, 149 cognitive, 80, 126, 127, 128, 131, 132, 134, 135, 137, 138, 144, 146, 149 cognitive load, 135, 137, 144, 146 cognitive perspective, 134, 138 cognitive psychology, 127 cognitive science, 126, 131 cognitive system, 127, 128, 134, 137 collaboration, 78, 84, 126, 148, 155, 161, 163, 191 Collaboration, 155 collaborative learning experiences, 181 college students, x, 125 colors, 144 communication, 28, 29, 30, 39, 71, 72, 78, 79, 80, 81, 163, 166, 177, 178, 179, 182, 191, 203, 220 communities, 67, 72, 94, 128, 129, 150 community, vii, viii, 13, 14, 15, 20, 26, 67, 76, 81, 120, 129, 131, 158, 159, 178, 195, 213 commutativity, 199 compensation, 79 competence, 132, 135, 149 competition, 73 competitive advantage, 74 compiler, viii, 25, 28, 29, 30, 31 complement, 137 complex systems, x, 151, 171, 188 complexity, viii, ix, x, 7, 10, 13, 14, 15, 19, 20, 26, 34, 71, 87, 93, 94, 116, 132, 151, 152, 153, 157, 159, 170, 171, 176, 179, 180 compliance, 169 components, viii, x, 13, 14, 16, 17, 18, 19, 26, 71, 72, 78, 82, 83, 85, 86, 88, 151, 152, 153, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 171, 172, 173, 188, 196, 198 composites, 123 composition, 51, 131, 161 comprehension, 61, 66, 132 computation, 4, 7, 102, 109, 172, 209, 216, 218 computational fluid dynamics, 153 computer research, 194 Computer simulation, ix, 93, 96 computer software, 183 computer technology, 175, 183 computing, 164, 165, 172 concentrates, 85 concentration, 206 conception, 73, 78, 83, 176 concrete, 15, 134, 177, 209, 210 concurrency, 17 confidence, 72, 167, 211 configuration, 19, 86, 132, 138, 143, 144, 145, 165, 166, 167, 171, 190, 209, 211 conflict, 80, 81, 90, 158 Congress, 21
Index connectivity, 130, 134, 142, 143 constraints, ix, 29, 31, 34, 71, 72, 82, 93, 98, 101, 104, 115, 210, 214 construction, 26, 31, 128, 146, 160, 178, 203, 205, 206, 208, 209, 215 constructionist, 181, 183 constructivist, 126, 127, 128, 129, 137, 146, 150, 180 consultants, 48 consumer electronics, 69, 70 contiguity, 128 continuity, 104, 179 control, viii, 15, 18, 20, 25, 27, 29, 60, 72, 79, 82, 106, 118, 127, 142, 157, 181, 187, 188, 194 convergence, 104, 109, 112, 119 convex, 102, 107 cooperative learning, 219 coordination, ix, 69, 74, 79, 80, 81, 85 COPA, 91 Coping, 20 copper, 147 CORBA, 173 correlation, 83, 100 costs, 94, 159, 160, 169 coupling, 34, 97, 163 coverage, 169, 170, 189, 190, 200 covering, 14, 139 CPU, 4, 97, 103, 104, 120, 212, 213, 215, 217 CRC, 220 creative process, 126 creep, 96, 99 CRP, 164 crystallisation, 152 cues, 132, 133, 134, 137, 150 cultural factors, 128 culture, 51, 62, 77, 78, 79, 175 curiosity, 181 curriculum, 126, 147, 149 curriculum development, 126 curve-fitting, 97 customers, 72, 73, 74, 75, 76, 80, 86 Cybernetics, 21, 22 cycles, 18, 198, 199, 206, 207, 209
D data analysis, 153, 179 data structure, vii, viii, 3, 4, 5, 6, 7, 8, 9, 10, 12, 25, 30, 34 data transfer, 165 database, 28, 56, 58, 59, 136, 139, 141, 143, 145, 188, 191 database management, 58 decisions, 52, 57, 73, 79, 80, 86, 87, 103 decomposition, 47, 161, 164 defects, 167, 168 definition, x, 28, 48, 52, 53, 55, 56, 61, 71, 75, 76, 97, 98, 99, 112, 118, 151, 152, 155, 167, 179, 195, 199
225
deformation, 95, 96, 115, 121, 122 degrees of freedom, 30, 188 delivery, 73, 74, 156, 176, 177 demand, x, 94, 132, 175, 190 derivatives, 104, 105 designers, 126, 163 detection, 167, 190 detonation, 152 deviation, 105 discipline, 126 discontinuity, 107 discourse, 128, 178 discretization, 15, 96 disequilibrium, 99 dislocation, 115 displacement, 142 distributed computing, 172 distribution, 57, 78, 105, 171 diversity, 16, 19, 79, 82 division, 4, 79, 99 division of labor, 79 DSL, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35 duplication, 161
E earth, 152 economics, 100, 182 Education, 137, 147, 148, 149, 150, 175, 183, 184 educational background, 179 educational research, 126 educational software, x, 126, 127, 128, 129, 134, 146, 175, 176, 177, 178, 180, 181, 182, 183, 184 educators, 126, 147, 175, 176 elastic deformation, 121 elasticity, 95 election, 210 electricity, 170 electromagnetic, 141 electron, 132 electrons, 130 email, 50 employees, 38, 39, 46, 79, 81 enantiomers, 143 encapsulated, 33 encoding, 7, 12, 134 energy, 95, 112, 113, 141 engagement, 126, 127, 128, 144 enterprise, 41, 73, 78, 83 entropy, 7, 10 environment, x, 17, 18, 58, 59, 73, 75, 78, 79, 81, 102, 128, 136, 137, 139, 140, 141, 150, 151, 152, 155, 162, 169, 170, 171, 189 epistemological, 126 epistemology, 147 equipment, 19 estimating, ix, 51, 52, 55, 93 ethane, 131, 133
226
Index
Europe, 70, 72, 88, 147, 148 Europeans, 72 evaporation, 206, 207 evolution, vii, viii, 13, 14, 15, 17, 19, 64, 66, 76, 82, 89, 96, 97, 102, 104, 109, 111, 115, 116, 119, 121, 133, 158, 220 evolutionary process, 102 exclusion, 210 execution, 17, 18, 30, 71, 167, 188, 189, 190, 191, 196, 197, 198, 200, 202, 203, 208, 209, 216 exercise, 60, 79 expert, 28, 39, 126, 128, 132, 134, 191 expertise, 28, 55, 76, 161, 163 explicit knowledge, 41 external environment, 80
F facilitators, 67 failure, 99, 156, 169, 214 family, 70, 72, 73, 74, 77, 82, 83, 84, 85, 87, 88, 89, 90, 91, 92, 161 family development, 72 family members, 82 faults, 18, 156, 168, 171 February, 68 feedback, 71, 155, 182 filters, 159, 210, 211 financial support, 120 finite differences, 119 finite element method, 96, 97, 104, 122 Finland, 83 first generation, 119 fitness, 64, 105 flexibility, x, 133, 151, 152, 169, 188 flow, vii, viii, 37, 38, 39, 40, 41, 42, 43, 54, 55, 56, 57, 58, 60, 62, 63, 67, 68, 96, 115, 212 fluid, ix, 93, 96, 152, 153 focusing, 126, 127, 130, 156 food, 152 forecasting, 74, 153, 156 foreign language, 51 France, 93 free rotation, 143 freedom, 30, 188 funding, 72, 73
G gas, 115 gender, 149 gene, vii, 3, 4 generalizations, vii, 3, 4 generation, viii, 25, 29, 105, 119, 158, 166, 171, 190, 191 generators, 159 genes, 102
genetic algorithms, 103, 104, 220 genre, 182 Germany, 64, 172 global warming, 152 goals, 14, 18, 38, 72, 73, 75, 76, 77, 79, 128 goods and services, 74 government, 61 grain, 115 graph, 153, 189, 195, 198, 199, 200, 202, 203, 204, 206, 207, 208, 215, 216 Greece, 125 grid computing, 156, 160, 164 grids, 152, 153, 171 grouping, 52, 161, 162 groups, 14, 19, 67, 72, 78, 79, 80, 84, 102, 138, 139 growth, viii, 13, 73 guidance, 87 guidelines, 74, 77, 78, 79, 81, 82, 84, 146, 153, 159, 161, 167, 173, 216
H handicapped, 18, 194 handling, 83, 191 Hawaii, 66 health, 64, 182 hearing, 27 heat, 152 height, 5 heterogeneity, viii, 13, 14, 15, 16, 17, 18, 19, 166 heterogeneous, 14, 17, 18, 188 heuristic, xi, 106, 193, 195, 196, 200, 201, 202, 205, 206, 207, 208, 209, 211, 218, 220 high school, 147 high temperature, 115 higher education, 183 high-level, 26, 31 hips, 75 Holland, 121 Hong Kong, 172 human, 26, 27, 28, 40, 49, 58, 64, 80, 102, 105, 127, 128, 130, 131, 161, 194 human brain, 26 human resource management, 49 human resources, 40 hybrid, 134 hybrids, 131 hypermedia, 183 hypothesis, 216
I IBM, 165 ice, 166 ICT, 179
Index identification, ix, 16, 40, 43, 50, 57, 58, 61, 62, 81, 86, 93, 94, 98, 101, 102, 103, 104, 110, 112, 116, 119, 120, 122, 161, 162 identification problem, 101, 102, 103, 104 identity, 73, 104, 138 ideology, 83 idiosyncratic, 129 images, 132, 134, 143, 144, 145 imagination, 176 imaging, 144 immunity, 102 implementation, viii, xi, 9, 13, 14, 15, 16, 17, 18, 25, 28, 29, 30, 34, 41, 63, 76, 80, 81, 82, 83, 85, 88, 127, 146, 161, 162, 163, 164, 165, 167, 177, 188, 189, 190, 193, 199, 211 in transition, 150 incentives, 159 inclusion, 16, 17, 60, 71, 206 incompatibility, 80 incompressible, 112, 113 indexing, vii, 3, 4, 6, 7, 8, 9, 10 industrial, viii, 15, 25, 29, 84, 85, 92, 94, 173 industrial application, 25 industrial experience, 92 industrialization, 75 industry, ix, 56, 61, 69, 70, 75, 82, 84, 86, 93, 96, 110, 159 inelastic, 121, 122 infinite, 196 information processing, 128 Information System, 64, 66, 67 information systems, 49, 56, 58, 66, 67, 194 information technology, ix, 69, 70 infrared, 141 infrastructure, 39, 42, 55, 59, 68, 72, 77, 81, 82, 85, 87, 152, 153, 158, 166, 167 inheritance, 18, 19, 31, 33, 51, 164 inherited, 19, 31 initial state, 71, 202, 208 innovation, 75, 76, 78, 79, 81, 89, 148, 151 inorganic, 139, 141 inspection, 83, 129, 167, 169 inspections, 167 institutionalization, ix, 69, 77 institutions, 83 instruction, 127, 128, 137, 146, 149, 179, 180, 181, 199 instructionism, 180 integration, x, 14, 17, 19, 38, 40, 61, 68, 80, 118, 125, 126, 134, 154, 157, 163, 166, 190 integrity, 154, 155, 156, 161, 165, 172 Intel, 136, 165 intelligence, 129, 133 intensity, 15 interaction, 78, 127, 130, 175, 181, 182, 191 interactions, 128, 134, 170, 182, 188, 189 interactivity, 182 interdisciplinary, 19
227
interface, 73, 74, 88, 122, 139, 140, 142, 144, 145, 157, 158, 161, 162, 164, 167, 171, 190 international standards, 65 Internet, 28, 170, 178, 182, 183 interoperability, 156, 159 interpretation, 75, 81, 87, 101, 128 interrelationships, 82 interval, 4, 103, 207 intervention, 194 interviews, 58 intrinsic, 14, 17, 28, 129 invariants, 95, 112, 196 inversion, 133, 138 investigative, 129 investment, 73, 76, 161, 163, 170 ions, 19, 75, 153, 188, 190, 191, 195, 203, 214, 219 IR spectroscopy, 141 Ireland, 63, 69 ISO, 92 isolation, 169 isotropic, 112, 115, 121 iteration, 102, 104, 106, 109, 119
J Jacobian, 104, 119 Jacobian matrix, 104, 119 January, 172, 219, 220 Japan, 3 Java, viii, 25, 31, 33, 34, 35, 170, 194, 195, 220 job scheduling, 172 jobs, vii, 38, 57, 79 Jordan, 148 justification, 75, 86
K kernel, 211 kinematics, 95 knowledge construction, 146 knowledge transfer, 41, 54
L language, viii, x, 16, 25, 26, 27, 28, 29, 30, 31, 33, 34, 44, 51, 52, 59, 86, 125, 129, 130, 131, 133, 146, 163, 169, 170, 178, 181, 191, 194, 210 large-scale, 161 lattice, 115 law, 110, 118 laws, 121, 129, 130, 132, 146 lead, x, 94, 95, 103, 104, 108, 110, 118, 120, 138, 153, 164, 169, 170, 188 leadership, 62, 181 learners, 127, 128, 129, 132, 134, 135, 137, 138, 144, 146, 147, 176, 177, 179, 180, 181, 182
228
Index
learning, x, 67, 75, 78, 79, 80, 81, 102, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 146, 148, 149, 150, 175, 176, 177, 178, 179, 180, 181, 182, 183 learning environment, 128, 129, 137, 146, 176, 177 learning outcomes, 176, 177 learning process, 181, 183 learning styles, 136, 176 life experiences, 150 lifecycle, 15, 16, 19, 87, 164, 167, 168, 169, 170, 171, 216 lifetime, 152 limitations, 27, 31, 38, 63, 152, 218 linear, 9, 19, 95, 104, 107, 116, 143, 145, 196, 220 linear function, 116 links, 17, 18, 60, 170 Linux, 152, 211 listening, 190 literacy, 126, 128, 179 location, 47, 50, 63, 190 logging, 18 London, 69, 184, 219 long-term, viii, 13, 14, 16, 18, 73, 75, 78, 79 loyalty, 74, 75
M machines, 187, 201, 209, 211 mainstream, 159 maintenance, vii, viii, ix, 13, 14, 18, 19, 37, 39, 40, 41, 42, 57, 58, 59, 60, 61, 62, 64, 65, 66, 67, 68, 86, 92, 154, 216 maintenance tasks, 65 management, vii, viii, ix, 29, 37, 38, 39, 41, 43, 56, 57, 58, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 89, 90, 157, 166, 171, 172 manipulation, 134 manufacturing, 157 mapping, 65, 81 market, 26, 70, 72, 73, 74, 75, 76, 80, 85, 86, 176, 177 market segment, 70, 72, 73, 74, 75, 76, 80, 85, 86 market share, 70, 74, 80 marketing, 73, 74, 76, 89 marketing strategy, 73 marriage, 149 Marx, 149 Maryland, 67, 187 Massachusetts, 121, 181, 184 Massachusetts Institute of Technology, 21, 121, 181, 184, 219 material resources, 58 mathematics, 129, 133, 149 mathematics education, 149 matrix, 98, 104, 105, 119, 123 MDA, 27 measurement, 40, 58
measures, 76, 189, 195, 212, 215 media, 127, 135, 136, 137, 138, 139, 144, 146, 149, 183, 184 medicine, 129, 152 Mediterranean, 20 melting, 130 memorizing, 128 memory, xi, 4, 5, 26, 39, 41, 128, 152, 193, 194, 195, 203, 206, 207, 210, 211, 212, 213, 214, 216, 217, 218 mental image, 132, 138 mental model, 133, 138 mental representation, 140 messages, xi, 29, 31, 33, 34, 127, 187, 188 metals, 95, 115, 121, 123 metaphor, 176, 177, 178, 179, 180 metaphors, 178, 184 metric, 201 Mexican, 61, 67 Mexico, 37, 63 Microsoft, 15, 21, 29, 172 middleware, 15, 20, 190 migration, 74 military, 153 mimicking, 191 mining, 205 Ministry of Education, 218 mirror, 129, 143, 144 misconception, 145 MMA, 102 mobile phone, 75 modality, 128 modeling, 41, 42, 43, 44, 45, 66, 71, 84, 85, 86, 87, 91, 125, 126, 147, 148, 149, 154, 160, 161, 172 models, ix, x, xi, 16, 17, 27, 39, 40, 41, 42, 43, 44, 48, 53, 58, 61, 64, 71, 78, 83, 84, 85, 93, 94, 95, 96, 98, 99, 104, 105, 106, 111, 112, 113, 114, 115, 120, 121, 122, 127, 130, 131, 132, 133, 137, 138, 139, 140, 144, 145, 146, 147, 153, 154, 161, 164, 165, 189, 191, 193, 195, 210, 211, 212, 213, 214, 216, 217, 218, 219 modules, 15, 16, 17, 18, 19, 58, 59, 97, 139 modulus, 116 molecular medicine, 152 molecular structure, x, 125, 131, 133, 134, 135, 136, 137, 138, 140, 143, 144, 147 molecules, 130, 131, 133, 134, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146 money, 170, 194 monomer, 144, 145, 146 monomer molecules, 144, 145 MOS, 29, 30, 33, 35 motion, 141 mouse, 190 movement, 79, 130, 140, 141, 206 MSC, 122 multidisciplinary, 126, 148 multimedia, 127, 128, 149, 178, 182 multiplication, 4, 181
Index multiplicity, 118, 120 muscular tissue, 113 mutation, 105, 108
N NASA, 20, 172, 219 natural, 102, 105 natural evolution, 105 natural selection, 102 negotiating, 80 Netherlands, 148, 149, 150 network, xi, 85, 103, 122, 161, 172, 187, 188, 189, 190 neural network, ix, 94, 95, 103, 122 neural networks, ix, 94, 95, 103 neurons, 103 New Jersey, 147, 148, 172 New York, 21, 64, 92, 122, 147, 149, 183 NEXUS, 15, 20, 21 nodes, 6, 41, 42, 152, 166, 198, 202, 204, 205, 206, 207, 208 non-linear, x, 94, 95, 101, 102, 104, 107, 110, 120, 123 normal, 80, 105, 136, 140, 141, 142, 147 norms, 51 Norway, 65 novelty, 149 nuclear, 110, 152, 194 nuclear power, 194 nuclear power plant, 194
O objectivity, 103, 109 observations, 38, 57, 58, 59, 73, 79, 96 off-the-shelf, 17, 157 online, 9, 182 openness, 79 operating system, 19, 58, 75, 152, 183, 190 operator, 106, 107, 170, 201, 204 opposition, 103 optical, 143 optical activity, 143 optimization, ix, xi, 94, 95, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 114, 115, 117, 118, 119, 120, 121, 122, 193, 195, 203, 218, 219 optimization method, ix, 94, 101, 102, 103, 104, 107, 108, 111, 120, 122 oral, 128 organic, 139, 141, 143 organization, vii, ix, 37, 39, 41, 48, 51, 56, 59, 61, 67, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 84, 86, 89, 126, 132 organizational behavior, 77, 80 organizational culture, 78, 79
229
organizations, vii, ix, 38, 39, 40, 41, 48, 56, 61, 62, 64, 65, 66, 69, 70, 74, 76, 78, 79, 80, 81, 87, 89, 90 orientation, 26, 75, 76, 133, 140, 143, 144, 176, 182 overload, 57, 59 OWL, 28
P Pacific, 91 packaging, 40, 161 paper, 25, 26, 27, 29, 34, 44, 50, 63, 64, 65, 66, 67, 68, 195 paradigm shift, x, 151, 152 parallelism, 26 parameter, ix, 93, 94, 96, 97, 100, 101, 102, 104, 112, 113, 119, 122, 206, 207 parents, 177 Paris, 122 particle physics, 152 particle swarm optimization, 102 particles, x, 125, 128, 129, 130, 146 partition, 4, 9 passive, 121, 128 payback period, 73, 79, 80 PDAs, 187 pedagogy, 126, 128 peer, 167 peer review, 167 penalties, 104 penalty, 101, 104, 105, 209 Pennsylvania, 66 perception, 80, 130, 132, 134, 147 perceptions, 178 performance, ix, x, xi, 8, 18, 30, 39, 54, 68, 74, 75, 76, 77, 86, 94, 95, 98, 102, 103, 107, 108, 112, 133, 152, 153, 159, 170, 172, 176, 180, 194, 195, 218 personal, 39, 55, 57, 60, 80 Perth, 183, 184 perturbation, 109, 119 pheromone, 203, 205, 206, 207, 208, 218 philosophers, 211 philosophical, 183 philosophy, 16, 129, 150, 177, 178 phone, 187 physics, 129, 132, 147, 152 pig, 113, 115 planning, 56, 73, 74, 76, 78, 79, 81, 190 plasma, 152 plasma physics, 152 plastic, x, 95, 96, 110, 115, 122 plastic deformation, 96, 115, 122 plastic strain, 110, 115 plasticity, 13, 95, 96 platforms, 15, 17, 18, 30, 187 play, ix, 38, 40, 55, 69, 75, 77, 86, 126, 131, 158, 175
230
Index
plug-in, 136 Poisson, 116 Poisson ratio, 116 Poland, 220 polarization, 143 politics, 79 polymer, 145, 146 polymers, 136, 144, 145, 146 polynomials, 9 population, 70, 71, 88, 105, 106, 108, 119 portfolio, 76 Portugal, 67, 93, 123 positive correlation, 133 postmortem, 40, 62, 65 power, 7, 29, 34, 35, 79, 101, 182, 194 pragmatic, vii, viii, 25, 27, 30, 34 predicate, 201 predictability, 165 prediction, 92, 112, 121 pre-existing, 14, 71 preference, 207, 209 preprocessing, 5, 6, 7 prevention, 156 primitives, 44 prior knowledge, 127 proactive, 73 probability, 103, 205, 206, 207, 209, 216, 217 probe, 10 problem solving, 155, 181, 182 problem-solving task, 182 production, 70, 71, 72, 73, 75, 76, 126, 163 productivity, 27, 58, 70, 82, 156 profit, 63, 73, 76 profitability, 74 profits, 73 program, viii, xi, 17, 25, 27, 30, 31, 34, 40, 61, 82, 137, 139, 140, 142, 144, 167, 168, 170, 181, 188, 193, 194, 195, 196, 199, 200 programming, viii, 13, 15, 16, 17, 18, 19, 25, 26, 30, 34, 52, 58, 59, 159, 163, 181, 194, 220 programming languages, 15, 17, 26, 52, 194 promote, 68, 136, 137, 144, 181 pronunciation, 181 property, vii, xi, 50, 96, 193, 194, 195, 196, 197, 198, 199, 201, 202, 203, 207, 208, 209, 210, 214, 216, 217, 218, 219 proposition, 197, 201, 216 protocol, 188, 189, 190, 210, 211 protocols, xi, 187, 188, 189, 220 prototype, 180, 181 prototyping, 85, 92 psychology, 90 psychometric approach, 143 public, 17, 210, 211
Q quality assurance, 188
quantum, 26, 126, 141, 153 quantum mechanics, 141 query, 4, 5, 6, 7, 8, 9 questionnaires, 84
R race, 143 radiation, 141 radical, 90 Raman, 4, 10, 11, 12, 141, 142 Raman spectroscopy, 141 random, 205 range, x, 9, 75, 131, 140, 141, 171, 175, 182 rapid prototyping, 31 reaction mechanism, 138 real numbers, 203 real time, 169 reality, 97, 131, 145, 176 reasoning, 66, 149, 150, 178 recognition, 87, 138 recombination, 102 rectification, 150 reduction, ix, xi, 6, 39, 69, 109, 111, 194, 195, 196, 199, 200, 210, 215, 216, 218, 219, 220 reductionism, 129 refining, 86 reflection, 133, 138, 150, 177, 181, 182 registries, 56 regression, 190, 191 regular, 45, 60 regulations, 170 relationship, 44, 47, 80, 135, 141, 143, 144, 149, 195 relationships, 45, 48, 50, 52, 53, 62, 75, 78, 79, 90, 130, 134, 135, 138 relative size, 134 relevance, 38 reliability, 72, 86, 156, 165, 167, 169 remote sensing, 153 repair, 168 replication, 17 reproduction, 102 research, ix, x, 4, 9, 14, 16, 19, 20, 26, 57, 60, 61, 62, 70, 75, 76, 77, 81, 83, 86, 88, 125, 126, 128, 129, 131, 133, 146, 148, 151, 152, 153, 156, 158, 166, 175, 189, 194, 217 research and development, 75, 126, 183, 189 researchers, viii, ix, xi, 16, 20, 26, 37, 40, 70, 72, 78, 84, 86, 87, 126, 128, 134, 135, 137, 187, 188, 189, 191, 210 resistance, 78, 80, 81, 96, 115 resolution, 94, 101, 120 resource management, 49, 72, 82, 166 resources, xi, 29, 38, 40, 43, 58, 72, 74, 76, 77, 82, 128, 152, 153, 155, 156, 166, 177, 181, 182, 190, 193, 195, 210, 214 responsibilities, 72, 76, 78, 81 retention, 73
Index returns, 161, 207, 208 reusability, x, 15, 16, 79, 83, 85, 86, 88, 151, 152, 171 risk, 72, 74 risks, 82 road map, 81 roadmap, 75 robotic, vii, viii, 13, 14, 15, 16, 17, 18, 19, 20 robotics, 13, 14, 15, 17, 18, 19, 20, 21 robustness, 95, 120, 190 rods, 134 ROI, 158, 161, 162, 163 rotations, 95, 138, 143, 150 routines, 33, 170, 190 RTI, 15
S safety, 156, 169, 195, 196, 198, 203, 208, 214, 218, 219 salts, 130 sample, 43, 44, 45, 46, 50, 51, 52, 53, 54, 212 Samsung, 75 sand, 6 SAS, 179 satisfaction, 18, 74, 78 savings, 160 scalability, x, 88, 151, 152, 153, 156, 166, 189, 191 scalable, 153, 156, 165, 166, 167 scalar, 115 scaling, 105, 106 scheduling, 18, 56, 72, 166, 201 schema, 49, 50, 51, 52 Schmid, 73, 88, 89, 90 school, x, 125, 147, 177 science education, 129, 148 scientific computing, 152, 172 scientific progress, 26 scientists, 126, 152, 155 scripts, 28 search, xi, 6, 38, 39, 52, 103, 104, 105, 106, 111, 120, 139, 172, 190, 193, 194, 195, 196, 197, 198, 199, 200, 202, 203, 204, 206, 207, 208, 209, 210, 211, 212, 214, 216, 217, 218, 220 searches, 9, 197, 207, 208 searching, 97, 190, 195, 198, 201, 202, 203, 207, 209, 216 second language, 31 secondary education, 136 security, x, 29, 57, 79, 151, 166 sediment, 130 selecting, 42, 49, 127, 207, 209 self-assessment, 161 semantic, 210 semantics, 84, 171 sensing, 153 sensitivity, 99, 104 Sensitivity Analysis, 104
231
sensors, 15 sentences, 178 separation, 19, 31, 34 series, 4, 141, 177 services, 16, 17, 28, 56, 58, 74, 75, 152, 162, 165, 166 shape, 41, 121, 122, 126, 130, 133, 136 shares, 74 sharing, 61, 70, 75, 79, 80, 82, 87 shear, 116, 118, 119 sign, 74, 116, 212, 215 signals, 99, 188, 191 silver, 26 simulation, ix, 29, 93, 94, 96, 97, 104, 112, 122, 153, 169, 187, 188, 190 simulations, ix, 84, 93, 96, 123, 182 Simultaneous Localization and Mapping, 14 sites, 170, 182 skeletal muscle, 121 skills, 19, 39, 40, 51, 77, 126, 131, 132, 134, 135, 147, 150, 179, 180 small firms, 67 Smithsonian, 148 Smithsonian Institute, 148 smoke, 190 social construct, 149 social context, 126 software, vii, viii, ix, x, xi, 13, 14, 15, 16, 17, 18, 19, 20, 21, 25, 26, 28, 29, 33, 34, 35, 37, 38, 39, 40, 41, 42, 43, 44, 45, 48, 49, 51, 54, 56, 57, 58, 59, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 96, 97, 98, 100, 102, 122, 125, 126, 134, 135, 136, 146, 151, 152, 153, 154, 156, 157, 159, 160, 161, 162, 163, 165, 166, 167, 168, 169, 170, 171, 172, 173, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 187, 188, 189, 190, 191, 193, 194, 195, 196, 201, 216, 217, 218, 220 solutions, xi, 14, 15, 39, 54, 64, 66, 87, 103, 143, 159, 193, 195 sounds, 127, 182 space exploration, 210 Spain, 13, 20, 21, 37, 63, 193 spatial, x, 125, 129, 130, 133, 134, 135, 142, 146, 149 spatial ability, 133 specific knowledge, 44, 46, 47, 62 spectroscopy, 138, 141 spectrum, 141 speed, x, 95, 140, 151, 152, 153, 171 SPSS, x, 175, 179 SQL, 28 stabilization, 98, 102 staffing, 78 stages, xi, 14, 16, 18, 19, 42, 48, 95, 97, 102, 118, 135, 157, 168, 170, 179, 180, 190, 193, 195, 216, 217 stakeholder, 73, 83
stakeholders, 75, 85, 87, 154
standard deviation, 105, 211, 212
standardization, 16
standards, 20, 65, 159, 170
starvation, xi, 193, 194, 196
statistical analysis, 179
statistics, 100
steel, 110
stochastic, 105, 214, 218
storage, 4, 10, 57
strain, 95, 110, 112, 113, 115, 119
strains, 95, 96, 98
strategic, 56, 73, 74, 75, 76, 78, 79, 80, 86
strategic planning, 73, 76, 78, 79
strategies, vii, viii, 37, 38, 40, 41, 42, 60, 63, 66, 67, 74, 80, 81, 106, 108, 109, 110, 111, 112, 114, 117, 118, 119, 120, 121, 133, 160
strength, x, 16, 94, 103
stress, 84, 98, 112, 115, 119, 170
stress level, 170
stress-strain curves, 111, 118
structural characteristics, 146
structuring, 72
students, x, 19, 125, 126, 128, 129, 130, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 146, 147, 176, 177, 178, 182
subjective, 51, 87
substances, 129
subtraction, 4
supercomputers, 152, 166
superconducting, 122
superiority, 108
suppliers, 88, 154
swarms, 102
switching, ix, 69, 81, 106
symbolic, x, 125, 128, 129, 130, 131, 132, 133, 134, 135, 137, 138, 139, 141, 145, 146, 150
symbols, 14, 130, 131, 132
symmetry, 136, 137, 138, 139, 140, 142, 143, 194, 218, 219
synchronization, 17, 190, 219
synchronous, 197, 202
synergistic, 128
syntax, viii, 25, 30, 31, 34, 171
synthesis, 145, 146, 219
systems, vii, viii, x, xi, 13, 14, 15, 16, 17, 37, 38, 39, 41, 42, 49, 54, 55, 56, 57, 58, 59, 60, 63, 65, 66, 67, 68, 70, 71, 72, 78, 87, 88, 90, 94, 123, 129, 131, 135, 136, 151, 152, 153, 154, 156, 157, 158, 159, 161, 164, 165, 166, 167, 169, 170, 171, 172, 188, 189, 191, 193, 194, 195, 199, 202, 203, 207, 212, 214, 216, 218, 219
T
tactics, 87
tangible, 20
targets, 74, 75
taxonomy, 48, 50
teachers, 176, 177, 178, 179, 181, 182, 184
teaching, x, 126, 141, 147, 148, 175, 176, 177, 178, 179, 180, 181, 182, 183
teaching process, 176
team members, 31
technology, 28, 29, 30, 33, 35, 59, 62, 64, 66, 70, 71, 73, 75, 78, 80, 81, 82, 92, 126, 129, 136, 147, 164, 175, 182, 183
telecommunication, 161
telecommunications, ix, 69, 70, 152
telephony, 28
temperature, 96, 115, 116, 141
temporal, xi, 191, 193, 194, 196, 197, 201, 219, 220
tensile, 112, 118
tertiary education, x, 125, 136, 146
test data, 171, 191, 220
textbooks, 135, 137, 141
theory, x, 76, 77, 95, 102, 121, 126, 127, 128, 130, 131, 134, 138, 147, 149, 175, 176, 214, 217
Thessaloniki, 125
thinking, 83, 126, 133, 135, 137, 140, 143, 144, 146, 150
threatening, 170
threats, 156
three-dimensional, 116, 132, 134, 135, 137, 142, 143, 147, 150
three-dimensional model, 147
three-dimensional representation, 132
three-dimensional space, 116
threshold, 95, 207
time, vii, viii, ix, x, xi, 3, 4, 5, 6, 7, 8, 9, 13, 14, 16, 17, 18, 19, 26, 30, 31, 32, 34, 37, 39, 43, 45, 47, 49, 51, 52, 53, 54, 55, 57, 59, 62, 69, 70, 72, 73, 74, 80, 83, 86, 97, 103, 111, 120, 126, 127, 134, 140, 144, 151, 152, 153, 159, 168, 169, 170, 171, 190, 193, 195, 196, 210, 211, 212, 214, 215, 216, 217, 218
time consuming, 62, 126
time frame, 31, 32
timing, 29, 74, 76
tissue, 113, 115
Tokyo, 3
tolerance, 17, 156
tracking, 72, 171
trade, 81
trade-off, 8
traffic, 188
training, 40, 72, 77, 103, 163, 164, 181
transfer, 53, 54, 56, 57, 130, 140, 165
transformation, 82, 132, 135, 136
transformations, x, 28, 125, 130, 131
transition, 116, 141, 161, 191, 199, 200, 203
transitions, 90, 141, 199, 200, 201, 202
translation, x, 175, 197, 210
transmission, 29
traps, 89
trees, 6, 10, 11, 12, 190
trend, ix, 18, 20, 69, 86, 88
triggers, 191
trust, 82
turnover, 39
two-dimensional, 116, 132
two-dimensional space, 116
two-way, 149
U
ubiquitous, xi, 187
UML, 26, 27, 35, 44, 84, 85, 91, 161, 162, 171, 172, 173
undergraduate, 139, 140, 149
uniaxial tension, 112, 116
United Arab Emirates, 69
United States, 72
universal gas constant, 115
universe, 101, 103, 104, 105, 106, 111

V
validation, 14, 15, 16, 18, 88, 158, 166, 167, 220
validity, 197
values, 5, 96, 97, 98, 99, 104, 108, 110, 114, 115, 117, 191, 197, 203, 207, 208, 209, 212, 217
variability, 76, 78, 79, 81, 82, 83, 84, 85, 86, 91, 161, 171
variable, 9, 83, 85, 86, 96, 105, 115, 116, 118, 121, 190, 191, 197
variables, xi, 96, 98, 102, 105, 112, 115, 191, 193, 196, 197, 200, 201, 205, 209
variation, 55, 72, 82, 84, 85, 86
vector, vii, 3, 4, 5, 6, 7, 8, 9, 98, 102, 104, 105, 109
vehicles, 29
vibration, 136, 140, 141, 142
virtual world, 176
visible, 34, 135
vision, 29, 73, 75, 76, 79
visualization, x, 125, 126, 128, 129, 130, 131, 133, 134, 135, 136, 137, 142, 144, 146, 148, 150, 187, 188
visuospatial, 126, 131, 133, 135, 137, 138, 140, 143, 144, 146, 150
Vitter, 10, 11
voice, 28
W
war, 62, 66
Warsaw, 220
web, xi, 19, 60, 68, 136, 162, 170, 182, 187, 188, 189, 190, 210
Web Ontology Language, 28
web sites, 170, 182
web-based, 181, 182
windows, 146
winning, 129
word processing, x, 175, 179
work environment, 39
workers, 38
workflow, 41, 45, 46, 53, 153, 157
working memory, 128
workplace, 54
workspace, 59
writing, 46, 159, 168
X
XML, 28, 167
Y
yield, 31, 161