Advances in COMPUTERS
VOLUME 41

Contributors to This Volume

VICTOR R. BASILI
ALAN W. BROWN
DAVID J. CARNEY
SHU-WIE F. CHEN
THOMAS M. CONTE
DICK HAMLET
WEN-MEI W. HWU
AVRAHAM LEFF
CALTON PU
JOCK A. RADER
H. DIETER ROMBACH
MARTIN VERLAGE
Advances in COMPUTERS

EDITED BY
MARVIN ZELKOWITZ
Department of Computer Science
University of Maryland
College Park, Maryland

VOLUME 41

ACADEMIC PRESS
San Diego  New York  Boston  London  Sydney  Tokyo  Toronto
This book is printed on acid-free paper.

Copyright © 1995 by ACADEMIC PRESS, INC.
All Rights Reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.

Academic Press, Inc.
A Division of Harcourt Brace & Company
525 B Street, Suite 1900, San Diego, California 92101-4495

United Kingdom Edition published by
Academic Press Limited
24-28 Oval Road, London NW1 7DX

International Standard Serial Number: 0065-2458
International Standard Book Number: 0-12-012141-7

PRINTED IN THE UNITED STATES OF AMERICA
95 96 97 98 99 00  BC  9 8 7 6 5 4 3 2 1
Contents

CONTRIBUTORS . . . ix
PREFACE . . . xi

Directions in Software Process Research
H. Dieter Rombach and Martin Verlage

1. Motivation . . . 2
2. Product Engineering Processes . . . 10
3. Process Engineering Processes . . . 12
4. A Framework for Integrated Product and Process Engineering . . . 19
5. Software Engineering Process Representation Languages . . . 28
6. Support for Software Engineering Processes . . . 50
7. Future Directions in Software Process Research . . . 54
8. Summary . . . 57
References . . . 59

The Experience Factory and Its Relationship to Other Quality Approaches
Victor R. Basili

1. Introduction . . . 66
2. Experience Factory/Quality Improvement Paradigm . . . 67
3. A Comparison with Other Improvement Paradigms . . . 75
4. Conclusion . . . 80
References . . . 81

CASE Adoption: A Process, Not an Event
Jock A. Rader

1. Introduction . . . 84
2. The Keys to Successful CASE Adoption . . . 85
3. Planning and Preparing for CASE Adoption . . . 93
4. CASE Adoption Case Study . . . 106
5. Awareness: The First Phase . . . 113
6. Evaluation and Selection . . . 124
7. Supporting First Operational Use . . . 137
8. Expansion and Evolution: Second Victim and Beyond . . . 149
9. CASE Adoption Summary . . . 153
References . . . 155

On the Necessary Conditions for the Composition of Integrated Software Engineering Environments
David J. Carney and Alan W. Brown

1. Introduction . . . 158
2. A Three-Level Model of Software Engineering Environments . . . 162
3. The Mechanisms and Semantics of Integration . . . 165
4. Integration in Practice: Process Aspects of Integration . . . 172
5. The Conditions Necessary for Integration . . . 179
6. Toward Engineered Environments . . . 185
7. Summary and Conclusions . . . 186
References . . . 188

Software Quality, Software Process, and Software Testing
Dick Hamlet

1. Introduction . . . 192
2. Testing Background and Terminology . . . 198
3. Testing to Detect Failures . . . 201
4. Testing for Reliability . . . 211
5. Comparing Test Methods . . . 217
6. Dependability . . . 220
7. Conclusions . . . 225
References . . . 227

Advances in Benchmarking Techniques: New Standards and Quantitative Metrics
Thomas M. Conte and Wen-mei W. Hwu

1. Introduction . . . 232
2. Summary of Popular Benchmark Suites . . . 235
3. Benchmark Characterizations . . . 242
4. Final Remarks . . . 251
References . . . 251

An Evolutionary Path for Transaction Processing Systems
Calton Pu, Avraham Leff, and Shu-Wie F. Chen

1. Introduction . . . 256
2. Classification Model and Terminology . . . 257
3. Taxonomy Instantiations . . . 260
4. Systems Evolution . . . 280
5. Beyond Traditional TP . . . 287
References . . . 295

AUTHOR INDEX . . . 297
SUBJECT INDEX . . . 303
CONTENTS OF VOLUMES IN THIS SERIES . . . 315
Contributors

Numbers in parentheses indicate the pages on which the authors' contributions begin.

VICTOR R. BASILI (65) Institute for Advanced Computer Studies and Department of Computer Science, University of Maryland, College Park, Maryland 20742
ALAN W. BROWN (157) Software Engineering Institute, Carnegie-Mellon University, Pittsburgh, Pennsylvania 15213
DAVID J. CARNEY (157) Software Engineering Institute, Carnegie-Mellon University, Pittsburgh, Pennsylvania 15213
SHU-WIE F. CHEN (255) Department of Computer Science, Columbia University, New York, New York 10027
THOMAS M. CONTE (231) Department of Electrical and Computer Engineering, University of South Carolina, Columbia, South Carolina 29208
DICK HAMLET (191) Department of Computer Science, Portland State University, Portland, Oregon 97207
WEN-MEI W. HWU (231) Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801
AVRAHAM LEFF¹ (255) Department of Computer Science, Columbia University, New York, New York 10027
CALTON PU² (255) Department of Computer Science, Columbia University, New York, New York 10027
JOCK A. RADER (83) Radar and Communications Systems, Hughes Aircraft Company, Los Angeles, California 90009
H. DIETER ROMBACH (1) Fachbereich Informatik, Universität Kaiserslautern, 67653 Kaiserslautern, Germany
MARTIN VERLAGE (1) Fachbereich Informatik, Universität Kaiserslautern, 67653 Kaiserslautern, Germany
¹ Present address: IBM Thomas J. Watson Research Center, Yorktown Heights, New York 10958.
² Present address: Department of Computer Science and Engineering, Oregon Graduate Institute, Portland, Oregon 97291.
Preface

This series, Advances in Computers, has been published since 1960. It provides authors a means to develop in-depth treatments of important topics in the field of computer science that are too complex for the typical journal article or magazine, yet do not warrant full-length books. Topics covered in these volumes generally fall into two broad categories: (1) those that survey a maturing specialty of computing by describing the various approaches to the problem, and (2) those that represent new ideas on how to solve today's computing problems.

This volume represents a departure from previous volumes. My goal is to have every second or third volume in this series emphasize one particular application domain of computing in order to provide a detailed analysis of the problems in that domain and to provide alternative solutions to those problems. The issue of computer-based environments will be discussed in many of the seven chapters included in this volume. An environment represents the set of tools executing on a computing system, which are needed to solve problems in some application domain, as well as the underlying infrastructure, which generally consists of the file system, user interface processes (e.g., keyboard and display management), communication, and mechanisms for starting and switching among programs in a computer system. The task of the system designer is to understand the needs of these tools for those infrastructure services and to understand the processes needed to use these tools effectively. Therefore, most of the papers in this volume address one of two questions: (1) What is the set of services needed to develop the automation component, consisting of tools and infrastructure services, in an environment? (2) What are the software processes needed to support the activities of computer professionals who use these systems?

In the first paper, "Directions in Software Process Research," H. Dieter Rombach and Martin Verlage discuss ways in which large-scale software systems are affected by the set of processes used in their development. While traditionally process decisions are considered within the domain of system management, today there is considerable interest in automating many of them. The authors survey many of the existing and proposed software process languages and various systems that may be used to help manage the set of activities which must be undertaken to complete a large project.

Victor Basili, in Chapter 2, "The Experience Factory and Its Relationship to Other Quality Approaches," refines the process model discussion of Chapter 1. Professor Basili proposes his own Experience Factory model for management of the development activities of a project. The Experience Factory is based on a Quality Improvement Paradigm for managing the activities of a software
development and uses the Goals/Questions/Metrics model in determining which data must be collected and evaluated as part of the development process. Basili also relates this model to other process models, including the Plan-Do-Check-Act approach of Shewhart and Deming, Total Quality Management, and the Capability Maturity Model of the Software Engineering Institute.

In Chapter 3, Jock Rader writes about "CASE Adoption: A Process, Not an Event." While we would like the perfect environment, as defined above, Rader takes a pragmatic view and gives helpful suggestions to organizations that want to automate many of their processes but are unsure of which tools to purchase. His major message is that one does not simply buy tools from a catalog; the adoption of a new way of using computer automation requires considerable time and effort to achieve automation effectively.

While Rader discusses the use of CASE tools for solving problems in any application domain, David Carney and Alan Brown, in "On the Necessary Conditions for the Composition of Integrated Software Engineering Environments," restrict their discussion to software engineering environments, that is, environments whose application domain is the production of software systems to be used by others. They are concerned with the integration problem: how the various tools in an environment are able to communicate and to pass information among themselves. They base their model of an environment on a three-level structure: a set of processes (e.g., goals) that must be achieved, a set of services that the environment provides, and a set of mechanisms for implementing those services. This work developed as a result of their previous work on reference models for software engineering environments.

Dick Hamlet, in "Software Quality, Software Process, and Software Testing," provides a good summary of the current state of the art in program testing. However, Professor Hamlet goes beyond merely cataloging the available set of testing strategies. Depending on the process model used for software development, different testing approaches may be more effective. It is his goal to relate the various testing regimens to those processes for which they will be most effective.

In Chapter 6, "Advances in Benchmarking Techniques: New Standards and Quantitative Metrics," Thomas Conte and Wen-mei W. Hwu discuss a companion issue to building an environment: How well will it perform? The standard measure is the benchmark. Traditional measures, such as processor speed (e.g., MIPS, or millions of instructions per second), are very imprecise, since each computer architecture is tailored to execute specific classes of instructions rapidly and others less efficiently. Hardware concepts such as cache memory or multiple processors greatly affect instruction execution speed. Instead, "typical" programs are run as "benchmarks" for a given application domain. However, what is a typical program that does not unfairly favor one architecture over another? This is the basic problem in developing benchmarks, and Conte and Hwu discuss several approaches to this issue.
In the final chapter, "An Evolutionary Path for Transaction Processing Systems," Calton Pu, Avraham Leff, and Shu-Wie F. Chen discuss a specific class of system: transaction processing systems. Unlike programs in other environments that are executed in order to provide an answer (e.g., compile a program to produce the executable version), a transaction processing system is designed to execute continuously in order to supply answers whenever needed. These systems, such as reservation systems, typically involve constant database access. Important properties of such systems are recovery (information can be retrieved in the case of system failure) and concurrency (multiple transactions can be processed simultaneously). This paper discusses these issues and others in the development of transaction processing systems.

I thank the authors for contributing their expertise to this volume. Each paper required significant time and effort on their part. I greatly appreciate the cooperation I have received from them while editing this volume.
Directions in Software Process Research

H. DIETER ROMBACH AND MARTIN VERLAGE
Fachbereich Informatik
Universität Kaiserslautern
Kaiserslautern, Germany
Abstract

Developing and maintaining software systems involves a variety of highly interrelated activities. The discipline of software engineering studies processes of both product engineering and process engineering. Product engineering aims at developing software products of high quality at reasonable cost. Process engineering, in contrast, aims at choosing those product engineering processes appropriate for a given set of project goals and characteristics, as well as improving the existing knowledge about those processes. Explicit models of both types of processes help a software development organization to gain competitive advantage. This paper motivates the need for explicit process models, surveys existing languages to model processes, discusses tools to support model usage, and proposes a research agenda for future software process research.
1. Motivation . . . 2
   1.1 Software Business . . . 2
   1.2 Software Engineering . . . 4
   1.3 Software Engineering Processes . . . 5
2. Product Engineering Processes . . . 10
   2.1 Life-Cycle (Coarse-Grain) Models . . . 10
   2.2 Development (Fine-Grain) Process Models . . . 11
3. Process Engineering Processes . . . 12
   3.1 Improvement Processes . . . 13
   3.2 Measurement Processes . . . 15
   3.3 Modeling and Planning Processes . . . 16
   3.4 Comprehensive Reuse Processes . . . 17
4. A Framework for Integrated Product and Process Engineering . . . 19
   4.1 Product and Process Engineering . . . 19
   4.2 Requirements for Process Engineering . . . 23
   4.3 Requirements for Product Engineering . . . 26
5. Software Engineering Process Representation Languages . . . 28
   5.1 History . . . 28
   5.2 Survey of Existing Languages . . . 29
   5.3 MVP-L . . . 45
6. Support for Software Engineering Processes . . . 50
   6.1 Tools for Process Model Building and Project State Browsing . . . 50
   6.2 Process-Sensitive Software Engineering Environments . . . 52
   6.3 MVP-S . . . 53
7. Future Directions in Software Process Research . . . 54
   7.1 Practice . . . 55
   7.2 Research . . . 55
8. Summary . . . 57
References . . . 59
1. Motivation

Software engineering as a field is currently evolving from an art to an engineering discipline (Shaw, 1990; Gibbs, 1994). One essential characteristic of engineering is the ability to model all elements of the respective discipline explicitly in order to plan, manage, and develop products better. Higher demands on software products and increasing competitiveness within the software business have triggered a push towards systematic software engineering approaches, including explicit documentation of processes.

In this section the need for establishing explicit software engineering know-how in general, and software processes in particular, is motivated from a business perspective (Section 1.1). A broad definition of software engineering is presented in Section 1.2, aimed at identifying all types of relevant software engineering processes. These software engineering processes are described in more detail in Section 1.3.
1.1 Software Business

Success in the software business requires explicit know-how in the form of product models, process models, and other experience (Fig. 1). Software process models address both engineering and nonengineering issues. Software engineering process models capture both aspects related to the engineering of software products and the engineering of better software process models. Models of product engineering processes describe technical issues of product engineering (such as requirements engineering, design, coding, verification, integration, or validation) as well as managerial activities (such as product management, project management, quality assurance, or project data management). Models of process engineering processes describe aspects related to process improvement by cycles of planning, observing via measurement, and feedback. Nonengineering aspects are addressed by business processes (e.g., banking, acquisition, traffic, plant management) and social processes (e.g., communication, training, supervising).
FIG. 1. A tentative taxonomy of know-how related to software development and maintenance.
Engineering and nonengineering-related processes are highly interrelated (Abdel-Hamid, 1993). Software engineering is crucial for success in any software-related business. However, the focus of software engineering know-how depends on the type of business and what is crucial for success within that business. If, for example, we distinguish between software products, software-intensive services, and software-intensive products as business domains, the need for different kinds of know-how becomes evident. In the case of pure software products [e.g., computer-aided software engineering (CASE) tools, games], success is defined, for instance, in terms of time to market, features of the product, or interoperability. In the case of software-intensive services (e.g., banks, insurance companies, travel agencies), success is defined, for instance, in terms of customer satisfaction or usability. In the case of software-intensive products (e.g., airplanes, traffic control systems), success is defined, for instance, in terms of reliability, certifiability, or user-friendliness. Engineering demands are obvious in all areas, but tailored solutions for each case are needed.

The demand for software engineering know-how is caused by stronger requirements on quality, productivity, predictability, certifiability, etc. Market pressure, not only for safety-critical software, triggers the need to evolve software engineering from an art to an engineering discipline. Currently, the software engineering demands are only satisfied more or less accidentally with existing approaches, which are typically based on isolated development technologies (e.g., CASE tools, compilers) and human experience. This enables the development of high-quality software provided that human experience is available to compensate for the lack of explicit engineering know-how, but is not repeatable otherwise. In order for software engineering to become an engineering discipline, software engineers
have to represent essential software engineering know-how in an explicit, reusable form. Software businesses have to enable the build-up of explicit software engineering know-how tailored to their specific needs (Leveson, 1992). This requires an organizational setup for both building explicit models of software engineering with respect to their business demands, and using such models effectively in future, similar projects.
1.2 Software Engineering

Software engineering addresses the technical aspects of software-related businesses. The need for software engineering is most obvious when developing large systems with stringent quality requirements. The reasons are manifold: the specification of large systems covers functional and nonfunctional aspects, the system as a whole is not understandable in all details, often the specification is incomplete at the beginning of the project and not stable during it, the project plans are subject to change, and the development is performed in teams, which are often large. It is obvious that these characteristics of large systems require an understanding of the basic principles of software development and maintenance. In the past few years it has been recognized that software engineering processes must comprise more than just the application of sound methods, techniques, and tools.

Since the creation of the term "software engineering," its meaning has evolved. Early definitions reflected a purely technical view of software development. An example is the definition given by Bauer in 1972: "Software Engineering deals with the establishment of sound engineering principles and methods in order to economically obtain software that is reliable and works on real machines." Software engineering supports development and maintenance of software products (for development and maintenance processes the term product engineering processes is used interchangeably). But software engineers do not simply apply their knowledge. Software engineers evolve their know-how with respect to what is developed (i.e., the software products), how it is developed (i.e., the technical processes), and how well it is developed (i.e., the product and process qualities). The knowledge is captured in explicit models, describing real-world concepts, and evolves (for these processes the term process engineering processes is used). In contrast to early definitions, today a broader view on software engineering is needed to cover all aspects of systematic software development:

Software engineering is concerned with the definition, refinement, and evaluation of principles, methods, techniques, and tools to support:

- Individual aspects of software development and maintenance (i.e., design, coding, verification, validation, etc.)
- Planning of software development projects (i.e., choosing appropriate methods, techniques, and tools and integrating them into project plans)
- Performing development, project management, and quality assurance activities according to the plan
- Assessing the performance of the development, feeding back, and improving products, methods, techniques, and tools
This definition implies that software engineering requires learning (based on experiments and case studies) in order to develop useful models, and reuse in order to make software development efficient and effective (Rombach et al., 1993). The term model refers to an abstract representation of real-world phenomena in an explicit form. The models reflect an organization's know-how about software development. Software engineering know-how has to be developed and maintained. Practical experience has shown the need for modeling software engineering entities (especially processes) (see the Proceedings of the series of International Software Process Workshops, e.g., Ghezzi, 1994), measuring those entities (Rombach et al., 1993), reusing the models (Prieto-Diaz, 1993), and improving the models (Pfleeger and Rombach, 1994).
1.3 Software Engineering Processes

For the development and maintenance of large software systems a variety of processes have been proposed. In general, product engineering processes can be classified into technical and managerial processes as described above (Fig. 1). The primary goal of technical processes is to transform and evolve product information from the problem description to the final system in a systematic way. The primary goal of managerial processes is to provide an environment and services for the development of software which meets the product requirements and project goals. Knowledge about these product engineering processes can be captured in models in order to allow process engineering processes to improve them systematically.

In the following paragraphs a survey of software engineering processes (i.e., product engineering and process engineering processes) is given. This is not to be understood as a comprehensive discussion of all software development-related processes. Instead, an impression should be given of typical representatives of product and process engineering processes and how they differ from each other.

Technical software engineering processes include, but are not limited to, the following (see the sketch after this list):

- Requirements engineering. Starting from an (implicit or explicit) problem description, the requirements engineer develops a system requirements document in conformance with existing documentation standards and quality requirements (e.g., completeness, consistency, and correctness).
- Design. Starting from the requirements document, the design engineer develops the design document, capturing the system's architecture and main algorithms and data structures in conformance with existing documentation standards and quality requirements (e.g., strong cohesion and low coupling of modules). The design documents contain requirements for single components (e.g., modules, subsystems, routines) considered as black boxes.
- Coding. Starting from a component requirement, the code engineer develops a component design and implementation in conformance with existing documentation standards and quality requirements (e.g., maintainability, performance).
- Verification. Starting from each developed document within the project, the verifier has to check for conformance with documentation guidelines and quality, and for correspondence with the source document. Verification is performed on a formal basis (e.g., verifying source code against component requirements) or an informal basis (e.g., verifying a design document against the requirements document). The latter is often called inspection, review, or audit.
- Integration. Starting with the set of object code components, the integration engineer has to link the parts of the system to build the complete application. This can be done in multiple steps, creating subsystems and different versions.
- Validation. Starting with each executable part of the system, test data is applied to components, subsystems, and the entire system by the validation engineer to check whether the behavior corresponds to the specification. Functional and nonfunctional features of the running system are observed. If failures occur, the corresponding fault is detected. After termination of rework performed by other processes, the system or its components are to be validated again.
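The product flow implied by this list can be written down in a few lines. The sketch below is only an illustration of the idea of explicit product flow between technical processes; the document names (including the notions of a "verification report" and a "test report") are simplified assumptions introduced here, not part of the chapter's models.

```python
# Illustrative sketch only: each technical process described by the documents it
# consumes and produces; the document names are simplified assumptions.
PRODUCT_FLOW = {
    "requirements_engineering": {"in": ["problem_description"],
                                 "out": ["requirements_document"]},
    "design":       {"in": ["requirements_document"], "out": ["design_document"]},
    "coding":       {"in": ["design_document"],       "out": ["source_code"]},
    "verification": {"in": ["requirements_document", "design_document", "source_code"],
                     "out": ["verification_report"]},
    "integration":  {"in": ["source_code"],           "out": ["executable_system"]},
    "validation":   {"in": ["executable_system"],     "out": ["test_report"]},
}

def producers_of(document):
    """Return the processes that produce the given document (a simple traceability query)."""
    return [name for name, flow in PRODUCT_FLOW.items() if document in flow["out"]]

print(producers_of("design_document"))  # ['design']
```

Even such a minimal representation makes relationships between processes queryable, which is the point of modeling them explicitly.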
It should be noted that this grouping of activities is made for communication purposes. A real project may require other aggregations or refinements of the processes. Nevertheless, there are different classes of processes, and they are strongly interrelated by means of data, control, and information flows. Product engineering processes that deal with management aspects are not directly related to a distinct phase of a software project, but may span the whole project lifetime. Managerial software engineering processes include, but are not limited to:
- Product management provides a repository for all product-related documentation developed, used, and modified in a particular project, including product measurement data. It should provide access control to avoid security and integrity problems, manage product-related attributes (e.g., producer, size, type), maintain relations among products (e.g., explicit documentation of the module uses-relationship), and perform configuration management.
- Project management supports the performance of the scheduled processes during the whole project, monitors progress, controls estimates about effort and time, and initiates replanning.
- Quality assurance controls deliverables and processes with respect to the quality goals (or quality requirements) specified in the project plan to provide confidence in their fulfillment (International Standards Organization, 1991a). In contrast to project management, a quality assurance engineer is interested in process and product attributes that may flag a problem concerned with quality (e.g., number of detected design defects per staff hour).
- Project data management provides a repository for all project-related documents like schedules, plans, or process measurement data. Its task is to record the history of a project (i.e., the project trace) by recording events, management data, and process attribute values related to quality.
Figure 2 gives an overview of the different product engineering processes and their relationships. All technical processes are in the middle of the figure. The arrows represent the product flow between the processes (i.e., boxes). Products are represented by boxes with rounded corners. The introduced managerial processes are shown around the technical processes.

FIG. 2. Technical and managerial product engineering processes.

The repositories of product and project data management (not shown in the figure) can be seen as integrated into a project database. Project management and quality assurance use the project database services (e.g., to trace the project). Moreover, they both stay in close connection with the technical processes (e.g., assigning tasks, checking results). It is important to note that the processes and their relations shown in Fig. 2 are examples; other processes are conceivable.

Learning within an organization can only take place if the experience gathered in single projects is captured, analyzed, organized, maintained, and reused after project termination and during planning of new projects. Systematic learning requires both explicit representation of knowledge and the performance of processes manipulating that knowledge. The processes performed for organization-wide learning are called process engineering processes because their main focus is on the processes of development projects. We can distinguish between:
- Improvement processes analyze the current state of the application of techniques, methods, and tools, make suggestions to evolve the processes, and assess whether the intended impact on process or product quality has been achieved. It should be noted that software development is an experimental discipline, since little is known about the basic nature of products and processes (Rombach et al., 1993). Therefore modifications of the current state have to be assessed, which should be a separate improvement step.
- Modeling and planning processes are required for choosing software engineering objects and integrating them into a project plan for a given set of project goals and characteristics. Descriptive models of existing implicit processes are needed to document the current state of the practice. The set of all instantiated models is usually understood as the project plan.
- Measurement processes analyze the project goals, derive measures to allow characterization of various kinds of objects in a quantitative manner, collect data for project control, validate and analyze the collected data, and interpret them in the context of the goals. Expressing product and process features in numbers allows scientific and managerial activities to be performed in a more traceable and systematic way (Rombach et al., 1993).
- Reuse processes organize experience and allow learning from project to project. They use the information available from each project. Knowledge in terms of products, processes, and measurement data is analyzed and validated for potential use in future projects. The packaged experience can be reused by other projects.
FIG. 3. Process engineering processes.

Figure 3 shows the process engineering processes listed above.¹ It should be noted that some of the process engineering processes are present in the environment of a particular project (i.e., measurement and modeling and planning), that improvement is concerned with the evolution of software know-how operating on the experience base, and that reuse processes offer an interface for both projects and the experience base. For example, product engineering processes are traced by measurement activities and recorded by reuse processes in order to allow improvement processes to make suggestions for better process performance. The improved process models can be used in later projects during modeling and planning to build a project plan with respect to the improved software engineering know-how. These relationships between process engineering processes are not shown in Fig. 3. Improvement processes establish a cycle of learning from project to project. Entries in the experience database are updated based on observations made primarily in projects.

To summarize, software engineering is concerned with both product and process engineering processes. The evolution of these processes, which is a field of software engineering, is covered by the general term software process research. We would characterize it this way:

Software process research is concerned with the definition, refinement, and evaluation of principles, methods, techniques, and tools for a particular context to support:
- Communication about software development processes
- Packaging of software engineering processes based on reusable components
- Analysis and reasoning about processes
- Project guidance and control using suitable methods and techniques (i.e., processes)
- Automated support through process-sensitive software engineering environments
Product engineering processes are (partially) the outcome of process engineering processes. This work gives a general overview of both kinds of processes in the first part (Sections 2-4) and discusses a specific infrastructure technology supporting software process research in the second part (Sections 5-7). The current states of product engineering processes and process engineering processes are presented in Section 2 and Section 3, respectively. An overall framework for integrating both kinds of processes conceptually is given in Section 4. This framework motivates the need for explicit process models and suggests requirements for models and process representation languages. Section 5 discusses to what extent these requirements have been met by presenting a sample set of approaches. Supporting methodology and technology for process representation languages is presented in Section 6. Finally, Section 7 lists possible future topics for practitioners and researchers for contributing to the field of software process research.
2. Product Engineering Processes

Software development organizations have recognized the need for explicit models of both products and product engineering processes in order to develop software of high quality (see also International Standardization Organization, 1991b). This section gives a broad overview of which processes are modeled. Most of the documents are still used off-line during the project; the potential of explicit process models is still not exploited in real-world software development projects.
2.1 Life-Cycle (Coarse-Grain) Models

Life-cycle models describe activities over a project's lifetime on a high level of abstraction. They divide a project into several phases and describe temporal relationships (e.g., coding starts as soon as design terminates). This is an abstract view of a project, only suitable for educational, communication, and process classification purposes (i.e., functional module testing belongs to the validation phase, and is no subpart of it).² In general, life-cycle models are no special kind
of development process model. They are used to explain and classify development process models. Nevertheless, in the first years of their existence, life-cycle models were (mis)understood as process models and used for project planning and guidance, and sometimes they still are.

The first life-cycle model that became known to a broader audience was the waterfall model (Royce, 1987). Its popularity stems from being one of the first attempts to clarify understanding of how software is developed. Subsequent modifications of the waterfall model were made to allow feedback and iteration. Criticism addressed its nonapplicability to certain kinds of projects. Mainly, this was caused by misunderstanding the waterfall model as a process model for software development, rather than as a classification of its processes. A typical example was the waterfall model with overlapping phases, which tends to be more a process model than a life-cycle model. Nevertheless, all other life-cycle models proposed after the waterfall model share some commonality with it.

The iterative enhancement model suggests dividing a specification into increments and then iterating over these increments (Basili, 1975). This enables learning in a project and allows one to deal with higher-risk (i.e., with respect to technical aspects or budget/schedule) system parts first. Each increment is developed following the waterfall model and an additional integration step.

Prototyping is a life-cycle model introduced for specification validation using executable examples of the system to be developed (Davis, 1990). The intention is to give the end-user a feeling for the future system or to analyze the feasibility of critical components. Several variations of the prototype model exist. Extreme ones are the development of a throwaway prototype, which is developed purely for specification validation, and the evolutionary approach, which modifies a system with respect to refined or corrected user specifications over multiple iterations.

The spiral model covers both technical and managerial aspects of software development (Boehm, 1988). Sequential steps are performed to elaborate the system and to analyze risks caused by uncertainties (e.g., a weak requirement) and lack of know-how. The development steps may vary between a completely evolutionary approach and a two-phase prototype-waterfall approach. It differs from the other proposed life-cycle models in that it explicitly integrates nontechnical aspects like planning and management into the spiral.
2.2 Development (Fine-Grain) Process Models

Development process models exist for both technicians and managers. Life-cycle models and development process models do not differ in what they describe, but they describe it on different levels of abstraction and cover different aspects. In particular, development process models are more detailed and cover more aspects than life-cycle models. The documentation of these processes is done for at least
two reasons: (1) engineers and managers need a guide through the technical details and steps of the employed methods and techniques, and (2) multiple techniques and methods are brought together by describing them in a single process model. The primary modeling concepts remain the same for both purposes. It should be noted, however, that technical and managerial process models, on the one hand, tend to be general when they cover several phases of software development, and, on the other hand, often present only an isolated slice of how to apply product engineering technology.

Technical and managerial process models often appear as standards. Some of them are published by public organizations [e.g., the Institute of Electrical and Electronics Engineers (IEEE), the American National Standards Institute (ANSI)] in order to spread technical knowledge among their members, or by customers in order to ensure specific qualities of the product or the process [e.g., in the case of the U.S. Department of Defense (U.S. DoD) standard 2167A or the International Standardization Organization (ISO) 9000 family]. The standards usually make statements about the product (e.g., documentation), intermediate results (e.g., error reports), and the development processes (e.g., the method to apply). A critical look at standards can be found in Pfleeger et al. (1994).

It is obvious that different process models are needed when concerned with different tasks (Section 1.1). But different process models are also needed when concerned with the same tasks in different contexts and with different expectations (e.g., quality requirements). Organization-specific standards, guidelines, and process models exist on various levels of granularity; the same process may be described at the company, division, department, project, and team level, while others exist only on a particular level. Often the standards, guidelines, and process models do not contain the context-specific implications of the rules they should enforce. This sometimes makes it difficult for remote readers to understand the reason or purpose of a rule.
3. Process Engineering Processes

There is a growing awareness in industry that process engineering is crucial for business success (Gibbs, 1994; Humphrey, 1991). Process engineering processes are performed to observe product engineering processes, to feed analysis results back to process performers, and to improve the already existing process models for use in future projects. The first companies are starting to deal explicitly with process engineering processes in the context of process improvement programs (as, for example, conducted by the Software Engineering Institute (SEI) at Carnegie-Mellon University, the BOOTSTRAP consortium (Haase et al., 1994), or the Software Technology Transfer Initiative (STTI-KL) at Universität Kaiserslautern). Nevertheless, explicitly and formally defined process engineering
processes are rare (although examples can be found in Madhavji et al., 1990; Hoisl, 1994).
3.1 Improvement Processes

Improvement processes deal with an organization's knowledge about how and how well to develop software. Models which reflect existing practices are used to plan and guide software projects. The projects are observed in order to determine problems or to detect relationships between phenomena not known previously. The new experience is brought into new models, reflecting improvements over the set of models which existed before the project was launched. Several process models for improvement have been discussed inside and outside the software process research community. They all share the basic principles of planning, observing, and giving feedback.
3.1.1 Plan-Do-Check-Act

Improvement of industrial production lines or single processes is the goal of the Plan-Do-Check-Act approach (Deming, 1986). Observations made in Japanese companies have stimulated the formulation of an approach to change processes for the better. The general idea is that customer satisfaction is the key to business success. For this, statistical quality control is used to measure the impact of factors isolated by the organization on a product's quality. A feedback cycle enables learning from past results. The approach consists of four steps:

1. Plan. Develop a plan for effective improvement (e.g., setting up quality measurement criteria and establishing methods for achieving them). The existing process is changed or a test is designed. Sometimes only new measurements (observations) are introduced to the process.
2. Do. The development organization carries out the plan, preferably on a small scale.
3. Check. After process termination, observe the effects of the change or test (i.e., check the product against the quality criteria set up in the plan phase).
4. Act. Study the results and document the lessons learned. Emphasis is on understanding the principles of the process, in order to predict the product's quality better and to suggest further improvement.

The Plan-Do-Check-Act approach is designed to improve a single industrial production process within a team organization. Variations which likely exist in software engineering processes are not considered here. Moreover, the emphasis is on the product rather than on the process, and on one process attribute, namely, productivity. This suits the evaluation of production, but
not the study of development processes themselves. The quality control methods, too, are not justifiable for software development: since the goal of the Plan-Do-Check-Act approach is eliminating product variations, it cannot be used for processes that never produce two identical outputs. A characteristic of the Plan-Do-Check-Act approach is the nontermination of this four-step improvement cycle. A process is seen as being continuously improvable, even though it may already be in a good state. The basic ideas that led to the formulation of this approach also served as input for designing the basic principles of Total Quality Management (TQM).
3.1.2 Total Quality Management

TQM stems from the same roots as the Plan-Do-Check-Act approach, providing a rich framework for systematically achieving higher customer satisfaction, a more advanced organization, and improvements, by modularizing the process into standardization, improvement, and innovation. It requires planning, measurement, and control for both products and processes throughout the whole project (Feigenbaum, 1991). Different measurement approaches are taken for observing product quality (statistical process control), customer satisfaction (e.g., quality function deployment), and advancing the organization (e.g., policy deployment). Improvement can therefore be demonstrated by comparing data taken from actual projects with quantitative characteristics of prior processes.

TQM must be tailored to the specific needs of software development. In Zultner (1993), the adoption of TQM for software development organizations is described. The report about a single software development organization, out of 130 being awarded for quality improvement, may indicate that the number of software development organizations currently establishing TQM is not as high as in pure production-oriented companies. A more detailed example of applying ideas of TQM to software development is described in Humphrey et al. (1991), connecting TQM and the SEI's Capability Maturity Model (CMM) for assessing software development organizations.
3.1.3 Quality Improvement Paradigm

The Quality Improvement Paradigm (QIP) is a six-step approach for systematically developing software and validating software engineering technology. The QIP is the result of the application of the scientific method to the problem of software quality. As such it is related to other methodological foundations (Basili, 1993). In particular, the QIP consists of the following steps for a development or improvement project (Basili et al., 1994a):

1. Characterize the project using models and baselines in the context of the particular organization.
2. Set quantifiable improvement goals on the basis of the initial characterization and of the capabilities that have a strategic relevance to the organization for successful project and organization performance and improvement.
3. Choose an appropriate process for improvement, and supporting methods and tools, making sure that they are consistent with the goals that have been set. This plan is developed on the basis of the characterization of the environment and the goals that have been set.
4. Execute the plan to develop products, and provide feedback based on the data on goal achievement that are being collected.
5. Analyze the collected data and information gathered during and at the end of each specific project to evaluate the current practices, determine problems, record findings, and make recommendations for future project improvements.
6. Package the experience newly gathered in the form of new, updated, or refined models. Store them in an experience base so they are available for future projects.

Tracing the effects of applied technology in a defined context and with quantifiable goals leads to conclusions about the suitability of this technology in the given environment. Because product engineering processes are also a subject of study, all technical aspects of software development are potential candidates for improvement. This exemplifies that improvement processes cannot be seen as being equal to technical processes: improvement processes survive projects; they study and change product engineering processes.

The QIP has been demonstrated in various applications to be a sound methodology for quality improvement. This has helped to obtain better product engineering processes in order to meet quality goals. One example of successful quality improvement by employing the paradigm is the National Aeronautics and Space Administration (NASA) Software Engineering Laboratory, Greenbelt, Maryland (Basili et al., 1992). This organization was awarded the first IEEE Computer Society Award for Software Process Achievement.
3.2 Measurement Processes

Measurement processes are concerned with the definition of goals, the derivation of metrics, the collection of data, their validation and analysis, and finally the interpretation of the results in the context of the environment from which the measures were taken.

The goals of measurement vary along five characteristics: (1) what software engineering objects are being measured (e.g., products, processes, projects); (2) why they are being measured (e.g., control, characterization, assessment, evaluation, prediction, improvement); (3) who is interested in these measurements
(e.g., designer, tester, manager, quality assurance engineer, entire software development organization); (4) which of their properties are being measured (e.g., cost, adherence to schedule, reliability, maintainability, correctness); and (5) in what environment they are being measured (e.g., kinds of people involved, technology used, applications tackled, resources available) (Basili et al., 1994c).

Goal definition is the central part of any measurement activity in the sense that it establishes a context for all other measurement activities. Object (e.g., design document, validation process), purpose (e.g., analysis, control), viewpoint (e.g., designer, manager), quality focus (e.g., completeness, effectiveness), and environment (e.g., project organization X) of a goal define a context that is used to select appropriate metrics (Rombach, 1991b; Basili, 1992). Derivation of metrics is performed in a traceable way by following the Goal Question Metric (GQM) approach (Basili et al., 1994b).

Data collection is performed before, during, and after a project or process to get information about the subject of study. Various techniques are employed to collect the data; mainly, we distinguish between manual and automatic collection. Sample support systems are listed in Lott (1994b).

Validation of measurement data and its analysis are performed during the project or after its termination. Statistical techniques are selected with respect to measure qualities (e.g., scale) and employed to identify features and relationships in the data (e.g., correlation).

Interpretation of the collected and validated data should be performed in the context of the measurement goal. This process uses results from data collection and analysis (e.g., in the case of control, to check whether the goal has been met or not). Here automated support is valuable, but interpretation must be done by humans.
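To make the five facets of a goal more tangible, the following sketch writes a measurement goal down as a small data structure with a few derived questions and metrics. It is a minimal illustration in the spirit of GQM; the concrete goal, questions, and metrics are invented example values and are not taken from the GQM templates of the cited literature.

```python
# Minimal sketch of a GQM-style measurement goal; all concrete values are
# invented for illustration and are not taken from the cited GQM literature.
from dataclasses import dataclass, field

@dataclass
class MeasurementGoal:
    object: str         # what is measured (product, process, project)
    purpose: str        # why it is measured
    quality_focus: str  # which property is of interest
    viewpoint: str      # who is interested in the result
    environment: str    # context in which the measurement takes place
    questions: dict = field(default_factory=dict)  # question -> list of metrics

goal = MeasurementGoal(
    object="design inspection process",
    purpose="evaluation",
    quality_focus="effectiveness",
    viewpoint="quality assurance engineer",
    environment="project organization X",
    questions={
        "How many defects does an inspection detect?":
            ["defects detected per staff-hour", "defects detected per document page"],
        "How much effort does an inspection consume?":
            ["staff-hours per inspection"],
    },
)

for question, metrics in goal.questions.items():
    print(question, "->", metrics)
```

Writing the goal down in this structured form is what makes the subsequent data collection and interpretation traceable back to a stated purpose and viewpoint.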
3.3 Modeling and Planning Processes

One task of software engineering is the assessment and improvement of methods and techniques used for software development and maintenance. The nature of software and processes is the subject of study. This requires the observation of process and product qualities over time. To deal with these objects of study, it is necessary to describe them and formulate expectations about their behavior (Section 1.2). The former is usually called modeling and the latter planning.

Modeling captures essential aspects of real-world entities and leaves irrelevant details out. A model is a general description (or representation) of an entity (i.e., it specifies a possible set of real-world entities). The counterpart of the real world is the model world, which contains all models. One can compare process, product, or quality models with types in programming languages. When modeling is based on observations of real-world objects, it is called descriptive modeling. What is relevant and what is not depends on the intended use or purpose of the models.
A general answer as to which aspects to cover cannot be given. Also, the level of granularity depends on the context the models are used in. In the model world two layers are present (i.e., a generic and a concrete one). On the one hand, classes, templates, or types of processes are described in order to specify explicit but generic knowledge about a set of real-world activities. These elements provide abstractions from projects. On the other hand, concrete objects are instantiated from these elements which directly correspond to activities, things, or other concepts in the real world (Dowson and Fernstrom, 1994). For example, on the type layer there exists a general process model design producing a design document. On the object layer there exist several design processes which all are instances of the same process model. These design processes are scheduled and assigned to different people in order to develop design documents for two different systems in different contexts. The processes have in common the information specified in the process model, and they represent activities of a real-world project.

Planning uses models (i.e., instantiates and relates them) to build a representation of the project to be performed with respect to the project's goals and characteristics. Project goals are quality attributes (e.g., time, effort, reliability) with quantitative target values (e.g., 3.5 months, 960 staff hours, fewer than two critical errors in acceptance testing). Project characteristics are given by the project environment with respect to personnel (e.g., a maximum of eight persons, two of them inexperienced), process (e.g., waterfall model, methods and tools), and quality (e.g., estimation models). Experience from former projects helps the planner make estimations (e.g., about effort distribution or fault detection rate) and set up internal project goals to be met (e.g., milestones or quality goals). Typically the overall process structure is a task breakdown structure that can be used for resource allocation and other management tasks. Various techniques to support these processes exist (for sample lists, see Gilb, 1988; von Mayrhauser, 1990).
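The analogy with types in programming languages drawn above can be illustrated directly: a process model plays the role of a class (type layer), and the scheduled, assigned processes are its instances (object layer). The sketch below is only an illustration of that analogy, with hypothetical attribute names and projects; it is not a fragment of any process modeling language discussed later in the chapter.

```python
# Minimal sketch of the two model-world layers: a process model (type layer)
# and process instances (object layer); attribute names and projects are hypothetical.
class ProcessModel:                       # type layer: generic, reusable knowledge
    def __init__(self, name, produces):
        self.name = name
        self.produces = produces          # kind of document the process yields

    def instantiate(self, project, performer):
        return ProcessInstance(self, project, performer)

class ProcessInstance:                    # object layer: one concrete project activity
    def __init__(self, model, project, performer):
        self.model = model
        self.project = project
        self.performer = performer

    def __repr__(self):
        return (f"{self.model.name} process for {self.project}, performed by "
                f"{self.performer}, producing a {self.model.produces}")

design_model = ProcessModel("design", produces="design document")

# Two design processes scheduled in different contexts share the same model.
print(design_model.instantiate(project="system A", performer="team 1"))
print(design_model.instantiate(project="system B", performer="team 2"))
```

Planning, in these terms, is the act of creating and relating such instances until they form a representation of the whole project.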
3.4 Comprehensive Reuse Processes

Reuse is often claimed to be a key to cost-effective software development. Reuse of software products has been an important area of software engineering for some time (Krueger, 1992). Different product reuse systems exist. Recently, all kinds of explicit knowledge have come to be seen as objects of reuse (Prieto-Diaz, 1993). At least products, process models, and quality models can be considered as such software engineering knowledge. There are examples of systems providing support for reusing software engineering knowledge (Gish et al., 1994). In order to make reuse work, it must be embedded into software engineering processes. It must be considered during and integrated into software development (Matsumoto, 1992). The reuse process must be tailored to the objects stored in the repository. Nevertheless, basic principles are shared among them. On the one hand, the
characterization scheme of objects in a repository should consist of three parts; on the other hand, the reuse process should be refined into several basic processes (Basili and Rombach, 1991). Both reuse candidates and required objects are described in a threefold manner. First, the object itself has to be documented. Second, the object's interface must be specified. Finally, a comprehensive object documentation contains the context the object is used in. This characterization scheme yields a complete description of software engineering knowledge (i.e., products, process models, or quality models). An example of a reuse object is given in Table I. How such a scheme can be used to retrieve objects from a repository is described in Prieto-Diaz and Freeman (1987). Often the reuse process is refined into the subprocesses identification, evaluation and selection, modification, recording, and (re)packaging. Identification means computing the similarity between the required object and the reuse objects residing in the repository. An example is given in Prieto-Diaz (1991). The identification process results in a set of possibly useful objects. The elements of that set have to be analyzed in depth to select the most suitable candidate. An exact match of reused and required object is ideal but unlikely to happen. Often a modification is required to meet the specification of the required object. At the end of a project, any experience gathered throughout the whole project has to be recorded and brought into the repository. This includes new objects and experiences made with reused objects. The repository itself must be organized and maintained to provide good support.
TABLE I
EXAMPLE PROCESS ASSET

Dimensions            Design inspections

Object
  Name                SEL_inspection.waterfall
  Function            Certify appropriateness of design documents
  Type                Inspection method
  Granularity         Design stage
  Representation      Informal set of guidelines

Object interface
  Input/output        Specification and design document
  Dependencies        Assumes a readable design, qualified reader

Object context
  Application domain  Ground support software for satellites
  Solution domain     Waterfall Ada life-cycle model, standard set of methods

Object quality
  Average defect detection rate (e.g., >0.5 defects detected per staff-hour)
Because of this functionality and its separation from the projects, this active part of an organization is called the experience factory (Basili et al., 1994a). The generic reuse process is discussed in detail in Basili and Rombach (1991). A detailed discussion of process steps for product reuse was made within the ESPRIT project Reboot (see, e.g., Morel and Faget, 1993). Reuse as a process engineering process is not only affected by technological issues, but also by environmental influences. This shows again (Section 1) that any kind of engineering process is embedded in the context of other processes, either engineering or managerial processes (see also Frakes and Isoda, 1994).
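A minimal sketch of the threefold characterization scheme and of similarity-based identification may help to illustrate the reuse processes described above; the facets and the scoring function are invented for this example and are not taken from any of the cited systems.

    def similarity(required, candidate):
        # count matching facets in the object, interface, and context parts
        score = 0
        for part in ("object", "interface", "context"):
            req = required.get(part, {})
            cand = candidate.get(part, {})
            score += sum(1 for key, value in req.items() if cand.get(key) == value)
        return score

    repository = [
        {"name": "SEL_inspection.waterfall",
         "object": {"type": "inspection method", "granularity": "design stage"},
         "interface": {"input": "specification and design document"},
         "context": {"solution domain": "waterfall"}},
    ]

    required = {"object": {"type": "inspection method"},
                "context": {"solution domain": "waterfall"}}

    candidates = sorted(repository, key=lambda c: similarity(required, c), reverse=True)
    print(candidates[0]["name"])   # the most similar reuse candidate is examined first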
4. A Framework for Integrated Product and Process Engineering

A variety of product and process engineering processes is needed for effective and efficient development of large software systems. Significant results were achieved in the last 25 years in evolving software engineering processes (Sommerville and Paul, 1993). The reason why the software crisis is still alive is twofold. First, software systems are becoming increasingly more complex because the application complexity is growing. Solutions provided by software engineering lag behind the demands of practical software development. Second, there are only a few attempts to integrate solutions from different research areas. Principles, techniques, methods, and tools developed independently possibly influence each other. This should be a subject of study. Improvement of product engineering requires better models. Better models can be built empirically based on lessons learned from development projects. Improvement of both product engineering and model building cannot be achieved independently of each other. A framework is needed that supports sound product engineering and facilitates learning for the sake of process engineering.
4.1 Product and Process Engineering
The TAME model in Figure 4 presents an overall view of software engineering processes and where in the organization they should be performed (Basili and Rombach, 1988). The upper half of the figure denotes single projects being planned and performed. During a project's lifetime, all product engineering processes are performed (i.e., technical and managerial). Some process engineering processes are also performed in the project environment (i.e., project planning and measurement). The project feedback cycle in Fig. 4 denotes that product and process engineering processes may be interrelated (i.e., quality assurance uses measurement
data in order to check for quality constraints, or project management triggers replanning of the project plan because of budget slippage). Process engineering processes also operate in the experience factory. From the experience factory's perspective, quality improvement following the six steps of the QIP requires learning from projects and providing software know-how in terms of models and objects for future projects. Another feedback cycle exists within a project. During project performance, measurement data is used to observe and analyze the current project state, and to give suggestions when deviations from the plan are detected. So far, the subject (i.e., the product engineering processes) and the methods (i.e., the process engineering processes) of software process research have been introduced. Repeatedly it was stated that deriving explicit models is crucial for the success of software process research. Explicitly documenting aspects of the product engineering processes is also crucial for project success. For example, the following questions could emerge in real-world projects and should be answered objectively and traceably with the help of explicit representations of processes, products, and qualities:
- What minimum coverage should be reached for structural testing in this project?
- How many omissions are detected in the different versions of the requirements document?
- Is there a significant relationship between module complexity and effort distribution of module tests?
- What persons must be informed when a component specification must be revised?
- Is the fault detection rate in this project abnormal when compared to previous projects of that kind?
- Can the review of the requirements document be made more formal in order to allow earlier detection of design faults?
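Questions like the last two above can only be answered if the corresponding data are collected and baselines from earlier projects exist. The following sketch, with invented numbers, shows the kind of comparison that is meant:

    baseline_rates = [0.52, 0.61, 0.48, 0.55]   # faults per staff-hour in earlier projects
    current_rate = 0.21                         # rate observed in this project so far

    mean = sum(baseline_rates) / len(baseline_rates)
    variance = sum((r - mean) ** 2 for r in baseline_rates) / (len(baseline_rates) - 1)
    deviation = abs(current_rate - mean)

    if deviation > 2 * variance ** 0.5:
        print("Fault detection rate deviates noticeably from previous projects of this kind.")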
Product, process, and quality models are needed for communication, analysis, reasoning, guidance, control, and improvement purposes. Additionally, constraints often require documentation of relevant information [e.g., ISO 9000-3 (International Standardization Organization, 1991b)]. The following discusses the potential benefits of explicit process representations in more detail, giving various examples of results of software process research. This is done to motivate representation languages as an infrastructure of software process research and to illustrate why there is such a diversity among them. No two of the representation languages discussed in the second part of this work have the same goal (i.e., communication, analysis, reasoning, guidance, control, automated support, or improvement) (Curtis et al., 1992). It is important to recognize why there are different motivations for the different approaches. Therefore, a list of five expected benefits (B1-B5) is given in the following paragraphs (see also Rombach, 1991a).

B1: Explicit models support human communication. Software is developed in teams. Effective collaboration requires that each project member understands his or her responsibilities as well as the interfaces with the activities performed by other team members. Explicit representation of knowledge allows asynchronous communication of distributed parties. Moreover, the mere existence of product, process, and quality models supports different people understanding each other (e.g., by referring to a specific part of a representation). Experience gathered from software development organizations has shown the usefulness of explicit process documentation (Humphrey, 1989; Klingler et al., 1992).

B2: Explicit models allow for packaging experience. Software engineering know-how is complex. The set of processes needs to be packaged together with experience regarding their effectiveness and organized in an efficient way (Basili and Rombach, 1991). Complex objects and complex relationships are structured by writing them down in an orderly way (i.e., by
using a grammar). Aggregation, generalization, and classification hierarchies allow for sound process model management on various abstraction levels. Moreover, explicit process models establish maintenance of software engineering know-how. Different process model versions are managed and improvement can be demonstrated. Additional information is associated with packages. For example, quality models for the prediction of effort may be related to a class of process models.

B3: Explicit models ease analysis and reasoning. Process representations possess features which are not directly encapsulated in language constructs (e.g., performance). Analysis is to be performed in order to allow statements about the process models. The spectrum of analysis techniques varies from human reasoning to formal verification of process models. An example of the former is given in Klingler et al. (1992): a reuse process was the subject of formalization and subsequent modification based on human analyses. An example of the latter (formal verification) is given in Gruhn (1991). It is obvious that here the set of applicable analysis techniques depends on the selected formalism. In contrast to static analysis and reasoning, several approaches allow studying dynamic properties of process models. One example of software process simulation is given in Kellner (1989). This is an appropriate way for humans to observe process behavior prior to application in real-world projects if formal techniques cannot be employed.

B4: Explicit models provide project guidance and support control. Product engineers performing complex activities should be supported in a context-dependent way about what tasks to do next, what are the objects to be accessed, or what are the criteria for successful process termination (Saracelli and Bandat, 1993). Like a road map, explicit process documentation guides the developer through complex structures. Sound support can be achieved if the information is provided in a context-dependent way with respect to the actual process state. By using a guidance system, one can determine whether the actual project performance meets the specification of the plan. This control is effective if measurement of product and process features allows quantitative statements about the current state in relation to the prediction (Rombach et al., 1992).

B5: Explicit models enable automated support through software engineering environments (SEEs). Explicit representations of processes can be made accessible, under certain circumstances, to process-sensitive software engineering environments (Fuggetta and Ghezzi, 1994). The process models are instantiated and interpreted in order to invoke tools and to compute the next project state. If product engineering processes are well understood, they can be automated with the help of this kind of environment (Penedo
and Riddle, 1993). But even if the whole functionality of an activity cannot be captured in a process model (e.g., because the process is not well understood or human creativity is not to be hindered), automated support can be given by managing an explicit project state and providing project information for all the users.

Table II relates the benefits (i.e., B1-B5) and the processes introduced in Section 1. For each kind of process it is expressed how explicit process models support it. The relations most obvious to us are given. Nevertheless, an empty table cell does not mean that explicit process models do not support this kind of process with respect to that benefit. But from our experience we see the listed relations as the relevant ones. In the next two sections, requirements for process representation languages are elaborated and each requirement is related to the above-discussed benefits. If a requirement is fulfilled by a process representation language, it is likely that the corresponding benefits are gained. We start with the discussion of requirements for process engineering (what language properties are useful when developing the models) and present requirements for product engineering (what language properties are useful when using the models) afterward.
4.2 Requirements for Process Engineering

Process engineering deals with developing, tailoring, analyzing, and improving explicit models. The software engineering models describe all aspects considered to be relevant in the context of a software development project. For example, during modeling, the concepts process, product, resource, attribute, and a variety of relationships like product flow, synchronization between processes, or refinement of objects have been identified as important to build models (Rombach, 1991a). Other types of elements may be justified (e.g., communication paths, events) depending on the process engineering tasks. In addition to the aspects captured in the models, the models themselves have to fulfill a set of requirements when acting as a tool during process engineering activities. The requirements for the models are intertwined with requirements for the process representation language. The following list of requirements expresses our idea of a suitable notation:

R1: Natural Models. Project members should review the models built. The process models should not only capture all relevant aspects of the elements of software development, they should be able to represent these aspects in a natural (i.e., easy to identify) way. A one-to-one mapping between real-world phenomena and process model elements eases modeling and maintenance of these models. This requirement is called elsewhere the general representation requirement (Gruhn, 1991).
TABLE II
PROCESS-BENEFIT COVERAGE

Product engineering
  B1  Task definitions are offered to the product engineers. Real-world concepts must be easy to identify. Formal models allow for consistent interpretations.
  B4  Decisions about what course of action to take are supported by providing feedback in quantitative form and by showing alternatives.
  B5  The interpretation of flexible models tailored to specific contexts offers individual support for product engineers (e.g., by providing appropriate tools).

Improvement
  B1  Explicit models help the process engineers to understand the processes. Improvements are documented to support future projects.
  B2  Model versions show how a process has evolved over time, documenting improvement steps.
  B3  Analyzing process traces and relationships between measurement data possibly leads to the identification of problems with respect to current models and stimulates suggestions for improving them.

Reuse
  B2  A repository of process models reflects an organization's software engineering experience. Models to be reused often must be tailored to new, different contexts.
  B3  Taking measurement data during process performance and checking consistency of different aspects on a formal basis validates the models and makes them more valuable for the experience factory.

Measurement
  B1  Formal models may be used to define a schema for data management. To ease interpretation of the schema they should describe real-world concepts in an understandable way.
  B4  Explicit models tell when, how, and what data to provide or collect. Moreover, the models build a context for multiple data.
  B5  Measurement activities are automated. Tools for data collection may be directly invoked from process descriptions.

Modeling and planning
  B2  Formal models are selected, modified, and used as building blocks to create a new project plan, thereby generating a new package of explicit experience.
  B3  Formal languages allow for modeling processes and for checking them for consistency using tools. Project plans are built using experience from former projects and by reusing models and related information gathered through measurement.
R2: Measurable Models. The impact of each applied technology in a particular development or improvement project can be observed through the effects that it has on the products and processes. Scientific evaluation of these effects
is needed in order to judge the efficiency and effectiveness of the technology and to detect relationships between process elements not known previously. In particular, the QIP (Section 3.1.3) relies heavily on measurement. Therefore process and product characteristics should be measured during the project. A proper definition of attributes and measurement data within the models is needed. For example, if the goal is to test for a possible relationship between module complexity and test effort, collecting data from both products and processes must be possible.

R3: Tailorable Models. In some sense, process models represent generic information about multiple completed activities. On the one hand, to be useful for a number of projects, process models should describe the commonalities of the processes of these projects. On the other hand, the process environment is likely to evolve and change. Planning must consider the differences and instantiate the models using environment parameters (e.g., schedule, reliability). Tailorable models limit the number of models to be maintained and offer adaptability in different contexts (i.e., reuse). For example, requirements specification is likely to vary depending on the type of software to be developed.

R4: Formal Models. Process engineering requires communication among different roles. Commonly, teams are concerned with those tasks. Moreover, there may be time gaps between modeling and use. Also, those people performing process engineering and those performing product engineering are usually not the same. All these points require some degree of formality of the models in order to be understood in the same way by different people. If this were not true, implicit process knowledge would be sufficient. Thus, if a language fulfills this requirement, it means that there is a formal model implemented in this language which defines the meaning of process model constructs. For example, the specification "product P is accessed by process A" should have a specific meaning (i.e., read/write and parallel/exclusive access).

It should be noted that it is important to allow for replanning of a project. Replanning means to interrupt a project, capture the current state, modify the plan and probably the models, adapt the state according to the new plan, and resume the project. Approaches tackling that problem exist (e.g., Bandinelli et al., 1993a), but the problem is currently unsolved. Nevertheless, this cannot be a requirement for a process representation language (compare the discussion of self-modifying code in the mid-1980s) nor a part of process engineering processes (an analogy would be that software development has to do with changing process memory during runtime and ensuring that a modified system specification is met). This particular problem domain is beyond the scope of this work, taking into consideration that this is a relevant issue of software process research.
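Requirement R2 can be illustrated with the example given above: if module complexity (a product attribute) and test effort (a process attribute) are both collected, a simple correlation analysis becomes possible. The following sketch uses invented data and is only meant to indicate the kind of evaluation enabled by measurable models.

    complexity  = [12, 25, 7, 40, 18]          # e.g., cyclomatic complexity per module
    test_effort = [3.5, 8.0, 2.0, 14.5, 6.0]   # staff-hours spent testing each module

    n = len(complexity)
    mean_c = sum(complexity) / n
    mean_e = sum(test_effort) / n
    cov   = sum((c - mean_c) * (e - mean_e) for c, e in zip(complexity, test_effort)) / n
    std_c = (sum((c - mean_c) ** 2 for c in complexity) / n) ** 0.5
    std_e = (sum((e - mean_e) ** 2 for e in test_effort) / n) ** 0.5
    print("Pearson correlation:", round(cov / (std_c * std_e), 2))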
4.3 Requirements for Product Engineering

Product engineering deals with the development and maintenance of software based on explicit models suitable for a given set of project goals and characteristics. Explicit models are used in particular contexts to support development of software. They are parameterized or tailored with respect to the goals of the activities which they support. Humans or machines interpret the process representations to perform the tasks. Consequently the developers emphasize different requirements from those concerned with the evolution of the used models. The following paragraphs present a list of requirements that should be fulfilled by any process representation language from the viewpoint of a developer in a particular project.
R5: Understandable models. Process models are used as a reference during the project either on-line or off-line. Because most of the activities concerning process engineering rely on human rather than machine interpretation of the models, understandability is a crucial point for the success of any process representation language. In contrast to the requirements R1 and R4, which address completeness and formality of the language, understandability is oriented toward the style of presentation and how difficult it is for the user of the process representation to retrieve information. For example, the validation engineer should understand by reading the process model how to apply a particular structural testing technique when a user interface component is to be tested.

R6: Executable models. Standard procedures of software development are frequently implemented as tools. They can be seen as hardwired process representations. Process-sensitive SEEs use variable process descriptions to integrate tools in order to provide more comprehensive automated support. Process representation languages must have specific features in order to allow interpretation or execution by a machine for tool invocation and file access. These features include constructs for expressing operating system objects (e.g., files, streams, executables), and control and data flow primitives.

R7: Flexible models. Processes performed by humans are characterized by creativity and nondeterminism. The use of process models should ensure that the important aspects of the current situation in the development project are reflected. Therefore, a process representation language should provide features for handling decisions made by humans. Such decisions typically affect control flow and product states. For example, only after termination of the design process is the number of modules to be coded known exactly. But this decision affects the number of processes for coding or validating the modules (a small sketch of this situation is given at the end of this section).
R8: Traceable models. The models should ensure traceability within and across layers of abstraction (i.e., horizontal and vertical traceability). For each piece of information the developer should be able to determine the context in which it originated, the processes which rely on that particular piece of information, and how the information was transformed. Process representation languages should provide constructs for explicitly stating different relationships between project plan elements. For example, when a local design error is detected, the project manager should know what developers to inform about interrupting their work and waiting for design modification [e.g., if a module is already at the validation stage, all affected roles (i.e., designer, programmer, and tester) are to be informed about additional rework on design, code, and other related documents].

Table III illustrates the relationship between the benefits of explicit process representations and the requirements for process and product engineering which have to be fulfilled by any language which is used to build such explicit models in order to gain the related benefits. Explicit process models which should serve for communication purposes should be described using real-world concepts (R1), have the same meaning for different people (R4), and should be understandable (R5). Process models that should be reused in another context must allow for tailoring (R3), and should be formal because the recipient of the reused model is often not the one who defined the model (R4). To allow analysis and reasoning, process models must have measurable attributes to quantitatively define them (R2), must be formal in order to allow automated reasoning or analysis by human experts (R4), and should allow identifying relations among data (R8). To be useful for developers and managers in a specific project, an easy mapping between real-world phenomena and objects of the representation should be possible (R1), attributes of both products and processes are needed to control the project (R2), the models need to be understood by the developers to tell them what to do (R5), and finally the models should allow tracing of information (R8). And last but not least, if the process models are used as a basis for automated support by a software engineering environment, they need to be tailorable to the specific context (i.e., allow one meaningful interpretation) (R3), must be understandable by an abstract machine (R6), and should possess flexibility in order to resume enactment even when the current course of action does not match the expectations (R7).
TABLE III
BENEFIT-REQUIREMENT COVERAGE

  B1 Communication:                R1, R4, R5
  B2 Packaging:                    R3, R4
  B3 Analysis and reasoning:       R2, R4, R8
  B4 Project guidance and control: R1, R2, R5, R8
  B5 Automated support:            R3, R6, R7
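As announced under requirement R7, the need for flexible models can be made concrete with a small sketch: the number of coding processes is known only after the design terminates, so the project plan must be extendable during enactment. The plan representation used here is invented and only indicates the principle.

    def on_design_terminated(design_document, plan):
        # the design result determines how many coding processes are instantiated
        for module in design_document["modules"]:
            plan.append({"process": "code", "object": module, "state": "enabled"})
        return plan

    plan = [{"process": "design", "object": "system X", "state": "terminated"}]
    design_document = {"modules": ["parser", "scheduler", "reporting"]}
    plan = on_design_terminated(design_document, plan)
    print(len(plan) - 1, "coding processes instantiated")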
5. Software Engineering Process Representation Languages

5.1 History

The need for an explicit documentation of software engineering knowledge in terms of product, process, and quality models exists. Representing, storing, and retrieving these models is required to enable use and improvement of the models. A language to represent the models must be designed to allow the models to be used in a way suited to the user (Rombach and Verlage, 1993). Different process representation languages were designed and implemented to serve as representation mechanisms for software engineering processes (i.e., to derive process models). Development of languages for the representation of software engineering processes is performed by at least two major groups (i.e., tool integrators and process improvers) with different objectives. In the beginning, both communities operated independently. Ideas and results were interchanged later [e.g., see the 1st International Software Process Workshop (ISPW) (Potts, 1984) through ISPW 9 (Ghezzi, 1994)]. Representation languages for processes exist within the tool building community. The beginning of software process research in this area was marked by the UNIX tool make (Feldman, 1979). Make automates a (construction) process by checking dependencies and calling tools. Later more sophisticated approaches were developed which are more complex tools. Today, some software engineering environments have a component called the process engine to interpret process representations in order to execute the processes (Fuggetta and Ghezzi, 1994). The languages to describe the processes also include powerful concepts for representing products, tool calls, user interaction, and control flow between processes. Moreover, the design of some languages recognized the need for flexible mechanisms [e.g., APPL/A (Sutton et al., 1990)]. The tool building community emphasizes the understandability of process representations by machines. The languages are not designed to be interpreted by humans. Interaction between humans and computer is done mostly directly by the tools, which indicates that integration of human-based activities is not included. The other branch of language designers originates from the community which is concerned with process improvement. Years of experience in making software development more predictable and controllable led to the recognition that software
engineering processes need to be represented. A general motivation can be found in Humphrey (1989). Because the work on process models for process improvement was performed by different groups in different organizations, a landmark announcing the beginning of software process research cannot be determined. The main goals which can be identified in all of the approaches are to enable understanding, reasoning, and improvement of the processes of a software project performed by humans (for example, see Saracelli and Bandat, 1993). To reach these goals, languages developed for product engineering were often employed and extended to describe software engineering processes. Some languages were developed from scratch [e.g., MVP-L (Bröckers et al., 1992)]. The language constructs can easily be matched with real-world concepts to ease descriptive modeling. The derived models can be checked by humans. Design of these languages for descriptive modeling was not for tool integration and automation purposes, but to facilitate the construction of models which humans can understand. Briefly, one can say that the tool integration community tries to automate software engineering processes starting from the bottom level of granularity, whereas the improvement community wants to start with representation of software engineering processes from the top. Both are working on closing the gap between the communities in order to provide comprehensive support for the entire set of software engineering processes. Nevertheless, the existence of each of these process representation languages is justified. Every language was designed for its own purpose.
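The dependency-checking idea attributed to make above can be sketched in a few lines; the file names and the compiler call are hypothetical and only indicate the principle of automating a construction process.

    import os, subprocess

    def outdated(target, dependencies):
        if not os.path.exists(target):
            return True
        target_time = os.path.getmtime(target)
        return any(os.path.getmtime(d) > target_time for d in dependencies)

    target, dependencies = "main.o", ["main.c", "main.h"]
    if outdated(target, dependencies):
        subprocess.run(["cc", "-c", "main.c", "-o", target], check=True)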
5.2 Survey of Existing Languages

It is well known that no best programming language exists in general. For the same reason there is no best software engineering process representation language: requirements on the process representations vary too much, and there is no single use and application domain of the language. Therefore, process representation languages cannot be assessed with respect to each other generally. But comparing them helps to understand the different motivations for design and application of the languages and shows the variety of mechanisms developed to represent software engineering processes. The sample list of representation languages presented in this section is classified with respect to a common scheme. Additionally, for each language it is stated whether the requirements given in Section 4 are met. The classification scheme given below is an aggregate and extension of other surveys made with respect to software engineering process representation languages (i.e., Armenise et al., 1992; Kellner and Rombach, 1990; Curtis et al., 1992). In particular, the emphasis of the scheme presented here is understanding the motivation for developing the languages and giving a flavor of how they support software engineers, rather than discussing technical aspects of the languages
themselves. Nevertheless, issues of enactment are also discussed because the execution or interpretation mechanisms directly affect some properties of the models described in the languages (e.g., the inference mechanism in some rule-based systems).
- Process programming vs. process improvement. Product engineering uses different languages for different purposes and different levels of detail (e.g., specification, design, code). The same is true for process engineering languages. In particular, the languages for implementing processes (i.e., process programming) and notations for conceptual modeling of processes (i.e., process improvement) are ends of a spectrum. Process programming languages offer an interface to operating system services and allow the direct invocation of tools for software development; process models are built in a bottom-up manner by coordinating them and establishing rules for their use. Process improvement languages combine well-defined building blocks, ignoring technical details. Both kinds of languages reflect the objectives of two different camps within the process engineering community. Efforts have started to integrate ideas of both camps in order to provide comprehensive support for process engineering.

- Hidden vs. guiding models. During interpretation of process models (i.e., during a project's lifetime), the representation of the process models may be hidden from or presented to the developers. In the first case, the interaction scheme is completely encoded in the process models or the tools. They define the appearance to the user, and actively filter the information of the current project state. In the second case, the models themselves are used to inform the user. This case requires readability, well-structuredness, and problem-orientedness of the models or their instances. The processes should form a modularized set of information for a particular recipient in order to support the task to be performed. Often languages for models guiding developers include special constructs directly reflecting real-world objects (e.g., file or manual) or abstract concepts (e.g., termination condition).

- Prescriptive vs. proscriptive process models. From the beginning of software process research the idea of automated process execution was seen as one of the benefits that would justify the efforts undertaken in recent years (Osterweil, 1987). In general, sequences of tool activations are described in process programs (e.g., an edit-compile-link cycle). The user modifies the product using tools and additionally provides process information when asked by the execution mechanism. This prescriptive style of "invoking" people has been subject to criticism. The other discussed style of human interaction is called proscriptive, which means a nonrestrictive style of formulating processes (Heimbigner, 1990). The process models or their instances provide sufficient information for the developer to support performance of the tasks. The developer is free, under the limitations set up by
the project plan, to decide what actions to do on the product or process state. This requires flexibility of control flow which is to be ensured by the process representation language. Languages similar to procedural programming languages support a prescriptive style, whereas the modeling of proscriptive processes is supported by rule-based approaches.

- Single-person vs. multiperson models. Today's development projects are typically not performed by a single person. Collaboration, or cooperation, between persons, teams, organizations, and companies is undertaken when software is built (compare to Perry and Kaiser, 1991). On all these levels process models should exist in order to support synchronization, delegation, and distribution of work units. Current process research has concentrated mainly on process support for a single person and for teams. Process representations for a single person document engineering processes which should ensure the sound application of a particular technique or method. Only the activities of a developer's workspace are managed. However, different people's processes need to be coordinated. To integrate processes of multiple persons, it should be possible to identify the local workspace and how communication is performed with the other team members. To allow reasoning, this information should be explicitly documented in the process models, i.e., the languages should provide specific features to express concepts of collaboration.
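The difference between the prescriptive and the proscriptive style can be sketched as follows; the steps, states, and preconditions are invented and merely indicate the principle.

    def edit_step(p):    p["edited"] = True;   return p
    def compile_step(p): p["compiled"] = True; return p
    def link_step(p):    p["linked"] = True;   return p

    def prescriptive(product):
        # the order of steps is fixed by the process program
        for step in (edit_step, compile_step, link_step):
            product = step(product)
        return product

    def proscriptive(product, steps):
        # only the conditions are fixed; any enabled step may be chosen
        return [s["name"] for s in steps if s["precondition"](product)]

    steps = [
        {"name": "compile", "precondition": lambda p: p.get("edited")},
        {"name": "review",  "precondition": lambda p: p.get("edited")},
        {"name": "link",    "precondition": lambda p: p.get("compiled")},
    ]
    print(proscriptive({"edited": True}, steps))   # ['compile', 'review']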
In general, all combinations of the characterization scheme's attribute values may exist. Of course, some combinations are more likely to exist than others. For example, it is not meaningful to build a language for improvement which should be hidden from the recipients by using simulators or enactment machines. Reasoning on the basis of the original (or pretty printed) process documentation should be allowed. For each of the approaches, a table is given which summarizes its features and the degree to which the language satisfies the requirements. The information under characterization explains toward which end of each attribute's spectrum the process representation language tends. The second group of rows, under the subtitle requirements satisfaction, indicates whether a requirement given in Sections 4.2 and 4.3 is met fully (+), partially (o), or not (-). This evaluation should not be understood as an assessment. Again, there is no best process representation language. The evaluation is intended to allow reasoning about the suitability of a language in a given context which is defined in terms of software engineering processes. Using Table II and Table III, one should easily be able to assess the suitability of a language within that context.
5.2.1 APPL/A

APPL/A (Ada Process Programming Language with Aspen) is one of the formalisms developed and maintained within the Arcadia project (Taylor et al., 1988).
The underlying philosophy is to automate all aspects of a software engineering process to the highest degree possible. The major goal is to build a process-sensitive environment which enables engineers to concentrate on their creative work. APPL/A is an extension to Ada that provides constructs for the specification, implementation, and use of relational data structures (Sutton et al., 1989). The extensions are relations for persistent product data storage, triggers, which are task-like units and react on modifications of the product data, predicates, which are comparable to constraints on product states, and transactions on the product data. It should be noted that these extensions are purely technical constructs. No domain-specific concepts (e.g., document, schedule, developers) were added to Ada. For illustration, a procedure of an APPL/A process program is given here:

    procedure Reschedule_Tasks(c_task : in task_enum;
                               mgr_id : emp_id_type) is
      -- Reschedule the given task and any dependent tasks
    begin
      send_msg(mgr_id, "Reschedule change process tasks"
                       & " beginning with task " & task_enum'image(c_task));
      serial read Change_Process_Team, Task_Assignments, Task_Status,
                  Task_Start_Schedule, Task_Order;
      begin
        mgr_assign_tasks(m_id);
        mgr_schedule_tasks(m_id);
        update_project_plans(m_id, subproj_id);
        notify_affected_personnel;
      end serial;
    end Reschedule_Tasks;
The main objectives of APPL/A are analysis and execution of process models. The process is described at a low (i.e., machine-executable) level, concentrating on the technical steps that are strongly supported by tools (e.g., coding and validation). The program is described in a procedural-like fashion. User interaction is performed by queries when the course of action cannot be computed on the basis of the product states (e.g., choose an action on a product object). The user may change the project state only by using this interaction mechanism or by acting via tools. Process representations written in APPL/A are translated into executable Ada code. Processes are represented by concurrent threads of control, namely Ada tasks. During execution of the processes, the Ada runtime system is used.
Table IV gives an overview of APPL/A at a glance. APPL/A is one of the first process representation languages for the implementation and automation of processes. Like many programming languages, APPL/A lacks problem-oriented constructs. The focus is on providing sufficient functionality of the process models. Nevertheless, APPL/A demonstrated that there is a strong relationship between languages developed for product development and the mechanisms needed for process representation.

TABLE IV
CHARACTERIZATION AND EVALUATION OF APPL/A

Characterization
  Process programming  APPL/A is an extension of the programming language Ada.
  Hidden               User interaction is implemented using a message concept.
  Prescriptive         Ada is a procedural programming language. Humans are asked to provide information.
  Multiperson          Multiple persons may be represented as ordinary data structures, providing a communication port.

Requirements satisfaction
  R1 Natural repr.       (-)  No software process-specific constructs are provided.
  R2 Measurable models   (-)  Variables may store measurement data. No support for data collection is provided.
  R3 Tailorable models   (o)  As far as Ada provides genericity.
  R4 Formality           (o)  Programs are executable, but process engineers must agree on mapping concepts on constructs.
  R5 Understandability   (-)  Not designed for interpretation by people. The process program's functionality is spread over several places (i.e., tasks, triggers, predicates, transactions).
  R6 Executability       (+)  APPL/A code is translated into Ada code.
  R7 Flexibility         (-)  Hard-coded control flow. The user selects alternatives during process runtime.
  R8 Traceability        (-)  Products are global data structures. Task coordination is done by triggers and messages.

5.2.2 TEMPO

Because product and process data tend to become complex in a real-world project, adequate representations are provided in the Adele system by employing an entity-relationship database with object-oriented extensions. Objects are aggregated to form complex part-of hierarchies and are allowed to have different versions or version branches. The data is manipulated by an Activity Manager which understands process models as combinations of events, triggers, and methods.
Single process steps are implemented as methods. Logic expressions are checked when a method is called. If the value is "true," pre-triggers may react on this event, being executed before the method's program part. Post-triggers may follow a method's activation (Belkhatir et al., 1993). Both triggers and methods have a program part that specifies the actions to be taken when the method is executed or the trigger reacts on an event. The program part is comparable to process programs in APPL/A. The absence of software process-related concepts, the scattering of process information over different types of objects and relations, and the need for implementing a process model with multiple triggers have led to the definition of a process representation language within the Adele project, called TEMPO (Belkhatir and Melo, 1993). TEMPO allows the definition of work environments which describe the context for a particular process step (i.e., products, users, tools) along with its definition. To tailor each entity of a project representation to a special context, the concept of a role is introduced. A role definition describes what the purpose of each entity is in that context [e.g., the role of each module in the context of process release is a component to become part of the release; the role of the program manager in the context of process release is to modify (i.e., process development) and test (i.e., process validation) the components]. For illustration, an excerpt of a TEMPO process program is given here:
    TYPEPROCESS release {
        EVENT ready = ( state := ready );
        ROLE USER = PManager;
        ROLE implement = development;
        ROLE valid = validation;
        ROLE component = module;
        ON ready DO
            IF implement.to-change.Xname.state == ready THEN
                implement.to-change.Xname.state := available;
        ...
The process definitions do not hide information. For example, the process type release has unrestricted access to information of process instances of type development. This has a negative impact on understandability and traceability. Nevertheless, because a database approach is used, additional features like different types of browsers may overcome these shortcomings of TEMPO. Table V gives an overview of TEMPO at a glance. It should be noted that TEMPO allows both prescriptive and proscriptive process definition styles. Proscriptive style is used for coordinating chunks of prescriptive process programs (e.g., see Belkhatir et al., 1993). But it is still an
open question whether the prescriptive style is appropriate for interpretation by humans. If not, TEMPO is more suitable for tool integration purposes than for providing support for humans.

TABLE V
CHARACTERIZATION AND EVALUATION OF TEMPO (ADELE)

Characterization
  Process programming  Adding dynamic features to a database for software development projects is the motivation for Adele.
  Hidden               The user invokes processes and observes effects. The programs are not intended to be interpreted by the user.
  Pre/proscriptive     Program parts of triggers and methods are prescriptive, whereas conditions for events allow proscriptive process specification.
  Multiperson          Adele supports concurrent process execution of multiple users, but the coordination mechanisms are implemented in the system rather than specified in the process models.

Requirements satisfaction
  R1 Natural repr.       (o)  Process-specific concepts are included (i.e., role), but are still too general (e.g., human agents and products use the same language constructs).
  R2 Measurable models   (-)  Objects may store measurement data. No support for data collection is provided.
  R3 Tailorable models   (+)  Inheritance mechanisms are used to refine and specialize process models.
  R4 Formality           (o)  Programs are executable, but process engineers must agree on mapping concepts on constructs.
  R5 Understandability   (-)  Because of the dynamic features of TEMPO, it is difficult to derive the complete semantics of a process description.
  R6 Executability       (+)  TEMPO programs are executed by the Adele kernel.
  R7 Flexibility         (+)  Events and triggers allow appropriate reactions on the current project state.
  R8 Traceability        (o)  Loosely coupled processes hinder proper representation of context. Establishing explicit relationships is needed.

5.2.3 HFSP

A Hierarchical and Functional Software Process description (HFSP) (Katayama, 1989) documents software processes as mathematical definitions. Activities are seen as functions, primarily characterized by input and output data. Complex process models are refined into less complex functions. Several constructors exist to combine the functional process models. Complex data structures are stored in an objectbase.
H. DIETER ROMBACH AND MARTIN VERLAGE
Human-based interpretation, nondeterminism, and incremental specification during enactment require flexible mechanisms to compute the value of the functions (i,e., the final product) (Katayama, 1989). Therefore, reflection is used to allow objects to operate on their own status. Two functions are provided to capture the actual process state (i.e., snap) and to explicitly set a state for a process (i.e., resume). This allows manipulation of complex process trees. Moreover, there are mechanisms used for communication between different process trees. If each developer has his or her own process tree, the communication mechanisms can be employed for supporting collaboration. Functions (i.e,, processes) within an execution tree run concurrently and possibly are nondeterministic. Scheduling of functions is performed on basis of the availability of data that the processes have as input. Mechanisms of the objectbase are used to interrupt the performance of a function if the requested data item is locked by another function.’ Nondeterminism is expressed by specifying different decompositions of a function. At the moment the aggregate function is to be executed, guards decide what decompositionof this function to select. For illustration, a function of a HFSP process description is given here6: Test-Unit
(
code, t e s t p k g , h i s t o r y . i n I h i s t o r y . o u t , feedback.code, feedback.pkg, r e s u l t ) ( code, t e s t p k g I r e s u l t )
=>do-test where ( result 1 f e e d b a c k . c o d e = modify-code-req f e e d b a c k . p k g = modify-pkg-req ( r e s u l t 1 h i s t o r y . o u t = append-history ( h i s t o r y . i n , result 1
The system to executdenact the functional descriptions consists of four major components that allow for planning, backtracking, exception handling, and communicating with the user. Table VI gives an overview of HFSP at a glance. HFSP was one of the first approaches offering support for the definition on several abstaction levels. The need for clear process interfaces is caused by the complexity of processes in real projects. To manage this complexity it should be able to offer well-structured process programs to the people who maintain the process descriptions. Obviously this should especiallybe true when the process models are intended to be read by humans.
5.2.4 MSL-Marvel Strategy Language The Marvel system is a process-sensitive SEE,managing products in a repository called the engineering database (Columbia University, 1991). Processes are
DIRECTIONS IN SOF’TVVARE PROCESS RESEARCH
37
TABLE VI CHARACTERIZATION AND EVALUATION OF HFSP Characterization Tool integration Tools are seen as atomic processes in HFSP. Hidden Project members activate functions but the execution is completely left to an execution machine. Prescriptive There is a strict control flow. Multiperson Several people can coordinate each having an own process tree. Requirements satisfaction R1 Natural repr. R2 Measurable models R3 Tailorable models R4 Formality
R5 Understandability R6 Executability R7 Flexibility RE Traceability
-
No software process-specific constructs are provided. Functionshave no attributes. Each metric has to be defined as a separate data item. 0 Functions have formal parameters. But tailoring must be performed mostly by hand. + A functionaldefinition of processes and tools allows for formal process models. Nevertheless, the formal model comprises only a few constructs and seems not to be sufficient for all process engineering Purpo*S. 0 Because the functionality of a process completely relies on tools, it is difficult to determine what the effect of an invocation is. + Functions describing processes may run on a process engine. + Reflection, backtracking, and concurrency control allow contextdependent courses of action. 0 Within an execution tree formal parameters allow for traceability. Relationships between cooperating execution trees are difficult to follow.
-
described using a rule-based approach focusing on the automatable activities. A developer’s working environment is formulated as a strategy, consisting of three parts (i.e., entities with their relations, tools, and rules). Entities are defined using a data description language, comparable to constructs of programming languages (e.g., C). For each tool a so-called envelope has to be specified, describing how the Marvel objects are transformed to be accepted as tool parameters, and how the tool is called from the system. A rule consists of a pre- and a postcondition, which describe the valid states for starting and terminating processes, and an activity, which describes the rule’s functionality. Rules are defined independently from each other. Activities are comparable to shell scripts. The flow of control is determined by forward- and backward-chaining during enactment. Chaining analyzes the pre- and postconditions with respect to the current state of the engineering database. Forwardchaining determines, by matching the postcondition of the current executed rule with all preconditions of all other rules, what rules may fire next. Chains of several
38
H. DIETER ROMBACH AND MARTIN VERLAGE
rules are computed hypothetically. Backward-chaining works in the opposite direction, i.e., it determines for a specific rule what activities must be executed before the precondition becomes true. For illustration, a rule of a Marvel process description is given here6: editC?f:DOCFILEl: # i f t h e f i l e has b e e n r e s e r v e d , you can go ahead and e d i t i t (f.reservation-status C EDITOR e d i t ? f 3 (and (?f.reformat-doc CurrentTioe));
= Checked-out) = Yes) ( ? f . t i m e s t a m p
=
The opportunisticevaluation of nonhierarchicrules makes it hard to understand the control flow and data flow statically. Testing of strategies is required in order to validate the results of process modeling. The Marvel system itself has evolved from a single-user environment to a multiuser environment (Ben-Shaul et al., 1992). But no communicationprimitives were added to MSL. Instead, coordination rules have to be specified to describe conflict resolution when multiple rules interfere in their behavior. This is performed strictly on basis of the engineering database. Table VII gives an overview of MSL at a glance. MSL is a good example of a language that has both characteristicshidden and proscriptive. The user is not intended to interpret the process programs, but is able to direct the flow of control, restricted by the limitations of the rules.
5.2.5 MERLIN An approach similar to Marvel and MSL is the system MERLIN (Peuschel et al., 1992). MERLIN uses also rules for implementing process models, but offers a more sophisticated set of constructs to the process model developer than MSL. In particular, MERLIN uses the concept of a role to be able to delegate different work contexts to distinct project members. A work context contains all products to work on when performing a specific task. Different kinds of relationships between the products can possibly exist. In addition, there is a list of processes in each work context specifying the actions on the objects. A work context contains all necessary information for a role performing a technical product engineering process. Therefore the work context is a particular view on the current project state offering only the information to the role which is needed in order to perform a process. Work contexts related to the same piece of project data are updated when one user changes the information.
DIRECTIONS IN SOFTWARE PROCESS RESEARCH
39
TABLE VII CHARACTERIZATION AND EVALUATION OF MSL Characterization Rocess programming The purpose of Marvel is to provide process integration for tools. Hidden The user activates processes from a menu, but the execution of shell scriptlike process programs is performed by the system. Pdproscriptive A rule-based approach is used for specifying single processes which are themselves described in a prescriptive style. Chains of processes are computed on basis of the actual state of the product data. Single/multiperson Marvel has evolved from a single-userenvironment to a system coordinating developers. Requirements satisfaction R1 Natural repr. - No software process-specific constructs are provided. R2 Measurable models - Attributes express product features. No support for data collection is provided. Rules are not instantiated and theEfore not measurable. R3 Tailorable models - Except for instantiation parameters MSL models need to be tailored by hand. R4 Formality 0 The shell-likelanguage for describing activities lacks of clear semantics. R5 Understandability - Because of the highly dynamic features of Marvel. it is difficult to derive the complete semantics of a process description. R6 Executability + Process programs are executable shell-scripts. R7 Flexibility + Rule chains are computed at runtime using the rules' conditions. Each rule specifies a particular process step, not considering any other processes. Adding new rules does (mostly) not affect the definition of already existing ones. RS Traceability 0 Loosely coupled processes hinder proper representation of context. Establishing explicit relationships is needed.
The formalism used in MERLIN to describe process models is a PROLOGlike language. Forward- and backward-chaining is performed, which both are more powerful than the mechanism employed in the Marvel system. The backtracking adopted from PROLOG allows hierarchies of preconditions which result in smaller process models (Peuschel and Schiifer, 1991). Tools are invoked by a CALL. An envelope is used to transform data maintained by MERLIN in a form understandable by the tool and vice versa. In addition to tool invocation, the specification of a rule may contain also actions on the data stored in the repository (e.g., remove, insert, etc.). It should be noted that MERLIN manages users, roles, documents, and relationships between them. No additional information (i.e., amibutes) is provided by the process models. For illustration, a programmer's role definition specifying access to processes of a MERLIN description is given here:
    work-on ( module, to-be-edited,
              [ specification, error-report, review ],
              [ module ],
              [] ).
    work-on ( review, to-be-reviewed,
              [ specification, module ],
              [ review ],
              [] ).
    work-on ( object-code, to-be-executed,
              [ specification, module ],
              [],
              [ object-code ] ).
Instantiated work contexts form hierarchies during a project's lifetime. On the one hand, work contexts of teams form the basis for building work contexts of the team members. On the other hand, the building of work contexts for teams uses information provided in larger work contexts (e.g., the project work context). The hierarchy thus consists of three layers: a project-wide layer, a team-specific layer, and a layer of individual work contexts. Like Marvel, the first version of MERLIN was designed as a single-user system. Later versions included mechanisms for multiuser support. Different work contexts can be offered to different people; this is one of the main advantages of the MERLIN system. Table VIII gives an overview of MERLIN at a glance. MERLIN can be seen as a system that picks up the main ideas of Marvel. Its originality stems from the fact that MERLIN allows the role-specific definition of work contexts to offer a user an encapsulated piece of the whole project state.
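To make the idea of role-specific work contexts more concrete, the following Python sketch derives a work context from work-on-style facts similar to the example above. It is only an illustration under assumed data structures; the function and field names are hypothetical and do not reflect MERLIN's actual implementation.

    # Illustrative sketch of MERLIN-style work contexts (hypothetical names).
    # Each fact lists: produced document, task, readable documents,
    # editable documents, and documents to be created.
    WORK_ON = [
        ("module",      "to_be_edited",   ["specification", "error_report", "review"], ["module"], []),
        ("review",      "to_be_reviewed", ["specification", "module"],                 ["review"], []),
        ("object_code", "to_be_executed", ["specification", "module"],                 [],         ["object_code"]),
    ]

    def work_context(role_tasks, project_state):
        """Collect the documents and pending processes visible to one role."""
        context = {"read": {}, "edit": {}, "create": [], "processes": []}
        for doc, task, readable, editable, creatable in WORK_ON:
            if task not in role_tasks:
                continue  # this process is not delegated to the role
            context["processes"].append((task, doc))
            for name in readable:
                context["read"][name] = project_state.get(name)
            for name in editable:
                context["edit"][name] = project_state.get(name)
            context["create"].extend(creatable)
        return context

    # A programmer and a reviewer see different slices of the same project state.
    state = {"specification": "spec v2", "module": "draft code",
             "error_report": "3 open issues", "review": None}
    print(work_context({"to_be_edited"}, state))
    print(work_context({"to_be_reviewed"}, state))

In MERLIN, changes to shared project data are propagated to all affected work contexts; in this toy sketch a view would simply be recomputed from the updated state.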
5.2.6 SLANG

The SPADE environment is a system for developing, analyzing, and enacting process models described in the language SLANG (SPADE LANGuage) (Bandinelli et al., 1993b). SLANG allows for both process modeling in-the-small and in-the-large. Process modeling in-the-small means operating on a level of fine granularity where a direct relationship exists between operating system entities (e.g., files, tools, ports, etc.) and process model elements. Process modeling in-the-large refers to the development of conceptual process models, which possess a well-defined interface for reuse and are refined in a systematic way. SLANG offers concepts for both levels based on Petri net formalisms. SLANG is designed to allow incremental definition of process models and dynamic (i.e., during a project's lifetime) evolution of these models (Bandinelli et al., 1993b). In SLANG, tokens represent both product documents and control information. Control information is related to return values of tool invocations
TABLE VIII: CHARACTERIZATION AND EVALUATION OF MERLIN

Characterization
  Process programming: MERLIN is designed to integrate tools. Users may decide what course of action to take.
  Hidden: The user does not see the language. An interface is provided to manipulate the objects of a work context.
  Proscriptive: The rule-based approach allows an opportunistic course of action throughout the whole project.
  Single/multiperson: The concept of a role is employed. There is an n:m-relationship between people and roles.

Requirements satisfaction
  R1 Natural repr. (+): Several constructs of MERLIN allow easy mapping between the real world and the model world.
  R2 Measurable models (-): Rules are not instantiated. They represent no objects which can be measured.
  R3 Tailorable models (-): Except by using parameters, tailoring has to be performed by hand.
  R4 Formality (+): The language constructs have fixed and clear semantics (e.g., several document access rights can be specified).
  R5 Understandability (-): Because of the highly dynamic features of MERLIN, it is difficult to derive the complete semantics of a process description.
  R6 Executability (+): An inference engine was developed from scratch for MERLIN that is able to interpret all definitions.
  R7 Flexibility (+): Rule chains are computed at runtime using the rules' conditions. Each rule specifies a particular process step, not considering any other processes. Adding new rules does (mostly) not affect the definition of already existing ones.
  R8 Traceability (-): Information about roles, processes, and work contexts is spread over several definitions. Traceability of the current project state is established by the use of tools and not by the models themselves.
(e.g., module_compiled_successfully). Products and control information cannot easily be distinguished; the product flow between tools needs to be extracted out of all tokens. Single process steps (i.e., tools or the processing of control information) are represented by transitions. So-called black transitions refer to tools. Time constraints may be associated with transitions, specifying when a transition has to fire (i.e., process the information). Different types of arcs relate places and transitions. There are normal arcs, specifying data flow; read-only arcs, specifying that a copy of a token is taken by the transition instead of consuming it; and overwrite arcs, destroying tokens already existing on a place. User interaction is realized by representing user input with tokens and special places (represented by double-line circles) containing these tokens. Activities are introduced in
SLANG to build modules which support process modeling in-the-large. Each activity has a well-defined interface (specified by places) and an implementation which is represented by a Petri net (Bandinelli et al., 1993). For illustration, a transition (event) with its associated guard (condition for tokens to be removed from the places) and action of a SLANG description is given here:

    Event PrepareForTest ( DFTE: ExecTest, CTR: TestSummary;
                           RET: ExecTest, TRBC: TestSummary )
      Guard
        CTR.#PendingTests > 0
      Action
        RET = DFTE;
        TRBC.TestingModule = CTR.TestingModule;
        TRBC.Results = CTR.Results;
        TRBC.#PendingTests = CTR.#PendingTests - 1;

Activities are both abstractions of Petri nets and tokens. This reflective approach allows for specifying process models and process engines interpreting the models. The evolution of process models can be described in SLANG (Bandinelli et al., 1993a). In particular, the definition of activities may change over time. This does not affect already existing activity instances; the change is visible only for later instances of the activity definition. Table IX gives an overview of SLANG at a glance. The use of a Petri net formalism has the advantage that algorithms for the analysis of Petri nets (e.g., deadlocks, reachability, etc.) are easy to adapt for process engineering purposes (see also Gruhn, 1991).
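The guard/action mechanism of such a transition can be pictured in a few lines of ordinary code. The following Python sketch mimics the PrepareForTest event above using plain dictionaries as tokens; it is only an illustration of the firing semantics, not the SPADE implementation, and the data structures are assumptions.

    # Illustrative sketch of a guarded, SLANG-style transition
    # (hypothetical data structures); tokens are plain dictionaries.

    def prepare_for_test(dfte, ctr):
        """Fire the transition if the guard holds; return the produced tokens.

        dfte: ExecTest token (the test execution request)
        ctr:  TestSummary token (tests still pending)
        """
        # Guard: at least one test is still pending.
        if ctr["pending_tests"] <= 0:
            return None  # transition is not enabled
        # Action: pass the request on and produce an updated summary token.
        ret = dict(dfte)
        trbc = {
            "testing_module": ctr["testing_module"],
            "results": ctr["results"],
            "pending_tests": ctr["pending_tests"] - 1,
        }
        return ret, trbc

    exec_test = {"module": "m1", "test_case": "tc7"}
    summary = {"testing_module": "m1", "results": [], "pending_tests": 3}
    print(prepare_for_test(exec_test, summary))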
5.2.7 STATEMATE

The STATEMATE system was originally intended to aid in specifying and designing real-time reactive systems (Harel et al., 1990); the tool supports product engineering. It was then used at the Software Engineering Institute (SEI), Carnegie Mellon University, to model software engineering processes (Humphrey and Kellner, 1989). The parts of STATEMATE discussed here are its three graphical representation languages, which we briefly call "Statemate." The language of activity charts is used to specify functions (what is done) in a structured-analysis-like fashion from the functional perspective. The language of statecharts captures the behavior of processes (when and how it is done). The language of module charts helps to specify the organizational aspects of the system or environment, documenting where and by whom a task is to be accomplished
TABLE IX: CHARACTERIZATION AND EVALUATION OF SPADE

Characterization
  Process programming: Petri nets specify data flow between tools. Complex nets are encapsulated in abstract building blocks (i.e., activity definitions).
  Hidden: Users of the models do not see the representation in SLANG. Interaction is managed by the process engine and by tools.
  Prescriptive: Arcs between places and transitions exactly describe what course of action to take. Each project state transition has to be specified in advance.
  Multiperson: User input is specified as tokens on special places. Multiple users are managed, but are not represented in the process models. The coordination mechanisms are completely implemented in the models.

Requirements satisfaction
  R1 Natural repr. (-): Each piece of information (e.g., documents, messages, user input, etc.) is represented by a token. There are no special constructs for products or processes.
  R2 Measurable models (-): Transition and token properties are fixed. No mechanisms for specifying attributes exist.
  R3 Tailorable models (0): The constructs for modeling-in-the-large allow for encapsulation. But tailoring has to be performed mostly by hand.
  R4 Formality (0): ER nets (timed high-level Petri nets) are the formal basis of SLANG. But process engineers must agree on mapping concepts on constructs.
  R5 Understandability (0): Overall functionality of concurrent processes is difficult to understand.
  R6 Executability (+): The Petri nets used for process modeling are completely executable.
  R7 Flexibility (+): Late binding allows for changing models at runtime. The Petri nets themselves describe a fixed set of alternatives. Once a process model starts to be executed, only the internal definition of subprocesses still to be executed is allowed to be changed.
  R8 Traceability (+): Activities and SLANG models give coherent information about single process steps. Processes are directly connected by arcs representing product and information flow.
(Kellner and Hansen, 1988). Abstraction hierarchies may be expressed using these three notations. After modeling each view, the separate views of a software engineering process are subsequently connected to provide a comprehensive model of the real-world activity. Because of Statemate's intended use (development of products), the languages lack special constructs for representing concepts specific to software engineering processes. One of Statemate's advantages is that it allows for modularizing process models with respect to different aspects (i.e., the perspectives). This eases understanding and modeling of processes. For illustration, a graphical representation (Fig. 5) and a textual definition of an activity in Statemate are given here⁶:
FIG. 5. Activity chart of TEST-UNIT.
    Activity TEST-UNIT
      Type:                  INTERNAL
      Sub of:                TECHNICAL-STEPS
      Defined in chart:      FUNCT-1
      Description:           Functional description for the process step TEST-UNIT
      Implemented by module: ORGAN-1:UNIT-TEST-TEAM
      Output conditions:     FUNCT-1:REWORK-TESTS;
                             FUNCT-1:REWORK-CODE;
                             FUNCT-1:PASSED;
      Input data-items:      ORGAN-1:MOD-UNIT-TEST-PKG;
                             ORGAN-1:OBJECT-CODE;
      Output data-items:     ORGAN-1:COVERAGE-ATTAINED;
                             ORGAN-1:TEST-RESULTS;
                             ORGAN-1:FDBK-RE-TESTS;
                             ORGAN-1:FDBK-RE-CODE;
                             ORGAN-1:TEST-SUCCESS;
The languages allow for several analyses. Consistency, completeness, and correctness of several aspects of a process model are checked (e.g., balance of information flow between different levels of abstraction, missing sources and targets of information, reachability, or loops in definitions). Moreover, STATEMATE allows for simulating the specifications. This validation facility prior to project enactment is very helpful when trying to observe dynamic aspects of the defined process models. The traces recorded during simulation are later analyzed in order to find model faults or to make quantitative predictions. STATEMATE
also produces executable code in the programming language C. Nevertheless, integrating tools into the process definition remains a problem. Table X gives an overview of Statemate at a glance. STATEMATE is an example of using languages not originally developed for software process research. Other examples exist which use approaches designed for product development (e.g., see Saracelli and Bandat, 1993). They mostly use a specification or design language [e.g., the Structured Analysis and Design Technique (SADT)]. In general, those attempts are helpful for software process researchers in order to relate concepts of products and processes.
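The simulation facility mentioned above can be pictured with a very small example. The following Python sketch steps a toy state-transition model of a unit-test step and records the visited states as a trace; the states and events are invented for illustration and are not taken from the SEI models.

    # Hypothetical, minimal state-transition model of a unit-test step,
    # used only to illustrate simulation and trace recording.
    TRANSITIONS = {
        ("waiting", "code_available"): "testing",
        ("testing", "tests_failed"):   "rework_code",
        ("testing", "tests_passed"):   "passed",
        ("rework_code", "code_fixed"): "testing",
    }

    def simulate(events, state="waiting"):
        """Apply a sequence of events and record the visited states."""
        trace = [state]
        for event in events:
            state = TRANSITIONS.get((state, event), state)  # ignore impossible events
            trace.append(state)
        return trace

    # A simulated run: the unit fails once, is reworked, then passes.
    print(simulate(["code_available", "tests_failed", "code_fixed", "tests_passed"]))
    # ['waiting', 'testing', 'rework_code', 'testing', 'passed']

A trace like this can then be inspected for faults, such as states that are never reached, in the same spirit as the analyses described above.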
TABLE X: CHARACTERIZATION AND EVALUATION OF STATEMATE

Characterization
  Improvement: STATEMATE is used to investigate concepts to ease understanding of processes and process models.
  Guidance: The process description is intended to support project people, although the system itself is not used for managing project states.
  Prescriptive: State transition diagrams and functional decomposition describe a strict control flow.
  Multiperson: The examples developed by Kellner and co-workers are intended to coordinate tasks of multiple engineers and teams.

Requirements satisfaction
  R1 Natural repr. (-): No software process-specific constructs are provided.
  R2 Measurable models (0): A project trace can be recorded. A static set of properties is taken.
  R3 Tailorable models (-): Except by using parameters, tailoring has to be performed by hand.
  R4 Formality (0): Statemate's charts are formal. But process engineers must agree on mapping concepts on constructs.
  R5 Understandability (+): Diagrams modularize processes with respect to different aspects.
  R6 Executability (+): Process models are simulated or code is generated. The capabilities for tool invocation are limited.
  R7 Flexibility (-): Changing the models is not allowed during execution. Each alternative course of action has to be planned in advance.
  R8 Traceability (+): The three charts display well-structured information.

5.3 MVP-L

MVP-L is the process modeling language of the MVP project (Multiview Process Modeling). The MVP project began at the University of Maryland, College Park, Maryland, and continues at Universität Kaiserslautern, Germany (Bröckers et al., 1992). The MVP project aims at providing support for modeling, planning, enactment, measurement, recording, and packaging of software engineering processes. The main idea is to split process descriptions into views. A
view is generally defined as a projection of a process representation that focuses on selected features of the process. The language MVP-L was designed with respect to: natural representation, representation of different kinds of elementary models, use of formal process model interface parameters, and instrumentation of software engineering processes for data collection. MVP-L's basic concepts are models to describe products, processes, and resources (i.e., humans and tools), and attributes which are related to the former three concepts. Attribute values correspond to measurement data collected throughout a project. The attributes, along with the process, product, and resource model instances, can be used to guide developers or inform management about the process states. In addition to guidance, attributes allow improvement processes to observe product, process, or resource objects in order to evaluate newly introduced technology or to identify problems during analysis. The example in Fig. 6 shows an MVP-L process model describing a test process named Test_Unit_Structural. The textual representation is only one style of displaying models to the process model builder or user. The process_interface lists the attributes of the process (exports) and the criteria for starting and terminating the process (entry_exit_criteria). The process model is an elementary one, that is, it has no refinement (i.e., there are no subprocesses). Instead, the process_body contains the keyword implementation and a comment. This means that this process step is completely performed by humans, instead of being executed by the process engine. The process_resources describe what agents should perform the process and what tool support is available for them. The language MVP-L was designed to support measurement activities within software development projects. The basic concepts were captured by interviewing project members at different organizations. The notation was validated in several industrial projects (e.g., Klingler et al., 1992). Table XI gives an overview of MVP-L at a glance. A project plan in MVP-L instantiates multiple process, product, and resource models with their associated attributes. This set of objects is a representation in the model world of the activities, software documents, and agents in the real world. It is important to note that there is only a single representation of the project. All roles participating in a project retrieve information from that representation. Communication is enabled through use of the project's representation described using MVP-L. The textual process description (as shown in Fig. 6) is suitable for machine interpretation. Humans need more sophisticated presentations of the models and the project state. Moreover, each role in a project has particular information requirements. There is a need for building excerpts from the models and the states. These views offer tailored support to any role. It is one of the major goals within the MVP project to develop standard views for typical project roles. For example, a project manager might be interested only in the task
    process_model Test_Unit_Structural ( eff_0 : Process_effort ) is
      process_interface
        imports
          product_model Source_Code, Object_Code, Driver, Results, Stub,
                        Test_Package, Coverage_Results;
          process_attribute_model Process_effort;
        exports
          eff : Process_effort := eff_0;
        product_flow
          consume
            sc : Source_Code;
            oc : Object_Code;
            st : Stub;
            dr : Driver;
          produce
            cr : Coverage_Results;
          consume_produce
            tpckg : Test_Package;
        context
        entry_exit_criteria
          local_entry_criteria
            (tpckg.coverage < 0.9) and (tpckg.status = 'complete') and
            (oc.status = 'complete') and (dr.status = 'complete') and
            (st.status = 'complete') and (cr.status != 'complete') and
            (sc.complexity > 10);
          global_entry_criteria
          local_exit_criteria
            (cr.status = 'complete');
          global_exit_criteria
      end process_interface

      process_body
        implementation
          -- Goal of this process is to apply the test data to the module.
          -- The generic coverage tool has to be used. The coverage values
          -- are to be inserted into the related attribute.
      end process_body

      process_resources
        personnel_assignment
          imports
            resource_model Test_Engineer, Manager;
          objects
            tester : Test_Engineer;
            mg     : Manager;
        tool_assignment
          imports
            resource_model Code_Coverage_Tool;
          objects
            gct : Code_Coverage_Tool;
      end process_resources
    end process_model Test_Unit_Structural

FIG. 6. MVP-L process model describing a test process.
breakdown structure and process attributes like state and effort. In particular, within the MVP project the GQM approach (Basili et al., 1994b) is used to define views on the project state in terms of measurement data. The information presented to the manager might look as shown in Fig. 7. The figure shows a task breakdown structure of a well-known reference example (Kellner et al., 1990). The view displays condensed information about the
TABLE XI: CHARACTERIZATION AND EVALUATION OF MVP-L

Characterization
  Improvement: The language was designed to support the QIP. Several real-world activities were modeled and extended by measurement activities.
  Guidance: MVP-L is intended to be interpreted by humans. A project state is managed by the MVP system.
  Proscriptive: A rule-based approach is employed. No control flow constructs are provided.
  Multiperson: Humans may assume different roles and are assigned to processes in an unrestricted way (n:m-mapping).

Requirements satisfaction
  R1 Natural repr.: Main concepts of software development processes are considered in MVP-L one-to-one.
  R2 Measurable models: Attributes are directly related to measures derived by using the GQM approach.
  R3 Tailorable models: Except for instantiation parameters, MVP-L models need to be tailored by hand.
  R4 Formality: The purpose is to design processes. The basic features are well-defined, but there are minor ambiguities (i.e., in the definitions of multiple instantiations of objects (*-operator) and consistency between different levels of abstraction).
  R5 Understandability: MVP-L was designed by analyzing interviews with developers. The language reflects concepts of their understanding. The rule-based approach makes it sometimes difficult to determine the course of action in advance.
  R6 Executability: MVP-L mainly supports processes enacted by humans. Interfaces between processes are managed, and a project state is managed. Tool invocation is directly supported for measurement tools. Access to development tools is provided, but they are not automatically invoked.
  R7 Flexibility: Processes may be added or deleted without affecting other ones. Entry and exit criteria allow for specifying consistency constraints for the interpretation of models.
  R8 Traceability: Several relationships between objects ease browsing. Special ones only appear in project databases and are not considered in MVP-L (e.g., the uses-relationship between modules).
effort spent so far in the project and the state of each task which is listed in the project description. The process Test_Unit_Structural in the lower-right corner is the one described in Fig. 6. This view presents a snapshot of the project state. It could be used to monitor progress of the project or to control resources. Another example of a view derived from an MVP-L description is shown in Fig. 8. This view is derived in order to guide a test engineer by providing information about the task the role has to perform. The objects on the left side
FIG. 7. A project manager's view.
refer to products that are modified (i.e., test_document) or read (i.e., module). The boxes in the middle denote activities that are to be performed by the human who assumes the role of test engineer; the round-cornered boxes denote intermediate results. It is intended that either test_unit_functional or test_unit_structural is performed when enacting the process steps. The information on the right side shows the entry and exit criteria for each process (because of limited space the
FIG. 8. A test engineer's view.
criteria for functional and structural testing are (partially) omitted). The four circles in the middle of Fig. 8 denote attributes that may be used in a process's criteria to specify control flow (e.g., the use of the coverage value in test_unit_structural's exit criteria). The dashed lines indicate aggregating objects of a higher abstraction level. The language MVP-L was used in several industrial projects to model technical processes (e.g., see Klingler et al., 1992). The experience gathered from the case studies helped to improve the formalism. Additionally, suggestions for tool support were made. Prototypical tools were developed (e.g., a syntax-directed editor, model browser, and consistency checker) and are used in projects between industry and the Software Technology Transfer Initiative Kaiserslautern (STTI-KL).
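To make the role of entry and exit criteria more concrete, the following Python sketch evaluates criteria similar to those of Test_Unit_Structural in Fig. 6 against a small, invented project state. It is only a hypothetical illustration of how a process engine might decide whether a step is enactable; it is not part of MVP-S, and the attribute names are assumptions drawn from the example.

    # Hypothetical sketch: evaluating MVP-L-style entry/exit criteria against
    # attribute values held in a project state.

    def local_entry_criteria(s):
        """Entry criteria of a structural unit-test step (cf. Fig. 6)."""
        return (s["tpckg.coverage"] < 0.9 and s["tpckg.status"] == "complete"
                and s["oc.status"] == "complete" and s["dr.status"] == "complete"
                and s["st.status"] == "complete" and s["cr.status"] != "complete"
                and s["sc.complexity"] > 10)

    def local_exit_criteria(s):
        return s["cr.status"] == "complete"

    state = {
        "tpckg.coverage": 0.6, "tpckg.status": "complete",
        "oc.status": "complete", "dr.status": "complete",
        "st.status": "complete", "cr.status": "incomplete",
        "sc.complexity": 14,
    }
    print("step may start:", local_entry_criteria(state))     # True
    state["cr.status"] = "complete"
    print("step may terminate:", local_exit_criteria(state))  # True

A guidance view such as the one in Fig. 8 essentially presents the results of such evaluations, together with the affected products, to the role performing the step.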
6. Support for Software Engineering Processes

The focus of software engineering in the 1970s and 1980s was clearly on the product. Software process research has contributed to the understanding of software development. Emphasis on software development activities resulted in better support for both product and process engineering processes. The ideas and approaches developed so far in the software process research community are present in a variety of tools. In general, we can distinguish between tools for developing software process models and tools for interpreting the models (i.e., process-sensitive SEEs). In this section, a brief discussion of tool support for software process engineering processes is given in order to point out what future software development and improvement of software engineering know-how could look like. In particular, Section 6.1 presents some tools for process model development, Section 6.2 discusses process-sensitive SEEs, and finally Section 6.3 introduces the process-sensitive SEE MVP-S.
6.1 Tools for Process Model Building and Project State Browsing

Tool support is needed for process engineering just as it is for product engineering. Most of the systems developed for the execution of process descriptions also consist of components for the management of process models. In this subsection we illustrate what tools are available to support process model building and browsing of the project state. The structure of the brief overview is oriented toward the benefits B1-B4 in Section 4.1. This issue is related to B5, process-sensitive SEEs, discussed in the next section. Tools for supporting human communication are needed because textual data entry and verification of process models is error-prone and it is difficult to
conceptually grasp the nature of the process models in textual form (Krasner et al., 1992). The tools can be classified either as filters or as translators. Filters reduce a representation's complexity by removing information from a process model. This is helpful when one concentrates on a few aspects of a model (e.g., product flow). Translators build a representation of a model in a different style or language (e.g., graphical). Mostly a combination of both is present in a language system (e.g., browsers for displaying process model relationships). Graphical editors (which are filters and bidirectional translators) allow partial specification of a process model. Textual input is required to add more detailed information [e.g., ProcessWEAVER (Fernström, 1993)]. Packaging of experience is the issue supported most weakly by existing model management tools. In particular, we know of only two examples where it is possible to relate process models and experience (i.e., Decker and Valett, 1992; Oivo and Basili, 1992). Data about product and process qualities is attached to objects or the corresponding models to allow use of experience in other projects. The Software Management Environment (Decker and Valett, 1992) does not use executable process models but is a good example of how improvement is supported in an organization. Analysis and reasoning are supported in several ways. In general, one can distinguish between analyses for checking static and dynamic properties, and simulation. Checks for static properties include parsing, type checking, completeness, and other checks comparable to those performed by a compiler for a usual programming language. Each system containing a process engine has a more or less powerful component to check for static properties. This is especially true for all approaches presented in Section 5. Checks for dynamic properties are based on a formal definition of process model behavior and allow for the detection of properties like reachability of a particular state, or deadlocks. Petri net-based approaches offer such support more frequently than other approaches [e.g., the SPADE environment (Bandinelli et al., 1993b)]. The reason is that Petri net formalisms have a formal foundation; algorithms developed for Petri nets in general can be adapted for process modeling. Simulation of project plans is useful when dynamic properties cannot be detected on a formal basis. Of the tools introduced in Section 5, STATEMATE has the most advanced facilities for observing process behavior. Tools for supporting project guidance and control offer particular views on project states. Often a graphical representation is provided to present to the user the objects accessible and the processes the user is allowed to perform (e.g., Marvel, MERLIN). If graphical notations are used for model building, these languages are also used during interpretation to display process state information [e.g., ProcessWEAVER (Fernström, 1993)]. Sophisticated tools offer query mechanisms to provide answers to particular user questions (e.g., list all modules with state "coded"). An example can be found as part of Adele.
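As a small example of the filter idea described above, the following Python sketch reduces a hypothetical process-model structure to its product flow, the kind of simplified representation a browsing tool might display. The model fragment and field names are invented for illustration and do not come from any of the systems discussed.

    # Hypothetical process-model fragment and a "filter" that keeps only the
    # product flow, discarding roles and entry criteria.
    MODEL = [
        {"process": "design",    "consumes": ["requirements"], "produces": ["design_doc"],
         "role": "designer",  "entry": "requirements.status == 'complete'"},
        {"process": "code",      "consumes": ["design_doc"],   "produces": ["module"],
         "role": "developer", "entry": "design_doc.status == 'complete'"},
        {"process": "test_unit", "consumes": ["module"],       "produces": ["test_results"],
         "role": "tester",    "entry": "module.status == 'complete'"},
    ]

    def product_flow(model):
        """Filter: return only (input products, process, output products)."""
        return [(step["consumes"], step["process"], step["produces"]) for step in model]

    for inputs, process, outputs in product_flow(MODEL):
        print(inputs, "->", process, "->", outputs)

A translator would instead render the same reduced information in another notation, for instance as a graph for a graphical browser.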
6.2 Process-Sensitive Software Engineering Environments

The first generation of software engineering environments mainly supported performing technical processes; management and process engineering processes were only rarely supported. Modern software engineering environments provide "automated support of the engineering of software systems and the management of the software processes" (European Computer Manufacturers Association, 1993). This definition includes support for all software engineering processes (i.e., the product and process engineering processes introduced above). Several schemes exist for classifying software engineering environments (for a sample list, refer to Fuggetta, 1993). The environments are distinguished with respect to many dimensions. One of them distinguishes between classes of tool integration, i.e., how tools work together in the same environment (Thomas and Nejmeh, 1992):

- Presentation integration means that the tools' appearance and behavior are similar and all tool realizations rely on the same interaction paradigm.
- Data integration enables maintenance of consistent information being exchanged between tools. Format and meaning of the data must be known for each tool accessing the data.
- Control integration complements data integration by allowing tools to share functionality. Functions are provided by one tool and used by other tools (a minimal sketch of this idea follows the list).
- Process integration means the strength of a tool set to support an entire process, to react in the same way to events, and to enforce constraints which restrict some aspect of the process (i.e., constraints on data and process state).
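As a rough sketch of the control integration idea, tools can expose functions through a shared registry so that other tools may invoke them. The registry, service names, and tools below are entirely hypothetical and are meant only to illustrate the concept, not any particular environment's API.

    # Hypothetical sketch of control integration: tools publish functions in a
    # shared registry and call each other's services through it.

    class ServiceRegistry:
        def __init__(self):
            self._services = {}

        def provide(self, name, func):
            """A tool publishes a function under a well-known name."""
            self._services[name] = func

        def request(self, name, *args, **kwargs):
            """Another tool invokes the published function."""
            return self._services[name](*args, **kwargs)

    registry = ServiceRegistry()

    # A version-control tool publishes a checkout service; a metrics tool
    # publishes a line-counting service; either can use the other.
    registry.provide("vc.checkout", lambda path: f"checked out {path}")
    registry.provide("metrics.loc", lambda text: len(text.splitlines()))

    print(registry.request("vc.checkout", "src/module.c"))
    print(registry.request("metrics.loc", "int main() {\n  return 0;\n}\n"))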
The task for software process research is to provide mechanisms for the integration of tools to build process-sensitive software engineering environments. Such environments promise an improvement over purely product-oriented environments by offering better support for coordinating developers (Penedo and Riddle, 1993). In general, the interpretation of an explicit process description allows determining the way tools interact. All of the languages discussed in Section 5 can be used to formulate process descriptions that can be understood by process engines. For example, Appl/A is the language designed for the Arcadia environment. The process-sensitive software engineering environments developed so far focus on specific issues. A comprehensive discussion is out of the scope of this chapter; we refer, for example, to Penedo and Riddle (1993). First versions of such environments are commercially available to support software development organizations (e.g., Christie, 1994). Several other systems exist as research prototypes (a sample list can be found throughout Section 5). Nevertheless, automated and integrated support for process engineering processes is rare. An overview of measurement support, for example, can be found in Lott
(1994a). At the moment it seems that each process engineering process is supported by locally operating tools, but integration has not yet been accomplished. Current process-sensitive SEEs tackle only a few problems of real software projects [e.g., backtracking in MERLIN (Peuschel and Schäfer, 1991) or process evolution in SPADE (Bandinelli et al., 1993a)]. The challenge now is to integrate all the ideas and solutions.
6.3 MVP-S

The MVP (multiview process modeling) project at Universität Kaiserslautern aims at providing an environment for modeling and analyzing process models, instantiating them in project plans, enacting the plan by a process engine, recording project data, and packaging experience in explicit process and quality models. This environment is called MVP-S. Process models and a project plan are described in MVP-L (Section 5.3) using the modeling facilities of MVP-S. This plan is instantiated by the process engine to serve as an information system of the project. Moreover, the explicit project state maintained by MVP-S builds the context for development tools and specifies dependencies between development processes. Project traces and measurement data are used later on to derive information about product and process qualities in order to improve the knowledge concerning software development and process performance. Figure 9 shows the overall design of MVP-S. The system is divided into three parts. Services are offered to the users of the system in order to manipulate models, project plans, and project states. Tools provide the functionality of MVP-S with respect to models and project plans. The object management system maintains persistent objects, like models, project states, products, or measurement data. The grey shaded boxes refer to system parts which are already realized in a prototypical manner. Each of those subsystems needs to be evolved in order to provide a rich framework for supporting product and process engineering processes as described in Section 4.1. The current research prototype realizes the basic ideas presented in Lott and Rombach (1993). Measurement-based project guidance of product engineers is supported by MVP-S. First solutions with respect to multi-view modeling and packaging of experience exist. Multi-view modeling means allowing project members to describe their particular views on the processes they are involved in. Later, integration is performed to build a comprehensive software process model out of the single views (Verlage, 1994). Packaging of experience is done either by building new model versions (e.g., specifying stronger criteria, or providing fine-tuned process refinements) or by adding quantitative data to models or objects in the experience database. The design of MVP-S should not be understood as a general architecture for process-sensitive software engineering environments. Instead it should illustrate
FIG. 9. Functional design of MVP-S.
what services are relevant for software engineering support. The need for each tool follows from our definition of software engineering in Section 1.2. Building comprehensive object management systems for software development organizations which support process-sensitive software engineering environments is a complex problem domain in its own right (Estublier, 1994). To give satisfactory solutions in all these areas is far beyond the goals of the MVP project. Moreover, providing a complete set of support technology is a task for the whole process community.
7. Future Directions in Software Process Research

Software process research originated from the need to understand and manage software engineering processes. Economic pressure motivates organizations to invest in this area. Research should provide solutions to problems encountered in practical software projects. Answers to questions about the nature of software and software development should also be given in order to move software engineering from an art to a discipline. We now give a partial list of future activities in software process research, both from a practical and from a research perspective.
7.1 Practice

The importance of process engineering has now been recognized by industry. Especially the work on software maturity and the CMM by the SEI has contributed to this increasing awareness (Humphrey, 1989, 1991). Process improvement groups are being established in many companies. However, most of the work performed by these groups can be characterized as organizational, informal, and qualitative. Explicit formal process models are rare (e.g., Klingler et al., 1992). Furthermore, many process measurement groups are being established in companies. However, most of the work performed by these groups can be characterized as exploratory (not tied to company goals and not based on a sound measurement process) and isolated (not related to explicit process models and therefore not contributing to process improvement). Sound measurement processes are rare (e.g., Pfleeger and Rombach, 1994). Most software process representation languages do not reflect practical needs (see the tables in Section 5), are not documented well, and are not supported by appropriate tools. Only a few languages have been used in real pilot projects, and only a few tools and process-sensitive software engineering environments can be bought (e.g., see Christie, 1994). Nevertheless, it is important that industrial environments contribute to the further improvement of process technology by providing laboratory environments to:

- Test existing technologies in pilot projects: Only studies and experiments in practical projects enable software process researchers to identify the strengths and weaknesses of existing technologies. Thereby, requirements and suggestions for further improvement are derived.
- Develop explicit models of their current processes, so the appropriateness of existing languages can be tested. The value of reference examples has been demonstrated within the community (Kellner and Rombach, 1990; Kellner et al., 1990). Additionally, external process models are helpful to demonstrate suitability and point to problems (Klingler et al., 1992).
- Build prototype tools to support model building: Almost all of the current process-sensitive environments require fine-grain process programs. Development and maintenance of these models needs effort that is difficult to justify. Building blocks or more abstract languages are needed to ease modeling activities.
7.2 Research

Software process research is a growing field of interest. Several forums allow for discussing process-related issues and exchanging ideas (e.g., the International Conference on the Software Process, several sessions at past International Conferences on Software Engineering [ICSE], the International Software Process
Workshops [ISPW], the European Workshops on Software Process Technology [EWSPT], the Workshop on Process-Sensitive Software Engineering Environment Architectures [PSEEA], and the Japanese Software Process Workshop [JSPW]). But there is still a demand for further research:

- Provide a comprehensive, consistent framework and terminology for software process research. First steps in this direction have been made (i.e., Conradi et al., 1992; Feiler and Humphrey, 1993; Lonchamp, 1993). But further efforts are needed to develop a common understanding in this rapidly emerging field. This is especially needed when the solution islands developed so far are to be integrated.
- Business process (re)engineering is a related area (Swenson, 1993). Only a few attempts have been made to compare the two fields with each other. Commonalities and differences between the two areas have to be identified in order to compare them and to transfer solutions from one domain into the other.
- Team support requires special mechanisms. Computer Supported Cooperative Work (CSCW) researchers have developed models for coordinating the work of people with explicit process descriptions outside the process research area (Grudin, 1994). Software process researchers should investigate how far these solutions can be adopted. If they cannot be employed in process-sensitive environments, it has to be determined what specifics restrict the adoption of technology from the similar CSCW domain.
- Process-sensitive software engineering environments have the potential to support the entire set of roles in a software project (Christie, 1994). Technicians and managers have to share the same project representation. This enables new concepts of guidance and coordination (Lott and Rombach, 1993). Measurement is supported, and feedback about product and process qualities is given on-line to all interested parties.
- Methods for modeling software processes are needed in order to allow systematic development of process models. Most examples apply methods for product development in the process domain, but tailored solutions are needed. It should be possible to define process models from different perspectives (e.g., developer, project manager, quality assurance engineer, etc.). Each of these roles emphasizes different aspects of the process (Rombach and Verlage, 1993). An approach is needed to integrate the different views of these roles (Verlage, 1994).
- Project planning can use the explicit models to formulate project plans and attach quantitative goals to each process instance (e.g., to specify deadlines or quality goals) (Saracelli and Bandat, 1993). An idea for combining approaches for measurement and project planning is given in Lott and Rombach (1993). But more effort is needed in order to provide integrated support for all participating roles.
- Project plans mostly do not remain stable over a project's lifetime. Replanning is required to modify the plan according to the changed environment. Mechanisms for process evolution have to be provided (Conradi et al., 1993). Some attempts have been made to tackle that problem (e.g., Bandinelli et al., 1993a). But more research has to be done in order to allow change of process models during enactment.
- Presenting process information to the user and the interaction paradigm between the process engine and the engineer are crucial for the success of software process research results (Krasner et al., 1992). Graphical user interfaces should be tailored to the specific needs of humans when they are concerned with specific tasks. These interfaces should allow easy access to data and present the data in an understandable form. But presentation is not only a question of choosing an adequate presentation formalism; mechanisms for offering views on the project state are also required.
8. Summary

Software plays a crucial role in our lives. Therefore it is not acceptable that software is developed and maintained in a handcrafted way. Software needs to be developed and maintained in a predictable and certifiable way. This requires sound engineering principles. The principles are embodied in the methods, techniques, and tools used for development and should be recognized in the activities performed. Nevertheless, the current state of engineering knowledge and the diversity of development environments make it impossible, at least for the moment, to set up general rules which guarantee proper quality of software products when strictly followed. Therefore the product engineering processes are themselves an object of study. Process engineering processes are performed to support and improve the product engineering processes. A framework integrates both kinds of processes by classifying project-related and organization-related processes (Section 4). It is oriented toward the experience factory idea (Basili et al., 1994a). The experience factory approach illustrates the relationships between the different processes, which may become complex. Managing the processes needs support. The main focus of process engineering processes is on the activities which develop software, and therefore the product engineering processes need to be documented. In particular, improvement, reuse, measurement, and modeling and planning benefit from explicit representations of process elements and their relationships in various ways. Supporting communication, structuring, analysis,
reasoning, guidance, control, and automation allows the design of improved processes through modification of the models. Notations are needed to capture the relevant aspects of the processes. Various suggestions were made in recent years. A spectrum of example process representation languages is given in this chapter. Each of these languages tries to support at least one of the process engineering processes by addressing some of the benefits. The value of each process representation language in a particular context depends on the kind of support for each process engineering process. To allow reasoning about each language's contribution, a three-step approach is taken: (1) it is stated how process engineering processes profit from explicit process models (Section 4.1), (2) requirements are set up which have to be fulfilled for achieving the benefits (Sections 4.2 and 4.3), and (3) for each language it is stated how far it fulfills the requirements (Section 5). Tracing this information from back to front gives an impression of how a language supports a particular process engineering process. Automated support for process model management exists in the form of tools and process-sensitive software engineering environments. A brief discussion of already developed systems is given in Section 6. The sample set of process representation languages and software engineering environments is not expected to be a complete set offering solutions for each problem in the area of process engineering. Each of them tackles a distinct problem. Integration of the solutions is needed to offer comprehensive support. Moreover, there are open issues that justify additional research effort. A tentative research agenda is given in Section 7.2 in order to give a flavor of what users of software process technology can expect from the community in the future.

ACKNOWLEDGMENTS

The authors would like to thank the entire research group at Universität Kaiserslautern for fruitful discussions and inspirations. All members of the Arbeitsgruppe Software Engineering have made a personal contribution to this work.
ENDNOTES

1. The reuse process is represented by sub-processes to show the relationships between organizational units in more detail.
2. In fact, this is exactly the same way the various product engineering processes were introduced in Sec. 1.3.
3. This is exactly the reason why the term reuse was not introduced as a product engineering process.
4. The two boxes labeled Project Data Management and Product Management of Fig. 2 are part of the project database in Fig. 4.
5. Also called process-centered or process-based.
6. Taken from Example Solutions to a Common Process Modeling Problem, handout at the 6th International Software Process Workshop, Hakodate, Japan, October 1990.
7. In the references nothing is stated about the deadlock problem.

REFERENCES
Abdel-Hamid, T. K. (1993). A multiproject perspective of single-project dynamics. J. Syst. Software 22(3), 151-165.
Armenise, P., Bandinelli, S., Ghezzi, C., and Morzenti, A. (1992). Software process languages: Survey and assessment. In "Proceedings of the 4th Conference on Software Engineering and Knowledge Engineering, Capri, Italy," pp. 455-462. IEEE Computer Society Press, Los Alamitos, CA.
Bandinelli, S. C., Fuggetta, A., and Ghezzi, C. (1993a). Software process model evolution in the SPADE environment. IEEE Trans. Software Eng. SE-19(12), 1128-1144.
Bandinelli, S. C., Fuggetta, A., and Grigolli, S. (1993b). Process modeling in-the-large with SLANG. In "Proceedings of the 2nd International Conference on the Software Process" (L. Osterweil, ed.), pp. 75-83. IEEE Computer Society Press, Los Alamitos, CA.
Basili, V. R. (1975). Iterative enhancement: A practical technique for software development. IEEE Trans. Software Eng. SE-1(4), 390-396.
Basili, V. R. (1992). "Software Modeling and Measurement: The Goal/Question/Metric Paradigm," Tech. Rep. CS-TR-2956. Department of Computer Science, University of Maryland, College Park.
Basili, V. R. (1993). The Experience Factory and its relationship to other improvement paradigms. In "Proceedings of the 4th European Software Engineering Conference" (I. Sommerville and M. Paul, eds.), Lect. Notes Comput. Sci., No. 717, pp. 68-83. Springer-Verlag, Berlin.
Basili, V. R., and Rombach, H. D. (1988). The TAME Project: Towards improvement-oriented software environments. IEEE Trans. Software Eng. SE-14(6), 758-773.
Basili, V. R., and Rombach, H. D. (1991). Support for comprehensive reuse. Software Eng. J. 6(5), 303-316.
Basili, V., Caldiera, G., McGarry, F., Pajerski, R., Page, G., and Waligora, S. (1992). The Software Engineering Laboratory - an operational Software Experience Factory. In "Proceedings of the 14th International Conference on Software Engineering," pp. 370-381. ACM, New York, NY.
Basili, V. R., Caldiera, G., and Rombach, H. D. (1994a). Experience Factory. In "Encyclopedia of Software Engineering" (J. J. Marciniak, ed.), Vol. 1, pp. 469-476. Wiley, New York.
Basili, V. R., Caldiera, G., and Rombach, H. D. (1994b). Goal question metric paradigm. In "Encyclopedia of Software Engineering" (J. J. Marciniak, ed.), Vol. 1, pp. 528-532. Wiley, New York.
Basili, V. R., Caldiera, G., and Rombach, H. D. (1994c). Measurement. In "Encyclopedia of Software Engineering" (J. J. Marciniak, ed.), Vol. 1, pp. 646-661. Wiley, New York.
Bauer, F. L. (1972). "Software Engineering," Information Processing. North-Holland Publ., Amsterdam.
Belkhatir, N., and Melo, W. L. (1993). Supporting software maintenance processes in TEMPO. In "Proceedings of the Conference on Software Maintenance," pp. 21-30. IEEE Computer Society Press, Los Alamitos, CA.
Belkhatir, N., Estublier, J., and Melo, W. L. (1993). Software process model and work space control in the Adele system. In "Proceedings of the 2nd International Conference on the Software Process" (L. Osterweil, ed.), pp. 2-11. IEEE Computer Society Press, Los Alamitos, CA.
Ben-Shaul, I. Z., Kaiser, G. E., and Heineman, G. T. (1992). An architecture for multi-user software development environments. In "Proceedings of the 5th ACM SIGSoft/SIGPLAN Symposium on Software Development Environments" (H. Weber, ed.), pp. 149-158. ACM, New York, NY. Appeared as ACM SIGSoft Software Eng. Notes 17(5) (1992).
Boehm, B. W. (1988). A spiral model of software development and enhancement. IEEE Comput. 21(5), 61-72.
Bröckers, A., Lott, C. M., Rombach, H. D., and Verlage, M. (1992). "MVP-L Language Report," Tech. Rep. 229/92. Department of Computer Science, University of Kaiserslautern, Kaiserslautern, Germany.
Christie, A. M. (1994). "A Practical Guide to the Technology and Adoption of Software Process Automation," Tech. Rep. CMU/SEI-94-TR-007. Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA.
Columbia University (1991). "The Marvel 3.0.1 User's Manual." Columbia University, New York.
Conradi, R., Fernström, C., Fuggetta, A., and Snowdon, R. (1992). Towards a reference framework for process concepts. In "Proceedings of the 2nd European Workshop on Software Process Technology" (J.-C. Derniame, ed.), pp. 3-17. Springer-Verlag, Berlin.
Conradi, R., Fernström, C., and Fuggetta, A. (1993). A conceptual framework for evolving software processes. ACM SIGSoft Software Eng. Notes 18(4), 26-35.
Curtis, B., Kellner, M. I., and Over, J. (1992). Process modeling. Commun. ACM 35(9), 75-90.
Davis, A. M. (1990). "Software Requirements: Analysis and Specification." Prentice-Hall, Englewood Cliffs, NJ.
Decker, W., and Valett, J. (1992). "Software Management Environment (SME) Concepts and Architecture," Tech. Rep. SEL-89-103. NASA Goddard Space Flight Center, Greenbelt, MD.
Deming, W. E. (1986). "Out of the Crisis." Massachusetts Institute of Technology, Cambridge, MA.
Dowson, M., and Fernström, C. (1994). Towards requirements for enactment mechanisms. In "Proceedings of the 3rd European Workshop on Software Process Technology" (B. C. Warboys, ed.), Lect. Notes Comput. Sci., No. 772, pp. 90-106. Springer-Verlag, Berlin.
Estublier, J. (1994). What process technology needs from databases. In "Proceedings of the 3rd European Workshop on Software Process Technology" (B. Warboys, ed.), Lect. Notes Comput. Sci., No. 772, pp. 271-275. Springer-Verlag, Berlin.
European Computer Manufacturers Association (1993). "Reference Model for Frameworks of Software Engineering Environments," Tech. Rep. TR-55. ECMA, Geneva.
Feigenbaum, A. V. (1991). "Total Quality Control." McGraw-Hill, New York.
Feiler, P. H., and Humphrey, W. S. (1993). Software process development and enactment: Concepts and definitions. In "Proceedings of the 2nd International Conference on the Software Process" (L. Osterweil, ed.), pp. 28-40. IEEE Computer Society Press, Los Alamitos, CA.
Feldman, S. I. (1979). Make - a program for maintaining computer programs. Software Pract. Exper. 9, 255-265.
Fernström, C. (1993). Process WEAVER: Adding process support to UNIX. In "Proceedings of the 2nd International Conference on the Software Process" (L. Osterweil, ed.), pp. 12-26. IEEE Computer Society Press, Los Alamitos, CA.
Frakes, W., and Isoda, S. (1994). Success factors of systematic reuse. IEEE Software 11(5), 14-22.
Fuggetta, A. (1993). A classification of CASE technology. IEEE Comput. 26(12), 25-38.
Fuggetta, A., and Ghezzi, C. (1994). State of the art and open issues in process-centered software engineering environments. J. Syst. Software 26(1), 53-60.
Ghezzi, C., ed. (1994). "Proceedings of the 9th International Software Process Workshop, Airlie, Virginia." IEEE Computer Society Press, Los Alamitos, CA.
Gibbs, W. W. (1994). Software's chronic crisis. Sci. Am., September, pp. 86-95.
Gilb, T. (1988). "Principles of Software Engineering Management." Addison-Wesley, Reading, MA.
Gish, J. W., Huff, K., and Thomson, R. (1994). The GTE environment: supporting understanding and adaptation in software reuse. In "Software Reusability" (W. Schäfer, R. Prieto-Díaz, and M. Matsumoto, eds.), Chapter 6. Ellis Horwood, Chichester, England.
Grudin, J. (1994). Computer supported cooperative work: History and focus. IEEE Comput. 27(5), 19-26.
Gruhn, V. (1991). Validation and verification of software process models. Ph.D. thesis, University of Dortmund, Dortmund, Germany.
Haase, V., Messnarz, R., Koch, G., Kugler, H. J., and Decrinis, P. (1994). Bootstrap: Fine-tuning process assessment. IEEE Software 11(4), 25-35.
Harel, D., Lachover, H., Naamad, A., Pnueli, A., Politi, M., Sherman, R., Shtull-Trauring, A., and Trakhtenbrot, M. (1990). Statemate: A working environment for the development of complex reactive systems. IEEE Trans. Software Eng. SE-16(4).
Heimbigner, D. (1990). Proscription versus prescription in process-centered environments. In "Proceedings of the 6th International Software Process Workshop" (T. Katayama, ed.), pp. 99-102. IEEE Computer Society Press, Los Alamitos, CA.
Hoisl, B. (1994). "A Process Model for Planning GQM-Based Measurement," Tech. Rep. STTI-94-ME. Software Technology Transfer Initiative, Department of Computer Science, University of Kaiserslautern, Kaiserslautern, Germany.
Humphrey, W. S. (1989). "Managing the Software Process." Addison-Wesley, Reading, MA.
Humphrey, W. S. (1991). Recent findings in software process maturity. In "Software Development Environments and CASE Technology" (A. Endres and H. Weber, eds.), Lect. Notes Comput. Sci., No. 509, pp. 258-270. Springer-Verlag, Berlin.
Humphrey, W. S., and Kellner, M. (1989). Software process modeling: Principles of entity process models. In "Proceedings of the 11th International Conference on Software Engineering," pp. 331-342. IEEE Computer Society Press, Los Alamitos, CA.
Humphrey, W. S., Snyder, T. R., and Willis, R. R. (1991). Software process improvement at Hughes Aircraft. IEEE Software 8, 11-23.
International Standards Organization (1991a). "ISO 8402: 1991 - Quality: Terms and Definitions." ISO.
International Standards Organization (1991b). "ISO 9000: Quality Management and Quality Assurance Standards; Part 3: Guidelines for the Application of ISO 9001 to the Development, Supply and Maintenance of Software." ISO, Geneva, Switzerland.
Katayama, T. (1989). A hierarchical and functional software process description and its enaction. In "Proceedings of the 11th International Conference on Software Engineering," pp. 343-352. IEEE Computer Society Press, Los Alamitos, CA.
Kellner, M. I. (1989). Software process modeling: Value and experience. In "SEI Technical Review," pp. 23-54. Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA.
Kellner, M. I., and Hansen, G. A. (1988). "Software Process Modeling," Tech. Rep. CMU/SEI-88-TR-9. Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA.
Kellner, M. I., and Rombach, H. D. (1990). Session summary: Comparisons of software process descriptions. In "Proceedings of the 6th International Software Process Workshop" (T. Katayama, ed.), pp. 7-18. IEEE Computer Society Press, Los Alamitos, CA.
Kellner, M. I., Feiler, P. H., Finkelstein, A., Katayama, T., Osterweil, L. J., Penedo, M. H., and Rombach, H. D. (1990). Software process modeling example problem. In "Proceedings of the 6th International Software Process Workshop" (T. Katayama, ed.), pp. 19-29. IEEE Computer Society Press, Los Alamitos, CA.
Klingler, C. D., Neviaser, M., Marmor-Squires, A., Lott, C. M., and Rombach, H. D. (1992). A case study in process representation using MVP-L. In "Proceedings of the 7th Annual Conference on Computer Assurance (COMPASS 92)," pp. 137-146. IEEE, New York, NY.
Krasner, H., Terrel, J., Linehan, A., Arnold, P., and Ett, W. H. (1992). Lessons learned from a software process modeling system. Commun. ACM 35(9), 91-100.
Krueger, C. W. (1992). Software reuse. ACM Comput. Surv. 24(2), 131-183.
Leveson, N. G. (1992). High-pressure steam engines and computer software, keynote address. In "Proceedings of the 14th International Conference on Software Engineering, Melbourne, Australia," pp. 2-14. ACM, New York, NY.
Lonchamp, J. (1993). A structured conceptual and terminological framework for software process engineering. In "Proceedings of the 2nd International Conference on the Software Process" (L. Osterweil, ed.), pp. 41-53. IEEE Computer Society Press, Los Alamitos, CA.
Lott, C. M. (1994a). Measurement support in software engineering environments. Int. J. Software Eng. Knowl. Eng. 4(3).
Lott, C. M. (1994b). Data collection in a process-sensitive software engineering environment. In "Proceedings of the 9th International Software Process Workshop, Airlie, Virginia" (C. Ghezzi, ed.). IEEE Computer Society Press, Los Alamitos, CA.
Lott, C. M., and Rombach, H. D. (1993). Measurement-based guidance of software projects using explicit project plans. Inf. Software Technol. 35(6/7), 407-419.
Madhavji, N., Gruhn, V., Deiters, W., and Schäfer, W. (1990). PRISM = methodology + process-oriented environment. In "Proceedings of the 12th International Conference on Software Engineering," pp. 277-288. IEEE Computer Society Press, Los Alamitos, CA.
Matsumoto, M. (1992). Automatic software reuse process in integrated CASE environment. IEICE Trans. Inf. Syst. E75-D(5), 657-673.
Morel, J.-M., and Faget, J. (1993). The REBOOT environment. In "Proceedings of the 2nd International Conference on Software Reuse." IEEE Computer Society Press, Los Alamitos, CA.
Oivo, M., and Basili, V. R. (1992). Representing software engineering models: The TAME goal oriented approach. IEEE Trans. Software Eng. 18(10), 886-898.
Osterweil, L. (1987). Software processes are software too. In "Proceedings of the 9th International Conference on Software Engineering," pp. 2-13. IEEE Computer Society Press, Los Alamitos, CA.
Penedo, M. H., and Riddle, W. (1993). Process-sensitive SEE architecture (PSEEA) workshop summary. ACM SIGSoft Software Eng. Notes 18(3), A78-A94.
Perry, D. E., and Kaiser, G. E. (1991). Models of software development environments. IEEE Trans. Software Eng. SE-17(3), 283-295.
Peuschel, B., and Schäfer, W. (1991). "Efficient Execution of Rule Based Persistent Software Process Models in MERLIN," Tech. Rep. Memo No. 59. Lehrstuhl für Software-Technologie, University of Dortmund, Dortmund, Germany.
Peuschel, B., Schäfer, W., and Wolf, S. (1992). A knowledge-based software development environment supporting cooperative work. Int. J. Software Eng. Knowl. Eng. 2(1), 79-106.
Pfleeger, S. L., Fenton, N., and Rombach, H. D., eds. (1994). "IEEE Software, Special Issue on Measurement-Based Process Improvement." IEEE Computer Society Press, Los Alamitos, CA.
Pfleeger, S. L., Fenton, N., and Page, S. (1994). Evaluating software engineering standards. IEEE Comput. 27(9), 71-79.
Potts, C., ed. (1984). "Proceedings of the Software Process Workshop," Egham, UK. IEEE Computer Society Press, Los Alamitos, CA.
Prieto-Díaz, R. (1991). Implementing faceted classification for software reuse. Commun. ACM 34(5), 89-97.
Prieto-Díaz, R., and Freeman, P. (1987). Classifying software for reusability. IEEE Software 4(1), 6-16.
Rombach, H. D. (1991a). "MVP-L: A Language for Process Modeling in-the-Large," Tech. Rep. CS-TR-2709. Department of Computer Science, University of Maryland, College Park.
Rombach, H. D. (1991b). Practical benefits of goal-oriented measurement. In "Software Reliability and Metrics" (N. Fenton and B. Littlewood, eds.), pp. 217-235. Elsevier Applied Science, London.
Rombach, H. D., Basili, V. R., and Selby, R. W. (1993). "Experimental Software Engineering Issues: A Critical Assessment and Future Directions," Lect. Notes Comput. Sci., No. 706. Springer-Verlag, Berlin.
Rombach, H. D., Ulery, B. T., and Valett, J. (1992). Toward full life cycle control: Adding maintenance measurement to the SEL. J. Syst. Software 18(2), 125-138.
Rombach, H. D., and Verlage, M. (1993). How to assess a software process modeling formalism from a project member's point of view. In "Proceedings of the 2nd International Conference on the Software Process" (L. Osterweil, ed.), pp. 147-158. IEEE Computer Society Press, Los Alamitos, CA.
Royce, W. W. (1987). Managing the development of large software systems: Concepts and techniques. WESCON Tech. Pap. 14, A/1-1 - A/1-9 (1970).
Reprinted in “Proceedings of the 9th International Conference on Software Engineering, pp. 328-338. IEEE Computer Society Press. Los Alamitos, CA. 1987.
DIRECTIONS IN SOFTWARE PROCESS RESEARCH
63
Saracelli, K. D., and Bandat, K. F. (1993). Process automation in software application development. IBM S y ~ t J. . 32(3), 376-396. Shaw, M. (1990). Prospects for an engineering discipline of software. IEEE Sofrware 7 , 15-24. Sommerville, I., and Paul, M., eds. (1993). “Proceedings of the 4th European Software Engineering Conference, Garmisch-Partenkirchen.Germany,” Lect. Notes Comput. Sci., No. 717. SpringerVerlag, Berlin. Sutton, S. M.. Jr., Heimbigner, D., and Ostenveil. L. (1989). “Applla: A Prototype Language for Software Process Programming.” Tech. Rep. CU-CS-448-89. Department of Computer Science, University of Colorado at Boulder. Sutton, S. M.. Jr.. Heimbigner, D.. and Osterweil. L. J. (1990). Language constructs for managing change in process-centered environments. In “Proceedings of the 4th ACM SIGSoftlSIGPLAN Symposiumon Practical Software Development Environments,” pp. 206-217. Appeared as ACM SIGSofr Sofiware Eng. Notes 15(6), (1990). Swenson, K. D. (1993). Visual support for reengineering work processes. In “Proceedings of the Conference on Organizational Computing Systems.”Association of Computing Machinery, New York. Taylor, R. N., Belz, F. C., Clarke, L. A,, Ostenveil, L., Selby. R. W., Wileden. J. C., Wolf, A. L.. and Young, M. (1988). Foundations for the arcadia environment architecture. In “Proceedings of the 3rd ACM SIGSoftlSIGPLAN Symposium on Practical Software Development Environments” (P. Henderson,ed.),pp. 1-13. AppearedasACMSIGSofrSofiwareEng.Notes 1x5)(1988). Thomas, I., and Nejmeh, B. A. (1992). Definitions of tool integration for environments. IEEE Sofiware 9(2). 29-35. Verlage. M. (1994). Multi-view modeling of softwareprocesses. In “Proceedings of the 3rd European ) Notes Workshop on Software Process Technology, Grenoble, France” (B. C. Warboys, 4.Lect. Comput. Sci., No. 772, pp. 123-127. Springer-Verlag, Berlin. von Mayrhauser, A. (1990). “Software Engineering: Methods and Management.” Academic Press, San Diego, CA. Zultner, R. E. (1993). TQM for technical teams. Commun. ACM 36(10), 79-91.
This Page Intentionally Left Blank
The Experience Factory and Its Relationship to Other Quality Approaches

VICTOR R. BASILI
Institute for Advanced Computer Studies and Department of Computer Science
University of Maryland
College Park, Maryland
Abstract

This chapter describes the principles behind a specific set of integrated software quality improvement approaches, which include the Quality Improvement Paradigm, an evolutionary and experimental improvement framework based on the scientific method and tailored for the software business; the Goal/Question/Metric Paradigm, a paradigm for establishing project and corporate goals and a mechanism for measuring against those goals; and the Experience Factory Organization, an organizational approach for building software competencies and supplying them to projects on demand. It then compares these approaches to a set of approaches used in other businesses, such as Plan-Do-Check-Act, Total Quality Management, Lean Enterprise Systems, and the Capability Maturity Model.
1. Introduction
2. Experience Factory/Quality Improvement Paradigm
   2.1 The Experience Factory Organization
   2.2 Examples of Packaged Experience in the SEL
   2.3 In Summary
3. A Comparison with Other Improvement Paradigms
   3.1 Plan-Do-Check-Act Cycle (PDCA)
   3.2 Total Quality Management (TQM)
   3.3 SEI Capability Maturity Model (CMM)
   3.4 Lean Enterprise Management
   3.5 Comparing the Approaches
4. Conclusion
References
1. Introduction

The concepts of quality improvement have permeated many businesses. It is clear that the nineties will be the quality era for software, and there is a growing need to develop or adapt quality improvement approaches to the software business. Thus we must understand software as an artifact and software development as a business. Any successful business requires a combination of technical and managerial solutions. It requires that we understand the processes and products of the business, i.e., that we know the business. It requires that we define our business needs and the means to achieve them, i.e., we must define our process and product qualities. We need to define closed loop processes so that we can feed back information for project control. We need to evaluate every aspect of the business, so we must analyze our successes and failures. We must learn from our experiences, i.e., each project should provide information that allows us to do business better the next time. We must build competencies in our areas of business by packaging our successful experiences for reuse, and then we must reuse our successful experiences or our competencies as the way we do business.

Since the business we are dealing with is software, we must understand the nature of software and software development. Some of the most basic premises assumed in this work are that:

The software discipline is evolutionary and experimental; it is a laboratory science. Thus we must experiment with techniques to see how and when they really work, to understand their limits, and to understand how to improve them.

Software is development, not production. We do not produce the same things over and over, but rather each product is different from the last. Thus, unlike in production environments, we do not have lots of data points to provide us with reasonably accurate models for statistical quality control.

The technologies of the discipline are human based. It does not matter how high we raise the level of discourse or the virtual machine, the development of solutions is still based on individual creativity, and human ability will always create variations in the studies.

There is a lack of models that allow us to reason about the process and the product. This is an artifact of several of the above observations. Since we have been unable to build reliable, mathematically tractable models, we have tended not to build any. And those that we have, we do not always understand in context.

All software is not the same: process is a variable, goals are variable, content varies, etc. We have often made the simplifying assumption that software is software is software. But this is no more true than hardware is hardware is hardware. Building a satellite and a toaster are not the same thing, any more than building microcode for a toaster and the flight dynamics software for the satellite are the same thing.

Packaged, reusable experiences require additional resources in the form of organization, processes, people, etc. The requirement that we build packages of
reusable experiences implies that we must learn by analyzing and synthesizing our experiences. These activities are not a by-product of software development; they require their own set of processes and resources.
2. Experience Factory/Quality Improvement Paradigm

The Experience Factory/Quality Improvement Paradigm (EF/QIP) (Basili, 1985, 1989; Basili and Rombach, 1987, 1988) aims at addressing the issues of quality improvement in the software business by providing a mechanism for continuous improvement through the experimentation, packaging, and reuse of experiences based on a business's needs. The approach has been evolving since 1976 based on lessons learned in the National Aeronautics and Space Administration/Goddard Space Flight Center (NASA/GSFC) Software Engineering Laboratory (SEL) (Basili et al., 1992). The basis for the approach is the QIP, which consists of six fundamental steps:

1. Characterize the current project and its environment with respect to models and metrics.
2. Set the quantifiable goals for successful project performance and improvement.
3. Choose the appropriate process model and supporting methods and tools for this project.
4. Execute the processes, construct the products, collect and validate the prescribed data, and analyze it to provide real-time feedback for corrective action.
5. Analyze the data to evaluate the current practices, determine problems, record findings, and make recommendations for future project improvements.
6. Package the experience in the form of updated and refined models and other forms of structured knowledge gained from this and prior projects and save it in an experience base to be reused on future projects.
Although it is difficult to describe the QIP in great detail here, we provide a little more insight into the preceding six steps.
Characterizing the Project and Environment. Based on a set of models of what we know about our business, we need to classify the current project with respect to a variety of characteristics, distinguish the relevant project environment for the current project, and find the class of projects with similar characteristics and goals. This provides a context for goal definition, reusable experiences and objects, process selection, evaluation and comparison, and prediction. There is a large variety of project characteristics and environmental factors that need to be modeled and baselined. They include various people factors, such as the number of people, level of expertise, group organization, problem experience, process experience; problem factors, such as the application domain, newness to state of the art, susceptibility to change, problem constraints, etc.;
process factors, such as the life cycle model, methods, techniques, tools, programming language, and other notations; product factors, such as deliverables, system size, required qualities, e.g., reliability, portability, etc.; and resource factors, such as target and development machines, calendar time, budget, existing software, etc.
Goal Setting and Measurement. We need to establish goals for the processes and products. These goals should be measurable, driven by models of the business. There are a variety of mechanisms for defining measurable goals: the Quality Function Deployment Approach (QFD) (Kogure and Akao, 1983), the Goal/Question/Metric Paradigm (GQM) (Weiss and Basili, 1985), and the Software Quality Metrics Approach (SQM) (McCall et al., 1977). We have used the GQM as the mechanism for defining, tracking, and evaluating the set of operational goals, using measurement. These goals may be defined for any object, for a variety of reasons, with respect to various models of quality, from various points of view, relative to a particular environment. For example, goals should be defined from a variety of points of view: user, customer, project manager, corporation, etc. A goal is defined by filling in a set of values for the various parameters in the template. Template parameters include purpose (what object and why), perspective (what aspect and who), and the environmental characteristics (where).

Purpose: Analyze some (objects: processes, products, other experience models) for the purpose of (why: characterization, evaluation, prediction, motivation, improvement)
Perspective: With respect to (focus: cost, correctness, defect removal, changes, reliability, user friendliness, . . .) from the point of view of (who: user, customer, manager, developer, corporation, . . .)
Environment: In the following context (problem factors, people factors, resource factors, process factors, . . .)
Example: Analyze the (system testing method) for the purpose of (evaluation) with respect to a model of (defect removal effectiveness) from the point of view of the (developer) in the following context: the standard NASA/GSFC environment,
i.e., process model (SEL version of the waterfall model, . . .), application (ground support software for satellites), machine (running on a DEC 780 under VMS), etc.

The goals are defined in an operational, tractable way by refining them into a set of quantifiable questions that are used to extract the appropriate information from the models of the object of interest and the focus. The questions and models define the metrics, and the metrics, in turn, specify the data that needs to be collected. The models provide a framework for interpretation. Thus, the GQM is used to (1) specify the goals for the organization and the projects, (2) trace those goals to the data that are intended to define these goals operationally, (3) provide a framework for interpreting the data to understand and evaluate the achievement of the goals, and (4) support the development of data models based on experience.
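To make the template concrete, the sketch below encodes a GQM goal as a small data structure and instantiates it with the system-testing example above. It is only an illustration of the template's slots (object, purpose, focus, point of view, environment); the class and field names, and the sample questions and metrics, are hypothetical and do not come from any SEL tool or the GQM literature.

```python
from dataclasses import dataclass, field

@dataclass
class Goal:
    """One GQM goal: filled-in values for the template parameters."""
    obj: str            # object of study: a process, product, or other experience model
    purpose: str        # why: characterization, evaluation, prediction, motivation, ...
    focus: str          # quality focus: cost, defect removal, reliability, ...
    viewpoint: str      # who: user, customer, manager, developer, corporation, ...
    environment: dict   # context: problem, people, resource, and process factors
    questions: list = field(default_factory=list)  # quantifiable questions (hypothetical)
    metrics: list = field(default_factory=list)    # data those questions require (hypothetical)

# The example goal from the text.
system_test_goal = Goal(
    obj="system testing method",
    purpose="evaluation",
    focus="defect removal effectiveness",
    viewpoint="developer",
    environment={
        "process model": "SEL version of the waterfall model",
        "application": "ground support software for satellites",
        "machine": "DEC 780 under VMS",
    },
    questions=["What fraction of the defects present at system test does the method remove?"],
    metrics=["defects found in system test", "defects found after delivery"],
)

print(system_test_goal.purpose, "of", system_test_goal.obj)
```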
Choosing the Execution Model. We need to be able to choose a generic process model appropriate to the specific context, environment, project characteristics, and goals established for the project at hand, as well as any goals established for the organization, e.g., experimentation with various processes or other experience objects. This implies we need to understand under what conditions various processes are effective. All processes must be defined to be measurable and defined in terms of the goals they must satisfy. The concept of defining goals for processes will be made clearer in later chapters. Once we have chosen a particular process model, we must tailor it to the project and choose the specific integrated set of sub-processes, such as methods and techniques, appropriate for the project. In practice, the selection of processes is iterative with the redefinition of goals and even some environmental and project characteristics. It is important that the execution model resulting from these first three steps be integrated in terms of its context, goals, and processes. The real goal is to have a set of processes that will help the developer satisfy the goals set for the project in the given environment. This may sometimes require that we manipulate all three sets of variables to ensure this consistency.

Executing the Processes. The development process must support the access and reuse of packaged experience of all kinds. On the other hand, it needs to be supported by various types of analyses, some done close to real time to provide feedback for corrective action. To support this analysis, data needs to be collected from the project. But this data collection must be integrated into the processes; it must not be an add-on, e.g., defect classification forms part of the configuration control mechanism. Processes must be defined to be measurable to begin with, e.g., design inspections can be defined so that we keep track of the various activities, the effort expended in those activities, such as peer reading, and the effects of those activities, such as the number and types of defects found. This allows us to measure such things as domain understanding (how well the process performer understands the object of study and the application domain) and assures that the processes are well defined and can evolve.
Support activities, such as data validation, education and training in the models, metrics, and data forms, are also important. Automated support is necessary to support mechanical tasks and deal with the large amounts of data and information needed for analysis. It should be noted, however, that most of the data cannot be automatically collected. This is because the more interesting and insightful data tends to require human response.

The kinds of data collected include: resource data, such as effort by activity, phase, type of personnel, computer time, and calendar time; change and defect data, such as changes and defects by various classification schemes; process data, such as process definition, process conformance, and domain understanding; product data, such as product characteristics, both logical, e.g., application domain and function, and physical, e.g., size and structure; and use and context information, e.g., who will be using the product and how they will be using it, so we can build operational profiles.
Analyzing the Data. Based on the goals, we interpret the data that has been collected. We can use this data to characterize and understand, so we can answer questions like "What project characteristics affect the choice of processes, methods, and techniques?" and "Which phase is typically the greatest source of errors?" We can use the data to evaluate and analyze to answer questions like "What is the statement coverage of the acceptance test plan?" and "Does the Cleanroom Process reduce the rework effort?" We can use the data to predict and control to answer questions like "Given a set of project characteristics, what is the expected cost and reliability, based upon our history?" and "Given the specific characteristics of all the modules in the system, which modules are most likely to have defects, so I can concentrate the reading or testing effort on them?" We can use the data to motivate and improve so we can answer questions such as "For what classes of errors is a particular technique most effective?" and "What is the best combination of approaches to use for a project with a continually evolving set of requirements, based on our organization's experience?"

Packaging the Models. We need to define and refine models of all forms of experience, e.g., resource models and baselines, change and defect baselines and models, product models and baselines, process definitions and models, method and technique evaluations, products and product parts, quality models, and lessons learned. These can appear in a variety of forms, e.g., we can have mathematical models, informal relationships, histograms, algorithms, and procedures, based on our experience with their application in similar projects, so they may be reused in future projects. Packaging also includes training, deployment, and institutionalization.

The six steps of the QIP can be combined in various ways to provide different views into the activities. First note that there are two feedback loops, a project feedback loop that takes place in the execution phase and an organizational feedback loop that takes place after a project is completed. The organizational
learning loop changes the organization's understanding of the world by the packaging of what was learned from the last project and as part of the characterization and baselining of the environment for the new project. It should be noted that there are numerous other loops visible at lower levels of instantiation, but these high-level loops are the most important from an organizational structure point of view. One high-level organizational view of the paradigm is that we must understand (characterize), assess (set goals, choose processes, execute processes, analyze data), and package (package experience). Another view is to plan for a project (characterize, set goals, choose processes), develop it (execute processes), and then learn from the experience (execute processes, analyze data).
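Viewed as control flow, the paradigm is two nested loops: an inner project loop that analyzes data in near real time and feeds corrections back into the running project, and an outer corporate loop that packages what was learned into the experience base consulted when the next project is characterized. The sketch below is only a schematic rendering of that structure under assumed, simplified behavior; the function, field, and variable names are invented for illustration and are not drawn from the QIP literature or any SEL tool.

```python
def run_qip(projects, experience_base):
    """Schematic QIP control flow: a project feedback loop nested in a corporate loop."""
    for project in projects:                              # corporate (organizational) loop
        # Steps 1-3: characterize, set goals, and choose a process using packaged experience.
        plan = {
            "context": {"name": project["name"], "baselines": dict(experience_base)},
            "goals": project.get("quality_focus", []),
            "process": "tailored process",
        }
        observations = []
        for phase in project["phases"]:                   # project feedback loop (step 4)
            measurement = {"phase": phase, "status": "nominal"}  # stand-in for collected data
            observations.append(measurement)
            # Real-time analysis of 'observations' would adjust 'plan' here (corrective action).

        # Steps 5-6: post-project analysis and packaging close the corporate loop.
        experience_base[project["name"]] = {"plan": plan, "data": observations}
    return experience_base

eb = run_qip(
    [{"name": "proj1", "phases": ["design", "code", "test"], "quality_focus": ["reliability"]}],
    {},
)
print(sorted(eb))
```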
2.1 The Experience Factory Organization

To support the Improvement Paradigm, an organizational structure called the Experience Factory Organization (EFO) was developed. It recognizes the fact that improving the software process and product requires the continual accumulation of evaluated experiences (learning), in a form that can be effectively understood and modified (experience models), stored in a repository of integrated experience models (experience base), that can be accessed or modified to meet the needs of the current project (reuse).

Systematic learning requires support for recording, off-line generalizing, tailoring, formalizing, and synthesizing of experience. The off-line requirement is based on the fact that reuse requires separate resources to create reusable objects. Packaging and modeling useful experience requires a variety of models and formal notations that are tailorable, extendible, understandable, flexible, and accessible. An effective experience base must contain an accessible and integrated set of models that capture the local experiences. Systematic reuse requires support for using existing experience and on-line generalizing or tailoring of candidate experience.

This combination of ingredients requires an organizational structure that supports: a software evolution model that supports reuse, processes for learning, packaging, and storing experience, and the integration of these two functions. It requires separate logical or physical organizations with different focuses and priorities, process models, and expertise requirements. We divide the functions into a Project Organization whose focus/priority is product delivery, supported by packaged reusable experiences, and an Experience Factory whose focus is to support project developments by analyzing and synthesizing all kinds of experience, acting as a repository for such experience, and supplying that experience to various projects on demand. The Experience Factory packages experience by building informal, formal or schematized, and productized models and measures of various software processes, products, and other forms of knowledge via people, documents, and automated support.
The Experience Factory deals with reuse of all kinds of knowledge and experience. But what makes us think we can be successful with reuse this time, when we have not been so successful in the past? Part of the reason is that we are not talking about reuse of only code in isolation, but about reuse of all kinds of experience and of the context for that experience. The Experience Factory recognizes and provides support for the fact that experience requires the appropriate context definition for it to be reusable, and it needs to be identified and analyzed for its reuse potential. It recognizes that experience cannot always be reused as is; it needs to be tailored and packaged to make it easy to reuse. In the past, reuse of experience has been too informal and has not been supported by the organization. It has to be fully incorporated into the development or maintenance process models. Another major issue is that a project's focus is delivery, not reuse, i.e., reuse cannot be a by-product of software development. It requires a separate organization to support the packaging and reuse of local experience.

The Experience Factory really represents a paradigm shift from current software development thinking. It separates the types of activities that need to be performed by assigning them to different organizations, recognizing that they truly represent different processes and focuses. Project personnel are primarily responsible for the planning and development activities (the Project Organization, Fig. 1), and a separate organization, the Experience Factory (Fig. 2), is primarily responsible for the learning and technology transfer activities. In the Project Organization, we are problem solving. The processes we perform to solve a problem consist
[Fig. 1. The Project Organization: the project characterizes the project/environment, sets goals, chooses a process, builds execution plans, and executes the process, drawing on tailorable goals, processes, tools, products, resource models, and defect models from similar projects supplied by the Experience Factory.]

[Fig. 2. The Experience Factory and its relationship to the Project Organization.]
of the decomposition of a problem into simpler ones, instantiation of higher-level solutions into lower-level detail, the design and implementation of various solution processes, and activities such as validation and verification. In the Experience Factory, we are understanding solutions and packaging experience for reuse. The processes we perform are the unification of different solutions and redefinition of the problem, generalization and formalization of solutions in order to abstract them and make them easy to access and modify, an analysis/synthesis process enabling us to understand and abstract, and various experimentation activities so we can learn. These sets of activities are totally different.
2.2 Examples of Packaged Experience in the SEL

The SEL has been in existence since 1976 and is a consortium of three organizations: NASA/GSFC, the University of Maryland, and Computer Sciences Corporation (McGarry, 1985; Basili et al., 1992). Its goals have been to (1) understand the software process in a particular environment, (2) determine the impact of available technologies, and (3) infuse identified/refined methods back into the development process. The approach has been to identify technologies with potential, apply and extract detailed data in a production environment (experiments), and measure the impact (cost, reliability, quality, etc.). Over the years we have learned a great deal and have packaged all kinds of experience. We have built resource models and baselines, e.g., local cost models, resource allocation models; change and defect models and baselines, e.g., defect prediction models, types of defects expected for the application; product models and baselines, e.g., actual vs. expected product size, library access over time;
process definitions and models, e.g., process models for Cleanroom, the Ada waterfall model; method and technique models and evaluations, e.g., the best method for finding interface faults; products and product models, e.g., Ada generics for simulation of satellite orbits; a variety of quality models, e.g., reliability models, defect slippage models, ease of change models; and a library of lessons learned, e.g., risks associated with an Ada development (Basili et al., 1992; Basili and Green, 1994).

We have used a variety of forms for packaged experience. There are equations defining the relationship between variables, e.g., effort = 1.48 * KSLOC^0.98 and number of runs = 108 + 150 * KSLOC, where KSLOC is thousands of source lines of code; histograms or pie charts of raw or analyzed data, e.g., classes of faults: 30% data, 24% interface, 16% control, 15% initialization, 15% computation; graphs defining ranges of "normal," e.g., graphs of size growth over time with confidence levels; specific lessons learned associated with project types, phases, and activities, e.g., reading by stepwise abstraction is most effective for finding interface faults, or in the form of risks or recommendations, e.g., the definition of a unit for unit test in Ada needs to be carefully chosen; and models or algorithms specifying the processes, methods, or techniques, e.g., an SADT diagram defining design inspections with the reading technique being a variable on the focus and reader perspective.

Note that these packaged experiences are representative of software development in the Flight Dynamics Division at NASA/GSFC. They take into account the local characteristics and are tailored to that environment. Another organization might have different models or even different variables for their models and therefore could not simply use these models. This inability to just use someone else's models is a result of all software not being the same. These models are used on new projects to help management control development (Valett, 1987) and provide the organization with a basis for improvement based on experimentation with new methods. It is an example of the EF/QIP in practice.
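As a worked illustration of how such packaged equations are applied, the short sketch below evaluates the two SEL-style models quoted above for a hypothetical 50-KSLOC project. The coefficients are taken from the text as reconstructed here (the exponent and the additive constant are reconstructions of OCR-damaged formulas), and they are calibrated to the SEL Flight Dynamics environment; as the text stresses, another organization would need to derive its own models rather than reuse these numbers.

```python
def predicted_effort(ksloc: float) -> float:
    """Local SEL-style effort model (staff-months), as quoted in the text."""
    return 1.48 * ksloc ** 0.98

def predicted_runs(ksloc: float) -> float:
    """Local SEL-style model of the expected number of computer runs."""
    return 108 + 150 * ksloc

size_ksloc = 50.0  # hypothetical project size (thousands of source lines of code)
print(f"predicted effort: {predicted_effort(size_ksloc):.0f} staff-months")
print(f"predicted runs:   {predicted_runs(size_ksloc):.0f}")
```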
2.3 In Summary

How does the EF/QIP approach work in practice? You begin by getting a commitment. You then define the organizational structure and the associated processes. This means collecting data to establish baselines, e.g., defects and resources, that are process and product independent, and then measuring your strengths and weaknesses to provide a business focus and goals for improvement, and establishing product quality baselines. Using this information about your business, you select and experiment with methods and techniques to improve your processes based on your product quality needs, and you then evaluate your improvement based on existing resource and defect baselines. You can define and tailor better and more measurable processes, based on the experience and knowledge gained within your own environment. You must measure for process conformance and domain understanding to make sure that your results are valid.
In this way, you begin to understand the relationship between some process characteristics and product qualities and are able to manipulate some processes to achieve those product characteristics. As you change your processes you will establish new baselines and learn where the next place for improvement might be.

The SEL experience is that the cost of the Experience Factory activities amounts to about 11% of the total software expenditures. The majority of this cost (approximately 7%) has gone into analysis rather than data collection and archiving. However, the overall benefits have been measurable. Defect rates have decreased from an average of about 4.5 per KLOC to about 1 per KLOC. Cost per system has shrunk from an average of about 490 staff-months to about 210 staff-months, and the amount of reuse has jumped from an average of about 20% to about 79%. Thus, the cost of running an Experience Factory has more than paid for itself in the lowering of the cost to develop new systems, meanwhile achieving an improvement in the quality of those systems.
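The improvement figures just quoted reduce to a few simple ratios. The calculation below restates only the numbers reported in the text (defect rates, cost per system, reuse, and the roughly 11% Experience Factory overhead) to show why that overhead is said to have more than paid for itself; it assumes nothing beyond those reported values.

```python
defects_before, defects_after = 4.5, 1.0   # defects per KLOC
cost_before, cost_after = 490.0, 210.0     # staff-months per system
reuse_before, reuse_after = 0.20, 0.79     # fraction of reuse
ef_overhead = 0.11                         # Experience Factory share of expenditures

defect_reduction = 1 - defects_after / defects_before
cost_reduction = 1 - cost_after / cost_before
print(f"defect rate reduction:       {defect_reduction:.0%}")
print(f"cost reduction per system:   {cost_reduction:.0%}")
print(f"reuse gain:                  {reuse_after - reuse_before:.0%} points")
print(f"Experience Factory overhead: {ef_overhead:.0%} of total expenditures")
# The roughly 57% per-system cost reduction comfortably exceeds the roughly 11%
# overhead, which is the sense in which the Experience Factory "pays for itself."
```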
3. A Comparison with Other Improvement Paradigms

Aside from the Experience Factory/Quality Improvement Paradigm, there have been a variety of organizational frameworks proposed to improve quality for various businesses. The ones discussed here include:

Plan-Do-Check-Act is a quality improvement process based on a feedback cycle for optimizing a single process model or production line.

Total Quality Management represents a management approach to long-term success through customer satisfaction based on the participation of all members of an organization.

The SEI Capability Maturity Model is a staged process improvement based on assessment with regard to a set of key process areas until you reach level 5, which represents continuous process improvement.

Lean (software) Development represents a principle supporting the concentration of production on "value-added" activities and the elimination or reduction of "non-value-added" activities.

In what follows, we will try to define these concepts in a little more detail to distinguish and compare them. We will focus only on the major drivers of each approach.
3.1 Plan-Do-Check-Act Cycle (PDCA)

The approach is based on work by Shewhart (1931) and was made popular by Deming (1986). The goal of this approach is to optimize and improve a single process model/production line. It uses such techniques as feedback loops and statistical quality control to experiment with methods for improvement and build predictive models of the product.

PLAN → DO → CHECK → ACT (with feedback from ACT back to PLAN)
If a family of processes (P) produces a family of products (X), then the approach yields a series of versions of product X (each meant to be an improvement of X), produced by a series of modifications (improvements) to the processes P:

P0, P1, P2, . . ., Pn → X0, X1, X2, . . ., Xn
where Pi represents an improvement over Pi-1 and Xi has better quality than Xi-1. The basic procedure involves four basic steps:

Plan: Develop a plan for effective improvement, e.g., quality measurement criteria are set up as targets and methods for achieving the quality criteria are established.

Do: The plan is carried out, preferably on a small scale, i.e., the product is produced by complying with development standards and quality guidelines.

Check: The effects of the plan are observed; at each stage of development, the product is checked against the individual quality criteria set up in the Plan phase.

Act: The results are studied to determine what was learned and what can be predicted, e.g., corrective action is taken based upon problem reports.
3.2 Total Quality Management (TQM)

The term Total Quality Management (TQM) was coined by the Naval Air Systems Command in 1985 to describe its Japanese-style management approach to quality improvement (Feigenbaum, 1991). The goal of TQM is to generate institutional commitment to success through customer satisfaction. The approaches to achieving TQM vary greatly in practice, so to provide some basis for comparison, we offer the approach being applied at Hughes. Hughes uses such techniques as QFD, design of experiments (DOE), and statistical process control (SPC) to improve the product through the process.

Identify needs → Identify important items → Make improvements → Hold gains → Provide satisfaction
The approach has similar characteristics to the PDCA approach. If Process (P) → Product (X), then the approach yields

P0, P1, P2, . . ., Pn → X0, X1, X2, . . ., Xn
where Pi represents an improvement over Pi-1 and Xi provides better customer satisfaction than Xi-1. In this approach, after identifying the needs of the customer, you use QFD to identify important items in the development of the system. DOE is employed to
make improvements and SPC is used to control the process and hold whatever gains have been made. This should then provide the specified satisfaction in the product based upon the customer needs.
3.3 SEI Capability Maturity Model (CMM)

The approach is based upon organizational and quality management maturity models developed by Likert (1967) and Crosby (1980), respectively. A software maturity model was developed by Radice et al. (1985) while he was at IBM. It was made popular by Humphrey (1989) at the SEI. The goal of the approach is to achieve a level 5 maturity rating, i.e., continuous process improvement via defect prevention, technology innovation, and process change management. As part of the approach, a five-level process maturity model is defined (Fig. 3).

[Fig. 3. The five-level CMM process maturity model; the recoverable rows pair level 4 (Managed) with product and process quality, level 3 (Defined) with the engineering process, and level 2 (Repeatable) with project management.]

A maturity level is defined based on repeated assessment of an organization's capability in key process areas (KPAs). KPAs include such processes as Requirements Management, Software Project Planning, Project Tracking and Oversight, Configuration Management, Quality Assurance, and Subcontractor Management. Improvement is achieved by action plans for processes that had a poor assessment result. Thus, if a Process (P) is at level i, then modify the process based upon the key processes of the model until the process model is at level i + 1. Different KPAs play a role at different levels.

The SEI has developed a Process Improvement Cycle to support the movement through process levels. Basically it consists of the following activities:

Initialize
  Establish sponsorship
  Create vision and strategy
  Establish improvement structure
For each maturity level:
  Characterize current practice in terms of KPAs
  Assessment recommendations
  Revise strategy (generate action plans and prioritize KPAs)
  For each KPA:
    Establish process action teams
    Implement tactical plan, define processes, plan and execute pilot(s), plan and execute
Institutionalize
  Document and analyze lessons
  Revise organizational approach
3.4 Lean Enterprise Management

The approach is based on a philosophy that has been used to improve factory output. Womack et al. (1990) have written a book on the application of lean enterprises in the automotive industry. The goal is to build software using the minimal set of activities needed, eliminating nonessential steps, i.e., tailoring the process to the product needs. The approach uses such concepts as technology management, human-centered management, decentralized organization, quality management, supplier and customer integration, and internationalization/regionalization. Given the characteristics for product V, select the appropriate mix of subprocesses pi, qj, rk, . . . to satisfy the goals for V, yielding a minimal tailored process PV which is composed of pi, qj, rk, . . .:

Process (PV) → Product (V)
3.5 Comparing the Approaches

As stated above, the Quality Improvement Paradigm has evolved over 17 years based on lessons learned in the SEL (Basili, 1985, 1989; Basili and Rombach, 1987, 1988; Basili et al., 1992). Its goal is to build a continually improving organization based upon its evolving goals and an assessment of its status relative to those goals. The approach uses internal assessment against the organization's own goals and status (rather than process areas) and such techniques as GQM, model building, and qualitative/quantitative analysis to improve the product through the process.
Characterize → Set Goals → Choose Process → Execute → Analyze → Package
(with a project feedback loop around execution and a corporate feedback loop closing the full cycle)
If Processes (Px, Qy, Rz, . . .) → Products (X, Y, Z, . . .) and we want to build V, then based on an understanding of the relationship between Px, Qy, Rz,
. . . and X, Y, Z, . . . and goals for V, we select the appropriate mix of processes pi, qj, rk, . . . to satisfy the goals for V, yielding a tailored Process (PV) → Product (V).

The EF/QIP is similar to the PDCA in that they are both based on the scientific method. They are both evolutionary paradigms, based on feedback loops from product to process. The process is improved via experiments; process modifications are tried and evaluated, and that is how learning takes place. The major differences are due to the fact that the PDCA paradigm is based on production, i.e., it attempts to optimize a single process model/production line, whereas the QIP is aimed at development. In development, we rarely replicate the same thing twice. In production, we can collect a sufficient set of data, based upon continual repetition of the same process, to develop quantitative models of the process that will allow us to evaluate and predict quite accurately the effects of the single process model. We can use statistical quality control approaches with small tolerances. This is difficult for development, i.e., we must learn from one process about another, so our models are less rigorous and more abstract. Development processes are also more human based. This again affects the building, use, and accuracy of the types of models we can build. So although development models may be based on experimentation, the building of baselines, and statistical sampling, the error estimates are typically high.

The EF/QIP approach is compatible with TQM in that it can cover goals that are customer satisfaction driven and it is based on the philosophy that quality is everyone's job. That is, everyone is part of the technology infusion process. Someone can be on the project team on one project and on the experimenting team on another. All the project personnel play the major role in the feedback mechanism. If they are not using the technology right, it can be because they don't understand it, e.g., it wasn't taught right, it doesn't fit or interface with other project activities, it needs to be tailored, or it simply doesn't work. You need the user to tell you how to change it. The EF/QIP philosophy is that no method is "packaged" that hasn't been tried (applied, analyzed, tailored). The fact that it is based upon evolution, measurement, and experimentation is consistent with TQM. The differences between EF/QIP and TQM are based on the fact that the QIP offers specific steps and model types and is defined specifically for the software domain.

The EF/QIP approach is most similar to the concepts of Lean Enterprise Management in that they are both based upon the scientific method/PDCA philosophy. They both use feedback loops from product to process and learn from experiments. More specifically, they are both based upon the idea of tailoring a set of processes to meet the particular problem/product under development. The goal is to generate an optimum set of processes, based upon models of the
business and our experience about the relationship between process characteristics and product characteristics. The major differences are once again based upon the fact that LEM was developed for production rather than development, and so model building is based on continual repetition of the same process. Thus, one can gather sufficient data to develop accurate models for statistical quality control. Since the EF/QIP is based on development and the processes are human based, we must learn from the application of one set of processes in a particular environment about another set of processes in a different environment. So the model building is more difficult, the models are less accurate, and we have to be cautious in the application of the models. This learning across projects or products also requires two major feedback loops, rather than one. In production, one is sufficient because the process being changed on the product line is the same one that is being packaged for all other products. In the EF/QIP, the project feedback loop is used to help fix the process for the particular project under development, and it is with the corporate feedback loop that we must learn by analysis and synthesis across different product developments.

The EF/QIP organization is different from the SEI CMM approach, in that the latter is really more an assessment approach than an improvement approach. In the EF/QIP approach, you pull yourself up from the top rather than pushing up from the bottom. At step 1 you start with a level 5 style organization even though you do not yet have level 5 process capabilities. That is, you are driven by an understanding of your business, your product and process problems, your business goals, your experience with methods, etc. You learn from your business, not from an external model of process. You make process improvements based upon an understanding of the relationship between process and product in your organization. Technology infusion is motivated by the local problems, so people are more willing to try something new.

But what does a level 5 organization really mean? It is an organization that can manipulate process to achieve various product characteristics. This requires that we have a process and an organizational structure to help us: understand our processes and products, measure and model the project and the organization, define and tailor process and product qualities explicitly, understand the relationship between process and product qualities, feed back information for project control, experiment with methods and techniques, evaluate our successes and failures, learn from our experiences, package successful experiences, and reuse successful experiences. This is compatible with the EF/QIP organization.

QIP is not incompatible with the SEI CMM model in that you can still use key process assessments to evaluate where you stand (along with your internal goals, needs, etc.). However, using the EF/QIP, the chances are that you will move up the maturity scale faster. You will have more experience early on
operating within an improvement organization structure, and you can demonstrate product improvement benefits early.
4. Conclusion

Important characteristics of the EF/QIP process include the fact that it is iterative; you should converge over time, so don't be overly concerned with perfecting any step on the first pass. However, the better your initial guess at the baselines, the quicker you will converge. No method is "packaged" that hasn't been tried (applied, analyzed, tailored). Everyone is part of the technology infusion process. Someone can be on the project team on one project and on the experimenting team on another. Project personnel play the major role in the feedback mechanism. We need to learn from them about the effective use of technology. If they are not using the technology right, it can be because they don't understand it or it wasn't taught right, it doesn't fit or interface with other project activities, it needs to be tailored, or it doesn't work, and you need the user to tell you how to change it. Technology infusion is motivated by the local problems, so people are more willing to try something new. In addition, it is important to evaluate process conformance and domain understanding, or you have very little basis for understanding and assessment.

The integration of the Improvement Paradigm, the Goal/Question/Metric Paradigm, and the EFO provides a framework for software engineering development, maintenance, and research. It takes advantage of the experimental nature of software engineering. Based upon our experience in the SEL and other organizations, it helps us understand how software is built and where the problems are, define and formalize effective models of process and product, evaluate the process and the product in the right context, predict and control process and product qualities, package and reuse successful experiences, and feed back experience to current and future projects. It can be applied today and evolve with technology. The approach provides a framework for defining quality operationally relative to the project and the organization, justification for selecting and tailoring the appropriate methods and tools for the project and the organization, a mechanism for evaluating the quality of the process and the product relative to the specific project goals, and a mechanism for improving the organization's ability to develop quality systems productively. The approach is being adopted by several organizations to varying degrees, such as Motorola and HP, but it is not a simple solution and it requires long-term commitment by top-level management.

In summary, the QIP approach provides for a separation of concerns and focus in differentiating between problem solving and experience modeling and packaging. It offers support for learning and reuse and a means of formalizing and integrating management and development technologies. It allows for the generation of a tangible corporate asset: an experience base of software competencies. It offers a Lean Enterprise Management approach compatible with TQM
while providing a level 5 CMM organizational structure. It links focused research with development. Best of all, you can start small, evolve, and expand, e.g., focus on a homogeneous set of projects or a particular set of packages and build from there. So any company can begin new and evolve.

References
Basili, V. R. (1985). Quantitative evaluation of software engineering methodology. In "Proceedings of the 1st Pan Pacific Computer Conference, Melbourne, Australia" (also available as Technical Report TR-1519, Department of Computer Science, University of Maryland, College Park, 1985).
Basili, V. R. (1989). Software development: A paradigm for the future. In "Proceedings of the 13th Annual International Computer Software and Applications Conference (COMPSAC), Keynote Address, Orlando, FL."
Basili, V. R., and Green, S. (1994). Software process evolution at the SEL. IEEE Software Mag., July, pp. 58-66.
Basili, V. R., and Rombach, H. D. (1987). Tailoring the software process to project goals and environments. In "Proceedings of the 9th International Conference on Software Engineering, Monterey, CA," pp. 345-357.
Basili, V. R., and Rombach, H. D. (1988). The TAME Project: Towards improvement-oriented software environments. IEEE Trans. Software Eng. SE-14(6), 758-773.
Basili, V. R., Caldiera, G., McGarry, F., Pajerski, R., Page, G., and Waligora, S. (1992). The Software Engineering Laboratory: An operational software experience factory. In "Proceedings of the International Conference on Software Engineering," pp. 370-381.
Crosby, P. B. (1980). "Quality is Free: The Art of Making Quality Certain." New American Library, New York.
Deming, W. E. (1986). "Out of the Crisis." MIT Center for Advanced Engineering Study, MIT Press, Cambridge, MA.
Feigenbaum, A. V. (1991). "Total Quality Control," 40th Anniv. Ed. McGraw-Hill, New York.
Humphrey, W. S. (1989). "Managing the Software Process," SEI Ser. Software Eng. Addison-Wesley, Reading, MA.
Kogure, M., and Akao, Y. (1983). Quality function deployment and CWQC in Japan. Qual. Prog., October, pp. 25-29.
Likert, R. (1967). "The Human Organization: Its Management and Value." McGraw-Hill, New York.
McCall, J. A., Richards, P. K., and Walters, G. F. (1977). "Factors in Software Quality," RADC TR-77-369.
McGarry, F. E. (1985). "Recent SEL Studies," Proc. 10th Annu. Software Eng. Workshop. NASA Goddard Space Flight Center, Greenbelt, MD.
Paulk, M. C., Curtis, B., Chrissis, M. B., and Weber, C. V. "Capability Maturity Model for Software, Version 1.1," Technical Report SEI-93-TR-024.
Radice, R., Harding, A. J., Munnis, P. E., and Phillips, R. W. (1985). A programming process study. IBM Syst. J. 24(2).
Shewhart, W. A. (1931). "Economic Control of Quality of Manufactured Product." Van Nostrand, New York.
Software Engineering Institute (1993). "Capability Maturity Model," CMU/SEI-93-TR-25, Version 1.1. Carnegie-Mellon University, Pittsburgh, PA.
Valett, J. D. (1987). The dynamic management information tool (DYNAMITE): Analysis of the prototype, requirements and operational scenarios. M.Sc. Thesis, University of Maryland, College Park.
Weiss, D. M., and Basili, V. R. (1985). Evaluating software development by analysis of changes: Some data from the Software Engineering Laboratory. IEEE Trans. Software Eng. SE-11(2), 157-168.
Womack, J. P., Jones, D. T., and Roos, D. (1990). "The Machine that Changed the World: Based on the Massachusetts Institute of Technology 5-Million Dollar 5-Year Study on the Future of the Automobile." Rawson Associates, New York.
CASE Adoption: A Process, Not an Event

JOCK A. RADER
Radar and Communications Systems
Hughes Aircraft Company
Los Angeles, California
Abstract

Although tens of thousands of computer-aided software engineering (CASE) licenses have been sold, penetration of CASE into operational projects has been modest. Three of the most important reasons are: (1) the relative immaturity of CASE technology, (2) a failure to treat CASE adoption as a design process, and (3) unrealistic expectations. The chapter defines a CASE adoption process, with four overlapping phases (i.e., awareness, evaluation, first victim, and second victim and beyond). At an abstract level, the process has much in common with the introduction of any new technology, and we will refer to the work in the discipline of managing organizational change. At the detailed level, however, specificity to CASE is more pronounced. The chapter also explores some of the difficulties in adopting CASE tools and, indeed, any new technology. A primary factor for success is the quality of the CASE team, a major element of the CASE infrastructure. We identify and discuss the significance of the roles played by CASE team members, including champions and change agents. All the elements of the infrastructure are very important and their respective contributions are examined. In order to illustrate the adoption process, we consider in some detail a case study of successful CASE adoption.
1. Introduction
   1.1 Organization of Chapter
   1.2 Influences on the Author
2. The Keys to Successful CASE Adoption
   2.1 Prehistoric Analogy
   2.2 Adoption Is Difficult: Technology Maturation Study
   2.3 Must Work Hard to Inject New Technology
   2.4 Reassuring Management
   2.5 Reassuring Technical Staff
3. Planning and Preparing for CASE Adoption
   3.1 Introduction to Plan Generation
   3.2 Adoption Plan Examples
   3.3 CASE Infrastructure
   3.4 The Importance of the CASE Team
4. CASE Adoption Case Study
   4.1 Awareness
   4.2 Evaluation and Selection
   4.3 First Victim
5. Awareness: The First Phase
   5.1 Awareness: Understand the Organization
   5.2 Awareness: Tools to Understand the Organization
   5.3 Awareness of CASE Technology
   5.4 Learning from the Vendors
   5.5 Other Learning Avenues
6. Evaluation and Selection
   6.1 A Three-Filter Approach to CASE Evaluation
   6.2 Activities Associated with an Evaluation Filter
   6.3 Filter Two: Detailed Evaluation
   6.4 Technical Criteria
   6.5 Cost: Always an Important Selection Factor
7. Supporting First Operational Use
   7.1 Choice of First Victim
   7.2 Overview of CASPs (Computer Aided SubProcesses)
   7.3 Preparation for the First Victim
   7.4 Training, Training, and More Training
   7.5 Supporting Operational Use
8. Expansion and Evolution: Second Victim and Beyond
   8.1 Enlarging the Scope
   8.2 Build on Success: Maintain the Momentum
9. CASE Adoption Summary
References
Introduction
Although tens of thousands of computer-aided software engineering (CASE) licenses have been sold, penetration of CASE into operational projects has been modest. Three of the most important reasons are: (1) the relative immaturity of CASE technology; (2) a failure to treat CASE adoption as a process; and (3) unrealistic expectations. This chapter will define a process for the successful adoption of CASE after first exploring some of the difficulties in adopting CASE tools and indeed any new technology. At an abstract level, the process will have much in common with the introduction of any new technology, and we will refer to the work in the discipline of managing organizational change. At the detailed level, however, specificity to CASE is more pronounced.
1.1 Organization of Chapter

We start in Section 2 by outlining some of the obstacles to successful adoption, and suggest mitigating actions. A sequence of four overlapping phases (awareness, evaluation, first victim, second victim and beyond) is introduced in Section 3. The roles of the CASE team and the importance of CASE infrastructure are also
discussed here. Before exploring the phases in detail, we look at a case study of successful CASE adoption in Section 4. Sections 5 through 8 detail the four phases, beginning with the awareness phase. The actions of this phase help you move from a relatively ignorant state to an informed state. The activities of the evaluation phase, described in Section 6, provide hands-on experience and lead to tool selection. The ramifications of supporting the first victim (first operational user) are the subject of Section 7. Finally, in Section 8, we cover the second victim and beyond, where we look at building on the successes of first operational use. Section 9 summarizes the important lessons of the chapter.
1.2 Influences on the Author

The technical problems of CASE adoption are considerable, due to the complexity and immaturity of CASE technology, the immaturity of integration technology, and frequent misalignment among process, methods, and tools. Thus, much effort must be devoted to attacking the technical problems. However, no amount of technical effort can succeed if inadequate attention is paid to the organizational change issues occasioned by major technical change.

Since the fall of 1988, the author has devoted the majority of his time to various issues of CASE technology. His experience and attitudes are based on several years of empirical practice as champion and change agent, 14 months in residence with the CASE Environments Project at the Software Engineering Institute (at Carnegie-Mellon University), and dozens of interviews with other practitioners and researchers. Only after years of experience did he start to become acquainted with the literature of organizational change, and he became increasingly intrigued as he saw that the issues discussed there resonated with personal experience. Thus, you will find many references to that literature throughout the chapter. First, it helps to provide a known framework with which to identify and consider many of the problems associated with CASE adoption. Second, when you examine the problems in the broader context of any kind of organizational change, it becomes easier to separate fundamental underlying issues from the particular symptoms viewed on a specific adoption effort. Understanding fundamental causes allows you to be more effective in seeking solutions.

The author is particularly interested in operational use and hopes that this chapter proves useful to CASE teams and their adventurous victims.
2. The Keys to Successful CASE Adoption

The keys to successful CASE adoption are realistic expectations and a realistic adoption approach. Expectations relate both to what can be accomplished (improve productivity and quality by 300%) and the time frame (in only 8 days). Achievements requiring years are often expected in months, unrealistic by a factor of 10. An adoption effort requires treatment as a project in its own right, complete with objectives, requirements, strategy, process, design, implementation, testing, and integration. CASE adoption is a difficult technical problem and a difficult organizational change problem. Indeed, the reason that so many CASE adoption attempts fail is that inadequate attention and resources are paid to the adoption effort, and expectations are set unreasonably high.

In the sections of this chapter subsequent to this one, we lay out an approach for CASE adoption. In the remainder of this one, we explore some of the difficulties of technological change in general and CASE adoption specifically. We start with an analogy.
2.1 Prehistoric Analogy

Suppose a primitive carpenter who is used to splitting logs with a stone wedge and a stone hammer is provided with a modern chisel and hammer. Would he be able to adjust to this new technology? Probably yes, but some experimentation and adjustment would be necessary. But suppose he was provided with a power saw? What would he do? Pretty much what a software engineer with a new CASE tool and inadequate knowledge does: the wrong thing, until in frustration the new tool is abandoned altogether. The problem is that two important resources are missing: expertise and physical infrastructure. Where is the instructor and where is the electric socket?

There are many topics for the instructor to cover. At first, just learning to cut a branch in two in a safe manner would require instruction time and practice. Then more instruction and practice would be needed to learn about setting the depth of cut or the angle of the cut. Plus, to be effective with the new tool, the primitive craftsman would have to learn new ways of designing and building objects of wood.

An electric socket, part of the physical infrastructure, has to be supported by a generator and fuel and wires. Somehow these need to be acquired and installed. But the saw expert may not be a generator expert, or know how to provide fuel. So additional experts may be required. Operating the saw or the generator is not the same as repairing either. Even more expertise and support is required for sustained use. And there is much more sophisticated use possible, such as that practiced by a master cabinetmaker.

The analogy can be extended to include the need for toolsmith functions. Skilled craftsmen use mortise boxes, jigs, and other aids (helper tools) to do their jobs. These may be built by the craftsman or may be acquired from a toolsmith. So the knowledge is needed to tailor the tool and to extend it.
Hopefully, the analogy helps to illustrate that adoption resources and practices have a major impact on whether the introduction of CASE technology will succeed or fail. Although modern CASE technology ostensibly is not as foreign to most software organizations as a power saw would be to a primitive craftsman, it is far more foreign than is generally conceded by adoption efforts. Changes in process, methods, and culture accompany CASE adoption and must be intelligently managed. And as in the analogy, expertise and physical infrastructure need to be developed and put in place.
2.2 Adoption Is Difficult: Technology Maturation Study

All too often CASE adoption is treated as no more difficult than moving from Version 6.0 to Version 6.2 of your operating system, even though the expectations are vastly higher. Merely dropping tool reference manuals on a programmer's desk will not result in the effective use of new technology. Much more is needed in the way of training and support.

In order to successfully adopt CASE technology it is necessary to attack both the technical problem and the cultural problem, both of which are very complex. This is not unique to CASE adoption but applies to any significant organizational or cultural change. "Managing technological change" has become a discipline of its own, with many books and articles appearing over the last several years, such as Belasco (1990) and Fowler and Maher (1992). Also, a number of consulting firms have sprung up which specialize in organizational change.

There are also studies which concentrate on the maturation and adoption of software technology, such as Zelkowitz (1993), which looks at technology transfer inside the National Aeronautics and Space Administration (NASA). Software technologies studied include object-orientation, use of Ada, inspections, and cleanroom. The study results emphasize the multiyear nature of technology transfer and adoption.

Redwine and Riddle (1985) published an interesting paper on technology maturation, summarized in Glass (1989), in which they define a five-step maturation process, from emergence of key idea (step 0) to substantial evidence of value and applicability (step 4). They then studied the history of 14 software technologies, including, for example, structured programming, Unix, and cost estimation models. Their finding was that for the technologies which made it all the way to step 4, the elapsed time was 15-20 years. They identified six potential accelerators which could help reduce the elapsed time for the maturation of a particular technology to the low end of that range:
- Conceptual integrity
- Clear recognition of need: addresses a recognized problem
- Tuneability: easily adapted to related needs
- Prior positive experience with related technology
- Management commitment
- Training (at the appropriate time)

Three of the accelerators (recognition of need, management commitment, and training) are often mentioned as facilitators for organizational change in the literature of that discipline.
We will not speculate here as to the exact degree of maturity of CASE technology, except to note that we believe that it is not yet to step 4. The real point is that workstation-based CASE tools did not begin to appear until the middle 1980s, when all the enabling technologies were first available. These technologies include affordable workstations and graphical interfaces, capable of supporting common software methods such as structured analysis. Reasoning from the Redwine-Riddle results, we should not really expect substantial use until later in the decade. Given the relatively immature state of the technology, it is particularly important to concentrate on good adoption practices if you want to be successful in using workstation-based CASE tools.
2.3 Must Work Hard to Inject New Technology

In order to cope with all the challenges, good planning is required which addresses the various inhibiting factors. A CASE team needs to be assembled to generate plans and then to see to their execution. The roles and skills of the team members are described in a separate section. However, at this point we will discuss briefly three very important roles:

- Sponsor
- Champion
- Change agent
The sponsor is a manager or group of managers at high enough a level that they can provide the necessary resources and the necessary legitimacy for the adoption effort to succeed. To be effective in this role, the sponsor has to be a believer in the benefits of the technological change. Part of the champion’s job is to keep the sponsor informed and sold. In many cases, the champion may have to sell the sponsor in the first place. Without a sponsor, there is little hope of success. But in addition to the provision of resources, it is critical that the sponsor be seen to actively support the effort. A champion is the same as what Bill Curtis of the Software Engineering Institute in Curtis et al. (1987) calls the superconceptualizer. In his research, he found that most successful projects enjoy the services of an individual who is the “keeper of the project vision.” This individual is extremely familiar with
the application area (e.g., software methods, CASE tools, and CASE adoption), is skilled at communicating the project vision, and is dedicated to the technical success of the project. The project vision has to be communicated to the project staff (change agents and adopters) in a coherent way to maintain the conceptual integrity of the project. But the vision must also be communicated to the adopters' management (who are accepting risk) and to the sponsor.

The change agents are the major performers and facilitators on the CASE team. They provide expertise on the use of the tools and the methods they support. Thus, they will be active in training and in hand holding. They also have to be expert in the use and the administration of the host platforms. They install tools, administer user accounts and privileges, and troubleshoot problems with the tools and the platforms. And they are toolsmiths, who tailor the tools, enhance them, and integrate them. Frequently, in order to facilitate timely support, some of the change agents are loaned to early adoption projects on an as-needed basis. Depending on circumstances, the project may or may not have to pay for this type of support. The activities of the CASE team are described in the remaining sections.

It is important to remember not to expect dramatic early results from adoption of new technology. Large accomplishments can result from modest beginnings. After all, a 120-ton (110,000-kg) blue whale starts as a single cell. You should expect to evolve and grow your CASE environment as you use it, and to expect benefits to increase as familiarity increases and functions evolve.
2.4 Reassuring Management

Managers are anxious for a variety of reasons. It is natural for them to recognize that CASE introduction means increased cost and increased risk, whereas the benefits are much less clear. Most experienced managers have been through numerous improvement initiatives that have failed to provide real benefits. Since their reward system is based on reducing both costs and risks, it should be obvious why their enthusiasm for CASE is limited. The following issues need to be regularly discussed both with project management to help allay their concerns, and with the sponsor to help maintain sponsorship.

There are both short-term and long-term benefits to the use of CASE tools, some more difficult to quantify than others. The author believes that in 10 years, any organization in the business of building and maintaining complex, software-intensive systems will have to be a skilled user of CASE environments. Not only will nonusers fail to be competitive, they will be perceived as backward and unsuitable, just as no one today can build large, complex digital modules without a modern computer-aided design (CAD) system. Even today, government customers frequently want their contractors to be competent users of CASE tools.
Many users of CASE tools emphasize that the earliest benefit from CASE tool use is improved product quality, which then leads to improved life-cycle costs. It is critically important to note that successful tool use cannot be separated from methods use and process use. Generally, vendors develop CASE tools to implement modern methods, many of which are too cumbersome to practice without automated support. Tools can be used both to encourage and to enforce methods. In Rader et al. (1993), the authors observe that a clear concept of operations accompanies successful adoption efforts. Later in the chapter, we will discuss the CASP (computer aided subprocess) concept, an approach to integrating process, methods, and tools.

Motorola senior vice president John Major, as reported in Card (1994), enumerates the following advantages from his company's initiative to improve the software process: development cycle time reduced by 20%; productivity increased by a factor of 2 to 4; and improved quality, as measured by defect rate. Tools should be viewed as an integral part of a meaningful process improvement program.

An important aspect of tool use is the ability to achieve automatic document generation from an engineering database, thereby creating engineering documentation which is more complete, more consistent, and more structured. If you are using a tool which supports structured analysis, it will contain subtools which perform various consistency and completeness checks on the engineering database. A specification generated from that database will enjoy the same completeness and consistency. When the spec is automatically generated, it becomes much easier to generate and to maintain.

When project data is captured in an engineering database, in terms of meaningful objects, it becomes possible to automatically perform a wide range of activities including status accounting, analysis, report generation, and compliance verification. The subtools to perform these activities can be executed on demand or programmed to execute on a scheduled basis. As more and more process fragments (subprocesses) are supported by tools, more and more project data will reside in electronic databases. This opens the door to future benefits involving integration and automation across the entire life cycle.

Part of getting management's support for new ideas is just a matter of making them comfortable with those ideas. One technique for making them comfortable is repeated exposure over time. This can be done via a variety of interchanges including briefings, status meetings, informal chats, vendor presentations, articles, and papers. The wider the range of sources, implying broad support in the industry, the more comfortable they are likely to become. It is important to stress key issues and key benefits, and to be consistent.
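To make the earlier point about generation and checking from an engineering database concrete, the sketch below shows how a small script might walk a toy requirements database, run a consistency check, and emit a specification fragment from the same objects. Everything here is an assumption made for illustration: the record layout, field names, and report format are invented and do not correspond to the extraction language or report writer of any particular CASE tool of the period.

    # Illustrative sketch only: the schema and report format are assumptions,
    # not the interface of any real CASE tool.
    from collections import namedtuple

    Requirement = namedtuple("Requirement", "id title text inputs outputs")

    # A toy "engineering database": a handful of requirement objects.
    DATABASE = [
        Requirement("SRS-001", "Track altitude",
                    "The system shall compute altitude.",
                    inputs=["baro_pressure"], outputs=["altitude"]),
        Requirement("SRS-002", "Display altitude",
                    "The system shall display altitude.",
                    inputs=["altitude"], outputs=["display_frame"]),
    ]

    def consistency_report(db):
        """Flag outputs never consumed and inputs never produced by any requirement."""
        produced = {o for r in db for o in r.outputs}
        consumed = {i for r in db for i in r.inputs}
        return {"unconsumed_outputs": sorted(produced - consumed),
                "unproduced_inputs": sorted(consumed - produced)}

    def generate_spec_section(db):
        """Emit a specification fragment from the same database, so text and model agree."""
        lines = ["3.2 Functional Requirements"]
        for r in db:
            lines.append(f"{r.id}  {r.title}")
            lines.append(f"    {r.text}")
        return "\n".join(lines)

    if __name__ == "__main__":
        print(consistency_report(DATABASE))   # baro_pressure and display_frame are external
        print(generate_spec_section(DATABASE))

The only point of the sketch is that the generated text and the consistency checks are derived from the same objects, which is why a generated document inherits the completeness and consistency of the underlying model.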
2.4.1 Mitigating the Cost of CASE Adoption

The cost of adopting CASE tools can be considerable. The most obvious costs are those associated with the purchase of platforms and tool licenses. But if adoption is to succeed, there are also the necessary costs of training and support. Training costs are not only the costs of the instructor, but also the time costs of project staff while they are in training and their reduced productivity as they work their way up the learning curve.

If management demands a short-term return on CASE in order to consider its introduction, there are many articles in the literature reporting a wide range of productivity gains. For instance, figures collected by Don Reifer, and quoted in Myers (1992), report productivity gains of up to 25%. The average gain in three categories of industries was: information systems, 12%; scientific, 10%; and real-time aerospace, 9%. Depending on the circumstances, a modest 10-15% productivity increase may or may not cover the cost of introducing CASE, but it will certainly help offset that cost.

Unfortunately, some articles have reported much larger increases, such as 100-300%. Although it is possible that in some circumstances tool adoption coupled with other changes could lead to dramatic improvement, claims of this sort can cause great harm. A manager could easily dismiss CASE technology as frivolous when presented with these numbers. Alternatively, a less skeptical manager might believe these results are readily achievable, and embark on a course leading to grossly unfulfilled expectations.

In some instances, the sponsor may be willing to adopt the stance that upgrading tools is a cost of staying in business, since tools are routinely upgraded for secretaries and clerks, usually without elaborate cost/benefit justifications. Project management's anxiety about costs can certainly be allayed if much or all of the adoption costs are assumed by the sponsor, rather than by the project.
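To see why a modest productivity gain "may or may not" cover the cost, it can help to run the arithmetic explicitly. The back-of-the-envelope sketch below uses entirely invented numbers for team size, loaded labor cost, platforms, licenses, and training; substitute your own organization's figures, and remember that it ignores the temporary productivity dip while staff climb the learning curve.

    # Hypothetical numbers for illustration only.
    engineers            = 10
    loaded_cost_per_year = 150_000      # dollars per engineer, fully loaded
    productivity_gain    = 0.12         # a modest 12% improvement

    workstations  = 10 * 10_000         # assumed platform cost per seat
    licenses      = 10 * 5_000          # assumed tool license per seat
    training      = 10 * 3_000 + 15_000 # per-seat training plus instructor
    adoption_cost = workstations + licenses + training

    annual_benefit = engineers * loaded_cost_per_year * productivity_gain

    print(f"first-year cost: ${adoption_cost:,}")
    print(f"annual benefit:  ${annual_benefit:,.0f}")
    print(f"payback (years): {adoption_cost / annual_benefit:.1f}")

With these assumed figures the payback period is a little over a year; halve the gain or double the license costs and it stretches to several years, which is exactly the sensitivity that makes sponsor funding of adoption costs so attractive to project managers.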
2.4.2 Mitigating Project Risk Concerns

A very important stricture is to choose the first project wisely, that is, in a way that provides the maximum opportunity for adoption success. This means that the project should be small enough that the CASE team can support it adequately. A rough rule of thumb might be a project where 15 is the largest number of users in the first 12 months, and the initial user set should number no more than 5. Ideally, you want to be able to get commitment from key project technical leaders.

You should plan and prepare for CASE adoption as explained in this chapter. This means providing all the necessary resources. Involve project management in reviewing your preparations and your plans so that they know what to expect. Demonstrate what is expected early and often, and evolve your approach. Make sure that the project is aware of the sponsor's support and interest.

The sponsor can do much to reduce risk, since the sponsor can provide additional resources and can also help ameliorate some of the project constraints, by assisting in the negotiation of milestones or the formats of work products. Even more important, when the project staff see the sponsor's support demonstrated, they more readily commit to the goals of the adoption effort.

Another important action, which both demonstrates sponsor commitment and reduces risk, is the choice of a CASE champion, who can perform the superconceptualizer function described above. This individual provides the bridges among the sponsor, the project, and the change agents. A weak choice will build weak bridges, increasing risk directly, and calls into question the sponsor's resolve, also increasing risk, but more subtly.
2.5 Reassuring Technical Staff

There are many techniques which can be used to help win support from the technical staff who will be your adopters. Remember that they too will be very skeptical at first. Virtually all seasoned software engineers have been afflicted during their careers with the adoption of new procedures, which all too frequently have enhanced their work experience with the twin agonies of increased effort coupled with increased risk.

At first, project staff will be ignorant of what you are trying to accomplish, how you plan to go about it, and most importantly how it will affect them. The CASE team has the responsibility to inform and to explain. So in addition to the requisite training in process, methods, and tools, you must provide overview training to establish a context and to get everyone involved. Make sure that everyone attends demonstrations early on, and then provide hands-on experience to as many people as possible. For many people, the graphical editors of CASE tools are seductive.

It is particularly desirable to win the support of the entrenched experts. Explain what you are trying to achieve and get them involved in planning and preparation for the adoption. This makes it clear that you are interested in their expertise and experience. You can explain that CASE helps to leverage the best people; it does not automate their jobs away. Elucidate how you plan to use CASE to automate drudge work, such as document generation or consistency checking. Explain that CASE does not reduce either the need for application expertise or the need for creativity.

Make the technical staff aware of the arguments that you have advanced to project management. Technical staff have many of the same concerns as management. Moreover, before they commit to support your experiment, they would like to believe that project management is committed. If they see that a convincing argument exists for the adoption, they will be more receptive to the idea and less skeptical about management's sincerity. Also inform them of the sponsor's interest and commitment. Better yet, allow the sponsor to address the project staff on these topics.
As the project progresses, hold regular meetings with the staff to share experiences, good and bad, with the use of the new technology. It is reassuring to know that your coworkers are struggling with the same problems that you are, it is very helpful when you can use a solution one of them has developed, and it is very satisfying when your solution is adopted by others in the group. When the adoption effort is managed correctly, the staff will see that it evolves in response to the issues that are surfaced at these meetings.

Do not pretend that you do not expect unforeseen problems to crop up. In McSharry (1994), the following technology transfer rule is suggested: "Maintain a healthy respect for Murphy's Law." Inform the staff that you expect problems to arise and that you are prepared to deal with them. You plan to discuss the problems and potential solutions in the regular adoption status meetings, and you have support from management and the sponsor in working solutions to the problems.
3. Planning and Preparing for CASE Adoption

In this section we discuss CASE adoption planning based on five phases of CASE adoption. First we describe the purpose of an adoption plan and suggest an outline. This is followed by some informal examples of possible contents. The last two topics in this section explain the resources of a CASE infrastructure and the importance of the CASE team. The subsequent section provides a case history of CASE adoption, and the four sections after that discuss each adoption phase individually.
3.1 Introduction to Plan Generation

The five phases of CASE adoption are:

0. Planning and staffing
1. Awareness and gathering
2. Evaluation and selection
3. First victim
4. Second victim and beyond (expansion and evolution)
The zeroth phase is to get ready for the four phases which follow, and it is the zeroth phase that is the main concern of this subsection. The five phases are understood to overlap and interact. The designation of the first operational users as first victim is intentional. Rather than expecting the first user to believe that they are fortunate because they have been chosen as your guinea pigs, it seems fair to acknowledge that you will probably be introducing risk into their project. By making a joke of
their role as victim, you are acknowledging the risk, and expressing that the CASE team accepts responsibility to minimize the risk by its support. Of course, this commitment must be backed up with deeds.

As mentioned in the introduction, it takes about 6 months to get ready to introduce tools onto a first project (steps 0-3), and 3-5 years to institutionalize the use of CASE technology (steps 0-4). The time required for institutionalization depends on:
0
The size of the tool set, The degree of tool integration, The size of the adopting organization, and The resources brought to bear.
The results of phases 1-3 will greatly influence phase 4, so that any plan made for the fourth phase before CASE has been successfully adopted into at least one real project, is likely to experience heavy revision. The five phases are derived from empirical observation and participation, influenced by the literature of CASE adoption and by the more general literature of technological change, e.g., as in Fowler and Maher (1992), ANSI (1992), IEEE (1994), Belasco (1990), and Bridges (1991). Briefly the main activities of the phases are:
0. Planning and staffing: begin to staff the CASE team, generate an adoption plan, establish a sponsor 1. Awareness: learn the relevant processes, needs, and culture of the current organization; gain an understanding of available, applicable CASE technology, gather list of applicable tools 2. Evaluation and selection: evaluate applicable tools, narrow the field and evaluate in more detail, continue evaluate and narrow until selection is made 3. First victim: plan and prepare for introduction, train CASE users, provide in process support, constantly evolve 4. Second victim and beyond: institutionalize and constantly evolve
3.1.1 Elements of the Adoption Plan A possible organization for an adoption plan, containing eight sections, is provided below. The sections can be partitioned into two classes, a general set (numbers 1-3) which establishes objectives and resources; and a second set which describes actions leading to realization of the objectives. The first section might contain strategic objectives (automate all development phases and project management over 4 years), and short-term objectives (get a
CASE ADOPTION: A PROCESS, NOT AN EVENT
front-end tool into operationaluse on a real project as soon as possible). Following an incremental approach, early versions of the plan might only pertain to short term objectives and a portion of the ultimate environment, e.g., a single tool or method. Thus, Sections4-8 might only be generated for the short-termobjectives. Later they would be expanded or repeated to cover additional tools and methods, and appropriate combinations of methods and tools. 1. Vision and objectives and goals 2. General resources 3. General constraints 4. Awareness 5 . Evaluation and selection 6. Preparation for the first victim 7. Support of the first victim 8. Second victim and beyond
3.1.2 Evolving the Adoption Plan The following sentiment is attributed to General Eisenhower, the allied commander responsible for the retaking of Europe in World War 11, in his remarks at the National Defense Executive Reserve Conference in November 1957, “plans are worthless, but planning is everything.” The point is that learning to make effective use of methods and tools is full of surprises, even though adopting CASE is less dangerous than waging war. And the wise commander expects their plan to evolve during the come of a battle or campaign, in response to these surprises and to changing contingencies. Still, the vision statement and the enumeration of objectives and goals can be very helpful when making decisions as you evolve the plan. The only real surprise would be if there were no significant changes. Thus, the plan should be written anticipating that it will be called upon to evolve. The less knowledgeable the CASE team is, the more sketchy will be the detailed portions of the plan. So those portions will grow as well as evolve in the course of the campaign. In Favaro er al. (1994), the author speaking about the experience of the European Space Agency described how that project had been wounded by overly strict enforcement of their plan. The project levied requirements on tools to support some processes that could not be met by any COTS (commercial off the shelf) tools. For whatever reason, these requirements were treated as inviolable,leaving the project with an unhappy dilemma. They could either forego tool support for that subprocess or write their own tool. A critical consideration, if your goal is to provide a reasonably complete environment or even to institutionalize a small set of tools, is the immaturity of CASE technology. In 3-5 years, some of the tools you might pick today will
96
no longer be supported, and others will have been surpassed by events. A competitor, who might not even be a contender now, may be providing a tremendously better capability then. Or the capabilities of some tools might be subsumed into the capabilities of other tools or framework services. Since VisiCalc burst on the scene around 1980, the history of tools for desktop platforms includes many examples of competitive products, even early leaders, that have followed the Edsel into oblivion. With respect to integration frameworks or services, the offerings are even more immature. Therefore it would be particularly risky to irrevocably commit your future project support environment to a particular product which is offered today. And finally, do not forget that much can change in your own organization in 3-5 years, especially in the current business climate where companies are merging, buying and selling operational units, and redefining their product lines. So flexibility and the ability to evolve are key.

In Figure 1, we see the continuous nature of the plan-act-evaluate-adjust cycle. In case it is not clear, the author admits to a strong bias toward an evolutionary approach when confronting complex problems or development efforts. For more information on evolutionary development, see the writings of Tom Gilb (1985, 1988). The extensive literature on the spiral model, made popular by Barry Boehm (1988), also emphasizes an evolving approach to planning. Finally, the literature of Total Quality Management (TQM) recommends an evolving approach to improvement.

The title of this chapter suggests that CASE adoption is a process, not an event. This can be restated to say that adoption is a journey, not a destination. Viewed as a journey, the plan can be considered an itinerary providing a destination and a path.
FIG. 1. Plan-Act-Evaluate-Adjust cycle.
- Choose a destination
- Choose a path
- Start your journey
- Experience life and its challenges
- Adjust path and/or destination
- Continue the journey
3.1.3 Questions Answered by an Adoption Plan

In order to maintain focus it is necessary to have an adoption plan. This is not unique to CASE adoption but applies to any technological or organizational change. Based on the plan outline suggested above, we now discuss in more detail the type of planning material that should be in each of the sections. To a large extent, the discussion is in the form of questions that specific sections of the plan should attempt to answer. This is not meant to be a comprehensive list of questions, but rather to give by example the kind of information that should be in each section.
Vision, Objectives, and Goals. This section of the plan outlines what we are trying to accomplish and the scope of our ambition. Are we trying to get one tool into use on a project of 10 people or are we trying to develop and deploy a complete environment for a thousand software engineers?
What will be the characteristics of the new environment? What benefits are we looking for? Do we want to improve quality? Productivity? Do we want to improve communication among engineers? Is metrics collection important? Is obtaining more realistic CASE expertise and starting to build a CASE infrastructure important? What life-cycle phases are we interested in? What platforms and operating systems are we concerned with? What specific accomplishments are we seeking? A method to automate document generation or to generate more consistent software requirements specifications (SRS)? What is the time frame? First trial generation in seven months? First customer delivery in 10 months?
Resources. A discussion of resources is important so that staff will know what they can expect, and so they can judge whether the resources are compatible with goals. Two critically important resources are management commitment (you need a sponsor) and the CASE team. What resources do we have that will help with the proposed adoption? Do we have existing documentation of our existing processes? Of the subprocesses we expect to automate? Are staff who are familiar with the organization and its processes available to work on the project? Do we have staff with some experience with CASE tools? Staff with experience with the methods to be automated? How much
management support do we have? Where will the funding come from? What equipment is or will be available? How do we expect to acquire tool licenses? How will we handle installation, administration, and support of the licenses?
Constraints. Constraints are another form of requirements, which when not clearly explicated can lead to confusion, frustration, and bad feelings. Staff whose ideas or work are rejected because they conflict with unstated constraints may become demotivated. If the constraint is still not revealed when rejection occurs, conflict is virtually certain. Some constraints are limits placed on resources, and thus are closely related to resources. As with resources, a statement of constraints is useful for evaluating compatibility with goals, and helps make explicit what is expected. What aspects of the existing environment are eligible for change and which are not? What platform or platforms must be accommodated? Are there customer constraints that must be met? Is secure processing an issue? What existing methods and tools must the new functionality integrate with? Are there organizational or customer reasons to favor some approaches? Are there other initiatives to change the culture to which this effort must interface? Is it necessary to meet some external standard of process maturity, such as the Software Engineering Institute (SEI) Capability Maturity Model (Paulk et al., 1993) or ISO 9000?
Awareness. In the section on awareness, you explain how the CASE team will gather the information that will lead them into the evaluation phase. For each of the tool types identified as of interest in the objectives section, you should compile a list of candidate tools to be evaluated. You may wish to compose two lists, one of candidates for evaluation, and one of tools briefly considered but rejected for important deficiencies. What types of tools will be evaluated? How will the team find out what is available? What tool characteristics need to be collected for each tool of a given type? Cost? Platform? Methods supported? What important attributes, if missing, would disqualify a tool? Will demos always be required? Should some hands-on use be required? How will the existing software process be captured?
Evaluation and Selection. The evaluation section defines the overall evaluation and selection strategy. The selection strategy may go through several stages, successively narrowing the field at each stage. (Stages are explained below.) It broadly defines the evaluation criteria for each stage, and provides guidance on how to evaluate how we are doing, how to decide whether to continue, and how to modify the plan. What will be the evaluation strategy? What characteristics will be evaluated? How many tools will be evaluated? Exactly which tools? How many stages of evaluation
will there be? What will be the evaluation means at each stage? Brochure reading? Extensive prototyping? What is the prototyping strategy? How will the software be acquired? Evaluation copy? Purchase? Loan from elsewhere in the organization? Which specific hosts will be used? Who will support them? How will results be recorded?
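One way to make the staged narrowing concrete is to record, for each candidate tool, the characteristics gathered during awareness and then apply a knockout filter followed by weighted scoring, as sketched below. The criteria, weights, and candidate entries are invented for illustration; the chapter does not prescribe any particular scoring scheme, and your own must-have attributes and weights would come from the objectives and constraints sections of the plan.

    # Illustrative only: criteria, weights, and candidates are assumptions.
    candidates = [
        {"name": "Tool A", "platform_ok": True,  "methods": {"structured_analysis"},
         "scores": {"usability": 4, "doc_generation": 5, "robustness": 3, "cost": 2}},
        {"name": "Tool B", "platform_ok": True,  "methods": {"structured_analysis"},
         "scores": {"usability": 3, "doc_generation": 4, "robustness": 4, "cost": 4}},
        {"name": "Tool C", "platform_ok": False, "methods": {"object_modeling"},
         "scores": {"usability": 5, "doc_generation": 2, "robustness": 3, "cost": 5}},
    ]

    weights = {"usability": 0.3, "doc_generation": 0.3, "robustness": 0.25, "cost": 0.15}

    def first_filter(tool):
        """Knockout stage: disqualify tools missing a must-have attribute."""
        return tool["platform_ok"] and "structured_analysis" in tool["methods"]

    def weighted_score(tool):
        """Later stage: weighted sum of 1-5 ratings from hands-on evaluation."""
        return sum(weights[c] * tool["scores"][c] for c in weights)

    shortlist = [t for t in candidates if first_filter(t)]
    for tool in sorted(shortlist, key=weighted_score, reverse=True):
        print(f"{tool['name']}: {weighted_score(tool):.2f}")

Recording the ratings in a simple structure like this also documents the rationale for the selection, which is useful later when the plan and the tool set are revisited.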
Preparing for the First Victim. This section describes how the CASE team will get ready to support the first victim, who may or may not be known initially. As this first project is identified, more detail can be provided, including dates that various levels of capability are needed. It can describe the functionality of prototypes needed to reduce risk and to help define the detailed requirements for the operational functionalities. What is a likely target project for first introduction? What is the likely timetable? How will methods training be provided? Tools training? Who will pay for the hardware and the software? Who will actually go through all the pain of ordering and seeing that everything is delivered on schedule? What functionality needs to be prototyped to reduce risk? What manner of tailoring, enhancing, and integrating might be required for operational use?
Supporting the First Victim. At some point there will be actual users of the tools and the methods. In addition to training, they will need ongoing
support. Moreover, the tools and their use will evolve during their use, especially on early projects. Who is the first victim? Who on the project is likely to be an early adopter? When can we start working the opportunities? Who will support the platforms and the operating system? Who will do backups and administer accounts? In what mix will the project and the sponsor pay for support? How will decisions on the evolution of the tools environment be made?
Second Victim and Beyond. This section deals with how you will spread the use of your environment to other projects, and your strategy for institutionalizing and evolving your environment. It should describe how you will employ human resources (i.e., experienced users), not just licenses and hardware. Did we get good results from first use? What will we do if the answer is "no"? What things worked and what did not? How large a second or third project are we willing to consider? What organizational problems might have to be overcome? What will be the cost of spreading use to the entire organization? What will be the cost benefits? Over what time frame? What are the other important benefits (based on early experience when you have some)? In what order will new capabilities be introduced? How will we coordinate support of multiple projects? How will we manage evolution of the environment in the face of requests from different projects?
TABLE I
INITIAL ADOPTION PLAN CONTENTS

Section: Example contents

1 Objectives: To reduce the amount of engineering effort to generate formal requirements documentation and testing documentation, and to become more knowledgeable about CASE tool integration or the real value of structured analysis tools. (Note that increased knowledge is a legitimate goal.)

2 Resources: Funding for 1.2 people for eight months; a commitment of nine hours per week of system manager support for the workstations; Tonto is free half-time for three months and full time thereafter, and he already is familiar with structured analysis; availability of two old workstations which will run the latest version of the tools being investigated and adopted; three evaluation licenses of one tool and one license of a second tool.

3 Constraints: Existing workstations must be used; there is a limit of $15,000 that can be spent on software and external training/consultation; at least a third of the money must fund someone from Sam's organization. Project XY47 is starting in seven months and is a highly desirable candidate for first victim. The eventual system must be portable across at least a particular list of host platforms.

4 Awareness: Read brochures, ads, papers, etc.; go to tools fairs and watch demos; talk to users when possible. Compile a list of applicable tools.

5 Evaluation and selection: The workstations must be moved to the west end of the sixth floor and connected to the network. Evaluation copies have to be obtained from the vendors. Arrange one day of training at the vendor office. Rehost Tool B onto the evaluation workstations. Have Tonto spend several hours a day for two weeks becoming proficient with the tool's support for structured analysis and document generation. Try to shadow the generation of an existing document, e.g., try to generate an interface specification of the quality necessary for delivery to the customer. Design evaluation strategy and evaluation criteria. Evaluate and document. Select.

6 Preparation for first victim: After three weeks of shadow generation of an interface specification, how much additional work will be needed to provide a production capability? Will it be possible to generate such a capability at all? What will be the hardware and software cost to provide enough seats for Project XY47? How useful is the user interface for the tools? Are the tools robust enough for production use on a project? Is response adequate? Order hardware; order software; arrange for training. Perform necessary tailoring, enhancing, and integrating. Find a first victim.

7 Support first victim: Train project staff in methods, procedures, tools; try to use CASE team as trainers where possible. Consult and troubleshoot; evolve functionality in response to user needs.

8 Second victim and beyond: Keep CASE team intact; assume partial funding from overhead and part from projects. Use testimonials and consulting from first victim.
3.2 Adoption Plan Examples

The most important characteristic of the plan is its usefulness, not its formality. Table I provides an informal example as a starting point. As the action steps of the plan unfold and evolve, more information becomes known and decisions get made. After a first victim has been identified and analysis of needs performed, the additional information in Table II could be added to the adoption plan.
3.3 CASE Infrastructure

The CASE infrastructure is a set of resources which are required in order to make CASE adoption successful, whether it is operational use of a single tool on a single small project or routine use of a score of tools in a division of 1000 software engineers working on 15 different projects. Resources are of two kinds: first, people with skills and knowledge; and second, artifacts such as tools and documents. We divide the resources into the following categories.
1. People and the CASE team
2. Tools
3. Host platforms

TABLE II
ADDITIONAL DETAIL FOR ADOPTION PLAN

6 Preparation for first victim: Order 12 Ajax Model 77 workstations, with 24 megabytes (MB) random access memory (RAM), and a Model 88 server with 64 MB RAM and 4000 MB disk storage (workstations will also be used for code and test). Order licenses: ten simultaneous structured analysis users and five simultaneous desktop publishers. Need to generate customer-deliverable requirements specification in ten months. Approximately two-person level of support for tailoring and extending; also arrange system manager support.

7 Support first victim: Leslie will train project staff in methods and procedures; Gary will do tool training. Still looking for a better way to support the workstations. Have identified a couple of project people who are interested in the methods and tools.

8 Second victim and beyond: Will target the BigWow program, starting eight months after first victim, if early results are promising. Keep CASE team intact; assume partial funding from overhead and part from projects. Lori, from first project, able to spend 25% of her time consulting to new project for six months.
4. Subprocess and methods
5. Standards
6. Framework technologies
7. Documentation and training

We now describe the status of each of these sets of resources as it might exist at the end of the planning phase, and before the awareness phase.
3.3.1 People and the CASE Team

In the literature of technical change, three types of necessary stakeholders are frequently identified: champions of change, agents of change, and sponsors of change. All three types are needed to make change successful. Champions and agents will be members of the CASE team, the team responsible for making adoption happen. The sponsor is a manager at high enough a level to provide two very important assets. The obvious asset is money and other useful resources. But perhaps even more important as an asset is legitimacy. If a division manager shows a project manager that adopting new technology is as important to senior management as meeting schedule, the project manager eventually will accept adoption as a legitimate problem in the same manner as schedule.

The champion and agents are discussed in the next section about the importance of the CASE team. Certainly, CASE skills and knowledge are major assets of the CASE team. However, their communication skills are also very important, especially for champions, who must constantly be selling the new technology to managers and technical staff alike. For an organization with very little CASE experience, CASE knowledge and skills are likely to be meager. At the end of planning, there should be at least a champion and a sponsor. Otherwise, planning is not likely to be productive. It is to be hoped that change agents such as candidate toolsmiths and operating system experts have been identified.
3.3.2 Tangible Resources

By tangible resources, we mean categories 2-7 from above.
Tools

Artifacts. During planning there could well be no tool assets, or perhaps some out-of-date evaluation copies lying about.
Knowledge. Some of the team members may have experience with tools, not necessarily with tools of the type to be adopted.
Host Platforms

Artifacts. The platforms and networks to be used for evaluation may or may not be existing assets. During planning it may not yet be known which platforms and networks will be used.
Knowledge. The team may or may not know how to support the host platforms.
Subprocess and Methods

Artifacts. If the methods to be used are known, then the team probably has access to documentation on the methods, probably from the general literature.

Knowledge. The team may or may not have expertise with the relevant methods. Some team members may have received training in methods.
Standards

Artifacts. The team may have some documents on reference models, message standards, schema, data interchange standards, or presentation standards.

Knowledge. The team probably knows little about interface standards, especially if this is to be a first CASE adoption experience for the team.
Framework and Integration Technologies

Artifacts. The team may have some documents on framework technologies: e.g., data repository frameworks such as PCTE and ATIS; data exchange formats, such as CDIF; or control frameworks, such as Softbench. They are unlikely to have any framework services software. See Brown et al. (1994) for discussions of these technologies.

Knowledge. The team probably knows little about this topic, especially if this is to be a first CASE adoption experience for the team.
Documentation and Training

Artifacts. The team will have generated very little documentation except for the adoption plan itself.

Knowledge. The team may or may not be able to provide training for some of the methods to be adopted.
3.4 The Importance of the CASE Team

An important part of the CASE infrastructure is the CASE team, the group of individuals entrusted with making the adoption of CASE successful. This group becomes CASE experts, evaluates CASE tools, helps select the tools to be introduced, and tailors and enhances the tools, after first deciding how they are to be used. They work with the users to understand user needs and user concerns. They provide methods training and tools training, and explain the concept of operations that led to the specific capabilities being provided. They also provide support to users of the tools.

In addition to the operational responsibilities, the team has strategic responsibilities it must perform. It must develop and maintain both the project vision and the plan. Because of the immaturity of the technology, it can be anticipated that there will be considerable evolution of the plan. Other important duties include communicating objectives and goals throughout the organization on a continuing basis. Most importantly, the team must keep the commitment of its sponsors. As has been mentioned, this support is needed both for access to resources and for legitimacy. Legitimacy can be extremely important in working with the first victim.

It is critical that the group function as a coherent team for two reasons. First, each individual will make countless decisions that are interrelated with the decisions made by the other team members. It is therefore crucial that the team meet regularly to maintain a common view of their objectives and to be jointly aware of how reality is impinging on their plans. The second reason is the need for continuity of the CASE team. There can be no institutionalization of CASE use if there is not a knowledgeable team to support it.
3.4.1 Traits and Duties of the Team Leader
The role of the team leader is particularly important. The leader is instrumental in developing the project vision and keeping it sold to the team, the eventual users, and the sponsors, alike. It is desirable that the vision remain relatively constant over time. Conceptual integrity of the vision is an important ingredient in winning staff support. The leader also has the primary responsibility for building the team. This means recruiting the right people for the team and then making sure that they have adequate resources and adequate opportunities to acquire the knowledge that they need. This implies that the leader have access to management and the sponsor. The leader need not be the chief technical expert in any of the relevant areas, but must have an adequate grasp of methods and tools and needs so as to be able to help the team mediate solutions to problems. The leader is constantly acquiring and disseminating information of all kinds enabling the other team
members to better focus on their activities. The leader uses their grasp to act as a knowledgeable spokesperson outside the team, helping to keep the project sold.

Good communication skills are extremely important for the team leader, as is obvious from the description of the duties. There could be more than one champion, but the project leader is probably the main, if not the only, champion. Flexibility is another important trait, because the adoption plan will evolve as the project unfolds. At each evolutionary stage the plan evolves based on the current best understanding of user needs, organizational needs, and tool capabilities. All of these will always be imperfectly understood and changing besides, particularly so for the new technology.

The author believes that continuity in the team leadership is an important condition for success of the project. Organizations resist change, and inconsistency or perceived inconsistency in the objectives and the actions of the team will support the natural tendency of the organization to reject the new technology. This tendency will be intensified even more if the leader has left to work on something that is "more important."
3.4.2 Traits Needed by CASE Team Members

The team collectively requires knowledge in many areas. Perhaps most obviously, the team must be expert in the use of the tools. But to be effective with the tools, it must also be expert in the methods that the tools support. Also important is knowledge of the application area(s) in which the tools are to be used. This expertise is necessary both to define how the tools can be used and then to support their use with training and consulting. New users will grasp understanding more quickly and feel more comfortable when the CASE team is able to discuss concepts and tool use with reference to specifics of the application.

Because the tools need to be tailored and enhanced, the team requires implementation knowledge. This includes some level of internals expertise with the tools, expert knowledge of the operating systems involved (workstations and servers), programming knowledge, and knowledge of the networks involved. By internals knowledge in this discussion we mean the ability to manipulate the myriad customization and extension features of these highly complex tools. This usually requires knowledge of internal formats or database access procedures.

Finally, the team needs to be familiar with the users' organization and the processes the organization is used to following. It is likely that with some care the new computer-aided methods can support some of the existing activities, as well as modifying, adding, or deleting others. Understanding existing processes, and their underlying rationale, can be valuable when designing and subsequently supporting new ones.

For most organizations that are looking to introduce CASE, it is improbable to be able to form a team that has all of this knowledge a priori. At inception,
the team must certainly have some of the knowledge, but it must possess the ability and the will to acquire the rest. So a general interest in new technology, or a specific interest in CASE technology, is a trait required of the team. Because it is not possible for each member of the team to be expert in each area, it is important that the team meet regularly to keep one another informed about significant developments in the various areas, as well as to coordinate the activities of the team.
3.4.3 Training: A Crucial Responsibility
Providing training is extremely important and will be discussed again in detail in the section about supporting the first victim. It is a critically important duty of the CASE team. In some cases the team may conduct the training itself; in others it may arrange for an external agent to provide it. It is common for organizations, particularly technical organizations, to skimp on training. Inadequate training is very destructive to major change. The types of training necessary include:
- Methods training
- Tools training
- Illustrative examples
- Project standards training
- Training of trainers
- In-process support, a form of training
Even more training and support are required for institutionalization than for the first victim (assuming the first victim was chosen wisely), because the average project will contain more late adopters and laggards among its staff than the first victim did. On the other hand, first operational use has created a group of experienced users who can be drawn on for later projects.
4. CASE Adoption Case Study
This CASE adoption case study is of a project at Hughes Aircraft. A year before first operational use, we knew very little about CASE. Six months later we had put together the beginnings of a team and had some experimental use. During the 6 months leading up to first use, we did extensive prototyping, first to make a selection, and then to get ready for the first victim. Use of the CASE tools on the project spanned about 28 months. Document deliveries on the project, approximately every 6 months, were important events for assessing the value of the CASE tools. With each delivery, we were able to achieve consistently better results.
The application was embedded avionics for the Department of Defense, but for the most part this had only minor significance to the adoption experience. The nature of the customer did require that more attention be paid to the detailed formatting requirements of the CASE-tool-generated documents than would have been necessary for a commercial development.
4.1 Awareness
Interest in CASE tools began to develop along several fronts in 1988. Some of the technical staff began to talk to vendors, to invite vendors in for demos, and to acquire evaluation copies for experimentation. At the same time, a member of senior management became interested in the possibilities of CASE tools. He organized some additional meetings with vendors and provided some funding to begin more extensive investigations. Technical staff were also attending trade shows and reading the literature.
In October and November of that year, he sent two technical staff, an applications software engineer and a support software person (the author), to spend 2-3 days at the home offices of two of the leading vendors of workstation-based tools supporting structured requirements analysis. The purpose of the visits was to learn something about the two companies and their products. It was also an opportunity to get some intensive hands-on experience with the tools, with expert handholding by the vendors.
The author was to become the CASE champion and the CASE team leader. He was already knowledgeable in the method of structured analysis and served as an instructor in the Hughes advanced technical education program (ATEP). The class covers the Hatley-Pirbhai real-time structured analysis method and involves 20 class hours plus homework. Fortuitously, although a first victim had not yet been chosen, one of the students in a class in early 1989 became the lead requirements engineer for the first project.
Because structured analysis was the predominant method implemented by CASE tools at that time, and because management was interested in creating better requirements, it was decided to concentrate on structured analysis and document generation for first operational use. Many of the vendors provide a template-driven document generation capability. Typically, the vendor supplies a set of templates; you, the user, can use the templates as is, modify the vendor's templates, or generate your own. Because a structured analysis model maps naturally into the format of a 2167A (Department of Defense, 1988) Software Requirements Specification (SRS), a favorite Department of Defense format, most vendors provide a template to generate an SRS.
We looked at sample SRS documents, which were automatically generated by the vendors from sample structured analysis models, and found that the results were generally deficient in both format and technical content. However, the
vendors claimed that the autogeneration process could be tailored to meet our needs. This then became the basis for our serious evaluation.
4.2 Evaluation and Selection
We quickly narrowed the evaluation to two workstation-based products. If you are beyond the awareness phase, you probably know what they are. At that time, personal computers (PCs) and Macintoshes (Macs) did not seem like industrial-strength desktops, so all tools hosted only on those platforms were eliminated from consideration. Other tools were eliminated for a spectrum of reasons: they did not support the desired requirements method, or they did not have a viable document generation facility, or their use of windows was clumsy (lack of multiple windows or scrolling), or their database was too primitive. Also, some of our customers required the use of Digital's VAX/VMS as the development platform, so we had a preference for tools that ran on that platform.
So, in effect, we had a two-stage evaluation procedure. The first stage was a quick culling out of all but two tools by the evaluation criteria just discussed. The second stage was conducted by first developing a scenario for operational use, and then prototyping the use of each tool with respect to that scenario.
4.2.1 Minireview of Selected Structured Analysis Concepts
We provide a quick review of structured analysis because some appreciation of the method helps make the document generation technique clearer. For more detail, the standard text on structured analysis is Demarco (1978). Extensions to the method for real-time applications (control specifications and control flows) are added in Hatley and Pirbhai (1987).
The system (or subsystem) being defined is represented by a tree of data flow diagrams (dfds) and associated process specifications (pspecs). A dfd contains a set of bubbles (circles) representing processes, interconnected by arrows representing data flows. Each bubble on a dfd has a child which is either another dfd or a pspec. Eventually all branches terminate in a pspec. The tree, thus defined, has a dfd for each interior node and a pspec for each leaf. Detailed definitions for each data flow appear in a data dictionary.
The root node of the tree is called the context diagram and contains exactly one bubble, which represents the complete functionality of the system. Arrayed around the bubble are agents external to the system, represented by rectangles. Arrows representing data flows and control flows are drawn between the system bubble and the external agents. Thus, these flows correspond to system interfaces.
A pspec describes the functionality provided by its parent bubble. It describes how inputs to the process are consumed in order to produce the outputs. Although
there are many ways to define a pspec (e.g., via diagrams, truth tables, or text), tools mostly expect textual pspecs.
Both of our finalist tools provide the capability to attach multiple notations to each object in the model (dfds, pspecs, data flows). These notations figure heavily in the tailoring and enhancement of the document generation activity. The details of how notations behave differ between the two tools, but the essential functionality is similar.
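To make these structures concrete for readers who think in code, the following minimal sketch models them in Python. The class and field names are our own invention for illustration; they do not correspond to the internal representation of either finalist tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

@dataclass
class Pspec:                      # leaf of the tree: textual process specification
    name: str
    text: str
    notations: Dict[str, str] = field(default_factory=dict)

@dataclass
class DataFlow:                   # arrow on a dfd; its detailed definition lives in the data dictionary
    name: str
    notations: Dict[str, str] = field(default_factory=dict)

@dataclass
class Dfd:                        # interior node: data flow diagram
    name: str
    flows: List[DataFlow] = field(default_factory=list)
    children: List[Union["Dfd", Pspec]] = field(default_factory=list)  # one child per bubble
    notations: Dict[str, str] = field(default_factory=dict)

# The context diagram is simply the root Dfd; a data dictionary maps flow names to definitions.
model = Dfd(name="Context",
            flows=[DataFlow("radar_return")],
            children=[Pspec("Track targets", "Consume radar_return; produce track_file.")])
data_dictionary = {"radar_return": "Digitized radar video, 16-bit samples."}
```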
4.2.2 Evaluation Criteria and Activities
The goal of our evaluation activity was to answer a number of questions:
- Can we represent the technical content of one of our typical software systems as a structured analysis model, captured by the tools?
- How useful are the tools for entering and maintaining the model in the tool database?
- Is such a model useful for engineering communication?
- Can deliverable documents be automatically generated from the model?
This evaluation activity consisted of choosing a requirements specification (SRS) from a recent project, then capturing its technical content in a structured analysis model entered into the database of both tools. The tailorability features of the document generation function of both tools were then exercised to try to overcome the deficiencies mentioned above.
The document-create function uses a template, provided either by the vendor or by the user, to generate a document with a given structure. The template defines the sections of the target document and, for each section, defines the contents via a combination of text and directives. Text is copied directly from the template to the target file. A simple directive might pull a single object (e.g., dfd, pspec, notation) from the database and insert it into the target file. A complicated directive might walk all or part of the tree of dfds and pspecs and generate a corresponding tree of subsections in the target document. This is a powerful feature.
The target file is produced in a format specific to the publishing tool for the target document. The normal format choices on workstations include Interleaf and FrameMaker. Therefore, the document-create function and the directives must be able to generate their outputs in the necessary formats.
The main content deficiency of the predefined templates is a lack of descriptive text, except for the contents of the pspecs. For instance, there is no narrative for each dfd explaining its overall functionality, and the input and output flows for the dfd are merely listed with no discussion of their attributes or purpose. Nor is there any obvious place, other than the template itself, to put text which gives an overview of the software system, its performance characteristics, references to other documents, etc.
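As an illustration of how a tree-walking directive can produce a tree of subsections, the sketch below (again our own Python, not either vendor's actual directive language) recursively visits the model classes sketched earlier and writes numbered sections; a real directive would emit Interleaf or FrameMaker markup rather than plain text.

```python
import sys

def emit_subsections(node, number, out):
    """Recursively turn a dfd/pspec tree into numbered document subsections.

    `node` is a Dfd or Pspec from the earlier sketch; `out` is any file-like
    object.  Section numbering and formatting are simplified for illustration.
    """
    out.write(f"{number}  {node.name}\n")
    if hasattr(node, "children"):                      # a Dfd: list its flows, then recurse
        for flow in node.flows:
            out.write(f"    input/output flow: {flow.name}\n")
        for i, child in enumerate(node.children, start=1):
            emit_subsections(child, f"{number}.{i}", out)
    else:                                              # a Pspec: copy its text into the document
        out.write(node.text + "\n")

emit_subsections(model, "3.1", sys.stdout)   # e.g., fill SRS section 3.1 from the model
```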
The notation capability addresses this lack of descriptive text. Several paragraphs giving a description of the overall purpose of a dfd can be placed in a notation attached to that dfd. The document generation utility can then be trained to look for such notations and to include the text in the appropriate place in the target document, for example, immediately preceding the diagram. Note that not only are we tailoring the tool, we are tailoring and enhancing the structured analysis method as well, by requiring that a notation be attached to each dfd.
To data flows we attached notations which provide unit and accuracy requirements on the flow, plus a notation giving a brief description of the flow. For external flows we also attached a notation giving a reference to the interface specification where the detailed definition for the interface can be found. We also attached to the context diagram a handful of notations which were not specific to any particular object in the structured analysis model but needed to find their way into the target document. Included in this category are the document introduction, overall system performance requirements, and a description of the component verification strategy.
A more challenging form of enhancement was the definition of our own directives. This required that our toolsmith master the tool's application programmer's interface (API) and also the input format for our publishing tool. Documentation for both interfaces was inadequate, so a great deal of trial and error was required. However, once mastered, this gave us a great deal of power that we continued to build on for several years.
The selection event itself was anticlimactic. We felt that the two tools were very similar in overall quality and that both were suitable for our needs. One tool was stronger in some respects, but the second tool was stronger in others. If our needs had been different, we might have found reason to prefer one of the tools more clearly, since not all methods are supported equally well by both tools. Then, about that time, it became clear that a major customer of ours was adamant in its preference, and since we had no strong preference ourselves, we took the path of least resistance.
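To tie the notation mechanism back to the generation sketch above, a directive can be taught to pull a dfd's narrative notation into the document immediately before the diagram. The notation names used here ("description", "units", "accuracy") are invented for this illustration and build on the hypothetical classes shown earlier.

```python
def emit_dfd_section(dfd, number, out):
    """Emit one dfd subsection: narrative notation first, then the diagram reference."""
    out.write(f"{number}  {dfd.name}\n")
    out.write(dfd.notations.get("description", "TBD - narrative notation missing") + "\n")
    out.write(f"[diagram: {dfd.name}]\n")
    for flow in dfd.flows:
        units = flow.notations.get("units", "n/a")
        accuracy = flow.notations.get("accuracy", "n/a")
        out.write(f"  {flow.name}: {flow.notations.get('description', '')} "
                  f"(units: {units}, accuracy: {accuracy})\n")
```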
4.3 First Victim
During the three months just before first use, the CASE team performed the necessary tailoring and enhancement of the CASE tools. This work was based on the experiments and prototypes developed in the months before. The CASE team received good support from the vendors at this time, but still had to combine that support with trial and error in order to work out how to do all the enhancements. These types of interfaces tend to be better documented in 1995 than they were at the time.
Methods training was given by the CASE champion and the lead requirements engineer. The first class contained 12 hours of lecture over 3 days. Various
versions of the training were repeated several times. Engineers did not seem to retain very much of the material unless they were soon to apply it. We experienced the same phenomenon with training on subsequent projects as well. When methods training was repeated months into the project, the trainer could use examples from the project, which also seemed to help.
Tool training was conducted partially by the vendor and partially by the CASE team. Later on, the CASE team handled the tool training without vendor assistance. This allowed the training to focus on the tool functionality required on the project and allowed training for the enhancements to be folded in smoothly.
4.3.1 First Project Usage: Months 1-6
The first six to eight weeks of this period were a time of high chaos. The workstations and operating system were new to the organization, as were the CASE tools. At first, even the local vendor office had difficulty configuring the operating system to be reliable. Once we were beyond this problem, we ran into situations where the print queue would hang and prevent all printing, or where individual user accounts could not run the tools. Both were frequently recurring problems in the first few months.
During the first 6 months, the toolsmith divided his time between working on improved capability and troubleshooting various problems with the two CASE tools. During this same period, the system manager spent about 75% of his time trying to keep the system and the tools healthy enough to use. This was a very difficult time for the project staff who were trying to use the tools, and for the CASE team as well. Due to great diligence and extra work by both groups, we were finally able to "automatically" generate a 90-page document for on-time delivery to the customer. This very first document required 50 hours of touch-up by the technical staff in order to be suitable for delivery. The majority of the manual effort was to make diagrams more readable and to handle security markings. Three years later, we were able to routinely generate documents of several hundred pages that required only two hours of touch-up by publications staff, more than a 50-fold improvement when normalized for document size.
The point here is not that specific tools caused specific problems, but rather that any large complex tool is likely to bring on similar problems. Discussions with many users of many different CASE tools over the years indicate that each organization has struggled against problems of similar difficulty, sometimes the same problems, sometimes different ones.
In the first month or two of operational use, the publishing tool and the operating system would conspire to hang the print queues for hours or even days at a time. It was difficult to establish the correct set of quotas and privileges for
users so that they could execute both the publishing and structured analysis tools. Although the frequency of user problems slowly decreased, two years after initial project use system administrators were still solving user problems by adjusting quotas and privileges. Some of the problems involving quotas and permissions were so subtle and so difficult to correct that the technical experts for both the publishing tool and the structured analysis tool suggested, at different times, deleting all protections from the file system in order to solve the problems. Clearly this was unacceptable for a system shared by multiple projects.
Another difficulty was that the database server for one of the tools would disappear for a variety of reasons and would have to be restarted. Because this server required special system privileges to execute, it could only be restarted by the system administrator, who was not always readily available. Eventually, to facilitate restarts, the CASE toolsmith was also granted system administrator's privileges.
4.3.2 What Worked: Communication and Cooperation
Stated very succinctly, the most important ingredients were communication and cooperation. There was a weekly scheduled meeting attended by all affected parties, including the lead user (the lead requirements analyst), the toolsmiths, the system administrator, and the CASE champion. Additionally, there were numerous ad hoc meetings each week, focused on specific needs or problems. The operating philosophy of the group was:
- We all understand the vision
- We all are aware of the current plan
- We will review the current difficulties, current status, and the current plan
- We will attack the difficulties and/or modify the plan as appropriate in order to continue progress toward the vision
The role of the sponsor during this difficult period was very important. The sponsor provided both resources and legitimacy, as he was supposed to. In this instance, he provided appropriate funding for the toolsmiths, the system administrator, and the CASE champion so that these were not project expenses (acknowledging the responsibility to help the first victim be successful). Additionally, he regularly expressed to the project manager his desire to see CASE use succeed, even as he was kept informed of the considerable difficulties that were being experienced.
When it came to serious deficiencies of the tools, both vendors were responsive. Both took serious problems seriously, even though solutions were not always
immediate. For this type of problem, they gave us access to their best technical people to help find a solution or, failing that, to help find an acceptable workaround. In one instance, a vendor slipped a product release date because resources were shifted to solve a crucial problem.
4.3.3 Subsequent Deliveries: Months 11-28
Document deliveries were about every six months on the project. After each delivery, the CASE team and lead engineers from the project did a postmortem. Problems were listed and, for each, the group estimated the amount of manual effort that would be saved if the deficiency were removed. This helped to create a priority ordering of tasks to be performed by the CASE team before the next major milestone. In this way the amount of manual effort required to touch up an automatically generated document dropped dramatically over the life of the project. Also, the functionality provided by the tools in support of the project was improved with each milestone.
Entering data into the structured analysis model and modifying the model could be cumbersome in some instances. Over time, the CASE team provided a set of utilities to automate or assist with some of these cumbersome functions.
About halfway through this first project, the CASE team began to get part-time support from a second toolsmith, who became expert in the tailoring of the publishing tool. This individual possessed detailed knowledge of other publishing tools, which helped him quickly become valuable to the CASE team. The first toolsmith was consumed by his responsibilities with the structured analysis tool and had little time to become as expert with the publishing tool. Working together, they were able to make much more sophisticated use of the capabilities of the publishing tool in support of automatic document generation.
In addition to the requirements model, a design model was captured in the tool database. A design document was then automatically generated from the design model. This document was first delivered about 12 months into the project.
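The postmortem prioritization described at the start of this subsection is, mechanically, just a ranking by estimated payoff. A minimal sketch, with invented deficiency names and hour estimates, might look like this:

```python
# Hypothetical postmortem data: deficiency -> estimated manual touch-up hours
# saved per delivery if the deficiency were fixed (numbers are illustrative only).
savings = {
    "diagram titles truncated":        12,
    "security markings added by hand":  8,
    "flow descriptions missing":        20,
    "table of contents renumbered":      3,
}

# Rank the CASE team's work queue by payoff before the next milestone.
work_queue = sorted(savings, key=savings.get, reverse=True)
for task in work_queue:
    print(f"{task}: ~{savings[task]} hours saved per delivery")
```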
4.3.4 Second and Third Projects: Months 30 and Beyond
About 24 months after first operational use, the tools were introduced onto a second project, and shortly after that onto a third project. The third project was much larger, with approximately 120 software engineers. It was with the third project that manual touch-up per document was reduced to two hours. There were over 20 distinct components on this project, each with its own requirements model. Counting multiple versions of requirements documents and design documents, well over 100 deliverable documents were generated on the project.
All of the first three usages of the tailored and enhanced tool set were in the Radar Systems business unit of Hughes. After that, usage spread to other Hughes business units, and also to at least one other company that was teamed with Hughes on a contract. The CASE team from the first operational use was also very active in the spread to the next two projects. Eventually, the toolsmith became interested in other assignments and the position was taken over by someone else. Application engineers from the first project also became very important in carrying their expertise with use of the tools to the new projects. With 120 software engineers on the third project, the toolsmith was kept busy full-time making improvements to the tools.
5. Awareness: The First Phase
Awareness here means the activity of evolving from a state of relative ignorance to a state of satisfactory knowledge. It is an activity that all of us must go through on our way to becoming knowledgeable or expert on any topic. The level of knowledge in this discussion must be such as to prepare you for the next phase, evaluation and selection. Depending on your ultimate objectives, you have to determine what the scope of your awareness must be. Your goal might be a complete and integrated environment for 6000 engineers, or a structured analysis tool for a project of 5 engineers. Clearly the time frame and the effort are greatly different for the two cases.
There are two main areas where you must go through this activity: you must understand both your organization and the available technology. The technology includes both methods technology and tools technology. We concentrate on these areas, in turn, in the following subsections. Not only must you understand these in the present, but you must also develop a strong feel for how they are evolving. Methods and tools technology is changing very rapidly, and many organizations are changing rapidly too. In order to be successful, you will need to balance needs, resources, and available technology as they exist today and as they will evolve in the future. It is a clear consequence that flexibility must be an important consideration in how you proceed.
5.1 Awareness: Understand the Organization
5.1.1 Organization Topics
There are many things that the team must understand about the organization and how it conducts business. At the highest level, how does the organization develop software? What is the process? In most cases a reliable process document
will not exist. However, even where the process seems to be random and ad hoc, there are probably some regular aspects from one project to the next, e.g., document or coding standards. If your current adoption effort is of limited scope, e.g., requirements analysis only, then the team can concentrate on how the organization treats this one subprocess.
The Process Dimension. To understand the process, the team can interview staff on a cross-section of projects, looking for answers to questions like: What kinds of products (software artifacts) are generated for a project? What is the content of these? What is the format? How much consistency and how much variation is there across projects? The team will doubtless have some expertise in these matters, which will help in understanding the information gleaned from interviews. Team members are also candidates as interviewees.
What sort of internal (project staff only) reviews do projects hold? For each type of applicable review, who is expected to attend? What is the job function of the attendees? When are the reviews held? Are there entry criteria for the review to take place? What is the purpose of the review? Is a record made of questions and unresolved issues? How is follow-up on unresolved issues handled? Are external reviews held? If so, ask the same questions as for internal reviews. If there are contract implications related to reviews, what are those implications?
What methods are used across the projects? Are staff using structured analysis or structured design? If so, whose flavor (e.g., Hatley-Pirbhai or Ward-Mellor)? Are object-oriented methods in use? For analysis? Or design? Or coding? What other methods are in use? Are some of the projects performing inspections? On what artifacts? Are any test methods in use? Are there special methods in use, such as for signal processing or economic simulation?
How are software requirements generated? Are they generated by an external customer or by the software staff? If the software is embedded in a system, are the software requirements generated by system engineering staff? If the requirements are generated outside the software organization, are they adequate or are they usually regenerated by the software analysts?
What external influences bear on the form and content of software artifacts? When contracting with the Department of Defense, the standard 2167A is frequently applied, along with some of its companion standards for various of the documents required. Other government agencies have their own collections of standards which they want to see followed. Any internal or external customer can require that a particular set of standards be followed.
The Applications Dimension. Another dimension of the organization is the application area or software domain. Are the applications real-time or
interactive or batch? Are they standalone or embedded? Do they relate to avionics or entertainment or bookkeeping or medical care? Are they mission critical? How serious are the consequences if the software fails?
Existing Platforms Dimension. The existing development environment is another important dimension. What is the range of computing platforms used?
This needs to include mainframes, minicomputers, and desktop computers. The latter are frequently divided into workstations and PCs (meaning both IBM compatibles and Macs). The author interviewed one project of 200 engineers that had 20 different types of desktop computers. Although 20 might be unusual for one project, most organizations employ a heterogeneous collection of processors, which may or may not be interconnected. Moreover, the heterogeneous nature is not likely to go away, for a variety of reasons. One reason is that some platforms are preeminent with respect to a particular aspect of computing, e.g., graphics, or high performance, or communications, or reliability. A second is that if you are servicing a diverse customer base, your customers are unlikely all to be using the same platforms. Third, in many organizations, new projects are worked by coalitions of companies or units within a company, each of which might bring its own platforms to the project.
The networks represent a second aspect of the physical infrastructure in the organization. How are all the relevant copies of all the processors interconnected? In most large companies today, we see standalone computers that have been connected to local area networks, and local area networks that have been interconnected into wide area networks. But the growth has been mostly from the bottom up, with network managers struggling to put some sort of top-down order on this rapid growth. Your ability to interact with another person or resource on the network can be heavily dependent on the details of the network path to that person or resource. You may be able to send e-mail with enclosures to a person 50 miles away but be unable to include enclosures with e-mail to the person in the next office. From your PC or workstation you may be able to connect as an X client to some servers and only as a dumb terminal to others.
The People Dimension. Another dimension is that of the people in the organization. What is the mix of experience and expertise with respect to the application domains? Do the relevant staff have any experience in the methods addressed by the adoption? A few knowledgeable people can help both in selling the adoption to the staff and in easing the transition. Are the staff experienced on the platforms that will host the new tools? It is not uncommon for new tools and new platforms to be introduced at the same time. The team then must also address the need for operating system expertise on the new platforms, i.e., system management, system administration, problem solving, and consulting.
Existing Tools Dimension. There is an important dimension represented by the collection of existing tools that an organization uses. This includes not only the standard coding tools (editors, compilers, linkers, and debuggers), but also configuration management tools, project management tools, and documentation tools. In addition, there might well be some existing use of modern CASE tools in the organization. If so, this can be an excellent source of lessons learned. In most software organizations, there is also a large collection of utilities, developed by the staff over the years, which are important to understand because they almost always address real problems as perceived by project members.
Unique Foibles Dimension. Does the organization have any unique strengths or weaknesses? Perhaps no other company has combined payroll processing and thermal imaging in the same product. The company might build the best cash register with the worst software, or might build a mediocre cash register but compensate with unsurpassed software capabilities. There may be unique problems or constraints, such as having to generate reports in 11 languages spanning three different alphabets.
5.2 Awareness: Tools to Understand the Organization
There are many instruments or tools available to help you assess the current state of your organization. A few will be mentioned here, which you can either employ directly or use to point you to other techniques. There is not wide agreement as to the level to which you should document your current process, and it doubtless depends on many factors, including the magnitude of the change that you are attempting to make. There is, however, nearly universal agreement that the process of understanding your process is valuable, with participants reporting that redundant or superfluous activities are frequently found. You would not want your first adoption of CASE to be automating a superfluous subprocess.
Although some experts advocate a very extensive process documentation activity, others argue that too much detail is counterproductive. The latter argument runs that documenting to an excessive level of detail both takes a very long time and burns out the team. In fact, most processes, as practiced, exhibit so many exceptions that if all were documented, the documentation activity could continue indefinitely. One approach to guard against excessive detail is to place a limit on the schedule and manpower for this activity.
5.2.1 SEI Capability Maturity Model (CMM)
The Software Engineering Institute (SEI) is a Federally Funded Research and Development Center (FFRDC) associated with Carnegie Mellon University in Pittsburgh. It has been at the forefront for a number of years in stressing the
importance of process maturity in the development of complex or mission-critical software-intensive systems. The staff at the SEI have developed a capability maturity model (CMM) (Paulk et al., 1993), which provides a basis for evaluating the ability of an organization to produce software. The CMM identifies a set of 18 key process areas (KPAs), such as requirements management and software configuration management. For each KPA, it defines a set of key practices and further identifies capabilities that an organization might or might not have relevant to the KPA. Although the primary use of the CMM has been to determine the maturity level of an organization on a scale from 1 to 5, an important alternative use is to employ the inventory of capabilities to assess the process of an organization without worrying about a final grade. The CASE team can borrow as much or as little as makes sense to them from the lists of capabilities in order to become more aware of the process of the organization. The literature on process maturity, as espoused by the SEI, and on process improvement experience reported by software organizations, is extensive.
5.2.2 Questionnaire from Textbook
Another source of ideas can be found in Pressman (1988), written as a guide for instituting software engineering technology. This book takes a slightly broader perspective in that he wishes to give guidance on how to introduce methods, whether or not tools are introduced at the same time. Whereas the focus of this chapter is specifically to introduce tools into an organization, the ultimate value of tools is to help automate methods and subprocesses. It cannot be emphasized enough that if the underlying methods and subprocesses are not already familiar to the organization, they must be taught before tool training is provided. You will hear this message from any tool vendor (of course, they want to sell you training), but you will also hear it from any project in any company where tools have been successfully introduced. With care, tools training and methods training can be combined, but there are dangers, which will be discussed later when training is explored in more detail.
Pressman's Chapter Four is on assessment. In it he suggests 18 areas of investigation when conducting an audit and follows this up with a detailed audit questionnaire in his Appendix B. The appendix also contains a discussion of the purpose of each question and a set of inferences based on typical answers to the questions. You should look at the questions of any assessment method as suggestions to consider for your purposes.
5.3 Awareness of CASE Technology
If you are truly ignorant about CASE, then you probably want to acquire some general CASE background and concentrate on tools for one or two software
subprocesses, a sort of breadth-plus-selected-depth approach. The field of CASE technology is very complex and fast changing, so it is best to start small and grow. The goal of the awareness phase is to survey the field, to become aware of what the important considerations are, and to compile a list of tools relevant to your interests for more detailed examination. After you increase the number of candidates from zero to a positive number during awareness, you will narrow the broad field to a smaller number during the evaluation phase. If your ambition encompasses many tools and integration among tools, then you are strongly advised to first become expert with a few tools and a few point-to-point integrations. If your goal is an integrated environment with a broad range of tools, realize that it will take several years to make significant progress toward that goal.
Reading the Literature. If you can remember back to when you were in high school and the feeling of bewilderment you experienced when some older person inquired as to your choice of a career, you can begin to appreciate the array of confusing claims and choices presented to the first-time CASE shopper. Reading is one way both to begin developing a general background in CASE technology and to become familiar with tools addressing a specific subprocess.
An effective place to start to find your bearings is by browsing through the trade publications and the more practical technical journals. Look in your own collection of periodicals, look through recent issues at a library, and browse at the bookstore. There will be articles on CASE in many of them, plus numerous ads. Do not necessarily believe the ads, but use them to get a grasp of the variety of offerings in your areas of interest and to get a feel for what the vendors think are important characteristics of their tools. Collect ads and articles in a folder. In the library you can look back over many months and years to see which tools have been around a long time and which are new. Regularly, you can find articles which provide a list of tools addressing different aspects of a software project. A partial list of applicable publications is:
Datamation
Info World (weekly)
Software
IEEE Software
ACM Software Engineering Notes
PC World
Open Systems Today
American Programmer
Many of the ads invite you to send or call for a brochure or more information. In some publications a reader request card is provided so that you can make
numerous requests with one postcard. At first you can be indiscriminate as you struggle to become familiar with the offerings. Later, as you begin to zero in on your interests, you can become more selective. Sometimes the vendor, in addition to sending the requested brochure, will follow up with a phone call, which can give you a further opportunity to gather information. Because you (the CASE team) are going to be involved with CASE a long time and the field is highly dynamic, you should plan to skim regularly through a variety of publications as long as your involvement lasts. Some of the publications are free to qualified subscribers, allowing you to receive a continuous infusion of new ideas and trends at your desk at no cost. In the library you can also find articles in conference proceedings and books which discuss CASE issues. Because of the long lead time for publication, these are more likely to help you with your general CASE education than with the most current developments.
5.4 Learning from the Vendors

5.4.1 Vendor Demonstrations
It is very common for vendors to come into your company to do a demonstration of their product and to talk to you about both it and their company. Try to have some concept of how you would use the tool in your organization. This will greatly assist you in assessing what you see and will also suggest questions to you. If you are totally passive during the demo and let the vendor show you their tool only in its most flattering light, you miss a significant learning opportunity. Obviously, if the demo is the very first demo you have ever attended, you are at a disadvantage. But you will see many other demos, including demos by this vendor again.
Often, a vendor will conduct a half-day or full-day public seminar, typically in a hotel, on some aspect of their product line, that includes demonstrations of the product. Usually they have both sales and technical support people at such a seminar. For a vendor in which you are interested, be sure that you are on their mailing list to be notified of such seminars. If you are interested in the functionality that you see in a demo, important questions to ask always include: Is this product shipping now? If not, when will it be shipping on my platform? How real is that date? It is not unusual to see an impressive new capability that will turn out not to actually ship to customers for another 6-18 months.
A clear advantage of an in-house demo is that you have more control over what the vendor demonstrates and you are better able to ask questions. On the other hand, the public demo is a low-pressure way to get an update on what the vendor is doing. And if you do not represent the possibility of large sales, public demos may be your best access. Even when you are currently using a vendor's
products, it is useful to see what improvements they are pitching. It also makes sense to attend occasional demos by competitors of your vendor, as at some point you may wish to switch. And it may give you ideas for new functions that you would like to see included in the product you are currently using. During a demo, see if you can drive the mouse for a while. This helps you focus on exactly what steps must be followed to accomplish something with the tool. It also gives you the latitude to explore use of the tool that you might not see were you to remain purely an observer. Don’t be hesitant to ask questions, and to be probing and critical with your questions. Many issues that are critical to operational use of the product will not be mentioned by the vendor. It is your responsibility to learn to ask the right questions. Whatever tools you ultimately pick will have some deficiencies that you will have to work around. (If you were a salesperson, would you be mentioning them?) You will find that when you ask detailed technical questions, the vendor’s technical support people are more likely than their sales people to give you an accurate and useful answer. Frequently, if the question is detailed enough, the salesperson cannot answer anyway. Asking the technical person explicitly for their personal response is a technique you may find productive. Sometimes it is useful to visit the vendor at their offices. Such a visit can take many forms and involve discussions, demos, training, hands-on experimentation, or any combination. Discussions might include product descriptions, detailed technical discussions with developers, strategic direction of the vendor, stability of the vendor, or product pricing. It is usually a good opportunity to begin establishing relationships that will be important later if you decide to do business with this vendor.
Learning by Doing: Evaluation Copies. If some evaluation is done in the awareness phase, it will be at a superficial level, perhaps a few days per tool. You do not need hands-on experience with each tool. However, you do need to have some experience at actually using tools. Many of the features of a capable tool are meaningless when you have no experience entering data or modifying it. The team also needs to gain a beginning understanding of usability concerns when performing consistency checks or generating reports.

5.4.2 Aside about Vendors
The majority of your early contacts with a vendor are going to be through their marketing organization, whose staff are usually paid on the basis of the number of licenses sold, not on their accuracy in pointing out to you the foibles and shortcomings of their products. It is much more likely that you will lose sight of this reality than that they will. Therefore, you must be very skeptical
about what you hear and even what you see. In this short discussion, we will point out some common areas of misunderstanding. This is not an attack on vendors. From whom else are we going to acquire the products we want? The author has met scores of vendor staff in the last 6 years and has respect for the abilities and integrity of virtually all he has met.
There are some important questions to ask after listening to a presentation or watching a demo on new capabilities or products. The product you watched demonstrated today may not experience first customer ship for another 6-18 months.
What is the date for first shipment to customers? For which platforms? When will you ship for my platforms?
Are these dates estimates, or commitments we can get in writing?
Even if the product is shipping today on your platform, do not assume that there has ever been an operational user. Even if you are told that Acme Widgets has made the tool their corporate standard and made a major buy, do not assume that Acme Widgets has any copies in operational use, or even any in organized experimental use. There is even a chance that Acme Widgets has yet to install any copies, or to finalize the purchase order. We will not explore the wisdom of Acme Widgets' corporate commitment here. However, examples of uninformed adoption are not unusual. The questions you should ask are:
Are there any operational (or experimental) users? Can you please give a few names and phone numbers? Are any close to my facility?
Do follow through when you get names. It is possible that you will have to ask several times to receive any names, and that the majority of names you eventually receive will not lead to a user with the experience that you would like. The author took part in a study at the SEI, documented in Rader (1991), in which a team of investigators spent six months, with very limited success, trying to run down leads from vendors for operational use of particular products. The more complex or ambitious the product, the greater the difficulty. When you do get through to a possible user, ask:
Do you have the tool in operational use on a real project? Who is the customer for the project? Can you tell me the name of the project? What work products and deliverables are you generating with the help of the tool?
And if your interest is keen enough:
Would you be willing to sit down with us and show us in more detail how you are using the tool?
5.5 Other Learning Avenues
Trade shows are a great place to browse, to make contacts, to become aware of new products, and to acquire brochures (not to mention plastic bags to carry them in). If the show is not too crowded, you can do extended comparison shopping of directly competing tools, asking questions like:
What makes your tool better than your competitors'? Whom do you see as your competitors? I liked the global optimize feature of Vendor A's tool. Are you planning to offer that feature?
The purpose of the questions is to reduce your ignorance and to gather information. It is easy to look at a demo of the global optimize feature and think, "very impressive." But you need to ask yourself, "just exactly what would I do with it?" Ask the vendor, ask their competitors, and ask other attendees at the booth who look less confused than you do. Occasionally the answers will greatly expand your understanding of the issues and reveal considerations of which you were previously unaware. Sometimes you can get contact information for a current user or even meet one there at the show.
5.5.1 Reuse Other People's Experience (ROPE)
At this point you will probably benefit from other users' experience, whether that use is operational, experimental, or just an evaluation; in all cases, they have more experience than you do. If possible, visit their offices with questions like:
What reports, documents, and artifacts does the tool help generate? Can you show me samples? How did you choose this tool? Can I get a copy of an evaluation report? Can you walk through how the tool is used? How helpful has the vendor been? Which tool would you choose, if choosing today? What financial details can you share?
The literature of benchmarking (e.g., Camp, 1989) can help you prepare for such visits. Users in your own company will generally be easier to get to and may be more inclined to spend time with you and to give you copies of sample reports or evaluations. They are easier to follow up with later for more detail, or to answer questions that didn't occur to you at the time (frequently because you didn't know enough to ask). Also, you might want to hire some number of hours of consulting down the road; it may not be possible to arrange consulting from employees of another company.
Still, users from other companies will usually talk to you over the phone and might well host a short visit. At this stage, you may have little to offer them in return in the way of CASE experience. You should be prepared to share what you know, however, and to talk about what your plans and objectives are.
5.5.2 Seminars, Tutorials, and Conferences
Education on topics in software engineering is regularly offered in public venues under a variety of names: lectures (one to three hours), tutorials (half day to full day), seminars (two to three days), and short courses (two to five days). The time ranges are just rough estimates. Many of these courses focus on CASE, wholly or in part. This chapter has its origin in a CASE seminar, wherein adoption is one of the major foci.
Public presentations can be very useful when there is a match between your need, the material, and the presenters. When the presentation will exceed half a day, the advertising brochure usually provides a detailed syllabus and a short biography of the presenters to help you decide whether there is a match. Presenters' knowledge, choice of topic, and lecture skill vary considerably. In addition to the presentation, you have the opportunity to ask questions and to hear the questions and concerns of other attendees. During breaks you can chat with the other delegates or the presenter. The formal presentation is not always the most valuable learning experience.
Computer and software conferences vary widely in scope and can draw anywhere from fifty to fifty thousand participants. Both of the journals IEEE Computer and Communications of the ACM provide miniabstracts of upcoming software conferences in their back pages every month. Each issue also carries several ads for conferences, sometimes with program details. Conference announcements also appear in almost all of the trade magazines. It is common now for a day of tutorials to precede the conference proper. For large conferences more than a dozen tutorials may be offered. Conference tutorials are usually priced lower than tutorials not associated with a conference and can represent good value. High-priced speakers frequently give tutorials at conferences as a way of prospecting.
Some conferences are organized solely for technical interchange and others are organized for the profit of the organizers. A rule of thumb, suggested by Marv Zelkowitz at the University of Maryland, is that the former usually charge less than $120/day and the latter charge more than $250/day. There are advantages and disadvantages to both types. At the former, the technical content will be more uniformly high, as papers will have been refereed; but many of the papers may fail to address the major interests of practitioners. At the latter, there will probably be more talks based on practical experience; but some presentations
could be of questionable quality, or might be sales presentations posing as technical talks.
If you are on travel for any of the above, evenings provide an extended opportunity to network, even while enjoying some of the attractions of the travel site. At conferences, there are often evening activities directed at networking, e.g., receptions and dinners, or more directly focused on a specific topic, e.g., birds-of-a-feather (BOF) sessions and special interest group (SIG) meetings. On bulletin boards at the conference or in the conference schedule, you frequently see notices of meeting times and places for BOFs and SIGs. A SIG normally has activity outside the conference and frequently will have meetings more often than just at the conference; or all the activity may be by phone, e-mail, fax, and postal mail. A BOF is less formal and more ad hoc. It may also attract more passionate individuals.
6. Evaluation and Selection
Evaluation can be considered a sequence of evaluation filters, which successively reduce the number of candidate tools to be considered for a particular purpose. In this section we will talk about the philosophy of evaluation and suggest some types of filter criteria that have been found to be useful for CASE tool evaluation. More detailed discussions of the mechanics of evaluation, and lists of CASE tool evaluation criteria, can be found in Pressman (1988), Boehm (1981), Firth et al. (1987), and IEEE (1994).
6.1 A Three-Filter Approach to CASE Evaluation
The evaluation and selection phase follows the awareness phase and can be viewed as a sequence of filters, as illustrated in Fig. 2.

FIG. 2. Filters in the adoption process: winnow (coarse), detailed evaluation (medium), first victim (fine).

We will talk in terms of
three filters because it is conceptually convenient. You may find for your situation that it is more desirable to combine or to split filters. As we move from left to right, each filter requires more resources and more schedule to operate. Therefore, the early filters exist to reduce the load at the subsequent filters. The filters can be summarized:
1. Elimination of unsuitable candidates
2. In-depth evaluation
3. First operational use
In a typical scenario, the first filter might reduce 11 candidate tools to 2 or 3, and the second filter further reduce the field to one candidate. The third filter is not recognized as such by many organizations, but it certainly should be. Common sense dictates that after first operational use, you should consult your lessons learned to decide whether it makes sense to expand the use of a tool to successive projects. If not, the third filter has reduced the number of candidates to zero. At that point, you would probably want to reconsider the results of applying the second filter to select another product for use on the next project.
A similar approach is discussed in Lydon (1994), where the evaluation and selection process has been institutionalized based on a CMM level 5 KPA (Paulk et al., 1993). He describes a typical scenario where as many as six candidates would survive the first filter, and two or three would survive the second. This means evaluation of two or three candidates in operational use on real projects. It also means detailed evaluation of six tools. Such an approach implies a high degree of sophistication with respect to knowledge of CASE, of technological change, and of organizational change.
Tools exist to support process and methods, a philosophy that is explored further with the introduction of Computer-Aided SubProcesses (CASPs) in the next section. Because individual tools do not support all of the activities on a software project, it is convenient to partition the complete process for all technical and management activities on a project into process fragments, which we will call subprocesses. Table III provides a partial list of project subprocesses. When you evaluate a tool, your ultimate goal is to assess its suitability and value for providing automated support to one or more of the subprocesses. Consequently, your evaluation filters should be highly cognizant of those ultimate goals.

TABLE III
Software Project Subprocesses
- Software requirements analysis
- Architecture design
- Detailed design
- Coding
- Unit test
- System test
- Configuration management
- Status tracking
- Project planning
- Defect tracking
- Reverse engineering

The number of candidates that you will find to support a subprocess will vary, depending on the subprocess. For requirements analysis, you could probably find upwards of 20 candidates, but for reverse engineering of Jovial code you may be able to find only 1 or 2.
The first filter is pretty coarse and should require little energy to operate. However, if you have little experience with CASE, you will have spent quite a bit of effort during the awareness phase determining the starting set for this filter. You will also spend more time doing the evaluation because of your lack of experience. If your awareness research for tools to support a given subprocess reveals a large list, you might wish to separate this filter into a prefilter and a main filter. With a process such as requirements analysis, a prefilter with the following yes-no criteria might reduce a candidate set from more than 20 tools to about 8:
- Supports Hatley-Pirbhai real-time structured analysis
- Supports multiple simultaneous users on a network
The main filter would then consider a larger set of criteria on a smaller set of candidates. Your selection values for this filter would probably be gathered from a combination of vendor brochures, the CASE literature, interviews with existing users, questions answered by vendors, and observation of demos. Any hands-on experience would be limited to a small subset of the candidates. Lydon refers to his first filter as a “paper filter,” because knowledge gathered is predominantly based on written communication, rather than on experience through use.
6.2 Activities Associated with an Evaluation Filter A set of evaluation activitiesis presented in Fig. 3. There are threein preparation for the evaluation activity itself and then four postevaluation activities. In preparation, you must develop a set of evaluation criteria and an evaluation strategy. If tool execution is involved, you must also obtain evaluation copies and find suitable hosts for your evaluation effort. One of the following subsectionsdiscusses aspects of choosing technical criteria, and is followed by a subsection on cost criteria. After evaluation is completed, analyzing the results and making a selection is the most obvious activity. The evaluation results and the selection should be
128
JOCK A. RADER
Evaluation
Evaluation Petform Evaluation
Evaluation
Lereonr
FIG.3. Activities in evaluation and select phase.
documented in an evaluation report for each filter that you operate. But you should also generate a lessons learned report, including sections on implications for operational use report. The latter is an important input to developing the support plan in the first victim phase. Avoid either extreme of evaluation formality when making a selection or down selection (reducing the number of candidates). One extreme is to ignore any list of evaluation criteria and to select on the basis of intuition after doing the evaluation. The other extreme is to rigorously assign values for each criterion for each tool, and to blindly select the “passed” candidates solely on the basis of their total scores. The scores are only a guide. Remember that much is arbitrary about the choice of criteria, the range of values for each criterion, and the weight for each criterion. And the raw scores assigned as a result of the evaluation can only be approximate for most criteria. Therefore, to make the evaluation report useful, you should be generous in providing explanations of the meaning on the raw data for criteria where that is appropriate. A score of 0-10 cannot provide much insight to criteria such as, “satisfies the chocolate flavor of object oriented design” or “is portable.” A discussion of rationale for the down-selection choices should also be included. You will find many reasons to refer to this report, perhaps, at some point in the future, when things are not going well during some subsequent filter. This may be to remind you of particular results or to verify that your more detailed evaluation yields different values for some criteria than was the case earlier. It is also possible that your management or your first victim, or both, one day may
CASE ADOPTION: A PROCESS, NOT AN EVENT
question (perhaps with intensity) your selections. Finally, other parts of your organization will hear about your activities and will ask for copies of any such reports to assist them in their decision processes.
6.3 Filter Two: Detailed Evaluation The number one criterion has to be “fit for use.” You must be able to answer the questions: Is it possible and plausible to use this tool (set of tools) on a real project? If not, how much tailoring and enhancement must be done in order to make it fit for use? Does the tool readily support the necessary tailoring and enhancement?
Many of your detailed criteria will support answering the first question, but the whole question is bigger than the sum of its individual criteria and must be answered separately. Moreover, the need to answer it will help shape your evaluation strategy and greatly affect mid-course adjustments in the evaluation process.
6.3.1 Evaluation Objectives and Strategy Choices Let us suppose that we enter this phase with seven candidates. Our goal is to decide two things: Which of the seven tools is fit for use on real projects? What is the relative ranking of the seven tools?
In the evaluation you are trying to judge fitness for use on operational projects and therefore you want to duplicate real project conditions as best you can. You have a variety of choices as to how to achieve this, including to shadow existing work to apply to an internal research project; to make up a project, or to apply to small piece of real project. There are relative advantages and disadvantages to each approach. Whichever approach(es) you choose, you must remember to make the test as real as possible. In particular, you can not change realistic constraintsjust to suit the tools unless you intend to change those constraints in your real environment. Still you can limit the amount of application work that you do. If your test calls for you to generate part of a deliverable document and that document is typically 500 pages on a real project, you do not have to generate the entire 500 pages. You do need to verify that you can generatethe full breadth of types of components in the document, and you have to verify that you can generate the full depth of detail required.But you don’t have to generate 11 instances of the samecomponent type in full detail. However, you do need to verify that the tools will handle 500 pages and up to whatever limit seems reasonable to you.
130
JOCK A. RADER
When you shadow existing work, you try to duplicate the generation, in whole or part of one or more artifacts. Usually you do not want to duplicate existing documents exactly, because you improving the quality of the artifacts as well as the process. Therefore you first design a better format and map the existing artifact into this format. Then you figure out how to automatically generate the artifact in its new format using the tools. The example below helps to illustrate. When applying the tools to a research project, you must take great care to force that the tools get used in a realistic manner. Research projects normally have very loose constraints, which are not characteristic of real projects. You then have to enforcea higher degree of rigor in order to get the results you want. But this approach does have the advantagethat staff other than the CASE team will be using the tools. Even though research users may be different than operational project users, they will provide insights not likely from CASE team users only. If you make up a project, you have a great deal of control, but it is very easy to make up the project in such a way as to only lightly stress the tools. After all, you have been steeping yourself in use of the tools and the methods that they support. It is all too easy to easy to design the project to nicely fit with the tools. This approach also requires that you do quite a bit of engineering. Chances are good that you will scrimp on the effort, probably failing to exercise the tools in important ways. You could use a detailed example from a text on the methods, which would give you a fair amount of detail to work with. Still, you must very carefully work through the example and apply good engineering effort to upgrading the example. After comparing notes with many instructors of methods classes, it is clear that the engineering quality of this type of example is very low. When a real project is your guinea pig, your evaluation will benefit from the reactions of real users and the stresses of real problems with real constraints. However, your freedom to experiment is severely hampered unless you have the human resources to experiment in parallel with supporting the real thing. There is also the risk that you are not fully prepared to support use on a real project. You might have to abandon the project with a loss in credibility, and you might even cause damage to the project which would deal a disastrous blow to your credibility.
6.3.2 A Shadowing Example Let us suppose that on a recent project you had to generate a software requirements specification for customer review as one of the project artifacts. Further suppose that you are evaluating tools which support real-time structured analysis, as extended in (Hatley and Pirbhai, 1987) from the original (DeMarco, 1978). Your goal now is to generate an equivalent spec from a structured analysis model contained in a tool database.
CASE ADOPTION: A PROCESS, NOT AN EVENT
131
From the perspective of a potential user, the tasks are: (1) use the tool to enter the appropriate structured analysis model into the tool database, and (2) run the utility which automaticallygeneratesthe specification.For a real user there would be several training courses on methods use and tools use to complete, as will be discussed in the first victim section. The CASE team will emulate real users for the shadow effort and they will be developing the disciplines needed by users. There are a number of tasks which must be completed before a user could effectively use the tools. The CASE team needs to map the existing requirements into a structured analysis model, and then map the model back into a document with the correct format. This may well result in a significant reordering of the information in the new document from that in the original. There will likely be explanatory text in the original document for which there is no obvious home in a structured analysis model. If the text is associated with a structured analysis (SA) object (data flow diagram, pspec, or data flow), it can be attached to that object as some form of named annotation with most SA tools. Text not associated with any SA object must be stored somewhere else-possibly in flat text files or attached to the context diagram. The team will doubtless go through many iterations modifying the two mappings and the tailoring of the format of the target document before a satisfactory approach is found. It must then provide a capability to generate the spec automatically from the database and possibly some separate files. Hopefully the capability will mostly be based on vendor supplied functionality. Many vendors claim to provide a utility which automatically generates specs which follow well know documentation standards (e.g., the standard for a Department of Defense 2167A Software Requirements Specification). However, the COTS utilities for this purpose, as delivered, tend to generate documents greatly deficient in content and format. This implies that you will have to tailor and extend the utility in order to eliminate the deficiencies. Some of the extensions will parallel the extensions you made to the method when you decided to attach certain notations to selected objects in the model. The extended documentation generation utility must know how to find these notations and know how to map them into the target document. Finally, in your evaluation, you have to struggle with actually entering all necessary data. You must approach this activity seriously, as you will doubtless discover many interesting characteristics of the tool and your extensions. Your ultimate users will want to know about those which will reduce their labor burden. They will also want to know about the ones that might increase that burden, especially the ones with drastic consequences. Not only must you thoroughly investigate data entry, both diagrams and text, you also have to investigate changing the data in the tool database. On any real project, there will be a desire for global changes (e.g., updating a flow name
132
JOCK A. RADER
everywhere that it occurs) as well as local changes. For larger projects, the CASE team will probably be asked to provide utilities for certain kinds of updates or consistency checks that are not part of the vendor functionality.
6.4 Technical Criteria 6.4.1 Platforms and Portability Your platform criteria will vary greatly depending on the size of your projects and the nature of your customer base. Tools which run on PCs frequently only support a single user and hence are mostly suitable for small projects, where a unique resource need be shared by only a few individuals. Personal computerbased tools are often much cheaper than their workstation-based counterparts. Tools which allow multiple users to simultaneously access their database are mostly workstation based, and are designed as cliendserver applications. This allows for instance, physically distributed users to be modifying, simultaneously, different diagrams in the same structured analysis model. A server can typically service 10 to 20 users with adequate performance. Some tools are hosted on both workstations and PCs. If you are considering using a tool in this way, you must verify that adequate interoperability exists between the different platform versions of the tool. Sometimes a vendor’s PCbased tool is completely unlike their workstation based tool, with different features and a different database. Your customer may specify a platform on which you are to deliver the software and the tool databases as well. If you work with many customers, you will probably be forced to deal with different platforms. In this case, portability of tools that you buy will be very important. When evaluating, you need to be very specific about what versions of what tools run on what versions of what operating systems on what processor, and as of what date. For example, not only does UNIX vary from vendor to vendor, but it can also vary from processor to processor for a given vendor, or multiple versions may be supported for a single processor. Another portability issue relates to the use of a PC as an X-terminal to a workstation-based tool. This is an appealing approach to many managers because there are already PCs on so many desktops. If you are considering such an approach, be sure to exercise this capability in your evaluations. You will probably find that a PC X-terminal is adequate for some tool activities but deficient for others. For some activities, one limitation is bound to be screen size. This can be allayed if desired by installing a larger monitor. If your PC is a Mac, you will likely find the standard one-button mouse a drawback. Not all workstation tools are friendly with their X support. You may find that important parts of a window are irretrievably off the screen of your X-terminal, making use of the X-terminal difficult or impossible.
CASE ADOPTION: A PROCESS, NOT AN EVENT
133
6.4.2 Tailor, Enhance, Integrate If you adopt a tool and put it into operational use, it is virtually certain that you will want to tailor the tool, to enhance it, and to integrate it with other tools in your environment. Therefore,it is important that you explore these areas in your evaluation. Most vendors provide these capabilities in some measure. However, organizations,with significantCASE tool experience, have found that operational quality use of these capabilities comes only after long and arduous experimentation. Therefore, it is incumbent on you to struggle with them during your evaluation, in order to better understand what you might have to do to prepare for operational use. Many tools store all or part of their data in a database, which may be a commercial product such as Oracle or Sybase, or it may be a proprietary product of the vendor. The latter is common for reasons of performance. Generally, the use of a popular commercial database is a plus, because the database vendor will port the database to many different platforms, easing the job of the tool vendor to follow suit. However, if that database in not available on a platform in which you are interested, portability to that platform could be a serious problem for the tool. The application programming interface (API) is critically important for purposes of integration and often for enhancement as well. The API is a library of procedures that can be invoked from one or more languages and allows you access to the data and the functions of the tool. The most common API bindings are provided for the C programming language. Use of APIs is complex and requires considerable effort to master. Vendors that offer an API usually offer training in its use. Because APIs are known to be good, in the manner of “motherhood and apple pie,” a vendor whose tool is without API may seek to convince you that they provide all that you really need. One such pseudo-API is to provide you with the knowledge of how the vendor maps into the database the logical model you manipulate with the tool. Then you can get at whatever you want by dealing directly with the database. This is cumbersome at best and very, very dangerous at worst. If your use of this pseudoAPI involves updating the logical model, you have accepted responsibility for guaranteeing the integrity of that model as a result of your changes. Also, if the vendor should alter the mapping or change databases at some point in the future, your changes could produce unpredictable results. Adherence to standards is another indicator of openness, which leads vendors to advertise the standards which they follow. If a particular standard is important to you, you need to investigate in your evaluationsthe degree to which adherence to the standard meets your requirements. Unfortunately, tool-related standards provide more of a quagmire than of a stable environment. The existing set contains overlapping and competing standards, many of which are incomplete or unproven. This provides a fuzzy and shifting target for the vendors to interface to.
134
JOCK A. RADER
6.4.3 Tool Features Most tools provide a competent windows environment that behaves like you would expect it to, but there are exceptions, sometimes with a relatively popular products. When you select a child object on a diagram and request the display of the child diagram, is the parent diagram still visible? Some tools replace the parent with the child in the same window. Others open a new window so that child and parent are both visible. The latter approach allows you to see the context for the child diagram which can be very helpful. Do you have to fire up a second copy of the tool to be able to view both the parent and the child diagrams at the same time? This would cause you to spend tokens for two licenses to do the job of one. Are multiple windows supported? The answer is almost certainly yes, but you should check. Some older tools have a limited view of windows. Do you first select an object and then the operation, or do you select an operation first and then apply it to the object? The first approach seems more natural to most people and is more consistent with object-oriented concepts. Does the tool adhere to the particular window protocol that you prefer or that you think will become the standard? Navigation between objects is an important characteristic for ease of tool use. The better tools give you several ways to navigate from a given object to other objects in the system. Can you get from a given diagram to a child or parent diagram in a natural way with only a few mouse actions? With some tools. if you double-click on an object in one diagram, a second window opens containing the representation of the
child object. Can you quickly bring up a list of all objects of a given type, and use that to quickly navigate to a specific object? The ability to apply notes or annotations to objects is an extremely useful and important capability. Such notations provide the underlying functionality for all sorts of enhancements and extensions. You should make sure that such a capability exists and make sure that you understand the rules for creating notes.
Do all of the textual editors behave in the same way? There will probably be several object types supported by the tool (e.g., pspec’s, data dictionary definitions, notations, etc.) which are textual in nature. The question of method(s) supported by the tool is extremely complex. For this purpose, we will say a method is defined by the objects it defines and the verifications it performs. We assume that you have already chosen the method(s) that you intend to use. Wood er al. (1988) and Wood and Wood (1989) provide useful ideas for methods evaluation.
CASE ADOPTION: A PROCESS, NOT AN EVENT
Are editors provided to define and modify all of the necessary objects? How convenient are the editors to use? What objects or pieces of objects does the tool automatically generate? Are all necessary consistency and completeness checks performed?
The ability to generate documents and reports is needed. Reports are frequently in the form of lists or tables. Documents are more complex than reports and usually much more difficult to produce. How easily can you generate a report listing all objects of a given type along with selected attributes of the objects? How easily can you generate a report of all objects of a given type which interact with a specific object of interest? Is a general document generation capability provided? Does this capability allow you to include all of the objects from a logical model in an order specified by you?
The ability to back up and restore your tool database is critical. What back up and restore functionality is provided? Are all of the expected functions included? Doses a user need special privileges in order to run backup or restore?
6.5 Cost: Always an important Selection Factor The cost of purchasing and maintaining CASE licenses is a very complicated topic, especially for workstation-based products. When you are quoted a price there are numerous questions that you need answers to. Suppose you are quoted a price of $77,000 for 11 copies of a tool that supports structured analysis. You need to explore the following issues. This is a representative set, not a complete one. But it should help you adopt the right frame of skepticism, so that you can add your own questions. Exactly which methods are supported [partially/completely]for this cost? (Perhaps real-time extensions are not included. Or some of the consistency checking. Or the document generation capability.) Are licenses node-locked or floating or user-based? (Discussed in detail in the next subsection.) Is the data base server extra? (Neededfor tool operation and may cost an additional $15,000 or more.) How many clients on one server? (Maybe up to 50, except response is temble above 10. So you need two servers.) Is installation included? (If you need help with installation, that is perhaps $1500 per day.) Which utilities are included? (Maybe report generation, document production and database backups are priced separately.)
136
JOCK A. RADER
How much is training? (Perhaps you get one day of training included. The five day courses in basic tool use, advanced tool use, and tool administration are $1500 per student per course.) How much is maintenance? (For CASE tools, this is typically around 15% per annum of the purchase price. For this fee you get a hotline number and product upgrades. You need to verify that product upgrades are included.) What upgrades does this include? (Sometimes the upgrade you need most is only offered as part of a new product, which you will have to purchase separately.) What is the bug repair policy? (You should ask existing users their perception of this issue as well as the vendor. Known bugs can sometimes flourish for years, unless someone agrees to pay for their elimination.) Is a toll-free number provided? (Phone calls across the country during business hours are expensive.) How much is consulting? Will you get an expert or merely someone who barely knows more than you do about the product? (Free technical support during presales tends to evaporate once purchases have been made. Ask how long your consultant has been with the company. If you have good experience with one of their staff, ask for them by name. Try to talk to existing users.) What options are available or can be negotiated? (More is negotiable than the price list indicates.)
6.5.1 Licensing Alternatives There seem to be four basic types of licensing agreements, listed here with the most flexible first: 0
0 0
Site Floating User-locked Node-locked
A site license allows unlimited use at a site. The definition of the scope of a site could vary from a single building to an entire Fortune 500 company. The license fee would be proportional to the maximum number of possible simultaneoususers. A floating license can be used by any user on any workstation, with up to a specified number of simultaneous users allowed. If you purchase five floating licenses and all five are in use, attempts by new users to gain access to the tool will be rejected until one of the active licenses is released by its current user. The mobility of the floating is an important benefit with this type of license. For some products, floating is restricted to a single database server or single file server. With others, a license can float all over your network spanning facilities that are geographically dispersed. A user-locked license can be used by a specific user on any workstation. This is significantly less convenient than floating, but could be more convenient than
CASE ADOPTION: A PROCESS, NOT AN EVENT
137
node-locked for certain situations. The big drawback is that every potential user must have a license even if they will only make use of it every third month for five minutes. Essentially, this is a floating license for a single user and therefore has the same floating issue as for the regular floating license. A node-locked license can be used by anyone but is tied to a specific workstation. This can be highly inconvenient. Suppose you have 10 workstations on a network and you know that you will never need more than two simultaneous users of the C compiler. It is easy to find yourself in the situation where both workstations with C licenses are occupied by engineers performing other functions. This becomes even worse if there is a second tool for which only two licenses are needed, and there is a need to sometimes use both this tool and the C compiler on the same workstation. Moreover, if one of those workstations is down, the problem is exacerbated.
Example. Suppose you have a project with 100 people and 45 workstations. Further suppose that seldom would there be more than 12 simultaneous users of the Shazam tool, and that some people use it daily while everyone uses it at least half an hour per month. Which of the following is better for this project? 0 0 0 0
20 fixed licenses at $1700 per copy 60 user licenses at $550 per copy 12 floating licenses at $2900 per copy 1 site license at $63,500copy
To do a thorough cost analysis, you have to make some assumptions about usage patterns. You should probably consider several different scenarios so as to gauge the sensitivity of total license costs to shifts in usage patterns. You also should take convenience, as well as cost, into consideration.
7. Supporting First Operational Use We separate the first operational use phase into three subphases. Before discussing the subphases, we cover the choice of the first victim and the concept of CASPs, an approach to integration of process, methods, and tools. 1. Preparation 2. Training 3. Support and evolution
7.1
Choice of First Victim
Eventually some project must be designated to make first operational use of the tools. This is a position of potentially high risk, hence the title “first victim.”
138
JOCK A. RADER
Most software professionals, at one time or another, have srruggled with use of version 1.0 of some software product and have fought with its immaturity. Even if the tools being adopted are relatively mature, their operational use in your environment is immature. In all instances of which the author is aware, when CASE has been introduced for use by a project team, a good deal of tailoring and enhancing has been required. These additions are certainly bound to be immature. The choice of the first project is very important. It must be chosen so that given all the resources at hand, including the sponsor, the CASE team, the,project staff, and the customer, the project will not fail because of the new technology. Thus, the project should not be too large nor have too ambitious a schedule. And the CASE team must be sufficiently large so that adequate preparation, training, and consulting can be done. To emphasize the point, adoption by the first victim must be successful! Therefore it becomes the responsibility of the sponsor and the team to make sure that there is no failure. Guaranteeing success becomes pretty much a matter of providing resources. The CASE team must be willing to move as much of their attention to project support as necessary, which includes training, consulting, hand holding, tutoring, troubleshooting. and modifying the environment. This may well postpone other planned activities of the CASE team. The sponsor can also grant some leeway with respect to project milestones, where that is within the sponsor’s discretion. Ideally the customer will also be supportive of the adoption and thus show some measure of support or understanding should difficulties arise. The success of the adoption will be greater if a willing victim can be found, i.e., a project whose management and staff are interested in attempting to use the new technology. In return for their agreement to be a guinea pig, the project has the right to expect that they will receive plentiful support from the CASE team, and that they will be involved in determining the direction of the tool evolution on the project. It is also extremely important to the CASE team that first project use be successful. So both the project staff and the CASE team have a shared interest in a successful outcome. Within the project, some staff will be more receptive to the new technology than others. Borrowing the words of an AT&T manager who introduced CASE into his organization, “work with the believers.” Identify project staff who are supportive of the new technology and get them involved with the team. Research in the discipline of organizational change indicates that there is a significant spread in different individual’s attitude and receptivity to change, with about one person in six inclined to be an early adopter.
7.2 Overview of CASPs (Computer Aided SubProcesses) Here we will introduce the concept of a CASP, which will help us focus on the work to be done. In our illustrative example of a CASP, software requirements
CASE ADOPTION: A PROCESS, NOT AN EVENT
139
analysis is the process fragment or subprocess in question. In addition to the subprocess, the CASP consists of the methods and tools which will support requirements analysis on the project. In the erratic evolution of software engineering,process and tools all too often have followed independent paths. In order to be more useful, they need to be support one another. To be effective, a process will employ tools to promote consistency and completeness,and to provide assistance with many of the routine actions required by the process. Tools, in order to be effective,must help automate tasks that need to be done-portions of a project’s methods and process. Very frequently a process group fails to consider what tools are available or feasible, and thus is ignorant of the characteristics of an important source of building materials for an operationalprocess. The perspectivethat process definition is at too high level to be concerned with tools is flawed. An architect does not design a house without a knowledge of two-by-fours or of current plumbing components. In Hughes Aircraft we have introduced CASPs as a way of integrating process, methods, and tools. A specific subprocess is always the root of a given CASP. The root subprocess is then supported by methods services, which in turn are supported by tools services. Some subprocessesare phase related such as requirements analysis, design, and subsystem integration. Other subprocessesspan several life-cycle phases, such as configuration management, process enactment, and project management The CASP concept finds its roots in the services model approach of the NISTBCMA reference model (NISTBCMA, 1991) and in work done at the Software Engineering Institute on process, methods, and mechanisms (Brown et al., 1991). A sample CASP is illustrated in Fig. 4, where the root subprocess is requirements analysis. In the example, the subprocessuses the Hatley-Pirbhai structured analysis method. For a given project using the Hatley-Pirbhai method, it is likely that the Hatley-Pirbhai method has been extended and tailored via a set of project procedures to meet the specific needs of the project. The project procedures are a second method used by the subprocess. Note that the Hatley-Pirbhai method is used both by the subprocess and by one of the other methods, the project procedures. Automated tools support methods, making it easier to capture data, manipulate data, verify data, and in some cases to generate data. A given method may or may not be supported by tool services. It is possible to practice the Hatley-Pirbhai method without any automated tool support, but maintaining consistency for a project of any size would be nearly impossible. Most subprocesses and methods on operational projects are supported by one or more tools. In the example, three tools are identified: a COTS tool which supports structured analysis, a COTS desktop publisher, and a set of home-grown off the shelf (HOTS) tailorhgs and extensions to the commercial tools.
140
JOCK A. RADER
r
7
Requirements Analysis CASP Each level contains a set of services used at the next higher level or at the same level. E.g., the Hatley-Pirbhai COTS method is a service used by the subprocess above it. It Is also used by the HOTS requirements method at the same level.
Hatley-Pirbhai (H-P)
COTS H-P Tool *COTS Publishing Tool HOTS Enhanced Document Generation HOTS Utilities
Fro. 4. Components of a requirements analysis CASP.
To document a CASP, you document the subprocess, the methods, and the tools. Subprocess documentation is a high-level definition like a policy statement or a practice. Methods are lower level and much more detailed, describing how policy is to be enacted. In this example, the Hatley-Pirbhai method could be documented by reference to the literature. The project procedures would have to be explicitly documented by the project and the CASE team. At the tools level, some tools would be documented by reference to the vendors documentation, and the home-grown set would be documented by the CASE team. The concept of alignment is critical to CASPs. Process goals and tool use are often in misalignment on software projects, which can result in abandoning either the process or the tools. The misalignment is caused by a lack of communication, and a resulting lack of cooperation, between process groups and tools groups. A CASP is a subprocess which is in alignment with supporting sets of methods and tools. To evolve a useful CASP, practice writers, method makers, and toolsmiths must work cooperatively and iteratively. A CASP, so constructed, can be reused as a component in many different processes.
7.3 Preparation for the First Victim Building a CASE infrastructureis one major component of preparing to support first operational use and the first victim. The knowledge and the skills developed by the CASE team during the evaluation phase are important for training and
CASE ADOPTION: A PROCESS, NOT AN EVENT
141
support, but they will need to be extended and focused to support use on a real project. Let us suppose that the first victim has agreed to generate their software requirements with a structured analysis tool, and expects to automaticallygenerate their deliverable requirements specification from the tool database. With luck or good planning, requirements analysis is one of the areas wherein the CASE team has been building up expertise and experience. The CASE team now has to build up a set of specific, detailed procedures to support project needs, and it also must adjust the tool set to work in concert with the procedures and needs. In other words, the team must create a requirements analysis CASP to support project use, along the lines described in the preceding subsection. The CASP consists of a high-level practice for the requirements analysis subprocess, a set of methods (commercial and home-grown) implementing requirements analysis, and a set of tools (commercial and homegrown) supporting the methods and the subprocess. You should figure on from 3 to 6 months from first experimentationwith tools to inserting them into an operational project. Of course some, or even much, of this time is consumed in the evaluation and select phases, but some will be in the project preparation portion of this phase. The length of time necessary is influenced by the robustness of the CASE infrastructure in place at the time of first experimentation.
7.3.1 HOTS Tools and HOTS Methods
Tools. Very often the literature refers to commercial tools as COTS tools, where COTS is an acronym for commercial off-the-shelf. This is in contrast to homegrown tools, which we will refer to as HOTS tools (homegrown off-theshelf) in order to emphasize the distinction. With respect to requirementsanalysis, there are many COTS tools which support structured requirements methods and an increasing number of COTS tools which support object-oriented requirements analysis. As the team will have learned in earlier phases, each COTS tool has a number of deficiencies and foibles, and each has its own unique interpretation of the methods it supports. Moreover, each project has its own needs and desires; which can be influenced by the application, the organization, or the customer. This means that it will be necessary to tailor and extend the methods and tools in order to make them truly useful. If more than one tool is involved, there will be integration work as well. This could mean tailoring and extending an existing integration, or may require developing some tool integration from scratch. What we are calling HOTS tools here are mostly homegrown tailorings, extensions, and integrations of the COTS tool set. One form of HOTS tool you will almost certainly create is utilities. After some operational use, it is highly likely that the project staff will find some aspect of using the tool set particularly
142
JOCK A. RADER
cumbersome. Very often, for only a few staff days' effort, one of your toolsmiths can create a utility, probably by tailoring or extending the tools, which ameliorates the problem. By way of example, consider a HOTS extension to the COTS structured analysis tool, where the extension checks for consistency between the inputs and outputs of a process: (1) as represented on its parent diagram, and (2) as described in the body of the specification for that process. An extensible tool should allow you to add a HOTS function, which verifies for a pspec, that: The body of each process spec must explain, by name, how each process input is consumed by the process and how each process output is produced. An automatic document generation function is a much more extensive HOTS extension. Assuming that the vendor provides a basic documentation capability, you will still have a great deal of tailoring and enhancing to perform in order to be able to generate a useful, professional document. You will also have to delve into the integration between the requirements tool and the desktop publisher. Based on experiences in a number of companies, the effort to accomplish this will probably exceed six staff months. For some applications and methods it may be necessary to develop a tool from scratch. This would also be a HOTS tool, but with much more serious maintenance and support ramifications. In general, we believe in using COTS wherever possible. If you are forced to develop a HOTS tool from scratch, then you should give very careful attention to how you will integrate it with the other tools in your environment.
Methods. Methods, in the same manner as tools, can be of either commercial or homegrown origin. Thus, it is reasonable to apply the terms COTS and HOTS to methods as well as to tools. Examples of COTS methods would be any of the well-known structured or object-oriented requirements analysis methods that are documented in the literature, such as real-time structured analysis (Hatley and Pirbhai. 1987)or the ObjectModeling Technique(OMT) (Rumbaugh eraZ., 1991). What we are calling HOTS methods are really collections of detailed procedures, which are established for a project or an organization, which explain how to use the tool set in order to support the COTS method and to satisfy the requirements of the subprocess. In many cases a HOTS method is wrapped around one or more COTS methods, i.e., it provides detailed guidance in how to use the COTS methods with the tool set on this project. Naming conventions are a simple example of a HOTS procedure, e.g., The name of all external flows that interface with the ATM network must begin with '$$', and those that interface with the laser cannon must begin with '##'.
This next HOTS procedure is motivated by serious concern for performance characteristics on the project. It requires that the performance description be
CASE ADOPTION: A PROCESS, NOT AN EVENT
143
stored in the tool model in a uniform place so that the HOTS tool extension to generate the requirements specification knows where to find the description. Attached to each dataflow diagram must be a note, named ‘performance,’which estimates the nominal execution time for the parent process. and explains the derivation of the estimate.
In projects at Hughes, documentation of the HOTS methods for a subprocess runs from 20 to 50 pages or more. These methods must be developed as a joint effort by the project staff and the CASE team. The project staff understand the needs of their project but is not expert with respect to the capabilities and limitations of the tools, and the expertise is reversed for the CASE team. It should be clear from the examples above that methods and tools are inextricably intertwined. Training as well as documentationis imperative for the HOTS methods. Experience has shown that without effective training it can be difficult for some project staff to grasp procedures as basic as employing a specific set of predefined styles in Microsoft Word (Microsoft, 1993). The training has to cover both the mechanics of applying styles and the methodological reason for how they are used.
7.4 Training, Training, and More Training There are three broad areas of tool and tool related training, i.e., methods training, tools training, and platform training. The project staff needs to be trained in all three. If proper training is not provided the consequences range from wasted effort at the best, to failure of the adoption activity, and even to poor performance or failure on the project itself.
7.4.1 Methods Training For methods supported by COTS tools, training can usually be obtained from the tool vendor or from a consultant firm associated with the vendor. Because of the correlation between training and successful adoption, the vendor will probably urge you to purchase methods training. Because the vendor has studied the methods in detail in order to implement them, the vendor is normally well equipped to teach it. Depending on the number of people you want to train and your schedule, you may wish either to contract for a dedicated class in-house or to send a small group off to a public class. When it makes economic sense, it can be very desirable to develop the ability to provide methods training internally. This allows you to control three important aspects of the training: tailoring, timing, and quality of instruction. In addition, if you are likely to train more students over a period of years, internal training becomes more cost effective. If the training can be applied to many projects
144
JOCK A. RADER
throughout the company, developing in-house training becomes particularly attractive. Also CASE team members who train and consult to projects become valuable resources for new projects. When you tailor superfluous material out of a course and insert project-specific material instead, you help make more effective use of students’ time. Vendor courses are frequently targeted at a lowest common denominator often spending excessive time on introductory material. In addition to wasting manpower, this can demotivate students. The choice of examples is another tailoring issue. The examples in the vendor course may have little relevance to the staff on your project. Financial planning examples and sonar signal processing examples will not provide equal insight to your project staff. However, if you decide to do your own training, do be aware that developing good examples and solutions is time consuming. The timing of training can be very important, with the concept of “just in time” training very powerful. When the training occurs near to the time that it will be used, you accrue two important benefits, (1) students pay more attention because they know they are about to apply the training, and (2) students have less time to forget what they have learned. Training which precedes application by more than a few weeks usually has to be repeated or refreshed. There is a wide range in the quality of instructors, which is related more to breadth of knowledge and experience than to teaching skill. Most professional instructors are competent at teaching. But sometimes your instructor was only trained in the method a short time before being assigned to teach your class. This makes it very difficult for them to effectively answer questions of even medium complexity from your project staff. It may also be that the instructor has little or no experience with applying the method on a real project, or on projects with characteristics similar to yours. The methods training should provide lots of time for working and discussing examples. Working in small groups of two to five people seems to be more effective than one person working alone. It is important to structure the training so that the students will have time to work on the problems. In a work setting, this usually means devoting a part of class time to group problem solving. This also allows the groups to seek assistance from the instructors while they are working on the problems. Training for HOTS methods is pretty clearly the responsibility of the CASE team. The team has to develop user documentation and reference documentation for both HOTS methods and homegrown tools. This material could range from five to a hundred pages in length. It might be desirable to provide on-line documentation as well. They also have to develop the training materials, which usually will take the form of slide presentations. The most difficult part of developing training materials
CASE ADOPTION: A PROCESS, NOT AN EVENT
145
is the examples and exercises. But these are probably the most important part of the training. Over time, the team should evolve the materials to better serve subsequent training classes. Therefore, it makes sense that the training materials be developed in electronic form to facilitate maintenance.
7.4.2 Methods Training before Tool Training? When you talk to people with operational experience, most frequently you hear that methods training must precede tools training with a minority expressing that methods and tool training can be combined. You are very unlikely to ever hear someone promote tool training before methods training. Both the methods and the tools are very complex. Therefore it is better to teach the method first so that the student can concentrate on the method and not be distracted by the tools. Moreover, given the immaturity of tools, the distraction can be very great with the potential that much of the class time is spent troubleshooting tool or workstation problems. For a novice it will also be confusing as to what behavior is associated with method and which is an idiosyncrasy of a particular tool. It must be remembered that 3 to 10 days of training in a method will not make anyone an expert in using that method. At best that makes them a beginning apprentice. (“Apprenticed to whom?” you should ask.) Abstractly, it is easy to argue that methods training and tool training can and should be combined. If you want to teach someone to build a bookcase, you don’t have to teach them how to build a bookcase with stone tools before you show them how to use power tools.
The problem with applying this reasoning to CASE tools and the methods they support is the complexity of each as mentioned above and particularly the unpredictability of tools. Until the tools become more robust, more intuitive, and less intrusive. it is often best to teach the methods before the tools.
7.4.3 Tools Training Most of the discussion for methods training applies equally well to tools training. Almost certainly, the CASE team will be trained in tools use by the vendor, probably early in the evaluation phase. If, subsequently, you plan to use the vendor to train the project team, you should negotiate with the vendor to tailor the class. The team will have experienced the standard training and will know what will be of most value to the project users, so they can suggest the direction of the tailoring. It is extremely important that all students get extensive hands-on experience during training. The less experience project staff has with this type of tool, the
146
JOCK A. RADER
more hands-on exposure they need. Project staff who do not feel comfortable with the tools will be reluctant to use the them on the project, possibly delaying use until they jeopardize correct or successful adoption. After all, they can always fall back on the old way. The sponsor can provide a major assist to successful adoption by disallowing or preventing reversion to the old way. Perhaps by demanding reports that the tools will generate automaticallyand that would be painful to generate manually. Way back in the 1970s, a determinedmanager at Hughes weaned his programmers away from punched cards, when he refused to purchase a card reader for a shiny new computer facility.
7.4.4 Examples and Exercises Extremely Helpful When methods and tools training are given to members of the project team, the goal is that they learn to be able to apply the methods and to use the tools. The training is not just for information only. Therefore they must receive a clear idea of exactly what they must do and they need practice in actually doing it. As most of us can recall, understanding an example in physics class or on page 150 is very different from working the high-numbered problems at the end of the chapter. Training requires all three of the following components. In the exposition component the instructor is saying, “let me explain what I’m going to show you.” With the examples,the message is, “now I’ll show you what we are talking about.” The final component is the exercises, where the student is challenged with, “now you try it.” The expository portions of the training describe how to use the methods or tools and provides a skeleton of understanding. The examples help explain and add substance to the skeleton. But it is the exercises that force the students to recognize the holes in their understanding and to work at filling those holes. They do this by struggling (trial and error experimentation), by discussing with their fellow trainees (“what did the instructor say we should do when this happens?”), and by seeking feedback and assistance from the instructors. The exercises must provide the students plenty of time to explore. And after the training is over, the students should be encouraged to continue exploring because at this point they are best beginning apprentices. This implies that the necessary hardware and software be available for the project team to experiment on, and that you should encourage the staff to spend part of their work week experimenting. It is particularly helpful if the examples and exercises are similar to the real application. For some project staff, the methods and tools may seem irrelevant unless they are motivated by meaningful examples. For an average project, it is unavoidable that 25% of the staff will be in the bottom quarter with respect to
CASE ADOPTION: A PROCESS, NOT AN EVENT
147
knowledge, or ability, or motivation. Clearly, you want to give the adoption every chance you can to succeed. Your earliest successful users on the project may be able to provide examples to help continue the training of later users.
7.5 Supporting Operational Use When discussing support of first operational use, we assume that you have already developed your HOTS methods, that you have designed and implemented all necessary HOTS tools, that you have acquired all necessary hardware and software, and that you have provided initial training to the users. If you have not done all of these things before you have real users, your chances for success have dimmed.
7.5.1 Support Categories There are quite a few skills required by the CASE team in order to support the first operational project. The following list indicates the nature of the skills. If we assume that you are going to introduce a new method, the Flog method, onto a project and that you are also introducing a tool which implements that method, you need all these skills available to support the project staff. You also need one or more talented individuals from the project who buy into “making CASE work.” If you are introducing multiple methods or multiple tools, you need expertise available in a correspondingly larger number of areas. 0 0 0
0 0 0
Flog method expert user Flog method guru Flog tool expert user Flog tool internals expert System manager Users from the project
This group of people will be responsible for handholding, ad hoc training, troubleshooting, and evolving the Flog method and tool. In the group, you need some talented individual with the ability to solve users’ problems of the type “the tools and/or methods and/or conventions won’t let me do what I need to.” This may be an education problem, but often it will highlight the need to evolve the methods or tools. For some problems, the team will need a deep understanding of the methods and tools, a sincere appreciation of the user’s concern, and considerable creativity to successfully eliminate the issue.
System Manager Support. Management and project staff are frequently amazed at how much system manager support is required to manage CASE tools,
148
JOCK A. RADER
especially for first operational use. CASE tools make major demands on system resources, including memory, processor speed, network bandwidth, disk storage, and operating system services. The resultant heavy loading increases the likelihood of unwelcome system events. The duties of the system manager are to install tools, to install new versions and bug fixes, to set user quotas and permissions, to perform backups, and to solve user problems. To the system manager, members of the CASE team are users as well as the project staff. Typical problems include: some feature won’t work, performance is terrible, data is lost, some tool will not come up, the print queue will not print, etc. Often these problems can be traced to the way a user’s account is configured-their quotas, their privileges, and their environment.In the author’s experience, account configuration problems will still be an annoyance even two years after project inception. For a project utilizing a file server and 5 to 15 workstations, you can expect to average 50% of a system manager for two years, with a higher percent required in the first six months.
Methods and Tools Expert Users. You need expert users of the methods and tools to consult to the users on the project and to perform continuing hand holding. Some user problems will be straightforward and require little expertise, just basic knowledge. Other problems may stretch the envelope of the methods and tools and will require creative response from a highly qualified expert user. As the project develops, much of this activity can be taken over by project staff as they become more experienced. Even more difficult problems will require attention from a methods guru or a tools internals expert. This could result in troubleshooting at the intemals level, or it could result in the adaptation and evolution of the methods and tools. This type of expertise should not be required as often as it was during preparation for the first victim. Still, when it is required it must be provided as quickly as possible.
7.5.2 The Value of Regular Meetings

When introducing new methods and tools (or for any organizational change), regular meetings which are attended by all affected staff can be highly effective for numerous reasons. In addition to all of the tactical reasons, these meetings are a tangible expression of management's interest in the changes and their dedication to making them work. ("Walk the talk.") At the beginning of the project, meetings which focus on the status of the adoption should occur at least once a week. The goals and benefits include:

• To constantly review what is working and what is not
• To review current status and problems
• To foster communication
• To show interest in both problems and successes
• To learn of and to address rumors
• To learn how people are using the tools
• To spot problems early
• To allow people to learn from and be motivated by their peers
Because I have often been an early adopter, I have many times been a first victim (sometimes self-inflicted). On those challenging occasions when my software component was one of the first to follow the new methods or to use new tools, there was almost always disharmony due to inconsistency among the new way, what I thought I needed to do, and what was feasible. It is only in an open discussion, where all points of view are represented, that workable compromises can be fashioned. After listening to my complaint, another user may explain how they avoided my particular problem, or management may clarify my (and others') murky understanding of their objectives, or I may have a suggested improvement, or the method or tool keepers may be able to suggest alternatives.

In fact, I have often been the inflictor of a new way. And I have never been happy with that new way during my first attempt to follow it. No matter how satisfied I am with the designed method or tool, I can only marvel at my lack of insight when applying it to real operational use, even after careful experiments. Real use, by myself and others, allows evolution toward a more usable and useful solution. It seems fairly obvious, then, that when the designer and the first users are not the same person, the need for orderly evolution can only be greater. By working the issues in open meetings, other users become more aware of the new ways, what the issues are, and how their peers are coping. They are also kept up to date on how the new way is evolving. Moreover, not only will they hear about the many problems, they will also hear about what is working, and receive useful hints on how to adapt their own efforts.
8. Expansion and Evolution: Second Victim and Beyond

To a first approximation, enlarging the scope of CASE adoption from a single project to many projects or a whole organization is just more of the same. Unfortunately, there are two conflicting trends that can seriously decrease the chances for success:

• The need for resources to support CASE adoption is likely to increase.
• Management's desire to provide resources is likely to decrease.
Management is always in need of resources, so the completion or near-completion of the adoption effort for the first project seems like it might be a
good time to reclaim the level of support previously enjoyed by the CASE team. Many managers still believe that adoption is an event: if it worked for one project, it will easily transfer to another. However, it is critically important that the sponsor stay sold and that management continues to support the CASE team. The CASE champion should treat maintaining the support of the sponsor as a top priority.
8.1 Enlarging the Scope

Enlarging the scope can take place in either of two dimensions or in both. One dimension is the scale of the organization that is adopting new technology: from a few people on a single project to a thousand or more in a large organization with many projects. A thousand people distributed over 20 projects is probably an even more formidable adoption task than if the entire thousand works on one project. Diverse application areas and diverse customers further increase the challenge. The second dimension is the scale of the technology to be adopted. If you followed the rule, "start small and grow," you probably afflicted your first victim with only a small number of tools, supporting one or two subprocesses. But there are a score of subprocesses into which you can venture with CASE tool support. And as the number of CASPs grows, the number of desirable interactions among them increases even more rapidly.

In Figure 5, we indicate how increases in both dimensions might interact. This figure is an extension of ideas appearing in Fowler and Levine (1992) and Adler and Shenhar (1990). Along the horizontal axis is a scale for organizational context, reflecting both size and complexity. The vertical axis is a double scale for level of effort and time. Each of the little rulers in the quadrant corresponds to the same five measures of the scale of the technology. Thus, for a fixed value of scale of technology, effort and time increase with larger measure of organizational context.

FIG. 5. Impact of technology complexity increased by organization complexity.
Immaturity of CASE Technology. If you plan to introduce tools to support many of your subprocesses in a software organization of a hundred people or more, your adoption effort will span years and will eventually blend into a continuing tools environment evolution effort. Because CASE technology is so young and dynamic, you will probably be using some tools in three years that don't exist today. By the same token, certain of the tools you use today will be obsolete and unsupported, and many vendors will have disappeared, perhaps folded into more successful companies. Consequently, the CASE team will find that it frequently must evaluate new products and new versions of existing products, as the market rapidly changes.

An interesting column (Cringely, 1994) in one of the trade magazines makes the analogy with the automobile industry. In the United States, there were 300 car makers in 1920, dropping to 25 by 1930, to 10 by 1940, and to only 3 today. It took 70 years to go from zero to 100 million cars, but less than 20 years to get to 100 million desktop computers. This underlines the dynamic nature of the CASE industry, which is barely 10 years old.

CASE integration technology is even more immature than CASE tools. There has been considerable research in integration techniques since the late eighties, but very little operational use. A recent report (Rader et al., 1993) indicates that operational use of integrated CASE tools is rare, even for point-to-point integrations. Adoption strategies based on use of integration services (sometimes called framework services) are high risk and will remain so for some time. Extensive evaluation and prototyping can help mitigate the risk. This translates into increased requirements for resources for the CASE team.

Culture Change in the Large. If you chose your first victim wisely, they were receptive to the tool adoption effort. Probably then, many of the project staff were either "early adopters" or members of the "early majority." As you move to spread the adoption across the organization, you will begin to encounter the members of the "late majority" and the "laggards," who together make up about half of your total staff. They will not be as receptive as the members of your first project. That is why success (publicized success) on the first project is so critically important.
CASE Team in the Large. The fundamental nature of most of the tasks performed by the CASE team is unchanged. They still have to provide consulting
and training, but now to multiple projects. The team will have to be competent with respect to a larger collection of methods and tools; and, doubtless, individual team members will have to broaden their personal sets of competence and expertise. The broadened experience base will help them to design new CASPs which are more generic, i.e., applicable to a wider group of projects.

Because adoption and evolution are a continuing process extending over the years, tool evaluations will always be an important part of the team's job. The highly dynamic nature of the CASE marketplace will guarantee that there are always more tools to be evaluated than can be accommodated in any likely budget. Therefore, the team will have to wisely choose which tools to evaluate. There will always be a tension between spending team resources to evaluate new tools and developing new capabilities, on the one hand, and supporting existing projects on the other. It is important that you do not allow a schism to develop between those two activities, because it is only by supporting projects that you can become steeped in what the real needs of the users are.

A fundamentally new and different responsibility that has to be accepted by the CASE team, if the adoption and evolution of CASE is to become a part of the culture of the organization, is to define the CASE evolution process. This would include tool evaluation, conducting pilot projects, CASP definition and implementation, and introduction onto new projects. In Lydon (1994), the author discusses how his company defined a process based on an SEI Level 5 KPA.
8.2 Build on Success: Maintain the Momentum

If you did most of the things recommended in this chapter, you should have fashioned a capable CASE team by the time your first operational project has put the methods and tools into regular use. Hopefully, you will be able to keep this team largely intact as you move on to subsequent projects. You also have helped to create a body of competent users on the project staff, who can assist you with the continuing adoption effort. In some cases, they will be willing to talk to new users about their experiences, to consult or to provide examples. Others may be willing to join the CASE team on either a full- or part-time basis.

It should be obvious that if there were 15 users of the new technology on the first project and you are about to expand to a total of 120 users on four projects, you need a bigger CASE team. If some of the tools will be different on the new projects or if there will be additional tools, even more tasks will have to be performed by the CASE team. For each of the subprocesses, you still need experts on the methods, expert users for each tool, and internals experts for each tool, plus all of the other expertise mentioned in the preceding section.

You will almost certainly find that the CASP that you built for first operational use will not work without modification for your second project. After open
discussion and negotiation, with both sides adapting, your second victim will doubtless find a better match than they originally thought. It is to be hoped that you will be able to modify your CASPs in such a way that the differences between the variants for the two projects are small compared to what is unchanged. (To review: a CASP is a set of methods and tools fashioned together to support a fragment or subprocess of your process.) But what about the third project? And the tenth? Ideally, you will eventually be able to design and build CASPs with smart tailoring characteristics that allow you to create a variant for a new project with a minimum of effort. However, if a new project intends to use object-oriented design, where previous projects have all used structured design, you would expect the OO design CASP to have fundamental differences from a structured design CASP.
9. CASE Adoption Summary

CASE adoption is a process, not an event, and as such will take place over an extended period of time. You should manage the adoption activity like a project with objectives and phases, with the objectives of the adoption effort in alignment with the objectives of the organization. The four phases we suggest are:

• Awareness
• Evaluation
• First victim (first operational use)
• Second victim and beyond
There are four necessary groups that must work together for successful CASE adoption:

• Sponsors, who furnish resources and legitimacy (often a single person)
• Champions, who maintain the project (CASE adoption) vision and are the spokespeople for the project (often a single person)
• Change agents, who perform the work of adoption
• Operational users, the presumed, eventual beneficiaries of adoption
These groups are identified in the literature of organizational change and hence are not unique to CASE adoption. There is much to learn about adoption from the discipline of organizational change, because no change is purely technical; all change is at least partially organizational. The organizational change literature is in agreement with the author's empirical observations in many important ways.

The CASE team is the group of individuals that perform the work of the adoption activity. They are responsible for methods training, methods extension, tool evaluation, tools training, tool extension, consultation to operational projects,
and environment evolution. The sponsor(s), the CASE champion(s), and the change agents constitute the team.

Evaluation should be done very carefully, because highly exaggerated and misleading claims are not unknown in the CASE marketplace. Whenever possible the CASE team should find existing users and interview their CASE team and their operational users in detail.

A project support environment (integrated or not) is an extremely complex undertaking and should be developed in an evolutionary manner. Start small, both in terms of initial users and with respect to number of tools, then grow. You should keep all affected parties involved throughout the adoption and evolution activities. CASPs (Computer Aided SubProcesses) provide a way to integrate process, methods, and tools, and provide a concept of operations for the methods and tools. They also facilitate evolutionary development because each CASP focuses on a single process fragment.

This final list summarizes many of the suggestions discussed in the chapter. They are based on experience with: (1) previous CASE adoption efforts, and (2) other technological change.

• Allow plenty of time for adoption-the larger the organization, the more time is required.
• Provide adequate resources-including workstations, staff time, licenses, training, and schedule.
• Develop and maintain a CASE team.
• Learn from the experiences of others and look at the details-you may learn more from their failures than their successes.
• Be very exacting as you evaluate vendor statements about functionality and availability-this will minimize unhappy surprises later when it is harder to compensate.
• Choose your first victim carefully-someone who is flexible, who is interested in new technology, and who has a high threshold for pain.
• Plan to fully support your first victim and to understand their needs, be flexible, and use common sense.
• Obtain and maintain sponsorship; it is required not only for resources but also for legitimacy.
• Establish both short-term and long-term goals.
• Plan to evolve-you will make plenty of regretful choices the first time.
The title of the chapter emphasizes that CASE adoption is a process, not an event. Hopefully, the contents of the chapter have provided insight into what that process should be. If process seems too formal a term, then an alternative
statement is that CASE adoption is a journey, not a destination. History has shown that it can be an adventure as well.

ACKNOWLEDGMENTS

The author wishes to acknowledge Dennis Kane of Hughes Aircraft and Mark Zelkowitz of the University of Maryland for their helpful comments in reviewing this chapter. He is also indebted to many of the staff at the Software Engineering Institute for numerous valuable ideas and insights received while a resident affiliate there, both in the fields of CASE technology and of technology transfer. The author acknowledges workers at other companies who have kindly shared their experiences, and most importantly, his colleagues at Hughes Aircraft, the change agents, toolsmiths, and victims, who by strenuous effort have provided the real material for this chapter: operational experience.
REFERENCES
Adler, P. S., and Shenhar, A. (1990). Adapting your technological base: The operational challenge. Sloan Manage. Rev. 32(1), 25-37.
ANSI (1992). "Recommended Practice for the Evaluation and Selection of CASE Tools," ANSI/IEEE 1209 (to be balloted as an ISO practice). ANSI, Washington, DC.
Belasco, J. A. (1990). "Teaching the Elephant to Dance: Empowering Change in Your Organization." Crown Publishers, New York.
Boehm, B. W. (1981). "Software Engineering Economics." Prentice-Hall, Englewood Cliffs, NJ.
Boehm, B. W. (1988). A spiral model of software development and enhancement. IEEE Comput. 21(5), 61-72.
Bridges, W. (1991). "Managing Transitions: Making the Most of Change." Addison-Wesley, Reading, MA.
Brown, A. W., Feiler, P. H., and Wallnau, K. C. (1991). "Understanding Integration in a Software Development Environment," Tech. Rep. CMU/SEI-91-TR-3. Software Engineering Institute, Carnegie-Mellon University, Pittsburgh, PA.
Brown, A. W., Carney, D. J., Morris, E. J., Smith, D. B., and Zarrella, P. F. (1994). "Principles of CASE Tool Integration." Oxford University Press, New York, NY.
Camp, R. C. (1989). "Benchmarking: The Search for Industry Best Practices that Lead to Superior Performance." ASQC Quality Press, Milwaukee, WI.
Card, D. (1994). Making the business case for process improvement. IEEE Software 11(4), 115-116.
Cringely, R. X. (1994). Notes from the field. Infoworld, March 28.
Curtis, W., Krasner, H., Shen, V., and Iscoe, N. (1987). On building software process models under the lamppost. In "Proceedings of the 9th International Conference on Software Engineering." IEEE Computer Society Press, Los Alamitos, CA.
DeMarco, T. (1978). "Structured Analysis and System Specification." Prentice-Hall, Englewood Cliffs, NJ.
Department of Defense (DOD). (1988). "Defense System Software Development," Mil. Stand. DOD-STD-2167A. DOD, Washington, DC.
Favaro, J., Coene, Y., and Casucci, M. (1994). The ESSDE experience: Can a SEE be designed for scalability? In "Proceedings of the 4th Irvine Software Symposium (ISS '94)," pp. 77-88. University of California, Irvine.
Firth, R., Mosley, V., Pethia, R., Roberts, L., and Wood, W. (1987). "A Guide to the Classification and Assessment of Software Engineering Tools," Tech. Rep. CMU/SEI-87-TR-10. Software Engineering Institute, Carnegie-Mellon University, Pittsburgh, PA.
Fowler, P. J., and Levine, L. (1992). "Software Technology Transition: Life Cycles, Models, and Mechanisms," Bridge, Oct. 1992. Software Engineering Institute, Carnegie-Mellon University, Pittsburgh, PA.
Fowler, P. J., and Maher, J. H., Jr. (1992). "Foundations for Systematic Software Technology Transition." SEI Tech. Rev. '92. Software Engineering Institute, Carnegie-Mellon University, Pittsburgh, PA.
Gilb, T. (1985). Evolutionary delivery vs. the waterfall model. ACM Software Eng. Notes 10(3), 49-61.
Gilb, T. (1988). "Principles of Software Engineering Management." Addison-Wesley, Wokingham, England.
Glass, R. L. (1989). How about next year? A look at a study of technology maturation. J. Syst. Software 9(3), 167-168.
Hatley, D. J., and Pirbhai, I. A. (1987). "Strategies for Real-Time System Specification." Dorset House, New York.
IEEE (1994). "Recommended Practice [draft] for the Adoption of CASE Tools, Draft 5," IEEE P1348. IEEE, New York.
Lydon, T. (1994). Techniques for managing software technology change. In "Proceedings of the 6th Annual Software Technology Conference," Track 4. Software Technology Support Center, Hill Air Force Base, UT.
McSharry, M. (1994). At workshop, NASA promotes SEL process. IEEE Software 11(3), 105-106.
Microsoft (1993). "User's Guide, Microsoft Word." Microsoft Corporation, Redmond, WA.
Myers, W. (1992). Good software practices pay off or do they? IEEE Software 9(2), 96-97.
NIST/ECMA (1991). "Reference Model for Frameworks of Software Engineering Environments," Tech. Rep. ECMA TR/55, 2nd ed., NIST Spec. Publ. 500-201. U.S. Government Printing Office, Washington, DC.
Paulk, M. C., Weber, C. V., Garcia, S. M., Chrissis, M., and Bush, M. (1993). "Key Practices of the Capability Maturity Model, Version 1.1," Tech. Rep. CMU/SEI-93-TR-25. Software Engineering Institute, Carnegie-Mellon University, Pittsburgh, PA.
Pressman, R. S. (1988). "Making Software Engineering Happen." Prentice-Hall, Englewood Cliffs, NJ.
Rader, J. A. (1991). Automatic document generation with CASE on a DOD avionics project. In "Proceedings of the 10th Digital Avionics Systems Conference," pp. 305-310. IEEE Computer Society Press, Los Alamitos, CA.
Rader, J. A., Brown, A. W., and Morris, E. J. (1993). "An Investigation of the State of the Practice of CASE Integration," Tech. Rep. CMU/SEI-93-TR-15. Software Engineering Institute, Carnegie-Mellon University, Pittsburgh, PA.
Redwine, S., and Riddle, W. (1985). Software technology maturation. In "Proceedings of the 8th International Conference on Software Engineering," pp. 189-200. IEEE Computer Society Press, Los Alamitos, CA.
Rumbaugh, J., Blaha, M., Premerlani, W., Eddy, F., and Lorensen, W. (1991). "Object-Oriented Modeling and Design." Prentice-Hall, Englewood Cliffs, NJ.
Wood, D. P., and Wood, W. G. (1989). "Comparative Evaluations of Four Specification Methods for Real-Time Systems," Tech. Rep. CMU/SEI-89-TR-36. Software Engineering Institute, Carnegie-Mellon University, Pittsburgh, PA.
Wood, W. G., Pethia, R. D., and Firth, R. (1988). "A Guide to the Assessment of Software Development Methods," Tech. Rep. CMU/SEI-88-TR-8. Software Engineering Institute, Carnegie-Mellon University, Pittsburgh, PA.
Zelkowitz, M. W. (1993). Software engineering technology transfer: Understanding the process. In "Proceedings of the 18th NASA Software Engineering Workshop," pp. 450-458. NASA, Greenbelt, MD.
On the Necessary Conditions for the Composition of Integrated Software Engineering Environments
DAVID J. CARNEY AND ALAN W. BROWN
Software Engineering Institute
Carnegie-Mellon University
Pittsburgh, Pennsylvania
Abstract

This chapter explores the conditions necessary for integration in a software engineering environment. Integration is considered to be a primary "quality attribute" of an environment, and is defined by the mechanisms that implement it, the services that it provides, and the process constraints that provide its context. This chapter argues that if composition of integrated environments is to become a reality, the necessary first condition is the definition and eventual standardization of interfaces at many functional boundaries not currently exploited today.
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
   1.1 A Brief Description of a Software Engineering Environment . . . . 159
   1.2 An Overview of the Integration Problem . . . . . . . . . . . . . 161
2. A Three-Level Model of Software Engineering Environments . . . . . . 162
   2.1 Integration in Terms of the Three-Level Model . . . . . . . . . . 164
3. The Mechanisms and Semantics of Integration . . . . . . . . . . . . . 165
   3.1 Integration Described as a Set of "Dimensions" . . . . . . . . . 165
   3.2 Integration Described as a Relationship . . . . . . . . . . . . . 167
   3.3 Integration Described by Its Semantics and Mechanisms . . . . . . 168
   3.4 Summary of Semantic and Mechanism Integration . . . . . . . . . . 172
4. Integration in Practice: Process Aspects of Integration . . . . . . . 172
   4.1 Common Practical Means of Integration . . . . . . . . . . . . . . 173
   4.2 The Process Context of Integration . . . . . . . . . . . . . . . 174
   4.3 Qualities of Successful Processes . . . . . . . . . . . . . . . . 175
   4.4 A Practical Example . . . . . . . . . . . . . . . . . . . . . . . 176
   4.5 Pragmatic Solutions . . . . . . . . . . . . . . . . . . . . . . . 179
5. The Conditions Necessary for Integration . . . . . . . . . . . . . . 179
   5.1 The Proper Locality of Software Interfaces . . . . . . . . . . . 180
   5.2 The Search for Standards . . . . . . . . . . . . . . . . . . . . 182
   5.3 The Interaction of Interfaces and Standards . . . . . . . . . . . 183
6. Toward Engineered Environments . . . . . . . . . . . . . . . . . . . 185
7. Summary and Conclusions . . . . . . . . . . . . . . . . . . . . . . . 186
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
1. Introduction

Integration of software components has been the subject of considerable attention over the last decade. Particularly in the domain of software engineering, integration of a set of useful engineering tools has been seen as a means both to improve the practice as well as to reduce the expense of building software. The tools are most often called computer-aided software engineering (CASE) tools, and the integrated collections of tools are generally called software engineering environments (SEEs), or often just called "environments."

The notion of an integrated environment, while attractive, has proven remarkably difficult to implement in practice. Exemplified by the Ada-centric environments [the Ada Integrated Environment (AIE) and Ada Language Systems (ALS)] in the early 1980s (Morris et al., 1992), many projects within government and industry have set out in pursuit of this goal: Aspect (Brown, 1991), Boeing Advanced Software Environment (BASE) (Jolley and Gockel, 1988), Environment for Advanced Software Technology (EAST) (Bourguigon, 1989), Enterprise II (Long and Morris, 1993), Software Life Cycle Support Environment (SLCSE) and Process SLCSE (ProSLCSE) (Strelich, 1988), the Special Working Group (SWG) on Ada Programming Support Environments (APSE) of the North Atlantic Treaty Organization (NATO) nations, and several other such projects were all, in one way or another, attempts to define and implement an integrated set of software engineering tools. Each of these projects had laudable features; yet real and genuine success-evidenced by an environment that measurably facilitates software engineering and that has a loyal and growing user base-eluded all of these efforts. Nonetheless, the perceived benefits of integrated environments have been desirable enough that the quest for them continues to drive many government and industry initiatives.

There are several desirable features or attributes that an environment should exhibit. Integration is perhaps the most desirable attribute, as indicated by its staying power as the "I" in such acronyms as Integrated Project Support Environment (IPSE), Integrated CASE (I-CASE), Integrated Software Engineering Environment (ISEE), etc. However, there are a number of other "quality attributes," such as adaptability, reliability, maintainability, flexibility, and other such "ilities" (International Standards Organization, 1991) that are now perceived as of comparable significance in how they contribute to the goal of a useful engineering environment. Some of the latest directions in environment work-such as the development of process-centered or reuse-driven environments-pertain to the
increasing interest in new paradigms of software development. Other directions-principally interest in environments constructed largely of collections of commercial components-grow from a desire to change the current process of acquisition, to ease the ongoing maintenance burden, and to reduce the cost of introducing and using an environment.

This recent emphasis on use of commercially available products rather than special-purpose components is paralleled by an increased interest in standards. Standards, particularly interface standards, are seen as having a beneficial effect on acquiring components that can easily be used in combination. Widely accepted standards have been suggested as a primary means of achieving "openness," and several significant attempts have been made to assemble lists of standards [e.g., the National Institute of Standards and Technology (NIST) Application Portability Profile (1993)]. One goal of such lists is to find groups of compatible standards that span a range of necessary environment functionality. Further, although seldom explicitly asserted, the "open" quality that would result from the use of standards is usually assumed as somehow providing the ability to "plug-and-play" tools or components. An "open" environment is expected to permit new tools to be added or removed with little effort on behalf of the integrator, and with no effects on the remaining tools in the environment.

It is the argument of this chapter that tool integration and achieving "openness" through standards are only parts of a more complex problem; further, that to focus either on integration or standardization alone is inherently misguided. We argue that to achieve successful composition of integrated environments, it is first necessary to understand the interactions of the semantic, mechanistic, and process-related aspects of integration. We then suggest that this understanding will result in a focus on many software interfaces now considered "low-level." Thus, exposure and eventual standardization of these low-level interfaces-and the tacit assumption that software will be engineered according to this understanding-are the necessary conditions whereby integration, as well as such other quality attributes as flexibility and adaptability, can become real characteristics of commercially viable SEEs.
1.1 A Brief Description of a Software Engineering Environment

The role of a SEE can very clearly be stated in terms of the automated support for the development and maintenance of software systems. However, the characteristics of a SEE that fulfils this role are much more difficult to define. In particular, a SEE can be considered from many viewpoints, each of which highlights different characteristics of interest. Perhaps the most obvious, for example, is to consider a SEE to be a collection of commercial products that make use of operating system facilities to execute on one or more computers.
While such a view provides a product-oriented perspective on a SEE, in this paper we consider a SEE primarily from a service-oriented perspective. That is, a SEE is viewed as providing services that can be used in the development and maintenance of software systems. The services may be implemented in a variety of ways, and be used by many and varied software organizations. Furthermore, we make a basic distinction between two classes of services offered by a SEE: those that are provided primarily to support end-users in the tasks of developing and maintaining software systems (called end-user services), and those that are provided primarily in support of those end-user services (called framework services). This distinction is useful because it recognizes the different roles played by products that form a SEE. The separation of SEE services into end-user and framework is most clearly illustrated in the reference models supported by the European Computer Manufacturers Association (ECMA) and NIST (ECMA & NIST, 1993; SEI & NIST, 1993). These services are summarized in Fig. 1. These reference models describe sets of services that may be found in any SEE. The models can be used (and have been used) as a checklist as part of an evaluation of SEE products, or as the basis for constructing a SEE acquisition strategy. It is important to emphasize, however, that a service-oriented view of a SEE is useful only insofar as it highlights the functionality that may be implemented in a SEE. Such a view must be augmented with other views that illustrate other SEE characteristics (e.g., the software and hardware components that implement the SEE and their detailed interactions).
FIG. 1. A summary of SEE services: Support Services (e.g., publishing, communication, administration); Project Management Services (e.g., planning, cost estimation); Technical Management Services (e.g., configuration management, change management); Technical Engineering Services (e.g., system engineering, software engineering); and Framework Services (e.g., object management, process management, operating system).
1.2 An Overview of the Integration Problem

We concentrate our attention on environments composed of collections of components drawn from different commercial sources. While environments can be constructed as single-party, proprietary, or "black box" solutions, we are interested in the more general problem faced by many organizations. As a starting premise, we take the view that integration is a desirable condition in an environment. Hence, in our view of an environment, integration implies cooperative interaction between the heterogeneous components.

Integrated interaction of heterogeneous components can be observed as common in most engineering disciplines. It makes possible the engineer's main task of specifying and composing a system, and of choosing appropriate parts and tools to accomplish this. Whether the components of the system are small (e.g., screws, plugs) or large (e.g., girders, engines), it is generally the case that an engineer can select components and tools from a variety of possible sources. Multiple tools are available that exhibit comparable functionality; it is expected that a mix of tools will not (typically) cause fundamental problems; and replacement of one supplier's components with another supplier's components should not (typically) invalidate the tools that operate on them, nor invalidate the system being constructed. The proper joining of the components and tools is assured because they offer well-defined interfaces to each other (e.g., the plug has prongs and the socket has holes), and the makers of components on both sides of the interface abide by standardized agreements about the interface's dimensions (e.g., the plug has a specific number of prongs, each of a specific shape and size, that fit the socket's holes). A tool or component has a set of implicit and explicit uses; there is, whether stated or not, a set of assumptions about sequence, priority, and appropriateness (i.e., some "process") that underlies both the independent use of tools and components as well as the construction of the system. Finally, maintenance and upgrade of the system depend on the same assumptions and agreements.

The point of this brief description is to indicate how integration, a desirable goal, is inseparable from other factors. The harmonious interaction of heterogeneous components (which is one way to view their "integration") is essentially a product of these other factors-well-defined interfaces, whose definition is based on widespread agreements that eventually may become officially standardized; and interfaces that support an underlying defined process. Integration is the state or condition that results when these other factors properly cooperate. Thus, integrated environments cannot be achieved simply by taking tools and "adding integration." Similarly, the pursuit of standards per se is fruitless without considering the locality of the software interfaces needing standardization, as well as the essential centrality of the processes that the interfaces will support. With
reference to the problem of building software engineering environments, these are all intrinsically related issues, and are not amenable to piecemeal solution.

The remainder of the chapter examines the premise that integration is inseparable from these other factors. Section 2 defines a three-level model of a software engineering environment; this model provides a conceptual basis for the following sections. Section 3 examines the semantic and mechanistic aspects of integration. Section 4 examines the process-related aspects of integration. Section 5 discusses how all of these aspects relate to the locality of interfaces and to standardization, and act as the necessary condition of integration. Section 6 summarizes the argument of the chapter.
2. A Three-Level Model of Software Engineering Environments

In order to discuss any quality attributes of an environment, and particularly the notion of integration, we need an abstract model of an environment that is general enough to permit consideration of the environment's components, their interconnections, and the circumstances in which the environment will be used. The model we propose distinguishes three levels at which an environment can be discussed.

• At one level the environment consists of the services it makes available to environment users. We consider a service to be an abstract description of a discrete capability of a software engineering environment.
• At another level the environment can be described in terms of the mechanisms that implement those services. Conversely, the services can be said to depend on some set of mechanisms.
• At yet another level the environment can be defined by the process being enacted, that is, the set of goals and constraints of a project or organization and the desired sequence of project activities. The environment's services support the process being enacted; conversely, the process constrains the set of services provided by the environment.
Together, these three levels of description and their interrelationships provide the context in which many aspects of an environment can be described. Figure 2 illustrates this model.

FIG. 2. A three-level conceptual model of a SEE. Abbreviations: BMS, broadcast message server; RDBMS, relational database management system.

Looking at these three levels in more detail, we see that the mechanisms level includes the architectural concerns and the technology components that will comprise the integrated environment. Concerns at this level include implementation issues, such as the software interfaces provided by the environment infrastructure (e.g., operating system interfaces), the software interfaces provided by individual tools in the environment, and the specific integration mechanisms that will be used to connect the various environment components. The mechanism level also includes many architectural concerns of the tools, such as their internal structure (e.g., client/server) and data management structure (e.g., data dictionary, database); this level is concerned with the technology available and with the techniques that can be applied to connect the different environment components.

The services level corresponds to an abstract description of the functionality that the environment offers to its users. Concerns at this level include the specification of the services provided by an environment, and how (at an abstract level) those services relate to each other. Integration concerns at this level may include such things as defining the relationships between the version control and the data management services, or describing the way in which a specification and a design service are related through traceability design components. Note that we do not directly equate a "service" with the term "tool." We avoid this partly because the term "tool" has lost any useful precision; it can span the spectrum from a two-line shell utility to a massive product that verges on a full environment. But more importantly, the notion of "service" is an abstract one; the actual implementations of services in our model exist not here but at the mechanism level. At the service level, we are interested in the descriptions of the individual services provided and with their logical connections. At the mechanism level the principal interest is the provision of the individual service and the complexities of connection that the service provider offers to other components.

The processes level corresponds to how services cooperate in the context of a particular software development process. The focus of this level is the process
specification for how software will be developed. This specification can define a view of the process from many perspectives, spanning individual roles through larger organizational perspectives. For example, the way in which design documents gain approval before implementation begins is a process constraint that affects the way that services are integrated. Another example might be the bug tracking and version control process that must be followed when errors are found in existing code. There may be a wide range of activities supported, with many degrees of freedom in the use of the environment, or the environment may be restricted to a smaller number of well-defined processes.

Although these levels are conceptually separate, there are important relationships between them. The relationship between the mechanism level and the services level is an implementation relationship: services may be implemented by different mechanisms, and conversely, a single mechanism may implement more than one service. In this way, the abstract functionality of an environment at the services level can be described in terms of the concrete mechanisms that implement those services, and on which the services depend. The relationship between the services level and the processes level is a supporting relationship. The processes that are to be supported act as a set of guidelines and constraints for the combination of services. For example, the way in which the design and the coding services interact is a function of the development process that is to be supported. Also, the way in which they interact in support of a classic waterfall style of software development may be very different from their interaction in support of a prototype-oriented style of development. The differences between these processes will constrain the environment's services, and particularly how they are integrated.
2.1 Integration in Terms of the Three-Level Model

By using this three-level model, many facets of an environment can be discussed. An environment's functionality can be related to the tools and integration technology that realize it; the organizational and development processes that constrain and guide the use of environment services can be discussed in terms of their effects on those environment services. Most of all, however, this model provides a conceptual framework for understanding environment integration, and we use it as a working context and expand on its meaning and application in the rest of this chapter. To begin, we briefly summarize the integration concerns at each of the three levels.

At the services level we are primarily concerned with behavioral aspects of an environment. That is, the services that an environment provides can be considered to be an abstract description of the functionality that the environment provides. In considering integration at this level, we focus upon the semantics of the integration, as found in the agreements that relate the supported operations
of the environment. For example, consider the configuration management (CM) services that an environment provides: when discussed at the service level, we can focus on the model of CM that is supported [e.g., composition model, or change set model (Cagan, 1990; Feiler, 1991)], on the definition of the CM operations that the environment supports (e.g., check in, check out), or on the relationship of CM support to other services (e.g., CM and data management). In the remainder of this chapter we shall use the term "semantics" when integration at the services level is discussed.

Our main concern at the mechanism level is with implementation details. In discussing integration at this level we must be concerned with the details of the architecture of the environment (i.e., the logical and physical design of the environment, and the mapping between these two). Examples of the integration mechanisms of interest are the use of a software bus for event synchronization, and persistent shared data through a common database. The way in which combinations of such mechanisms interact in providing the services offered by the environment must be considered when designing or analyzing a SEE.

At the process level the development methods (and their constraints) employed by an organization are of concern. There is a wide variety of elements that must be considered at this level. This includes the various engineering processes the organization follows (e.g., a bug tracking process, a design review process, and a product release process), the organizational process that guides the organization at an enterprise level (e.g., project management, and organizational process improvement), and the personnel issues that affect the success of introducing new tools and techniques into the organization (e.g., training needs, and organization culture and climate). The relationships between these many and diverse elements are the basis for considering integration at the process level. The services provided by the environment must be used in support of these process constraints and needs.
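The separation between the services level and the mechanism level can be suggested by a small, purely illustrative sketch. The class and operation names below are our own assumptions for this example; they are not drawn from the chapter's reference models or from any SEE product. The same abstract CM capability is described once and then realized by two different mechanisms:

# Hypothetical sketch: one service-level description of CM,
# two possible mechanism-level realizations. All names are invented.
from abc import ABC, abstractmethod
import glob
import os

class CMService(ABC):
    # Services level: the abstract check-out/check-in capability offered.
    @abstractmethod
    def check_out(self, item_id: str) -> str: ...

    @abstractmethod
    def check_in(self, item_id: str, content: str) -> None: ...

class InMemoryCM(CMService):
    # Mechanism level: versions held in a simple in-process store.
    def __init__(self):
        self.versions = {}  # item_id -> list of successive versions

    def check_out(self, item_id):
        return self.versions.get(item_id, [""])[-1]

    def check_in(self, item_id, content):
        self.versions.setdefault(item_id, []).append(content)

class FileStoreCM(CMService):
    # Mechanism level: each version kept as a numbered file on disk.
    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _version_files(self, item_id):
        return sorted(glob.glob(os.path.join(self.root, item_id + ".*")))

    def check_out(self, item_id):
        files = self._version_files(item_id)
        return open(files[-1]).read() if files else ""

    def check_in(self, item_id, content):
        n = len(self._version_files(item_id)) + 1
        with open(os.path.join(self.root, "%s.%04d" % (item_id, n)), "w") as f:
            f.write(content)

A process-level constraint, such as a rule that code may be checked in only after its design has been approved, would then be stated against the service operations rather than against either mechanism; this is the sense in which the three levels are related yet distinct.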
3. The Mechanisms and Semantics of Integration

We begin by considering in more detail the issues of mechanisms and semantics of integration: the mechanisms that implement the environment's integration and the semantics that affect it. We must first examine how tools might be integrated in terms of the underlying mechanisms, then examine the required semantics for what is integrated. We begin by reviewing two classical approaches to defining integration, since these approaches have contributed significantly to our own ideas. We then describe our notion of the mechanisms and semantics of integration.
3.1 Integration Described as a Set of "Dimensions"

A seminal paper by Wasserman (1990) viewed integration as a set of dimensions that can be applied to an environment. These orthogonal dimensions of
integration allow the environment to be assigned attributes in each of the dimensions. Wasserman proposed five dimensions of integration, three of which were discussed in detail. Control integration relates to intertool coordination, data integration relates to information sharing, and presentation integration refers to user interface consistency and sophistication. The essential idea of Wasserman's paper is that any environment can be evaluated for its approach to integration in each of these dimensions. For example, in the data integration dimension, one environment may use a file system, while another uses a database. These would be said to have different data integration attributes. By examining an environment in each of the dimensions, it is then possible to define the set of integration attributes of that environment that characterizes its approach to integration. This allows some measure of comparison of different environments in terms of their integration approach.

Figure 3 illustrates three of the integration dimensions proposed by Wasserman. The dimensions are shown as orthogonal, and have gradations marking various points along each of the axes. Wasserman subsequently refined and expanded his proposed integration dimensions to include platform integration (system support services) and process integration (support for a well-defined development process).

FIG. 3. Three integration dimensions (presentation, data, and control).

While the dimensional view of integration has merit in separating a number of integration issues that previously had been confused, it is also somewhat problematic. One question that immediately arises is whether the dimensions are truly orthogonal, and whether they can (or should) be considered separately. For instance, it is at least arguable that there is a close and complex relationship between data and control integration that suggests that these two are not truly separable. For example, restricting access to shared data is undoubtedly a form of control that synchronizes interaction with the data, while a control message sent from one tool to another will most often contain or refer to one or more data items, and is dependent on format of arguments, parameters, etc.

Another problem with the dimensional view is that the items that populate each axis are commonly ranked in an arbitrary progression toward greater sophistication, leading to an interpretation that greater sophistication is equivalent to a "greater" or "better" level of integration. In fact this is not necessarily so. Certainly the mechanisms are more sophisticated in terms of the integration approaches they support. However, the important role played by the semantic agreements between tools (e.g., agreeing on what the shared data, messages, or screen icons actually mean) is not addressed in this view.
3.2 Integration Described as a Relationship

Work by Thomas and Nejmeh (1992) focuses on integration as a property of the relationship between two or more environment components. They take this approach to highlight that integration addresses the way the components interact rather than describing characteristics of a single component. It is this assembly of components that is the key to a well-integrated environment. Thomas and Nejmeh identify several types of inter-component relationships that are of particular importance:

• Tool-to-tool relationships. Thomas and Nejmeh expand on Wasserman's dimensions of integration by discussing the mechanisms through which individual tools interrelate.
• Tool-to-framework relationships. As the tools are hosted on some framework component (e.g., a database system or an operating system), the extent to which each tool makes use of the framework's services is significant.
• Tool-to-process relationships. How well each tool supports the process being carried out is another relationship of interest. The relationship may be expressed in terms of a tool's support for an individual step within the software life cycle (e.g., requirements definition), and its support for multiple steps (e.g., the requirements, design, and coding steps).
A useful distinction that is made in this work is between “well integrated” and “easily integrable.” An environment can be well integrated with respect to how end-users of the environment carry out their application development tasks. For example, there may be a consistent and intuitive user interface for interaction with all tools in the environment. On the other hand, a well-integrated environment is not necessarily easily integrable with respect to how easy it is for the environment builders and administrators to assemble the environment, tune it for particular needs, and replace one tool with another. To a large extent these two views
of integration are independent; there is no guarantee that an environment that is easy to assemble is enjoyable and productive to use, or vice versa.

Thomas and Nejmeh's view of integration is illustrated in Fig. 4, which shows their view of integration as a relationship between components, and which also shows the various kinds of relationships that may exist.

While this view of integration is fruitful, it has at least two limitations. First, the integration relationships are expressed as "goals" that an environment may achieve. Unfortunately there is no discussion in their work about how to achieve these goals, what dependencies lie between them, and what tradeoffs must be made. Second, a tool is not equivalent to an end-user service. Rather, a tool may implement part of a service, or many services. But in using this approach to analyze an environment composed of a number of tools, there is no guidance on which of the many possible tool relationships are of most interest. While an integrator could use this approach to consider the potential relationships between every pair of tools in the environment, there is little direction in addressing the environment as a whole. And in reality, in an actual environment, the potential relationships between a pair of tools are heavily influenced by the other components of the environment.
3.3 Integration Described by Its Semantics and Mechanisms

Having considered both of these valuable views, we now build on them to examine the semantic and mechanistic aspects of integration in greater detail. We start with a simple strategy: supposing that integration is concerned (at least) with the sharing of information, we must determine: What information is shared? How is the information shared? Who is sharing the information? When is the information shared? For the moment, we concentrate on the first two questions ("What" and "How"), since they focus respectively on the issues of semantics and mechanisms of information sharing. The other questions ("Who" and "When") will be considered in the following section when we discuss the issue of process definition and enactment.

FIG. 4. Relationships among environment components (tool-to-tool control, data, and presentation relationships; the tool-to-framework relationship; and tool-to-process relationships across life-cycle steps and stages).
3.3.1 Semantics: What Information Is Being Shared

Of the types of information that may be shared, the range is obviously very broad, including application data, metadata (descriptions of the structure and relationships among data items), documents, electronic designs, graphics, hardware designs, product descriptions, and code. But whatever types are of concern, the question of "What is being shared" is not answered merely by describing the shared data; the question is actually more complex, being inseparable from the forms of agreements that exist between the components that do the sharing. The agreements might be as simple as agreeing which port number will be used in a remote procedure call access or as complex as an information model that defines the overall business functions of the whole organization. Given that this range of levels of agreement can exist, it is useful to categorize those levels in some way. The following five levels are not intended to form hard categories, but rather to suggest the range of possible agreements.

• Carrier level-By analogy with the use of the term in electronic communications, carrier level agreements allow data to be communicated between two agents (i.e., tools or users) without any agreement on the syntax or semantics of the data being shared. An example is the use of a common byte stream between tools.
• Lexical level-When the agreement includes a common understanding of the basic tokens of data being shared, we can say that a lexical level of agreement exists. Comma-separated lists of items are an example of a particularly simple form of lexical agreement that can be used. Similarly, an agreement over how many bits constitute a byte, whether there is odd, even, or no parity, and so on.
• Syntactic level-If the agreement extends to a common syntax for the shared data, then we can say there is a syntactic level agreement. Some of the structure of the data being transmitted is agreed, but the meaning and implied behavior is not shared. An interface such as the Structured Query Language (SQL) is an example of a syntactic level agreement.
• Semantic level-For a more meaningful exchange of information to take place there must be semantic level agreement. Here, there is a shared understanding of what the data items being shared actually mean. A common data schema used by a set of tools is an example of a semantic level agreement.
• Method level-At times the agreement extends to the point where there is a shared understanding of the method, or context, in which the agreement has taken place. This is called method level agreement. The agreements being drawn up by the American National Standards Institute (ANSI) X3H6 group are representative of the attempts being made to define a set of common semantics for events that allow tools to expect certain behaviors from others with which they communicate.
As noted above, these categories do not have hard boundaries, nor are they necessarily independent. In fact, it is probable that any actual sharing of information between two tools will involve agreements at more than one of these levels. It is tempting to conclude from these levels that integration in an environment should always take place at as high a level as possible (i.e., at the method level). However, there are important consequences that must be considered as the sophistication of the level of agreement increases. In particular, higher-level agreements are likely to be more complex and more closely related to the specialized needs of the tools. As a consequence, they suffer from a number of drawbacks: making such detailed agreements between different tools (and consequently different tool vendors) can be very difficult, the agreements can be slow to evolve over time, and they can be so specific to the needs of a small group of tools that they have limited appeal. In fact, it has been noted that increased levels of agreement between integrated tools can often be a major inhibitor to the openness that environments seek (Brown and McDermid, 1992).
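To make the lower levels of agreement concrete, the following Python sketch shows the same record exchanged under carrier, lexical, syntactic, and semantic agreements; the record fields, the comma-separated layout, and the shared schema are entirely hypothetical and serve only to contrast the levels.

import json

record = {"module": "parser.c", "loc": 412, "author": "jdoe"}

# Carrier level: an uninterpreted byte stream; the receiver knows nothing
# about tokens, structure, or meaning.
carrier = json.dumps(record).encode("utf-8")

# Lexical level: the tools agree only on tokenization, here a
# comma-separated list of values in a fixed order.
lexical = "parser.c,412,jdoe"

# Syntactic level: the tools agree on a grammar (JSON here), so structure
# can be recovered, but field meaning is still a private convention.
syntactic = json.loads(json.dumps(record))

# Semantic level: the tools additionally share a schema fixing what each
# field means and which values are legal (a stand-in for a common data model).
SCHEMA = {"module": str, "loc": int, "author": str}

def conforms(data: dict, schema: dict) -> bool:
    """Check a record against the shared schema."""
    return set(data) == set(schema) and all(
        isinstance(data[k], t) for k, t in schema.items()
    )

if __name__ == "__main__":
    print(len(carrier), lexical)
    print(conforms(syntactic, SCHEMA))   # True: semantic-level agreement holds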
3.3.2 Mechanisms: How Information Is Shared

The approaches for how information sharing is accomplished, i.e., the mechanisms that accomplish it, fall into four common classes: direct tool communication, direct data transfer, common data store, and event coordination mechanism. We briefly review each of these in turn.
3.3.2.1 Direct Tool Communication. Direct tool communication is the most common way that tools exchange information. Most often the communication takes place via unpublished, vendor-specific interfaces. Typically, a tool developer identifies an area of functionality that its tool does not address and forms what is often called a strategic alliance with another tool developer. They then collaborate to share knowledge about their tools, provide or tune interfaces to their tool's functionality, and develop any additional code they require to enable the tools to work together. This approach has been seen between software design and documentation tools, software requirements and software design tools,
software design and coding tools, and so on. Documented details of the internal interfaces used are seldom made public, but are considered to be the "value added" from purchasing the integrated tools. Where the interfaces are publicly known, the tool developers rely on the fact that detailed knowledge on how to use those interfaces must be acquired before they can be effectively used. The specific mechanisms in question could include Remote Procedure Calls, broadcast messages [e.g., Hewlett-Packard's (HP's) Broadcast Message Server (BMS) (Cagan, 1990)], or object requests [e.g., an implementation of the Common Object Request Broker Architecture (CORBA) (STARS Program Office, 1992)].
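As a deliberately simplified illustration of direct tool communication, the sketch below uses Python's standard xmlrpc machinery as a stand-in for a vendor-specific remote procedure call interface; the tool roles, the exported list_modules operation, and the port are hypothetical.

from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy
import threading

# "Design tool" side: exports one operation through an agreed RPC interface.
def serve_design_tool(port=8000):
    server = SimpleXMLRPCServer(("localhost", port), logRequests=False)
    # Hypothetical interface: return the module names in a design model.
    server.register_function(lambda model: ["parser", "scanner"], "list_modules")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

# "Documentation tool" side: calls the design tool directly over the wire.
if __name__ == "__main__":
    server = serve_design_tool()
    design_tool = ServerProxy("http://localhost:8000")
    for module in design_tool.list_modules("payroll-design"):
        print(f"Generating skeleton documentation for {module}")
    server.shutdown()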
3.3.2.2 Direct Data Transfer. In this approach, when information sharing is required between two tools, the data is written out by one tool in an agreed format and read in by the other tool. The tools can maintain information in an internal form appropriate to the tool's function and translate data to and from the external format for communication with other tools. As a result, the tools can individually maintain data in different specialized formats, increasing the speed and efficiency at which they perform their particular tasks. A consequence of this approach is that each of the tools must provide services that allow the conversion to and from the external format. Depending on the complexity of the transfer format, the parsing and translating of data may be a nontrivial task in terms of time and system resources. This may have an impact on the tools' overall performance and also may constrain the frequency and quantity of information transfer.

3.3.2.3 Common Data Store. Another approach is to maintain data structures and formats as a separate artifact under external control; a common data store can be provided that is external to the tools that wish to share information. The main difference between this approach and the direct data transfer approach is that the common representation that is used to share data now persists in this form in secondary storage. The expectation is that this persistent, canonical form is used as the primary means of data storage for each tool. The specific mechanisms could vary from simple use of the native file store to a sophisticated data repository together with a set of agreed-on schema definitions [e.g., an implementation of the Portable Common Tool Environment (PCTE) with a shared set of schema definition sets (Wakeman and Jowett, 1993)].

3.3.2.4 Event Coordination Mechanism. The basis of this approach is that tools inform other interested tools when important events are occurring, thereby ensuring appropriate coordination between the tools. For example, the check-in of a new source module may be of significance to a metrics tool, to a source code analysis tool, to a project management tool, and so on. Knowing that this event has occurred enables information sharing to take place at the
appropriate times. Specific mechanisms that could implement this include semaphores, operating system signals or interrupts, or the notification mechanisms provided by some repository implementations. This approach is orthogonal to the three noted above: there is still the necessity for some agreement about format and structure, both for the data to be shared and for the messages that provide the coordination of events.
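As a rough illustration of event coordination (not any particular product's notification service), the following sketch builds a minimal in-process event bus in Python; the event name, the subscribing tools, and the message fields are hypothetical, and the point is only that interested tools react when an announced event occurs.

from collections import defaultdict

class EventBus:
    """A minimal in-process stand-in for an environment's event
    coordination mechanism (message server, repository notifier, etc.)."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_name, handler):
        self._subscribers[event_name].append(handler)

    def notify(self, event_name, **details):
        for handler in self._subscribers[event_name]:
            handler(**details)

bus = EventBus()
# Hypothetical interested tools reacting to a check-in event.
bus.subscribe("module-checked-in", lambda module: print(f"metrics: counting lines in {module}"))
bus.subscribe("module-checked-in", lambda module: print(f"analysis: scanning {module}"))

# The configuration management tool announces the event; note that the tools
# must still agree on the message's format (here, a single 'module' field).
bus.notify("module-checked-in", module="parser.c")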
3.4 Summary of Semantic and Mechanism Integration
In this section we have considered a number of semantic and mechanistic aspects of environment integration. Though these two aspects are often inseparably connected, it is useful to conceptually distinguish between them in order to gain deeper insight into the overall problem of tool integration. We also note that distinguishing these two aspects of integration has direct relevance for the environment builder or implementer.

From the perspective of the semantics and mechanisms of tool integration, implementation of an environment can be viewed as a complex design exercise. The goal of the designer must be to consider the various choices that are possible, consider the advantages and disadvantages of each, and then document the decisions that are made. The basis for making each decision will be the effect the decision will have on the quality attributes of the environment. For example, if the designer considers an attribute such as flexibility to be of greater importance than performance (e.g., if the environment is expected to change frequently over time), then a decision may be made to be satisfied with simple, low-level agreements of data semantics between tools. Thus, as the environment components are replaced over time there may be a reduced amount of effort involved. The consequences of such a decision, however, will be that the level of integration of the tools is reduced because the tools will need to parse the data that is shared and reconstruct much of the semantics at each data transfer. Different choices at the semantics and mechanisms level might be made if the priorities of flexibility and data-sharing performance are reversed.

Hence, the task of the environment designer and implementer is largely one of selecting quality attributes, identifying the relative priorities of those attributes, and then making choices of the semantics and mechanisms of integration in terms of the ways in which they affect those quality attributes. While this process sounds relatively straightforward in abstract terms, in practice there are few well-defined guidelines and metrics to assist the environment designer in making these tradeoffs.
4. Integration in Practice: Process Aspects of Integration

The importance of process definition and enactment for software development and maintenance has had ample discussion in the past few years, and needs little
justification here. However, the relevance of process definition and enactment for tool integration has been little mentioned. We examine here some of the ways in which these issues are interrelated, and why consideration of the process aspects is of fundamental importance for integrating tools. A key element of our three-level model is that process, principally in its definition and enaction, provides a necessary context for integration. We argue that any consideration of "real" integration, i.e., any instance of an actual integrated toolset, must include consideration of the process (or processes) that provide the context and background against which the tools are integrated.
4.1 Common Practical Means of Integration

Before examining the process notions themselves, it is useful to briefly touch on some of the actual practices used in integrating software components. These will be revisited below when we look at a practical example of integration and the effects that process constraints can have on it.

One common situation occurs when a project or organization decides that two or more "standalone" tools (i.e., commercially separate products) should be "put together" in some manner. The "putting together" might simplistically mean that the output data of one tool is automatically fed into the second; it could mean that one tool invokes another, or other similar types of relationship. The semantic aspects are often minimal; e.g., the tools could already agree on the nature of the common data. Typically, the desired integration is essentially aimed at reduction of redundant data entry, or at enhanced control flow.

One dependable mechanism for this "putting together" is a shell script; the script freezes the series of command-line invocations that would occur if the tools were run in interactive, standalone mode. Scripts can provide temporary files and variables for use as ad hoc data stores, and invocation of tools is convenient. Another means, somewhat more sophisticated, is seen in messaging systems. Using these mechanisms, tools send notifications (e.g., about starting or stopping) to each other at certain points in their executions. Messaging systems can be used in a very simple way, by "encapsulating" a standalone tool with a wrapper that sends and receives messages. A more complex manner of use involves rewriting a given tool to incorporate the power of the messaging system into the tool's workings. (While this latter manner of use is more powerful, we note that it is not an option for a third-party integrator, since it requires alteration of the tool's source code.)

In either case, whether script or message system, there is an implied order of events. There are assumptions made by the integrator about sequencing, about data and control flow, about the party or parties that are interested in certain events; in short, the "putting together" of the tools, even by a simple shell script, implies a form of process definition and the means for enaction of a process.
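The following sketch plays the role of such an integrating script. It is written in Python rather than shell, and the two command-line tools it chains together (designtool and doctool) are hypothetical; the point is only that the script freezes a fixed sequence of invocations and uses a temporary file as an ad hoc data store.

import subprocess
import tempfile
import sys

def run_pipeline(model_file: str) -> int:
    """Invoke two standalone tools in a frozen order, passing the first
    tool's output to the second through a temporary file."""
    with tempfile.NamedTemporaryFile(suffix=".out", delete=False) as handoff:
        first = subprocess.run(
            ["designtool", "--export", model_file],      # hypothetical tool
            stdout=handoff
        )
    if first.returncode != 0:
        return first.returncode
    second = subprocess.run(
        ["doctool", "--input", handoff.name]             # hypothetical tool
    )
    return second.returncode

if __name__ == "__main__":
    sys.exit(run_pipeline(sys.argv[1] if len(sys.argv) > 1 else "design.mdl"))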
4.2 The Process Context of Integration

Process definitions deal (among other things) with the "Who is sharing the information" and "When is it shared" questions posed above. Process definitions provide context for information sharing by specifying the actors and the sequences of activities relevant to a chosen paradigm of software development.

Process definitions typically find expression in some diagramming technique. For this chapter, we shall use a modified form of the notation called IDEF0, although many others could be used equally effectively. In this notation, there are boxes (labeled with verbs) and arrows (labeled with nouns) connecting them. Using IDEF0, processes are defined through decomposition into progressively lower-level subprocesses. For example, Fig. 5 shows how an arbitrary process might be expressed and decomposed one level. The process at the top level is indicated by a single box with one input, one output, and one control (for instance, in the form of some requirements). The process decomposes at the next level into five subprocesses (here named 1-5). The input, output, and control from the top-level diagram are also shown in the decomposition; in addition, new (lower-level) inputs, outputs, and controls appear in the decomposition. The arrows labeled A and B are the data outputs of subprocesses 1 and 2 that become inputs to subprocesses 3, 4, and 5; and the arrows labeled X and Y are outputs from subprocesses 3 and 4 that form controls on 2, 4, and 5. They could assert control either in the sense of allowing other subprocesses to begin or as feedback loops for other subprocesses. They would
be things such as notifications of success or failure, or specifications for sets of actions to be carried out.

FIG. 5. An example process definition in IDEF0.
4.3 Qualities of Successful Processes

Given any process definition, there are at least two qualities necessary for its success. The first is that it be flexible, i.e., that the process permit orderly change to itself. A second quality is that the process permit metrics to be gathered during its enactment. We consider both of these qualities to be required for the process aspect of an integrated environment. These qualities can be paraphrased more simply by referring to the IDEF0 notation: the presence of an arrow (of any sort) suggests that:
- To evolve the process description, someone may wish to reposition the arrow somewhere else on the diagram (i.e., to represent a modified process);
- To monitor the process, someone may want to make some measurement while an arrow is being traversed (i.e., during process enactment).
Now, when an organization has defined its engineering processes, it creates process diagrams such as the IDEF0 example above and then seeks means to enact those processes, usually through the support of automated software tools. This is where the problem begins, because the definition of subprocesses that are significant to any given process definition may or may not bear a relationship to the workings of tools that could support them. Specifically, the tools' interfaces may or may not be consistent with the inputs, controls, and outputs defined in the process description. For instance, in the above example, tools might exist that have a good fit with steps 1, 4, and 5: their functional operations correspond to the functional descriptions of the three steps, and their inputs and outputs also correspond to the inputs and outputs named in the process description. But it is conceivable that the only available automated support for steps 2 and 3 is a product that combines these two process steps. In this case, one or more of the following may be true:
- It is possible that the tool will not permit arbitrary metrics gathering at the connection between 2 and 3 (the feedback loop from 3), or over the independent output from 2 (which serves as input to 4).
- The output from 2 is input to 4; the output from 3 is control over 4. The tool may or may not permit this distinction in its output (i.e., the granularity of output may not be fine enough to distinguish the data from the control factors in the succeeding process steps).
- If the process description evolves, for instance, so that both 3 and 4 provide a controlling feedback loop to 2, it is unlikely that this will be implementable using that product.
In any of these cases, the end result is that automated support, in the form of an integrated toolset, is either weakened or impossible using this collection of tools. The process will be enactable only by such means as manual file copying or even redundant data entry. This introduces a probable loss of process monitoring and traceability and a corresponding rise in the risk of data error.
4.4 A Practical Example

The point of the above example is to demonstrate in an abstract sense how process constraints may affect tool integration. Here we transform this example into a real one that might be faced by an actual software-producing organization. Let us imagine a project whose process description includes creating source code with an editor, compiling the source, running the (successfully compiled) source file through a standards checker, and then entering the final result into the configuration management system. (We omit a debugging phase for simplification.) The standards checker is one that flags poor coding practices such as one-character variable names, excessive function length, presence of gotos, and similar practices deemed unacceptable by the organization. The IDEF0 description of this process is shown in Fig. 6.

It is possible to conjecture that at some time, a different project manager might insist on a slightly different sequence of these process steps, namely that all code must be run through the standards checker before compilation. This revised process description is shown in Fig. 7.

We observe that in both cases, the project manager wishes to keep a detailed account of all activities: track the time that a programmer spends in an editor,
record the number of times the compiler is invoked, and so forth. Company policy also dictates that use of the standards checker (in either version of the process) is mandatory and that the programmer should not have the option of sidestepping it.

FIG. 6. An illustrative example using IDEF0.
FIG. 7. An updated example using IDEF0.

Many tools and products exist to enact either of these processes, though with differing degrees of integration and automated support. A typical (and nonintegrated) way to enact either might be to acquire four separate tools [e.g., the emacs editor, a C compiler, a homegrown standards checker, and Unix's source code control system (SCCS)] and to use each independently; this is probably done in a great number of organizations today. While there are drawbacks (e.g., metrics on use of the tools must be recorded manually), there are also benefits (e.g., there are no problems in moving from the original process definition to the updated one in Figure 7).

But this type of solution is a prime candidate for integration. For instance, the way that the programmer must use the editor and compiler in standalone mode is very painful: invoke the editor, then invoke the compiler, save a separate file of compile-time errors, step through the error file while editing the source file (e.g., in two windows), etc. This toolset could be minimally integrated in a number of ways. For instance, through use of a messaging system the compiler and the standards checker could be encapsulated so that the compiler notified the standards checker on successful compilations; a script could provide the same functionality. Reversing the order of those two tools would be trivially easy. It would be equally easy to add metrics-gathering code to either mechanism, thus giving us the two required
capabilities for successful process integration (i.e., the process can evolve and it can be monitored). However, the inconvenience of the edit/compile cycle noted above is still present, and might not be as easily addressed by a script or message approach. To answer this need, some more sophisticated products offer a "compilation environment," typically including an editor integrated with a compiler, and possibly such other tools as a debugger, static analyzer, etc. In these products, when the compiler encounters an error, the editor is automatically invoked with the cursor positioned on the failing line of source. Products of this type are common both in the personal computer (PC) world (e.g., Borland's Turbo compilers) and in the workstation world (e.g., HP's SoftBench toolset).

For the original process description, products such as these appear to offer excellent integrated support. There is no question that the "preintegrated" editor/compiler products save time and effort, as well as probably helping to produce a sounder result. Integrating such a product with any standards checker and any configuration management tool could then be accomplished by shell scripts, encapsulation, etc. However, the project manager's goal of obtaining metrics on compiler invocation or duration of editing sessions may or may not be attainable. For instance, the product may be structured so as to make counting compiler invocations impossible. The automated reinvocation of the editor may not permit accurate tracking of time-related metrics. If these (or other such examples) are the case, then "preintegrated" products such as these will provide automated support for the functional steps of the process only; their integration does not provide (and may not permit) automated support for the monitoring or tracking of the process.

When the process definition evolves, however (i.e., the standards checker is to be invoked before compilation), the situation may be even worse, since the process now demands that the product used will permit arbitrary tool invocation from the editor, rather than the preordained sequence of "edit-compile." Unless such a capability exists, the process cannot be supported at all by the "preintegrated" type of product, at least not in an automated manner. The user must, on completing the editing phase, resort to: opening a second window on the screen, saving the source file, invoking the standards checker (in the second window), and so forth. As in the first process definition, monitoring the process and collecting metrics becomes more difficult. Further, the company policy of mandating the use of the standards checker is even less likely to succeed, since the use either of scripts or encapsulations to integrate the standards checker is very likely to be impossible.

As an even simpler example, a user of one of these "editor/compiler" products might wish to substitute a different editor for the one packaged with the product. This will again depend on the individual product: HP's SoftBench does permit
the substitution of another editor, as long as the user provides an encapsulation. However, as far as the authors can determine, it is not possible to decouple the text editor from the Borland Turbo products in order to substitute another editor of choice. This difference does not in any way imply that one product is superior to the other, but only that two products that exhibit similar functional behavior can offer quite different types of integration potential for a third-party integrator.
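As a rough sketch of the script-level encapsulation discussed above, the following Python fragment chains a compiler, a standards checker, and a check-in step while recording simple invocation and duration metrics; the command invocations ("cc", "stdcheck", "sccs delta") and the log format are illustrative stand-ins rather than a prescribed toolset, and reordering the checker before compilation, as the revised process of Fig. 7 requires, is a one-line change.

import subprocess
import time
import json

METRICS_LOG = "process-metrics.jsonl"   # illustrative log location

def run_step(name, command):
    """Run one process step, recording invocation and duration metrics."""
    start = time.time()
    result = subprocess.run(command)
    with open(METRICS_LOG, "a") as log:
        log.write(json.dumps({"step": name,
                              "seconds": round(time.time() - start, 3),
                              "ok": result.returncode == 0}) + "\n")
    return result.returncode == 0

def enact(source_file):
    # Original process (Fig. 6): compile, then standards-check, then check in.
    steps = [
        ("compile",  ["cc", "-c", source_file]),
        ("check",    ["stdcheck", source_file]),       # hypothetical homegrown checker
        ("check-in", ["sccs", "delta", source_file]),  # illustrative SCCS invocation
    ]
    # The revised process (Fig. 7) is obtained by swapping the first two steps.
    for name, command in steps:
        if not run_step(name, command):
            print(f"step '{name}' failed; stopping")
            return False
    return True

if __name__ == "__main__":
    enact("parser.c")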
4.5 Pragmatic Solutions

It is tempting to answer problems of this type thus: "Select a different product, one that does provide the appropriate flexibility." But this is very often not an option; the product may be mandated, or there may be no existing product that quite does the desired job. In reality, what typically occurs in such a situation is one of the following:
- The process is enacted without using automated support, by manual means as described above.
- The process is rewritten to accommodate the product.
- The organization writes its own automated tool that conforms to the defined process.
- The product's vendor is persuaded to incorporate the needed functionality.
While all of these pragmatic choices solve the problem at hand, they are all intuitively unattractive. The root of the problem is that there is currently no correspondence between the interconnections exposed in a process decomposition and the interfaces presented by the available CASE tools and products. This results either in doing the task manually, in corrupting the defined process, in expanding the vendor's product to support the desired process step, or in abandoning entirely the notion of using commercially available components for an integrated environment. Of these unattractive solutions, the first and second, or some permutation of both, is the most practical one, and is most often done in actuality. The third solution (creating a "homegrown" tool with appropriate flexibility) depends on the precise nature of the process steps in question and the cost of creating a tool for that functionality. The fourth solution (where a vendor's product is revised to encompass greater functionality) is a frequent occurrence, as it is attractive to vendors to increase their tools' potential market. It also has the greatest potential for harm, an issue we consider in the next section.
5. The Conditions Necessary for Integration

We have adapted our three-level model of an environment to isolate different aspects of the integration problem. One aspect concerns the broad range of
possible semantic understandings of integration. A second is concerned with the mechanisms whereby components share data or control. The third concerns the importance of process, its constraints, and the context that it provides for integration. Having examined these aspects individually in detail, we now examine how they interact. This interaction is seen in two domains, namely, that of the locality of interfaces and that of standardization. We consider that in these two domains can be found necessary conditions that must exist if composition of truly integrated environments is to become a reality.
5.1 The Proper Locality of Software Interfaces

We can generalize that interfaces are the boundaries of a component; they are the entry points to its functionality and the exit points of its output. Interfaces thereby provide the fundamental means through which components can connect with each other, i.e., be in some way integrated. Therefore, assuming some set of components needing to be integrated into an environment, the locality of the interfaces that each displays to the outside world, i.e., the functional "places" where a component's actions can be invoked or its data read, is paramount. We again interpret this problem in the context of our three-level model: for any candidate component, it is the mechanisms of its interfaces (e.g., invocable functions), their semantics (e.g., the meaning and format of required input data), and the process constraints that govern those interfaces (e.g., whether the component's functionality accepts some external control at a certain point) that determine whether or not the integration of this component with other components is useful or even possible.

While this appears to be a reasonable scenario, it is also the case that unless the candidate tools to be integrated are all written with mutual knowledge of each other's particular interfaces, integration will be difficult or impossible. There must be some preexisting and commonly shared understanding of those boundaries, namely, some reasonable understanding in advance of where a tool can be expected to provide external interfaces, and of what operations are typical at those interfaces for a tool or class of tools.

This is certainly not the case today. On the contrary, there exists no generally accepted notion of the appropriate "boundaries" of a tool, of where one tool should "end," or at least offer an external interface. Products tend to offer a multiplicity of services and to combine functionality that represents, at least conceptually, several "tools." For instance, a compiler is arguably a tool that provides a single service. But, as we described in the examples in the previous section, many compiler products now also include editors and debuggers as well. Such tailoring of functionality within a single product is clearly beneficial: the components of the product are well integrated and often extremely useful.
But this tailoring is not without cost. First, the word "tool" loses its precision: what we really have are products that provide services spanning a set of tools. (This in itself is not a great loss, since the software community has never been able to agree on a definition of what a "tool" is anyway.) Second, the well-defined interface to each functional component of the product tends to be lost; public access to the various "internal" services is often obscured, and sometimes hidden entirely. From a technical perspective, products of this sort tend to be ad hoc collections of functionality, and the task of integrating them with other products, if possible at all, will be equally ad hoc.

Consider the example from the previous section when the process definition evolves and a user wishes to modify the integrated toolset to support the revised process. Whether this is possible will depend on the product used to implement the process scenario: some will permit it, but others disallow it. Earlier, we noted that one possibility to solve the problem can be seen when a tool vendor extends his tool to provide the extra capability needed by the user. While this is useful in the short term, we argue that it may actually cause greater problems than it solves. This is because the problem is expressed incorrectly: the evolution of the user's process definition implies not the need for greater functionality, but rather the need for access to a different functional interface. When the tool vendor expands his tool to accommodate that greater functionality, the situation is actually made worse, since the expanded tool is usually still a black box, and one more functional interface has been obscured. The solution we argue for is not new; it was considered over a decade ago, as part of the early work on the STARS program:
This approach is somewhat dependent on breaking the large tools that we typically think in terms of today, such as compilers, into tool piece parts. If this is not done, then the wide variability of requirements for specific environments will result in few of the tools being selected and the need to build a large number of new tools. . . . For many traditional tools, the piece parts into which they should be broken are fairly obvious just from our general experiences. For example, breaking a compiler into a lexical analyzer, a parser, a code generator, and an optimizer is an obvious division that provides piece parts of general utility. This general knowledge that we have from working with traditional tools can be effectively used to break up a large variety of different types of tools into reusable piece parts. (DeMillo et al., 1984)

We do not go quite this far: it is not necessary actually to break the tools (products) apart. But it is necessary that their functionality be modular, that this modularity be based on some community-wide agreement on functional boundaries, and that the interfaces to the modular sections be made public. If we return to Figure 5, where some needed tool spanned two process steps in a "black-box" manner (i.e., the functions we called 2 and 3), this would
imply decoupling the tool's functionality enough to make the interface between the two modules public, for instance, as a specified invocation with arguments, options, etc. The aggregate product could be sold in much the same way it presently is, but with the specification of the interface now documented. The entire product could work in the same way; but it now permits a tool integrator to make the sort of process-directed changes we noted above, i.e., to gather whatever metrics are required at the interface between 2 and 3, or to substitute some other tool that does the work of 3, providing it makes use of the publicly specified interface. The tool integrator now bears the responsibility of creating a new integrating mechanism (the typical quick solution is a shell script; other more complex solutions are easy to imagine); but the integrator is aware of the semantics that govern the shared data or control, and now has the flexibility to support this evolved process, since the interface has been properly located and made public.

In summary, regardless of the size of products and tools, they should be constructed from components that are ideally very small (by today's "tool" standards). The boundaries of functionality should be well defined in the same sense that hardware components are well defined. And finally, these boundaries that provide the interface to the component's functionality must be public interfaces. The question of their standardization is considered in the next section.
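To make the idea of a publicly documented internal seam concrete, the sketch below (with invented function and product names) treats steps 2 and 3 as separately callable pieces of an aggregate product; a third-party integrator can then wrap the documented seam to gather metrics, or substitute a different implementation of step 3, without altering the vendor's code.

import time

# Vendor's aggregate product, but with the seam between steps 2 and 3
# documented and callable (the names here are invented for illustration).
def step2_prepare(work_item):
    return {"item": work_item, "prepared": True}

def step3_transform(prepared):
    return f"result({prepared['item']})"

def product(work_item, step3=step3_transform):
    """Aggregate product; an integrator may supply a replacement for step 3."""
    return step3(step2_prepare(work_item))

# Third-party integrator: a wrapper at the public seam that gathers metrics.
def step3_with_metrics(prepared):
    start = time.time()
    result = step3_transform(prepared)
    print(f"step 2 to 3 handoff took {time.time() - start:.6f}s")
    return result

print(product("module-A"))                             # vendor default
print(product("module-A", step3=step3_with_metrics))   # evolved process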
5.2 The Search for Standards

As noted above, interface standards are seen by many as the principal key to solving the problem of CASE tool integration. And it is certainly the case that interface standards are a necessary condition for this goal. Proof of the benefit of standards can be seen in the widespread use of the various communication and network protocols that are presently providing near-seamless access to worldwide networks; it is also offered by such humble format standards as the American Standard Code for Information Interchange (ASCII).

But there are severe shortcomings in the attempts to create standards for software engineering (in fact, standards for any engineering discipline). One cause of this is the notion that these standards can be created in advance of a user community. This stems from a belief that the pedigree of a standard is the first item of significance: in this view, standards are formalized specifications produced by formal standards-making organizations such as IEEE or ANSI. Taking this view, pedigree rather than ubiquity is of greatest significance. Because of this belief, considerable time, expense, and effort has been spent in producing ANSI or IEEE standards for a variety of needs: language, repository interface, and data format. Examples of this include Ada, the Common Ada Programming Support Environment Interface Set (CAIS), CAIS-A, PCTE 1.5, ECMA-PCTE, the ANSI Information Resource Dictionary System (IRDS), the International Standardization Organization (ISO) IRDS, and the CASE Data Interchange
Format (CDIF). Each of these standards shares certain traits. It was developed in advance of a using community. It followed the sequence of "develop standard, implement standard, find a user for the standard." And none of these standards is in widespread commercial use, even after a decade of existence in some cases. The failure here is not standardization per se, but standardization at the wrong time.

The better solution, and the real need in the community, is to have broad, public agreements. The formal pedigree of these agreements, while not unimportant, should be the last item of concern, not the first. As an example of this, consider the current standardization efforts carried out by the Posix.1 and Posix.2 working groups. These efforts have no less a pedigree than the others already mentioned. But they have been carried out in the context of a very large community that has been using Unix for many years. For that reason, examples of broad and public agreements about Unix are ubiquitous, and are finally, not initially, being formalized by the Posix working groups. The key point is that they have been broad and public agreements, tried by use for many years. It is very unlikely that the Posix.1 and Posix.2 standards, which have a clear connection to "historical" Unix, will suffer the same fate as CAIS or IRDS.

Discussion concerning the relative merits of early standardization versus late standardization can be long and vehement. However, it is interesting to note that recent years have seen a much greater interest in the environments area in industry-led consortia defining common interface definitions that the consortium members will agree to use [e.g., Common Open Systems Environment (COSE), CORBA, CASE Communique]. These are typically based on market-driven needs to provide greater interoperability and consistency between existing vendor products. The consortium may submit the agreed interface definitions to a formal standards body once its work is complete, but this is something of an afterthought, not its primary means of operation.
5.3 The Interaction of Interfaces and Standards

The locality of interfaces and the issue of standardization are distinct but interrelated, and we consider that their interaction is central to achieving integrated environments. This interaction can be expressed as a requirement: the task of integrating heterogeneous components requires not the software interface standards available today, but rather standardized software interfaces at appropriately low levels of functional modularity. This statement means that the tool interfaces need to be exposed at a lower level of granularity than is common today. It means that the community of tool users and integrators must insist that process constraints determine the locality of those interfaces, rather than letting the tool vendors' decisions determine the end-users' processes. There is the need for these interfaces to become common and ubiquitous. And finally, there is eventually the
need for the conventions and protocols that arise surrounding these interfaces to become formalized by recognized standards bodies. In this view, the focus should not be on the pedigree of the standardization process (though a good pedigree is obviously desirable), but rather on the ubiquity and currency of the thing to be standardized. The focus should not be on obscuring the interfaces through bigger and bigger "black-box" products, but rather on exposing lower-level interfaces and partitioning the functional behavior of products accordingly. In a sense, this is nothing more than the software equivalent of the general agreement that even in large machines, individual screws are replaceable; or that the screws will not contain a hidden electrical fuse necessary for the machine's operation. Yet agreements such as these represent the most difficult (and most necessary) step if CASE tool integration is to become a reality.

In this view, it will then not matter whether products offer single services (in the old sense of a "tool") or provide the user with entire suites of services crossing whole functional domains. The key point is that these services are appropriately modularized, and are accessible through externally visible interfaces. Tool vendors will then succeed or fail on the qualities that their products exhibit (efficiency, dependability) and on such marketing factors as customer service and cost. In the domain of software engineering environments, we can see that much work must be done if integrated environments are to attain a comparable degree of success. The need is for end-users, vendors, and integrators of environment products jointly to begin the work of:
- Working cooperatively to understand more about environment architectures, the appropriate partitioning of functionality in an environment, and the important interfaces that exist between those partitions.
- Developing agreements on the appropriate interfaces that have the effect of increasing interoperability of vendor products.
- Producing tools that provide greater access to their internal structure and behavior to increase the integration possibilities of those tools.
To achieve this there must be action on behalf of each of the communities involved. End-users of environments must start demanding integrability as a first-degree requirement of a commercial tool. This must be based on a deeper understanding and expression of integration characteristics, and founded on an experience base of data that can be used to illustrate the value of different integration approaches. Vendors have to respond by opening up their internal interfaces to provide the integrability that is demanded. Standards bodies must focus on consensus building by standardizing the best of current practice, particularly at the lower levels of the tools.
6. Toward Engineered Environments

Much of the work carried out by the environments community is focused on how an environment can be designed and assembled with its integration characteristics in mind. In particular, the aim has been to develop an approach that considers integration as a key factor that influences many of the design decisions that take place while assembling an environment. If this view is adopted, then the most important factors in constructing the environment are considerations of how integration is embodied within the environment, decisions concerning what tradeoffs need to be made in its assembly, recording the reasons for the design decisions that are made, and measuring the results of the design activities when compared to the stated needs for the environment. The basis of this view is a separation of the mechanistic issues (how the interrelationships among components are implemented), the semantic issues (what services are provided, and how those services are related), and a description of the context (the software development processes being supported) in which the environment will operate.

With this view of integration as a design activity in mind, and having the conceptual framework provided by the three-level model, it is possible to sketch the outline of a method for engineering an environment. The first step in this approach is to analyze and record the processes that are to be supported and the constraints on the organization in carrying out these processes. For example, an organization may have a suitable approach to designing, coding, and testing individual software modules that has been institutionalized throughout the organization. This approach must be documented to produce a process model of the description of the practices as they currently take place. Connections between this process and related practices (e.g., the methods for system building and system integration testing) must then be defined.

Given this context in which to work, the next step involves the description of required environment services that will be used in support of these processes. At an abstract level, the environment services can be discussed and the interrelationships defined. For example, design, coding, and software module testing services may be required from the environment. Based on the processes being enacted by an organization, the ways in which these services must interact to support those processes can be defined.

The next step is to choose environment components that can realize the services required and that provide the interfaces that the process description demands. CASE tools would thus be acquired not simply based on their functional description, but also with a detailed knowledge of the processes that need to be supported and the necessary connections that are required with other CASE tools. For example, instead of the imprecise statement, "I need a testing tool," an
organization is now able to express its needs in more detail: "I need a testing tool that will support these activities, and that will allow these services it provides to interface with these other services from my design and coding tools." Integration technology such as databases and message passing systems may be required to facilitate the construction of the environment. Again, these can be selected and evaluated with a much clearer understanding of the requirements that they must satisfy.

While these steps are described in sequential fashion and in a simple manner, in actual practice there will be significant feedback and iteration between the steps. For example, no matter how much technology is at hand, it may still be necessary to change the processes in order to accommodate the restrictions of the available tools. However, the important point to note is that such changes can now take place within a context that has documented why such changes are necessary, provides a basis on which such decisions can be rationalized, and maintains a history of why the environment architecture has evolved to be how it is. In most current environments, the conceptual framework necessary to facilitate these actions does not exist.

We have made use of this approach to the engineering of an environment in a number of detailed case studies. Our results show that this design approach to integration does allow the integration characteristics of an environment to be better understood, and the resultant environment to be more predictable in its support for those characteristics (Brown et al., 1994). More in-depth studies are planned, and the results of these studies will be published as they become available.
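A toy sketch of the component-selection step described above: required services are recorded together with the interfaces the documented process demands of them, and candidate tools (all names hypothetical) are screened against both, rather than against a functional label alone.

from dataclasses import dataclass

@dataclass
class RequiredService:
    name: str
    required_interfaces: set   # interfaces the documented process demands

@dataclass
class CandidateTool:
    name: str
    services: set
    exposed_interfaces: set

requirements = [
    RequiredService("testing", {"accept-test-plan", "report-results-to-design"}),
]

candidates = [
    CandidateTool("TestRite", {"testing"}, {"accept-test-plan"}),          # hypothetical
    CandidateTool("OpenTest", {"testing"},
                  {"accept-test-plan", "report-results-to-design"}),       # hypothetical
]

for req in requirements:
    for tool in candidates:
        fits = (req.name in tool.services and
                req.required_interfaces <= tool.exposed_interfaces)
        print(f"{tool.name} supports '{req.name}' with required interfaces: {fits}")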
7. Summary and Conclusions

Understanding integration in an environment is important for all those concerned with the development and maintenance of large-scale software applications. The way in which the environment used to develop and support those applications is engineered can have a significant effect on the applications throughout their lifetime.

Previous attempts at developing conceptual models for understanding integration aspects of an environment concentrated on static approaches to analyzing the properties and relationships of existing systems. While much can be learned from these models, what is needed is not only the ability to analyze certain integration characteristics of an existing environment, but also the ability to construct an environment to suit a variety of integration needs and to be able to evolve that environment in a controlled way as those needs change. The three-level model described in this chapter separates three different integration concerns (services, mechanisms, and process) and allows each of these to be examined
in isolation. It also allows relationships among these three concerns to be discussed and analyzed. The main concern of this chapter has been the way in which current environment components partition the functionality of an environment, the interfaces that those components make available to others, and the implications of this for the engineering of an environment. A primary theme of the chapter is that standardization of appropriate interfaces can have a major influence on the engineering of open, integrated environments in the future.

There are many indications that many of the ideas discussed in this paper will be adopted in the future. Three in particular are worthy of note. First, the major computer and tool manufacturers are working in a number of consortia to define standards for tool interoperability that build on the strengths of their existing products and overcome many of the weaknesses. The work of the Object Management Group (OMG) (1992) and the Open Software Foundation (OSF) is perhaps foremost in this effort. Second, environment builders seem to be recognizing that the needs of environment users must be the major driving force in the work that they carry out. For example, the European Aerospace Systems Engineering Environment (EASEE) initiative is an attempt to collect together environment users in a particular domain to elicit their requirements, and to understand more clearly what needs they have for supporting the applications they develop. Third, interest in gaining a greater understanding and appreciation for software architectures has spawned a number of initiatives. Programs focused on understanding more about domain-specific software architectures, and the development of languages and notations for defining software architectures, are two examples of the work in this area. A heightened awareness and understanding of software architecture issues in general can have very important positive effects for the engineering of software engineering environments in particular.

ACKNOWLEDGMENTS

We would like to thank the following people for their comments on previous drafts of this chapter: Tricia Oberndorf, Kurt Wallnau, and Marv Zelkowitz. The Software Engineering Institute is sponsored by the U.S. Department of Defense.
ENDNOTES

1. Some recent approaches to government-sponsored environments include both the Software Technology for Adaptable, Reliable Systems (STARS) Program (1992) and the Integrated-CASE (I-CASE) program (Bragg, 1994). Since these programs are still currently in progress, it is premature to assess their degree of success.
2. The STARS Program is notable in this respect. Although it began as a program whose goal was to develop a group of environments, STARS is now largely focused on the centrality of process and reuse; the environmental component of STARS is now captured by the notion of "technology support" for reuse and process mechanisms (STARS Program Office, 1992).
3. A full definition of the term "service," as well as many other terms relevant to environments, is found in SEI and NIST (1993).
4. Note that this figure illustrates examples of the elements at each level, and is not intended to be exhaustive.
5. The question of the term "tool" and its lack of precision will be revisited below in Section 5.
6. We note that in actual practice, existing integration technology and implementation architectures do not match this idealized model of integration. CASE tools typically embed process constraints that limit the ways that they can be used, rather than permitting the tools to be combined in multiple ways supporting many processes, and the services supported by a CASE tool are often locked into that tool's particular implementation techniques.
7. This approach was previously developed by Brown and McDermid (1992). They distinguished five levels at which tool integration takes place within an environment. This approach was subsequently refined in the work of the U.S. Navy's Next Generation Computer Resources program (Naval Air Warfare Center, 1993). We reinterpret these levels and use them as the basis for a categorization of data-sharing agreements.
8. Where IDEF0 is itself a modified form of Structured Analysis and Design Technique (SADT) (Marca, 1988).
9. Note that this IDEF0 example is intentionally simplified. It omits such items as compilation errors that provide a data feedback from the compilation step to the editor, etc.
10. Thus, following the same type of standalone scenario that used to be typical when a programmer used the Unix "spell" facility, as opposed to the current use of automated spell checkers in tools like FrameMaker or InterLeaf.
11. The IEEE Standard 610.12-1990, IEEE Standard Glossary of Software Engineering Terminology, definition is: "(1) A shared boundary across which information is passed; (2) A hardware or software component that connects two or more other components for the purpose of passing information from one to the other; (3) To connect two or more components for the purpose of passing information from one to the other; (4) To serve as a connecting component as in (2)."
REFERENCES
Application Portability Profile (APP) (1993). "The U.S. Government's Open System Environment Profile OSE/1 Version 2.0, May 1993," NIST Spec. Publ. 500-xxx. NIST, Washington, DC.
Bourguignon, J. P. (1989). The EAST Eureka Project: European software advanced technology. In "Software Engineering Environments: Research and Practice, Durham, U.K., April 11-14, 1989," pp. 5-16. Ellis Horwood, Chichester, UK.
Bragg, T. W. (1994). A case for defense: The defense department's 10-year integrated CASE project. Am. Programmer 7(7), 16-23.
Brown, A. W., ed. (1991). "Integrated Project Support Environments: The Aspect Project." Academic Press, London.
Brown, A. W., and McDermid, J. A. (1992). Learning from IPSE's mistakes. IEEE Software 9(2), 23-28.
Brown, A. W., and Carney, D. J. (1993). Towards a disciplined approach to the construction and analysis of software engineering environments. In "Proceedings of SEE '93." IEEE Computer Society Press, Los Alamitos, CA.
Brown, A. W., Morris, E. J., Zarrella, P. F., Long, F. W., and Caldwell, W. M. (1993). Experiences with a federated environment testbed. In "Proceedings of the 4th European Software Engineering Conference, Garmisch-Partenkirchen, Germany, 1993."
Brown, A. W., Carney, D. J., Morris, E. J., Smith, D. B., and Zarrella, P. F. (1994). "Principles of CASE Tool Integration." Oxford University Press, New York.
Cagan, M. R. (1990). The HP SoftBench environment: An architecture for a new generation of software tools. Hewlett-Packard J. 41(3).
Dart, S. (1990). "Spectrum of Functionality in Configuration Management Systems," Tech. Rep. CMU/SEI-90-TR-11. Software Engineering Institute, Carnegie-Mellon University, Pittsburgh, PA.
DeMillo, R. A., et al. (1984). "Software Engineering Environments for Mission Critical Applications: STARS Alternative Programmatic Approaches," IDA Pap. P-1789. Institute for Defense Analyses, Alexandria, VA.
ECMA & NIST (1993). "A Reference Model for Frameworks of Software Engineering Environments (Version 3)," ECMA Rep. No. TR/55 Version 3, NIST Rep. No. SP 500-211. NIST, Washington, DC.
Feiler, P. (1991). "Configuration Management Models in Commercial Environments," Tech. Rep. CMU/SEI-91-TR-7. Software Engineering Institute, Carnegie-Mellon University, Pittsburgh, PA.
International Standards Organization (1991). "Information Technology: Software Product Evaluation: Quality Characteristics and Guidelines for Their Use," ISO-9126. ISO, Washington, DC.
Jolley, T. M., and Gockel, L. J. (1988). Two year report on BASE method and tool deployment. In "Transferring Software Engineering Tool Technology" (S. Przybylinski and P. J. Fowler, eds.). IEEE Computer Society Press, New York.
Long, F. W., and Morris, E. J. (1993). "An Overview of PCTE: A Basis for a Portable Common Tool Environment," Tech. Rep. CMU/SEI-93-TR-1, ADA265202. Software Engineering Institute, Carnegie-Mellon University, Pittsburgh, PA.
Marca, D. (1988). "SADT: Structured Analysis and Design Technique." McGraw-Hill, New York.
Morris, E., Smith, D., Martin, D., and Feiler, P. (1992). "Lessons Learned from Previous Environment Efforts," Spec. Rep. Software Engineering Institute, Carnegie-Mellon University, Pittsburgh, PA.
National Bureau of Standards (1988). "A Technical Overview of the Information Resource Dictionary System," 2nd ed., NBSIR 85-3164. NBS, Gaithersburg, MD.
Naval Air Warfare Center (1993). "A Report on the Progress of NGCR PSESWG, Draft Version 1.0," U.S. Navy NGCR Program. Naval Air Warfare Center, Aircraft Division, Warminster, PA.
Object Management Group (1992). "The Common Object Request Broker: Architecture and Specification." OMG, Framingham, MA.
SEI & NIST (1993). "A Reference Model for Project Support Environment Standards, Version 2.0," Tech. Rep. CMU/SEI-93-TR-23, NIST Rep. No. SP 500-213. NIST, Washington, DC.
STARS Program Office (1992). "Proceedings of STARS '92: On the Road to Megaprogramming." STARS Program Office, Washington, DC.
Strelich, T. (1988). The Software Life-Cycle Support Environment (SLCSE): A computer-based framework for developing software systems. In "Proceedings of the ACM SIGSOFT/SIGPLAN Software Engineering Symposium on Practical Software Engineering Environments, Boston."
Thomas, I., and Nejmeh, B. (1992). Definitions of tool integration for environments. IEEE Software 9(3), 29-35.
Wasserman, A. (1990). Tool integration in software engineering environments. In "Software Engineering Environments" (F. Long, ed.), Lect. Notes Comput. Sci., No. 467, pp. 138-150. Springer-Verlag, Berlin.
Wakeman, L., and Jowett, J. (1993). "PCTE: The Standard for Open Repositories." Prentice-Hall, Hemel Hempstead, UK.
Software Quality, Software Process, and Software Testing

DICK HAMLET
Department of Computer Science
Center for Software Quality Research
Portland State University
Portland, Oregon
Abstract

Software testing should play a major role in the definition and improvement of software development processes, because testing is the most precise, most easily measured, and most easily controlled part of the software lifecycle. However, unless testing goals are clearly related to true measurements of software quality, the rating may appear to improve, but the software will not. Much of current testing theory and practice is built on wishful thinking. In this chapter, the state of the testing art, and the theory describing it, is critically examined. It is suggested that only a probabilistic theory, similar to reliability theory, but without its deficiencies, can describe the relationship between test measurements and product quality. The beginnings of a new theory of "dependability" are sketched.

Keywords: software development process, assessing software quality, coverage testing, finding failures, reliability, dependability, trustworthiness
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
   1.1 Software Quality . . . . . . . . . . . . . . . . . . . . . . . . 194
   1.2 Software Process . . . . . . . . . . . . . . . . . . . . . . . . 195
   1.3 Formal Methods and Verification . . . . . . . . . . . . . . . . . 195
   1.4 Software Testing . . . . . . . . . . . . . . . . . . . . . . . . 197
2. Testing Background and Terminology . . . . . . . . . . . . . . . . . . 198
   2.1 The Oracle Problem . . . . . . . . . . . . . . . . . . . . . . . 200
   2.2 Unit Testing versus System Test . . . . . . . . . . . . . . . . . 201
3. Testing to Detect Failures . . . . . . . . . . . . . . . . . . . . . . 201
   3.1 Functional Testing . . . . . . . . . . . . . . . . . . . . . . . 203
   3.2 Structural Testing . . . . . . . . . . . . . . . . . . . . . . . 204
   3.3 What Should Be Covered? . . . . . . . . . . . . . . . . . . . . . 209
   3.4 Testing for Failure in the Software Process . . . . . . . . . . . 211
4. Testing for Reliability . . . . . . . . . . . . . . . . . . . . . . . 211
   4.1 Analogy to Physical Systems . . . . . . . . . . . . . . . . . . . 211
   4.2 Random Testing . . . . . . . . . . . . . . . . . . . . . . . . . 213
   4.3 Software Reliability Theory . . . . . . . . . . . . . . . . . . . 214
   4.4 Profile Independence . . . . . . . . . . . . . . . . . . . . . . 216
5. Comparing Test Methods . . . . . . . . . . . . . . . . . . . . . . . . 217
   5.1 Comparison Using the Subsumes Relation . . . . . . . . . . . . . 218
   5.2 Comparison for Reliability Prediction . . . . . . . . . . . . . . 219
6. Dependability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
   6.1 Reliability-Based Dependability . . . . . . . . . . . . . . . . . 221
   6.2 Testing for Probable Correctness . . . . . . . . . . . . . . . . 221
   6.3 Testability Analysis . . . . . . . . . . . . . . . . . . . . . . 222
   6.4 Self-checking Programs . . . . . . . . . . . . . . . . . . . . . 223
   6.5 Defining Dependability . . . . . . . . . . . . . . . . . . . . . 224
7. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
1. Introduction

Renewed interest in quality software in the 1990s stems from attention to the development process. In the United States, the Software Engineering Institute (SEI) has proposed a model of organizational structure capable of producing good software. In Europe, the International Standards Organization ISO-9000 standards address similar concerns. Schemes like "total quality management" (TQM) extend to the larger organization in which software development is embedded.

Software development is often in a sorry state, certainly in need of attention to the procedures being used. However, the "process" movement inherits from software engineering an emphasis on subjective, management-oriented methods. The implication is that software development is a mysterious art, and only careful control of its practitioners can avert disaster. Software engineering suggested ways to decompose development into manageable (literally!) stages; "process" focuses on the procedural details of each stage and how to define, measure, and modify them.

But a laudable interest in the development process per se has the unfortunate side effect of downplaying its technical aspects. Organization and systematic, carefully monitored procedures are usually a part of successful engineering, but a relatively minor part. The essence of engineering is scientific support for its methods. All the organization in the world will not save a process based on an erroneous understanding of reality; in fact, excellent organization can have the pernicious effect of shoring up methods that should be scrapped.

The only real measure of quality software is a "product" measure. It is the software that counts, not its pedigree. Of course, attention to the process can be helpful in producing a good product, but only if there are solid, demonstrable connections between what is done and what results from doing it. To carry out procedures for their own sake-e.g., because they can be easily monitored and adjusted-is to mistake form for substance.
Proponents of the process viewpoint argue that the link with product is established by feedback from failures. They say that a released software deficiency will be traced to its procedural source, and an improved process will not again allow that particular problem to occur. But this view ignores the complexity of the process. Consider the following example:

An operating system crashes. Analysis of the failure reveals that two successive system calls are at fault: the first call established an internal system state in which the second call's normal operation brought the system down. Further analysis shows that this combination was never imagined during development. It was not mentioned in the requirements, never specified, not considered in design or implementation, and although the two calls were successfully tested in isolation, the combination was not tested.
The example is not far-fetched. How can the process that led to the release of the faulty operating system be corrected? It is probably implicit in requirements for system calls that they may be used in any sequence, although not all sequences are useful. So there is nothing to correct in the requirements phase. The specification for each call fails to mention the other, but there is nothing wrong with that if they are unrelated. Probably the fault is in design, where the strange internal state was invented, or in implementation, where there was too much freedom in representing the state. But it is not a process correction to say "don't do that!"-only hindsight identifies what not to do. It won't do to ask the designers to systematically consider system-call pairs; the next failure may be the result of a sequence of three calls, or hundreds of calls. We could ask for more extensive inspection of the design and implementation, but how much is enough to catch everything that was not imagined? Finally, what might have been done in testing? Functional testing would not expose the problem, and neither would simple coverage testing. Def-use testing might have found it, and so might mutation testing, but there is no guarantee, if the peculiar state is not created on most instances of the first call. Random testing has only a small chance of trying the failing combination. (These methods will be considered in detail below.) More testing would improve the chances of detection, but again, how much is enough?

The example was invented to ridicule the simplistic idea that product failures always produce meaningful feedback to the development process. But it also shows another important point. In every phase except testing, the possible corrections are subjective and unsystematic. People could be advised to "do better" but it is not clear just what that means. In the testing phase, the options were specific and technical. Some systematic methods are better than others in detecting the problem; using more tests should help; etc. Testing has less need of hindsight.

The lesson that testing is a less subjective, more precise part of development than the other phases is not lost on "process" advocates, who may choose to begin work there. Caution is still in order: in the example, a particular kind of
failure might be prevented by testing. But how many kinds of failures are there, and how frequent is each kind? In development no one has the luxury of a particular situation to analyze-all field failures lie in the future. Might it not be that corrective feedback is always one step behind, and the supply of unique, new problems nearly inexhaustible? When precise techniques are available, it increases the danger of confusing procedure with results.

This chapter is about the scientific, technical basis for software testing. It assesses the state of the testing art in the context of improving quality through the development process. Research is beginning to ask the right questions, and there are promising new results. But indications are that the fundamental role of testing must change from defect-discovery to assurance of quality.
1.1 Software Quality

Software quality is a many-faceted idea, and many attributes of "good" software are subjective. It must be "user-friendly" and "maintainable," for example. From the developer's point of view, perhaps its most important characteristic is rapid development-time to market is a critical factor in remaining in the software business.

The fundamental measure of quality is absence of defects. Quality software does not fail. Failure can take the form of a "crash" after which the software cannot be used without some kind of drastic restart (and often a loss of information or invested time); failure can be wrong answers delivered with all the trappings of computer authority (and thus the more dangerous); and failure can be in the performance dimension-the software works, but too slowly to be useful.

In this chapter we identify software quality with the absence of failure. However, we want an engineering measure, not the binary ideal of "correct"/"not correct." Whether or not one believes that it is possible to create "zero-defect" software, to demonstrate correctness is impractical. Although proof methods might in principle demonstrate correctness, they have been unable to do so in practice. Part of the reason is that theorem provers are too inefficient and too hard to use; a deeper reason is that the formal specifications required for verification are at least as difficult to create, and as error-prone, as programs. The alternative to formal verification is testing, but tests are only samples of software's behavior, and the best we can hope for is that they establish some kind of statistical confidence in quality.

There is no standard term for good software in the sense of "unlikely to fail." "Reliable" has a related but different technical meaning in engineering (Section 4). Parnas has used the term "trustworthy" (Parnas et al., 1990), but he includes the severity of failure: trustworthy software is unlikely to have catastrophic failures. What is or is not a catastrophe is sometimes subjective, and may change over time. Furthermore, catastrophe is only easy to recognize after the fact. Here we will be concerned with prediction of unspecified future events, whose severity
is unknown. We will use the term “dependable” for the intuitive idea “unlikely to fail” (Section 6).
1.2 Software Process

The emphasis in this chapter will be on software testing as the phase of development easiest to relate technically to software dependability. When there is an objective, quantitative relationship between product quality (here, dependability) and process activities (here, testing methods), the feedback model of process improvement works. When the quality is inadequate, the methods can be changed (for example by devoting more resources to testing) to improve it.

It cannot be emphasized too strongly that the feedback cycle must extend all the way to actual measurements of quality in the field. It is all too easy to guess at what is causing a problem, change something in the development process and measure the change in its own terms, and conclude that the process is improved. For example, suppose that the operating-system failure above is one of many, and that the testing plan includes only functional and basic structural coverage testing. To guess that more testing is needed, and to (say) double the time spent on testing (perhaps doubling the number of test cases), will probably have no effect on these failures, because these methods do not address the problem. But if measurement stops with counting test cases, the process is "improved" while the product is not.

There is no reason why any proposed development method or technique cannot be held to the same standard as testing methods. That is, one could demand that a quantitative relationship be demonstrated between any proposal and product quality. For example, the introduction of formal specification methods is thought by some to be a good idea. This could be established by measuring dependability of software developed in this way. At first, the argument for any method has to be theoretical. It must be shown that the characteristics of the method make a dependability improvement plausible. In unusual circumstances, it may be possible to perform a real experiment to demonstrate the improvement. But in most cases the theoretical arguments will be the only ones available.

Once a method is in place in the software process, however, experiments are more easily carried out. It is not difficult to measure software quality using field reports of failure. Any method can be improved by devoting more time to it, providing better training or tools for its users, etc. If the method improves, but quality (dependability) does not, it is a strong indication that the theoretical arguments supporting the method were wrong.
1.3 Formal Methods and Verification
Advocates of formal, mathematical methods in software development argue persuasively that the root cause of poor software quality is a loss of intellectual
control. The remedy suggested is better mental tools such as precise notations and formal reasoning. Formal methods are proposed for two distinct roles in software development:
"Notation" role. Abstraction and formal notation are powerful aids to precise thinking. By expressing a software specification or design in a concise mathematical form, the human developer is led to be more precise and accurate, to think through issues and decisions that might otherwise have gone unnoticed. The Z specification language is currently a popular vehicle for the notation role. The formalism is viewed as an end in itself. Even if the specification were devised and then immediately discarded (this is not advocated, although it has happened where a contract mandated formal specification), the benefits of better understanding and clearer thinking about the software should accrue.
"Proof" role. Formal systems based on symbolic logic may be used to go beyond the notation role to include formal reasoning. Once software properties have been captured formally, the formal descriptions can be used to derive properties that software meeting the description necessarily will possess. The proof role corresponds to the classical purpose of mathematical formalism in the sciences, which is to model reality abstractly, so that abstract reasoning may derive its properties. The power of formalism in the proof role is that theorems proved about the formal abstraction may be difficult and surprising, expressing properties that were not suspected, and could not be established without abstraction. In establishing formal properties, a proof-oriented formalism often makes use of a mechanical theorem prover.

The distinction between the notation and proof roles lies in the extent to which the mathematical notation is used to reason about software, and there is evidently a continuum of possibilities ranging from no proofs at all to an effort in which proofs are dominant. The formalism used may influence where on this continuum the developer can move; e.g., the Z language is not well adapted to proofs. The cleanroom methodology (Cobb and Mills, 1990) has been used mostly in the notation role, but its formalism is useful for intuitive (not mechanical) proofs. Although serious use of formalism is only beginning to be seen in practice, the notational role is the more common (Gerhart et al., 1993). The proof role has been limited to applications with security constraints, usually in military systems.

The arguments advanced for formal methods today are given largely in process terms. It is claimed that these methods are obvious aids to human effort in an admittedly difficult task. But to evaluate them in product terms requires a different kind of argument. Granted that formalism allows people to do a better job, is it possible to quantify the improvement that will result, say in software dependability? Without this quantification, it is impossible to compare different formal
methods, or to compare use of a formal method with testing. It is not so much that arguments for product quality resulting from the use of formal methods cannot be given, as that they have not been attempted. The subjective nature of development phases other than testing makes it much harder to produce and to evaluate theoretical arguments for their efficacy. But it is wishful thinking to employ these methods and ignore their quantitative effects (if any).
1.4 Software Testing

Because testing is a precise, quantitative activity, it must answer to more than the subjective criteria used to evaluate other phases of software development. However, its very precision is a danger. In testing we know what we are doing, and we can measure it. But we do not necessarily know-and we might not measure-what its effect will be. The common wisdom in testing (and it has become common wisdom largely because of the "process" movement) is usually credited to Glenford Myers (1979):

The purpose of testing is to find failures.
Myers's insight is certainly a profound improvement on the earlier idea that the purpose of testing is to show that software does not fail. It is all too easy for a slapdash test to succeed even on the worst software, and with the goal achieved, there is strong disincentive to work harder. Myers's suggestion is in agreement with the testing goal for other engineered products, and certainly in line with the testing of computer hardware. However, the analogy between mass-produced physical objects (each slightly different) subjected to environmental stress, and software (each copy truly identical) that does not wear or get brittle, etc., is a poor one.

In hardware testing we know a good deal about the possible failure modes of the device, and tests are designed to expose them. When the tests fail, the part is flawed and is discarded. When the tests succeed, we believe the part is up to spec (in manufacture-its design has not been considered) because we believe that the failure modes tested are an exhaustive list. In software the failure modes are unlimited and unknown. When a test fails, we know something is wrong-that is the basis of Myers's insight. But when all tests succeed, we know only that some failures are precluded, not exactly what these failures are, and nothing about other failure possibilities.

Because no software testing method (or combination of methods) is effective at exposing all possible failures, a very unsatisfactory situation arises. Eventually, all the defects a testing method can find will have been found, so the method's usefulness (for exposing failure) is at an end. But the "tested" software is still of unknown quality. Some defects have been removed; but what defects remain? The analogy to fishing a lake is apt: when you catch no fish, it doesn't necessarily
mean there are none in the lake. A program's testing space is bigger than a lake, its defects are as elusive as fish, and testing methods are no better than fishermen's schemes.

The fishing analogy can be made quantitative. A deep lake 150 km in diameter (roughly the size of one of the Great Lakes) might contain 10^12 m³ of water, and trolling it for a day, a fisherman might probe about 10^4 m³ (assuming the lure is attractive in a 0.2-m² cross-section), or about fraction 10^-8 of the space. A program with a pair of integer inputs running on a 32-bit machine has about 2^64 possible input values, and testing it for a day at one test per second is about 9 × 10^4 tests, or less than fraction 10^-14 of the possibilities. It is as idle to imagine that each instant of fishing time covers a different part of the lake, as to suppose that every test will be truly different. Just as fish do not always bite when the bait comes near, so bugs do not always reveal themselves when tests encounter faulty code. All in all, it seems that a fisherman predicting no fish in Lake Michigan after getting skunked is far more likely to be right than is a software tester predicting no bugs in a trivial program that tested without failure.

Practical testing methods (like the notions of fishing guides) are based on experience in uncovering software failures (or finding fish). But it is fallacious to believe that these methods have significance when they do not do what they were designed to do. (That is, when they find nothing, it is fallacious to conclude that there is nothing to find.) For assessing the quality of software based on successful tests, different methods are required. Everyone who has done large-scale testing has an intuition that there is a connection between practical testing methods and software quality, but until that connection is supported by theoretical arguments and empirical studies, it remains a "fish story."

There are reasons to test with the intent of finding failures (Section 3), although obtaining dependable software is not currently one of them. There is even some reason to hope that a connection can be established between failure-finding and the quality of the resulting software. But for now, Myers's advice deals with the process, not the product. Failure detection is certainly a measure of the effort put into testing, and if there is an independent control on defects-e.g., a rigorous inspection process-test failures can be a measure of how well that control is working. But since a reasonable development process corrects defects that are found, testing for failure applies not to the final released software product, but to intermediate, discarded versions that are not released. As such, this testing does not meet the standard for true process quality.
2. Testing Background and Terminology

A test is a single value of program input, which enables a single execution of the program. A testset is a finite collection of tests. These definitions implicitly assume a simple programming context, which is not very realistic, but which
simplifies the discussion. This context is that of a "batch" program with a pure-function semantics: the program is given a single input, it computes a single result and terminates. The result on another input in no way depends on prior calculations. Thus a testset puts this program through its paces, and the order of the tests is immaterial. In reality, programs may have complex input tuples, and produce similar outputs. But we can imagine coding each of these into a single value, so that the simplification is not a transgression in principle. Interactive programs that accept input a bit at a time and respond to each bit, programs that read and write permanent data, and real-time programs, do not fit this simple model. However, it is possible to treat these more complex programs as if they used testsets, at the cost of some artificiality. For example, an interactive program can be thought of as having testsets whose members (single tests) are sequences of inputs. Because testing theory often has negative results to state, these results can be presented in the simple case, and can be expected to carry over to more complex cases.

Each program has a specification that is an input-output relation. That is, the specification S is a set of ordered input-output pairs describing allowed behavior. A program P meets its specification for input x iff: if x ∈ dom(S) then on input x, P produces output y such that (x, y) ∈ S. A program meets its specification (everywhere) iff it meets it on all inputs. Note that where x ∉ dom(S), i.e., when an input does not occur as any first element in the specification, the program may do anything it likes, including fail to terminate, yet still meet the specification. Thus S defines the required input domain as well as behavior on that domain.

A program P with specification S fails on input x iff P does not meet S at x. A program fails, if it fails on any input. When a program fails, the situation, and loosely the input responsible, is called a failure. The opposite of fails is succeeds; the opposite of a failure is a success.

Programmers and testers are much concerned with "bugs" (or "defects" or "errors"). The idea of "bug" is unlike the precise technical notion of "failure" because a bug intuitively is a piece of erroneous program code, while a failure is an unwanted execution result. The technical term for "bug," etc., is fault, intuitively the textual program element that is responsible for one or more failures. However appealing and necessary this intuitive idea may be, it has proved extremely difficult to define precisely. The difficulty is that faults have no unique characterization. In practice, software fails for some testset, and is changed so that it succeeds on that testset. The assumption is made that the change does not introduce any new failures (an assumption false in general). The "fault" is then defined by the "fix," and is characterized by what was changed, e.g., "wrong expression in an assignment." But the change is by no means unique. Literally an infinity of other changes (including those that differ only by
extraneous statements) would have produced exactly the same effect. So "the fault" is not a precise idea. Nevertheless, the terminology is useful and solidly entrenched, and we will use it to refer (imprecisely) to textual elements that lead to failures. These definitions of failure and fault correspond to an Institute of Electrical and Electronics Engineers (IEEE) glossary.
2.1 The Oracle Problem

Evidently the most important aspect of any testing situation is the determination of success or failure. But in practice, the process is decidedly error-prone. If a program fails to complete its calculation in an obvious way (e.g., it is aborted with a message from the run-time system), then it will likely be seen to have failed. But for elaborate output displays, specified only by an imprecise description in natural language (a very common real situation), a human being may well fail to notice a subtle failure. In one study, 40% of the test failures went unnoticed (Basili and Selby, 1987). To do better, the process of failure detection must be automated.

An oracle for specification S is a binary predicate J such that J(x, y) holds iff: either x ∉ dom(S) or (x, y) ∈ S. (That is, J is a natural extension of the characteristic function of S.) If there is an algorithm for computing J then the oracle is called effective. Thus, given a program and a test point, an oracle can be used to decide if the result the program gives is or is not correct (the program does or does not meet its specification at this point). For an effective oracle this decision can be automated.

Testing theory, being concerned with the choice of tests and testing methods, usually ignores the oracle problem. It is typically assumed that an oracle exists, and the theoretician then glibly talks about success and failure, while in practice there is no oracle but imperfect human judgment. The theoretician justifies this lack of concern in two ways: (1) the problem is one of specification, not testing-proper specifications should have effective oracles; and (2) all testing methods suffer equally from the absence of an oracle. Without an oracle, testing cannot be done, but whatever is available will work for any method equally well (or equally badly).

Point (1) is well taken, and specification research has accepted the responsibility in the strong sense that specification methods seek effective oracles that can be mechanically obtained for any example specification (Gannon et al., 1981; Antoy and Hamlet, 1992). But justification (2) contains a subtle flaw: it assumes all testing involves the same size testsets. If method X requires far more tests than method Y, then with an inefficient oracle, X may be intractable while Y is not. Of course, more tests always cost more in machine execution time, but people time is the more important in practice today, and today's oracles are usually people. The size of testsets is an issue in random testing (Section 4.2) and in mutation testing (Section 3.2.3).
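To make the definition concrete, the sketch below (in the Pascal used for the chapter's other examples) shows what an effective oracle might look like for a specification invented here for illustration: S = {(x, y) : x >= 0 and y² <= x < (y+1)²}, the integer square root. The specification, the program name, and the sample judgments are assumptions, not taken from the chapter.

   program OracleDemo;

   { Hypothetical effective oracle J for the invented specification
     S = { (x, y) : x >= 0 and y*y <= x < (y+1)*(y+1) }.
     J(x, y) holds iff x is outside dom(S) or (x, y) is in S. }

   function J(x, y: integer): boolean;
   begin
     if x < 0 then
       J := true    { x not in dom(S): any result meets S }
     else
       J := (y * y <= x) and (x < (y + 1) * (y + 1))
   end;

   begin
     writeln(J(9, 3));    { TRUE: 3 is the integer square root of 9 }
     writeln(J(10, 3));   { TRUE }
     writeln(J(10, 4));   { FALSE: output 4 on input 10 is a failure }
     writeln(J(-1, 7))    { TRUE: -1 is outside dom(S) }
   end.

Given such a J, judging each test outcome reduces to a single Boolean evaluation, which is what makes very large testsets tractable.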
2.2 Unit Testing versus System Test

The definitions above are framed to apply to complete software systems, programs that can be executed, that have input-output specifications. But testing a large piece of software is a formidable task: the number of possible behaviors may be very large, and it may not be easy to select tests. The tester intuitively feels that the problem has no practical solution, and that whatever tests are conducted, they will barely scratch the surface, and leave much untried. It is natural to want to decompose the testing problem into more manageable, understandable units. Since software itself is composed of units (subroutines, modules, etc.), it is natural to think of testing these.

The practical difficulty with "unit testing" is precisely that small units are not directly executable, and they may have specifications even more vague than the typical poor one for the whole program. In principle, the specification problem is also decomposed and simplified by considering modules. But in practice, designers may be very slipshod about unit specifications, relegating them to a comment in the code like "update control block." Of course, the code could not have been written without knowledge of the detailed format of this "control block," and without a clear statement of what it means to "update" it. But this information is likely to be distributed across many design documents, and to reside partly in the heads of designers and coders. The tester may be at a complete loss for a module oracle, without which testing is impossible.

Executing a module that expects its inputs from other modules is not a problem in principle. It is solved by writing a "testing harness"-a main program that accepts inputs, sends them to the unit to be tested, and keeps track of the results returned. Such harnesses can be automatically created from the unit's source syntax, using compiler techniques. Missing modules needed by the unit under test are a more difficult problem. If those modules are written and linked in, then a small "unit" is no longer being tested, but rather a "subsystem" of some kind, and its properties do not always reflect properties of its parts. For example, a test may succeed despite a fault in one module, because it calls another compensating module. On the other hand, if the called modules are replaced by "stubs"-dummy routines that do nothing themselves-the unit under test may not have the proper environment, and may itself do nothing significant.

Whatever the practical problems of unit testing, the idea harbors a much more serious difficulty in principle: when all unit tests succeed, and the units are linked together, it does not follow that overall system tests will succeed, or that properties tested for units will still be assured. This failure of unit testing to "compose" will be further discussed in Section 3.3.
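The harness idea can be illustrated with a minimal sketch. Everything here is invented for the illustration-the unit Median3, its testset, and its oracle-and a real harness would be generated mechanically from the unit's source rather than written by hand:

   program Harness;

   { Throwaway harness: run a unit on a testset and judge each result
     with an oracle. }

   const
     NTests = 4;
     TA: array [1..NTests] of integer = (1, 3, 2, 5);
     TB: array [1..NTests] of integer = (2, 1, 2, 5);
     TC: array [1..NTests] of integer = (3, 2, 1, 5);

   var
     i, m, failures: integer;

   { Unit under test: median of three integers. }
   function Median3(a, b, c: integer): integer;
   begin
     if ((a <= b) and (b <= c)) or ((c <= b) and (b <= a)) then
       Median3 := b
     else if ((b <= a) and (a <= c)) or ((c <= a) and (a <= b)) then
       Median3 := a
     else
       Median3 := c
   end;

   { Oracle: the result is one of the inputs, with at least two
     inputs <= it and at least two inputs >= it. }
   function Meets(a, b, c, m: integer): boolean;
   var
     le, ge: integer;
   begin
     le := 0; ge := 0;
     if a <= m then le := le + 1;
     if b <= m then le := le + 1;
     if c <= m then le := le + 1;
     if a >= m then ge := ge + 1;
     if b >= m then ge := ge + 1;
     if c >= m then ge := ge + 1;
     Meets := ((m = a) or (m = b) or (m = c)) and (le >= 2) and (ge >= 2)
   end;

   begin
     failures := 0;
     for i := 1 to NTests do
     begin
       m := Median3(TA[i], TB[i], TC[i]);
       if not Meets(TA[i], TB[i], TC[i], m) then
       begin
         writeln('test ', i, ' FAILED with result ', m);
         failures := failures + 1
       end
     end;
     writeln(failures, ' failure(s) in ', NTests, ' tests')
   end.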
3. Testing to Detect Failures

If Myers's advice to test for failure was needed by some segment of the testing community, it certainly was not news to those who devise systematic testing
methods. These methods look for failures, and the mindset of their inventors is accurately described by the rhetorical question: "How can a tester be prevented from thinking everything is OK, when it is not?"

The idea behind systematic testing is coverage. A deficient test, perhaps conducted by someone who really did not want to discover any problems with the software being tested, might succeed but only because it doesn't probe very deeply. Coverage measures this probing. A typical systematic testing method, applied with deficient test data, might issue the generic message: "Testing succeeded, but it did not cover (list of elements missed)."

Systematic testing methods differ in the kind of coverage they require. The major division is between "functional" and "structural" methods.

Functional testing is also called "black-box" testing, or "specification-based" testing. The coverage is by functions (cases) of what the software is supposed to do. In a command-driven system under test, different commands could be taken to be functions; a test would not achieve functional coverage unless it had used every command. Additional functional classes could be defined by considering command-parameter classes, e.g., "long" or "short" for a string, positive/negative/zero for an integer, first/second/third/fourth quadrant for an angle, etc. A command with one argument that is an angle would then define four functional classes to be covered, one for each quadrant. Functional testing is the primary systematic technique, at the heart of any reasonable plan for finding software failures. Without it, the tester has no idea what software will do in the very cases that it is supposed to handle, cases that are going to be tried by its users.

Structural testing is also called "program-based" or "code-based" testing, and also "clear-box" or "white-box" testing. The coverage is by syntactic parts of the program being tested. The most common kind of structural coverage is "statement testing," which requires that test data force the execution of each and every program statement. Structural testing requires no knowledge of what a program is supposed to do (except in determining success or failure of each test-see Section 2.1), and thus its coverage requirement can be automated. Tools for measuring structural coverage are often efficient and easy to construct. For example, many operating systems have a "profile" utility that counts execution frequencies by statement, using instrumentation inserted in the source program. Deficiencies in statement coverage show up as profiles containing statements executed zero times. Without statement-coverage testing, the tester has no idea what will happen when the statements that were not tested are executed, as they probably will be when the program is used.

Functional and structural testing intuitively complement each other: structural testing exercises the code that is present, and functional testing covers what
should be there. These methods intuitively uncover failures because bugs have a functional and/or structural location, so by systematically exploring these locations, the bugs may be found. Most test plans use a combination of methods. A particularly plausible scheme (Marick, 1991) is to devise tests for functional coverage, and then measure the structural coverage of those same tests, to judge how well they apply to the code being tested. When parts of the code are not covered, the tester should return to the functional cases to identify omitted functions, and repeat the process.

Suppose that a test plan specifies complete coverage (functional or structural, or both), and that it is successfully carried out. How could a program bug nevertheless escape detection? This question is central to understanding in detail why systematic testing cannot establish that a program works. Actual test cases are finite in number (and for practical testing, the number is severely limited by time and resource constraints). Any test can therefore cover at most a finite number of things. Thus the list of functions to be covered (for example), must have a coarse granularity. In the example of a command with an angle argument, it could happen that the program fails when the angle is 1 ± 0.00001 radians. This is in the first quadrant, but the tester may not have chosen such points in "covering" the first quadrant. There are an infinity of points (or at least impractically many in the digital quantization) to consider, and it is impossible to try them all. The situation of "covering" an infinity with a small finite number of tests is general in systematic testing. For structural statement testing, there are but a finite number of statements in any program. But each statement may be executed with varying values for the quantities it manipulates, and true coverage would have to try all these values. Statement coverage does not do so, which is its flaw.
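The angle example can be made concrete with a small sketch (invented here, not drawn from the chapter) of how a failure hides inside a covered functional class: the class representative 0.5 radians succeeds, and only inputs within 0.00001 of 1 radian fail. The fault is written as an explicit special case only to keep the sketch short; in practice such behavior usually arises with no separate statement for a coverage tool to flag.

   program NarrowFailure;

   { Hypothetical quadrant classifier for an angle in [0, 2*pi),
     with a seeded fault that appears only within 0.00001 of 1 radian. }

   const
     Pi = 3.14159265358979;

   function Quadrant(angle: real): integer;
   begin
     if angle < Pi / 2 then
     begin
       if abs(angle - 1.0) < 0.00001 then
         Quadrant := 2          { seeded fault: narrow band misclassified }
       else
         Quadrant := 1
     end
     else if angle < Pi then
       Quadrant := 2
     else if angle < 3 * Pi / 2 then
       Quadrant := 3
     else
       Quadrant := 4
   end;

   begin
     { one representative per functional class; all four tests succeed: }
     writeln(Quadrant(0.5), ' ', Quadrant(2.0), ' ',
             Quadrant(4.0), ' ', Quadrant(5.5));
     { the hidden failure: }
     writeln(Quadrant(1.000001))
   end.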
3.1 Functional Testing

The bulk of the creative effort in functional testing goes into devising the tests, or put another way, into defining the functional classes that are to be covered, and their representatives. Functional testing fits best at the system level, because software requirements and specifications are most often at that level. User documentation (especially a user's manual with examples) is a good source for a functional breakdown. With the test points in hand, the tasks of executing them, judging the answers, and keeping track of what has been done, are relatively straightforward. Tool support for the bookkeeping is helpful, indeed essential for a large system (Baker et al., 1989).

At the unit level, functional testing is also of value, but more difficult to carry out, because unit specifications may be poor or nonexistent, and there may be a dearth of intuitive functional classes for a unit. It is common to rely on implicit understanding, or volatile design knowledge that is not carefully recorded, in
defining what code units must do. Furthermore, units may have their functionality so intertwined with other units that it is impossible to test them separately. Some units perform a function that does not have any intuitive breakdown into cases; e.g., a subroutine computing e^x for argument value x seems inadequately tested using the classes {x|x < 0}, {0}, and {x|x > 0}, but what other classes come to mind?

Although functional testing is defined to make no use of the program structure, in practice there is good reason to add some design information to the specification when defining functional classes. Every program has important internal data structures, and arbitrary restrictions on its operation imposed by implementation limitations; these lead to natural functional classes for testing. For example, a program that uses buffers should be probed with the buffer empty, just full, and overfull. What is being covered by such tests are really design elements, which fall between the functional and the structural. Another example is a program that uses a hash table: the distinction of collision/no-collision defines a pseudofunctional class. Marick (1991) calls the kind of testing that includes boundary cases from design "broken-box" testing; it has long been treated as an aspect of functional testing (Goodenough and Gerhart, 1976).
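A sketch of how such design-derived ("broken-box") classes might be written down and checked for coverage follows; the buffer capacity, the classes, and the testset are all hypothetical, and the exception-style report imitates the structural-coverage tools discussed in the next section.

   program BrokenBox;

   { Hypothetical functional classes drawn from a design detail: a unit
     built around a buffer of capacity Cap should be probed with the
     buffer empty, partly full, just full, and overfull.  A class is
     determined by how many items a test inserts. }

   const
     Cap = 8;
     NTests = 3;
     Inserted: array [1..NTests] of integer = (0, 3, 8);

   type
     BufferClass = (Empty, Partial, JustFull, Overfull);

   var
     covered: array [BufferClass] of boolean;
     c: BufferClass;
     i: integer;

   function ClassOf(n: integer): BufferClass;
   begin
     if n = 0 then ClassOf := Empty
     else if n < Cap then ClassOf := Partial
     else if n = Cap then ClassOf := JustFull
     else ClassOf := Overfull
   end;

   begin
     for c := Empty to Overfull do covered[c] := false;
     for i := 1 to NTests do covered[ClassOf(Inserted[i])] := true;
     for c := Empty to Overfull do
       if not covered[c] then
         writeln('class ', ord(c), ' (0=empty .. 3=overfull) not covered')
   end.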
3.2 Structural Testing
Structural testing methods are conveniently divided into three categories, based on implementation techniques for tools, and the intuition behind the methods. These are: control-flow methods (easy to implement efficiently, easy to understand); data-flow methods (somewhat more difficult to implement, sometimes difficult to grasp); and data coverage methods (hard to implement efficiently, hard to understand).
3.2.1 Control-Flow Coverage

Statement-coverage testing is the prototype control-flow method. The places in the program that might be reached (the statements), must be reached by an adequate testset. Because statement coverage is almost trivial to measure, most commercial testing tools provide this measurement.

Branch testing is also widely supported by testing tools. A testset achieves branch coverage if its tests force each program branch to take both TRUE and FALSE outcomes. Branch testing is commonly preferred to statement testing, but without much justification (Section 5). Conditional branches employing AND and OR operators contain "hidden" paths that are not forced by branch- or statement-coverage testing, and in the presence of short-circuit evaluation, it may be of some importance whether or not these paths were taken. Multicondition coverage is a version of branch testing
that requires tests to make each subexpression take both TRUE and FALSE values. Loop coverage requires that tests cause the loop body to be executed at least once, and also cause the body to be skipped.

The state of the test-tool art is nicely exemplified by the public-domain tool GCT (Generic Coverage Tool) (Marick, 1991) for the C language. GCT comes with impressive testsets for testing GCT itself. It includes branch, multicondition, and loop coverage measures (as well as weak mutation, described below).

One structural testing method often spawns another when a deficiency is recognized in the first method. Branch testing and the other GCT control-flow methods could be said to arise from statement testing because they intuitively seem to cover more. The ultimate in control-flow coverage is path testing, in which it is required that tests in a testset cause the execution of every path in the program. The rationale behind path testing is that each path represents a distinct case that the program may handle differently (insofar as something different happens along each path), and so these cases must be tried. Unfortunately, there are a potential infinity of paths through any loop, corresponding to 0, 1, 2, . . . , n, . . . times through the body, and even nested IF statements quickly lead to too many paths, so complete path testing is usually not attempted in practice.

Testing tools that implement control-flow methods make use of exception reporting to minimize their reports. They identify not what was executed, but what was not. Tools are thus required to be able to identify the possibilities for coverage. This they necessarily do inaccurately, because determination of exactly which control flows can occur is in general an unsolvable problem. Invariably, practical tools use a simple worst-case algorithm: they treat a control flow as possible if and only if it forms a path in the uninterpreted program flowgraph. That is, they ignore the logical condition needed to execute the path, and consider only the connections of the graph. As a special case of this inaccuracy, a FOR loop appears to be able to execute its body any of 0, 1, 2, . . . , n, . . . times; in reality, for constant bounds only one count is possible. When a testing tool reports that coverage has failed, and statements, branches, paths, etc. are uncovered, it may be falsely listing situations that actually cannot occur. The tester is then faced with the infeasible path problem: to determine which of the reported coverage deficiencies are real (requiring additional test points to cover), and which are impossible to fulfill (to be ignored). In practice, these decisions are not very difficult for human beings to make (Frankl and Weyuker, 1988).

Testing tools also commonly report fractional coverages, e.g., "87% of all branches were covered." Such messages are the basis for testing standards that specify a fraction to be attained [80% is often suggested, but in practice, coverages may be much lower (Ramsey and Basili, 1985)]. These summary numbers do measure testing effort, but their meaning is obscure, and it is easy to be misled (Section 5.1).
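A small invented illustration of the "hidden path" point: the first two tests in the program below give the IF both outcomes, so branch coverage is achieved, but the subexpression y > 0 never takes the value FALSE (indeed, under short-circuit evaluation it is not evaluated at all on the second test). Multicondition coverage is not achieved until a test such as the third is added.

   program Hidden;

   { Hypothetical compound condition with a "hidden" short-circuit path. }

   function InWindow(x, y: integer): boolean;
   begin
     InWindow := false;
     if (x > 0) and (y > 0) then
       InWindow := true
   end;

   begin
     writeln(InWindow(1, 1));   { IF outcome TRUE                 }
     writeln(InWindow(0, 1));   { IF outcome FALSE, forced by x > 0 }
     { branch coverage is now complete; multicondition coverage also
       requires y > 0 to be FALSE while it is evaluated: }
     writeln(InWindow(1, 0))
   end.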
3.2.2 Data-Flow Coverage

Complete path testing is considered impractical because of the potential infinity of paths through loops. A heuristic method of loop coverage that requires executing the loop body at least once, and also skipping it entirely, is not satisfactory. Experience shows that many subtle failures show up only on particular paths through loops, paths that are unlikely to be selected to satisfy a heuristic. Complete path testing really has something to offer, because programmers do think in terms of paths as special cases, that should subsequently be tested. Data-flow testing is an attempt to prescribe coverage of some of these special paths.

The central idea of data-flow coverage is the def-use or DU pair for a program variable. Variable V has a def at each place in the program where V acquires a value, e.g., at assignments to V. V has a use at any other place it occurs. Thus, for example, in the Pascal statement
   x := x + 1
there is a use of x followed by a def of x. A DU path for V is a path in the program from a place where V has a def, to a place where V has a use, without any intervening def of V. A DU pair for V is the pair of start-end locations of any DU path. Thus, the intuition behind data-flow testing is that DU paths are the special paths that programmers think about, paths on which values are stored, then later used. For example, in the Pascal code (with line numbers for identification):
   1  x := 0;
   2  while x < 2
   3  begin
   4    writeln('Looping');
   5    x := x + 1
   6  end

there are the following DU paths for the variable x: 1-2, 1-2-3-4-5, 5-6-2, 5-6-2-3-4-5. Although it is possible to have an infinity of DU paths in a program (but not usual, as the example shows-most such paths are interrupted by a def), the number of DU pairs is always finite. This motivates the definition of the most common kind of data flow test coverage: a testset achieves all-uses coverage if its data points cause the execution of each DU pair for each variable. (When multiple DU paths, perhaps an infinity of them, connect a DU pair, only one such path need be covered in all-uses testing.) The technical definitions of data flow testing are surprisingly difficult to give precisely, and there are several versions in the literature (Rapps and Weyuker, 1985; Frankl and Weyuker, 1988; Podgurski and Clarke, 1990).

Among many variations of data flow ideas, we mention only the extension to dependency chains (Campbell, 1990; Ntafos, 1988). A dependency chain occurs
when there is a series of DU paths laid end to end, DU paths on which the use ending one is passed to the def beginning another, perhaps with a change of variable. For example, in the Pascal:
   1  s := 0;
   2  x := 0;
   3  while x < 3
   4  begin
   5    s := s + x;
   6    x := x + 1
   7  end;
   8  writeln(s)

there is a dependency chain beginning with the def of x at 6, and ending with the use of s at 8, passing from x to s at 5, then from s to s again at 5, on the path 6-7-3-4-5-6-7-3-4-5-6-7-3-8. This dependency chain captures the contribution of both the loop index and the partial sum to the final sum, and it requires two iterations to observe it.

Implementing data-flow test tools is not much more difficult than implementing control-flow tools (indeed, some people do not distinguish data-flow as separate from control-flow). The instrumentation for observing what paths are executed is the same as for statement coverage. However, to calculate which DU pairs/paths/chains exist in the program requires construction of a dependency graph. This construction is static, based as usual on the uninterpreted flow graph, annotated with variable-usage information. Thus data flow tools are subject to the infeasible-path difficulties described above.
3.2.3 Data Coverage

The defect in all control-flow and data-flow coverage methods is that each statement/branch/path requires only a single set of internal data values to cover. That is, these testing methods do force the execution of control patterns, but only for a single case out of a potential infinity of cases that use the same pattern. Faults can remain hidden, if the failures they cause require the pattern to be traversed with special data values. The more complex the control pattern, the more important the problem of data choice becomes, because in attempting to cover a difficult control condition, a tester is likely to choose trivial data values, values that have little chance of exposing failure.

So-called "special-values testing" or "boundary testing" suggests the use of extreme values. Used in connection with a control- or data-flow method, boundary testing is very valuable in the hands of an experienced tester. For example, on a dependency chain that makes use of buffer storage, a boundary test would require the chain to be covered with the buffer empty, with it just
full, and with overflow. The applications of special-values testing are not systematic, but based on human insight into what values might cause things to go wrong. Ultimately, the "right" special values are the ones on which the program happens to fail, something that cannot be systematized.

A brute-force kind of "data-coverage" measurement for a testset can be obtained by instrumenting a program to record all of its internal states (Hamlet, 1993). The quality of the testset at each point in the program is measured by the variety of values assumed by the variables used there. Although neither the instrumentation nor the analysis of recorded data is difficult, such a system is not storage-efficient. The idea has a more fundamental deficiency because a data coverage tool cannot report its results by exception. The problem of statically calculating the internal-state possibilities at an arbitrary program location is of course unsolvable; worse, there are no known approximation or useful worst-case algorithms. Thus, the situation differs from that for control- and data-flow testing, in which the uninterpreted flow graph allows easy worst-case calculation of the possible paths, including only a few infeasible ones by mistake.

Mutation testing is a systematic method that approximates both boundary testing and data-coverage testing (Hamlet, 1977). From the program under test (PUT), a collection of mutant programs are created, each differing from the PUT by a small change in syntax. For example, in a conditional statement of the PUT, the relational operator ">" might be replaced with "≠" in a mutant. Other mutants would use other relational operators in turn, the set of variations being determined by the particular mutation system. Now the PUT and all mutant variations are executed on a testset. The PUT must get correct results, and any mutant that does not agree with the PUT (i.e., the mutant does not succeed on the testset) is termed killed. The testset achieves mutation coverage if it kills all mutants.

The idea behind mutation is that mutant variations explore the quality of the data that reaches each program location. Good testsets cannot be fooled by any mutant. In the example of substituting "≠" for ">", these two operators agree on the whole range where the ">" expression is TRUE. So long as test data falls in that range it is not covering very well. Mutation also systematizes the idea of "boundaries" in data; boundaries are literally edges of regions on which mutants are killed.

In principle, an extended kind of mutation, in which all possible variations of the PUT are considered, would constitute a perfect test method. Killing all the mutants would show that no other program agrees with the PUT on a testset, hence the PUT must be correct if any program is. But even the very restricted mutants of the experimental systems are too many to make the technique easily practical. Another implementation difficulty is that mutants may have termination difficulties the PUT did not have, e.g., when loop-termination expressions are
mutated. Technically, a mutant looping forever can never be killed; in practice, systems impose an arbitrary limit and silently and incorrectly kill long-running mutants.

There is a more difficult practical problem with mutation testing than its implementation, unfortunately. Just as in control-flow testing there may be infeasible paths so that the exception reporting of (say) a DU testing tool requires human investigation, so mutation sets its users an even more difficult task. Some mutants cannot be killed, because there is no test data on which they differ from the PUT-such a mutant is a program equivalent to the PUT. The problem of deciding mutant equivalence is unsolvable in principle, but also intractable in practice, and is a major stumbling block to acceptance for mutation testing.

A number of suggestions have been made to make mutation testing more practical. The most promising is to use "weak mutation" (Howden, 1982; Hamlet, 1977). Weak mutation considers a mutant to be killed if it differs from the PUT in the state values immediately following the altered statement in the mutant. The mutation described above (now "strong mutation") requires that the program results differ. Of course, if a mutant is killed in the strong sense, it must also be killed in the weak sense by the same test data, but the converse is not true. It may happen that the mutation alters a state, but that subsequent actions ignore or correct this error. Weak mutation can be efficiently implemented (it is a part of the GCT tool mentioned above), and in some important cases (notably mutants of the relational operators) it is possible to derive simple conditions that describe test data that will necessarily kill all mutants. The problem of equivalent mutants remains in the weak case, but its practical difficulty is mitigated by focusing on just one state following the mutation.
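A minimal sketch of the idea follows. The program, its single mutant, and the tests are all invented; a real mutation system generates the mutants automatically and judges the PUT's own results with an oracle. Replacing ">" by "<>" in the PUT below produces a mutant that agrees with it on every non-negative input, so a testset confined to such "easy" data never kills it.

   program MutantDemo;

   { PUT: intended to compute max(x, 0). }
   function ProgUnderTest(x: integer): integer;
   begin
     if x > 0 then ProgUnderTest := x else ProgUnderTest := 0
   end;

   { Mutant: the relational operator ">" replaced by "<>". }
   function MutantVariant(x: integer): integer;
   begin
     if x <> 0 then MutantVariant := x else MutantVariant := 0
   end;

   procedure TryTest(x: integer);
   begin
     if MutantVariant(x) <> ProgUnderTest(x) then
       writeln('x = ', x, ': mutant killed')
     else
       writeln('x = ', x, ': mutant survives')
   end;

   begin
     TryTest(5);     { survives: both give 5 }
     TryTest(0);     { survives: both give 0 }
     TryTest(-3)     { killed: PUT gives 0, mutant gives -3 }
   end.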
3.3 What Should Be Covered?

Coverage testing, both functional and structural, is intended to expose failures by systematically poking into the "corners" of software. It is reasonable to inquire into the principles of coverage-why does it work, and what makes it work better? Such principles would be useful in directing testing practice, because they would help a tester to decide which method to use, and how to tailor a method to particular software. For example, using statement coverage, should some statements receive more attention than others?

A better understanding of coverage testing might also address the problem of composing test results from unit testing into conclusions for system testing. What coverage at the unit level would reduce failures when the units are combined into a system?

Marick's suggestion that structural coverage be used to assess the quality of functional coverage is one way that methods might be used together. Section 3.2.3 has suggested that mutation was invented as a complement to control-flow testing. Each kind of coverage testing has its own rationale, and particular failures
it is good at finding, so why not use them all? In short: because there isn't time. Testing using even a single method (or no method at all) is an open-ended activity. It is always possible to take more time and do more testing. For example, functional classes can always be refined into smaller functional classes, requiring larger testsets to cover. Within each functional class, it is always possible to choose more test points. If methods are to be combined, how much should each be used relative to the others?

In research that will be considered in detail in Section 5.2, "partition testing" methods were studied. Partition testing is an abstraction of coverage testing, which considers groups of inputs that achieve particular parts of coverage. For example, in complete path testing, the input space is divided into equivalence classes by which path those inputs take. These classes do not overlap (no input can take more than one path), and together they exhaust all inputs (every input must take some path). The equivalence class for each path is the place from which tests must be drawn to cover that path. Other structural testing methods do not induce disjoint classes in this way (e.g., the same input may cause execution of more than one statement, in statement testing), but their classes can be combined to create an artificial partition (Hamlet and Taylor, 1990).

When the failure-finding ability of partition-testing methods was investigated (Hamlet and Taylor, 1990; Weyuker and Jeng, 1991), an unsurprising result emerged: coverage works best at finding failures where there are more failures to find. That is, if the failure points are unequally distributed over the equivalence classes, then covering the classes of that partition will be more likely to find errors than using a partition in which the classes are uniform relative to failure (a sketch of the probabilistic model behind these studies follows the list below). One way of looking at this result is as useless advice to "look for failures where they are," useless because if that information were available, testing would be unnecessary-we do not know where the failures are. But the advice is not so useless in different forms:

1. Coverage works better when the different elements being covered have differing chances of failing. For example, path testing will be better at finding errors if some paths are more buggy than others. A corollary of (1) is:

1a. Do not subdivide coverage classes unless there is reason to believe that the subdivision concentrates the chance of failure in some of the new subclasses. For example, it should help to break control-flow classes down by adding data boundary restrictions, but it would not be useful to refine functional classes on function-parameter values if one parameter value is no more error-prone than another.

2. Devise new coverage-testing methods based on probable sources of trouble in the development process. For example, emphasize the coverage of parts of code that has changed late in the development cycle, or code written by the least-experienced programmer, or code that has a history of failure, etc.
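In outline, the model used in the partition-testing studies is probabilistic; the notation below is generic, not copied from the cited papers. Divide the input domain into classes D_1, ..., D_k, let θ_i be the failure rate of class D_i (the probability that a point drawn from D_i fails), and let partition testing draw n_i points from each class. The probability that partition testing finds at least one failure is

   P_{\text{partition}} \;=\; 1 - \prod_{i=1}^{k} (1 - \theta_i)^{n_i},

while random testing with the same total number of points n = \sum_i n_i, drawn according to an operational profile that gives class D_i probability p_i, finds at least one failure with probability

   P_{\text{random}} \;=\; 1 - (1 - \theta)^{n}, \qquad \theta = \sum_{i=1}^{k} p_i\,\theta_i .

The comparison tends to favor partition testing when the θ_i differ widely, which is the content of advice (1) and (1a) above.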
Although coverage testing is an “obvious” method, its theory is poorly understood. Open questions will be considered further in Section 4.4.
3.4 Testing for Failure in the Software Process

The accepted role for testing in the best development processes today is the one given by Myers: to find failures.17 But when the development process is considered as a feedback system seeking to improve software quality, failure finding is a dangerous measure. The trouble is that one cannot tell the difference between good methods used on good software, and poor methods used on poor software. Testing cannot be evaluated in isolation, nor can it be used to monitor other parts of the process, unless an independent control on faults is present. If there is no such control, one never knows whether finding more failures means improved testing, or merely worse software; finding fewer failures might mean better software, or it might mean only poorer testing. It is not impossible to make sensible use of a failure-finding testing measure in the development process, but to do so requires better fundamental understanding of the interactions between software creation and testing than we possess today.
4. Testing for Reliability

When software is embedded in a larger engineering artifact (and today it is hard to find a product that does not have a software component), it is natural to ask how the software contributes to the reliability of the whole. Reliability is the fundamental statistical measure of engineering quality, expressing the probability that an artifact will not fail in its operating environment, within a given period of operation.
4.1 Analogy to Physical Systems
The software failure process is utterly unlike the random physical phenomena (such as wear, fabrication fluctuations, etc.) that underlie statistical treatment of physical systems. All software failures are the result of discrete, explicit (if unintentional) design flaws. If a program is executed on inputs where it is incorrect, failure invariably occurs; on inputs where it is correct, failure never occurs. This situation is poorly described as probabilistic. Nevertheless, a software reliability theory has been constructed by analogy to the mechanical one (Shooman, 1983).

Suppose that a program fails on a fraction θ of its possible inputs. It is true that θ is a measure of the program's quality, but not necessarily a statistical one that can be estimated or predicted.
The conventional statistical parameter corresponding to θ is the instantaneous hazard rate or failure intensity z, measured in failures per second. For physical systems that fail over time, z itself is a function of time. For example, it is common to take z(t) as the "bathtub curve" shown in Fig. 1. When a physical system is new, it is more likely to fail because of fabrication flaws. Then it "wears in," and the failure intensity drops and remains nearly constant. Finally, near the end of its useful life, wear and tear makes the system increasingly likely to fail.

What is the corresponding situation for software? Is there a sensible idea of a software failure intensity? There are several complications that interfere with understanding. The first issue is time dependence of the failure intensity. A physical-system failure intensity is a function of time because the physical system changes. Software changes only if it is changed. Hence a time-dependent failure intensity is appropriate for describing the development process, or maintenance activities. (The question of changing usage is considered in Section 4.4.) Only the simplest case, of an unchanging, "released" program, is considered here. Thus we are not concerned with "reliability growth" during the debugging period (Musa et al., 1987).

Some programs are in continuous operation, and their failure data is naturally presented as an event sequence. From recorded failure times t1, t2, . . . , tn starting at 0, it is possible to calculate a mean time to failure (MTTF), which is the primary statistical quality parameter for such programs. MTTF is of questionable statistical meaning for the same reasons that failure intensity is. It is a (usually unexamined) assumption of statistical theories for continuously operating programs that the inputs which drive the program's execution are "representative" of its use.
FIG. 1. "Bathtub" hazard rate function (hazard rate z(t) plotted against time t, with "wear in" and "wear out" regions).
The inputs supplied, and their representativeness, are fundamental to the theory; the behavior in time is peculiar to continuously operating programs. Exactly the same underlying questions arise for the pure-function batch programs whose testing is being considered here. For such a program, the number of (assumed independent) runs replaces time, and the failure intensity is "per run" (or sometimes "per demand," if the system is thought of as awaiting an input). MTTF is then "mean runs to failure" (but we do not change the acronym).
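To make these quantities concrete (the formulas below are implied by the definitions above rather than written out in the chapter): with recorded failure times starting at t_0 = 0, the natural MTTF estimate is the average inter-failure time, and for a batch program with constant per-run failure intensity θ the MTTF is 1/θ runs, the value used in Section 4.3:

\widehat{\mathrm{MTTF}} = \frac{1}{n}\sum_{i=1}^{n}\bigl(t_i - t_{i-1}\bigr) = \frac{t_n}{n}, \qquad \widehat{\mathrm{MTTF}}_{\text{per run}} = \frac{1}{\theta}.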
4.2 Random Testing
The random testing method was not described in Section 3, because it is seldom used for finding failures.18 Random testing recognizes that a testset is a sample taken from a program's input space, and requires that sample to be taken without bias. This is in strong contrast to the methods of Section 3, each of which made use of detailed program or specification information. Random testing is intuitively useful for prediction. If an appropriate sample is used for the testset, results on that sample stand in for future program behavior, for which the inputs are unknown.

Random testing cannot be used unless there is a means of generating inputs "at random." Pseudorandom number generation algorithms have long been studied (Knuth, 1981), although the statistical properties of the typical generator supplied with a programming language are often poor. Pseudorandom numbers from a uniform distribution can be used as test inputs if a program's range of input values is known. In actual applications, this range is determined by hardware limitations such as word size, but it is better if the specification restricts the input domain. For example, a mathematical library routine might have adequate accuracy only in a certain range given in its specification. A uniform distribution, however, may not be appropriate.
4.2.1 Operational Profile

Statistical predictions from sampling have no validity unless the sample is "representative," which for software means that the testset must be drawn in the same way that future invocations will occur. An input probability density d(x) is needed, expressing the probability that input x will actually occur in use. Given the function d, the operational distribution F(x) is the cumulative probability19 that an input will occur:

F(x) = \int_{-\infty}^{x} d(z)\, dz .
To generate a testset "according to operational distribution F," start with a collection of pseudorandom reals r uniformly distributed over [0, 1], and generate F^{-1}(r). For a detailed presentation, see Hamlet (1994).
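A minimal sketch of this inverse-transform recipe follows; the exponential density is an assumption made only so that F can be inverted in closed form, and is not part of the original presentation:

    import math
    import random

    def exponential_inverse_cdf(r, rate=1.0):
        # F(x) = 1 - exp(-rate * x) for x >= 0, so F^{-1}(r) = -ln(1 - r) / rate.
        return -math.log(1.0 - r) / rate

    def operational_testset(n, inverse_cdf=exponential_inverse_cdf):
        # Push uniform pseudorandom reals r through F^{-1} so that the
        # generated test inputs follow the operational distribution F.
        return [inverse_cdf(random.random()) for _ in range(n)]

    testset = operational_testset(1000)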
The distribution function d should technically be given as a part of a program’s specification. In practice, the best that can be obtained is a very crude approximation to d called the operational profile. The program input space is broken down into a limited number of categories by function, and attempts are made to estimate the probability with which expected inputs will come from each category. Random testing is then conducted by drawing inputs from each category of the profile (using a uniform distribution within the category), in proportion to the estimated usage frequency.
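A sketch of that category-by-category procedure is given below; the categories, usage probabilities, and input ranges are hypothetical, chosen only to illustrate the mechanism just described:

    import random

    # Hypothetical operational profile: category -> (estimated usage
    # probability, range of integer inputs drawn uniformly within the category).
    profile = {
        "small update":  (0.70, (1, 100)),
        "bulk load":     (0.25, (101, 10000)),
        "admin command": (0.05, (10001, 10100)),
    }

    def profile_testset(n):
        names = list(profile)
        weights = [profile[c][0] for c in names]
        tests = []
        for category in random.choices(names, weights=weights, k=n):
            low, high = profile[category][1]
            tests.append((category, random.randint(low, high)))
        return tests

    testset = profile_testset(500)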
4.3 Software Reliability Theory

If a statistical view of software failures is appropriate, failure intensity (or MTTF) can be measured for a program using random testing. Inputs are supplied at random according to the operational profile, and the failure intensity should be the long-term average of the ratio of failed runs to total runs. An exhaustive test might measure the failure intensity exactly. But whether or not failure intensity can be estimated with less than exhaustive testing depends on the sample size, and on unknown characteristics of programs. Too small a sample might inadvertently emphasize incorrect executions, and thus estimate a failure intensity that is falsely high. The more dangerous possibility is that failures will be unfairly avoided, and the estimate will be too optimistic. When a release test exposes no failures, a failure-intensity estimate of zero is the only one possible. If subsequent field failures show the estimate to be wrong, it demonstrates precisely the antistatistical point of view. A more subtle criticism questions whether MTTF is stable: is it possible to perform repeated experiments in which the measured values of MTTF obey the law of large numbers?

In practice there is considerable difficulty with the operational profile:

1. Usage information may be expensive to obtain or simply not available. In the best cases, the profile obtained is very coarse, having at most a few hundred usage probabilities for rough classes of inputs.

2. Different organizations (and different individuals within one organization) may have quite different profiles, which may change over time.

3. Testing with the wrong profile always gives overly optimistic results (because when no failures are seen, it cannot be because failures have been overemphasized!).
The concept of an operational profile does successfully explain changes observed over time in a program's (supposedly constant) failure intensity. It is common to experience a bathtub curve like Fig. 1. When a program is new to its users, they subject it to unorthodox inputs, following what might be called a "novice" operational profile, and experience a certain failure intensity.
But as they learn to use the program, and what inputs to avoid, they gradually shift to an "average" user profile, where the failure intensity is lower, because this profile is closer to what the program's developer expected and tested. This transition corresponds to the "wear in" period in Fig. 1. Then, as the users become "expert," they again subject the program to unusual inputs, trying to stretch its capabilities to solve unexpected problems. Again the failure intensity rises, corresponding to the "wear out" part of Fig. 1.

Postulating an operational profile also allows us to derive the software reliability theory developed at TRW (Thayer et al., 1978), which is quantitative, but less successful than the qualitative explanation of the bathtub curve. Suppose that there is a meaningful constant failure intensity θ (in failures per run) for a program, and hence an MTTF of 1/θ runs, and a reliability of e^{-θM} for M runs (Shooman, 1983). We wish to draw N random tests according to the operational profile, to establish an upper confidence bound α that the failure intensity is below some level θ. These quantities are related by

\alpha = 1 - \sum_{k=0}^{F} \binom{N}{k}\, \theta^{k} (1-\theta)^{N-k}     (1)
if the N tests uncover F failures. Some numerical values: for F = 0, N = 3000, α = 0.95, θ = 0.001, the MTTF is 1000 runs, and the reliability is 95% (for 50 runs), 61% (for 500 runs), and less than 1% (for 5000 runs). For the important special case F = 0, the confidence α is a family of curves indicated in Fig. 2. For any fixed value of N it is possible to trade higher confidence in a failure intensity such as h for lower confidence in a better intensity such as h'.
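A small sketch of these calculations; the helper names are mine, and the formulas are just Eq. (1) and its zero-failure special case α = 1 - (1 - θ)^N:

    from math import comb

    def confidence(n_tests, failures, theta):
        # Upper confidence alpha that the failure intensity is below theta,
        # given F failures in N tests drawn from the operational profile (Eq. 1).
        return 1 - sum(comb(n_tests, k) * theta**k * (1 - theta)**(n_tests - k)
                       for k in range(failures + 1))

    def zero_failure_bound(n_tests, alpha):
        # Smallest theta supportable at confidence alpha when no test fails:
        # alpha = 1 - (1 - theta)^N  =>  theta = 1 - (1 - alpha)^(1/N).
        return 1 - (1 - alpha) ** (1.0 / n_tests)

    print(confidence(3000, 0, 0.001))      # about 0.95, as in the text
    print(zero_failure_bound(3000, 0.95))  # about 0.001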
FIG. 2. Confidence in failure intensity based on testing (confidence plotted against failure intensity θ, with two intensity levels h and h' marked).
Equation (1) predicts software behavior based on testing, even in the practical release-testing case that no failures are observed. The only question is whether or not the theory's assumptions are valid for software. What is most striking about Eq. (1) is that it does not depend on any characteristics of the program being tested. Intuitively, we would expect the confidence in a given failure intensity to be lower for more complex software.

Another way to derive the relationship between confidence, testset size, and failure rate is to treat the test as an experiment checking the hypothesis that the failure rate lies below a given value. Butler and Finelli (1991) obtain numerical values similar to those predicted by Eq. (1) in this way. They define the "ultrareliable" region as failure rates in the range 10^-8 per demand and below, and present a very convincing case that it is impractical to gain information in this region by testing. From Eq. (1), at the 90% confidence level, to predict an MTTF of M requires a successful testset of size roughly 2M, so to predict ultrareliability by testing at one test point each second around the clock would require three years. (And of course, one must start over if the software is changed because a test fails.) Ultrareliability is appropriate for safety-critical applications like commercial flight-control programs and medical applications; in addition, because of a large customer base, popular personal computer (PC) software can be expected to fail within days of release unless it achieves ultrareliability. Thus software reliability theory provides a pessimistic view of what can be achieved by testing. The deficiencies of the theory compound this difficulty. If an inaccurate profile is used for testing, the results are invalid, and they always err in the direction of predicting better reliability than actually exists.
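The size estimate can be read off the zero-failure case of Eq. (1); the one-line derivation, not spelled out in the text, is

\alpha = 1 - (1-\theta)^{N} \;\Longrightarrow\; N = \frac{\ln(1-\alpha)}{\ln(1-\theta)} \approx \frac{-\ln(1-\alpha)}{\theta},

so at α = 0.9 a bound of θ = 1/M requires N ≈ 2.3 M failure-free tests, of the order of the 2M quoted above.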
4.4 Profile Independence
The most dubious assumption made in conventional reliability theory is that there exists a constant failure intensity over the input space. It is illuminating to consider subdividing the input space, and applying the same theory to its parts. Suppose a partition of the input space creates k subdomains S1, S2, . . . , Sk, and the probability of failure in subdomain Si (the subdomain failure intensity) is constant at θi. Imagine an operational profile D such that points selected according to D fall into subdomain Si with probability pi. Then the failure intensity θ under D is

\theta = \sum_{i=1}^{k} p_i \theta_i .

However, for a different profile D', different p'_i may well lead to a different θ' = Σ_i p'_i θ_i. For all profiles, the failure intensity cannot exceed

\theta_{\max} = \max_{1 \le i \le k} \theta_i ,

because at worst a profile can emphasize the worst subdomain to the exclusion of all others.
By coverage testing in all subdomains without failure, a bound can be established on θ_max, and hence on the overall failure intensity for all distributions. Thus in one sense partition testing multiplies the reliability-testing problem by the number of subdomains. Instead of having to bound θ using N tests from an operational profile, we must bound θ_max using N tests from a uniform distribution over the worst subdomain; but, since we don't know which subdomain is worst, we must bound all k of the θi, which requires kN tests. However, the payback is a profile-independent result, i.e., a reliability estimate based on partition testing applies to all profiles.

The obvious flaw in the above argument is that the chosen partition is unconstrained. All that is required is that its subdomains each have a constant failure intensity. (This requirement is a generalization of the idea of "homogeneous" subdomains, ones in which all inputs either fail, or all succeed.) But are there partitions with such subdomains? It seems intuitively clear that functional testing and path testing do not have subdomains with constant failure rates. Indeed, it is the nonhomogeneity of subdomains in these methods that makes them less than satisfactory for finding failures, as described in Section 3. Of course, the failure intensity of a singleton subdomain is either 0 or 1 depending on whether its point succeeds or fails, but these ultimate subdomains correspond to usually impractical exhaustive testing.

The intuition that coverage testing is a good idea is probably based on an informal version of this argument: that coverage gets around the operational profile to determine "usage-independent" properties of the software. But making the intuitive argument precise shows that the kind of coverage (as reflected in the character of its subdomains) is crucial, and there is no research suggesting good candidate subdomains.
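A tiny numerical illustration of this profile dependence (the subdomain intensities and the two profiles are invented for the example):

    # Failure intensity under a profile is the profile-weighted average of the
    # subdomain intensities; it can never exceed the worst subdomain.
    theta = [0.0, 0.002, 0.05]      # constant failure intensity per subdomain

    def overall_intensity(profile):
        return sum(p * t for p, t in zip(profile, theta))

    novice = [0.2, 0.2, 0.6]        # leans on the worst subdomain
    expert = [0.7, 0.25, 0.05]

    print(overall_intensity(novice))  # 0.0304
    print(overall_intensity(expert))  # 0.003
    print(max(theta))                 # 0.05, the bound that holds for every profile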
5. Comparing Test Methods

What objective criteria could be used to decide questions about the value of testing methods in general, or to compare the merits of different testing procedures? Historically, methods were evaluated either by unsupported theoretical discussion, or by "experiments" based on circular reasoning. The inventor of a new method was expected to argue that it was subjectively cleverer than its predecessors, and to compare it to other methods in terms defined only by themselves. For example, it was common to find a testset that satisfied the new method for some program, then see what fraction of branch coverage that testset attained; or, to find a testset for branch coverage and see how well it did with the new method. The new method was considered to be validated if its testset got high branch coverage, but not the other way around. Such studies are really investigating special cases (for a few programs and a few testsets) of the "subsumes" relation.20
5.1 Comparison Using the Subsumes Relation

Control- and data-flow methods can be compared based on which method is more "demanding." Intuitively, a method is "at least as demanding" as another if its testsets necessarily satisfy the other's coverage. The usual name for this relationship is subsumes. If method Z subsumes method X, then it is impossible to devise a method-Z test that is not also a method-X test. The widespread interpretation of "Z subsumes X" was that method Z is superior to method X. The most-used example is that branch testing is superior to statement testing, because branch coverage strictly subsumes statement coverage.

However, it was suggested (Hamlet, 1989) that subsumption could be misleading in the real sense that natural (say) branch tests fail to detect a failure that (different) natural statement tests find. For example, suppose that the Pascal subprogram R

    function rootabs(x: real): real;
    begin
      if x < 0 then x := -x;
      rootabs := sq(x)
    end

has specification {(x, √|x|) | x real}, i.e., that the output is to be the square root of the absolute value of an input. It would be natural to branch-test R with the testset Tb = {-1, 0, 1}, while Ts = {-9} might be a statement-test testset, which does not achieve branch coverage. (This example shows why the subsumption of statement- by branch testing is strict.) Only Ts exposes a failure caused by the programmer mistaking Pascal's square function for its square-root function. Concentrating on the predicates that determine control flow leads to neglect of the statements, which here takes the form of trivial data values reaching the faulty function.
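One way to see the point is to run both testsets against an oracle for the specification. The transcription of R into Python below is mine, added only for illustration:

    import math

    def rootabs_buggy(x):
        # Transcription of R: the programmer used the square function
        # where the square root was intended.
        if x < 0:
            x = -x
        return x * x          # should have been math.sqrt(x)

    def oracle_passes(x, result, tol=1e-9):
        return abs(result - math.sqrt(abs(x))) <= tol

    Tb = [-1, 0, 1]   # branch-coverage testset: exposes no failure
    Ts = [-9]         # statement-coverage testset: exposes the failure
    print([oracle_passes(x, rootabs_buggy(x)) for x in Tb])  # [True, True, True]
    print([oracle_passes(x, rootabs_buggy(x)) for x in Ts])  # [False]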
has specification { ( x , m ) l x real}, i.e., that the output is to be the square root of the absolute value of an input. It would be natural to branch-test R with the testset Tb = {- l,O,l}, while T,= (-9) might be a statement-test testset, which does not achieve branch coverage. (This example shows why the subsumption of statement- by branch testing is strict). Only T, exposes a failure caused by the programmer mistaking Pascal’s square function for its square-root function. Concentrating on the predicates that determine control flow leads to neglect of the statements, which here takes the form of trivial data values reaching the faulty function. A continued exploration (Weyuker et al., 1991) showed that the subsumes idea could be refined so that it was less likely to be misleading, and that it could be precisely studied by introducing a probability that each method would detect a failure. The behavior of any method is expressed by the probability that a testset satisfying that method, selected at random from all such testsets, will expose a failure. An example was given in which statement testing was more likely to detect a failure than was branch testing; however, even the contrived example was unable to evidence much superiority for the “less demanding” method, indicating that “subsumes” is not so misleading after all. In a recent paper (Frank1 and Weyuker, 1993). the subsumes relationship is refined to “properly covers” and it is shown that the new relationship cannot be misleading in the probabilistic sense. Suppose two testing methods divide a program’s input space into subdomains, one domain for each “element” to be
It is the relationship between these two collections of subdomains that determines the failure detection probability of one method relative to the other. Roughly, method Z properly covers method X if Z has a collection of subdomains from which the subdomains of X can be constructed. In the simplest case of partitions whose domains are disjoint, if each subdomain of X is a union of subdomains from Z, then Z properly covers X. Intuitively, there is no way for X to have many "good" testsets without Z having equally many, because the X subdomains can be made up from Z subdomains. When the subdomains do not form partitions, one must be careful in using multiply-counted subdomains of Z to make up subdomains of X; a single subdomain may not be used too often. As an example, simple programs exist to show that branch testing does not properly cover statement testing; the misleading program R given above is one. On the other hand, Frankl and Weyuker (1993) have shown (for example) that some methods of their dataflow testing hierarchy are really better than others in the sense of properly covers.

The subsumes relationship began as a generalization of how well test methods do in each other's terms, that is, without any necessary reference to objectively desirable properties of the software being tested. A "more demanding" method Z that strictly subsumes X is actually better only if we assume that what Z demands is really useful beyond its mere definition. The notion of "misleading" introduced an objective measure (failure detection ability), and showed that for natural testing methods, "subsumes" does not necessarily correlate with the objective measure. It is easy to devise unnatural testing methods in which "subsumes" is more misleading. For example, the coverage measure (Z) "more than 70% of statements executed" strictly subsumes (X) "more than 50% of statements executed," but if the statements executed using Z happen to be all correct ones in some program, while those executed using X happen to include its buggy code, then X is actually better than Z for this program. This example demonstrates why fractional coverage measures are particularly poor ones.
5.2 Comparison for Reliability Prediction

Failure detection probability is currently enshrined as the accurate measure of test quality, replacing a circular use of "coverage" to assess the quality of "coverage." But while it is questionable whether failure detection probability is the appropriate measure of testing quality, it remains the only one on which work has been done. If testing is to be used to assess software quality, which we have argued is necessary for useful feedback in the software development process, then testing methods must be compared for their ability to predict a quality parameter like MTTF.
But here the operational profile enters to invalidate comparisons. Only random testing can use the profile; other methods by their nature require testsets that distort usage frequencies.

The theoretical comparisons between random testing and partition testing alluded to in Section 3.3 (Duran and Ntafos, 1984; Hamlet and Taylor, 1990; Weyuker and Jeng, 1991) are often cited as showing that random testing is superior to coverage testing. Strictly speaking, these studies compare only the failure detection ability of random testing and of the partition-testing abstraction of coverage methods, and they mostly find partition testing to have the edge. The seminal study (Duran and Ntafos, 1984) intended to suggest that random testing was a reasonable alternative to the methods described in Section 3.2. But because random testing's failure detection probability is identical to the failure intensity (hazard rate) of reliability theory, it appears that these studies have an additional significance, because they make comparisons based on reliability.

Unfortunately, even granting that the partition/random comparison applies to coverage testing, and that failure detection probability for random testing determines MTTF, the only conclusion that can be reached is the negative one that coverage testing is at best no more significant than random testing, or at worst of no significance. A random test can establish upper confidence bound α that the failure intensity is not above θ on the basis of N tests with F failures. Equation (1) connects these quantities according to the TRW software reliability theory. If a coverage test is as good a statistical sample as a random test, it might realize a similar or better bound on failure intensity. But the very intuitive factors that make coverage testing desirable for finding failures make its failure detection probability different from a failure intensity. Coverage testing achieves superior failure detection precisely by sampling not the operational profile, but according to classes that emphasize failure. These classes bear no necessary relation to the operational profile, and hence the failure intensity may be large even though the failure-detection probability for coverage is small, if coverage testing found failures in low-profile usage areas and neglected high-profile usage areas. The promise of coverage testing is that it might explore all usage areas without requiring knowledge of that usage, but no one knows how to surely realize this promise. Thus, the comparable or better failure-detection probabilities of coverage testing vis-à-vis random testing are not failure intensity predictions at all, and there is as yet no information about the relationship between coverage testing and reliability.
6. Dependability

The technical difficulties with the notion of software reliability make it inappropriate for measuring software quality, except in the rare instances when a stable operational profile is available.
Intuitively, a different measure is needed, one that is profile independent, thus removing the aspect of reliability that deals with the "operating environment." There are good grounds for removing this aspect, because software is intuitively perfectible, so that no "environment" (which after all is only an input collection that can be investigated ahead of time) can bring it down. The other aspect of reliability, its dependence on the period of operation, is also intuitively suspect for software. No matter how small the positive hazard rate, all physical systems eventually fail; i.e., their long-term reliability is near zero. But software need not fail, if its designers' mistakes can be controlled.

We seek a software quality parameter with the same probabilistic character as "reliability," but without its dependence on environment or on period of operation. And we hope to estimate such a parameter by sampling similar to testing. The name we choose is "dependability," and its intuitive meaning is a probability that expresses confidence in the software, confidence that is higher (closer to 1) when more samples of its behavior have been taken. But this "behavior" may differ from that explored by the usual testing.
6.1 Reliability-Based Dependability

Attempts to use the Nelson TRW (input-domain) reliability of Section 4.3 to define dependability must find a way to handle the different reliability values that result from assuming different operational profiles, since dependability intuitively should not change with different users. It is the essence of dependability that the operating conditions cannot be controlled. Two ideas must be rejected:
U. Define dependability as the Nelson reliability, but using a uniform distribution for the profile. This suggestion founders because some users, with profiles that emphasize the failure regions of a program, will experience lower reliability than the defined dependability. This is intuitively unacceptable.

W. Define dependability as the Nelson reliability, but in the worst (functional) subdomain of each user's profile. This suggestion solves the difficulty with definition U, but reintroduces a dependency on a particular profile. In light of the dubious existence of constant failure intensities in subdomains (Section 4.4), the idea may not be well defined.

A further difficulty with any suggestion basing dependability on reliability is the impracticality of establishing ultrareliability. Other suggestions introduce essentially new ideas.
6.2 Testing for Probable Correctness

Dijkstra's famous aphorism that testing can establish only the incorrectness of software has never been very palatable to practical software developers, who believe in their hearts that extensive tests prove something about software quality.
"Probable correctness" is a name for that elusive "something." The TRW reliability theory of Section 4.3 provides only half of what is needed. Statistical testing supports statements like "In 95% of usage scenarios, the software should fail on less than 1% of the runs." These statements clearly involve software quality, but it is not very plausible to equate the upper confidence bound and the chance of success (Hamlet, 1987), and turn the estimate "99.9% confidence in failure intensity less than 0.1%" into "probable correctness of 99.9%."
6.3 Testability Analysis

Jeff Voas has proposed that reliability be combined with testability analysis to do better (Voas and Miller, 1992). Testability is a lower bound on the probability of failure if the software contains faults, based on a model of the process by which faults become failures. A testability near 1 indicates a program that "wears its faults on its sleeve": if it can fail, it is very likely to fail under test. This idea captures the intuition that "almost any test" would find a bug, which is involved in the belief that well-tested software is probably correct.

To define testability as the conditional probability that a program will fail under test if it has any faults, Voas models the failure process of a fault localized to one program location. For it to lead to failure, the fault must be executed, must produce an error in the local state, and that error must then persist to affect the result. The testability of a program location can then be estimated by executing the program as if it were being tested, but instead of observing the result, counting the execution, state-corruption, and propagation frequencies. Testability analysis thus employs a testset, but not an oracle. When the testability is high at a location, it means that the testset caused that location to be executed frequently, these executions had a good chance of corrupting the local state, and an erroneous state was unlikely to be lost or corrected. The high testability does not mean that the program will fail; it means that if the location has a fault (but we do not know if it does), then the testset is likely to expose it.

Suppose that all the locations of a program are observed to have high testability, using a testset that reflects the operational profile. Then suppose that this same testset is used in successful random testing. (That is, the results are now observed, and no failures are seen.) The situation is then that (1) no failures were observed, but (2) if there were faults, failures would have been observed. The conclusion is that there are no faults. This "squeeze play" plays off testability against reliability to gain confidence in correctness.

Figure 3 shows the quantitative analysis of the squeeze play between reliability and testability (Hamlet and Voas, 1993).
FIG. 3. "Squeeze play" between testability and reliability: Pr[failure more likely than z] (from Equation (1)) and Pr[not correct and failure less likely than z] (from the testability measurement), plotted against the chance of failure z.
In Fig. 3, the falling curve is the confidence from reliability testing (it is 1 - α from Fig. 2); the step function comes from observing a testability h. Together the curves make it unlikely that the chance of failure is large (testing), or that it is small (testability). The only other possibility is that the software is correct, for which 1 - d is a confidence bound, where d is slightly more than the value of the falling curve at h. Confidence that the software is correct can be made close to 1 by forcing h to the right (Hamlet and Voas, 1993). For example, with a testability of 0.001, a random testset of 20,000 points predicts the probability that the tested program is not correct to be only about 2 × 10^-9.
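The arithmetic behind that example can be recovered from the zero-failure case of Eq. (1); reading the falling curve at h this way is my interpretation of the figure, not a formula stated in the chapter:

\Pr[\text{not correct}] \lesssim (1-h)^{N}, \qquad (1 - 0.001)^{20000} \approx e^{-20} \approx 2 \times 10^{-9}.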
6.4 Self-checking Programs

Manuel Blum has proposed a quite different idea to replace reliability (Blum and Kannan, 1989). He argues that many users of software are interested in a particular execution of a particular program only; they want assurance that a single result can be trusted. Blum has found a way to sometimes exploit the low failure intensity of a "quality" program to gain this assurance. (Conventional reliability would presumably be used to estimate the program quality, but Blum has merely postulated that failure is unlikely.) Roughly, his idea is that a program should check its output by performing redundant computations. Even if these make use of the same algorithm, if the program is "close to correct," it is very unlikely that a sequence of checks could agree yet all be wrong.

Blum's idea represents a new viewpoint quite different from testing, because it is a pointwise view of quality. Testing attempts to predict future behavior of a program uniformly, that is, for all possible inputs; Blum is satisfied to make the prediction one point at a time (hence, to be useful, the calculation must be made at runtime, when the point of interest is known). All of testing's problems with the user profile, test-point independence, constant failure intensity, etc., arise from the uniform viewpoint, and Blum solves them at a stroke.
Testing to uniformly predict behavior suffers from the difficulty that for a high-quality program, failures are "needles in a haystack": very unlikely, hence difficult to assess. Only impractically large samples have significance. Blum turns this problem to advantage: since failures are unlikely, if the calculation on a single input can be checked using the same program, the results will probably agree unless they are wrong; a wrong result is nearly impossible to replicate. The only danger is that the checking is really an exact repetition of the calculation; then agreement means nothing.
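The following toy wrapper is not Blum's construction; it only illustrates the general idea of checking an output with a redundant computation, and the function names and tolerance are my own choices:

    def my_sqrt(x):
        # Stand-in for the program under scrutiny (Newton's method).
        guess = x if x > 1 else 1.0
        for _ in range(60):
            guess = 0.5 * (guess + x / guess)
        return guess

    def checked_sqrt(x, tol=1e-9):
        # Redundant check: squaring the answer must recover the input.
        # On a failed check the program announces its own untrustworthiness.
        y = my_sqrt(x)
        if abs(y * y - x) > tol * max(1.0, x):
            raise ValueError("result failed its self-check; do not trust it")
        return y

    print(checked_sqrt(2.0))   # 1.41421356...

A check of this simple kind is weaker than Blum's randomized checkers, which are designed precisely to avoid the danger mentioned above, that the check merely repeats the original calculation.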
6.5 Defining Dependability

Either Voas's or Blum's idea could serve as a definition for dependability, since both capture a probabilistic confidence in the correctness of a program, a confidence based on sampling. Dependability might be defined as the confidence in correctness given by Voas's squeeze play. Even if conventional reliability is used for the testing part of the squeeze play, the dependability so defined depends in an intuitively correct way on program size and complexity, because even Voas's simple model of testability introduces these factors. His model also introduces an implicit dependence on the size of both input and internal state spaces, but this part of the model has not yet been explored. Unfortunately, the dependability defined by the squeeze play is not independent of the operational profile.

Dependability might also be defined for a Blum self-checking program as the complement of the probability that checks agree but their common value is wrong. This dependability may be different for different inputs, and must be taken to be zero when the checks do not agree. Thus a definition based on Blum's idea must allow software to announce its own untrustworthiness (for some inputs). In some applications, software that is "self-deprecating" ("Here's the result, but don't trust it!") would be acceptable, and preferred over a program that reported the wrong answer without admitting failure. One example is in accounting systems whose calculations are not time-critical; the program could be improved, or the result checked by some independent means. In other applications, notably real-time, safety-critical ones, reporting failure is not appropriate.

The promise of both Voas's and Blum's ideas is that they extend reliability to dependability and at the same time substantially reduce the testing cost. Instead of requiring ultrareliability that cannot be measured or checked in practice, their ideas add a modest cost to reliability estimates of the kind that can be made today, measured in failures per run. Blum's idea accrues the extra cost at runtime, for each result computed; Voas's idea is more like conventional testing in that it considers the whole input space, before release.
7. Conclusions

A brief history of software testing and its relationship to software quality can be constructed from two famous aphorisms:
Testing can only show the presence of errors [failures, hence the underlying faults], never their absence. [Paraphrasing Dijkstra (Dahl et al., 1972)]

The purpose of testing is to find errors [failures, in order to fix the faults responsible]. [Paraphrasing Myers (Myers, 1979)]
Myers's statement could be thought of as a response to Dijkstra's, but the goal that Dijkstra implicitly assumes, to establish that software will not fail, remains. Myers's realistic goal was uncritically taken to be the same, but we have seen that it is not. Software reliability again addresses Dijkstra's goal, but using an engineering model (of which he would surely not approve!). The common wisdom about reliability is that (1) its theoretical assumptions are dubious (Hamlet, 1992), but this is unimportant, because (2) high reliability is impossible to measure or predict in practice (Butler and Finelli, 1991). To complete the picture, the emerging dependability theory answers Dijkstra directly, and may also respond to the question of practicality: it may be possible to measure a confidence probability that software cannot fail, i.e., that it contains no faults, using methods similar to testing. If this promise is realized, the answer to Dijkstra makes for good engineering, if not for good mathematics: software developers will be able to predict tradeoffs between their efforts and software quality, and can build software that is "good enough" for its purpose.

We have considered only the question of measuring software quality through testing, and using true quality measurements to direct the software development process. The important question of how to attain the necessary quality has not been addressed. A variety of methods, ranging from the very straightforward (e.g., inspections that employ people's experience without necessarily understanding just how they use it), to the very abstruse (e.g., completely formal methods of specification, and proof that specifications match intuitive requirements), are the subject of extensive research efforts. I have a vision of the future:
Software quality will improve in fact, because development methods will be better.
Software testing for failure will cease to be practiced, because with the better development methods, failure testing will be ineffective: it will find few failures. Instead of seeking failures, testing will be done, in conjunction with dependability analysis and measurement, for the purpose of assessing software quality, to determine if enough effort was put into development, and if that effort was effective.
Will the new software be better? Yes, if it needs to be, and the main thing is that the developers will know if the software is good enough. On the other hand, terrible software will continue to be produced (sometimes terrible is good enough!). What will become unusual is software that is not good enough, but whose developers honestly thought that it was. Will the new software be cheaper? Probably not. Testing for dependability will not save over testing for failure, and the new up-front development methods will have an additional cost. But costs will be more predictable, and the situation in which there is a software disaster revealed only by final testing will become rare.
Endnotes

1. Two other senses of "reliable" occur in the testing literature, in theoretical papers by Goodenough and Gerhart (1975) and Howden (1976). Both notions have technical flaws and are not in current use.
2. The Formal Methodists?
3. An illustrative attempt at the kind of argument required has been given for the cleanroom methodology (Hamlet and Voas, 1993).
4. The importance of a development phase is unrelated to its precision. Without question, requirements analysis is the most important phase, and it will forever remain the most imprecise, since it must translate abstract ideas in the application domain to concrete ideas in the software domain.
5. At a research conference, one of the speakers made substantial use of "testset." In the question period following, a member of the audience commented, "I've just realized that 'testset' is a palindrome!" This insight was applauded heartily.
6. Error is also a standard IEEE term, referring to something that is wrong with a program's internal state, a fault in the process of becoming a failure. Although this is an increasingly important concept in testing (Section 6.3), "error" will seldom be used here in its IEEE technical sense. The literature is littered with uses that almost always mean "fault," e.g., "error-based testing."
7. It is sometimes stated that an effective oracle requires (or is) an executable specification. Certainly being able to execute the specification (if the specification domain is known and termination is guaranteed) does yield an effective oracle, since the specification value can be compared with the program output. But there may be other ways to make the decision that do not involve executing the specification. For example, it is possible to check the results of a division by multiplying quotient and divisor; one need not know how to divide to tell if division is correct.
8. "White-box" shows historical ignorance, because the opposite of a circuit that could not be examined in a black crinkle box was a circuit visible through a plexiglass case. Thus, "clear-box" is the better term.
9. For scientific subroutines, asymptotic behavior and an investigation of the continuity of a computed function are functional test possibilities.
10. It is a pitfall of thinking about branch testing to imagine that two distinct test points are required to cover a single branch. It commonly happens that one test suffices, because the branch is within a loop.
11. In particular, the conditional branch that begins a WHILE loop is always covered by any successful test case that enters the loop, since it must eventually return and take the exit direction. Again, this may not require more than one test point, if the loop in question is within another loop.
12. Some early testing standards called for attempting path coverage, with some fudging for loops. There is confusion in the literature, exemplified by statements like "complete path testing is a perfect testing method that would find all possible failures, but it is impossible to carry it out because programs may have an infinity of paths." The reasoning is bad: on each of that infinity of paths, there is also a practical infinity of different data values possible, so that even if all the paths are covered, bugs could be missed (Howden, 1976).
13. "Def" is a poor choice of terminology, since the idea is that V acquires a value, not that it is "defined" in the sense of being declared or allocated storage. "Set" would have been a better choice, but "def" is too entrenched in the literature to change.
14. Mutation analysis was developed independently (and named) by Lipton (DeMillo et al., 1978), with a rather different motivation. For Lipton, mutants capture a range of possibilities among which a programmer might have made a mistaken choice. Killing mutants with test data is experimental verification that the choice represented by the dead mutant is not the correct one. The description of mutation given above corresponds to the way it was implemented by Lipton's group; my implementation was built around a compiler, and was much more efficient, if more difficult to describe.
15. The converse of Marick's suggestion has not been made, but it sounds almost as reasonable: devise tests to achieve (say) statement coverage, then divide these tests into functional categories and note which functions have been omitted or neglected. One might hope that this would identify missing statements or point to code whose proper testing requires repeated executions.
16. A class must be added to include those inputs that cause the program to go into an unterminating loop or otherwise give no output.
17. The competing technology of software inspection, particularly in the design phase, shows promise of taking over the role of seeking faults. It is cheaper, and apparently effective. In the celebrated example of the space-shuttle software, spare-no-expense inspections simply made testing superfluous: testing seldom found failures in the inspected software. In response to this, it has been suggested that inspection is just a form of testing, but from a technical standpoint this seems farfetched; inspection usually does not involve execution, which is the hallmark of testing.
18. Perhaps it should be used for failure detection, as discussed in Section 5.2. In the cleanroom methodology, only random testing is used. Advocates say this is because the methodology produces near zero-defect software, so failure-finding is inappropriate. Critics might say that the advocates don't want to find any defects. In fact, the random testing does find some failures in cleanroom-developed code, but these are "easy bugs" that almost any test would uncover. Cleanroom testing is "betwixt and between" because it is not used to demonstrate that the code is bug-free (it would be interesting to see how the methods of Section 3 would do); nor does it establish that the code is particularly reliable, as described in Section 4.3, because far too few test points are used.
19. The formula assumes that d is defined over the real numbers. The lower limit in the integral would change for restricted ranges of reals, and the integral would become a sum for a discrete density.
20. Mutation testing is the method most frequently used for comparison. Such studies take the viewpoint that mutations are like seeded faults, and hence a method's ability to kill mutants is related to its failure-detection ability. However, if mutation is treated as a coverage criterion as in Section 3.2, then such a comparison is like assessing one method of unknown worth with another such method.
21. For example, in all-uses testing, the elements are DU pairs executed; in mutation testing, the elements are mutants killed, etc.

References
Antoy, S., and Hamlet, R. (1992). Self-checking against formal specifications. Int. Conf. Comput. Inf., Toronto, 1992, pp. 355-360.
Balcer, M. J., Hasling, W. M., and Ostrand, T. J. (1989). Automatic generation of test scripts from formal test specifications. In "Proceedings of the 3rd Symposium on Software Testing, Analysis, and Verification," Key West, FL, ACM Press, pp. 210-218.
Basili, V. R., and Selby, R. W. (1987). Comparing the effectiveness of software testing strategies. IEEE Trans. Software Eng. SE-13, 1278-1296.
Blum, M., and Kannan, S. (1989). Designing programs that check their work. In "21st ACM Symposium on Theory of Computing," ACM Press, pp. 86-96.
Butler, R. W., and Finelli, G. B. (1991). The infeasibility of experimental quantification of life-critical software reliability. In "Software for Critical Systems," IEEE Press, New Orleans, LA, pp. 66-76.
Campbell, J. (1990). Data-flow analysis of software change. Master's Thesis, Oregon Graduate Center, Portland.
Cobb, R. H., and Mills, H. D. (1990). Engineering software under statistical quality control. IEEE Software, November, pp. 44-54.
Dahl, O. J., Dijkstra, E. W., and Hoare, C. A. R. (1972). "Structured Programming." Academic Press, London.
DeMillo, R., Lipton, R., and Sayward, F. (1978). Hints on test data selection: Help for the practicing programmer. Computer 11, 34-43.
Duran, J., and Ntafos, S. (1984). An evaluation of random testing. IEEE Trans. Software Eng. SE-10, 438-444.
Frankl, P. G., and Weyuker, E. J. (1988). An applicable family of data flow testing criteria. IEEE Trans. Software Eng. SE-14, 1483-1498.
Frankl, P., and Weyuker, E. (1993). A formal analysis of the fault-detecting ability of testing methods. IEEE Trans. Software Eng. SE-19, 202-213.
Gannon, J., Hamlet, R., and McMullin, P. (1981). Data abstraction implementation, specification, and testing. ACM Trans. Program. Lang. Syst. 3, 211-223.
Gerhart, S., Craigen, D., and Ralston, T. (1993). Observations on industrial practice using formal methods. In "Proceedings of the 15th International Conference on Software Engineering," Baltimore, IEEE Press, pp. 24-33.
Goodenough, J., and Gerhart, S. (1976). Towards a theory of test data selection. IEEE Trans. Software Eng. SE-2, 156-173.
Hamlet, R. G. (1977). Testing programs with the aid of a compiler. IEEE Trans. Software Eng. SE-3, 279-290.
Hamlet, R. G. (1987). Probable correctness theory. Inf. Process. Lett. 25, 17-25.
Hamlet, R. G. (1989). Theoretical comparison of testing methods. In "Proceedings of the 3rd Symposium on Software Testing, Analysis, and Verification," Key West, FL, ACM Press, pp. 28-37.
Hamlet, D. (1992). Are we testing for true reliability? IEEE Software, July, pp. 21-27.
Hamlet, R. (1993). "Prototype Testing Tools." Tech. Rep. TR93-10, Portland State University, Portland, OR (to appear in Software Pract. Exper., 1995).
Hamlet, D. (1994). Random testing. In "Encyclopedia of Software Engineering" (J. Marciniak, ed.), pp. 970-978. Wiley, New York.
Hamlet, D., and Taylor, R. (1990). Partition testing does not inspire confidence. IEEE Trans. Software Eng. SE-16, 1402-1411.
Hamlet, D., and Voas, J. (1993). Faults on its sleeve: Amplifying software reliability. In "Proceedings of the International Symposium on Software Testing and Analysis," Boston, ACM Press, pp. 89-98.
Howden, W. E. (1976). Reliability of the path analysis testing strategy. IEEE Trans. Software Eng. SE-2, 208-215.
Howden, W. E. (1982). Weak mutation testing and completeness of test sets. IEEE Trans. Software Eng. SE-8, 371-379.
Knuth, D. E. (1981). "The Art of Computer Programming," Vol. 2. Addison-Wesley, Reading, MA.
Marick, B. (1991). Experience with the cost of different coverage goals for testing. In "Pacific Northwest Software Quality Conference," Portland, OR, pp. 147-167.
Musa, J. D., Iannino, A., and Okumoto, K. (1987). "Software Reliability: Measurement, Prediction, Application." McGraw-Hill, New York.
Myers, G. J. (1979). "The Art of Software Testing." Wiley (Interscience), New York.
Ntafos, S. (1988). A comparison of some structural testing strategies. IEEE Trans. Software Eng. SE-14, 250-256.
Parnas, D. L., van Schouwen, A. J., and Kwan, S. (1990). Evaluation of safety-critical software. Commun. ACM 33, 638-648.
Podgurski, A., and Clarke, L. A. (1990). A formal model of program dependences and its implication for software testing, debugging, and maintenance. IEEE Trans. Software Eng. SE-16, 965-979.
Ramsey, J., and Basili, V. (1985). Analyzing the test process using structural coverage. In "Proceedings of the 8th International Conference on Software Engineering," London, IEEE Press, pp. 306-312.
Rapps, S., and Weyuker, E. J. (1985). Selecting software test data using data flow information. IEEE Trans. Software Eng. SE-11, 367-375.
Shooman, M. L. (1983). "Software Engineering Design, Reliability, and Management." McGraw-Hill, New York.
Thayer, R., Lipow, M., and Nelson, E. (1978). "Software Reliability." North-Holland Publ., New York.
Voas, J. M., and Miller, K. W. (1992). Improving the software development process using testability research. In "Proceedings of the 3rd International Symposium on Software Reliability Engineering," IEEE Press, Research Triangle Park, NC, pp. 114-121.
Weyuker, E. J., and Jeng, B. (1991). Analyzing partition testing strategies. IEEE Trans. Software Eng. SE-17, 703-711.
Weyuker, E. J., Weiss, S. N., and Hamlet, R. G. (1991). Comparison of program testing strategies. In "Symposium on Testing, Analysis, and Verification (TAV4)," Victoria, BC, ACM Press, pp. 1-10.
Advances in Benchmarking Techniques: New Standards and Quantitative Metrics

THOMAS M. CONTE
Department of Electrical and Computer Engineering
University of South Carolina
Columbia, South Carolina
WEN-MEI W. HWU
Department of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign
Urbana, Illinois
Abstract

Comparing computer system performance is difficult. Techniques for benchmarking computers have evolved from small test programs into extensive specifications of workloads. Over the last several years, competing vendors in several major computer markets have banded together to form consortia to develop standard benchmark sets. In addition to system evaluation, benchmarking is increasingly being used for system design. Characterizing the benchmarks in terms of system-design requirements aids both of these activities. This chapter discusses new advances in benchmarking, including quantitative benchmark characterization. An overview of several popular, contemporary benchmark suites is presented, along with a discussion of the different philosophies of each suite. A detailed characterization is presented for a popular workstation benchmark suite, SPECint92. These characteristics focus on memory system and processor microarchitecture design, since these two subsystems largely determine the performance for workstation-class machines. Although comprehensive in its hardware considerations, software interactions are not covered by these characteristics. Suggestions for software characteristics are discussed at the close of the chapter.
1. Introduction . . . . . . . . . . 232
   1.1 Finding the Benchmarks . . . . . . . . . . 233
   1.2 Characterizing Benchmarks: Approaches . . . . . . . . . . 234
   1.3 This Chapter . . . . . . . . . . 234
2. Summary of Popular Benchmark Suites . . . . . . . . . . 235
   2.1 Supercomputer Benchmarking: The Perfect Benchmarks . . . . . . . . . . 237
   2.2 Commercial System Benchmarking: The TPC Benchmarks . . . . . . . . . . 237
   2.3 Workstation Benchmarking: The SPEC Benchmarks . . . . . . . . . . 238
   2.4 Run Rules . . . . . . . . . . 239
   2.5 Result Accuracy . . . . . . . . . . 241
3. Benchmark Characterizations . . . . . . . . . . 242
   3.1 Methods . . . . . . . . . . 242
   3.2 Memory Characteristics . . . . . . . . . . 245
   3.3 Processor Model . . . . . . . . . . 247
   3.4 Processor Characteristics . . . . . . . . . . 249
   3.5 Compiler Effects . . . . . . . . . . 251
4. Final Remarks . . . . . . . . . . 251
References . . . . . . . . . . 251
1. Introduction

The fair comparison of commercial computer systems is difficult due to multiple, proprietary architectures and software offerings. Yet it is true that a given program will run faster on some systems than on others. In fact, if it is the only interesting program to the end-user, this simple test is sufficient to determine which system to purchase or which system to build. A computer by its very definition is general purpose. To capture this, a specification of a test workload, typically called a benchmark, is defined. In recent years, these specifications have begun to emerge from industry consortia, such as the System Performance Evaluation Corporation, The Perfect Club, or the Transaction Processing Council (TPC). A specification is made in one of a number of ways. At one extreme, only the pattern of the user's requests is specified (e.g., database queries and response times). No code is specified with the benchmark. Cheating is reduced by using independent observers to audit the results. At the other extreme, test programs and their inputs are supplied, along with rigid run rules limiting the allowed code modifications. In a sense, the user's requests are encoded in terms of these test programs. For this encoding to be accurate, each program should represent an application that a group of end users deems important. Benchmark characterization is a technique for selecting these programs (Conte and Hwu, 1991). This process quantitatively compares different benchmarks in terms of architectural parameters. It is demonstrated in this chapter with an example from the SPEC92 benchmark suite.

There is much confusion over what "benchmark" results mean. Much of this confusion comes from the two separate uses of benchmarking. It is important to understand these two activities in order to understand benchmarking results. The more familiar use is for performance evaluation of existing systems. This information is used to aid consumer purchasing decisions. Consumers of computers are the same as consumers of any product; they want the most performance for a given amount of money.
SPEC92 or TPC-A supply the majority of the performance estimations. A less familiar use of benchmarking is for system design. In this situation, tradeoffs between system features are resolved by simulating the hypothetical systems as they execute the benchmarks. The most commonly used technique for this kind of study is trace-driven simulation, where a time-ordered list of resource requests from the benchmark is fed into a discrete-event simulation. When the interaction between benchmark and system behavior is high, trace-driven simulation must be replaced by direct program interpretation. The results of the simulations, either trace-driven or interpreted, are used to decide between architectural alternatives.
1.1 Finding the Benchmarks

Benchmarks must be selected carefully. They are effectively the representatives of the user's desires. The most important rule in benchmark selection is: a benchmark should be a program that someone cares about. The famous Towers of Hanoi puzzle is a poor benchmark for most users, since this is rarely a critical application. For many users, a compiler or spreadsheet is a good benchmark. A simulation of the thermohydraulic evolution of a reactor core would be a good benchmark if reactor simulation is a major concern to the user (the SPEC89/SPECfp92 benchmark doduc performs this simulation). If it is not, this benchmark's result may not matter.
There are situations where benchmarks selected from different areas have similar machine requirements. This is most clear in numerical workloads, where mathematical operations are shared between highly disjoint applications. For example, computational fluid flow and computational field theory use the same mathematics. In general, a user may not know whether a benchmark matches a workload without prior knowledge of the problem. Quantitatively characterizing benchmarks can solve this problem. If the characteristics of benchmark X match those of the user's favorite program, Y, the performance results for X should predict the performance of Y. The user can now make an informed purchasing decision without ever having run Y on any system. This is one of the powerful consequences of adding quantitative metrics to benchmarking.
Benchmarks should be programs someone cares about. How can that be determined? One naive approach would be to measure the occurrence of different classes of programs in everyday use. Such an analysis might erroneously determine that the network news reader and several window-based games are the most important programs and should be included in a benchmark suite. So a second rule of benchmarking is: heavy usage does not imply importance. Instead, the user often knows what programs are the most important. Blind analysis may lead to the wrong conclusions.
There are pitfalls to using benchmarks without characterizing them. Some benchmarks use only small amounts of memory (e.g., a small memory "footprint")
print”) yet exercise the processor quite efficiently. Examples of such benchmarks are Whetstone and Dhrystone (Curnow and Wichmann, 1976; Weicker, 1984). Consider two computer systems: m, that has a robust (and expensive) cache memory hierarchy and m, that has a relatively simple interleaved memory system. Assume for the purpose of this argument that fl and m use identical processors, software, etc. Benchmarks with little memory usage will not be able to detect any performance difference between m and m. If the end-user plans to run programs with only small memory footprints, then either system will suffice. However, if the benchmark results are interpreted without information about the memory usage of the benchmarks, a user with memory-hungry applications may draw the wrong conclusions and purchase system rn-only to discover later that it has an insufficient memory system for the application. Similarly, building a system architecture using small-footprint benchmarks would guarantee that the system 9Jl would never be built to start with. If the benchmarks are black box tests, none of these pitfalls will be seen by the user or architect.
1.2 Characterizing Benchmarks: Approaches

Selection of the appropriate set of characteristics is difficult. Consider again the design of the memory hierarchy of a system. One method for determining the memory characteristics of the benchmark is to improve the memory system until the benchmark's performance no longer increases. This would be the critical memory size for this benchmark, or the elite memory system. A similar technique could be used for the processor, input/output (I/O) devices, etc. Even though the elite memory system may not be required in an actual design, it allows the comparison of two benchmarks to see if they are equivalent. This serves as an unbiased method of benchmark comparison.
Benchmark characteristics based on the elite approach are useful for design as well as performance evaluation. Even though the elite design parameters may be too expensive to implement, they are the requirements for the benchmark. As such, a machine design based on these characteristics, however hypothetical, can serve as a first-cut approximation of system capacity. Most system design is an iterative process of simulation, cost analysis, and resimulation. Using this first-cut approximation can significantly reduce the number of iterations needed to make engineering tradeoffs.
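In code, the elite characterization amounts to a simple sweep: enlarge the simulated resource until the benchmark meets the target. The sketch below applies the idea to cache size with a toy direct-mapped, trace-driven simulator; it is a minimal illustration of the search only, and the synthetic trace, the 1% target, and the direct-mapped restriction are assumptions of this example rather than part of any benchmark suite's methodology.

    #include <stdio.h>
    #include <stdlib.h>

    /* Toy trace-driven simulator: replay an address trace through a
     * direct-mapped cache of `size` bytes with `block`-byte lines and
     * return the miss ratio. */
    static double miss_ratio(const unsigned long *trace, size_t n,
                             long size, int block)
    {
        long nsets = size / block;
        unsigned long *tags = calloc((size_t)nsets, sizeof *tags);
        char *valid = calloc((size_t)nsets, 1);
        long misses = 0;

        for (size_t i = 0; i < n; i++) {
            unsigned long blk = trace[i] / (unsigned long)block;
            long set = (long)(blk % (unsigned long)nsets);
            if (!valid[set] || tags[set] != blk) {   /* cold or conflict miss */
                misses++;
                valid[set] = 1;
                tags[set] = blk;
            }
        }
        free(tags);
        free(valid);
        return (double)misses / (double)n;
    }

    /* "Elite" characterization: the smallest power-of-2 cache whose miss
     * ratio does not exceed `target` (the chapter uses a 1% criterion). */
    static long elite_size(const unsigned long *trace, size_t n,
                           int block, double target)
    {
        for (long size = 1024; size <= (1L << 24); size *= 2)
            if (miss_ratio(trace, n, size, block) <= target)
                return size;
        return -1;                /* nothing in the sweep range was enough */
    }

    int main(void)
    {
        /* A synthetic trace stands in for a benchmark's virtual address trace. */
        enum { N = 100000 };
        unsigned long *trace = malloc(N * sizeof *trace);
        for (size_t i = 0; i < N; i++)
            trace[i] = (i * 64UL + (i % 7) * 4096UL) % (1UL << 20);

        printf("critical cache size: %ld bytes\n", elite_size(trace, N, 16, 0.01));
        free(trace);
        return 0;
    }

The same loop applies to any resource for which a simulator and an acceptance criterion exist; only the simulator changes.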
1.3 This Chapter

Benchmarking has evolved for two purposes: performance evaluation and system design. Benchmark characterization helps with both of these activities. It gives more information to end users about which benchmarks matter, and it supplies a first-cut design to engineers to narrow the design space.
This chapter discusses new advances in benchmarking, including benchmark characterization. The next section presents an overview of several contemporary benchmark suites. This is followed by a detailed characterization of a popular workstation benchmark set, the SPECint92 suite. The characteristics focus on memory system and processor microarchitecture design, since these two subsystems largely determine the performance of workstation-class machines. Although the characteristics are comprehensive in their hardware considerations, they do not cover software interactions. The effects of software on the characteristics are discussed at the end of the chapter.
2. Summary of Popular Benchmark Suites

There are several current, popular benchmark suites, as summarized in Table I.

TABLE I
MAJOR BENCHMARK SUITES AND THEIR USES

Benchmark suite   Intended use
Perfect           Supercomputers (Cybenko et al., 1990)
SPEC              Workstation, general-purpose systems (Dixit, 1991)
  SPECint92       Nonnumeric (integer code)
  SPECfp92        Scientific (floating-point code)
TPC               Commercial server systems
  TPC-A           Multibranch banking workloads (TPC, 1992a)
  TPC-B           Batch input version of TPC-A (TPC, 1992b)
  TPC-C           Warehouse operation workloads (TPC, 1992c)

Program-based benchmarks fall into several overlapping classes:
Synthetic benchmarks. Small programs especially constructed for benchmarking purposes. They do not perform any useful computation. The main idea behind synthetic benchmarks is that the average characteristics of real programs can be statistically approximated by a small program. Examples: Dhrystone, Whetstone (Curnow and Wichmann, 1976; Weicker, 1984).

Kernel benchmarks. Code fragments extracted from real programs, where the code fragment is believed to be responsible for most of the execution time of the program. Many of these benchmarks have the same advantages as the synthetic benchmarks: small code size and long execution time. A cluster of kernel benchmarks is part synthetic and part kernel in nature, since it specifies a workload's characteristics based on some analysis of the workload. For these benchmarks, the workload is specified as a synthetic list of kernels found to be important to the user of a system.
Utility benchmarks. Programs widely used by the user community. (Examples: grep, compress, nroff, SPICE, espresso, li.)

Application benchmarks. Programs compiled to solve some specific problem. The difference between these benchmarks and utility benchmarks is not always clearly defined. At the extreme, application benchmarks have been designed to solve a specific problem of interest to a single user, whereas utility benchmarks solve a problem for a large group of users. Supercomputers are the chief domain for application benchmarks.

Several other classes exist that modify the meaning of the above classes, including:
Integer benchmarks. Benchmarks that perform nonnumeric work, such as Lisp interpreters, compilers, sorters, etc.
Floating-point (numeric) benchmarks. Benchmarks that perform the majority of their useful work using floating-point arithmetic. These programs tend to be from scientific workloads. They also tend to be long running and very compute intensive.

Recursive benchmarks. Benchmarks that perform large amounts of recursion. These benchmarks strain the calling mechanisms of the hardware and the interprocedural analysis mechanisms in the compiler. Recursive benchmarks are often separated into two subclasses: shallow recursive and deeply recursive benchmarks. Shallow recursive benchmarks place much less strain on the hardware/compiler than deeply recursive benchmarks.

Partial benchmarks. Partial traces of programs. Often the traces are collected in terms of a specific architecture, including timing and instruction-set-specific information. Partial benchmarks are usually intended for simulation purposes only.

All benchmark suites, both program-based and specification-based, are targeted toward particular markets. The most widely used supercomputer benchmark suite is the Perfect Benchmark Suite (Cybenko et al., 1990). Just below supercomputers on the cost/performance spectrum is the commercial server market. These are now benchmarked by the Transaction Processing Council's TPC series of benchmarks, comprising TPC-A, TPC-B, and TPC-C, with each targeted toward different workloads (TPC, 1992a,b,c). For the workstation marketplace, the Standard Performance Evaluation Corporation (SPEC) has released several benchmark suites, including the popular SPEC89 and its successor, SPEC92 (SPEC, 1989; Dixit, 1991).
In March 1994, SPEC and The Perfect Club merged. The Perfect Club is now the High-Performance Steering Committee of SPEC (SPEC/HPSC), whereas the former SPEC suites are under the auspices of the Open Systems Steering Committee (SPEC/OSSC). The two committees will keep separate run and reporting rules for the benchmarks. The reason for this separation is a difference
in philosophy between the two system markets. This is explained in more detail below.
2.1 Supercomputer Benchmarking: The Perfect Benchmarks
The Perfect Benchmark Suite was compiled by The Perfect Club, a consortium of supercomputer vendors and leading university researchers, founded in 1987. The goals of this suite include the use of full applications instead of kernel benchmarks and the reporting of sufficient data to understand results. The members of the suite are outlined in Table II. All of the programs are scientific or engineering applications that are compute intensive. Many of the codes have been built to explore phenomena of interest to small research teams. This is in stark contrast to benchmarks that are utilities, where the code may perform mundane tasks such as sorting a list or assembling an object file.
All of the code in the Perfect Benchmark suite is written in Fortran77. However, most of the code supplied with the suite will not run efficiently without modifications. This is because supercomputer code is often hand-tuned for particular architectures. The benchmark suite requires the recording of a baseline time for the unmodified code before the code is tuned to the hardware under test. A discussion of the consequences of this style is presented below in the discussion of run rules.
TABLE II
THE PERFECT BENCHMARKS*

Benchmark   Description
ADM         Air pollution, fluid dynamics
ARC2D       Supersonic reentry, 2-D fluid dynamics
DYFESM      Structural dynamics, engineering design
FLO52       Transonic flow, 2-D fluid dynamics
MDG         Liquid water simulation, molecular dynamics
MG3D        Seismic migration, signal processing
OCEAN       Ocean simulation, 2-D fluid dynamics
QCD         Lattice gauge, quantum chromodynamics
SPEC77      Weather simulation, fluid dynamics
SPICE       Circuit simulation, engineering design
TRACK       Missile tracking, signal processing
TRFD        Two-electron transform integrals, molecular dynamics

* Adapted from Kipp (1993).

2.2 Commercial System Benchmarking: The TPC Benchmarks

Commercial systems are largely used for On-Line Transaction Processing (OLTP). Examples of OLTP include airline reservation systems and inventory
management. OLTP workloads are composed of transaction-based accesses to shared databases. The first standard in this area was DebitCredit, which came as the result of a paper published by Jim Gray of Tandem Computers under the name Anon et al. in Datamation in 1984 (Serlin, 1993; Anon et al., 1985). The initial specification of the benchmark was informal and allowed for multiple interpretations. One reason was that the benchmark did not present any code, but instead specified a workload as a particular database transaction and a rate at which the transaction should be made (expressed as an acceptable response time). In order to prove to customers that the results were valid, server vendors adopted the practice of hiring outside consulting companies to "audit" a benchmark run and make an independent report available to the consumer. These practices led to the formation of a consortium of server vendors in 1988, the Transaction Processing Council (TPC).
Since its formation, TPC has released three benchmark specifications: TPC-A, TPC-B, and TPC-C. The TPC-A benchmark is a version of the original DebitCredit benchmark. It simulates the requests made by tellers at a large bank with multiple branches. The TPC-B benchmark is similar to TPC-A, but with a batch input instead of multiple, attached terminals. TPC-A and -B performance is reported in two forms: transactions per second (tps) and dollars per tps ($/tps). The second metric, $/tps, gives a user a reasonable idea of the system cost/performance tradeoff.
The third benchmark, TPC-C (released in 1992), is significantly more complex than TPC-A or TPC-B. It presents six transaction types, ranging from read-only requests to a multibatch update. Because of this complex mix, the metric transactions per second has been replaced by transactions per minute (tpm). Like its predecessors, TPC-C results are also reported using dollars per tpm ($/tpm).
2.3 Workstation Benchmarking: The SPEC Benchmarks

The SPEC benchmarks are some of the most widely used benchmark suites for workstation performance evaluation. These suites were developed by SPEC, which was formed as an industry consortium in 1988. It is interesting to note that SPEC, The Perfect Club, and the TPC formed at about the same time, in reaction to similar problems, and all three as joint efforts of competing vendors. The workstation market is fundamentally different from the supercomputer or commercial server market, as is reflected in both the contents and the use of these benchmark suites.
SPEC Release 1.0 (also called SPEC89) contains 10 programs, including synthetic programs, kernel benchmarks, and utility and application benchmarks (SPEC, 1989). SPEC89 results are summarized by the geometric mean of the run times of the 10 benchmarks in a metric termed SPECmarks. SPEC89 is Unix-specific, although it is sometimes used for evaluating PCs. SPEC89 was well
received, but several problems developed with it. For example, the matrix300 benchmark (a kernel benchmark that performed common operations on 300 × 300 matrices) was naively written. Application of automated parallelizers, such as Kuck and Associates' Paraphrase, resulted in significant speedup for vendors who had licensed this technology (Keatts et al., 1991). This was thought to be unfair, not because the optimization was difficult to perform or generated much benefit, but because an increase in this one benchmark resulted in a significant increase in the summary SPECmark numbers (Case, 1992).
Several of the problems in SPEC89 were solved in the SPEC92 benchmark suite, released in January 1992 (Dixit, 1991). SPEC92 is divided into two subsets of benchmarks: SPECint92 (6 integer benchmarks) and SPECfp92 (14 floating-point benchmarks). SPECint92 contains several of the same integer benchmarks as the original SPEC89, including espresso, li, eqntott, and a longer-running version of gcc. Two new integer benchmarks have been added: sc, a public-domain spreadsheet program, and compress, the Lempel-Ziv file compressor. The increase in the number of supplied floating-point benchmarks from 6 in SPEC89 to 14 in SPEC92 helps prevent a repeat of the matrix300 phenomenon.
2.4 Run Rules
Each standard set of benchmarks has emerged after a period of cheating by system vendors. To reduce this cheating, run rules for the benchmark sets have been developed. These rules must be followed to obtain valid performance measurements. Comparison of the run rules between Perfect, TPC, and SPEC provides some interesting insight into the different benchmarking philosophies and their corresponding views of cheating. One run rule for the popular SPEC92 suite is:
Special features. No special software features tailored specifically for a particular benchmark may be used.

The reason for this rule is to prevent compiler writers from producing optimizations tailored to a particular benchmark. One set of these kinds of optimizations are unsafe optimizations: these may speed up a special case, yet do not operate correctly for all cases. An example is string copying. Many architectures have longer memory access times for byte transfers than for word transfers. The semantics of string copying are to copy each byte from one string to another, checking for the end-of-string marker [the American Standard Code for Information Interchange (ASCII) NUL character] after each memory-to-memory copy. A much faster method is to copy multiple-byte words, checking each word for NUL-byte entries. This often overcopies a string, which can result in data corruption in some programs. None of the programs that use string copying in SPEC92 are affected by overcopying.
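To make the example concrete, the first routine below is the safe byte-by-byte copy; the second copies a word at a time and may read and write up to three bytes past the terminating NUL. It is a deliberately simplified illustration of the kind of unsafe, benchmark-friendly optimization the rule forbids, not code from any SPEC distribution or vendor library, and it ignores alignment and strict-aliasing rules for brevity.

    #include <stdint.h>

    /* Safe copy: one byte per iteration, stops exactly at the NUL. */
    char *copy_bytes(char *dst, const char *src)
    {
        char *d = dst;
        while ((*d++ = *src++) != '\0')
            ;
        return dst;
    }

    /* Unsafe word-at-a-time copy: faster on machines where word accesses are
     * cheaper than byte accesses, but it may write up to three bytes past the
     * terminating NUL ("overcopying"), which corrupts adjacent data in some
     * programs. */
    char *copy_words_unsafe(char *dst, const char *src)
    {
        uint32_t *d = (uint32_t *)dst;
        const uint32_t *s = (const uint32_t *)src;

        for (;;) {
            uint32_t w = *s++;
            *d++ = w;                                  /* may overcopy */
            if ((w - 0x01010101u) & ~w & 0x80808080u)  /* word holds a NUL byte? */
                return dst;
        }
    }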
It is therefore tempting to use this unsafe optimization for SPEC92, even though it violates the SPEC92 special-features run rule. The other SPEC92 run rules are also designed to prevent cheating. These rules include:
Source code modification. No source code may be modified, except for the purposes of portability.

Special libraries. Special libraries may be used as long as they: (1) are announced products, (2) require no source code modification, (3) do not replace routines in the benchmark source code, and (4) are not benchmark specific.

Results. The benchmark runs must produce the correct results.

The SPEC92 benchmarks are designed for workstation evaluation, where the user is relatively unsophisticated. Contrast these to the run rules for the Perfect Benchmarks (paraphrased from Kipp, 1993):
Baseline measurements. Results must be obtained for unmodified code. One exception is to modify the code in order to port it.

Manual optimization. The following optimizations are allowed: (1) addition of compiler directives to enhance performance, (2) hand software restructuring (e.g., loop transformations), and (3) algorithm modification to replace the original sequential algorithm with a vectorized or parallel version.

Diary. Benchmarkers are required to maintain an optimization diary that specifies the transformations used, the amount of human effort that each entailed, and the performance payoff for each.

Results. The benchmark runs must produce the correct results.

The Perfect Benchmarks come complete with a guide that suggests optimizations for the codes. The difference in philosophy between the Perfect Benchmarks and the SPEC92 suites is a reflection of the differences between the two communities. In supercomputing, hand optimization is considered a requirement. The extra effort is justified by the cost of the machine, which itself is justified by the potential for performance improvement. In a sense, the human applications programmers are part of the system. In a workstation, the cost is less prohibitive, and the variety of tasks performed by workstations is much greater than that of supercomputers. Because of these facts, support personnel are not part of the system configuration. The very concept of a workstation embodies this philosophy: one computer, one user. For workstations, hand optimization (i.e., source code modification) is not allowed.
Although hand optimization is allowed in The Perfect Club, some optimizations can improve program speed yet reduce result accuracy. As an extreme example of this, consider converting all calculations from double-precision to single-precision
floating-point arithmetic. Special limits are placed on result accuracy to reduce such cheating. These limits are discussed in the following section.
The TPC benchmarks are characterized completely by their run rules, since no code is contained in these benchmarks. This is a further evolution of the Perfect Benchmarks' philosophy that only the results must be correct to obtain a valid run. The run rules of each of TPC-A, TPC-B, and TPC-C differ and are quite complex. Examples of TPC run rules are:
Terminal inputs and outputs. (Paragraph 1.3.1:) Each transaction is composed of 100 alphanumeric bytes composed of at least four fields: Account-ID, Teller-ID, Branch-ID, and Delta. (Paragraph 1.3.3:) User-level data may not be compressed. (Paraphrased from TPC, 1992a.)

The ACID properties. (Paragraph 2.1.1:) The ACID (Atomicity, Consistency, Isolation, and Durability) properties of transaction processing systems must be supported by the system under test during the running of the benchmark. (From TPC, 1992a.)

Both of these rules limit cheating. The specification contains approximately 150 such rules, with topics such as I/Os, database design, response time requirements, reporting formats, steady-state behavior, hardware configurations, and pricing methodologies. This last set of rules is included to reduce cheating in the $/tps metric. Configurations for high-end commercial servers can retail for $1.5-$2 million. It is clear that the high cost of commercial servers lends itself to a philosophy similar to that of supercomputers, where the variety of potential workloads is relatively small. Also, in many cases, support personnel are part of the system configuration.
Without any benchmark code, rules alone are not enough to guarantee that vendors do not cheat. The TPC goes further than SPEC or the Perfect Benchmarks and requires that each benchmark result be audited and certified by an associate of the council.
2.5 Result Accuracy

Verifying the correct results for a nonnumeric benchmark often means comparing the output to some standard. This is nontrivial for numeric benchmarks. Many compiler optimizations can alter the outcome of a numeric calculation. One example is strength reduction, which can convert a division into a subtraction for small divisors. Acceptable result accuracy is usually the prerogative of the programmer or user. SPECfp92 and the Perfect Benchmarks trivialize such decisions and eliminate the chance of cheating by specifying acceptable tolerances for each of their benchmarks. There are two classes of tolerances: relative and absolute.
Relative tolerance. The maximum relative difference between two numbers, expressed as a fraction.

Absolute tolerance. The maximum absolute difference between two numbers.

These tolerances are checked after the program runs, using the spiff tool for SPECfp92. The Perfect Benchmarks are self-checking. They contain additional code that checks the results against relative error constraints built into the code. The Perfect Benchmarks enforce relative tolerance constraints. SPECfp92 specifies either relative, absolute, or both constraints. Several SPECfp92 benchmarks (e.g., doduc, swm256, and fpppp) specify both a relative and an absolute tolerance. If the benchmark results do not satisfy both of these criteria, the run fails and the results are automatically determined to be invalid by the SPECfp92 support software. (The SPECfp92 tolerances are summarized in Table III.) Two SPECfp92 benchmarks (doduc and fpppp) converge iteratively. The run rules for these benchmarks specify a tolerance on the results and on the number of iterations required to converge (see Table III). The Perfect Benchmarks do not allow adjustable iterations, but instead set a constant number of iterations for each benchmark.
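Checking a result against these tolerances reduces to two comparisons. The sketch below is our own illustration of the logic rather than the actual spiff or Perfect checking code; a negative tolerance is used here as an assumed convention meaning "not specified."

    #include <math.h>
    #include <stdio.h>

    /* Return 1 if `got` is acceptably close to the reference value `ref`.
     * A negative tolerance means "not specified"; when both are given,
     * both tests must pass (as for doduc, swm256, and fpppp). */
    int within_tolerance(double ref, double got, double rel_tol, double abs_tol)
    {
        double diff = fabs(got - ref);

        if (abs_tol >= 0.0 && diff > abs_tol)
            return 0;
        if (rel_tol >= 0.0 && diff > rel_tol * fabs(ref))
            return 0;
        return 1;
    }

    int main(void)
    {
        /* A 1% relative tolerance with no absolute tolerance. */
        printf("%d\n", within_tolerance(123.456, 124.0, 0.01, -1.0));   /* 1 */
        printf("%d\n", within_tolerance(123.456, 130.0, 0.01, -1.0));   /* 0 */
        return 0;
    }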
3. Benchmark Characterizations

The run rules of the previous section suggest that benchmarking is a subjective activity for the Perfect Benchmarks or the TPC benchmark specifications. It is difficult to characterize these benchmarks quantitatively without also characterizing the cleverness of the support personnel. This is not an issue for workstation benchmarking. Therefore, it is possible to characterize sets such as SPECint92. The benefits of the characterization are a detailed understanding of benchmark requirements and a prediction of benchmark performance on various systems. This section demonstrates a technique to characterize the workstation benchmarks of SPECint92. These benchmarks are summarized in Table IV, along with a classification according to the categories discussed in the previous section (note that all the benchmarks are integer benchmarks).
3.1 Methods

A benchmark makes requests of the system as it executes, and the system alters its behavior accordingly. The requests can be in the form of instructions to execute, references to memory locations, requests to the kernel for system functions, etc. The term trace is used here to refer to a log of these requests.
TABLE III
RELATIVE AND ABSOLUTE TOLERANCES FOR THE SPECfp92 BENCHMARKS (spice2g6, doduc, mdljdp2, wave5, tomcatv, alvinn, ear, mdljsp2, swm256, su2cor, hydro2d, and fpppp; doduc and fpppp carry tolerances on both results and iteration counts; relative tolerances range from 1 × 10^-7 to 0.10 and absolute tolerances from 1 × 10^-7 to 0.00025.)

Figure 1 shows a memory hierarchy composed of one cache indexed by virtual addresses, one cache indexed by physical addresses, the physical memory, and
the virtual memory of the system. Only the address usage is shown in the figure; the data path has been omitted. An address trace can be taken at any point along this hierarchy. Address traces taken lower in the hierarchy are specific to the behavior of the combination of the program and the hardware. For example, in the figure the physically addressed cache receives only references that miss in the virtually addressed cache, and they are translated to physical addresses; therefore, its address trace is a function of the design of the virtually addressed cache.

TABLE IV
THE SPECint92 BENCHMARK SET (DIXIT, 1991)

Benchmark   Class                    Description
compress    Utility                  Reduces the size of files using adaptive Lempel-Ziv coding
eqntott     Utility                  Generates truth table from logic equations
espresso    Utility                  Performs PLA optimization
gcc         Utility                  Compilation using GNU C compiler
sc          Utility                  Curses-based spreadsheet calculation
xlisp       Application, recursive   Lisp interpreter executing the Nine Queens problem
FIG. 1. A typical memory hierarchy showing address usage: benchmark-generated virtual addresses feed a virtually addressed cache; misses pass through a translation buffer to a physically addressed cache, the physical memory, and the virtual memory on disk.
It is also a function of the multiprogramming level of the system, since the virtual-to-physical page mapping depends on the ensemble of the address requests of all jobs running in the system. However, pages are only mapped to physical locations at power-of-2 boundaries. If the trace is composed of the virtual addresses, the simulation of a first-level virtually addressed cache is valid, and the simulation of a first-level physically addressed cache with size less than the page size is also valid. For the remainder of the hierarchy, a prototype cache for a given level can be used to filter the trace for use in simulating the next-level cache. For these reasons, the virtual address trace, a trace of the virtual addresses referenced during the execution of the program, is used here.
The compiler can aid in tracing the virtual addresses by generating extra instructions surrounding all load/store operations. These added instructions record the addresses of the load or store in a trace buffer. During program execution, the trace buffer is periodically flushed to a trace consumer (e.g., a simulator) or written to a file. This technique is an application of software instrumentation and has been used by Larus for AE, Stunkel and Fuchs for TRAPEDS, and Golden for Spike, among others (Larus, 1990; Stunkel and Fuchs, 1989; Golden, 1991). This chapter uses Spike for software instrumentation of virtual address traces.
Capturing traces of instructions poses a more difficult problem. Instruction encodings are idiosyncratic and vary widely from vendor to vendor. A single vendor encoding must be chosen to estimate instruction cache performance. The PA-RISC instruction set (Version 1.1) used in Hewlett-Packard workstations is the target for this study. This chapter uses the GNU C retargetable, optimizing compiler (Version 2.2.3) with optimizations enabled to translate benchmark source into executables (Stallman, 1989). The PA-RISC delay slots and other hardware-specific features are not encoded in the traces. The resulting trace is free of anomalies that might complicate the interpretation of the results.
Traces are traditionally written to a secondary storage device and then used for simulation. Two of the SPECint92 benchmarks, gcc and espresso, have data address traces of slightly more than 33 and 150 million references, respectively. Each of the references is a 32-bit quantity, resulting in sizes of 132-600 MB of storage. The gcc and espresso benchmarks have the two smallest data address traces of the benchmarks in the SPECint92 benchmark set. A solution to the problem of large trace size is in-process trace generation. In this technique, the simulation is combined with the benchmark in the same process. The simulator is compiled as an auxiliary function to the benchmark, and the compiler inserts periodic calls to the simulator to flush the trace buffer. The trace need not be recorded since it is regenerated by running the benchmark.
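The arrangement can be pictured with a hand-written stand-in for the compiler-inserted code: every load or store is preceded by a call that appends the effective address to a trace buffer, and a simulator routine linked into the same process drains the buffer. The names trace_record and cache_sim_consume below are illustrative assumptions, not interfaces of AE, TRAPEDS, or Spike.

    #include <stdint.h>
    #include <stdio.h>

    #define TRACE_BUF_ENTRIES 4096

    static uintptr_t trace_buf[TRACE_BUF_ENTRIES];
    static size_t trace_len;
    static unsigned long refs_seen;        /* stands in for a real cache model */

    /* The simulator linked into the benchmark's process; here it only counts
     * references, but it could feed a full memory-hierarchy model instead. */
    static void cache_sim_consume(const uintptr_t *buf, size_t n)
    {
        (void)buf;
        refs_seen += (unsigned long)n;
    }

    /* Call that the instrumenting compiler would insert before each load/store. */
    static void trace_record(const void *addr)
    {
        trace_buf[trace_len++] = (uintptr_t)addr;
        if (trace_len == TRACE_BUF_ENTRIES) {          /* periodic flush */
            cache_sim_consume(trace_buf, trace_len);
            trace_len = 0;
        }
    }

    int main(void)
    {
        /* An "instrumented" loop: each reference is recorded before it is used. */
        static int data[100000];
        long sum = 0;

        for (size_t i = 0; i < sizeof data / sizeof data[0]; i++) {
            trace_record(&data[i]);                    /* compiler-inserted call */
            sum += data[i];
        }
        cache_sim_consume(trace_buf, trace_len);       /* flush the tail */

        printf("sum=%ld, references traced=%lu\n", sum, refs_seen);
        return 0;
    }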
3.2 Memory Characteristics

The benchmarks are characterized in terms of the smallest cache size required to achieve a 1% miss ratio. This information is presented graphically in Fig. 2 for instruction cache designs and in Fig. 3 for data cache designs. The figures display the size requirements for both 16- and 32-byte cache block sizes. The different system requirements for the benchmarks are clear from the cache characteristics. The data cache designs of Fig. 3 can be used in much the same way as the instruction cache designs. Consider the instruction cache characteristics for gcc, which requires 128KB of cache when the block size is 16 bytes. Such a high requirement suggests two things: gcc has a large, diverse instruction access footprint, and gcc will run more efficiently on a processor with very large instruction caches. Profile-driven trace layout has been suggested as a technique to improve instruction cache performance (Hwu and Chang, 1989). Gcc is more likely to benefit from this kind of optimization.
Consider the problem of interpreting benchmark results. If a user has an application with cache performance close to gcc, then the SPECint92 numbers for gcc will likely predict the performance of the user's application. There are other factors involved, such as I/O design or processor design (see below). However, the characterization process can add meaning to the SPECint92 results for gcc for this user.
FIG. 2. Instruction cache characteristics: the smallest cache size achieving a 1% miss ratio for each SPECint92 benchmark, for 16- and 32-byte blocks and direct-mapped, 2-way set-associative, 4-way set-associative, and fully associative organizations.

FIG. 3. Data cache characteristics, in the same format as Fig. 2.
Another striking feature of the instruction cache data is the variation between benchmarks. Compress, espresso, and eqntott do not require increased associativity or increased block size to achieve a 1% miss ratio. However, there is a tradeoff between these parameters for gcc and xlisp. Sc has behavior that lies somewhere in between: increased block size is not important, but increased associativity can reduce cache size requirements by a factor of 4. It is interesting to note that some of these same variations are present in the data cache characteristics. This is no surprise, since instruction reference patterns influence data reference patterns, especially for global data accesses.
Consider now the problem of designing a new system. If the entire benchmark set must be supported, associativity should be high to satisfy benchmarks such as gcc, sc, or xlisp, yet the traditional tradeoff between cache size and associativity is not as important. A designer may infer a design such as a cache size that matches gcc's requirements, with a block size of 32 bytes and a moderate degree of associativity to aid sc and xlisp. Further simulation of the entire system would then help fine-tune these parameters. This illustrates the use of characteristics as first-cut designs.
3.3 Processor Model

A very powerful processor model is needed to characterize the effect of processor architecture on benchmark performance. Less powerful processor architectures that have limited hardware resources would tend to homogenize the benchmark characteristics. The processor model for this study is a superscalar engine with full Tomasulo scheduling and pipelined functional units. To achieve high parallelism, instructions are issued at a rate of eight per cycle, and integer and floating-point functional units are duplicated. The types of functional units are shown in Table V. The Latency column presents the operation latency for each functional unit. These values are based on typical design parameters.
Functional units of the IAlu type implement integer addition, subtraction, and logical operations. Functional units are pipelined, with the exception of the FPDiv unit (see below). The instructions that perform address calculation are executed by the AddrC unit. In some processors, an IAlu unit is used for AddrC operations. This is also the case with the Move unit, which performs register-to-register transfer operations. The Shift unit performs binary bitwise shift operations. This is often implemented using a barrel shifter. The floating-point units are grouped into addition (FPAdd), multiplication (FPMul), conversion (FPCvt), and division (FPDiv). FPDiv is a pseudo-unit; division actually takes place in the multiplier using the quadratic convergence division method in an iterative, unpipelined fashion.¹ The data cache is accessed through two units: Load and Store. For the purposes of characterization, loads and stores never miss in the data cache. Lastly, the Branch unit performs a control transfer.
TABLE V
FUNCTIONAL UNIT TYPES

Functional unit   Description                  Latency
IAlu              Integer ALU                  1
IMul              Integer multiply             3
IDiv              Integer divide, remainder    10
AddrC             Address calculation          1
Move              Register to register move    1
Shift             Shift                        1
Load              Load                         2
Store             Store                        1
FPAdd             FP add                       3
FPMul             FP multiply                  3
FPDiv             FP divide, remainder         10
FPCvt             FP convert                   3
Branch            Branch                       1
Test              Compare/test                 1

Abbreviations: ALU, arithmetic and logic unit; FP, floating point.
Architectural behavior is determined via trace-driven simulation. The simulator implements a dynamic instruction scheduling model, with the window for instruction scheduling moving between correctly predicted branches. Yeh's adaptive training branch algorithm is used to predict branch behavior, since it is currently one of the most accurate prediction schemes (Yeh and Patt, 1991). Table VI presents the performance of this scheme. Since the benchmarks can generate extremely long traces, trace sampling techniques are employed to reduce trace size and simulation time (Conte, 1992; Conte and Mangione-Smith, 1993). Branch hardware simulation is done for the full traces. The full trace is used to mark each incorrectly predicted branch in the sampled trace file. This lessens considerably the possibility of sampling error.

TABLE VI
PERFORMANCE OF THE BRANCH PREDICTION SCHEME

Benchmark   Prediction accuracy (%)
compress    90.4
eqntott     95.8
espresso    93.7
gcc         83.8
sc          94.8
xlisp       90.9
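For readers unfamiliar with the prediction scheme, the sketch below shows a minimal two-level adaptive predictor in the spirit of Yeh and Patt (1991): a per-branch history register selects a 2-bit saturating counter in a pattern table. The table sizes, indexing, and test harness are illustrative assumptions, not the configuration behind Table VI.

    #include <stdint.h>
    #include <stdio.h>

    #define HIST_BITS 8
    #define PHT_SIZE  (1u << HIST_BITS)   /* pattern history table (2-bit counters) */
    #define BHT_SIZE  1024                /* per-branch history registers */

    static uint8_t history[BHT_SIZE];
    static uint8_t counters[PHT_SIZE];

    /* Predict taken when the counter selected by this branch's history is 2 or 3. */
    static int predict(uint32_t branch_pc)
    {
        return counters[history[branch_pc % BHT_SIZE]] >= 2;
    }

    /* Train both levels with the actual outcome (1 = taken, 0 = not taken). */
    static void update(uint32_t branch_pc, int taken)
    {
        uint8_t *h = &history[branch_pc % BHT_SIZE];
        uint8_t *c = &counters[*h];

        if (taken  && *c < 3) (*c)++;
        if (!taken && *c > 0) (*c)--;
        *h = (uint8_t)(((*h << 1) | (taken ? 1u : 0u)) & (PHT_SIZE - 1));
    }

    int main(void)
    {
        /* A branch that strictly alternates is learned almost perfectly. */
        unsigned correct = 0, total = 1000;
        for (unsigned i = 0; i < total; i++) {
            int actual = (int)(i & 1);
            correct += (unsigned)(predict(0x400100u) == actual);
            update(0x400100u, actual);
        }
        printf("prediction accuracy: %.1f%%\n", 100.0 * correct / total);
        return 0;
    }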
3.4 Processor Characteristics

Characterization of the processor resources is done by supplying an unlimited number of each type of unit and observing the dynamic unit usage. An example distribution of this usage is presented in Table VII for gcc. The table shows, for example, that gcc needs two IAlu units 18% of the time, or three Load ports 2% of the time.

TABLE VII
DYNAMIC FUNCTIONAL UNIT DISTRIBUTION (PERCENTAGE) FOR gcc: the fraction of execution cycles in which 0, 1, 2, ..., 14 or more units of each functional unit type of Table V are in use.

These results can be used to characterize the required processor resources. Tables VIII and IX present this type of characterization. Table VIII shows the number of units required to satisfy 95% of the total execution cycles of the benchmarks. Table IX shows similar requirements, but for 99% of the execution cycles.

TABLE VIII
PROCESSOR RESOURCE REQUIREMENTS FOR 95% OF EXECUTION: the number of units of each type (IAlu, IMul, IDiv, AddrC, Move, Shift, Load, Store, FPAdd, FPMul, FPDiv, FPCvt, Branch, Test) required by compress, eqntott, espresso, gcc, sc, and xlisp.

TABLE IX
PROCESSOR RESOURCE REQUIREMENTS FOR 99% OF EXECUTION, in the same format as Table VIII.

The first notable feature of the characteristics is the similarity, rather than the differences, between the benchmarks. All benchmarks need a large number of IAlu units. The SPECint92 benchmarks are true to their integer benchmark classification. Load units (i.e., cache ports) are very important. Two to three Load units are required even when the 95% criterion is used for characterization. Address calculations are very critical for compress and xlisp; they are not critical for eqntott.
Processor characteristics can be used by customers in much the same way as memory characteristics. If a user's application benchmark has requirements that match one of the SPECint92 benchmarks, then performance differences between platforms may be compared using published results for that SPECint92 benchmark. In addition, designers may use the characteristics to decide the degree of functional unit duplication.
3.5 Compiler Effects

The benchmark characterization technique relies on standard compiler optimizations. The prevalent GNU C compiler is used here to compile the benchmarks, since it reflects the median offering for workstation compilation. However, compilers are becoming increasingly integral contributors to the performance of systems. Recent work has been done, both by the authors and others, to study the effect of optimizations on benchmark characteristics. Compiler optimizations fall into several categories: (1) traditional optimizations, (2) scheduling, (3) resource allocation (e.g., register allocation), and (4) cache performance improvement.
Traditional optimizations are used across all platforms (GCC implements them). However, their effects can be detrimental in some cases. For example, common subexpression elimination can add a data dependence to otherwise parallel code. The effect of traditional optimizations on processor characteristics has been studied by Conte and Menezes (1994). In general, these effects can be estimated to a high degree of accuracy.
Register allocation and scheduling interact. Applying the elite design principle would suggest removing register allocation from consideration. This can be done by characterizing the required number of registers for a benchmark. Mangione-Smith (1994) studied these effects for polycyclic scheduling (i.e., software pipelining).
Code-expanding optimizations can alter instruction cache performance. Chen et al. (1993) derived a method to predict these effects using a detailed model of memory characteristics that is based on the concept of recurrences and conflicts. This model can be used in much the same way as required cache size is used in this chapter. Lastly, data cache optimizations most often rely on a form of prefetching to reduce misses. The effect of these optimizations on benchmark characteristics has yet to be studied.
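The remark above about common subexpression elimination can be illustrated with a hand-written before/after pair; the example is ours and is not drawn from any benchmark.

    /* Before CSE: the two statements share no computed values; their multiplies
     * and adds form two independent dependence chains. */
    void before_cse(int a, int b, int c, int d, int *x, int *y)
    {
        *x = a * b + c;
        *y = a * b + d;
    }

    /* After CSE: the redundant multiply is gone, but both adds now depend on
     * the single temporary t, so the two statements are no longer independent. */
    void after_cse(int a, int b, int c, int d, int *x, int *y)
    {
        int t = a * b;
        *x = t + c;
        *y = t + d;
    }

With abundant functional units, the duplicated-multiply version exposes more independent work per cycle, which is exactly the kind of effect that shows up in the processor characteristics.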
4. Final Remarks
The last several years have yielded several industry consortia with a common goal of standardized benchmarking. These consortia have been created for the supercomputing, commercial server, and workstation markets. Each industry has different average cost/performance points. When the relative cost of hardware surpasses the cost of applications programming, benchmarking run rules are liberalized. There are two styles of liberal run rules. The first style supplies code
that may be modified in any form, as long as the results match a specification (e.g., the Perfect Benchmarks). The second style supplies a specification for the input to the application, without any code whatsoever (e.g., the TPC benchmarks). In either case, the modifications are carefully monitored using either a diary of modifications or independent auditing of results.
Workstation markets are the midpoint between the high-cost, high-performance supercomputer and server markets and the low-cost, low-performance personal computer market. As such, the expectations for these systems are high. Their benchmarks have rigid run rules since the cost of additional programming personnel would eclipse the cost of the hardware.
This chapter has presented a characterization technique for quantifying the meaning of workstation benchmark results. The characteristics are measured in terms of the resource requirements for custom, elite designs. If two benchmarks have nearly identical characteristics, they should behave similarly across a range of workstations. In this way, a user can compare the characteristics of a local application to the members of the SPEC92 suite. If a match is found, then the performance of that single SPEC92 benchmark can be used to make purchasing decisions, instead of relying blindly on SPECmarks.

ENDNOTES

1. This algorithm can achieve the precision required by the Institute of Electrical and Electronics Engineers (IEEE) standard at reasonable cost and speed (Koren, 1993) and was implemented in the RS/6000 (Markstein, 1990).
REFERENCES

Anon et al. (1985). A measure of transaction processing power. Datamation 31(7), 112.
Case, B. (1992). Updated SPEC benchmarks released. Microprocessor Report, September.
Chen, W. Y., Chang, P. P., Conte, T. M., and Hwu, W. W. (1993). The effect of code expanding optimizations on instruction cache design. IEEE Trans. Comput. 42(9), 1045-1057.
Conte, T. M. (1992). Systematic computer architecture prototyping. Ph.D. Thesis, Department of Electrical and Computer Engineering, University of Illinois, Urbana.
Conte, T. M., and Hwu, W. W. (1991). Benchmark characterization. IEEE Comput., pp. 48-56.
Conte, T. M., and Mangione-Smith, W. (1993). Determining cost-effective multiple issue processor designs. In "Proceedings of the International Conference on Computer Design, Cambridge, MA."
Conte, T. M., and Menezes, K. N. P. (1994). The effects of traditional compiler optimizations on superscalar architectural design. In "The Interaction of Compilation Technology and Computer Architecture" (D. J. Lilja and P. L. Bird, eds.), pp. 119-136. Kluwer Academic Publishers, Boston.
Curnow, H. J., and Wichmann, B. A. (1976). A synthetic benchmark. Comput. J. 19(1).
Cybenko, G., Kipp, L., Pointer, L., and Kuck, D. (1990). Supercomputer performance evaluation and the Perfect Club. In "Proceedings of the International Conference on Supercomputing," pp. 254-266. IEEE Computer Society Press, Amsterdam, The Netherlands.
Dixit, K. M. (1991). CINT92 and CFP92 benchmark descriptions. SPEC Newsl. 3(4).
Golden, M. L. (1991). Issues in trace collection through program instrumentation. Master's Thesis, Department of Electrical and Computer Engineering, University of Illinois, Urbana-Champaign.
Hwu, W. W., and Chang, P. P. (1989). Achieving high instruction cache performance with an optimizing compiler. In "Proceedings of the 16th Annual International Symposium on Computer Architecture, Jerusalem, Israel," pp. 242-251. IEEE Computer Society Press.
Keatts, J., Balan, S., and Bodo, P. (1991). Matrix300 performance concerns. SPEC Newsl. 3(4).
Kipp, L. (1993). "Perfect Benchmarks Documentation: Suite 1," Tech. Rep. Center for Supercomputing Research & Development, University of Illinois at Urbana-Champaign.
Koren, I. (1993). "Computer Arithmetic Algorithms." Prentice-Hall, Englewood Cliffs, NJ.
Larus, J. R. (1990). "Abstract Execution: A Technique for Efficiently Tracing Programs," Tech. Rep. Computer Sciences Department, University of Wisconsin-Madison.
Mangione-Smith, W. (1994). Register requirements for high performance code scheduling. In "The Interaction of Compilation Technology and Computer Architecture" (D. J. Lilja and P. L. Bird, eds.), pp. 51-86. Kluwer Academic Publishers, Boston.
Markstein, P. W. (1990). Computation of elementary functions on the IBM RISC System/6000 processor. IBM J. Res. Dev. 34(1), 111-119.
Serlin, O. (1993). The history of DebitCredit and the TPC. In "The Benchmark Handbook for Database and Transaction Processing Systems" (J. Gray, ed.), pp. 21-40. Morgan Kaufmann, San Mateo, CA.
Standard Performance Evaluation Corporation (SPEC) (1989). SPEC Newsletter, Vol. 1, No. 1. SPEC, Fremont, CA.
Stallman, R. M. (1989). "Using and Porting GNU CC." Free Software Foundation.
Stunkel, C. B., and Fuchs, W. K. (1989). TRAPEDS: Producing traces for multicomputers via execution driven simulation. In "Proceedings of the ACM SIGMETRICS '89 and PERFORMANCE '89 International Conference on Measurement and Modeling of Computer Systems, Berkeley, CA," pp. 70-78.
Transaction Processing Council (TPC) (1992a). "TPC Benchmark A Standard Specification, Revision 1.1," Tech. Rep. TPC, San Jose, CA.
Transaction Processing Council (TPC) (1992b). "TPC Benchmark B Standard Specification, Revision 1.1," Tech. Rep. TPC, San Jose, CA.
Transaction Processing Council (TPC) (1992c). "TPC Benchmark C Standard Specification, Revision 1.0," Tech. Rep. TPC, San Jose, CA.
Weicker, R. P. (1984). Dhrystone: A synthetic systems programming benchmark. Commun. ACM 27(10), 1013-1030.
Yeh, T., and Patt, Y. N. (1991). Two-level adaptive training branch prediction. In "Proceedings of the 24th Annual International Symposium on Microarchitecture, Albuquerque, NM," pp. 51-61.
An Evolutionary Path for Transaction Processing Systems

CALTON PU, AVRAHAM LEFF, AND SHU-WIE F. CHEN
Department of Computer Science
Columbia University
New York, New York
Abstract

Transaction processing (TP) systems have been evolving toward greater distribution, heterogeneity, and autonomy. This chapter shows that a specific evolutionary path has been followed in the development of TP systems. We introduce a classification scheme for TP systems, based on the different constraints that must be addressed in different environments. The task of implementing the traditional transaction model while not degrading system performance becomes especially difficult in certain system configurations. We identify specific problems, and abstract them into a set of five system dimensions: processes, machines, machine heterogeneity, data schemas, and sites. These dimensions involve issues such as concurrency, coordination, data homogeneity, and autonomy. The effects of distribution are now well understood; our discussion therefore focuses on steps that have been taken in the evolution toward heterogeneity and autonomy. Autonomous TP appears to offer the greatest challenges to the traditional TP model. These challenges, and various approaches to their solution, are examined.

Keywords: transaction processing, heterogeneity, autonomy, concurrency control, crash recovery, federated databases, epsilon serializability.
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  256
2. Classification Model and Terminology . . . . . . . . . . . . . . . . . . . .  257
3. Taxonomy Instantiations . . . . . . . . . . . . . . . . . . . . . . . . . . .  260
   3.1 Base System (P1, M1, H1, D1, S1) . . . . . . . . . . . . . . . . . . . .  260
   3.2 Process Parameter (Pn, M1, H1, D1, S1) . . . . . . . . . . . . . . . . .  261
   3.3 Machine Parameter (Px, Mn, H1, D1, S1) . . . . . . . . . . . . . . . . .  262
   3.4 Machine Heterogeneity Parameter (Px, Mx, Hn, D1, S1) . . . . . . . . . .  267
   3.5 Data Parameter (Px, Mx, Hx, Dn, S1) . . . . . . . . . . . . . . . . . . .  271
   3.6 Site Parameter (Px, Mx, Hx, Dx, Sn) . . . . . . . . . . . . . . . . . . .  276
4. Systems Evolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  280
   4.1 "Base" System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  281
   4.2 Evolution along the Data Dimension . . . . . . . . . . . . . . . . . . .  282
   4.3 Evolution along the Machine Dimension . . . . . . . . . . . . . . . . . .  283
   4.4 Evolution along the Machine Heterogeneity Dimension . . . . . . . . . . .  284
   4.5 Evolution along the Site Dimension . . . . . . . . . . . . . . . . . . .  286
   4.6 Open Areas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  287
5. Beyond Traditional TP . . . . . . . . . . . . . . . . . . . . . . . . . . . .  287
   5.1 Nontraditional TP Examples . . . . . . . . . . . . . . . . . . . . . . .  288
   5.2 Extending Traditional TP . . . . . . . . . . . . . . . . . . . . . . . .  290
   5.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  293
   References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  295
1. Introduction

Database systems provide an organized and consistent way of accessing vast amounts of data. User interaction with databases is facilitated by data and transaction models that define the external functionality of a database system. Data models structure a database by imposing relationships and constraints among the atomic data units that comprise the system. Transaction models structure user interaction with a database by imposing constraints on the type and outcome of operations on the database. Specifically, the traditional transaction model (Gray, 1978) possesses the two key properties of recovery and concurrency atomicity. The recovery atomicity property means that when users initiate a transaction, the transaction either executes in its entirety or has no effect whatsoever on the database. Thus, even if underlying hardware or software fail, application programs will not corrupt the database with incorrect results. The concurrency atomicity property means that users are assured that concurrent execution of another transaction will not affect their own application. This is done by controlling the execution of transactions so that their interleaved executions are equivalent (in their effect) to some serial execution.
Transaction constraints are particularly needed for database use because the results of transactions are "important." Recovery atomicity is a general recovery technique that is provably correct and provides fault tolerance. Similarly, concurrency atomicity is a provably correct general technique for increasing concurrency without introducing inconsistency into the database or into one's results. Both of these properties are intuitively appealing because they allow users to black-box a complex database application program as an atomic, fail-safe modification of the database. Although the database is used by many people simultaneously, and is subject to many forms of real-life failure, transaction processing (TP) systems enable users to completely ignore these complexities. In other words, TP systems enable people to think of themselves as being the sole user of an idealized (failure-free) database system.
While the external functionality offered by data and transaction models is very attractive, effective internal mechanisms must be devised for their implementation. Performance, whether defined in terms of transaction throughput, response
time, or system availability, cannot degrade "too much" in achieving this functionality. For example, the areas of file structures, indexing, and query optimization must all be addressed so that data model implementation does not compromise system performance. Similarly, in order to implement a transaction model in database systems, additional mechanisms must be added that supply transaction properties.
The required mechanisms, and their complexity, vary from system to system. When the underlying database is "simple" [e.g., a single-thread personal computer (PC)], the problems are relatively few. In contrast, if the database system is built on top of a network of machines, the problems are much greater. If these machines are heterogeneous [say, with respect to the database management system software (DBMS), or simply in their hardware], the difficulties are further compounded.
In this chapter, we discuss the problems that are faced in building TP systems, and show that the difficulties are usefully viewed as a function of specific attributes of the underlying database system. We classify the attributes along five dimensions, and analyze the specific problems posed by various combinations of system characteristics. First, in Section 2, we present a model of a TP system, and introduce the dimensions that we believe are important in classifying TP systems. In Section 3 we analyze the effect of moving along the different dimensions, with an emphasis on the independence of these dimensions. Section 4 describes the evolution of TP systems in terms of the above classification. In Section 5 we sketch some currently active areas in TP research.
2. Classification Model and Terminology

Figure 1 sketches a hardware model of TP systems. A number of machines (perhaps only one) are connected via a network such as a local area network (LAN) or wide area network (WAN).
FIG. 1. Hardware model of TP systems: machines, each with CPU, memory, and disk, connected by a LAN/WAN network.

(If the system is comprised of a single
machine, the system does not, of course, access a network.) Each machine contains a CPU and main memory, and can access nonvolatile data on disk.
The features that distinguish a transaction processing system from a database system are shown in the software model of a transaction processing system of Fig. 2. When users access the database, although the usual query/update processing is done, the interaction is controlled by two modules that reside "on top" of the database: a concurrency control (CC) module and a recovery manager (RM) module.
Let M be a set of machines [each possessing memory and a central processing unit (CPU)] on which processes run. Information can be transmitted between machines via a communication network. These are the machines shown in Fig. 1. Let P be a set of processes, each being an abstract entity of computation (e.g., a thread of control) that runs on the machines. For example, when a user inputs a database request on the terminal, a process is invoked in order to satisfy the request. Let H be the degree of heterogeneity among the set of machines (and associated system software) in the system. (The diagram does not constrain the machines in Fig. 1 to be identical in any way.) Let D be a set of logical data descriptions (e.g., schemas). Each logical data schema is coupled with a mapping mechanism such that, when users read a logical datum, a unique and consistent value is always returned. The physical data reside in main memory and on disk; the logical schemas enable this data to be correctly interpreted, and conveniently accessed, by the user. Let S be a set of sites. Sites support one or more logical schemas and consist of the machines in the database system. A given machine can only access a schema if the machine belongs to a site that supports the schema.
A TP system can be viewed as a quintuple (P, M, H, D, S), designed so that a program running in the system will manifest the transaction properties of recovery and concurrency atomicity. In other words, the function of the CC and RM modules depicted in Fig. 2 is to supply users with a consistent and useful method of interaction with the database system, whatever the degree of complexity in the underlying components of the system. Because of these two modules, TP systems can ensure that the interaction will appear atomic with respect to (1) other processes in P (concurrency atomicity) and (2) failures in machines in M (recovery atomicity).
In our discussion of "data" in TP systems, we state at the outset that we consider the issue of data models to be orthogonal to our TP model. Data may be described through record-based models such as the relational, network, and hierarchical models, or through object-based, logical models such as the entity-relationship and semantic network data models. We merely assume that some formalism exists with which to describe processes' data: this formalism (or formalisms) is denoted by D.
(P, M, H, D, S) quintuple capture many of the system aspects that affect the implementation, and at the same time allow us to simplify the complexity by analyzing each dimension in isolation. The key issue in the tuple model is the cardinality of the sets P, M, H, D, and S; the most significant problems arise as the cardinality of these sets changes from 1 to n. To denote the case when the system has only one machine, we write M1; when the system is comprised of many machines we write Mn. Thus, in Section 3 we vary the cardinality of each of the five components, and examine the effect on the resulting transaction processing system. Some of the system dimensions are orthogonal to others, and solutions for the problems in that dimension remain valid even when other dimensions are changed. We denote such a case by subscripting the dimension with x (e.g., Px), indicating a “don’t-care” value for the given dimension. Briefly, we can say that in going from:

- P1 to Pn, concurrency is the issue.
- M1 to Mn, coordination (protocols and algorithms) is the issue.
- H1 to Hn, machine and software heterogeneity is the issue.
- D1 to Dn, data homogeneity is the issue.
- S1 to Sn, management or autonomy is the issue.
Although we examine particular systems and system evolution in Section 4, in Table 1 we present the systems discussed in this chapter in terms of these five transaction processing system dimensions. The entries in the table denote the cardinality of the system: the value k denotes a value “in-between” 1 and N. These ideas are discussed at length in Sec. 4.
3. Taxonomy Instantiations
3.1 Base System (P1, M1, H1, D1, S1)
In this base system, only a single process can run at a time on a single machine. One data schema is supported by the single site. This case poses the fewest difficulties in implementing transaction properties, because of the simplicity of each of the components. Concurrency atomicity is available trivially. Recovery atomicity, due to real-life failure properties of the machine, must be supported by mechanisms such as logging or shadow paging. Under logging, a disk-resident, append-only structure called a log is used to record database operations (e.g., write) and transaction operations (e.g., begin, commit, abort) of every transaction in their order of execution. Recovery procedures use log records to undo or redo transactions so as to guarantee recovery atomicity. Under shadow paging, some form of page directory is used to map between logical database pages and physical disk pages. An update to a logical page involves two physical pages: a new page on which the actual update is performed, and the original page, called the shadow page, which is left untouched by the update. If the update commits, the page directory is atomically changed to refer to the new page. If the update is not committed, the page directory remains pointing to the shadow page. A fairly simple mapping suffices to enable the system’s logical-to-physical data transformation because the data reside on the same machine on which the process is running, and because only one data schema is supported by the site. Mechanisms such as a database dictionary and a file manager can be used to locate, access, and present the data in an appropriate format. Note, however, that the construction of a single logical view of the world representing the viewpoints of a number of users can be a nontrivial task. It may seem as if the “base system” is so simple that it does not exist in the real world. In a sense, many database products for PCs resemble this system to the extent that concurrency atomicity is taken for granted. Protection against system failures, however, has not been provided with standard PC products, precisely because the overhead involved with incorporating recovery atomicity is considered prohibitive. To get a sense of this overhead, consider logging with redo/undo, a typical crash recovery (CR) algorithm used to implement recovery atomicity. As transaction operations are executed, they are recorded on a log. Ongoing transactions are declared to be in one of two states: committed or aborted. After a system failure, committed transactions are reexecuted by redoing part of the log. Similarly, aborted transactions are “rolled back” by undoing other parts of the log. From the viewpoint of a typical user of a base system, the benefit of transaction properties is less than the cost incurred in supplying the properties. However,
the base system is interesting because more complicated systems can be viewed as attempts to map the underlying machines, processes, and other components into this simpler and more appealing interface. That is, despite the existence of multiple processes, machines, and data, the system attempts to transparently offer the user the illusion that he is dealing with the base system.
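A minimal sketch of the redo/undo logging idea discussed in this section (in Python; the Log structure, record layout, and recover function are illustrative assumptions, and a real recovery manager must also handle checkpoints, buffer management, and media failures):

    class Log:
        # Append-only log of (txn, op, item, before, after) records.
        def __init__(self):
            self.records = []

        def append(self, txn, op, item=None, before=None, after=None):
            self.records.append((txn, op, item, before, after))

    def recover(log, db):
        # Redo the writes of committed transactions, undo everything else.
        committed = {t for (t, op, _, _, _) in log.records if op == "commit"}
        for (t, op, item, before, after) in log.records:            # redo pass, in log order
            if op == "write" and t in committed:
                db[item] = after
        for (t, op, item, before, after) in reversed(log.records):  # undo pass, in reverse order
            if op == "write" and t not in committed:
                db[item] = before
        return db

    # Example: T1 commits, T2 does not; T2's dirty write reached disk before the crash.
    db, log = {"x": 0}, Log()
    log.append("T1", "begin"); log.append("T1", "write", "x", 0, 5); log.append("T1", "commit")
    log.append("T2", "begin"); log.append("T2", "write", "x", 5, 9)
    db["x"] = 9
    recover(log, db)    # db["x"] is now 5: T1's update survives, T2's is rolled back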
3.2 Process Parameter (Pn, M1, H1, D1, S1)
In this system, more than one process runs concurrently on a single machine. However, the database offers only one view of the world. Because the critical distinguishing feature of this system (in terms of implementation) is the single machine, we refer to it as the unimachine system. Implementing recovery atomicity is not affected by concurrent process execution, and is therefore supported as in the base system. Similarly, supplying data to many processes is no more difficult than supplying data to a single process. Supporting concurrency atomicity is now a problem because one program may change a datum’s value in the middle of another program’s execution. Notice that one way to trivially implement the transaction model is to restrict transaction execution to run serially. Of course, while this is a perfectly correct (and simple) way to implement the model, the performance of this implementation is unacceptable. First, there is no reason to forbid the concurrent execution of transactions that do not access data items in common (or that access them only in “read-only” mode). Second, serial execution would lead to severe underutilization of the system’s CPU because most applications would require lengthy accesses to disk media. The key notion of serializability (Papadimitriou, 1979) says that concurrent transaction executions are permissible as long as the set of concurrent processes is equivalent to some schedule of serially running processes. An intuitive definition of correctness is that of view serializability, in which the actual (concurrent) and hypothetical (serial) executions are identical to the extent that their transactions read the same values and leave the database in the same state. Transaction processing systems do not implement this correctness definition because recognizing view-serializable histories is an NP-complete problem (Papadimitriou, 1979). Instead, the correctness definition implemented by traditional TP systems is that of conflict serializability. Under this definition, actual (concurrent) and hypothetical (serial) executions are identical to the extent that the order of conflicting operations (e.g., read/write) is the same. The attractiveness of this definition lies in the fact that an on-line serialization graph can be constructed such that a set of transactions is conflict serializable if and only if the serialization graph is acyclic. As a result, efficient concurrency control mechanisms can be devised to ensure serializability (Bernstein et al., 1987). There are many different types of CC mechanisms including timestamp ordering, multiversions, and validation. However, the mechanism most used in practice
is locking, in which a transaction is required to obtain a lock on a data item before it is permitted to access the data item. Transactions must also follow a locking protocol in acquiring and releasing locks. The key idea behind these protocols is that processes do not acquire or release locks on data in “just any way”; instead, specific, provably correct protocols must be adhered to. For instance, the two-phase locking protocol (2PL) is comprised of two distinct locking phases: the growing phase, in which locks are acquired, and the shrinking phase, in which locks are released. Under this protocol, a transaction may not acquire a lock after it has released any locks, nor may it release a lock before it has acquired locks for all data items that it will access. This discussion illustrates the tension between intuitively correct transaction models and effective implementations of the model. The traditional concurrency control mechanisms represent, in a sense, a compromise: they efficiently implement a rigorous, but less intuitively appealing, notion of concurrency atomicity.
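The serialization-graph test just mentioned can be illustrated with a short sketch (Python; the history format and function name are illustrative, not taken from any particular system): build a precedence edge for every pair of conflicting operations and accept the history only if the resulting graph is acyclic.

    from itertools import combinations

    def conflict_serializable(history):
        # history: ordered list of (txn, op, item) steps, with op in {"r", "w"}
        edges = set()
        for (t1, op1, x1), (t2, op2, x2) in combinations(history, 2):
            # combinations() preserves history order, so the first step precedes the second
            if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                edges.add((t1, t2))              # earlier conflicting operation -> later one
        txns = {t for (t, _, _) in history}
        adj = {t: [v for (u, v) in edges if u == t] for t in txns}
        visiting, finished = set(), set()
        def has_cycle(u):                        # depth-first search for a cycle
            visiting.add(u)
            for v in adj[u]:
                if v in visiting or (v not in finished and has_cycle(v)):
                    return True
            visiting.discard(u)
            finished.add(u)
            return False
        return not any(t not in finished and has_cycle(t) for t in txns)

    # r1(x) w2(x) w1(x) yields the edges T1 -> T2 and T2 -> T1, a cycle, so:
    # conflict_serializable([("T1", "r", "x"), ("T2", "w", "x"), ("T1", "w", "x")]) is False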
3.3 Machine Parameter (Px, Mn, H1, D1, S1)

3.3.1 Overview and Definitions

We examine here the effect of implementing transaction properties in a system with more than one machine. In this type of system, processes may make use of multiple machines while executing. This system tuple represents a database that is distributed with respect to both processors and the physical data residing on the multiple machines. Even if a process executes on only one machine, it may access data residing on another machine. The advantages of distributed data are intuitively clear: users can take advantage of other people’s data, even if the data exists on another machine. The critical feature of this type of TP system is that the system is composed of more than one machine. We therefore refer to it as the multimachine system. We note at the outset that a continuum exists between “one” and “many” machines, and that the boundary can be fuzzy with respect to the dimension that we are discussing. In this context, for example, a tightly coupled multiprocessor is a single machine; a cluster sharing data would really consist of many machines. Similarly, a system based on shared memory might well be an instance of a transaction processing system implemented in a multimachine environment. The key point is the identification of implementation issues that depend on this distinction. Primarily two issues are affected by the machine dimension: concurrency atomicity and management of data replication. Concurrency atomicity is affected by the possibility of various failure modes, i.e., there are independent failure modes in the system such that part of the system can “function” despite the failure of another part of the system. In the case of the unimachine system
discussed in Sec. 3.2, when an important component (software or hardware) fails, there is nothing that can be done until the failure is repaired. In contrast, in a multimachine system, methods have been developed that permit parts of the TP system to continue functioning, and also facilitate the eventual recovery of the overall system. Management of data replication is affected by how memory is managed in the system. In the unimachine case it does not pay to replicate data, because if the system fails then all data are equally inaccessible, and while the system functions the performance improvements are usually not great. (Highly available transaction processing systems such as Tandem, which do replicate data and are managed as a unimachine, are in fact multimachines because all system resources, including hardware, are replicated.) Clearly, the details of system architecture will affect these issues in various ways. Nevertheless, extremes of the “single versus many” machine continuum are clearly marked: we will therefore not deal with the subtleties of the many variations of the machine dimension.
3.3.1.1 Multiple Machines: Supplying Data. With many machines in the TP system, supplying data for transactions can be a complex problem. In this system, data can now physically reside on any number of machines although data is still mapped into a single logical schema supported by the single site. These data relate to one another along two dimensions. The first is fragmentation, which denotes the extent to which the data used by the set of processes running on a given machine physically exists on that machine. At one extreme of fragmentation, no data must be retrieved from another machine; at the other extreme, the data is horizontally or vertically fragmented over different machines and must somehow be reconstructed if it is to be used by the executing machine’s process. When data are horizontally fragmented, certain objects residing on many machines are identified as logically belonging to the same set of objects (e.g., a table or other “entity”). When data is vertically fragmented, certain objects are identified as modeling the same “larger” object: the objects differ from one another in the amount of information they each contain about the larger object (e.g., how many attributes they each represent). The second dimension is replication, which denotes the degree to which data used by processes running on a machine exist as well on other machines. One extreme of replication is that the data is considered to exist solely on the executing machine; the other extreme is full replication, in which the same data is replicated on all machines. The fact that only one logical schema is available to processes implies that only the relationships of disjoint union and equivalence exist between the logical data residing on the system’s machines. Either the data consists of disjoint concepts or the data identically models the same concept. Because redundancy in distributed databases increases performance, reliability, and availability, we consider the extremes of both no replication and no fragmentation
to be a “degenerate” point in the space of distributed data. (This point implies that all data reside on one machine, and the other machines contain no other database data whatsoever.) At any other point in this space, some coordination between machines is required to ensure that the underlying data remain consistently defined. As before, the single logical schema formalizes this coordination, but the coupled data mapping mechanism is now considerably more complex. Moreover, before implementing the mechanisms for mapping logical to physical data, design choices (where to locate and how to fragment the data) and definitions of efficiency must be considered. A catalog table frees the user from specifying a particular replica: the system uses the table to track down all replicas of the data item. The system can create a set of aliases (alternative names) so that the user’s simple name is translated into a location-specific name. Similarly, the process accesses data in unfragmented form: although this may be nontrivial, it is the system’s task to optimize the request so that only necessary data are reconstructed.
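The catalog and alias mechanisms just described can be sketched as follows (Python; the table contents and function name are hypothetical examples rather than any system's actual catalog format):

    CATALOG = {
        # logical name -> (machine, fragment) pairs that hold a copy of each fragment
        "employee": [("machine_A", "employee_part1"),
                     ("machine_A", "employee_part2"),
                     ("machine_B", "employee_part1")],   # part1 is replicated on two machines
    }

    ALIASES = {"emp": "employee"}        # user-level synonyms for logical names

    def locate(user_name, catalog=CATALOG, aliases=ALIASES):
        logical = aliases.get(user_name, user_name)      # translate the user's simple name
        if logical not in catalog:
            raise KeyError("unknown logical data item: %r" % user_name)
        return catalog[logical]          # all locations; the optimizer chooses among them

    # locate("emp") returns every (machine, fragment) pair storing part of the employee
    # data; reconstructing the fragments is the system's task, not the user's.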
3.3.1.2 Multiple Machines: Transaction Properties. Having discussed the effect of multiple machines on the data utilized by transactions, we turn to the manner in which transaction properties are implemented in such an environment. A global process is one which, in order to satisfy transaction constraints, must update data that reside on a machine other than the one on which the process is running. A local process, in contrast, is one that updates data which only reside on the machine that executes the process. If each of the machines in the system already has TP mechanisms in place, then nothing new has to be done to run processes if they remain local processes. However, the system must be able to run global processes that maintain transaction constraints. Moreover, global processes must not interfere with the transaction behavior of local processes. We term the additional mechanism needed to run global processes on Mn the global mechanism.
3.3.2 Multimachine System: Coordination

In order to guarantee the “all or none” recovery atomicity characteristic, a commit protocol requiring information about more than one machine is required. This follows because a process whose execution was initiated at one machine must nevertheless either commit or abort at all machines participating in the process’s execution. Coordination among the machines is thus necessary in order to implement recovery atomicity. This is usually achieved through variations of the popular two-phase commit (2PC) protocol (Bernstein et al., 1987). In phase one, a transaction coordinator sends out an inquiry message to the participant databases; these databases respond with their vote. In phase two, the coordinator collects the votes, reaches a decision, and broadcasts the decision to all
participants. During recovery, if a participant database is in doubt about any transaction, it asks the coordinator, thus guaranteeing a uniform decision. The key idea is that the coordinator makes an atomic decision regarding the transaction’s commitment, and all participants must then adhere to this decision. Variations of this protocol include the “presumed-abort” and the “presumed-commit” protocols (developed for use in the R* system at IBM’s San Jose Research Laboratory in the late 1980s). These are designed to reduce the number of message exchanges and log operations required by the basic agreement protocol. In the case of the base system, the system only has to be concerned about a single process running at a time. Concurrency atomicity is therefore available trivially. As in the unimachine system, when moving from P1 to Pn, supplying recovery atomicity does not become more difficult (once the commit protocol is in place). What changes, of course, is the task of supplying concurrency atomicity. Because one process can now interfere with the data needed by another process, the transaction processing system must ensure that their execution is serializable. Solutions to this problem cannot be trivially applied from the case of the unimachine system. Serializing a set of processes in a distributed environment means that the system must determine a serializable schedule of processes spanning multiple processors. The problem, in other words, is the coordination of multimachine activity. Many distributed concurrency algorithms have been suggested: it has been pointed out (Bernstein and Goodman, 1982) that most of these are modifications to either the locking or timestamp mechanisms developed for the centralized case. Locking protocols remain the same as before: they differ in how the lock manager is implemented. In a system where data is not replicated the implementation depends on whether the system maintains a single lock manager or maintains a lock manager at each machine (multiple coordinators). The former approach is a trivial extension to the single-processor case, but the coordinating processor is a bottleneck that leaves the system vulnerable. Unimachine timestamp mechanisms can be immediately extended to the case of nonreplicated data if the system can generate globally unique timestamps. The transaction processing system then uses the unique timestamps to determine the serializability order. Note that these algorithms are applying a centralized approach to the distributed setting, and cannot handle the case where data are replicated.
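The structure of the two-phase commit exchange described above can be sketched briefly (Python; the participant interface, the coordinator log, and the absence of timeout and failure handling are simplifying assumptions):

    def two_phase_commit(coordinator_log, participants, txn_id):
        # Phase one: send the inquiry (prepare) and collect the votes.
        votes = [p.prepare(txn_id) for p in participants]    # each participant answers "yes" or "no"
        decision = "commit" if all(v == "yes" for v in votes) else "abort"
        coordinator_log.append((txn_id, decision))           # force-write the decision first
        # Phase two: broadcast the decision; every participant must adhere to it.
        for p in participants:
            if decision == "commit":
                p.commit(txn_id)
            else:
                p.abort(txn_id)
        return decision

An in-doubt participant that later asks the coordinator simply receives the logged decision, which is what guarantees a uniform outcome.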
3.3.3 Multimachine System: Replicated Data

Multiplicity of machines complicates more than providing read access for a process. Maintaining database consistency requires that the value of data be changed in an all-or-none manner. When data are fully replicated at each machine, then although reading data is theoretically no more difficult than in the centralized case, in writing data, care must be taken to maintain consistency between
machines. Even more information must be shared and coordinated if the fragmentation relationship holds between machines. When data is replicated the system must decide how to make a global decision (lock a data item) while ensuring that individual processors are aware of the decision, and thus do not let another process access their copy. In other words, the issue is the same as supplying concurrency atomicity: how can a global activity be coordinated so that inconsistent results are not generated. Locking protocols of the unimachine system can be extended to the multimachine system. These include the “primary copy” and “majority” protocols. Timestamp mechanisms can be extended to handle replicated data via “multiversions,” where each version is the state of some data after an update. Under this approach, the version appropriate for a given transaction is based on the time associated with the version. Coordination is difficult enough even when the system operates without failures. Managing replicated data becomes still more difficult because of the way that independent failures in the system affect consistency maintenance among a datum’s copies. For example, when a single site crashes, does the unavailability of its copy mean that other sites cannot access their copies? Naive approaches to the management of replicated data lead to the ironic result that replicated data, whose purpose is ostensibly to increase availability, in practice reduce availability because a single failure in a multicomponent system leads to complete unavailability. An even more difficult problem is introduced by the possibility of communication failures that lead to the partitioning of the system. Transactions executing in different components of the system may have conflicting updates to copies of the same item, but the communication failure means that the updates cannot be immediately synchronized. Various solutions have been developed to these problems: they enable replicated data to be managed even in the real-world presence of failures (Bernstein et al., 1987). Available-copies algorithms can deal with site failures by applying updates to all available sites and using a validation protocol to ensure that consistency is maintained. Algorithms such as quorum consensus and virtual partitions deal with communication failure by prohibiting some sites from applying updates (while permitting others) in a controlled fashion. These algorithms work because the partitions have enough information to determine whether they can or cannot apply the update. Another practical consideration is the synchronization overhead required to maintain mutual consistency between replicas. For many replica control protocols, an update at one site requires synchronous updates at some (possibly all) replica sites so as to guarantee a consistent view at all sites. Depending on the distance between replica sites and on the speed of the underlying communication mechanism, an update request may take on the order of seconds to complete. For many applications, such large and unpredictable delays are unacceptable. One solution is to use the notion of eventual consistency, as opposed to immediate mutual consistency, as the correctness criterion for replication (Sheth
et al., 1991). With eventual consistency, sites are not guaranteed to have the most recent copy at all times. They are only guaranteed to have the most recent copy after some length of time or at set time intervals. Updates can be performed synchronously to the nearest copy, and the updates are propagated asynchronously to the other replica sites. Quasi-Copies (Alonso et al., 1990) is an example of an asynchronous replication method. Under Quasi-Copies, the user can request different degrees of consistency by specifying inconsistency constraints such as time delay. The system then propagates updates in a manner that maintains the specified degree of consistency.
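The quorum-consensus idea mentioned above can be sketched compactly (Python; the replica layout, version counters, and quorum selection are deliberate simplifications): read and write quorums are sized so that any two overlap, which is what allows a reader to always see the most recently written version.

    def check_quorums(n, r, w):
        # r + w > n makes every read quorum overlap every write quorum; 2w > n orders writes.
        assert r + w > n and 2 * w > n, "quorum sizes do not overlap"

    def read_item(replicas, r):
        # replicas: list of (version, value) pairs; contact any r of them.
        quorum = replicas[:r]
        return max(quorum, key=lambda pair: pair[0])[1]      # value carrying the highest version

    def write_item(replicas, w, new_value):
        # Install the new value, under a larger version number, at w replicas.
        new_version = max(version for (version, _) in replicas) + 1
        for i in range(w):
            replicas[i] = (new_version, new_value)

    # With n = 3 replicas and r = w = 2, any read quorum includes at least one
    # replica touched by the last write:
    replicas = [(0, "a"), (0, "a"), (0, "a")]
    check_quorums(3, 2, 2)
    write_item(replicas, 2, "b")
    assert read_item(replicas, 2) == "b"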
3.4 Machine Heterogeneity Parameter (Px, Mn, Hn, D1, S1)

3.4.1 Overview and Definitions
In discussing the machine parameter we have assumed a homogeneous environment in which the machines share identical hardware, operating systems, and database management systems. Even with the homogeneity assumption, we showed that moving from an M1 to an Mn system configuration introduces considerable difficulty in building a transaction processing system. The underlying problem is really one of coordination: the system must coordinate the set of machines so that they behave as a single machine with respect to the constraints that TP systems place on processes. This section relaxes the assumption of a homogeneous system, and examines the effect of the heterogeneous components on a TP system. We term this a heterogeneous system. We introduce a distinct H dimension because the machine dimension, M, does not specify anything about the homogeneity of the TP system’s components. The machines can “differ” at a number of levels, including the hardware level (computer architecture), the operating system level, and the database system level. At each of these levels the following general problem must be solved: heterogeneous components must somehow be integrated so that users are not aware of the underlying heterogeneity. There are essentially two possible approaches to this general problem: direct and indirect mapping, each with associated costs and benefits. The direct method maps from the source representation into the destination representation. For M sources and N destinations, this requires (M * N) translation routines. The indirect method proceeds in two steps. First, it translates every source representation into some standard representation. Then it translates the standard representation into a destination representation. For M sources and N destinations, this gives (M + N) translation routines. We observe that much of system heterogeneity is in practice handled (at each level of the hierarchy) via an indirect mapping of the set of lower-level instances to an abstraction of those instances. What are the possible sources of heterogeneity in a multimachine environment? One source is system hardware, in which machine
registers and interconnections can differ considerably. The solution to this heterogeneity is through an operating system that presents users with a “virtual machine” that is easier to program than the underlying hardware. Thus, with the appropriate system software, heterogeneous hardware can be presented to the user as a homogeneous virtual machine. Another source of heterogeneity comes from different computer architectures that may represent data in different ways. The data interchange problem has many solutions, including a standard conversion format such as XDR (external data representation), used by Sun Microsystems’ Network File System. Different communication protocols are another example of heterogeneity; solutions such as specialized gateways between the different networks have been developed for protocol conversion. These lower levels of heterogeneity present serious interoperability problems, but they can be handled with “local” solutions that work without requiring communication with other parts of the network. In the same way, higher system levels such as database applications use indirect mappings to deal with heterogeneity. Since commercial database products are built on top of existing operating systems, researchers tend to ignore the problem of heterogeneous hardware with the assumption that operating systems have solved that problem. Although many operating systems exist, database users can afford to ignore this heterogeneity because of the DBMS software. A DBMS is a layer of software designed to present users with the same “type” of database regardless of the underlying operating system. For instance, commercial database products are designed so that the same product can run on different types of PCs as well as on different networks. The DBMS, in other words, is a layer of software that maps various database functions into heterogeneous operating systems. Moving up a level, we encounter the problem of heterogeneous DBMSs. The dimension of heterogeneity introduces many problems for databases: they must now deal with different query languages, data models, and data schema. Note that, although database heterogeneity problems are considerable, they are independent of those of heterogeneous TP (i.e., Hn), essentially because the database layer must first solve its own problems if any useful work is to be done with its heterogeneous components. Once the database layer has addressed these problems (e.g., through a single canonical global model, or through individual mappings between each of the different components), the transaction layer is simply faced with a homogeneous interface. The problems of database heterogeneity have been described by a paper introducing a general model of federated databases (Sheth and Larson, 1990). Because the above mappings are well understood, we do not further discuss the issue of heterogeneity in the non-TP aspects of the system. In contrast, the mappings required to implement the critical transaction constraints are essentially open problems. Transaction processing is faced with the task of dealing with different commit protocols (for effective crash recovery) and different
concurrency control methods. We refer to this as algorithmic heterogeneity, and it differs from other sources of system heterogeneity because it involves a global component that must coordinate activity among the sites’ CC and CR modules. Section 3.3 discussed how machines can coordinate their TP activities: an Hn configuration implies that this coordination information must now be mapped from one form into another. In the multimachine, single-process TP system (P1, Mn), the commit protocols of each machine must be mapped; in the multiprocess version (Pn, Mn) the serializability mechanisms of the machines must also be mapped. We now examine the effect of moving from an H1 to an Hn configuration.
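The (M * N) versus (M + N) contrast can be made concrete with a toy sketch (Python; the formats and conversion functions are purely hypothetical): every translation is routed through one canonical representation, so each source and each destination contributes only a single converter.

    def make_indirect_translator(to_canonical, from_canonical):
        # to_canonical: source format -> canonical form; from_canonical: canonical form -> destination
        def translate(value, source, destination):
            return from_canonical[destination](to_canonical[source](value))
        return translate

    # Hypothetical example: three date formats routed through ISO-style strings.
    to_iso = {
        "us":  lambda d: "%s-%s-%s" % (d[6:], d[0:2], d[3:5]),   # MM/DD/YYYY
        "eu":  lambda d: "%s-%s-%s" % (d[6:], d[3:5], d[0:2]),   # DD/MM/YYYY
        "iso": lambda d: d,
    }
    from_iso = {
        "us":  lambda d: "%s/%s/%s" % (d[5:7], d[8:], d[:4]),
        "eu":  lambda d: "%s/%s/%s" % (d[8:], d[5:7], d[:4]),
        "iso": lambda d: d,
    }
    translate = make_indirect_translator(to_iso, from_iso)
    # translate("25/12/1995", "eu", "us") == "12/25/1995"; adding a fourth format
    # requires two new converters rather than one per existing format.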
3.4.2 Machine Heterogeneity and Autonomy

As TP systems increasingly attempt to incorporate heterogeneous components,
questions arise about the relationship between heterogeneity and autonomy. Heterogeneity can be seen as a motivation for autonomy since most databases do not want (and therefore do not authorize) their systems to be greatly modified. As a result, it is tempting to implement heterogeneous database systems in a manner that treats the component systems as independent. Global integration must then be done through complex layers of software that maintain component independence while still implementing global TP properties. Because this task is so difficult, naive implementations of “heterogeneous” systems, in practice, considerably modify the system components, thus effectively “homogenizing” the overall system. In this view, heterogeneous TP systems are identical to autonomous TP systems since true heterogeneity implies autonomy. In our view, however, the above reasoning may in fact lead to an opposite conclusion: namely, that autonomy is a motivation for system heterogeneity. In a desire to keep other people from taking over their data, a suborganization may opt to allow or encourage various degrees of heterogeneity precisely so that their system cannot be easily integrated with other systems. Heterogeneity is thus a technical issue whose origin may or may not be connected with political issues. We therefore consider the dimension of autonomy to be independent of heterogeneity or other intrasystem barriers. We isolate the heterogeneity dimension from the autonomy dimension by examining the former in the context of a cooperative environment. In a cooperative environment, component databases are willing to exchange any necessary control information, just as in a classic distributed (Mn) system. The main issue then becomes the need to handle different algorithms for CC and CR. In the remainder of Section 3.4, we focus our attention on this algorithmic heterogeneity. Section 3.6 discusses the dimension of autonomy independently of heterogeneity, i.e., the issue of implementing global TP properties in situations where (for whatever reason) system components are unwilling to give free access to their resources. Thus, in this chapter, autonomy is discussed in the context of the Site parameter.
3.4.3 Approaches to Machine Heterogeneity

If a TP system is constructed in “top-down” fashion then, of course, there is no reason for the machines to use different protocols or algorithms. The difficulty arises when the system is constructed in “bottom-up” fashion, and it happens that different concurrency or recovery atomicity implementations were used at the individual machines. At first glance there is but one way to deal with Hn: however difficult the task, construct either a direct or indirect mapping so that machines’ mechanisms can map into one another. Database implementations have often hesitated to do this because the benefit of getting “global” processes with transaction properties is outweighed by the cost of interfering with “local” processes. Gligor and Popescu-Zeletin (1985) point out that there seems to be an inherent tradeoff between redesign effort and functionality in implementing concurrency and recovery atomicity within heterogeneous machines. On the one hand, if the global mechanism leaves local mechanisms unmodified then global concurrency and recovery atomicity will have limited functionality. On the other hand, modifying local mechanisms in order to enforce transaction constraints on global processes would require considerable redesign effort because DBMS systems typically do not provide clean interfaces. Database implementors hesitate to expend this effort because the machines can function “as is” to the extent that local processes can run without any additional effort. Moreover, if the linkage of the machines is not permanent, so that they could revert to being unimachine systems of the (Pn, M1, H1, D1, S1) form, interference with local mechanisms would make leaving the network more difficult. Here again, we encounter the tradeoff between the attractive properties of the TP model and the difficulties of effective implementation. The designers of MULTIBASE (developed in the early 1980s at the Computer Corporation of America) explicitly opted for no redesign of local sites; MULTIBASE, as a consequence, provides no global update or transaction capability whatsoever. At the other end of the continuum, SIRIUS-DELTA (developed at the INRIA research center in the early 1980s) provides reliability and atomicity, but limits heterogeneity even more because the participating DBMSs must all be Codasyl-type systems. Atomicity is implemented with 2PC and two-phase locking; updates are done with a write-all method. Some researchers have tried to break out of the redesign versus functionality tradeoff by claiming that local concurrency and recovery atomicity mechanisms can be concatenated in a fairly simple manner to provide global serializability and atomicity mechanisms. Gligor and Popescu-Zeletin (1985) list five conditions that suffice to allow a global serializability mechanism to be constructed from a concatenation of local serializability mechanisms without loss of functionality. However, they do not offer algorithms for the concatenation. Pu, under a similar
set of assumptions, shows how transactions (i.e., both recovery and concurrency atomicity) can be implemented in an (Mn, Hn) system through hierarchical composition across database boundaries (Pu, 1989). If local sites (1) understand some agreement protocol, such as 2PC (necessary for recovery atomicity), and (2) can provide an explicit serial ordering of their local processes (necessary for serializability), then consistent and reliable global updates can be done even in an Hn environment. Pu shows that a sufficient, but not necessary, condition for global serializability (given local serializability) is that the global component certify that all local serial orders are compatible in a global serial order. Moreover, certification can be easily provided by the local sites themselves in a representation (termed an “order-element”) that can be produced by two-phase locking, timestamp, and optimistic serializability methods. Pu’s assumptions differ from those of Gligor and Popescu-Zeletin in that “superdatabases” (hierarchically composed databases) certify after process execution (an optimistic approach); they specify that local serializability must preserve the relative order of execution determined by the global component (a pessimistic approach). Also, Gligor and Popescu-Zeletin specify a condition that is concerned with global deadlock detection; Pu does not solve this problem, but observes that the time-out mechanism is used so much in distributed systems that it can be used to break deadlocks as well.
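The certification idea can be illustrated with a sketch (Python; this is a simplified illustration of the general approach, not Pu's actual order-element algorithm): each site reports the serial order in which it executed the global transactions, and the global component accepts the execution only if the union of these local precedence constraints is acyclic, i.e., compatible with a single global serial order.

    def compatible(local_orders):
        # local_orders: one list per site giving that site's serial order of global transactions
        edges = set()
        for order in local_orders:
            for earlier, later in zip(order, order[1:]):
                edges.add((earlier, later))
        nodes = {t for order in local_orders for t in order}
        adj = {t: [v for (u, v) in edges if u == t] for t in nodes}
        visiting, done = set(), set()
        def has_cycle(u):                  # cycle check over the combined precedence relation
            visiting.add(u)
            for v in adj[u]:
                if v in visiting or (v not in done and has_cycle(v)):
                    return True
            visiting.discard(u)
            done.add(u)
            return False
        return not any(t not in done and has_cycle(t) for t in nodes)

    # compatible([["T1", "T2"], ["T1", "T2"]]) is True: both sites agree on T1 before T2.
    # compatible([["T1", "T2"], ["T2", "T1"]]) is False: no global serial order exists.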
3.5 Data Parameter (Px, Mx, Hx, Dn, S1)
3.5.1 Overview and Definitions

This section examines the effect of changing a system configuration from D1 to Dn, i.e., the TP system supports more than one data schema. We refer to the latter system as a multischema system. This may happen in one of two ways. In the first, there is only one logical view of the world, but more than one coupled mechanism exists to map logical data to actual data. In other words, the system supports inconsistent values. A second instance of multiple data schema exists when the supported schema are actually different at the logical level. By supporting data heterogeneity, a (Px, Mx, Hx, Dn, S1) configuration supports a richer and more realistic model of the world. People do have inconsistent views and values about the world, and forcing everyone to conform to a single schema is very limiting. In addition, a multiplicity of views enables the system to incorporate greater security. Often one does not want all users to have complete knowledge of even the structure of the system’s data; if only one schema is available it is harder to isolate such knowledge. When the site can support more than a single view, information can be made available to users on a “need to know” basis in which a given process knows only about a given schema, whereas other processes have access to different schema.
Note that, in our view, the Data dimension is completely independent of the other TP dimensions. The task of constructing and maintaining schema is one that is faced by database systems in general; solutions developed for databases extend to the more constrained transaction systems as well. For this reason, this discussion involves techniques developed in the context of database (as opposed to TP) systems. In discussing the Data dimension it is important to note that “differences” do not have equal importance. The difference between logical schema lies on a continuum, bounded by syntactic heterogeneity (the least difference) on the one hand, and semantic heterogeneity (the greatest difference) on the other. The difference can be “quantified” by the degree of effort required to resolve the conflicts. A D1 system must resolve all data heterogeneity, whereas a Dn system supports a greater or lesser degree of heterogeneous data. The salient point of syntactic heterogeneity is that everyone readily understands that a single underlying concept is involved: the problem is merely that this concept is not expressed with a common language. In a transaction processing system this situation arises trivially when either the interfaces to the logical data differ (i.e., the data model language), or (less trivially) when the data on the different machines are logically organized in more than one data model (e.g., network, relational, hierarchical, and entity-relationship). Naming conflicts are a second source of syntactic problems. These include synonyms (the same data entity is given different names by different components) and homonyms (the same name is used by different components for different data entities). Even if data on the machines is organized with the same data model, and schema interfaces are identical, there can be both subtle and major disagreements between the schemas as to the nature of the world which they model. As Batini et al. (1986) point out, two types of semantic diversity exist: intraschema diversity and interschema diversity. Intraschema diversity refers to differences among the component schemas with respect to common concepts. For example, a dependency conflict arises when one schema models the manager/employee relation as 1:n, whereas another schema models the relation as m:n (because an employee may work for two managers in the latter organization). Such conflicts arise because different choices were made in selecting modeling constructs or in fixing integrity constraints. Interschema diversity refers to relationships among distinct concepts in the component schemas. For example, if one schema models managers and another models people, then closer inspection of the schemas may show that the “manager” concept is a subset of the “people” concept. As a result, gender information, not part of the original manager schema, may now need to be included in the schema if the manager-person “subset” relationship is made use of in other contexts. Navathe and Gadgil (1982) identify other such relationships such as categorization and partitioning. Intraschema differences are more syntactic
than interschema differences, because the latter cannot be perceived simply by comparing the way that concepts common to the schemas are modeled. Such differences can only be perceived after much effort, by considering all of the underlying schema assumptions, and these are rarely made explicit. Some types of data model heterogeneity lie on the border between syntactic and semantic problems. They are not purely syntactic, because the differences involve a different conception of external reality, but they are not really semantic because people would readily agree that the systems are modeling the same phenomena. For example, the system may incorporate data representation conflicts (e.g., are pay raises integer or real-valued quantities?). In a sense, when the mapping mechanisms differ so that the site supports schema with inconsistent values, the “difference” between the schema also lies in the middle of the syntactic/semantic data heterogeneity continuum.
3.5.2 Multiple Schema and Data Heterogeneity

A Dn TP system does not necessarily imply that the system supports heterogeneous data. For example, if schemas model disjoint parts of a user’s universe, the system incorporates a Dn component. Note, though, that this instance of n-cardinality is simply a function of schema granularity. Since there is nothing to resolve between disjoint schemas, the system is in fact supporting a single schema that is the trivial union of the individual schema. Even the case of different virtual schema is in a sense that of D1. We refer here to systems that allow the creation of schemas that do not map directly onto the underlying physical data. Instead, the virtual schema is composed of various actual schema supported by the system. However, since all of the virtual schemas are ultimately mapped into the single, underlying, system schema, the “syntactic” difference between views is insignificant compared to the equivalence/disjunction relationship holding among the schemas. The critical point here is that when users cannot categorically state that a relation of equivalence (true/false) exists among data, then the data is heterogeneous. It follows, therefore, that a multischema system has the potential of supporting heterogeneous data. In fact, we can define heterogeneous data as data whose maintenance requires multiple schema for support: that is, data that cannot be expressed through a single schema. The syntactic/semantic continuum discussed above in a sense reflects the extent to which schemas cannot be resolved to a single schema without loss of information.
3.5.3 Single versus Multiple Schema

Supporting more than one data schema requires more effort than the mechanisms discussed in the single-schema quadruples. However, this is essentially a “linear” effect in that the work of maintaining the data mapping, coordination
between the site manager and users, etc. is simply multiplied by the number of schema. Note that in the transition from an (M1, Dn) to an (Mn, Dn) configuration (where the actual data reside on a number of machines), the only new problem is maintaining consistency of schema across a network of machines. Automatic propagation of updates, from operations that are expressed in terms of views, to the underlying actual data is, however, a nontrivial problem. The mapping mechanisms needed for one schema over a number of machines have already been discussed in Section 3.3, and we have just mentioned the mechanisms needed for the support of multiple schemas. The composition of these mechanisms does not pose intrinsic difficulty; e.g., work on MRDSM (developed at INRIA in the mid-1980s) did not report special problems in creating multiple schema over several machines.

In order to appreciate the benefits of a multischema system, it is worth analyzing an alternative system: a D1 system created from n distinct schema. This situation arises when an Mn system is constructed by linking n unimachine TP systems. In linking the machines together, system designers may well decide to construct, in bottom-up fashion, a single schema for the new system out of the original n local schema. Construction of the schema (often termed the global schema) involves resolution of syntactic and semantic data heterogeneity, definitely a nontrivial task. At a minimum, the information required is (1) the function mapping the syntax of one machine’s data to another machine’s, and (2) the function mapping the semantics of one machine’s data model to another. Here we discuss some of the required effort, and note that a multischema system avoids this set of problems entirely. Both research and implemented systems have followed the indirect mapping method in mapping between the global and local schemas: they differ only in their systems’ canonical model. Typically, this may be the relational model, the entity-relationship model, or a functional model. If two schemas are “equivalent,” but use different data models, then mapping one schema into the other is not overly difficult. However, the notion of “equivalence” is difficult to formalize (Batini et al., 1986). Essentially the notion of equivalence corresponds to what we have termed syntactic conflict; when schemas are not equivalent it is because they conflict semantically. It does not suffice for the system to map local schemas (and queries) to the global schema (and global data language). The system must also perform the reverse mapping: transforming the global schema into local schemas so as to get data from the local nodes. The relational data model is not as expressive as others, and is therefore at a disadvantage when it comes to integrating the component schemas. The flip side of this is that it is easier to do query decomposition and other reverse transformations with the relational model. This is why prototype systems such as MERMAID (developed at System Development
Corporation in the late 1980s), PRECI*, and ADDS (developed at Amoco in the mid-1980s) have adopted the relational approach. MULTIBASE completely redefines the local schemas into the functional data model (using DAPLEX): since these are the same as the global schema and data manipulation language, it is easy to map in both directions. After local schema, residing on the system’s component machines, are resolved into a single schema, the system must ensure that naming conflicts do not exist among the individual schema. Homonyms are much simpler than synonyms, because the system need only compare concepts possessing the same name across component schemas. Of course, the comparison requires human interaction. In any case, the “universal relation schema” assumption is often made: that is, every attribute is unique over all databases, and thus homonyms cannot arise. This can always be implemented by simply preceding every object or attribute name with schema_i for the ith schema. The detection of synonyms is more difficult, because the system cannot even automatically flag possible occurrences for human inspection as it can for homonyms. For example, Batini and Lenzerini’s (1984) integration system automatically assigns a “degree of similarity” (based on various matching criteria) to pairs of objects. The similarity information is then given to users for their decision. Once the system has dealt with syntactic conflicts between the component schemas, it must deal with semantic conflict. It is important to realize that, as Batini et al. (1986) point out, even methodological treatments of this problem do not provide algorithmic specifications for the integration process. Rather, they provide general guidelines, and identify critical steps, for the process. Low emphasis is placed on automating the integration process: user/designer interaction plays a major role to the extent that while a methodology iterates, loop termination is left to the designer’s discretion. When the system detects schema conflicts, the next step is to resolve these conflicts. This requires that schema transformation be performed. Sometimes the schema do not actually contradict each other in their world view, but only express themselves differently. An example of this situation is the “subset” relationship described in Sec. 3.5.1: this relation simply models multiple views about the same class of objects. By noting such interschema relationships, the system can create a single, richer, view that encapsulates the concepts of the original component schemas. In other cases the schema, when unified, contain redundant information. The system can detect, and then eliminate, redundancies such as subsets, derived dependencies, composition of functions, and cycles of relationships. Of course, sometimes no resolution can be done (e.g., the same object has different values in different schemas). In this case, users must decide which schema “counts” and which does not. Not many semantic conflict transformations are explicitly dealt with in the literature. Batini and Lenzerini (1984) do discuss
atomic transformations in which entities, attributes, and relationships are mapped one into the other. This is done until a canonical representation of the schemas is achieved. Unfortunately, such conflicts are more syntactic than semantic in nature precisely because there is a very strong overlap between atomic concepts. The benefits of multiple versus a single system schema involve a tradeoff between effort on the part of system administrators and effort on the part of system users. Managing a single schema is easier (in terms of first defining the schema, and then maintaining its consistency) than managing multiple schemas. On the other hand, users do not want to be forced to “compromise” their world views in favor of a model that may not completely satisfy anyone. We have already noted that certain (chiefly semantic) data modeling differences simply cannot be “resolved” except by making a choice in favor of one model. When this happens users then have to do their own data “filtering” and management in order to keep the data consistent from their individual point of view. Not only does this task run counter to the idea that database systems should remove the burden of data management from users and place it onto database administrators, but it also may be impossible for a user to do the filtering properly. In any case, this tradeoff only exists in the context of schema maintenance. If system designers are concerned with schema creation, they may well opt to build a Dn system in which syntactic and semantic differences are made explicit. When multiple schema are already present in the system, the difficulties inherent in their resolution may well encourage designers to favor the multischema system.
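Two of the naming-conflict aids described above can be sketched briefly (Python; the prefixing convention, similarity threshold, and function names are illustrative assumptions rather than a published algorithm): homonyms are avoided by qualifying every attribute with its schema's name, and candidate synonyms are merely flagged, using a crude string-similarity score, for a human designer to confirm or reject.

    from difflib import SequenceMatcher

    def qualify(schema_name, attributes):
        # "universal relation schema" style renaming: salary -> personnel.salary
        return ["%s.%s" % (schema_name, a) for a in attributes]

    def candidate_synonyms(attrs1, attrs2, threshold=0.7):
        pairs = []
        for a in attrs1:
            for b in attrs2:
                score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
                if score >= threshold:
                    pairs.append((a, b, round(score, 2)))
        return pairs                     # a human designer decides which pairs really match

    # candidate_synonyms(["employee_name", "salary"], ["emp_name", "wage"])
    # might flag ("employee_name", "emp_name") for inspection, but never decides on its own.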
3.6 Site Parameter (Px, Mn, Hn, Dn, Sn)
3.6.1 Overview and Definitions

Two types of architecture are commonly identified in a database system comprised of network-linked machines. The first is a global (or centralized) database system. In such an architecture a global database control and communications system is interposed between users and the local databases. A global data model is used to describe the “overall” meaning of the integrated component schemas: this is often termed the global schema by analogy to local schemas. Such a tightly coupled system is designed to be indistinguishable to a user from a local, component system. The second type of architecture is called a “federated database system” (Heimbigner and McLeod, 1985). In such a system each component database determines what data it will allow other sites to access and, in turn, can access only such nonlocal data as other sites are willing to give it. However, when permission is granted to access nonlocal data, users may access (through a local site) heterogeneous and distributed data with complete transparency. Federated databases are thus a loosely coupled system (Sheth and Larson, 1990).
Database architectures need not be constrained by the number or type of logical schema that are available to processes. The administrator of a centralized, tightly coupled system may choose to allow different classes of users different views of the data so that some, in fact, only see local site data, but others see remote data as well. Similarly, the local administrator of a federated database may consent to import sufficient remote data so that users can construct integrated views from the entire system. Thus, users can continue to see “their” data without regard to the underlying distributed/heterogeneous system architecture. The notion of architecture is concerned with the distribution of authority within the overall system. The question is: Can a TP system incorporate multiple sources of authority (Sn), or must a TP system be constrained to have only one source of authority (S1)? This question is often termed the autonomy issue, which we define as the capability of a (Px, Mx, Hx, Dx) configuration, a site, to remain independent of other sites while still allowing users to benefit from other sites in the system. In contrast to the other TP dimensions discussed here, even the issues posed by the S dimension are only beginning to be seriously addressed. For example, one must distinguish between design and execution autonomy. Design autonomy involves the question of who controls the design and maintenance of data schema. It also involves the integration of different database managers (and TP systems) without modification. These database systems must solve problems such as lack of cooperation (missing commit protocols), and access control. Litwin et al. (1990) discuss data management issues and Breitbart et al. (1990) discuss resistance to change as a motivation for adaptation of old technology. In contrast, execution autonomy refers to the issue of whether local systems can control local transaction execution or whether they must accede to a global transaction coordinator in the scheduling and execution of global transactions. Technical solutions to the problems posed by the S dimension are, in our opinion, insufficiently mature to be discussed in the same manner as we did for the other dimensions. Instead, we shall try to give a sense of the flavor of current work in this area and also discuss how this area may evolve. The material in Sec. 4.5 and Sec. 5 is thus closely related to the ideas discussed here.
3.6.2 Design Autonomy

In an S1, centralized architecture, local sites are completely integrated into the overall global system. The global database manager makes all such design decisions, and cannot allow local sites any independence in these matters. To do otherwise would mean, for example, that although the global schema describes one world state, actual data can reflect a different world state. In such a system, all transactions are treated in the same manner, in the sense that they all originate “inside” the system.
In an Sn configuration, a given site can only be certain of access to data belonging to a schema supported by the site itself. If a transaction needs access to data belonging to a schema supported by another site, then (by the very definition of a site) the site that is running the transaction must “negotiate” with the site owning the data for access to that data. A distinction is thus introduced between local transactions (about which the site knows all relevant information) and remote transactions, where the site must get its information from another site. In such a federated database architecture, if a site has a different world model from another site, it simply refuses to let the foreign data into its system. Similarly, catalog design, naming schemes, and user authorization are done independently in a federated system: in contrast, an integrated system has to make, and enforce, these decisions in all-or-none fashion. A federated system can employ systemwide names that allow local data references to be performed locally while enabling a remote reference to request data from a foreign site. A system such as R* (developed at IBM’s San Jose Research Laboratory in the early 1980s) facilitates design autonomy, because it uses a flexible naming scheme that permits sites to rename relations without coordinating the activity with a central authority. Of course, an Sn configuration allows for a system with a mixed architecture. Several machines can comprise a single site’s machine set; if the site only allows these machines access to certain schema, the machines will resemble a global database system. At the same time, other machines can belong to other sites, and the overall transaction processing system will be a federated system. In the initial design of a TP system, the issue of design autonomy primarily requires evaluation of the political tradeoffs of centralized control versus distributed control. The technical challenges that must be solved to facilitate an Sn system play a secondary role. When a system is put together from a set of existing databases, however, the autonomy and data dimensions interact more closely. In Section 3.5 the differences between a single and multischema system were described, and the difficulties in creating a single global schema from multiple local schema were noted. Choosing between D1 and Dn is typically not possible in an Sn system: each site will insist on the validity of its schema. As a result, putting the system together is easier to the extent that decisions regarding syntactic and semantic consistency can be ignored. Making effective use of such a (Dn, Sn) configuration, however, can be a very difficult task. For example, informing users about what data is available where is nontrivial in an autonomous system. Moreover, such systems do not easily support the concept of replication, since replication implies that data under the control of a given site logically is the same data as that controlled by another site. In order for the second site to benefit from such replication, it must be guaranteed that the other site treats the data in the same way that it does. Such a guarantee is impossible (by definition) because the two sites are autonomous. The most that can happen is that different sites
make "pacts" with one another, a task requiring much coordination. Such pacts are often formalized in export schemas, in which each site defines the part of its database that it is willing to share with the rest of the federated database (Heimbigner and McLeod, 1985; Thomas et al., 1990). We address this issue further in Section 4.5. An important observation is that (from the standpoint of design autonomy) multiple sites are incompatible with a single data schema. Because the sites are committed to a common logical view of the world, and because only one mapping exists from logical to physical data (i.e., a unique and consistent value is returned for every data reference), the intrinsic property of multiple sites, namely local autonomy, is precluded simply because the sites have agreed to use a single schema. Since sites can access all data belonging to their schema, it is impossible for any one site to disallow read or write access to its data, because it is also the other sites' data. Similarly, because integrity constraints are part of the logical schema, it is not possible for one site to allow a value assignment that another site would disallow. For these reasons, systems that combine a D1 component with an Sn component are "philosophically" inconsistent in the sense that the management aims implied by site autonomy cannot be realized while the system is limited to only one schema.
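The following sketch illustrates how export schemas and systemwide names support design autonomy. It is only a minimal illustration: the FederatedSite class, its methods, and the "site.relation" naming convention are invented for this example and are not the actual R* or federated-database interfaces described in the references.

```python
# Hypothetical sketch of design autonomy in a federated (Dn, Sn) system.
# Class and method names are illustrative; they do not correspond to any
# actual federated DBMS API.

class FederatedSite:
    def __init__(self, name, local_relations, exported):
        self.name = name
        self.local_relations = dict(local_relations)  # relation name -> rows
        self.exported = set(exported)                 # export schema: what this site shares
        self.peers = {}                               # site name -> FederatedSite

    def register_peer(self, peer):
        self.peers[peer.name] = peer

    def rename(self, old, new):
        # Design autonomy: renaming needs no coordination with a central
        # authority, because systemwide names are qualified by the owning site.
        self.local_relations[new] = self.local_relations.pop(old)
        if old in self.exported:
            self.exported.discard(old)
            self.exported.add(new)

    def read(self, systemwide_name):
        # Systemwide names have the form "site.relation".
        site, relation = systemwide_name.split(".")
        if site == self.name:
            return self.local_relations[relation]      # local reference, served locally
        peer = self.peers[site]
        if relation not in peer.exported:              # the remote site keeps full control
            raise PermissionError(f"{site} does not export {relation}")
        return peer.local_relations[relation]


bank = FederatedSite("bank", {"accounts": [("alice", 100)]}, exported={"accounts"})
shop = FederatedSite("shop", {"orders": []}, exported=set())
bank.register_peer(shop)
shop.register_peer(bank)
print(shop.read("bank.accounts"))   # negotiated remote access via the export schema
```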
3.6.3 Execution Autonomy

Although design autonomy poses many difficult problems, those problems involve autonomous databases and have no intrinsic connection to autonomous TP. In the context of execution autonomy, local sites are allowed to control local transaction execution without being forced to accede to a global transaction coordinator in the scheduling and execution of transactions. Execution autonomy in effect calls the entire notion of a TP system into question, because participating sites are no longer willing to sacrifice independence simply in order to implement a globally serializable transaction schedule. In contrast to the many political and administrative issues posed by autonomy in general, the issue addressed here is whether mechanisms can be devised that support autonomous TP. The key assumption in the discussion of techniques that provide global recovery and concurrency atomicity (Section 3.4) is that system components want to cooperate in providing global transaction properties. People, however, are beginning to call this assumption into question, claiming that components may not wish to cooperate, or rather, that they will not cooperate if global constraints interfere with local system performance. For example, in the commit phase of the two-phase commit protocol, a local component may wish to commit a transaction even though the global coordinator decides to abort the transaction. Performance penalties also arise in large and highly replicated TP systems. Standard notions of transaction are synchronous in nature, so that all replicas of an object
must be updated together in order to maintain consistency. Sites may lose too much availability if they must coordinate updates with many other sites in the system. As a result, the cost of cooperation is sufficiently great that sites will not willingly cooperate in the usual transaction algorithms. Because of such concerns, much ongoing effort is devoted to providing global transaction capabilities while at the same time allowing local systems to maintain individual autonomy (Breitbart et al., 1990; Elmagarmid and Helal, 1988). In order to properly evaluate autonomy mechanisms, it is necessary to first define the criteria that execution autonomy must satisfy. Much argument in this area can be traced to the fact that people have different ideas as to what the notion of autonomous TP is. We believe that a TP system is "autonomous" if sites may cooperate but are not forced to cooperate. Component sites will cooperate if (1) the political will is there and (2) technical means of cooperation are well understood, so that the price of cooperation is "low." From this point of view, the context of a specific autonomy issue is very important. Take, for example, the issue of assumptions about the characteristics of component sites. Some algorithms assume that local systems each have some kind of commit component (Sec. 3.4). Other people may consider even this "minimal" requirement to be a violation of autonomy, because a component with centralized system characteristics would have to be modified in order to participate in the global system. Thus, the Superdatabase requirement that participating sites be "compossible" can certainly be viewed as a violation of autonomy. However, since the actual effort expended in modifying University INGRES for Supernova was quite small, sites might well cooperate to satisfy this requirement, since the cost/benefit ratio is low. From this viewpoint, the Superdatabase requirement does not violate autonomy. This approach explains why the traditional two-phase commit algorithm used in homogeneous distributed systems such as R* is generally not viewed as a violation of the autonomy of component sites. Much experience has accumulated about efficient implementation of the two-phase commit algorithm, so that relatively homogeneous sites do not pay too great a cost. Because the benefit from the algorithm is so great (it enables TP in distributed systems), the restriction implicitly placed on feasible transactions is accepted by component sites. New releases of commercial distributed database systems such as INGRES, Oracle, and DB2 support 2PC. Autonomy mechanisms must therefore be devised in contexts where (for reasons mentioned above) users are not willing to incur the penalties associated with traditional implementations of the transaction model.
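As an illustration of the "minimal commit component" idea, the sketch below shows the kind of interface a site might be asked to expose in order to join a global commit protocol, and how an autonomous site could decline to cooperate when its expected local cost is too high. The interface, class names, and cost policy are assumptions made for this example only; they do not describe the Superdatabase or any other cited system.

```python
# Hypothetical sketch of the "minimal commit component" a site might expose
# in order to participate in global transactions. The interface below is an
# illustrative assumption, not the API of any real TP monitor or DBMS.

from abc import ABC, abstractmethod

class CommitParticipant(ABC):
    """What a coordinator asks of a site; exposing even this much can be
    viewed as a (small) loss of execution autonomy."""

    @abstractmethod
    def prepare(self, txn_id) -> bool:
        """Vote on whether the local work for txn_id can be committed."""

    @abstractmethod
    def commit(self, txn_id) -> None: ...

    @abstractmethod
    def abort(self, txn_id) -> None: ...

class AutonomousSite(CommitParticipant):
    """A site that cooperates only when the price of cooperation is low."""

    def __init__(self, max_blocking_cost):
        self.max_blocking_cost = max_blocking_cost

    def prepare(self, txn_id) -> bool:
        # An autonomous site may refuse to enter the prepared (blocking) state
        # when holding locks for the coordinator would hurt local performance.
        return self.estimate_blocking_cost(txn_id) <= self.max_blocking_cost

    def commit(self, txn_id) -> None:
        print(f"{txn_id}: committed locally")

    def abort(self, txn_id) -> None:
        print(f"{txn_id}: rolled back locally")

    def estimate_blocking_cost(self, txn_id) -> float:
        return 0.1   # placeholder for a purely local policy decision

site = AutonomousSite(max_blocking_cost=1.0)
print(site.prepare("txn-42"))   # True: this site is willing to cooperate
```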
4. Systems Evolution

In this section we discuss the way that TP systems have evolved in terms of the system dimensions that we have identified: namely, the Process (or P),
Machine (or M), Heterogeneity (or H), Data (or D), and Site (or S) components. The overall configuration of a transaction processing system incorporates a specific value for each of the five components. The processes execute on a set of machines (with a certain degree of heterogeneity between the machines) in a way that maintains concurrency and recovery atomicity. Physical data resides on the machines; sites are responsible for (1) maintaining the mapping from requests for logical data to this physical data and (2) maintaining certain constraints (i.e., properties) on the set of executing processes. For database systems to be useful as TP systems, more than one process must be able to run concurrently (i.e., the cardinality of P must be greater than 1). In the following discussion we will therefore ignore the process parameter and assume that all systems have a Pn component. Also, as noted in Section 3.6.3, execution autonomy (allowing site autonomy in a global transaction processing system) is a very difficult problem that has only recently begun to be researched intensively. The S dimension is therefore discussed only in the context of design autonomy. In describing this evolution we shall picture a given transaction processing system as a point in a three-dimensional space whose axes represent three of the four remaining dimensions: the D, M, H, and S dimensions. The D axis is marked with Single Schema, Views, and Multischema, and represents the evolution of data heterogeneity. The M axis is marked with Unimachine and Multimachine, and represents the evolution of the machine parameter. The S axis is simply marked with Autonomy and No Autonomy, and the H axis is marked Homogeneous and Heterogeneous (see Figs. 3 and 4). The origin in Figs. 3 and 4 thus indicates the point (Pn, M1, H1, D1, S1), and represents a "base" system that has "simple" properties. We first discuss the base system, and then trace system evolution along the various dimensions in terms of actual transaction processing system examples. The systems described here are presented in Table 1, which lists their values for each of the five system characteristics. More details about some of these systems can be found in the references cited by Ceri and Pelagatti (1984). We have not discussed commercial transaction processing systems because the specific aspects of these systems are either not readily available or too detailed for this chapter. The interested reader can find information about some commercial systems in a recent survey on heterogeneous distributed database systems for production use (Thomas et al., 1990). Finally, we discuss the "open" areas of the graph that represent systems with characteristics that pose particularly difficult problems, problems that are only now beginning to be addressed.
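The five-dimensional classification can be written down compactly; the sketch below is an illustrative convenience only. The class is invented, and the example placements are inferred from the discussion in Sections 4.1 through 4.5 rather than copied from Table 1.

```python
# A sketch of the five-dimensional classification used in this section.
# The class and the string encodings are illustrative assumptions.

from dataclasses import dataclass

@dataclass(frozen=True)
class TPConfiguration:
    processes: str      # P dimension: "single" or "multiple"
    machines: str       # M dimension: "unimachine" or "multimachine"
    heterogeneity: str  # H dimension: "homogeneous" or "heterogeneous"
    data: str           # D dimension: "single schema", "views", or "multischema"
    sites: str          # S dimension: "no autonomy" or "autonomy"

# The "base" system at the origin of Figs. 3 and 4, i.e., (Pn, M1, H1, D1, S1):
base = TPConfiguration("multiple", "unimachine", "homogeneous", "single schema", "no autonomy")

# Approximate placements of a few systems, inferred from the discussion below:
system_r = TPConfiguration("multiple", "unimachine", "homogeneous", "views", "no autonomy")
r_star   = TPConfiguration("multiple", "multimachine", "homogeneous", "views", "autonomy")
mrdsm    = TPConfiguration("multiple", "multimachine", "homogeneous", "multischema", "autonomy")

print(r_star)
```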
4.1 "Base" System
The origin represents transaction processing systems that run on one machine and support only one view of data. Single-processor systems are the natural building block for the base system. What is especially interesting about this system is that users are presented with a single view that represents the "union" (in fact, the natural join) of the underlying relations. Although virtual views (discussed in Sec. 4.2) are a useful mechanism for simplifying interaction with system data, updating databases through views poses certain problems. In order for users to modify the underlying data (update/insertion/deletion) via views, the modification must first be expressed in terms of the actual schema and data. Null values must often be created by the system, thus introducing complications in later database access. Although these difficulties are surmountable, a base system will naturally support only a single view. An example of such a system is System/U (developed at Stanford University in the early 1980s), which uses the universal relation model. Note that, even when the system supports only a single schema, a fragmentation mechanism can still partition the overall schema into disjoint components, thus enabling greater security.

TABLE I. Differences among the strategies: the Process, Machine, Heterogeneity, Data, and Site values of the systems discussed in this section (System R, Ingres, SDD-1, POREL, Distributed INGRES, R*, SIRIUS-DELTA, PRECI*, NDMS, and MRDSM).
4.2 Evolution along the Data Dimension

FIG. 3. System evolution along the machine, data, and autonomy axes.

Examining Fig. 3, we move from the base system along the D axis while maintaining the unimachine value along the M axis. Because of the way unimachines are usually operated, the S value is no autonomy. Although a single schema and data mapping are the easiest data components to support, they are less useful as more users access the system, or when system data is created or maintained by multiple sources. In these situations a single view is an inaccurate model of the multiple views that users have of the external world. Thus, as systems evolve, they tend to support the ability to define virtual views. Such views are produced as a result relation from one or more operand relations that are operated on by
a select statement. Virtual views enable users to tailor their approach to the system's data to their own satisfaction. Mappings between actual data and the logical schema interface are, of course, more complex. The use and definition of such views (there termed simply "views") was first described in the structured query language (SQL); unimachine systems supporting this degree of data heterogeneity include System R (developed at IBM's San Jose Research Laboratory in the late 1970s) and INGRES (developed at the University of California at Berkeley in the early 1970s). These systems lie further along the D axis in Fig. 3, as we hold the M value constant. Although multiple schema can be maintained by a single site, in practice systems that support multiple schema have done so in the context of maintaining autonomy among multiple machines. As a result, we do not place any points further along the Data axis.
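A small sketch of a virtual view as a derived (result) relation computed on demand from operand relations. The relations and attribute names are invented; the list comprehension stands in for the select statement, and the closing comment points back at the update-through-view difficulty mentioned for the base system.

```python
# A minimal sketch of a virtual view: the view is not stored, but computed
# from operand relations on demand. Relation and column names are invented.

employees = [
    {"emp_id": 1, "name": "alice", "dept_id": 10, "salary": 120_000},
    {"emp_id": 2, "name": "bob",   "dept_id": 20, "salary": 90_000},
]
departments = [
    {"dept_id": 10, "dept_name": "research"},
    {"dept_id": 20, "dept_name": "sales"},
]

def high_paid_view():
    """Result relation of a select/join over the operand relations."""
    return [
        {"name": e["name"], "dept_name": d["dept_name"]}
        for e in employees
        for d in departments
        if e["dept_id"] == d["dept_id"] and e["salary"] > 100_000
    ]

print(high_paid_view())   # reads through the view are straightforward

# Updating through the view is harder: an insert of {"name": ..., "dept_name": ...}
# must be translated back into rows of the base relations, and missing attributes
# (emp_id, salary) would have to be filled with nulls or defaults, which is the
# complication noted in the text.
```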
4.3 Evolution along the Machine Dimension

Here we move along the M axis (from unimachine to multimachine) while maintaining the D value at single schema. Transaction processing systems running on one machine evolve naturally into multimachine systems because of a need for increased performance, including better response time, reliability, and availability. The difficulty is that mechanisms (e.g., logging), protocols (e.g., 2PC), and mappings (e.g., data definition, replication, and fragmentation) must somehow be coordinated among the machines in such a way that the atomicity, serializability, and data integrity constraints are maintained for global processes. The basic problem of coordination is difficult enough that the additional mapping problems introduced by heterogeneity are avoided as much as possible. Thus the "bare bones" multimachine system presents a single logical schema to the user; processes are mapped onto the set of more-or-less identical machines by a coordinator that has access to the necessary global information. Examples of such systems include SDD-1 (a prototype system developed in the early 1980s at the Computer Corporation of America) and POREL (developed at the University of Stuttgart in the early 1980s). Holding the M value constant, we change the D value to views, and find systems that support user-defined views, in addition to the critical property of location transparency, which allows users to treat the system (with respect to data) essentially as a unimachine system. The critical issue here is the coordination of data management so that critical information related to naming, integrity constraints, and value consistency is properly distributed among machines. Quite different approaches have been proposed; often the adopted approach depends critically on the value of the S dimension. Systems such as SDD-1 and Distributed INGRES (developed at the University of California at Berkeley in the late 1970s) require that all information, on all machines, be available in managing the data. In other words, they do not allow individual sites any autonomy: approaches such as centralized catalogs are therefore possible. Moving up the S axis in Fig. 3 to the value of autonomy, we see systems such as R* (developed at IBM's San Jose Research Laboratory in the early 1980s) that support autonomy to the extent that it is possible for each site to modify all of the previously mentioned data information without accessing any information stored at other sites.
4.4 Evolution along the Machine Heterogeneity Dimension

FIG. 4. System evolution along the heterogeneity, data, and autonomy axes.

As database systems continue to evolve, they no longer insist that the underlying local databases be similar. Systems are often constructed from existing components (Hn) that can differ widely in the DBMS managing individual systems. In Fig. 4 we replace the M axis with the H axis, and examine systems running on multiple machines. Various approaches to the problem of integrating the systems (to the extent of maintaining transaction constraints on global processes) were discussed in Section 3.4. Although the approach of composing local system transaction functionality into global functionality may be implemented in later systems, the solution in current systems (such as heterogeneous SIRIUS-DELTA, developed at INRIA in the early 1980s) is to define a set of common functions that each participating DBMS in the overall system must provide. This can obviously involve considerable redesign of software, and in a sense solves the problem of heterogeneity by insisting on homogeneity. Such a system is nevertheless further evolved than "truly" homogeneous machines, but further evolution along the machine heterogeneity dimension may make such solutions unviable. The introduction of an increased degree of machine heterogeneity to the system is orthogonal to the issue of the nature of the data component because, once the
data distribution mechanisms are in place, they are high-level enough to be unaffected by a change in DBMS. In fact, the multilevel schema architecture of heterogeneous SIRIUS-DELTA is unchanged from that of homogeneous SIRIUS-DELTA (developed at INRIA in the late 1970s). Of course, lower-level heterogeneity (e.g., at the operating system level) will complicate the mapping mechanisms needed to handle data management, unless a homogeneous interface is presented by the local DBMS. SIRIUS-DELTA has a heterogeneous value on the H axis, and the views value on the D axis. Because machine heterogeneity is a relatively recent feature of actual systems, such systems skip over the stage of single schema, and instead incorporate more sophisticated data components, such as virtual views. The counterpart of the tendency of system evolution in the M dimension to incorporate existing heterogeneous machines exists in the D dimension as well. Real systems are increasingly constructed "bottom-up" simply because heterogeneous components are already in use and represent too much investment to be simply replaced with a new system. Also, heterogeneous components can give increased functionality to the overall system, precisely because of the specialization of the components. In order to engineer a bottom-up approach (that can compose individual pieces into a larger system) with heterogeneous data, techniques must be developed to incorporate local schema and data, which are almost always heterogeneous, in a systematic fashion. In Section 3.4 we observed that machine heterogeneity in implemented systems tends to be solved by imposing a homogeneous interface. Similarly, systems that are composed of heterogeneous data components map the local data models to a common interface, a process that involves conversion of the local data schema. In other words, data heterogeneity is resolved with techniques described in Sec. 3.4, and a single global schema is presented to the user. Because the data resolution mappings are complex, we have placed systems such as PRECI* and NDMS further to the
right on the D axis than SIRIUS-DELTA in Fig. 4. Because these systems do resolve data heterogeneity into a single, common interface (the canonical and relational data models, respectively), they do not have the multiple schema D value.
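The "common functions" approach can be pictured as a thin adapter layer: each participating DBMS, whatever its native data model, is wrapped so that it exposes the same small interface, and the global layer talks only to that interface. The sketch below is an assumption-laden illustration (the interface and the two local data models are invented); it is not the actual SIRIUS-DELTA, PRECI*, or NDMS design.

```python
# Sketch of resolving heterogeneity by requiring a common interface from every
# participating DBMS. The adapters below are invented for illustration.

class CommonDBMSInterface:
    """The set of common functions every participant must provide."""
    def get(self, relation, key): raise NotImplementedError
    def put(self, relation, key, value): raise NotImplementedError

class RelationalAdapter(CommonDBMSInterface):
    def __init__(self):
        self.tables = {"accounts": {}}          # native model: tables of rows
    def get(self, relation, key):
        return self.tables[relation].get(key)
    def put(self, relation, key, value):
        self.tables[relation][key] = value

class KeyValueAdapter(CommonDBMSInterface):
    def __init__(self):
        self.store = {}                         # native model: flat key-value pairs
    def get(self, relation, key):
        return self.store.get(f"{relation}/{key}")
    def put(self, relation, key, value):
        self.store[f"{relation}/{key}"] = value

class GlobalLayer:
    """Sees only the common interface; presents one schema to the user."""
    def __init__(self, participants):
        self.participants = participants
    def write_everywhere(self, relation, key, value):
        for p in self.participants:
            p.put(relation, key, value)

g = GlobalLayer([RelationalAdapter(), KeyValueAdapter()])
g.write_everywhere("accounts", "alice", 100)
```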
4.5 Evolution along the Site Dimension

Some systems have the multischema value on the D axis, and require no conversion of the local schema. For example, MRDSM (developed at INRIA in the mid-1980s) allows users to create individual schema composed of elements from local database schemas. Systems that support multischemas are motivated partly by the desire to maintain local autonomy. As discussed in Sec. 3.6, system designers are essentially faced with a choice between either integrating component schema into a single global schema or leaving the local schema untouched and somehow obtaining global functionality. As a result, the Data parameter interacts here with the Site parameter, because if the system is to allow site autonomy, system designers cannot interfere with local schema. MRDSM's notion of allowing users to customize their individual schema from any schema in the system thus has obvious advantages. In constructing these schema, users must at the least (1) identify and resolve syntactic heterogeneity, (2) identify and resolve semantic heterogeneity, and then (3) actually build and maintain the customized schema. Each of these steps is a formidable task: how does MRDSM implement multiple schemas? In MRDSM, the component databases all run the same relational MRDS DBMS, so that only semantic heterogeneity must be dealt with. When users create their integrated global schemas, they must also create "dependency schemas" that contain the information relating component databases to each other. Such a process is obviously a delicate one: it would be desirable to have a more automated mapping mechanism between local databases and a given global schema. Moreover, the effect of changes to a component database, in terms of a user's existing conceptual schema, is not clear. While creating views is not that difficult, propagating updates (expressed in terms of user views) to the underlying "real" schemas is much harder, especially given the fact that the component sites are allowed full autonomy of their own. Also, because MRDSM decomposes queries formulated in terms of users' conceptual schemas into queries on component databases, and then proceeds to gather the data into a "working database," it would seem that much effort (such as recompilation) is required when even fairly trivial modifications are made to the underlying databases. Clearly, the MRDSM approach is not completely adequate; nevertheless, this system represents one of the few implemented approaches to the problem of supporting the multischema value on the D axis in addition to the autonomy value on the S axis.
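The arrangement just described can be sketched as three pieces: a user schema composed of elements drawn from several local schemas, a separate dependency schema relating the component databases, and a query decomposition step that produces per-database subqueries for later gathering into a working database. All names and the data-structure encoding below are illustrative assumptions; this is not MRDSM's actual definition language.

```python
# Illustrative sketch of a user-composed "multischema" in the spirit of the
# approach described above. Every identifier here is invented.

local_schemas = {
    "bank_a": {"customer": ["cust_no", "name"], "account": ["cust_no", "balance"]},
    "bank_b": {"client":   ["client_id", "full_name", "credit_limit"]},
}

# The user's integrated schema draws attributes from both component databases.
user_schema = {
    "customer_view": [
        ("bank_a", "customer", "cust_no"),
        ("bank_a", "customer", "name"),
        ("bank_b", "client", "credit_limit"),
    ]
}

# The "dependency schema" records how component databases relate, e.g., that
# bank_a.customer.cust_no and bank_b.client.client_id identify the same entity.
dependency_schema = [
    {"equivalent": [("bank_a", "customer", "cust_no"),
                    ("bank_b", "client", "client_id")]},
]

def decompose(query_attrs):
    """Split a query on the user schema into per-database subqueries,
    whose results would then be gathered into a working database."""
    subqueries = {}
    for db, relation, attr in query_attrs:
        subqueries.setdefault(db, set()).add((relation, attr))
    return subqueries

print(decompose(user_schema["customer_view"]))
```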
4.6 Open Areas

Looking at Figs. 3 and 4, we see that some "open" areas exist in transaction processing systems. One such area involves the multischema sector of the D axis. As discussed in Section 3.5, the efficient design and maintenance of multischemas is itself a difficult problem. The problem is compounded when designers wish to also maintain local site autonomy in order to enable design autonomy (i.e., when the Dn and Sn dimensions interact). This requires that a site's schema be built from other sites' schema at the same time that the component schema can be modified at will by the participating sites. Another open area involves the entire autonomy portion of the S axis as it pertains to execution autonomy. Section 4.5 considered autonomy from the design viewpoint: who controls design and access to schema. Thus we cite the R* system as a system that supports autonomy, because it allows a flexible naming scheme that permits sites to rename relations without coordinating the activity with a central authority. As discussed in Section 3.6.3, autonomy also involves execution autonomy, which calls the entire notion of a transaction processing system into question. Some critical points that are beginning to be addressed include:

• Must participating sites sacrifice independence simply in order to implement a globally serializable transaction schedule?
• Can autonomy be defined at a sufficiently fine granularity that designers can build transaction processing systems with flexibility in this key parameter?
Answers to these questions will help determine whether the open areas in Figs. 3 and 4 get “filled in” in the next decade.
5. Beyond Traditional TP

Section 3 showed that the key properties of the traditional transaction model, recovery and concurrency atomicity, are so useful that they have been implemented in a wide range of environments. Despite the challenges posed by the complexity of a (Pn, Mn, Hn, Dn, Sn) system, the so-called ACID (Atomicity, Consistency, Isolation, and Durability) benefits (Gray, 1978) are sufficiently large in such application domains as banking and airline reservation systems that many view these properties as defining TP systems. Recently, however, new applications have emerged that, while requiring database "consistency," appear to require relaxation of the TP constraints. We first give some examples of these applications, and describe why the traditional TP (TTP) properties are too constraining. We then outline some promising solutions that relax the TP
properties without completely abandoning the basic notion of maintaining database consistency in the presence of concurrent updates and system failures. Departures from traditional TP may appear to be a complete break from the past. Recall, however, that from the beginning of TP, tension has existed between the conceptual elegance of the TP properties and difficulties in their implementation (Section 3.2). This tension has resulted in many tradeoffs that, for the most part, have preserved the "feel" of the traditional TP (or TTP) properties. The key issue is thus whether solutions devised for the new application domains will be an evolutionary extension of TTP.
5.1 Nontraditional TP: Examples
Picture an application mix in which update transactions are interleaved with long-running queries that scan large portions of the entire database. This mix is characteristic of decision support systems. Unless multiple versions are used, the continuous update availability required by high transaction rates is incompatible with a serializable scanning of the entire database. A traditional solution to the problem is the notion of degree 2 consistency, which has been implemented in systems using two-phase locking (Gray and Reuter, 1993). Degree 2 consistency prevents "dirty" reads but allows a transaction to see the results of transactions committed after it initiated. Interestingly, the need to reduce the time that read locks are held is so great that degree 2 is the default consistency supplied by many SQL products (Gray and Reuter, 1993). Since nonserializable results can now be seen, application designers must add extra program logic to maintain database consistency. Other application domains differ from those of TTP in that they involve cooperation by multiple participants in a single process. Well-known examples are computer-aided design/computer-aided manufacturing (CAD/CAM) and publication environments. These applications appear to require breaking the TTP paradigm of isolating one user's interaction with a database from that of other users. At the same time, the intuitive notion of encapsulating one cooperative process from others remains appealing. Various approaches to this problem can be found in Elmagarmid et al. (1991). Autonomy and security considerations point to another area in which TTP requirements may be unacceptable. Picture an application in which funds are transferred from one bank to another. In the emerging area of distributed electronic commerce, the banks typically do not trust one another and therefore go through a trusted third party to execute the funds transfer. The transfer is implemented through a withdrawal from one bank and a deposit of the same amount in the other bank. (This is analogous to the current situation, in which a clearinghouse is used as an explicit escrow agent. This simulation of a paper trail satisfies existing legal and accounting requirements, but an entire night is often needed
for the clearinghouse to transfer the funds.) To guarantee (1) that money is not lost in the transfer and (2) that the cases where a transfer may fail are handled, we assume that some form of commit protocol, such as 2PC, is used to guarantee the atomicity of the transfer. Figure 5 shows a typical scenario in which the funds transfer succeeds without any problems. Both banks perform their respective withdrawal and deposit with no significant delay and both agree to commit. Figure 6 shows a scenario in which using 2PC to enforce atomicity leads to performance problems and violations of execution autonomy (Section 3.6.3). Suppose the banks have been sent the withdrawal and deposit requests. The withdrawing bank performs the withdrawal and votes that it is prepared to commit. The depositing bank, however, either fails to perform the deposit or is unable to vote that it is prepared to commit. In order to guarantee atomicity and global serializability, the 2PC protocol requires Bank1 either to block until the outcome of the Bank2 deposit is known, or to abort and roll back its withdrawal transaction. Unfortunately, the user waiting to receive her money at the automated teller machine (ATM) may be unhappy at having to wait until Bank2 is ready to proceed with the transaction. The banks themselves may be unhappy with the fact that a crash during a 2PC can lock an account for a long period of time. Such impingement on site autonomy is especially unacceptable given the fact that it can be caused by a competing bank. Notice that, under certain circumstances, it may be more appropriate to allow the Bank1 withdrawal to commit, even though the status of the Bank2 deposit is unknown. For instance, if the amount of money involved is relatively small, the withdrawal could be allowed to commit along with a guarantee that the deposit at Bank2 will eventually succeed. Under TTP constraints, however, such a possibility is not allowed. Since the transfer transaction is expected to preserve the total balance of the two banks (no money disappears as a result of the transfer), global serializability requires that the funds withdrawn be deposited before the end of the transfer transaction.

FIG. 5. Standard use of two-phase commit (2PC) to coordinate distributed commerce.

FIG. 6. Scenario in which use of two-phase commit (2PC) leads to implementation difficulties for distributed commerce.

Another possibility is to write application transaction programs that deliberately (temporarily) violate serializability in order to achieve better performance or to preserve execution autonomy. Since this approach places the burden of guaranteeing system correctness on the programmer rather than on the TP system, it is severely flawed. Furthermore, it leaves application programmers without a rigorous correctness criterion, forcing them to define correctness in an ad hoc manner.
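To make the blocking problem concrete, here is a minimal sketch of the interaction just described. The Bank class, its in-memory balances, and the coordinator function are invented for illustration; they are not the commit-protocol implementation of any real system.

```python
# A sketch of the two-phase commit interaction described above. Bank names,
# amounts, and the in-memory "banks" are illustrative only.

class Bank:
    def __init__(self, name, balance):
        self.name, self.balance = name, balance
        self.pending = None          # tentative change held while "prepared"

    def prepare(self, delta):
        if self.balance + delta < 0:
            return False             # vote "no"
        self.pending = delta         # hold locks / log the tentative change
        return True                  # vote "yes": the bank is now blocked,
                                     # waiting on the coordinator's decision

    def commit(self):
        self.balance += self.pending
        self.pending = None

    def abort(self):
        self.pending = None

def transfer_2pc(src, dst, amount):
    votes = [src.prepare(-amount), dst.prepare(+amount)]
    if all(votes):
        src.commit(); dst.commit()
        return "committed"
    # If dst never answers (rather than voting "no"), src cannot reach this
    # point: it must block in the prepared state or unilaterally abort, which
    # is exactly the loss of execution autonomy discussed in the text.
    src.abort(); dst.abort()
    return "aborted"

bank1, bank2 = Bank("Bank1", 500), Bank("Bank2", 100)
print(transfer_2pc(bank1, bank2, 200), bank1.balance, bank2.balance)
```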
5.2 Extending Traditional TP

5.2.1 Some Approaches

Clearly there exist application domains in which implementation of TTP properties is incompatible with acceptable performance. To date, there have been various proposals for extending the TTP model in one manner or another so as to address one or more of the above TTP constraints (Elmagarmid, 1991). An underlying theme of much of this work is the attempt to find analogies to the fundamental read-write serializability model that has been so fruitful in TTP application domains. One such effort is the series of concurrency control optimization methods based on application semantics. TTP is based on only two operation types: read and write. The recognition of higher-level operation semantics allows some apparent read/write conflicts to proceed without blocking or aborting. For instance, multiple credits to a bank account are operations that can be interleaved freely, even when they come from different transactions. System concurrency can be improved (for this class of applications) if such interleaving is permitted. Proposals using application semantics suggest interesting possibilities for optimization (e.g.,
Garcia-Molina, 1983; Badrinath and Ramamritham, 1991; O'Neil, 1986). The idea is to supply TTP requirements without adhering to the traditional serializability schedule. As such, these efforts can be viewed as optimization techniques that remain faithful to the TTP viewpoint. In contrast, work on extended transaction models seeks to change transaction semantics by incorporating richer structure into the transaction itself (Elmagarmid, 1991). For instance, cooperative transactions allow non-ACID sharing of information based on the semantics of CAD/CAM activities, since designers need to share intermediate results before final results are released to production. (Many extended transaction models have a flavor of semantics-based optimization for concurrency control, since they also rely on application semantics.)
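As a concrete illustration of the commutativity argument above, the sketch below shows credits from many concurrent "transactions" interleaving freely: because credits commute, only a brief mutual exclusion on the update is needed, rather than transaction-length read/write locks. The account class and locking scheme are illustrative assumptions, not the mechanism of any cited system.

```python
# Sketch of semantics-based concurrency control: credits to an account commute,
# so they can be interleaved freely across transactions even though, at the
# read/write level, each one is a conflicting read-modify-write.

import threading

class Account:
    def __init__(self, balance=0):
        self.balance = balance
        self._lock = threading.Lock()

    def credit(self, amount):
        # Under pure read/write serializability, the read and write below would
        # conflict with concurrent credits; because credits commute, a short
        # mutual exclusion on the update suffices and no transaction-length
        # lock needs to be held.
        with self._lock:
            self.balance += amount

acct = Account()
threads = [threading.Thread(target=acct.credit, args=(10,)) for _ in range(100)]
for t in threads: t.start()
for t in threads: t.join()
print(acct.balance)   # 1000 regardless of the interleaving order
```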
5.2.2 Relationship to TTP

Although these attempts to "go beyond" TTP in specific application domains show promise, they have not yet been incorporated into commercial software. There are several explanations for this. First, to date, implementations of these efforts have been quite ad hoc. Progress is being made, but difficulties in combining semantics-based optimization methods still require further research effort. Second, people have not yet devised a conceptualization of the new application domains that approaches the power of TTP with respect to the traditional "airline reservation" type of domain. While these reasons are important from the viewpoint of research, they may ultimately be less important than a third reason: namely, the requirement for coexistence between new transaction techniques and the well-established TTP applications. We shall focus our discussion on this third reason. The main question is whether we can (or should) stretch TTP techniques to handle these new applications. On the one hand, such extensions facilitate the integration of new applications with the existing TP-based, mission-critical, corporate applications. On the other hand, extension of already complex TTP systems may result in overly complex systems that cannot be easily maintained. The fact that there are few software packages that even claim to be capable of assuming the role that TTP has played in the past 20 years suggests that we had better be very careful in "tinkering" with TTP. This is especially true given the huge investment in the development of existing software. Practical considerations, therefore, lead toward development of modular extensions of TTP concepts and implementation techniques. We refer here to conceptual extensions that subsume TTP properties as a set of "base conditions," and to CC and CR algorithms that generalize existing TTP algorithms. Preferably, these extensions should be implemented on top of existing TTP software, thereby leveraging software investments made over the last two decades.
A trend toward such system modularity can already be observed within the domain of traditional TP. One motivation is the use of TP monitors and database management systems as resource managers. An example is the Encina TP monitor offered by Transarc Corporation. Encina supports the X/Open standard interface to transactions, and can therefore coordinate atomic transactions that involve database management systems from different vendors such as Sybase and Informix. Encina's implementation consists of modules with clearly defined interfaces. Furthermore, Encina supports a rich set of user-accessible entry points, whereby a sophisticated application (or middleware) designer can augment and modify Encina behavior. Epsilon serializability (ESR) is one example of a modular extension to TTP concepts. ESR is a correctness criterion that offers the possibility of asynchronous TP. It is based on the notion of explicitly specifying the permitted amount of inconsistency that a transaction may see during its execution. The system then guarantees that inconsistency is controlled so that the amount of error (i.e., departure from consistency) never exceeds the specified margin, termed an E-spec (Pu and Leff, 1992). Because ESR requires that the state of the database eventually converge to a consistent (1SR) state, system designers are not forced to abandon TTP in order to benefit from ESR. Since ESR allows asynchronous threads to execute within a single transaction, site autonomy is less constrained than under a TTP system. An example of this, in the context of asynchronous updates to replicated data, is discussed in Pu and Leff (1991). By analogy to CC algorithms, divergence control methods guarantee that transactions using ESR bound the degree of inconsistency to less than the E-spec. A methodology for extending classic CC methods into divergence control methods is presented in Wu et al. (1992). Section 5.1 showed that TTP can unnecessarily constrain certain distributed commerce applications, essentially because it impinges on site (execution) autonomy. Figures 7 and 8 show how ESR could be applied in this application
domain. If the transfer amount is deemed by the banks to be small enough, say $1000 (Fig. 7), the withdraw and deposit transactions can be allowed to commit or abort unilaterally. There will thus be a period of time in which the bank databases will be globally inconsistent. The traditional 2PC protocol is used to detect situations in which the two banks cannot agree about the transaction's outcome; an appropriate consistency restoration technique is then invoked. For example, if the withdraw transaction at Bank1 commits unilaterally and the deposit transaction at Bank2 aborts unilaterally, the Bank1 withdraw must be undone, possibly by issuing a compensating transaction that redeposits the withdrawn funds in Bank1. ESR is used to explicitly specify the permitted amount of inconsistency between the two banks' databases. As discussed above, this approach is prohibited under TTP. In contrast, if the transfer amount is "large," say $1 million (Fig. 8), the withdraw and deposit transactions will be required to be synchronized. The execution is then equivalent to the TTP execution in which the transaction termination is coordinated by the 2PC protocol. While such efforts as ESR are, of course, preliminary, an important feature of ESR is its ability not only to coexist with TTP but also to have its relationship to TTP defined rigorously. ACTA is a formal framework for comprehensive modeling of non-TTP transaction models (Chrysanthis and Ramamritham, 1993). As more experience is gained in non-TTP application domains, we can expect the development of a powerful general model for such domains that, when integrated with TTP systems, will provide the basis of the TP processing of the future.
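A minimal sketch of the threshold policy just described, under stated assumptions: the Bank class is invented, the $1000 bound merely stands in for an E-spec, and the synchronous branch is a placeholder for the 2PC execution. This is not ESR's actual divergence-control algorithm, which is considerably more involved.

```python
# Sketch of the ESR-style threshold idea: small transfers commit unilaterally
# (bounding, and later repairing, any inter-bank inconsistency), while large
# transfers fall back to a synchronous 2PC execution.

EPSILON_SPEC = 1_000   # maximum tolerated inter-bank inconsistency, in dollars

class Bank:
    def __init__(self, name, balance):
        self.name, self.balance = name, balance

    def withdraw(self, amount):
        if self.balance < amount:
            return False
        self.balance -= amount
        return True

    def deposit(self, amount):
        self.balance += amount
        return True

def transfer(src, dst, amount):
    if amount <= EPSILON_SPEC:
        # Asynchronous path: each bank commits or aborts unilaterally; the
        # protocol only detects disagreement afterward, and a compensating
        # transaction restores consistency.
        withdrew = src.withdraw(amount)
        deposited = dst.deposit(amount) if withdrew else False
        if withdrew and not deposited:
            src.deposit(amount)          # compensating transaction: redeposit funds
        return "asynchronous (inconsistency bounded by the E-spec)"
    # Synchronous path: equivalent to the TTP execution coordinated by 2PC
    # (as in the earlier transfer_2pc sketch); represented here by a placeholder.
    return "synchronous 2PC"

b1, b2 = Bank("Bank1", 5_000), Bank("Bank2", 100)
print(transfer(b1, b2, 200))        # small amount: unilateral commits
print(transfer(b1, b2, 1_000_000))  # large amount: synchronized execution
```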
5.3 Conclusion

Despite the success of OLTP in banking and airline reservation systems, the tension between the conceptual elegance of ACID properties and implementation difficulties has, from the beginning, resulted in many tradeoffs. More recently, a new set of applications has emerged that, while it needs on-line transaction
processing (OLTP)-like database consistency, also needs to relax the ACID constraints. This set of applications includes concurrent update and query processing, scientific data management, long and open-ended activity management in CAD/CAM, and electronic commerce on the information superhighway. We have examined some of these applications and also considered some promising solutions that can successfully relax the ACID properties. We also discussed the benefits and issues involved in the modular extension of OLTP techniques. Three major goals in such an effort are:

• The ability to leverage the reliability of mature OLTP software products
• Ease of software maintenance, despite increasing functionality, through modular construction
• Portability and adaptability in the evolution of hardware and software, with emphasis on scalability of performance
An example of extendable production software is the Transarc Encina product. An example of a suitable modular extension to ACID properties is ESR, which
has a prototype implementation on top of Encina. Such extensions may provide the kind of system support required by the new application set over the medium term. However, direct extensions to ACID properties can go only so far. For the long term, we expect a new generation of system software to be developed in conjunction with a new model of the new application set that will properly capture its sophistication. For example, while a DBMS forms the core of service industry back-end operations, there is little generic system software for the front-end services (e.g., customer support), even in bank and airline systems. A simple model of these new application areas, a software package that implements this model, and the integration of the model and software with back-end OLTP systems are exciting challenges that lie ahead in the evolutionary path of transaction processing systems.

ENDNOTES

1. In the case of system components such as H and D, a multivalued (rather than a binary) metric is clearly appropriate. For the sake of simplicity, however, this chapter focuses on the binary-valued situation.
2. The class of NP-complete problems arises in the field of complexity theory, the study of the computational resources needed to solve problems. Many important and longstanding combinatorial problems are in this class. At present, no efficient solutions are known for any NP-complete problem. The only known solutions require brute force, exhaustive search. It has been proven that if any NP-complete problem can be solved efficiently, then all of them can be solved efficiently. To date, attempts to prove the existence or nonexistence of efficient solutions have been fruitless, and it is conjectured that no such solutions exist.
3. We are not concerned with the issue of distributed computation, in which more than one processor contributes cycles for the process's execution.
REFERENCES
Alonso, R., Barbara, D., and Garcia-Molina, H. (1990). Data caching issues in an information retrieval system. ACM Trans. Database Syst. 15.
Badrinath, B. R., and Ramamritham, K. (1991). Semantics-based concurrency control: Beyond commutativity. ACM Trans. Database Syst. 16.
Batini, C., and Lenzerini, M. (1984). A methodology for data schema integration in the entity relationship model. IEEE Trans. Software Eng. SE-10(6).
Batini, C., Lenzerini, M., and Navathe, S. B. (1986). A comparative analysis of methodologies for database schema integration. ACM Comput. Surv. 18(4).
Bernstein, P. A., and Goodman, N. (1982). A sophisticate's introduction to distributed database concurrency control. In "Proceedings of the 8th Very Large Database Conference," pp. 62-76.
Bernstein, P. A., Hadzilacos, V., and Goodman, N. (1987). "Concurrency Control and Recovery in Database Systems." Addison-Wesley, Reading, MA.
Breitbart, Y., Silberschatz, A., and Thompson, G. R. (1990). Reliable transaction management in a multidatabase system. In "Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data."
Ceri, S., and Pelagatti, G. (1984). "Distributed Databases: Principles and Systems." McGraw-Hill, New York.
Chrysanthis, P. K., and Ramamritham, K. (1991). ACTA: The SAGA continues. In "Transaction Models for Advanced Applications" (A. Elmagarmid, ed.). Morgan Kaufmann, San Mateo, CA.
Elmagarmid, A., ed. (1991). "Transaction Models for Advanced Applications." Morgan Kaufmann, San Mateo, CA.
Elmagarmid, A., and Helal, A. (1988). Supporting updates in heterogeneous distributed database systems. In "Proceedings of the 4th International Conference on Data Engineering."
Elmagarmid, A. K., Leu, Y., Mullen, J. G., and Bukhres, O. (1991). Introduction to advanced transaction models. In "Transaction Models for Advanced Applications" (A. Elmagarmid, ed.). Morgan Kaufmann, San Mateo, CA.
Garcia-Molina, H. (1983). Using semantic knowledge for transaction processing in a distributed database. ACM Trans. Database Syst. 8(2), 186-213.
Gligor, V. D., and Popescu-Zeletin, R. (1985). Concurrency control issues in distributed heterogeneous database management systems. In "Distributed Data Sharing Systems: Proceedings of the 3rd International Seminar on Distributed Data Sharing Systems" (F. A. Schreiber and W. Litwin, eds.). North-Holland Publ., Amsterdam.
Gray, J. N. (1978). Notes on database systems. In "Operating Systems: An Advanced Course" (R. Bayer, R. M. Graham, and G. Seegmuller, eds.), Lect. Notes Comput. Sci., Vol. 60, pp. 393-481. Springer-Verlag, Berlin.
Gray, J. N., and Reuter, A. (1993). "Transaction Processing: Concepts and Techniques." Morgan Kaufmann, San Mateo, CA.
Heimbigner, D., and McLeod, D. (1985). A federated architecture for information management. ACM Trans. Off. Inf. Syst. 3(3).
Litwin, W., Mark, L., and Roussopoulos, N. (1990). Interoperability of multiple autonomous databases. ACM Comput. Surv. 22(3), 267-293.
Molina, H. G., and Kogan, B. (1988). Node autonomy in distributed systems. In "Proceedings of the International Symposium on Databases in Parallel and Distributed Systems."
Navathe, S. B., and Gadgil, S. G. (1982). A methodology for view integration in logical data base design. In "Proceedings of the 8th Very Large Database Conference."
O'Neil, P. E. (1986). The escrow transactional method. ACM Trans. Database Syst. 11(4), 405-430.
Papadimitriou, C. (1979). The serializability of concurrent database updates. J. ACM 26(4).
Pu, C. (1989). Superdatabases for composition of heterogeneous databases. In "Integration of Information Systems: Bridging Heterogeneous Databases" (A. Gupta, ed.). IEEE Computer Society Press, Los Alamitos, CA.
Pu, C., and Leff, A. (1991). Replica control in distributed systems: An asynchronous approach. In "Proceedings of the 1991 ACM SIGMOD International Conference on Management of Data," pp. 377-386.
Pu, C., and Leff, A. (1992). Autonomous transaction execution with epsilon-serializability. In "Proceedings of the 1992 RIDE Workshop on Transaction and Query Processing." IEEE Computer Society Press, Los Alamitos, CA.
Sheth, A., and Larson, J. A. (1990). Federated databases: Architectures and integration. ACM Comput. Surv. 22(4).
Sheth, A., Leu, Y., and Elmagarmid, A. (1991). "Maintaining Consistency of Interdependent Data in Multidatabase Systems," Tech. Rep. CSD-TR-91-016. Computer Science Department, Purdue University, West Lafayette, IN.
Thomas, G., et al. (1990). Heterogeneous distributed database systems for production use. ACM Comput. Surv. 22(4).
Wu, K. L., Yu, P. S., and Pu, C. (1992). Divergence control for epsilon-serializability. In "Proceedings of the 8th International Conference on Data Engineering," pp. 506-515.
Author Index

Numbers in italics indicate the pages on which complete references are given.
A Abdel-Hamid, T. K., 3, 59 Adler, P. S.,150, 155 Akao, Y.,67, 82 Alonso, R.. 267, 295 Anon, 238, 252 ANSI, 94, 155 Antoy. S.,200, 227 Application Portability Profile, 159. 188 Armenise, P., 29, 59 Arnold, P.. 51, 57, 61
B Badrinath, B. R., 291.295 Balan, S.,239. 252 Balcer, M. J.. 203,227 Bandat, K. F., 22, 28, 45. 56, 63 Bandinelli, S. C., 25, 29. 40. 42, 51, 53, 57,59 Barbar, D., 267,295 Basili, V., 15, 59, 205, 229 Basili, V. R., 5, 8, 11, 14, 16, 18-19, 22, 47, 51, 57, 59, 62. 65, 67. 73, 78. 81-82. 200,227 Batini, C., 272,274475,295 Bauer, F. L.. 4, 59 Belasco, J. A., 87, 94, 155 Belkhatir, N.,34-35,59 Belz, F. C., 31, 63 Ben-Shaul, I. Z., 38, 59 Bernstein. P. A., 261, 264-266,295 Blaha. M., 142,156 Blum, M.. 223,228 Bodo. P., 239, 252 Boehm, B. W., 11,60, 96, 125, 155 Bourguignon, J. P., 188 Bragg, T. W., 187, 188 Breitbart, Y.,277, 280. 295 Bridges, W., 94, 155
Briickers. A., 29,45,60 Brown. A. W., 90,139. 151, 155-157, 158, 170, 186,188, 188,189 Burkhres, O., 288,295 Bush, M., 98, 117, 126, 156 Butler, R. W., 216. 225. 228
Cagan, M. R., 165. 171.189 Caldiera, G., 14-16, 19, 47, 57, 59, 67, 73, 78.82 Caldwell, W. M., 188 Camp, R. C., 155 Campbell, J., 206, 228 Card, D.. 90,155 Carney, D. J., 157, 186.188-189 Case, B., 239,252 Casucci, M., 95, 155 Ceri, S., 281,295 Chang, P. P., 245, 252 Chen, Shu-Wie F.. 255 Chrissis. M., 98, 117, 126, 156 Christie. A. M., 52, 55-56,60 Chrysanthis,P. K., 293. 295 Clarke, L. A,, 31. 63, 206,229 Cobb, R. H., 196,228 Coene, Y..95,155 Columbia University, 37, 60 COmdi, R., 56-57.60 Conte, T. M., 231, 232,248, 251.252 Craigen, D., 196,228 Cringely, R. X.,151, 155 Crosby, P. B., 76.82 CWOW,H.J., 234-235,252 Curtis, B., 21, 29, 60, 88. 155 Cybenko, G., 235-236,252 D
Dahl, 0. J., 225, 228 Dart, S., 189 Davis, A. M., 11, 60
Decker, W.. 51. 60 Decrinis, P., 12, 61 Deiters, W., 13, 62 DeMarco, T.. 108, 130, 155 DeMillo, R., 227, 228 DeMillo, R. A., 181, 189 Deming. W. E., 13, 60, 75,82 Department of Defense, 107. 155 Dijkstra, E. W., 225, 228 Dixit, K. M., 235-236, 239. 243. 252 Dowson, M., 17, 60 Duran, J.. 220,228
E ECMA, 139.156, 160.189 Eddy, F., 142, 156 Elmagramid. A., 266-267,280, 288. 290-291, 295-2% Estublier. J., 34-35, 54, 60 Ett. W. H., 51, 57, 61 European Computer Manufacturers Association, 52, 60
F Faget, J., 19. 62 Favaro, J., 95. 155 Feigenbaum, A. V., 14. 60, 76, 82 Feiler, P.,158, 165. 189 Feiler, P. H.. 47, 55-56, 60-61, 139, 155 Feldman, S. I.. 28, 60 Fenton, N., 5, 12. 55. 62 Femstr6m. C., 17, 51, 56-57. 60 Finelli, G. B., 216, 225, 228 Finkelstein, A., 47, 55, 61 Firth, R., 125, 155-156 Fowler, P. J.. 87. 94. 150, 155-156 Frakes, W.. 19. 60 Frankl, P., 218,228 Frankl, P. G., 205-206,228 Freeman, P., 18,62 Fuchs, W. K.. 244,253 Fuggetta, A,, 23, 25, 28,40,42,51-53, 56-57,59-60
Garcia, S.M., 98, 117, 126, 156 Garcia-Molina, H., 267, 291.295 &hart.S., 196,204, 226.228 Ghezzi, C., 5, 23, 25. 28-29, 42, 53, 57, 59-60 Gibbs, W. W., 2, 12, 60 Gilb, T., 17, 60, 96, 156 Gish, J. W., 17. 60 Glass, R. L., 87, 156 Gligor. V. D., 270, 295 Gockel, L. J., 158, 189 Golden, M. L..244, 252 Goodenough, J., 204,226,228 Goodman. N.,261, 264-266,295 Gray, J. N.,256,287-288,295 Green, S., 73.82 Grigoli, S.,40, 42, 51, 59 Grudin, J.. 56, 61 Gruhn, V., 13, 22, 24, 42.61-62
H Haase, V., 12, 61 Hadzilacos, V., 261,264,266,295 Hamlet, R. G., 191, 200. 208-209,210, 213,218,220.222-223.225229 Hansen, 0.A,, 43, 61 Harding. A. J., 76. 82 Harel, D., 42. 61 Hasling, W. M.,203, 227 Hatley, D. J.. 108, 130. 142, 156 Heimbigner. D., 28, 30, 32, 61, 63, 276, 279, 295 Heineman, G.T., 38, 59 Helal, A., 280, 295 Hoare, C.A.R.. 225.228 HOid, B., 13, 61 Howden, W. E.. 209,226-227.228 Huff, K.. 17, 60 Humphrey, W. S., 12, 14, 21, 28, 42, 55-56. 60-61, 76.82 HWU,W. W., 231, 232,245,252
G Gadgil, S. G.. 272, 295 Gmon, J., 200,228
I Iannino. A., 212,228 IEEE, 94. 125. 156
AUTHOR INDEX International Standards Organization, 7, lo,21, 61,158,189 Isoda, S., 19,60
Lonnscn, W.,142, 156 Lorn. C. M., 16.21-22,29,45-46,50,52-53. 55-56.60-62 Lydon, T.. 126, 152, 156
J Jeng. B., 210, 220,229 Jolley, T.M., 158,189 Jones, D.T.,77, 82 Jowett, J., 171, 189
K Kaiser. G.E.. 31, 38,59, 62 Kannan, S.. 223.228 Katay-, T.,35-36,47.55. 61 Keatts, J., 239,252 Kellner. M.. 42,61 Kellner, M. I., 21-22, 29, 43,47.55. 60-61 Kim. L.. 235-237,240,252 Klinger. C. D.. 21-22, 46. 50,55, 61 Knuth. D.E.,213,228 Koch, G.,12,61 Kogan, B.. 295 Kogure. M., 67. 82 Koren. I., 252. 253 Krasner, H.,51, 57, 61 Krueger, C. W.,17,61 KUck, D., 235-236, 252 Kugler, H.J.. 12. 61 Kwan, S.. 194,229
L Lachover, H..42, 61 Larson, J. A., 268,276,2% Lams. J. R.,244, 253 Leff. A., 255, 292.2% LenzCrini, M., 272,274-275,295 Leu, Y.. 266-267,288.295-2% Leveson, N. G., 4, 62 Levine, L., 150,155 Like& R., 76, 82 Linehan, A., 51, 57,61 Lipow, M., 215,229 Lipton. R.. 227,228 Litwin, W..277,295 Lonchamp. J., 56,62 Long. F. W.,158, 188-189
M Madhavji. N.,13,62 Maher. J. H.,Jr., 87,94, 156 Mangione-Smith, W.,248. 251. 252253 Marca, D.,189 Marick, B., 203-205,228 Mark, L., 277,295 Markstein, P. W.,252,253 Marmor-Squires. A,, 21-22,46,50. 55.61 Martin, D., 158,189 Matsumoto, M., 18,62 McCall, J. A.. 67,82 McDermid. J. A,, 170,188,188 McGarry, F., 15, 59,67,73,78, 82 McGarry. F. E..73.82 McLeod, D..276,279,295 McMullin, P.,200,228 McSharry, M.. 93,156 Melo, W.L.. 34-35, 59 Menezes, K.N.P., 251,252 Messnm, R., 12.61 Microsoft. 143, 156 Miller, K. W..222,229 Mills, H.D..196,228 Molina, H.G..295 Morel, J.-M., 19,62 Moms, E., 158,189 Moms, E. J., 90,151,156, 158, 186, 188189 Monenti. A., 29.59 Mosley, V., 125, 155 Mullen, J. G.,288,295 Munnis, P. E..76,82 Musa. J. D.,212,228 Myers, G.J., 197,225,228 Myers, W.,91,156
N Naamad A., 42, 61 National Bureau of Standards, 189 Naval Air WarfareCenter, 188,189
Navathe, S. B., 272, 274-275, 295 Nejmeh. B., 167,189 Nejmeh, B. A,, 52, 63 Nelson. E., 215, 229 Neviaser. M..21-22, 46, 50, 55, 61 NIST, 139,156, 160, 188, 189 Ntafos. S., 206,220.228-229
0
Object Management Group, 187. 189 Oivo. M.,51, 62 Okumoto, K.,212. 228 ONeil. P. E., 291, 295 Osterweil. L., 30-32, 62-63 Osterweil, L. J., 28, 47, 55, 61, 63 Ostrand, T.J., 203, 227 Over, J., 21, 29, 60
P Page, G.,15. 59, 67, 73, 78, 82 Pajerski, R., 15.59. 67, 73, 78, 82 Papadimitriou, C., 261,295 Pamas. D. L., 194. 229 Pan, Y.N., 248,253 Paul, M.,19, 63 Pauk, M. C., 98. 117. 126, 156 Pelagatti, G., 281, 295 Penedo. M.H..23, 47, 52, 55, 61-62 Perry, D. E.. 31.62 Pethia, R., 125, 155 Pethia, R. D., 156 Peuschel. B., 38-39,53,62 Pfteeger, S. L., 5. 12, 55, 62 Phillips, R. W., 76, 82 Pirbhai. I. A.. 108. 130, 142, 156 Pnueli, A.. 42, 61 Podgurski. A.. 206.229 Pointer, L.. 235-236,252 Politi, M., 42. 61 Popescu-Zeletin, R., 270,295 Potts, C., 28.62 Powell, M. L., 253 Premerlani, W., 142.156 Pressman, R. S., 118, 125, 156 Prieto-Dfaz. R.. 5, 17-18,62 PU, C., 255, 271,292,295-2%
R Rader, J. A., 83, 90,122, 151, 156 Radice, R., 76, 82 Ralston. T., 196, 228 Ramamritham, K..291, 293, 295 Ramsey, J., 205, 229 Rapps, S., 206, 229 Redwine, S., 87, 156 Reuter, A., 288, 295 Richards, P. K., 67, 82 Riddle, W., 23, 52. 62, 87, 156 Roberts, L., 125, 155 Rombach, H.D., 1, 5, 8, 12, 14, 16, 18-19, 21-23.28-29,45-47, 50, 53.55-57, 59-63, 67.78.82 Roos, D., 77, 82 Roussopoulos, N., 277, 295 Royce, W. W., 11, 63 Rumbaugh, J., 142, 156 5
Saracelli, K. D., 22, 28, 45, 56. 63 Sayward, F.,227, 228 Schilfer, W.,13, 38-39, 53, 62 SEI, 160. 188, 189 Selby, R. W., 5, 8. 31, 62-63, 200, 227 Serlin, O., 238, 253 Shaw. M., 2,63 Shenhar, A.. 150, 155 Sherman, R., 42, 61 Sheth, A., 266-268, 276,296 Shewhart. W. A.. 75, 82 Shooman, M.L., 211, 215, 229 Shtull-Trauring, A., 42, 61 Silberschatz, A., 277, 280, 295 Smith, D., 158, 189 Smith, D. B., 186, 189 Snowdon. R.. 56.60 Snyder, T.R., 14, 61 Software Engineering Institute, 82 Sommerville, I., 19, 63 Stallman, R. M.,245, 253 Standard Performance Evaluation Corporation (SPEC). 236, 238, 253 STARS R o g a r m o f f i ~ e 158, , 171, 187-188.189 Strelich. T.,158. 189 Stunkel, C. B., 244, 253 Sutton. S. M.,Jr., 28, 32, 63 Swenson, K. D., 56, 63
T Taylor, R., 210, 220, 228 Taylor, R. N., 31, 63 Terrel, J.. 51, 57, 61 Thayer, R., 215, 229 Thomas, G., 279. 281, 296 Thomas, I., 52, 63, 167, 189 Thompson, G. R., 277, 280, 295 Thomson, R.. 17.60 Trakhtenbrot, M., 42, 61 Transaction Processing Council (TPC), 235-236, 241, 253
U Ulery. B. T., 22, 62
V Valett, J. D., 22, 51, 60, 62, 74, 82 van Schouwen, 194, 229 Verlage, M., 1. 28-29, 45, 53, 56, 60, 63 VOD, J., 222-223.226. 228 Voas, J. M., 222,229 von Mayrhauser, A., 17, 63
W Wakeman, L., 171, 189 Waligora, S., 15, 59, 67, 73, 78, 82 Wall, D. W., 253
Wallanu, K. C., 139,155 Walters, G.F., 67, 82 Wasserman, A,, 165. 188.189 Weber, C. V., 98, 117. 126, 156 Weicker, R. P., 234-235, 253 Weiss, D. M.,67, 82 Weiss, S. N.,218, 229 Weyuker, E., 218, 228 Weyuker, E. J., 205-206, 210. 218, 220, 228-229 Wichmann, B. A.. 234-235. 252 Wileden, J. C., 31, 63 Willis, R. R., 14, 61 Wolf, A. L., 31, 63 Wolf, S., 38, 62 Womack. J. P., 77, 82 Wood, D. P., 156 Wood, W., 125,155 Wood. W. G., 156 Wu, K. L.. 292.2%
Y Yeh, T., 248, 253 Young, M.,31,63 Yu, P. S., 292, 2% Z
Zarrella, P. F.. 186, 188-189 Zelkowitz. M. W., 87. 156 Zultner, R. E., 14.63
Subject Index
A Absolute tolerance, and benchmarking accuracy, 242 Accuracy, see Result accuracy ACID properties, see Atomicity, consistency, isolation, durability properties Ada Process Programming Language with Aspen, 31-33 Address, see Memory Adele system, and process representation, 34-35 Adoption plan, for CASE, see Computer-aided software engineering All-uses coverage, of software testing, 206 Analysis, and process models, 22. 27. 51 API, see Application programming interface AppVA, see Ada Process Programming Language with Aspen Application benchmark, 236 Application programming interface, 1 10, 133 Applications dimension, of organizational structure, 115 Architecture, and benchmarking, 247-248 Artifact, as resource for CASE adoption, 102- 103 Assessment, of software, see Software testing Atomicity, consistency, isolation, durability properties benchmarking run rule for, 241 transaction processing and. 287, 293-294 Automated support, explicit models for, 22-23 Autonomy machine heterogeneity, 269 transaction processing. 277-280 Awareness phase, of CASE adoption, 98, 106-107, 113-124
B Baseline measurements. benchmarking run rule, 240
Base system, in transaction processing, 260-26 1.28 1-282 Benchmarks, 231-253 application, 236 characterizations, 242-25 1 approaches to, 234 compiler effects, 25 1 memory, 245-247 methods, 242-245 processor model, 247-248 processor resources, 249-251 cheating by vendors, 239. 241 classification, 235-236 elite memory system for, 234 floating-point, 236 importance, 233 integer, 236 kernel, 235 memory usage, 234, 242-247 partial, 236 performance evaluation vs system design, 232-233 popular suites, 235-242 for commercial systems, 237-238 result accuracy, 241-242 Nfl rules. 239-241 for supercomputers, 237 for workstations, 238-239 purposes, 234 recursive, 236 selection, 233-234 SPECint92. 242-25 1 SPEC92.239-240 standardization, 239. 251 synthetic, 235 tolerances of, 241-242 utility, 236 Boundaries, software interfaces at, 180-184 Branch testing, of software, 204,218, 219 Business, quality improvement in. 66-67; see also Experience Factory Business process reengineering, 56
C
Cache, see Memory Capability Maturity Model CASE adoption, 117-118 quality improvement, 75, 76-77, 80 Carrier level, of information agreement, 169 CASE, see Computer-aided software engineering CASP, see Computer aided subprocess Chaining, for process representation, 37-38, 39 Champion, CASE team member, 88-89, 101-102 Change agent, CASE team member, 89, 101-102 Classification model, of transaction processing, 257-259 CMM, see Capability Maturity Model Coarse-grain process model, 10-11 Code engineering, 6 Commercial system benchmarks for, 237-238 COTS off-the-shelf tools, 131, 140, 141-143 nontraditional transaction processing in, 291 Commit protocol, in transaction processing, 264-265, 268, 280, 289, 293 Common data store, 171 Communication, and software engineering process, 21, 27, 50-51 Compiler effects, and benchmarking, 251 Computer-aided software engineering literature, 119-120, 124 software engineering environments and, 158 standards, and integration, 182-184 Computer-aided software engineering, adoption of, 83-156 activities during, 94 benefits, 89-90 study, 106-113 CASE team evolution and, 152 importance, 103-106 as resource, 97 roles, 88-89, 101-102, 104-105 skills needed, 147-148 training, 103, 105-106, 110, 143-147 costs, 90-91, 135-137 evaluation and selection phase, 124-137
keys to success, 85-93 in case study, 112 expertise, 86-87 first victim, choice of, 138 physical infrastructure, 86-87 reassurance of management, 89-92 reassurance of technical staff, 92-93 second victim, 152-153 suggestions for, 154-155 supporting operational use, 147-149 team roles, 88-89 technology maturation, 87-88 obstacles to, 84 phase, awareness, 98, 106-107, 113-124 learning from vendors and others, 120-124 understanding, tools for, 117-118 understanding CASE technology, 118-120 understanding the organization, 114-118 phase, evaluation and selection, 124-137 costs, 135-137 detailed criteria, 129-132 evaluation activities, 127-129 technical criteria, 132-135 three-filter approach, 125-127 phase, first victim, 137-149 CASPs, 139-141 choice of, 138-139 preparation for, 141-143 supporting operational use, 147-149 training, 143-147 phase, second victim, 149-153 phases of in case study, 106-113 in detail, 113-153 planning for, 93-99 preparation for, 93-106 adoption plan, 94-99 infrastructure, 99, 101-103, 141 phases of, 93-99 as a process, 153-155 resources for, 97, 99-103, 150 risks, 91-92 training, 103, 105-106, 110, 143-147 vendors and, 111-112, 120-124, 143-144 Computer aided subprocess, 139-141, 153 Computer Supported Cooperative Work, 56 Concurrency atomicity, for transaction models, 256
Conferences, on CASE technology, 123-124 Consistency benchmarking, 241 transaction processing, 266-267, 287, 288 Constraints, on CASE adoption plan, 97-98 Control-flow method, of software testing, 204-205, 218
Control integration, 52, 166-167 Coordination, in multimachine transaction processing, 264-265 Cost of CASE adoption, 90-91, 135-137 of software testing, 224, 226 of system, and benchmarking, 251-252 COTS tool, see Commercial system Coverage, of software testing, 202-211, 220 CSCW, see Computer Supported Cooperative Work Culture change, and CASE adoption, 95-96, 151-152
D Data heterogeneity, 271, 273-276 integration, 52, 166-167 mechanisms for sharing, 171 QIP analysis, 67, 69-70 replication, 263, 265-267 Database management system, and transaction processing, 257, 268, 270, 284-285 Database system, see also Transaction processing system architecture, 276-277
transaction constraints and, 256-257 as useful TP system, 281 Data coverage method, of software testing, 207-209
Data flow diagram tree, 108 for software testing, 206-207, 218 Data model, 256 Data parameter, in transaction processing, 271-276, 282-283
Data schema, in transaction processing, 271-276
Data supply, in transaction processing, 263-264
DBMS, see Database management system Def-use pair, and software testing, 206
Degree 2 consistency, and transaction processing, 288 Dependability theory, see Software dependability theory; Software testing Dependency chain, and software testing, 206-207
Descriptive modeling, 17 Design, of integrated SEEs, 185-186 Design autonomy, in transaction processing, 277-279
Design engineering, 5-6 Development process model, 11-12 Diary, benchmarking run rule, 240 Dimensions approach, to integrated SEE, 165-167
Direct data transfer, 171 Direct tool communication, 170-171 Documentation CASE evaluation, 109-110, 127-129 CASE implementation, 112-113 CASE resources, 103 CASP, 140 process model design, 17 software engineering, see Process representation language software reuse, 18
E Easily integrable SEE, 167-168 Education, see Training EFO, see Experience Factory, organization EF/QIP, see Experience Factory Elite memory system, 234 End-user SEE services to, 160 standards for, 184 Engineering database, for process representation, 37 Evaluation copy, of CASE technology, 121 Evaluation phase, of CASE adoption, 98, 107-110, 124-137
Event coordination, for sharing information, 171-172
Eventual consistency, in transaction processing, 266-267 Evolution in CASE adoption, 95-96, 113 of processes, 57
of transaction processing systems, 280-287, 287-294
Functional boundary, see Interface Functional testing, of software, 202-204
Executable model, of processes, 26 Execution autonomy, in transaction processing,
G
277, 279-280
Execution model, and QIP, 68-69 Experience, packaged, see Packaged experience Experience Factory, 65-82 Lean Enterprise Management, 75, 77-78, 79 organization, 70-73 Plan-Do-Check-Act cycle, 75-76, 78-79 Quality Improvement Paradigm, 67-75 benefits, 74-75, 81 characteristics, 80-81 comparison to other paradigms, 78-80 Experience Factory organization, 70-73 fundamental steps, 67-70 packaged experience in SEL, 73-74 SEI Capability Maturity Model, 75, 76-77, 80
software development, nature of, 66-67 software process research, 19, 20 Total Quality Management, 75, 76, 79 Expertise, and CASE adoption, 86-87, 148 Explicit process model, 21-23, 27 Extreme values, in data coverage, 207
GCT, see Generic Coverage Tool General representation requirement, 24 Generic Coverage Tool, for software testing, 205
Global mechanism, in transaction processing, 264, 270
Global process, in transaction processing, 264, 270
Global schema, in transaction processing, 274-275, 276
Goal definition, and measurement process, 15-16
Goal/Question/Metric Paradigm, 68, 81 Goals alignment of, 140-141 for CASE adoption, 97, 149 for QIP, 67, 68 GQM, see Goal/Question/Metric Paradigm Graphical representation language, 42-45 Guidance, for software development, 22 Guiding model, of processes, 30
H F Failure, as opposite of success, 199 Failure intensity. 212,214 Fault, in software, 199-200 Federated database system, 276-277 Feedback loop. in QIP, 70 Filter, and CASE evaluation, 124-127 Fine-grain process model, 11-12 First victim phase, of CASE adoption, 98-99, 110-113, 137-149
Fitness for use, in CASE adoption, 129 Flexibility. of processes, 175 Flexible model, of processes, 26-27 Floating license, 136-137 Floating-point benchmark, 236 Formal model, of processes, 25. 27, 55 Fragmentation, and data supply, 263 Framework services. 160 Framework technology, for CASE adoption, 103
Hatley-Pirbhai method, and CASE adoption, 107, 127, 139-140
Hazard rate, 212 Heterogeneity. and transaction processing. 267-269.272-276,
284-286
Heterogeneous system, 267 HFSP, see Hierarchical and Functional Software. Process Hidden model, of processes. 30 Hierarchical and Functional Software Process, 35-37
Homegrown off-the-shelf tool, HOTS CASE adoption and, 141-143, 145, 147 for integrated SEES, 179 Homogeneous interface. and transaction processing. 268 Homonyms, in transaction processing, 275 Host platform. for CASE adoption, 102, 107- 108. 132- 133
HOTS tool, see Homegrown off-the-shelf tool
SUBJECT INDEX Hughes Aircraft, CASE adoption case study awareness, 106-107 evaluation and selection, 107-1 10 first victim, 110-1 13 Human resources, see People; Team
I IDEFO notation, 174, 175 Immaturity, of CASE technology, 95-%, 145, 150-151; see also Technology maturation Implementation, and SEE three-level model, 164 Improvement community, of process representation, 28-29 Improvement process, 8. 13-15, 23 Infeasible path problem, 205 Information sharing, in integrated SEEs, 168 levels of, 169-170 mechanisms for, 170-172 Infrastructure, and CASE adoption, 86-87,99, 101-103, 141 Integer benchmark, 236 Integrated software engineering environments, 157-189 attributes, 158-159 commercial products, 159 conditions necessary for, 179-184 interface locality, 180-182 Standards, 182- 184 tool functionality, 181-182 design, 185-186 engineered environments approach, 185-1 86 future of, 187 implementationdifficulties, 158 mechanisms and semantics, 165-172 dimensions approach, 165-167 information shared, 169-172 relationship approach, 167-168 overview, 158-162 integration problem, 161-162 SEE, 159-160 process aspects, 172-179 context, 174-175 practical example, 176-179 practical means of integration, 173 pragmatic solutions, 179 successful processes, 175-176 three-level model, 162-165 integration in terms of, 164-165
307
Integration of CASE tools, 133-134 of process modeling tools, 28-29. 52 of process and product engineering, 19-27 Integration engineering. 6 Integration problem, in SEE, 161-162 Interface locality of, for integrated SEEs. 180-182 standards for integrated SEEs. 182- 184, 187 transaction processing and, 268 Iterative enhancement model, of software projects, 11
K Kernel benchmark, 235 Key practice area, and CASE adoption, 117 Key process area, and quality improvement, 77 Knowledge, as resource for CASE adoption, 102-103
L Lean Enterprise Management, and quality improvement, 77-78, 79 LEM. see Lean Enterprise Management Lexical level, of information agreement, 169 Licensing. and CASE adoption, 136- 137 Lifecycle process model, 10-1 1 Local process, in transaction processing, 264, 270 Local schema, in transaction processing, 274-275, 276 Local transaction processing, 278 Locking protocol, in transaction processing. 262 Loop coverage, of software testing, 205
M Machine heterogeneity parameter, in transaction processing. 267-271,284-286 Machine parameter, in transaction processing, 262-267,283-284 Make, UNM tool, 28 Management support, for CASE adoption, 89-92,97, 148, 150 Manual optimization, benchmarking run rule, 240 Marvel Strategy Language, 37-38
308
SUBJECT INDEX
Mean runs to failure, 2 13 Mean time to failure. 212-213.214-216. 219-220 Measurable model, of processes, 25, 27 Measurement process, 8, 15-16,23 Mechanism, as SEE level, 162-165 Memory. and benchmarks, 234,242-247 MERLIN, for process representation, 38-40 Method level, of information agreement, 170 Methods training, for CASE adoption, 143-145 Metrics, for process monitoring, 175 Model for database systems, 256 definition, 5 lack of, for software business, 66 of SEES, 162-165 for TP systems. 257-259 Model, of engineering processes explicit, 21-23, 27 process repsentation language properties, 30-3 1 process representation language requirements. 23-27 tools for building and browsing of, 50-5 1 tools for interpretation of, 52-54 Modeling process, 8. 16-17. 23 Module testing, of software, 201 MSL. see Marvel Strategy Language MlTF, see Mean runs to failure; Mean time to failure Multicondition coverage, of software testing, 204-205 Multimachine system, in transaction processing. 264-267 Multiperson model, of processes, 31 Multiple machines, in transaction processing. 263-264 Multiple schema. in transaction processing, 273-276 Multischema system, in transaction processing, 27 1 Multiview process modeling environment for, MVP-S, 53-54 language for, MVP-L, 45-50, 53 Mutation testing, of software, 208-209 MVP-L, see Multiview process modeling
N NASNGSFC SEL. see Software Engineering Laboratory
Natural model, of processes, 23-24, 27 Nelson reliability, of software. 221 Node-locked license, 137 Notation for process description, 174, 175 software testing and, 196 Numeric benchmark, 236 0
Objectbase, for process representation, 36 Objectives for CASE adoption plan, 97 of CASE evaluation, 129- 130 OLTP, see On-line transaction processing On-line transaction processing ACID properties and, 294 benchmarking for, 237-238 Operational profile, and software testing, 213-214, 216-217 Operational user, and CASE adoption, 147- 149 Operation latency, benchmarking, 247 Oracle problem, in software testing, 200-201 Organizational awareness, in CASE adoption, 114-1 18 Organizational structure. of Experience Factory, 70-73
P Packaged experience, and software development, 66-67, 71-74 Packaging process models, 22. 51 QIP models, 67,70 Parameters, in transaction processing. 261-280 Partial benchmark, 236 Partition testing, of software, 210. 220 Path testing, of software, 205, 206 PDCA, see Plan-Do-Check-Act cycle People, see also Team communication by, 2 1.27.50-5 1 dimension of, in organizational structure. 116 Perfect Benchmark suite, 237. 240 Performance evaluation, and benchmarking, 232 Petri nets, for process representation, 40. 42, 51
309
SUBJECT INDEX Plan-Do-Check-Act cycle Experience Factory and, 75-76,78-79 as plan-actevaluate-adjustcycle, % software process research and, 13-14 Planning CASE adoption, 93- 106 project replanning, 25-26 QIP, 70 software development projects, 56-57 Planning process, 8. 16-17, 23 Platform, host, see Host platform Platform dimension, of organizational ~tructure,115-116 Platform integration, 166 Platform training. 143 Prescriptive model, of processes. 30 Presentation integration, 52, 166 Probabilistic features, of software testing, 221-223.225 Problem solving, in Experience Factory, 73 Rocess evolution of, 57 improvement, 29-30 QIP execution, 67,68-69 as SEE level, 162-165, 173-179 of software development, 195 in transaction processing, 264,270 Rocess dimension, of organizational structure, 114-115 Rocess engineering, see Software process research Rocess integration, 52. 166 Rocessor model, and benchmarking, 247-248 Rocessor resources, and benchmarking, 249-25 1 M e s s parameter, in transaction processing, 261-262,281 Process programming, 29-30 Process representation, explicit models of, 21-23 Rocess representation language approaches to, 28-29 classification, 29-3 1 history. 28-29 as infrastructure, 21 practical needs and, 55 requirements fulfilled by, 23-27 survey, 29-50 Rocess-sensitive SEES,52-53
Product engineering, see Software process research Roduct management, in software engkming. 6 Project characterization, for QIP, 67-68 Project feedback cycle, 20 Project guidance,for software development, 22,51 Project management, in software engineering, 7 Project Organization, of Experience Factory, 71-73 Project planning, in software development, 56-57 Roof, formal, for software testing, 1% Proscriptive model, of processes, 30 Prototyping, 11
Q QFD, see Quality Function Deployment QIP, see Quality Improvement Paradigm QIPEF. see Experience Factory Quality, and plan-do-check-act cycle, 13-14 Quality assurance, in softwan engineering, 7 Quality Function Deployment, 68 Quality Improvement paradigm, see also Experience. Factory fundamental steps, 67-70 software research. 14-15,20 Quality software, see Software testing Questionnaire, organization assessment, 118
R Random testing, of software. 213-214,220 Real-world model, of processes, 23-24.27 Reasoning, and process models, 22, 27, 51 Recovery atomicity, for transaction models, 256 Recursive benchmark, 236 Relationship approach, to integrated SEES, 167-168 Relative tolerance., and benchmarking accuracy. 242 Reliability, software, see Software reliability Remote tnmact~ ‘on processing, 278 Replicated data, 263,265-267 Representation language, see Process representation language Requirements engineering, 5
310
SUBJECT INDEX
Research needs in software engineering processes. 55-57 in transaction processing systems, 287 Resources, for CASE adoption, 97, 99-103, 150
Result accuracy, and benchmarking, 241-242 Results, benchmarking run rule, 240 Reusable experience, 66-67.71-73, 123 Reuse process, 8. 17-19, 18, 23 Role. in process representation. 38-39 ROPE, see Reusable experience Run rules. for benchmarking, 239-241
explicit models for. 22-23 integrated, see Integrated software engineering environments Marvel system, 37 MVP-S and, 53-54 overview, 158-160 services offered by, 160 three-level model of, 162-164 tools for process modeling in, 52-53 Software Engineering Institute CMM. and CASE adoption, 117-1 18 CMM. and quality improvement. 75, 76-77, 80
S Sampling, and random testing of software, 213 Schema, see Data schema Second victim, of CASE adoption, 99, 149-153 SEES, see Software engineering environments SEI. see Software Engineering Institute SEL, see Software Engineering Laboratory Selection phase, of CASE adoption, 98. 107-110, 124-137 Self-checking program, 223-224 Semantic heterogeneity, 272 Semantic level, of information agreement, 169- 170 Semantics, and transaction processing, 290-291 Serializability, in transaction processing, 261, 270-27 1,290-292 Service, as SEE level, 162-165 Shadowing, for CASE evaluation, 130-132 Single-person model, of processes, 31 Site license, 136 Site parameter. in transaction processing. 276-280,286 SLANG,for process representation, 4 - 4 2 software crisis in, 19 nature of, 66 quality improvement in. 67-70 success of, 199 Software dependability theory, 220-226; see also Software testing Software engineering, computer-aided. see Computer-aided software engineering Software engineering environments. see also Computer-aided software engineering classification, 52
Software Engineering Laboratory. packaged experience in. 73-74 Software process research, 1-63 characteristics, 9- 10 future of, and practical needs, 55 future of, and research needs, 55-57 importance, 55 integrated product and process engineering, 19-27
benefits of explicit models, 21-23, 27 overview, 19-21 real-world projects, questions on, 21 representation requirements, 23-27 process engineering processes, 12-19 definition, 2,4 improvement processes, 13-15 measurement processes, 15-16 modeling processes, 16-17 planning processes, 16- 17 reuse processes, 17-19 product engineering processes, 10- 12 definition, 2,4 development process models, 11-12 life-cycle process models, 10-1 1 representation languages, 23-27, 28-50 classification, 29-3 1 existing languages, survey, 29-45 history, 28-29 MVP-L. 45-50 requirements fulfilled by, 23-27 requirements satisfaction, 3 1 software business, 2 - 4 software engineering definition, 4-5 software business and, 3 - 4 software engineering processes, 5-10 managerial processes, 6-7
SUBJECT INDEX organizationwide learning processes, 8-9 technical proce~sc~, 5-6 terminology, 56 tools for software process modeling. 50-54 for building and browsing, 50-51 for interpreting, 52-54 Software quality, see Software testing Software Quality Metrics, 68 Software reliability difficulties with, 220-221. 225 prediction of, 219-220 testing, 194, 211-217 Software requirements specification, 107, 109 Software testing, 191-229 background and terminology, 198-201 oracle problem, 200-201 terminology, 198-200 unit vs system test, 201 comparisons of methods, 217-220 reliability prediction. 219-220 subsumes relation, 218-219 cost, 224, 226 dependability, 195,220-225 definitions, 224 probable correctness, 221-222 reliability-based issues, 221, 225 self-checking programs. 223-224 testability analysis, 222-223 failure detection, 197-198,201-211 coverage, principles of, 209-21 1 coverage, suggestions for, 210-21 1 functional testing, 202-204 in software process. 21 1 structural testing, 202-203.204-209 systematic testing, 201-203 formal methods, 195-197 importance, 192-194 probabilistic features, 221-223.225 purpose, 197-198 reliability, 194, 21 1-217 MTTF. 212-213,214-216,219-220 operational profile, 213-214, 216-217 physical systems analogy, 21 1-213 random testing, 213-214.220 software reliability theory, 214-216 technical difficulties with, 220-221 software process and, 195 software quality, as absence of failure, 194-195
311
Source code modification, benchmarking run
rule, 240 SPEC, see System Performance Evaluation Corporation Special features, benchmarking run rule, 239-240 Special libraries. benchmarking run rule, 240 Specification of a program, 199.200 of test workload,232 Specification language, software testing and, 196 SPECint92 benchmarks, 242-251 SPEC92 benchmarks, 239-240 Spiral model, of software development, 11 Sponsor, CASE team member, 88.97. 101-102, 112 SQM,see Software Quality Metrics Standardization, of benchmarking, 239. 251 Standards
for CASE, 103, 133-134 for integrated SEES. 182-184, 187 future of, 187 improvements needed, 183-184 interfaces and, 183-184 shortcomings, 182-1 83 software process models as, 12 STATEMENT, for process representation. 42-45,51 Statement testing, of software, 204,218,219 Structural testing, of softwan, 202-203. 204-209 Structured analysis, for CASE adoption, 107, 108, 131 Subprocesses, and CASE adoption, 102-103, 139-141 Subsumes relation, and software testing, 2 18-219 Success, as opposite of failure, 199 Supercomputer benchmarking, 237 Support, and SEE three-level model, 164 Synchronous update, in transaction processing, 266 Synonyms, in transaction processing, 275 Syntactic heterogeneity, 272 Syntactic level, of information agreement, 169 Synthetic benchmark. 235 System design, and benchmarking, 233 System manager, and CASE adoption, 148
312
SUBJECT INDEX
System Performance Evaluation Corporation, benchmarks, 238-240, 242-251 System testing, 201
T Tailorable model, of processes, 25, 27, 74 Tailoring, of CASE tools. 133-134, 141-143, 145, 147 Taxonomy instantiations, and transaction processing, 260-280 Team, for CASE adoption evaluation by, 152 importance, 103-106 as resource, 97 roles of members, 88-89, 104-105 skills needed, 147-148 training, 103, 105-106, 110, 144-146 Team model. see Multiperson model Technical criteria, for CASE evaluation, 132-135 Technical staff support, for CASE adoption, 92-93 Technology maturation, 87-88; see also Immaturity, of CASE technology TEMPO, for process representation, 34-35 Terminal inputs and outputs, benchmarking run d e , 241 Testability analysis, and software reliability, 222-223 Testset, 198-200 Three-filter approach, to CASE evaluation, 125-127 Token, for process representation, 41-42 Tolerance. and benchmarking accuracy, 241-242 Tool, see also Computer-aided software engineering; Integrated software engineering environments comparison to service. 163 COTS, 131, 140, 141-143 features, 134-135 functionality of, 181- 182 HOTS, 141-143, 145, 147 training for use of, 145-146 Tool integration community, of process representation, 28-29 Tools dimension, of organizational smcture, 116 Tool-to-framework relationship. in SEEs, 167
Tool-to-process relationship, in SEEs. 167 Tool-to-tool relationship, in SEES, 167 Total Quality Management, 14 CASE adoption, 96 Experience Factory, 75, 76, 79 TPC, see Transaction Processing Council TP system, see Transaction processing system TQM, see Total Quality Management Trace, and benchmarking, 242-245 Traceable model, of processes, 27 Trade shows, on CASE technology, 122-123 Training for CASE adoption, 103, 105-106, 110, 143-147 components, 146-147 types, 106 Transaction model, 256 Transaction Processing Council, benchmarks, 237-238, 241 Transaction processing system, 255-296 data and transaction models, 256-257 hardware model, 257-259 as a quintuple, 258-259 research needs, 287 software model, 258-259 systems evolution, 280-287 base system, 281-282 data dimension, 282-283 machine dimension, 283-284 machine heterogeneity dimension, 284-286 open areas, 287 process parameter, 28 1 site dimension, 286 taxonomy instantiations, 260-280 base system, 260-261 data parameter, 27 1-276 machine heterogeneity parameter, 267-271 machine parameter, 262-267 process parameter, 261-262 site parameter, 276-280 terminology data parameter, 271-273 machine heterogeneity parameter, 267-269 machine parameter, 262-263 overview, 257-259 site parameter, 276-277 top-down vs bottom-up conshuction, 270
313
SUBJECT INDEX traditional Tp, beyond, 287-294 commercial software for, 291 extending traditional TP.290-293 nontraditional TP,288-290 Transaction properties, 264
U Understandable model, of processes, 26. 27 unimachine system in transaction pocessing, 261 Unique foibles dimension, of organizational structure, 116-1 17 Unit testing, of software, 201 User-locked license, 137 Utility benchmark, 236
encodings by, and benchmarking, 245 learning from, in CASE adoption, 120-122 questions for, in CASE adoption, 121- 122 standards and, 184 training from, 143-144 Verification, in software development, 6 Victim target, of CASE adoption, 98-99, 110-113, 137-153 Views for process representation languages, 45-48 virtual, in transaction V e s s i n g , 283 Virtual address trace, 244 Vision, for CASE adoption plan, 97
W V Validation, in software development, 6 in measurement processes, 16 by Quality Improvement Paradigm, 14- 15 Vendor cheating by, in benchmarking, 239. 241
Well integrated SEE, 167-168 Workstation benchmarking, 238-239 Z
Z language, 196
Contents of Volumes in This Series
Volume 1 General-Purpose Programming for Business Applications CALVIN C. GOTLIEB Numerical Weather Prediction NORMAN A. PHILLIPS The Present Status of Automatic Translation of Languages YEHOSHUA BAR-HILLEL Programming Computers to Play Games ARTHUR L. SAMUEL Machine Recognition of Spoken Words RICHARD FATEHCHAND Binary Arithmetic GEORGE W. REITWIESNER Volume 2 A Survey of Numerical Methods for Parabolic Differential Equations JIM DOUGLAS, JR. Advances in Orthonormalizing Computation PHILIP J. DAVIS AND PHILIP RABINOWITZ Microelectronics Using Electron-Beam-Activated Machining Techniques KENNETH R. SHOULDERS Recent Developments in Linear Programming SAUL I. GASS The Theory of Automata: A Survey ROBERT MCNAUGHTON Volume 3 The Computation of Satellite Orbit Trajectories SAMUEL D. CONTE Multiprogramming E. F. CODD Recent Developments in Nonlinear Programming PHILIP WOLFE Alternating Direction Implicit Methods GARRETT BIRKHOFF, RICHARD S. VARGA, AND DAVID YOUNG Combined Analog-Digital Techniques in Simulation HAROLD F. SKRAMSTAD Information Technology and the Law
REED C. LAWLOR
Volume 4 The Formulation of Data Processing Problems for Computers WILLIAM C. McGEE
All-Magnetic Circuit Techniques AND HEwm D. CRANE DAVIDR. BENNION Computer Education HOWARLI E. TOMPKINS Digital Fluid Logic Elements H. H.G L A E ~ I Multiple Computer Systems WILLIAMA. CURTIN Volume 5
The Role of Computers in Election Night Broadcasting JACKMOSHMAN Some Results of Research on Automatic Programming in Eastern Europe WUDYSLAWTURSKI A Discussion of Artificial Intelligence and Self-organization GORDON PASK Automatic Optical Design ORESTES N. STAVROUDIS Computing Roblems and Methods in X-Ray Crystallography CHARLESL. COULTER Digital Computers in Nuclear Reactor Design ELIZABETH CUTHILL An Introduction to Procedure-Oriented Languages HARRY D. HUSKEY Volume 6
Information Retrieval CLAUDEE. WALSTON Speculations Concerning the First Ultraintelligent Machine IRVINGJOHNGOOD Digital Training Devices CHARLES R. WICKMAN Number Systems and Arithmetic HARVEY L. GARNER Considerations of Man versus Machines for Space Probing P.L. BAROELLINI Data Collection and Reduction for Nuclear Particle Trace Detectors HERBERT GELENVIER Volume 7
Highly Parallel Information Processing Systems JOHNC. MURTHA Programming Language Processors RUTHM. DAVIS The Man-Machine Combination for Computer-Assisted Copy Editing WAYNEA. DANIELSON
CONTENTS OF VOLUMES IN THIS SERIES
317
Computer Aided Typesetting WILLIAM R. BOZMAN Programming Languages for Computational Linguistics ARNOLD C. SAT~ZRTHWAIT Computer Driven Displays and Their Use in MadMachhe Interaction ANDRIESVAN DAM Volume 8 Time-shared Computer Systems THOMAS N. PYKE. JR. Formula Manipulation by Computer JEANE. SAMMET Standards for Computers and Information Processing T. B. STEEL,JR. Syntactic Analysis of Natural Language NAOMISAGER Programming Languages and Computers: A Unified Metatheory R.NARASIMHAN Incremental Computation LIONELLO A. LOMBARDI Volume 9 What Next in Computer Technology? W. J. POPPELEIAUM Advances in Simulation JOHNMCLEOD Symbol Manipulation Languages PAUL W. AEIRAHAMS Legal Information Retrieval AVIEZRI S. FRAENKEL Large Scale Integration-An Appraisal L. M. SPANDOWER Aerospace Computers A. S.BUCHMAN The Distributed Processor Organization L. J. KOCZELA Volume 10 Humanism, Technology, and Language CHARLFSDECARLO Three Computer Cultures: Computer Technology, Computer Mathematics. and Computer Science PETER WEGNER
Mathematics in 1984-The Impact of Computers BRYAN THWAITES Computing from the Communication Point of View E. E. DAVID,JR.
318
CONTENTS OF VOLUMES IN THIS SERIES
Computer-Man Communication: Using Computer Graphics in the Instructional Rocess FREDERICK P. BROOKS. JR. Computers and Publishing: Writing, Editing, and Rinting ANDRIESVAN DAMAND DAVIDE. RICE A Unified Approach to Pattern Analysis ULFGRENANDER Use of Computers in Biomedical Pattern Recognition S.LEDLEY ROBERT Numerical Methods of Stress Analysis WILLIAM PRAGER Spline Approximation and Computer-Aided Design J. H. AHLBERG Logic per Track Devices D. L. SLOTNICK Volume 11 Automatic Translation of Languages Since 1960: A Linguist’s View HARRYH. JOSSELSON Classification, Relevance, and Information Retrieval D. M. JACKSON Approaches to the Machine Recognition of Conventional Speech KLAUS w. 0” Man-Machine Interaction Using Speech DAVIDR. HILL Balanced Magnetic Circuits for Logic and Memory Devices R. B. K~EBURTZAND E. E. NEWHALL Command and Control: Technology and Social Impact ANTHONYDEBONS Volume 12
Information Security in a Multi-User Computer Environment JAMES P. ANDERSON Managers, Deterministic Models, and Computers G. M. FERRERO DIROCCAFERRERA Uses of the Computer in Music Composition and Research HARRYB. LINCOLN File Organization Techniques DAVIDC. ROBERTS Systems Programming Languages F. W.TOMPA.AND A. VAN DAM R. D. BERGERON. J. D. GANNON. D. P. SHXHTER. Parametric and Nonparmtric Recognition by Computer: An Application to Leukocyte Image Processing JUDITHM. S. P R E ~ Volume 13 Programmed Control of Asynchronous Program Intempts RICHARD L. WEXFLLBLAT
CONTENTS OF VOLUMES IN THIS SERIES Poetry Generation and Analysis JAMES JOYCE
Mapping and Computers PATRICIA FULTON Practical Nautical Language Processing: The REL System as Rototype FREDERICK B. THOMPSON AND BOZENA HEmsz THOMFSON AIfificial Intelligence-The Past Decade B. CHANDRASEKARAN
Volume 14 On the Structure of Feasible Computations
J. HARTUWAND J. SIMON A Look at Programming and Programming Systems T.E. CHEATHAM. JR..AND JUDYA. TOWNLEY Parsing of General Context-Free Languages L. GRAHAM AND MICHAEL A. HARRISON SUSAN Statistical Processors w. J. POPPU~AUM Information Secure Systems DAVIDK.HSIAOAND RICHARDI. BAUU
Volume 15 Approaches to Automatic Programming ALANW. BIERMANN The Algorithm Selection Problem JOHNR. RICE Parallel Processing of Ordinary Programs DAVID J. KUCK The Computational Study of Language Acquistion LARRY H.REEKER The Wide World of Computer-Based Education DONALD BWR
Volume 16 3-D Computer Animation CHARLES A. CSURI Automatic Generation of Computer Programs NOAHS.PRYWES Perspectives in Clinical Computing KEVIN
c.O h N E AND A. HALUSKA
The Design and Development of Resource-Sharing Services in Computer Communication Networks: A Survey SANDRA A. MAMRAK Privacy Protection in Information Systems R E I N TURN
319
CONTENTS OF VOLUMES IN THIS SERIES
Volume 17 Semantics and Quantification in Natural Language Question Answering w.A. WOODS Natural Language Information Formatting: The Automatic Conversion of Texts to a Structured Data Base NAOMI SAGER Distributed Loop Computer Networks Mmo T. LIU Magnetic Bubble Memory and Logic TIENCHIC m AND Hsu CHANG Computers and the hblic's Right of Access to Government Information ALAN F. W e s m
Volume 18 Image Rocessing and Recognition Azmz ROSENFELD Recent Progress in Computer Chess MONROE M. NEWBORN Advances in Software Science M. H. I - L u s ~ D Current Trends in Computer-Assisted Instruction PATRIC SUPF'ES Software in the Soviet Union: Progress and Roblems
s. E.CbODMAN
Volume 19 Data Base Computers DAVIDK.Hsmo The Structure of Parallel Algorithms H. T. KUNG Clustering Methodologies in Exploratory Data Analysis Rtawm Dunm AND A. K.Jm Numerical Software: Science or Alchemy? C. W. GEAR Computing as Social Action: The Social Dynamics of Computing in Complex Organizations ROB KLING AND WALT SCACCHI
Volumo 20 Management Information Systems: Evolution and Status GARYW. DICKSON Real-Time Distributed Computer Systems W. R. FRANTA, E. DOUGUSJmSm. R Y.KAm. AND ~ e O R O ED.h l A R S W Architecture and Strategies for Local Networlrs: Examples and Important Systems K. J. THURBER Vector Computer Architecture and Rocessing Techniques KAI HWANG.SHUN-ho Su. AND LIOW M. NI
CONTENTS OF VOLUMES IN THIS SERIES An Overview of High-Level Languages JEANE. SAMMET
Volume 21 The Web of Computing: Computer Technology as Social Organization ROBKLma AND WALT SCACCHI Computer Design and Description Languages SUBRATA DAWUPTA Microcomputers: Applications, Problems. and Promise ROBERT C. GAMMILL Query Optimization in Distributed Data Base Systems GIOVANNI MARIASACCOAND S.Bma YAO Computers in the World of Chemistry PETER LYKOS Library Automation Systems and Networks JAMES E. RUSH Volume 22 Legal Protection of Software: A Survey MICHAEL C. GEMIONAM Algorithms for Public Key Cryptosystems: Theory and Applications S.LAKSHMNARAHAN Software Engineering Environments ANTHONYI. WASSERMAN Principles of Rule-Based Expert Systems BRUCE G.BUCHANAN AND RICHARD 0. DUDA Conceptual Representation of Medical Knowledge for Diagnosis by Computer: MDX and Related Systems AM) SANlAY MmAL B. CHANDRASEKARAN Specification and Implementationof Abstract Data Types ALFS T. B M s s AND SATISH 'I1um Volume 23 Supercomputersand VLSI: The Effect of Large-Scale Integration on Computer Architectun LAWRENCE SNYDER Information and Computation J. F. TRAUE AND H. WOZNIAKOWSKI The Mass Impact of Videogame Technology THOMAS A. DEFANTI Developments in Decision Support Systems CLYDEW. HOLSAFPLE,AND ANDREW B. WHINSTON ROBERT H. BONCZEK. Digital Control Systems PereR Dorun, AND DANIELPEIZRSEN International Developments in Information Privacy G. K. GWTA Parallel Sorting Algorithms S.LAKSHMIVARAHAN. SUDARSHANK.D w L , AND LESLIE L. MILLER
321
CONTENTS OF VOLUMES IN THIS SERIES Volume 24
Software Effort Estimation and Productivity S.D. C o r n . H. E.DUNSMORE. AND V.Y.SHEN Theoretical Issues Concerning Protection in Operating Systems MICHAEL A. HARRISON Developments in Firmware Engineering SUBRATA DASGUITAAND BRUCED. SHWR The Logic of Learning: A Basis for Pattern Recognition and for Improvement of Performance RANANB. BANERJI The Current State of Language Data Pmcessing PAULL. GARVIN Record? Advances in Information Retrieval: Where- Is That I#*&@$ DONALD H. KRAF? The Development of Computer Science Education WILLIAM F. ATCHISON
Volume 21 Accessing Knowledge through Natural Language AND GORDON MCCALLA NICKCERCONE Design Analysis and Performance Evaluation Methodologies for Database Computers STEVEN A. DEMURIIAN. DAVIDK. Hsuo. AND PAULA R. STRAWSER Partitioning of Massive/ReaLTime Programs for Parallel Processing I. LEE.N. PRYWES. AND B. SZYMANSKI Computers in High-Energy Physics MICHAEL METCALF Social Dimensions of Office Automation ABBEMOWSHOWITZ
Volumo 28 The Explicit Support of Human Reasoning in Decision Support Systems AMITAVADWA Unary Processing A. DOLLAS. J. B. GLICKMAN. AND C. O'TOOLE W. J. POPPELBAUM. Parallel Algorithms for Some Computational Problems ABHA M o m AND S.S ~ ~ A R AIYENGAR MA Multistage Interconnection Networks for Multiprocessor Systems S.C. KOTHARI Fault-Tolerant Computing WINGN.TOY Techniques and Issues in Testing and Validation of VLSI Systems H.K. REC~HBATI Software Testing and Verification LEEJ. WHITE Issues in the Development of Large, Distributed, and Reliable Software A m PRAKASH. VUAYGARG.TSUNEOYAMAURA. AND ANUPAM BHIDE C. V.RAMAMOORTHY.
CONTENTS OF VOLUMES IN THIS SERIES Volume 27 Military Information Processing JAMES STARKDRAPER MultidimensionalData Structures: Review and Outlook S. S ~ A R A M IYENGAR. A R. L. KASHYAP.V. K.VAISHNAVI. AND N. S. V. RAO Distributed Data Allocation Strategies ALAN R. HEVNER AND h U N A RAO A Reference Model for Mass Storage Systems STEFTENW. MLLER Computers in the Health Sciences KEVIN
c.O’KANE
Computer Vision AVUEL ROSENFELD SupercomputerPerformance: The Theory, Practice, and Results OLAFM. LUBECK Computer Science and Information Technology in the People’s Republic of China: The Emergence of Connectivity JOHN H. MAER Volume 28 The Structure of Design Processes SUBRATA DASGUITA Fuzzy Sets and Their Applications to Artificial Intelligence KANDELAND MORDE~HAY SCHNEIDER ABRAHAM Parallel Architecture for Database Systems A. R.HURSON. L. L. MILLER, S. H.PAKZAD. M. H. EICH.AND B. SIURAZI Optical and Optoelectronic Computing MIRMOJTABA MIRSALEHI, MUSTAFA A. G.ABUSHAGUR. AND H. JOHN CAULFIELD Management Intelligence Systems MANFRED KOCHEN
Volume 29 Models of Multilevel Computer Security JONATHAN K.MILLEN Evaluation, Description, and Invention: Paradigms for Human-ComputerInteraction JOHN M. CARROLL Protocol Engineering MINGT. LIU Computer Chess: Ten Years of Significant Regress MONROE NEWBORN Soviet Computing in the 1980s RICHARDW. JUDY AND ROBERT W. CLOUGH Volume 30 Specialized Parallel Architectures for Textual Databases L. L. MILLER.S. H. PAKZAD.AND JIA-BINGCHENG A. R.HURSON.
323
324
CONTENTS OF VOLUMES IN THIS SERIES
Database Design and Performance MARKL. GILLENSON Software Reliability ANTHONYWINO AM) JOHN D. MUSA Cryptography Based Data Security AND Yvo DESMELIT GEORGE J. DAVIDA Soviet Computing in the 1980s: A Survey of the Software and Its Applications RICHARD W. JUDYAND ROBERT W. CLOUGH
Volume 31 Command and Control Information Systems Engineering: Progress and Prospects STEPHEN J. ANDRIOLE Perceptual Models for Automatic Speech Recognition Systems RENAMDEMORI.MATHEW 1. PALAWL.AND PIEROCOSI Availability and Reliability Modeling for Computer Systems DAVIDI. HEIMANN. NITINMIITAL.AND KISHORS.TRIVEDI Molecular Computing MICHAEL CONRAD Foundations of Information Science ANTHONY DEBONS
Volume 32 Computer-Aided Logic Synthesis for VLSI Chips SABURO MUROGA Sensor-Driven Intelligent Robotics M.TRIVEDIAND CHUXINCHEN MOHAN Multidatabase Systems: An Advanced Concept in Handling Distributed Data A. R. HURSON AND M. W. BRIGHT Models of the Mind and Machine: Information Flow and Control between Humans and Computers KENTL. NORMAN Computerized Voting ROYG. SALTMAN
Volume 33 Reusable Software Components H. ZWEBEN BRUCE W. WEIDE.WILLIAM F. &DEN. AND STUART Object-Oriented Modeling and Discrete-Event Simulation BERNARD P. ZIEGLER Human-Factors Issues in Dialog Design AND MARTIN HELANDER THIAGAWAN PALANIVEL Neurocomputing Formalisms for Computational Learning and Machine Intelligence S.GULATI. J. BARHEN. AND S.S.IYENGAR
CONTENTS OF VOLUMES IN THIS SERIES Visualization in Scientific Computing D. BROWN THOMASA. DEFANTIAND MAJCINE Volume 34
An Assessment and Analysis of Software Reuse TEDJ. BIGGERSTAFF Multisensory Computer Vision AND J. K.AGGARWAL N. NANDHAKUMAR Parallel Computer Architectures RALPHDUNCAN Content-Addressable and Associative Memory AND R. JAMESDUCKWORTH LAWRENCE CHISVIN Image Database Management I. GROSKY AND RMIVMEHROTRA WILLIAM Paradigmatic Influences on Information Systems Development Methodologies: Evolution and Conceptual Advances RUDYHIRSCHHEIM AND HEINZK.KLEIN Volume 35
Conceptual and Logical Design of Relational Databases S.B. NAVATHE AND G. PERNUL Computational Approaches for Tactile Information Processing and Analysis P.GADAGKAR AND MOHAN M. TRIVEDI HRISHIKESH Object-Oriented System Development Methods ALANR. HEVNER Reverse Engineering JAMESH. CROSS11. ELLIOTJ. CHIKOFSKY. AND CHARLES H.MAY,JR. Multiprocessing CHARLES J. FLECKENSTEIN. D. H. GILL.DAVIDHEMMENDINGER. C. L.MCCREARY. JOHND. MCGREGOR.ROYP. PARGAS. ARTHUR M. RIEHL.AND VIRGILWALLE"E The Landscape of International Computing AND HSINCHUN CHEN EDWARD M. ROCHE.SEYMOUR E. GOODMAN. Volume 38
Zero Defect Software: Cleanroom Engineering D. MILLS HARLAN Role of Verification in the Software Specification Process MARVIN V.ZELKOWITZ Computer Applications in Music Composition and Research AND JEFFREY E. HASS GARYE. W ~ I C HERIC . J. ISAACSON. Artificial Neural Networks in Control Applications V. VEMURI Developments in Uncertainty-Based Information GEORGEJ. KLIR Human Factors in Human-Computer System Design MARYCAROLDAYAND SUSANJ. BOYCE
325
CONTENTS OF VOLUMES IN THIS SERIES Volume 37
Approaches to Automatic Programming RICHAND RICHARD C. WATERS CHARLES Digital Signal Rocessing STEPHEN A. DYERAND BRIANK. HARMS Neural Networks for Pattern Recognition S. C. KOTHARI AND HEEKUCK OH Experiments in Computational Heuristics and Their lessons for Software and Knowledge Engineering JURGNIEVERGELT High-level Synthesis of Digital Circuits GIOVANNI DE MICHELI Issues in Dataflow Computing BENLEEAND A. R. HURSON A Sociological History of the Neural Network Controversy MI= C U Z A l U N Volume 38
Database Security GUNTHERPERNIJL Functional Representation and Causal Processes B. CHANDRASEKARAN Computer-Based Medical Systems JOHNM. LONG Algorithm-Specific Parallel Rocessing with Linear Processor Arrays JOSEA. B. FORTES,BENJAMIN w. WAH.WEUASHANG. AND KUMAR N.GANAPATHY Information as a Commodity: Assessment of Market Value ABBEMOWSHOWKZ Volume 39
Maintenance and Evolution of Software Products VON MAYRHAUSER ANNELIESE Software Measurement: A Decision-Rocess Approach WARREN HARRISON Active Databases: Concepts and Design Support THOMAS A. MUECK Operating Systems Enhancements for Distributed Shared Memory VIRGINIA La The Social Design of Worklife with Computers and Networks: A Natural Systems Perspective Roe KLmo AND TOMJ m Volume 40
Program Understanding: Models and Experiments A. VON MAYRHAUSER AND A. M. VANS Software Prototyping ALANM. DAVIS
Rapid Prototyping of Microelectronic Systems APOSTOLOS DOLLAS AND J. D. STERLING BABCOCK Cache Coherence in Multiprocessors: A Survey MAZIN S. YOUSIF, M. J. THAZHUTHAVEETIL, AND C. R. DAS The Adequacy of Office Models CHANDRA S. AMARAVADI, JOEY F. GEORGE, OLIVIA R. LIU SHENG, AND JAY F. NUNAMAKER Volume 41 Directions in Software Process Research H. DIETER ROMBACH AND MARTIN VERLAGE The Experience Factory and Its Relationship to Other Quality Approaches VICTOR R. BASILI CASE Adoption: A Process, Not an Event JOCK A. RADER On the Necessary Conditions for the Composition of Integrated Software Engineering Environments DAVID J. CARNEY AND ALAN W. BROWN Software Quality, Software Process, and Software Testing DICK HAMLET Advances in Benchmarking Techniques: New Standards and Quantitative Metrics THOMAS CONTE AND WEN-MEI W. HWU An Evolutionary Path for Transaction Processing Systems CALTON PU, AVRAHAM LEFF, AND SHU-WIE F. CHEN