requirements
Editor: Suzanne Robertson ■ The Atlantic Systems Guild ■ [email protected]

Where Do Requirements Come From?
Neil Maiden and Alexis Gizikis
What should a product do? What qualities should it have? How do we determine these requirements for new generations of products? Requirements engineering is an activity in which a wide variety of stakeholders work together to answer these questions. But the answers do not arrive by themselves—there's a need to ask, observe, discover, and increasingly create requirements. If we want to build competitive and imaginative products that avoid re-implementing the obvious, we must make creativity part of the requirements process. Indeed, the importance of creative thinking is expected to increase over the next decade (see the sidebar). Unfortunately, most RE research and practice does not recognize this important trend. In this column, we highlight some important information sources for creativity and innovation and present some useful examples of creative RE as references to guide and inspire readers.

Creative design: Where does RE fit?
Creativity has increasingly been the focus
of interest in design. For example, IDEO, the innovative design consultancy (www.ideo.com), uses a four-stage method in which design teams try to understand the context of a new product, observe it in real-life situations, visualize it, and then evaluate and refine it.1 The company's focus is on team-led multidisciplinary creativity and innovation. One of IDEO's most important activities is to determine customer requirements for new designs. However, RE processes—such as brainstorming, simulating, and visualizing storyboard-illustrated scenarios—are second to IDEO's tried-and-tested creative design process, environment, and team structure.

Current thinking
Lemai Nguyen and colleagues have observed that RE teams restructure requirements models at critical points when they reconceptualize and solve subproblems, and that moments of sudden insight or sparked ideas trigger these points.2 Their observations support creativity theories from previous work that distinguish between four phases of creativity: preparation, incubation, illumination, and verification.3,4 Incubation and illumination are the most important phases. Incubation handles complexity; during this relaxing period, the designer unconsciously and consciously combines ideas with a freedom that denies linear and rational thought. During the subsequent and shorter illumination phase, a creative or innovative idea suddenly emerges, often at the most unlikely time in the most unlikely place. Many creative fields report this "eureka" effect.
Margaret Boden is a leading authority on cognitive creativity. She identifies three basic types of creative processes:5

■ exploratory creativity explores a possible solution space and discovers new ideas,
■ combinatorial creativity combines two or more ideas that already exist to create new ideas, and
■ transformational creativity changes the solution space to make impossible things possible.
Most RE activities are exploratory. They acquire and discover requirements and knowledge about the problem domain. Few RE practitioners have explicitly focused on combinatorial and transformational creativity in RE, although we can construe some research (such as analogical reuse of requirements)6 as creative design.
Mihaly Csikszentmihalyi identifies three social contexts that influence creativity: domain, field, and person.7 The social environment as well as cognitive processes influence creative thinking. IDEO recognizes this with its greenhouses—carefully designed office spaces that encourage creative thinking by imposing as few rules as possible.1 The authors have applied these ideas by running a series of workshops to promote creative design during RE. In each workshop, stakeholders work with design experts to envision ideas that could later provide the basis for further requirements.

Creativity: The Next Big Thing
Historically, the industrial revolution replaced agriculture as the major economic activity, and then information technology replaced industrial production. Now the Nomura Research Institute in Japan predicts that information technology will be replaced with a new dominant economic activity focusing on creativity (see http://members.ozemail.com.au/~caveman/Creative/Admin/crintro.htm). The shift of IT organizations toward the creative sector and companies striving to design innovative products that combine and use existing technologies in unanticipated ways is beginning to justify this prediction.
In the literature on creativity, we identify three models that describe the creative process's characteristics:

■ Graham Wallas's model describes creativity from an individual's point of view and defines the four cognitive phases of a creative process;1
■ Margaret Boden's definition distinguishes three types of creativity and examines the difference between creativity and novelty;2 and
■ Mihaly Csikszentmihalyi's model captures the socio-cultural characteristics of creativity by describing how it occurs in a domain of knowledge and rules.3 It outlines the transfer of knowledge between domain and individuals and the contribution of individuals to the domain. This model strongly represents the social environment required in modern problem situations.

Elsewhere, Robert Sternberg reports a thorough collection of chapters on major psychological approaches to understanding creativity.4 Teresa Amabile's research is useful for its numerous summaries.5

References
1. G. Wallas, The Art of Thought, Harcourt Brace, New York, 1926.
2. M.A. Boden, The Creative Mind, Abacus, London, 1990.
3. M. Csikszentmihalyi, Creativity: Flow and the Psychology of Discovery and Invention, Harper, London, 1997.
4. R.J. Sternberg, ed., The Nature of Creativity: Contemporary Psychological Perspectives, Cambridge Univ. Press, Cambridge, Mass., 1988.
5. T.M. Amabile, Creativity in Context: An Update of the Social Psychology of Creativity, Westview Press, Boulder, Colo., 1996.

An example
Computerized systems for managing air traffic are becoming increasingly complex, and we need new approaches to cope with future demands and to use technology effectively. We are working with Eurocontrol's Experimental Center in Paris to design and implement an innovative process for determining stakeholder requirements for a system called CORA-2 (Conflict Resolution Assistant). This system will provide computerized assistance to air traffic controllers for resolving potential conflicts between aircraft.
We have designed an RE process with three concurrent, independent activities to determine the stakeholder requirements, generate creative designs, and model the system's cognitive and work context. We keep these activities independent to avoid overconstraining the CORA-2 requirements. Instead, we trade off requirements and design ideas at predefined synchronization points. The resulting design alternatives provide the basis for a systematic scenario-driven process for acquiring and validating stakeholder and system requirements.
We encourage creative thinking by holding workshops. We design each workshop to support four essential processes (preparation, incubation, illumination, and verification) in creative thinking. To prepare stakeholders, we maintain a regular dialogue between workshops through email to encourage them to discuss creative processes and ideas from earlier workshops. In each workshop, we encourage ideas to incubate using techniques such as sponsoring presentations from experts in non–air-traffic-control domains, playing games to remove people's inhibitions, listening to music and discussing paintings, and making airplanes from balloons. Shorter, more intensive periods follow each incubation period to illuminate ideas. People work together in dynamically designed groups to generate new ideas building on the previous creative process cycle. In later workshops, we also encourage idea verification by asking stakeholders to assess, rank, and categorize new ideas in different ways. We repeat this process several times in each workshop. To guide the participants during these processes, we used Roger von Oech's explorer, artist, judge, and warrior roles to focus them on specific activities.8
One of these workshops' most prominent features is that we invite experts from domains that have similarities (not necessarily obvious at first) to aircraft conflict resolution to support idea incubation. To encourage insight into the conflict resolution domain, we invited experts in Indian textile design
and modern music composition. Similarly, to explore candidate design ideas as fully as possible, we invite experts in creative systems design, brainstorming, and information visualization. Finally, to incubate creative combinations of these design ideas, at one workshop, we invited one of London’s top fusion chefs to talk about combining unusual ingredients and to demonstrate fusion cooking in the workshop. The resulting pâté de foie gras with bruschetta and mango chili salsa was an unlikely but delicious combination. The result of these workshops has been to inspire pilots, air traffic controllers, administrators, managers, and developers to work together and create new ideas. We have learned
that by exploiting even simple creativity models, we can promote significant levels of creative thinking about requirements and candidate designs for an air traffic control system. We look forward to reporting these results more fully in the future.
We believe that RE is a highly creative process, and economic and market trends mean that it will become much more creative. Unfortunately, current work on RE does not recognize the critical role of creative thinking. We have argued that you should recognize the importance of creative thinking in RE, and we recommend useful information sources for people who wish to know more about creativity. Furthermore, we invite readers to contact us with their thoughts or experiences on the subject or to find out more about our creative RE process.

References
1. T. Kelley, The Art of Innovation, Harper Collins, New York, 2001.
2. L. Nguyen, J.M. Carroll, and P.A. Swatman, "Supporting and Monitoring the Creativity of IS Personnel during the Requirements Engineering Process," Proc. Hawaii Int'l Conf. Systems Sciences (HICSS-33), IEEE CS Press, Los Alamitos, Calif., 2000.
3. J. Hadamard, An Essay on the Psychology of Invention in the Mathematical Field, Dover, New York, 1954.
4. H. Poincaré, The Foundations of Science: Science and Hypothesis, The Value of Science, Science and Method, Univ. Press of America, Washington, D.C., 1982.
5. M.A. Boden, The Creative Mind, Abacus, London, 1990.
6. N.A.M. Maiden and A.G. Sutcliffe, "Exploiting Reusable Specifications through Analogy," Comm. ACM, vol. 34, no. 5, Apr. 1992, pp. 55–64.
7. M. Csikszentmihalyi, Creativity: Flow and the Psychology of Discovery and Invention, Harper, London, 1996.
8. R. von Oech, A Kick in the Seat of the Pants: Using Your Explorer, Artist, Judge and Warrior to Be More Creative, Harper and Row, New York, 1986.
Neil Maiden is head of the Centre for HCI Design at City University in London, where he leads multidisciplinary research in systems and requirements engineering. Contact him at the Centre for HCI Design, City Univ., Northampton Square, London EC1V 0HB; [email protected]; www-hcid.soi.city.ac.uk.

Alexis Gizikis is a research student interested in the creative aspects of systems requirements engineering and design. Contact him at the Centre for HCI Design, City Univ., Northampton Square, London EC1V 0HB; [email protected].
manager
Editor: Donald J. Reifer ■ Reifer Consultants ■ [email protected]

Engineers Will Tolerate a Lot of Abuse
Watts S. Humphrey
Every year, one in five programmers changes jobs. The waste of time and money is extraordinary. If you could prevent just one programmer from leaving, you would save your organization $50,000 to $100,000 in replacement and training costs alone. Consider these facts:

■ The Bureau of Labor Statistics estimates that, by 2006, half the programmers hired by US industry will be replacements for ones who have left the field. These are experienced programmers who have taken up some other kind of work.
■ Turnover in our field averaged 14.5 percent a year in 1997 and was increasing. My personal observations suggest that annual turnover currently runs between 15 and 30 percent. For East Coast companies, the range is 15 to 20 percent a year, with the figure growing to about 20 to 30 percent in California.
■ Turnover is expensive. The average cost to replace a single employee (any employee) usually starts at about $10,000, not including relocation, training, and lost work time. Considering all costs, it can take nearly a full year's wages to replace an employee. With the US programmer population currently standing at about 3 million, turnover is the largest avoidable expense for most software organizations—and it is rarely budgeted, planned, or managed.

Turnover is expensive now, and the costs will likely increase. To address the turnover problem, we must understand it. That is, we need to consider the principles of employee turnover (see the sidebar).

Software people aren't pulled out of organizations, they're pushed
"That's crazy," you say. "We need software people. We would never push them out." You might think so, but that's not how many managers behave. Let's consider why people actually leave organizations.
Many of us think that software people leave because they find more interesting and better-paying jobs elsewhere. It's certainly true that they might receive stock options, a big promotion, or a chance to do more exciting work, but is that really why they leave? A few people do go looking for more money, stock options, or better opportunities, but in general, that's not what happens.
Consider your own case. Could you find a better-paying job right now? Of course you could. It might take a little time to find just the right job, but most of us have known for most of our working lives that we could get a promotion and a bigger paycheck if we just looked around. So, why don't you start looking right now? If you're like most of us, your current job might not be ideal, but, on balance, its advantages outweigh the disadvantages.
More importantly, you don't want to disrupt your life by changing to a largely unknown situation that could easily be worse.
So software people leave their organizations because they started to look around. That's the first part of our principle: they were not pulled out of the organization. Now, what about being pushed? Or, to put it another way, why do software people start looking? A few anecdotal stories will shed some light on this question.

■ When our lab moved to a new building 30 miles away, one engineer told me that he had never considered leaving until a questionnaire from the personnel department asked if he were going to stay with the company—or not. His instincts told him that he was naturally going to stay. However, when he looked at the questionnaire box for "or not," he started considering alternatives. He ended up leaving.
■ In a company I'll call e-RATE, several engineers left when the business unit was sold. Management announced they would keep the facility open for at least another year, but, as several of the engineers later told me, that didn't sound like job security. So, they then started looking around and quickly found better jobs.
■ In launching a major economy drive, another company's president mandated an across-the-board 5 percent layoff—even in engineering. When the layoff rumor hit, some of the best software people got nervous and started looking around. In no time, the firm lost over 25 percent of its best engineers. Although the company certainly cut payroll costs in a hurry, contract revenue dried up, several projects failed, and it had a worse financial problem than before.
■ In yet another organization, projects were in perpetual crisis and programmers regularly worked 50- and 60-hour weeks. Leave at quitting time, and you'd be criticized. Take a Saturday with your family, and you'd likely be branded disloyal. Turnover approached 30 percent, with several people quitting programming entirely. As one engineer said, "I need a life."

These situations certainly qualify as pushing people out the door. However, while some engineers left, most stayed—that's what is really interesting. What's more, most engineers will stay in spite of a steady stream of unthinking administrative actions. The important question is, "Why would engineers stay when they receive this kind of treatment?" The answer leads to the second principle of employee turnover.

Software people will tolerate a lot of abuse
This principle is consistent with everything we have learned about change: People won't change to some new and unknown situation unless they are in considerable pain. This is true for changing jobs just as much as for changing processes, tools, or methods.
For example, the e-RATE team was formed over a year before the business unit was sold. Many of its engineers had worked together from the beginning. Although several left, they were recent additions—and all the engineers in the original group stayed. Those longtimers had planned this project, designed the system, and were getting into test. They were committed to the project and to each other. They owned this job, and they were going to see it through. When engineers are committed to the job, are working on a cohesive and supportive team, and have a manager they respect and can relate to, they will stay through the good times and the bad.

Turnover Principles
■ Software people aren't pulled out of organizations, they're pushed.
■ Software people will tolerate a lot of abuse.
■ The important turnover issues are local.

The important turnover issues are local
This brings us to the final question: What can you do about turnover? To answer this question, we must consider the third principle of turnover: Just as in politics, all the important turnover issues are local. This means that you, the engineers' immediate manager, hold the key to employee turnover. This does not mean that inconsiderate actions and uncompetitive pay and benefits won't hurt. It means that once engineers join an organization, they have made a personal commitment and will stay unless something starts
them thinking about the alternatives. Although inconsiderate administrative actions and unthinking management behavior are highly undesirable and can easily cause problems, most engineers overlook them as long as their jobs are personally rewarding and they feel committed to their teams. They are generally so focused on what they are doing that they tend to ignore other distractions. But if the project is winding down, they get pulled off for a temporary assignment, someone disrupts the team’s chemistry, job pressures disrupt family life, or they think that you don’t appreciate their work, they will start thinking about alternatives. Then they are likely to look around. Perhaps most importantly, if engineers do not find their jobs rewarding or their teams cohesive and supportive, or they do not feel comfortable working with you, almost any disruption will start them out the door. Then you will almost certainly have a turnover problem. So, what can you do? Focus on building an environment that will keep your engineers—that’s the obvious answer. Give them rewarding assignments, build cohesive and committed teams, and know what each of them is doing. Then, every week if possible, show that you appreciate their efforts. Although this seems like a tall order, it is not terribly difficult. Engineers like to be committed to their work and enjoy working on cohesive teams. Given half a chance, they will help you build the kind of environment that will keep them in the business. You just need to get them involved and provide the leadership to put the necessary changes into place.
While much of what you need to do is common sense and basic management, there are some special techniques, particularly for building cohesive and committed teams. I have studied this subject for much of my life and have described it in Managing Technical People (Addison-Wesley, 1997). My publications on the Team Software Process describe the more specific team-building and team-working methods. In fact, the data from teams that have used the TSP show that you can achieve zero turnover when you follow its principles. You can find out more about the TSP at www.sei.cmu.edu/tsp.
Watts S. Humphrey founded the Software Process Program at Carnegie Mellon's Software Engineering Institute. Contact him at the SEI, 4500 Fifth Ave., Pittsburgh, PA 15213-3890; www.sei.cmu.edu/watts-bio.html.
focus
guest editors' introduction

Benchmarking Software Organizations
David Card, Software Productivity Consortium
David Zubrow, Software Engineering Institute

We faced many challenges when organizing this special issue—in addition to the usual lack of time for such projects. The biggest challenge was defining our focus, because the term benchmarking is ambiguous. The articles submitted ranged from discussing general measurement concepts (a key tool of benchmarking) to summarizing results of software process assessments using a reference
model (such as the Capability Maturity Model framework). Although many consider the assessment of organizational practices against a reference model to be a form of benchmarking, other special issues of IEEE Software recently covered that issue ("Process Maturity," July/Aug. 2000, and "Software Estimation," Nov./Dec. 2000). We decided to focus on the two ideas historically recognized as benchmarking in other fields:

■ comparing an organization's quantitative performance with its peers (typically using an industry database), and
■ comparing the processes of cooperating organizations to identify and transfer "best practices."
Two factors seem to be driving today's high level of interest in benchmarking among software organizations. First, software acquirers and organizations considering outsourcing want to know if they are dealing with competent potential suppliers. Second,
individual organizations striving to be "best in class" must define "best" to direct their improvement efforts appropriately.
Government and industry have attempted to meaningfully compare the practices and performance of different organizations (see the NSDIR focus piece on page 53, for example). Meanwhile, the problems of and techniques for benchmarking have attracted relatively little interest from academics working in software engineering and computer science—although they often are anxious to obtain industry data for their own research purposes. Unfortunately, relatively few of our industry's practical experiences with benchmarking have been well documented. Consequently, this special issue explores what researchers and software practitioners are doing in the area of benchmarking instead of systematically surveying the topic.

Quantitative benchmarking
Many methods and tools commonly used to support quantitative benchmarking were originally developed to support cost estimation. This doesn't invalidate them for benchmarking purposes, but we need to understand their inherent limitations. Estimation typically addresses one project at a time; benchmarking attempts to characterize organizations that typically encompass multiple projects. However, we can't describe an organization's performance simply as the sum or average of its projects' performance. Organizational elements that are not part of a project might provide resources and capabilities (such as quality assurance and reuse libraries). Answering the question, "What is my organization's current level of performance?" requires considering the whole organization. Comparing projects alone gives an answer, but not the most complete one.
Quantitative benchmarking depends on the ability to assemble good historical data. "Collecting Data for Comparability: Benchmarking Software Development Productivity," by Katrina Maxwell, describes some of the trials and tribulations of an effort to establish a public repository for software engineering data. Emphasizing the need for high-quality data, Maxwell discusses the metadata and operational definitions of the data being collected and reported to the repository. In addition, she cautions that we are less likely to be skeptical and inquisitive about these issues when the data agree with our predispositions.
"Organizational Benchmarking Using the ISBSG Data Repository," by Chris Lokan, Terry Wright, Peter Hill, and Michael Stringer, illustrates the use of a semipublic data repository for benchmarking. (An industry association manages the repository.) The example described involves selecting a set of projects from the repository against which to compare an organization's projects. The authors discuss the value an organization obtains from such an exercise and explain which data are critical to making the exercise successful.
Similarly, in "What I Did Last Summer: A Software Development Benchmarking Case Study," James Heires describes a cost estimation database for benchmark projects to track changes in performance that process improvement initiatives produce. This proprietary cost-estimation database proved to be a suitable vehicle for the purpose.

Practice benchmarking
While quantitative benchmarking helps describe the level of performance achieved, it
doesn’t provide much insight into how to improve performance. Comparing engineering and management practices helps develop specific ideas and strategies for improvement. “The Benchmarking Process: One Team’s Experience,” by Sam Fogle, Carol Loulis, and Bill Neuendorf, describes an approach to conducting such a benchmarking activity to identify best practices in process asset libraries—a common infrastructure element in software process improvement programs. Although the study’s results are not available, the article describes a methodical approach that might help readers who are interested in conducting their own studies. Finally, “Using Structured Benchmarking to Fast-Track CMM Process Improvement,” by Gareth Thomas and Howard Smith, reports on the results of a less formal benchmarking study intended to identify practices and strategies to accelerate the implementation of a CMM Level 3 process. Smith describes the conduct of the benchmarking visits and results obtained. This example demonstrates how the conventional notion of benchmarking can facilitate process improvement. Common lessons The experiences reported in this special issue demonstrate that it isn’t easy to conduct a successful benchmarking exercise. Three factors are critical to success: ■
■
■
Well-defined objectives. Define what you want to learn from the benchmarking effort and keep your focus narrow. Careful planning. Align the benchmarking process with the stated objectives (ad hoc data collection often leads to random results). Cautious interpretation. Seek to uncover and understand the limitations of the data and other information collected. Don’t try to generalize results beyond the study’s scope.
Lack of consistency remains a serious obstacle to efficient and effective benchmarking. Inconsistent data make quantitative comparisons across organizations problematic, and inconsistent terminology makes comparisons of practices across organizations tedious. As the software engineering discipline matures and industry standards become more widely accepted, this obstacle should diminish.
About the Authors

David Card joined the Software Productivity Consortium in October 1997 as a researcher and consultant in software measurement and process improvement. He has led several major consulting efforts in CMM-based improvement for Consortium members. Before that, he managed several software tools development and research projects at Lockheed Martin and Software Productivity Solutions. He received an interdisciplinary BS from the American University, then performed two years of graduate study in applied statistics. He has authored more than 30 papers and a book, Measuring Software Design Quality (Prentice Hall, 1990), on software engineering topics, and he is coeditor of ISO standard 15939, Software Measurement Process. He is a senior member of the American Society for Quality and a member of the IEEE Computer Society. Contact him at [email protected].

Dave Zubrow is Team Leader for the Software Engineering Measurement and Analysis group within the SEI. His areas of expertise include empirical research methods, data analysis, and data management. Current projects include a study to assess the benefits of model-based verification, measures for product line adoption and management, and development of guidance for pilot projects using quasi-experimental design techniques. He earned his MS in public policy and management and PhD in social and decision sciences from Carnegie Mellon University. He is a member of the editorial board for the Software Quality Professional and chairs the committee on Metrics, Measurement, and Analytical Methods for the Software Division of the American Society for Quality. He also is an ASQ certified Software Quality Engineer. Contact him at [email protected].
focus
benchmarking

Collecting Data for Comparability: Benchmarking Software Development Productivity
Katrina D. Maxwell, Datamax

Collecting comparable benchmarking data is not a straightforward task. The author shares her experience, acquired over eight years, in collecting, validating, analyzing, and benchmarking software development projects.

Whether you are benchmarking an organization or simply a project, it all boils down to one thing—data. Do you have the necessary data in your company, and is that data valid and comparable? How can you access data from other organizations? To help you answer these questions and avoid some common serious mistakes in the benchmarking process, I've summarized my practical real-life experiences with software project data collection and benchmarking efforts in the following guidelines.
Planning for data collection
You can't benchmark data if you haven't collected it. Writing a questionnaire and having people complete it is not enough—you need a vision. It's similar to getting the requirement specifications right before developing software: if you learn that something is wrong with the data after collecting and analyzing it, your conclusions are meaningless, and you have to redo your work. I have wasted months trying to make sense of data collected without a clear purpose and without statistical analysis requirements in mind.
If you work for a large company, consider asking the market or operations research department to help design your benchmarking questionnaire. Software managers know about software; data analysts know about questionnaire development. Collecting the right data for your purposes might require a multifunctional team effort.
Regardless of whether the data concerns chocolate bar sales, financial indicators, or software projects, the old maxim "garbage in equals garbage out" applies uniformly. Make sure that the variable definitions and responses are clear before you collect the data. Typical questions I ask when validating software project databases include: What does a zero mean? Does it mean none, is it a missing value, or was a number close to zero rounded to zero? And if a value is missing, does that indicate no value, don't know, or, if the question involved choosing from a list, was the correct response missing? Lists that include Other as a choice are also problematic, especially when collecting data for benchmarking. For example, let's assume that your company questionnaire includes a list of case tools. The case tool used on your project does not appear in the list, so you select Other. This
category will thus include many diverse tools and will mean different things for different organizations.
When I receive a new software project database, I usually need to spend much more time understanding and validating the data than I do actually analyzing it. You can greatly reduce the risk of collecting the wrong data and the effort spent validating it if you spend more time up-front defining what variables to collect and how to measure them.
Think about how you collect data in your own company. How careful are you? Do you ensure that everyone understands the definitions? How do you ensure uniformity over the years? Has your definition of effort evolved over time? Have you always counted support staff effort and tracked management time? If the person initially in charge of collecting the data has left the company, is the current person collecting the data in exactly the same way, using the same definitions? Even assuming that you have a high-quality data collection process for estimating cost and comparing project productivity within your company, if you want to benchmark against other companies, the critical question is: Is your data comparable?

Benchmarking and data comparability
You can't benchmark software development productivity if you have not collected size and effort data for your software projects. Productivity is typically defined as output divided by the effort required to produce that output. Although not perfect, we traditionally use software size as a measure of output for software development productivity (size/effort)—for example, 0.2 function points per hour. This should not be confused with the project delivery rate, which is also sometimes referred to as productivity but is actually the reciprocal of productivity (effort/size)—five hours per function point.1 Remember to verify the units!
You can measure size in either lines of code or function points. How do you define lines of code—do you include comments, reused code, and blank lines? Additionally, a variety of function-point counting methods exist, including IFPUG, Mark II, 3D, Asset-R, Feature Points, Experience, and Cosmic.2–5 How are you going to count
them? Variation can also occur when different people do the counting—even if it's for the same project with the same function-point counting method.
Another question involves effort. Will you measure it in hours or months? If you use months, note that the definition of a work month can vary in other countries and companies. Also, will you include managerial time, support staff time, or just developer time? Will you include unpaid overtime? Did the customer provide significant effort, and will you count it? How many phases are you including—requirements specification through installation or feasibility study through testing? Effort is notoriously difficult to measure accurately, even within a company. In addition to the problems already mentioned, other sources of error include late time sheets, missing cost codes, or misallocation of time for various reasons. In a recent article,6 Martin Shepperd and Michelle Cartwright recount the experience of assisting one organization with its effort-estimating practices. The total effort data available for the same project from three different sources in the company differed in excess of 30 percent.
Needless to say, if they do not pay attention to data comparability, two companies measuring the same project can end up with different sizes and efforts. As productivity is calculated by dividing these two error-prone terms, benchmarking productivity is potentially extremely inaccurate. For example, let's assume that Companies A and B have developed the exact same insurance software application and used exactly the same effort. However, Company A uses the IFPUG 4.0 method,7 which doesn't count algorithms, and Company B uses the Experience 2.0 function point method,5 which does count them. This results in a 20 percent greater function-point count for Company B. In addition, Company B does not count the effort of installation at the customer site, whereas Company A does, and this results in a 20 percent lower effort for Company B. So, for Company A, 100 function points divided by 400 hours equals .25 function points per hour. For Company B, 120 function points divided by 320 hours equals .375 function points per hour. Because Company B divides a 20 percent larger size by a 20 percent smaller effort, it calculates its productivity as 50 percent higher than Company A.
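To make the arithmetic concrete, here is a minimal Python sketch (not from the article; the figures are the hypothetical Company A and Company B values above). It computes productivity and project delivery rate both ways, shows the 50 percent apparent difference the two measurement conventions produce, and, anticipating the advice below, also computes a probable productivity range under an assumed 20 percent error in both size and effort:

```python
def productivity(size_fp: float, effort_hours: float) -> float:
    """Productivity = output / effort, in function points per hour."""
    return size_fp / effort_hours

def project_delivery_rate(size_fp: float, effort_hours: float) -> float:
    """Project delivery rate = effort / output, in hours per function point."""
    return effort_hours / size_fp

# Hypothetical figures from the example: the same project, measured two ways.
a = productivity(size_fp=100, effort_hours=400)  # IFPUG count, installation effort included
b = productivity(size_fp=120, effort_hours=320)  # Experience count, installation effort excluded
diff = b / a - 1

print(f"Company A: {a:.3f} FP/hour, {project_delivery_rate(100, 400):.1f} hours/FP")
print(f"Company B: {b:.3f} FP/hour, {project_delivery_rate(120, 320):.1f} hours/FP")
print(f"Apparent productivity difference: {diff:.0%}")  # 50%

# A probable range for Company A, assuming up to 20% error in both size and effort.
error = 0.20
low = productivity(100 * (1 - error), 400 * (1 + error))
high = productivity(100 * (1 + error), 400 * (1 - error))
print(f"Plausible productivity range for Company A: {low:.3f} to {high:.3f} FP/hour")
```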
You can’t benchmark software development productivity if you have not collected size and effort data for your software projects.
September/October 2001
IEEE SOFTWARE
23
Productivity rates are highly variable across the software development industry.
Obviously, you need to beware of comparability errors. Unfortunately, we are less likely to ask questions and more likely to believe a result when it proves our point. If you think that comparability errors exist, rather than calculate a single productivity value, calculate a probable range of productivity values assuming an error in both terms.
If you want a dependable benchmark of software development productivity, make every effort possible to measure in exactly the same way. One way to compare your data to similar benchmarking data is to collect effort in hours by phase and staff type and to keep the detailed breakdown of the function-point count so that you can create the different effort and function-point metrics. Another way is to decide in advance which benchmarking database you want to use and to collect your data using its definitions. If benchmarking is something you plan to do on a regular basis, you should collect your data with a tool used by other companies that also want to benchmark. In addition, verify that the benchmarking database you use contains projects that the data collector has carefully validated.

Benchmarking and project comparability
Even if you are measuring productivity in exactly the same way, you must also benchmark against similar projects. It is not enough to measure a project's size and effort and compare it with a large database's average productivity. Productivity rates are highly variable across the software development industry. Business sector, requirements volatility, application language, hardware platform, case tool use, start year, and hundreds of other parameters can affect productivity. The identification and interaction of these factors make comparing productivity rates very difficult. This is why software development databases should be statistically analyzed to determine the factors that contribute most to
the specific database's productivity variation. Once you've identified the variables—or combinations of variables—that explain most of the database's productivity variation, you can limit your comparisons to projects similar to your own. For example, if you developed a project using Cobol on a mainframe, and language and platform are important factors in explaining productivity differences in the database, then you should only benchmark your productivity against other projects using Cobol on a mainframe platform. On the contrary, if your project uses case tools and using the tools does not explain the differences in productivity of the database projects, there is no point in limiting your comparisons to other projects that also use case tools.
So, verify either that the benchmarking service statistically analyzes the data and informs you of the key factors, or that it provides you with the raw data so that you can do so yourself. Also, pay attention to how many projects the benchmark is based on for each subset of data. You might consider a benchmark more reliable if it is based on 20 projects rather than four. Benchmarking against up-to-date data is also important.

Benchmarking data availability
Although many companies would like to benchmark projects, few contribute data to multicompany databases. We need data on a regular basis to keep these services up-to-date. Although large companies with well-established metrics programs, high project turnover, and data analysis competency might be content to benchmark projects internally, smaller companies do not have this option. These companies must look to benchmarking services for access to numerous recent, comparable projects. (See the "Sources of Software Project Benchmarking Data" sidebar for some useful sources.) In addition, most cost estimation tool vendors also have databases that you can use for benchmarking.
Sources of Software Project Benchmarking Data
■ Experience Benchmarking: www.datamax-france.com
■ European Space Agency/INSEAD: http://xrise.insead.fr/risenew/rise_esa.html
■ International Software Benchmarking Standards Group: www.isbsg.org.au
■ Rubin Systems: www.hrubin.com
■ Software Productivity Research: www.spr.com
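As a minimal illustration of the advice above—compare only against similar projects, and note how many projects the benchmark rests on—the following Python sketch filters a benchmark data set by language and platform and reports the median project delivery rate together with the subset size. The project records are invented for illustration and do not come from any of the repositories listed in the sidebar:

```python
from statistics import median

# Invented example records; a real repository would supply these fields.
projects = [
    {"language": "Cobol", "platform": "mainframe", "size_fp": 250, "effort_hours": 2000},
    {"language": "Cobol", "platform": "mainframe", "size_fp": 400, "effort_hours": 3600},
    {"language": "Cobol", "platform": "mainframe", "size_fp": 150, "effort_hours": 900},
    {"language": "Natural", "platform": "mainframe", "size_fp": 300, "effort_hours": 1500},
    {"language": "Java", "platform": "client/server", "size_fp": 200, "effort_hours": 1800},
]

def benchmark_pdr(data, **criteria):
    """Median project delivery rate (hours per function point) for projects
    matching the given attribute values, plus the size of the subset."""
    subset = [p for p in data if all(p.get(k) == v for k, v in criteria.items())]
    if not subset:
        return None, 0
    pdrs = [p["effort_hours"] / p["size_fp"] for p in subset]
    return median(pdrs), len(subset)

pdr, n = benchmark_pdr(projects, language="Cobol", platform="mainframe")
print(f"Median PDR for Cobol/mainframe projects: {pdr:.1f} hours/FP (based on {n} projects)")
```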
Once you have a valid and comparable software project database, your company possesses a valuable asset. In addition to benchmarking, there are many other things you can learn from your data. For example, which factors influence the productivity of projects in your company? Which variables affect software development duration? Are any of these factors within your control? How accurate an in-house cost estimation model can you build with your data? Extract the most value you can from your data collection efforts, and use this knowledge to guide and defend your future actions. Recommendations backed by hard data carry more weight with upper management.
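As one example of the kind of in-house analysis suggested here, the sketch below fits a simple log-log effort model (effort ≈ a × size^b) to a company's own completed projects using ordinary least squares. This is just one common modeling choice, not necessarily the author's; the data points are invented:

```python
import math

# Invented (size in function points, effort in hours) pairs from completed projects.
history = [(100, 900), (180, 1500), (250, 2400), (400, 3600), (600, 6100)]

# Fit log(effort) = log(a) + b * log(size) by ordinary least squares.
xs = [math.log(size) for size, _ in history]
ys = [math.log(effort) for _, effort in history]
n = len(history)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
b = sxy / sxx
a = math.exp(mean_y - b * mean_x)

print(f"Effort model: effort = {a:.1f} * size^{b:.2f}")
print(f"Estimate for a 300-FP project: {a * 300 ** b:.0f} hours")
```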
References
1. The Benchmark, Release 6, Int'l Software Benchmarking Standards Group, Australia, 2000.
2. M. Maya et al., "Measuring the Functional Size of Real-Time Software," Proc. 1998 European Software Control and Metrics Conf., Shaker Publishing BV, Maastricht, The Netherlands, 1998, pp. 191–199; www.escom.co.uk (current 26 July 2001).
3. H. Rehesaar, "Software Size: The Past and the Future," Proc. 1998 European Software Control and Metrics Conf., Shaker Publishing BV, Maastricht, The Netherlands, 1998, pp. 200–208; www.escom.co.uk (current 26 July 2001).
4. COSMIC-FFP Measurement Manual, Version 2.1, Software Eng. Management Research Laboratory, Univ. of Quebec, Montreal, 2001; www.lrgl.uqam.ca/cosmic-ffp (current 26 July 2001).
5. Laturi-System Product Manual Version 2.0, Information Technology Development Center, Helsinki, Finland, 1996.
6. M. Shepperd and M. Cartwright, "Predicting with Sparse Data," Proc. 7th Int'l Software Metrics Symposium, IEEE CS Press, Los Alamitos, Calif., 2001, pp. 28–39.
7. Function Point Counting Practices Manual, Release 4.0, Int'l Function Point Users Group, Westerville, Ohio, 1994.
About the Author
Katrina D. Maxwell is a cofounder of Datamax, a company that specializes in adding value to data. Her research interests include applied data analysis, software productivity, and effort estimation. She has taught courses in quantitative methods at the University of Illinois, INSEAD, and the ESCP Paris School of Management. She received a BS in civil engineering from the University of Illinois and a PhD in mechanical engineering from Brunel University. She was the program chair of the European Software Control and Metrics conference in 2000 and 2001 and is an IEEE Computer Society member. Contact her at 7 bis boulevard Foch, 77300 Fontainebleau, France; [email protected].
Related References
This article is based on the experience I have acquired analyzing software metrics databases. You can find the results of my software development productivity analyses in the following references.

K. Maxwell, Software Manager's Statistics, Prentice Hall PTR, Upper Saddle River, N.J., forthcoming 2002. This book leads you through all the steps necessary to extract the most value from your software project data. Four case studies, covering software development productivity, time to market, cost estimation, and software maintenance cost drivers, provide examples of statistical methods using real data.

K. Maxwell and P. Forselius, "Benchmarking Software Development Productivity," IEEE Software, vol. 17, no. 1, Jan./Feb. 2000, pp. 80–88. This article presents the results of a statistical analysis of the productivity variation of the Experience database, which consists of 206 business software projects from 26 companies in Finland. It provides productivity benchmarking equations that are useful both for estimating expected productivity at the start of a new project and for providing a benchmark for a completed project in each business sector.

K. Maxwell, "Benchmarking Software Development Productivity: Statistical Analysis by Business Sector," Project Control for 2000 and Beyond, R. Kusters et al., eds., Shaker Publishing B.V., Maastricht, The Netherlands, 1998, pp. 33–41. This article provides more details about the statistical analysis of the Experience database.

K. Maxwell, L. Van Wassenhove, and S. Dutta, "Performance Evaluation of General and Company Specific Models in Software Development Effort Estimation," Management Science, vol. 45, no. 6, June 1999, pp. 787–803. This article includes a detailed comparison of several variables, including language, application type, and seven Cocomo factors, that explain the productivity of the European Space Agency database and one company's database.

K. Maxwell, L. Van Wassenhove, and S. Dutta, "Benchmarking: The Data Contribution Dilemma," Proc. 1997 European Software Control and Metrics Conf., The ESCOM Conf., Reading, UK, 1997, pp. 82–92. This paper compares the productivity analysis results of two very different software project databases, the European Space Agency database and the Experience database, and answers the question, Should your company contribute data to multicompany databases?

K. Maxwell, L. Van Wassenhove, and S. Dutta, "Software Development Productivity of European Space, Military and Industrial Applications," IEEE Trans. Software Engineering, vol. 22, no. 10, Oct. 1996, pp. 706–718. This paper presents the results of the analysis of a European Space Agency database consisting of 99 software development projects from 37 companies in eight European countries. It also provides a comprehensive summary of prior software development productivity research publications.
focus
benchmarking

Organizational Benchmarking Using the ISBSG Data Repository
Chris Lokan, University of New South Wales
Terry Wright, Multimedia Victoria
Peter R. Hill, International Software Benchmarking Standards Group
Michael Stringer, Software Engineering Management

A software benchmarking experiment performed by the ISBSG determined whether using anonymous data provides any valuable information to an organization. An organization's completed projects were compared to similar projects in a public data repository to establish averages for the organization and the industry as a whole.

The International Software Benchmarking Standards Group1 maintains a repository of data from numerous organizations' completed software projects. The ISBSG Data Repository has many uses, including project benchmarking, best-practice networking, and summary analyses. The repository, open to the public, has provided research data on several topics, including function points structure,2,3 project duration,4 and cost estimation.5
A critical aspect of the repository is confidentiality. Each organization is represented by a code (for example, “contributed by Organization X”) so that the ISBSG can identify projects without revealing the organization itself. Codes are not available to the public. In 1999, an organization contributed a large group of enhancement projects to the repository. The contributing organization received an individual benchmark report for each project, comparing it to the most relevant projects in the repository. The ISBSG also performed an organizational benchmarking exercise that compared the organization’s set of 60 projects as a whole to the repository as a whole. The benchmarking exercise’s first aim was to provide valuable information to the organization. The second aim was to measure the benchmarking exercise’s effectiveness given the repository’s anonymous nature.
This is a case study in benchmarking using a public, multiorganization data set. What limitations or benefits did we discover from being unable to identify particular organizations and from the wide range of projects represented in the repository? What value did the benchmarked organization receive? What data is critical to benchmarking?

The ISBSG
The members of this not-for-profit international organization are the national software metrics associations from the US, United Kingdom, Australia, Italy, Germany, Japan, the Netherlands, Spain, and India. Finland and Brazil also cooperate closely with the ISBSG. The ISBSG aims to accumulate a body of knowledge about how software projects are carried out, to learn the lessons of experience through analysis and research, and to
disseminate the lessons learned about best practices in software development. The ISBSG Data Repository is central to all three goals. It contains data on 1,238 projects from over 20 countries; most projects come from the US, Australia, Canada, the UK, the Netherlands, France, and Brazil. The projects cover a wide range of applications, implementation languages, platforms, and development techniques and tools.

Data repository project attributes
The ISBSG checks project data validity as projects are added to the repository. It notes doubts about data quality and excludes questionable projects from analyses. The attributes recorded for each project include

■ project context: country, type of organization, business area, and type of development;
■ product characteristics: application type, architecture, and user base;
■ development characteristics: development language and platform, team characteristics, and techniques and tools applied;
■ function points size, effort, duration, cost, and the quality of the delivered product; and
■ qualitative factors influencing project execution: developer experience, project requirements stability, environment, and development tool suitability.
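A minimal sketch of what one repository record might look like as a data structure, using the attribute groups just listed. The field names and types here are our own illustration, not the ISBSG's actual schema:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class RepositoryProject:
    """One project record, grouped roughly as in the list above."""
    # Project context
    country: str
    organization_type: str
    business_area: str
    development_type: str            # e.g., "new development" or "enhancement"
    # Product characteristics
    application_type: str
    architecture: str
    user_base: str
    # Development characteristics
    language: str
    platform: str
    techniques: List[str] = field(default_factory=list)
    # Size, effort, duration, cost, and quality of the delivered product
    size_fp: Optional[float] = None
    effort_hours: Optional[float] = None
    duration_months: Optional[float] = None
    cost: Optional[float] = None
    defects_delivered: Optional[int] = None
    # Qualitative factors influencing project execution
    developer_experience: Optional[str] = None
    requirements_stability: Optional[str] = None
```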
The ISBSG believes that repository projects come from the top 25 percent of software industry companies mainly because the projects are all complete (this alone makes them more successful than many projects). They also come from organizations whose development processes are mature enough to include established software metrics programs. The software projects cover a broad cross-section of the software industry. They generally have a business focus.

The ISBSG benchmarking exercise
The ISBSG first analyzed the organization's enhancement projects as a group on their own, and then compared them to similar projects in the repository. The organization can then judge for itself its relative position in the industry.
Organizational Benchmarking
Benchmarking characterizes performance as a step in the pursuit of process improvement and best practice. To set performance targets, an organization must know its own performance, how it compares to other organizations, and what performance is possible. A benchmarking exercise characterizes an organization's performance, as reflected in a group of projects, based on some key indicators, such as productivity, quality, or timeliness.
One question addressed in benchmarking is "Where are we?" This question seeks to characterize an organization's performance at a given point in time on the relevant key indicators. The organization uses the established baseline to compare against future performance. A benchmarking exercise might stop there.
More often, the benchmark delves further than "Where are we?" by comparing the project to other groups. The comparison group might be the organization itself in earlier times, as the organization attempts to track performance improvement. Or it might be other organizations, as an organization positions itself against its industry sector or against the software industry in general.
The contributing organization's identity is unknown. So is its organization type (banking, insurance, government, and so on), as this was not recorded when the projects were contributed. This organization seems to be large, with a programming staff of at least 200 (on the basis of the number of concurrent projects, and the estimated average team size for each project). Because the ISBSG does not know how these 60 projects represent the organization's efforts, it cannot claim to benchmark the entire organization—the ISBSG can analyze only this particular project set. However, 60 projects generated during such a short period is a large number, even for a large organization. So, they might well represent the organization's efforts as a whole.

The data set
The organization's projects had these attributes:

■ Approximately two-thirds of the projects were transaction and production systems, approximately one-sixth were management information systems, and the rest were a mixture of various systems.
■ Approximately one-third of the projects used third-generation languages, and approximately two-thirds used fourth-generation languages.
■ Organization type, business area, development platform, and development techniques used were not recorded when the projects were submitted.
■ Project size ranged from approximately 100 to 2,000 function points, with the median at approximately 250 function points and the mean at approximately 350 function points.
■ Effort ranged from 300 to 40,000 hours. The median was approximately 3,000 hours, and the mean was approximately 6,000 hours.
■ Duration ranged from one to 12 months. The mean and median were five months.

At the time of the exercise, the ISBSG Data Repository contained 208 enhancement projects from other organizations. The projects had these attributes:

■ 66 were transaction and production systems, 53 were management information systems, and 39 were other application types; the application type was not recorded for 50 projects.
■ 82 projects were implemented with third-generation languages (mainly Cobol and PL/I), and 68 with fourth-generation languages (mainly Natural).
■ Most projects were implemented between 1995 and 1998.
■ Project size ranged from below 50 function points to approximately 3,500 function points. The median was 193 function points, and the mean was 327 function points.
■ Effort ranged from 100 hours to 37,500 hours. The median was approximately 1,400 hours, and the mean was approximately 2,800 hours.
■ Duration ranged from one to 45 months. The median was six months, and the mean was almost eight months.

The organization's enhancement projects contained a greater proportion of transaction and production systems and fourth-generation-language use than did the repository as a whole. They were larger and took more effort than average but were completed more quickly.

Results
The contributing organization provided data on project size, effort, and duration but not quality. So, this exercise's benchmarking indicators were limited to productivity and speed of delivery.

Figure 1. (a) Characterizes PDR for the contributing organization (the vertical lines show the median and mean PDR for the organization); compares (b) PDR and (c) delivery speeds for the contributing and other organizations. The graphs plot the percentage of projects against project delivery rate (hours per function point) in (a) and (b) and against speed of delivery (function points per month) in (c).

Productivity. The ISBSG measures productiv-
ity as the project delivery rate in hours per function point. Low PDR values represent high productivity. The solid line in Figure 1a shows the spread of PDRs for the contributing organization’s 60 projects. In this graph type, the ideal shape is a high, narrow spike near the left edge (a narrow spike indicates predictable productivity, and the left edge indicates high productivity). For the sake of confidentiality, the axes in Figure 1 do not give specific numbers (in case a reader happens to know or can guess the contributing organization’s identity). For this exercise, ISBSG emphasizes relative comparisons rather than specific numbers. The distribution is skewed (this is common in software engineering data). The most common value is fairly small, representing high productivity. The mean and median are somewhat higher, with the median approximately 75 percent of the mean. The graph for the contributing organization has a second small peak, representing a few projects with high project delivery rates (indicating lower productivity). Measurable project attributes explain only one-half the variation in productivity.6 The rest must be explained by other factors, such as project quality, team skills, and experience. The ISBSG asks for descriptions of project-specific factors that affect project performance. If we know that information, we can more clearly interpret the benchmark results.
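The indicators discussed in this article reduce to a few simple ratios. The Python sketch below computes PDR, the delivery speed used later in the article (function points per elapsed month), and the authors' estimated average team size (effort divided by duration divided by 152 hours per person-month, the Cocomo default they cite), then summarizes a set of projects. The project figures are invented, since the real data are confidential:

```python
from statistics import mean, median

HOURS_PER_PERSON_MONTH = 152  # Cocomo default, as assumed by the authors

def pdr(size_fp, effort_hours):
    """Project delivery rate: hours per function point (lower is better)."""
    return effort_hours / size_fp

def delivery_speed(size_fp, duration_months):
    """Function points delivered per elapsed month (higher is better)."""
    return size_fp / duration_months

def avg_team_size(effort_hours, duration_months):
    """Total effort divided by duration, in person-months of 152 hours."""
    return effort_hours / duration_months / HOURS_PER_PERSON_MONTH

# Invented enhancement projects: (size in FP, effort in hours, duration in months).
projects = [(120, 1100, 3), (250, 3000, 5), (350, 6200, 6), (900, 14000, 8)]

pdrs = [pdr(s, e) for s, e, _ in projects]
speeds = [delivery_speed(s, d) for s, _, d in projects]
teams = [avg_team_size(e, d) for _, e, d in projects]

print(f"PDR: median {median(pdrs):.1f}, mean {mean(pdrs):.1f} hours/FP")
print(f"Delivery speed: median {median(speeds):.0f} FP/month")
print(f"Average team size: {mean(teams):.1f} people")
```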
Figure 1b also shows the results of comparing PDRs for the 60 projects with the 208 projects from other organizations. At this organization's best, its productivity matches the best of the software industry: the best 10 percent of this organization's projects are as good as the best 10 percent of the comparison projects. But overall, the contributing organization's productivity in enhancement projects appears to be generally poorer than that for other organizations.

The peak in both lines in Figure 1b occurs at about the same number of hours per function point. Thus, the most common value for PDR is about the same for both the contributing organization and other organizations. However, the peak is higher for other organizations, meaning that the "most common value" is more prevalent in other organizations than in this one. A higher percentage of other organizations' projects have low PDRs. The contributing organization has fewer projects with a low PDR and more projects with a higher PDR. For other organizations, the median and mean PDR are both lower than for the contributing organization.

One explanation for this organization's poorer productivity is that its development teams are larger than average. Productivity is much better for small teams than for large teams.1,7 We estimate average team size as total hours of effort for the project divided by the project duration in calendar months, divided by the number of hours in a person-month (we assume 152 hours per person-month, the default value in the Cocomo estimation model). For the contributing organization, the estimated average team size is nearly twice the repository average.

This might not be a fair comparison. Productivity also varies in different industry sectors.1,6 But because we don't know this organization's industry sector, we cannot select projects from only its own sector for comparison. Its projects are therefore compared with projects from a range of industry sectors. This organization's overall productivity might even be relatively good, once the effects of team size and industry sector are taken into account. This cannot be analyzed with the data available, but the organization can consult Table 1 and perhaps draw its own conclusions.
Table 1. Project delivery rate and delivery speed in some industries (enhancement projects only)

Organization type                     Number of projects   Median PDR (hours per function point)   Median speed (function points per month)
Banking                                       21                    12.5                                     32
Communication                                 21                     5.2                                     55
Finance/property/business services            34                     7.0                                     32
Insurance                                     55                    12.2                                     27
Manufacturing                                 17                     5.9                                     20
Government                                    25                     5.2                                     46
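The average-team-size estimate described above reduces to two divisions; a minimal sketch, using the Cocomo default of 152 hours per person-month and effort and duration values close to the organization's medians:

```python
# Average team size = effort / duration / hours-per-person-month,
# as described in the text (152 h/person-month is the Cocomo default cited).
HOURS_PER_PERSON_MONTH = 152

def average_team_size(effort_hours: float, duration_months: float) -> float:
    person_months = effort_hours / HOURS_PER_PERSON_MONTH
    return person_months / duration_months

# Roughly the contributing organization's median project (3,000 h, 5 months):
print(f"{average_team_size(3_000, 5):.1f} people")  # about 3.9
```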
Delivery speed. For many organizations, time to market is at least as important as productivity. The ISBSG analyzes time to market by calculating delivery speed, measured as function points delivered per elapsed month. A high delivery speed indicates that project time is minimized. We noted earlier that this organization's enhancement projects are larger than the average project in the ISBSG Data Repository, yet they are delivered faster than average. That suggests a greater-than-average delivery speed, which is indeed what we see in Figure 1c. Figure 1c shows the spread of values for delivery speed for the contributing organization and for other organizations. (In this graph it is better to be closer to the right edge because high delivery speeds are desirable.) In Figure 1c, the line for this organization is clearly to the right of the other organizations' line.

Productivity and team size influence delivery speed. As we noted earlier, team sizes in this organization are larger than the repository average. The greater delivery speed reflects this. In this organization, the largest projects were delivered with the greatest speed.

Detailed comparisons. Analysis shows that the projects should be analyzed in three subsets:

Size. We used project size to split off a small subset of projects. These projects were much larger than the rest and were developed using different languages and tools. We characterized these projects for the organization and individually compared them to the others, but this group was not large enough to support discussion of averages.

Applications. The remaining projects were split into two subsets by application type:
transaction and production systems, and other systems. These two subsets did not differ in speed of delivery, but they did differ in productivity. For transaction and production systems, this organization's productivity was somewhat worse than the repository average. For other types of systems, it was somewhat better than the repository average. Once again, these comparisons might be unfair, depending on this organization's industry sector.

The only other piece of data that we could have used to split the projects into subsets was programming language. We discovered that there was no value in doing so, because it produced subsets that did not differ from each other in productivity or delivery speed.

Commercial Benchmarking Organizations
Several commercial organizations provide benchmarking products and services based on their own databases of project data. For example, Quantitative Software Management (www.qsm.com) and Software Productivity Research (www.spr.com) each have databases of over 5,000 projects; in Europe, Software Technology Transfer Finland (www.sttf.fi) has a database of 500 projects. The organization's consultants (or their software tools) select comparison projects for a benchmarking exercise from these databases. The benchmarked organization does not know the comparison projects' details because they are drawn from a proprietary database.

Value to the organization
The ISBSG provides a project benchmark report for each submitted project. This report compares the project to similar projects in the repository with regard to implementation aspects such as language, platform, development methodology, and team size. Reports of PDR ranges and quartiles determine project positioning. By looking at the organization's collection of projects, we can report distributions, identify subgroups within the project group, and analyze the subgroups separately. These are analyses that the organization could do itself, but organizations often do not have the staff or the time available to do so. We can identify strong and weak performance areas when we compare separate subgroups. These represent areas of strength on which to build and challenges for the organization to address.

The amount of data available determines what can be done. The contributing organization provided data about several projects but not many details about each one. We
were able to identify groups of similar projects and to determine their areas of comparative strength and weakness.

The ISBSG's value differs from that of a benchmarking consultant (see the "Commercial Benchmarking Organizations" sidebar). A consultant engaged by an organization would obviously know details about that organization. The consultant would be in a position to judge whether the organization's projects were representative and could use the benchmarking outcomes to make specific recommendations. But the organization would be in no position to judge the representativeness of the comparison data set. Consultants generally have their own proprietary databases, whose contents are completely unknown to the organization being benchmarked. The ISBSG cannot be as specific as a consultant, but its data is open and inexpensive.

The ISBSG's exercise is suitable as a low-cost initial determination of an organization's industry position and its comparative strengths and weaknesses. This is not the limit of the ISBSG's capabilities. On this occasion, the contributing organization did not ask specific benchmarking questions. The ISBSG therefore performed a broad analysis of two of the most interesting project indicators (productivity and speed), based on the limited data available. Normally, the organization would ask specific questions and contribute more data. The ISBSG could then provide a more definite and specific analysis of the organization's capability, although the organization still must interpret the results. Alternatively, the organization can obtain its own copy of the ISBSG Data Repository by buying the ISBSG Data Disk (see the related sidebar), with which it can perform further analysis and comparisons.

Lessons learned
The ISBSG exercise and results provided new insight into the benchmarking process: benchmarking is useful even when using anonymous and limited data.

Anonymous data
Because the contributing organization is anonymous, we do not know how representative the collection of 60 projects is of the organization as a whole. It might be a small fraction, or it might be the organization's
whole efforts for this period. In an exercise such as this, where the ISBSG performed the benchmarking, this problem cannot be avoided. If the organization does its own analyses, this is not an issue, although the organization must perform its own legwork to judge how representative its projects are.

Because the other comparison projects are also anonymous, the representativeness of the comparison data set should be closely analyzed. The ISBSG does not claim that the repository represents the whole industry; rather, it believes that the repository represents only the best software companies. Without specific knowledge of the organization, the ISBSG cannot give specific recommendations for process improvement or benchmarking targets. This is inherent in the use of anonymous data. Anonymity for its own sake has no direct benefits; its advantage is indirect: without anonymity, an open data repository could not exist, and we would not have an open source of comparison projects for benchmarking purposes.

Range of projects
The ISBSG Data Repository's broad project range means that appropriate comparison projects are available for most organizations. The large number of organizations represented in the repository means that we can make comparisons against the industry as a whole rather than just isolated sectors. (This is true of any large multiorganization data set.) The disadvantage of anonymous submission is that unless project attributes are known (thereby permitting detailed filtering of comparison projects), the comparison set might contain inappropriate projects. The broader the comparison set, the more caution is needed when interpreting the benchmark results. In the ISBSG exercise, few project attributes were known for the 60 projects being analyzed. Because of this, only broad summaries could be given, along with some specific data points against which the organization can compare itself.

The ISBSG's Data Disk
The 2001 version of the Data Disk provides data on 1,238 software projects, presented in a Microsoft Excel spreadsheet. It is a simple task for an organization to benchmark its software development operation performance against organizations with a similar profile. For example, a banking organization might select all projects with an Organization Type entry of "Banking," a Development Platform entry of "Mainframe," and a Language Type entry of "4GL." The search results could then be analyzed to provide low, median, and high groupings for productivity and speed of delivery. The analyzing organization could then compare its performance against the findings and set targets for its future performance based on the "best practice" displayed in the analysis results. Programming language, development type, application type, and several other project characteristics can also be used for comparison.
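A sketch of the kind of selection the sidebar describes, written with pandas; the file name and column headings here are assumptions for the example, not the Data Disk's actual labels:

```python
# Illustrative filtering of the Data Disk spreadsheet for a banking profile.
# File name and column names are assumed for the example.
import pandas as pd

projects = pd.read_excel("isbsg_data_disk_2001.xls")

peers = projects[
    (projects["Organization Type"] == "Banking")
    & (projects["Development Platform"] == "Mainframe")
    & (projects["Language Type"] == "4GL")
]

# Low, median, and high groupings for the two indicators of interest.
for column in ["Project Delivery Rate", "Speed of Delivery"]:
    low, med, high = peers[column].quantile([0.25, 0.50, 0.75])
    print(f"{column}: 25th {low:.1f}, median {med:.1f}, 75th {high:.1f}")
```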
What data is critical?
The most important attributes that we need to know about a project are those that have an impact on the indicators of interest (for example, factors known to influence productivity or speed of delivery). Knowing these attributes helps when choosing similar comparison projects, and it helps to explain differences in project performance.

Attributes. The ISBSG has found that the main factors that influence productivity are programming language, team size, type of organization, and application type.1 At the very least, these factors should be known in a productivity benchmark. We did not know the organization's team size or type, although we might be able to estimate team size from project effort and duration. Programming language did not matter in this instance, but the others all did.

Organization type. For an organizational benchmark, knowledge of the organization type is very important. Average PDR varies from one industry sector to another.1,6 The absence of organization type data meant that the ISBSG could only compare this organization's projects against the entire set of repository enhancement projects and provide averages for different industry sectors. If we knew the organization type, we could have identified a more focused set of comparison projects. As with productivity, delivery speed varies considerably by industry sector, as Table 1 shows.

Size and effort. We can benchmark a project
against others as long as we know its productivity, even if we don't know the size and effort from which the productivity was calculated. But it is important to know the size and effort as well, and not just the PDR value derived from them.
Comparison projects should not be too dissimilar in attributes such as project size, effort, team size, and productivity.7 Knowing the size and effort means that we can exclude projects from the comparison set that are very different in scale. For the same reason, for benchmarks of delivery speed it is important to know a project's effort and duration, and not just the delivery speed derived from them.

Knowing the project's type of development (new development or enhancement) is useful for two reasons. First, development type has a weak impact on productivity (in mainframe projects in the ISBSG repository, delivery rates are slightly better for enhancement projects than for new developments).1 More important, enhancement projects tend to be smaller than new development projects. Limiting the comparison set to enhancement projects, as the ISBSG did in this exercise, helped filter out projects of a different scale.
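A minimal sketch of that filtering step; the 0.5x-2x scale band is an illustrative choice of what counts as "too dissimilar," not an ISBSG rule:

```python
# Keep only enhancement projects of broadly similar scale to the target
# project. The 0.5x-2x band is an arbitrary illustration.
def comparison_set(candidates, target_size_fp, target_effort_hours):
    def similar(value, target):
        return 0.5 * target <= value <= 2.0 * target

    return [
        p for p in candidates
        if p["development_type"] == "enhancement"
        and similar(p["size_fp"], target_size_fp)
        and similar(p["effort_hours"], target_effort_hours)
    ]

repository = [
    {"development_type": "enhancement", "size_fp": 300, "effort_hours": 2_500},
    {"development_type": "new development", "size_fp": 280, "effort_hours": 2_600},
    {"development_type": "enhancement", "size_fp": 3_000, "effort_hours": 30_000},
]
print(comparison_set(repository, target_size_fp=250, target_effort_hours=3_000))
# Only the first project survives both filters.
```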
As the software industry continues to mature and becomes more competitive, and as organizations seek to negotiate and manage outsourced software development, benchmarking will play an increasingly important role. Benchmarking provides the information required to set targets for future performance through process improvement. This results in improved competitiveness. Benchmarking also provides the information required to set and then assess the performance of a supplier of software development or support services. Older engineering disciplines have been benchmarking for decades or centuries; software engineering is now in a position to join their ranks.
Acknowledgments
We thank the reviewers for their insightful comments and suggestions.
About the Authors

Chris Lokan is a senior lecturer at the University of New South Wales (Australian Defense Force Academy campus) in Canberra. His teaching and research focus on software engineering and software metrics. He is the principal data analyst for the ISBSG and is a member of the Australian Software Metrics Association, IEEE Computer Society, and ACM. Contact him at ADFA, Northcott Dr., Canberra, ACT 2600, Australia;
[email protected].
Terry Wright is the founder and president of the ISBSG (www.isbsg.org.au), as well as
founder and executive member of the Australian Software Metrics Association (www.asma.org.au). He works for Multimedia Victoria. In addition to his software metrics activities, he is the Director of Government Online. Contact him at 10/55 Collins St., Melbourne, Victoria 3000, Australia;
[email protected].
Peter Hill is the executive officer of the ISBSG and an Advisory Committee member of Software Engineering Australia (Vic). He was previously chairman and secretary of the Victoria branch of the Australian Computer Society and a past director of Software Engineering Australia (Vic). He is a member of the Australian Institute of Company Directors, the Australian Computer Society, and the Australian Software Metrics Association. Contact him at PO Box 127, Warrandyte, Victoria 3133, Australia;
[email protected].
Michael Stringer is principal consultant with Sage Technology, Australia, specializing in system requirements specification, project management, and software process improvement. He is also involved in the preparation of international standards for software measurement. Contact him at 4 Sadie St., Mount Waverley, Victoria 3149, Australia; [email protected].
References
1. The Benchmark, Release 6, Int'l Software Benchmarking Standards Group, Warrandyte, Australia, 2000.
2. C.J. Lokan, "An Empirical Study of the Correlations between Function Point Elements," Proc. 6th Int'l Symp. Software Metrics, IEEE CS Press, Los Alamitos, Calif., 1999, pp. 200–206.
3. C.J. Lokan, "An Empirical Analysis of Function Point Adjustment Factors," Information and Software Technology, vol. 42, no. 1, June 2000, pp. 649–659.
4. S. Oligny, P. Bourque, and A. Abran, "An Empirical Assessment of Project Duration Models in Software Engineering," Proc. 8th European Software Control and Metrics Conf. (ESCOM 97), 1997.
5. R. Jeffery, M. Ruhe, and I. Wieczorek, "A Comparative Study of Two Software Development Cost Modeling Techniques Using Multi-organizational and Company-Specific Data," Information and Software Technology, vol. 42, no. 14, Nov. 2000, pp. 1009–1016.
6. L. Briand, K. El Emam, and I. Wieczorek, A Case Study in Productivity Benchmarking: Methods and Lessons Learned, tech. report 98-08, Int'l Software Eng. Research Network, Kaiserslautern, Germany, 1998; www.iese.fhg.de/network/ISERN/pub/technical_reports/isern-98-08.pdf (current 30 July 2001).
7. K.D. Maxwell and P. Forselius, "Benchmarking Software Development Productivity," IEEE Software, vol. 17, no. 1, Jan./Feb. 2000, pp. 80–88.
focus
benchmarking
What I Did Last Summer: A Software Development Benchmarking Case Study
James T. Heires, Rockwell Collins
This article describes a vendor-supported benchmarking study of an applications development department. The study established a quantitative performance baseline of the organization and compared it to industry trends.

Rockwell Collins, a world leader in the design, production, and support of communications and aviation electronics solutions for commercial and government customers, recently funded a benchmarking study. The study aimed to improve the ability of Rockwell Collins' applications development department to deliver software solutions to its internal customers. The deliverables from this study included
current performance, recommendations for improvement, and a database containing the department's historical project performance.

The study concluded that the department as a whole produced applications cost effectively but more slowly than most other organizations in its industry. It further concluded that single-developer Web-based projects performed differently, in several ways, than the larger team-oriented non-Web-based projects. Using root cause analysis, the study identified several factors associated with software development performance.

Based on the results, the consultants, Quantitative Software Management (QSM) Associates in Pittsfield, Massachusetts, recommended enhancing the existing project management processes with requirements management, project management, staff training, and a metrics program. The department planned to implement the recommendations using the practices described
in the Software Engineering Institute's Capability Maturity Model for Software (SW-CMM).1

The department's smaller projects were characterized by motivated but less experienced developers, short schedules, highly volatile requirements and staffing, high complexity, and few standards and tools. The large projects, on the other hand, were just the opposite. What wasn't understood was the magnitude of the project performance characteristics and what effect they had on development productivity.

The company
Rockwell Collins employs 16,000 people worldwide; the applications development department has about 100 people within a 500-employee IT department (see Figure 1). The applications development department produces traditional and cutting-edge business systems to support every facet of the company's products and services, including manufacturing, accounting, marketing, and human resources.
Figure 1. Organizational context: the applications development department sits within the IT department of Rockwell Collins.
Figure 2. Projects by effort: very small (< 125 staff hours), 2 percent; small (< 375 staff hours), 3 percent; medium (< 750 staff hours), 3 percent; large (< 1,500 staff hours), 7 percent; very large (> 1,500 staff hours), 85 percent.
The study discovered that the majority (85 percent) of the department's software development effort is expended on very large projects, those requiring more than 1,500 staff hours of effort each (see Figure 2).

The applications development department employs a broad variety of technologies to bring solutions to its customers. Platforms vary greatly, from HP and IBM mainframes to client-server systems to stand-alone PCs to the Web. Across the 28 projects involved in this benchmarking study, 28 different programming languages were employed in various combinations. Some of the more popular languages include Delphi, Visual Page, HTML, Visual Basic, Lotus Notes, Access, Cobol, Pascal, Perl, PowerBuilder, VBScript, Crystal Reports, Oracle, ABAP, and SQL.
Tools and development environments were equally varied. A declining mainframe development environment was making way for a newer, but less mature, Web infrastructure. The IBM mainframe environment included such rudimentary features as production configuration control, debugging, build control, and program trace, but it was managed in a consistent, centralized fashion. Although developers complained about the weak tool set, the consistent process provided a much-needed development standard and discipline. Web development tools, on the other hand, were more modern but varied by programming language. Most development languages came with their own compiler, debugger, database, and testing tools. There were few configuration management, quality assurance, or design tools available across these environments.

The study was undertaken to help quantify the value of the department's software process improvement initiative. The department chose the SW-CMM as its improvement model because Rockwell Collins had been successfully using it in other business units. The company planned two benchmarking studies (one early and one late) to quantify the improvements. Although it is usually difficult to connect cause and effect, this dual measurement approach was selected to help quell the expected accusations that the initiative had no impact on the bottom line. The department had recently been assessed at CMM Level 1 and had just put in place a core team of process improvement specialists to help carry out the initiative.

We used QSM Associates to assist with this benchmarking study for several reasons, time being one. Because the project schedule was aggressive, there wasn't time to use internal resources. In addition, using a professional benchmarking firm lent more credibility to the study results. QSM Associates had carried out dozens of similar studies over the past 20 years and maintains a large database of historical performance statistics from various industries.

Benchmark methodology overview
Planning for the study began with the consultants and department management establishing boundary conditions, including cost, schedule, deliverables, roles, and responsibilities. These conditions helped provide some management controls.
Figure 3. Detailed schedule of benchmarking tasks.
The study’s cost consisted of consulting fees and employee time needed to carry out the study. The benchmarking was part of a larger project and was represented by a series of tasks in the project schedule (see Figure 3). The deliverables the consultants provided included a kick-off presentation, a set of findings and recommendations, a final presentation, and a database of our own historical project data. In addition, the consultants demonstrated how to leverage the database to better estimate new projects. They also established roles and responsibilities to minimize the study’s duration and management oversight required. The project manager planned the study, selected and negotiated with the consultants, identified target projects and their key personnel, ensured the study performed to plan, and provided logistics support. The consultants provided preparatory materials, manpower to carry out the data collection interviews, data validation and statistical analysis of the data, and kick-off and final presentations. Key project team members from participating projects gathered project data, attended data collection interviews, and supported post-interview data validation activities.2 The department generated a list of approximately 75 recently completed projects to be included as part of the benchmarking study, then narrowed that list down to 30 to 50 projects. To perform a reasonable quantitative analysis, statisticians recommend a sample size of at least 30 data points. Trends are easier to determine and outlying data points have a reduced effect on the analysis results. The consultants knew in advance that some candidate projects would not be on the final list. They disqualified projects for several reasons, including incomplete or unobtainable core metrics, unavailable knowledgeable project team members, and prematurely cancelled projects. Because the projects with complete data
were more likely to be chosen, they were also more likely to exhibit more discipline and therefore higher productivity. However, it was also likely that other organizations represented in the reference database had a similarly biased distribution. Who, after all, would submit mediocre project data to be included in an industry-wide database? Thus, the comparison between the department’s performance and that represented by the industry reference database is appropriate. The consultants provided preparation materials—data collection forms, metrics definitions, and guidelines. Next, the benchmark project manager asked the other PMs and developers to gather preliminary data on their projects. The requested data included the Software Engineering Institute’s core measures of size, time, effort, and defects.3 Although several PMs expressed concern about participating in the study, an early decision to explicitly fund the study helped convince some project teams to participate. Other project teams exhibited stereotypical shyness about having their project performance measured. In my experience, this is quite common in lower-maturity organizations that have less experience with software project measurement or less understanding of its benefits. The on-site benchmarking study began with a brief kick-off presentation given by the consultants to invigorate management and set the participants’ expectations for the remainder of the study. The consultants explained the process as well as the deliverables and their value.
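A sketch of what one project's prepared data might look like as a record: the SEI core measures plus a few qualitative and demographic items. The field names and example values are illustrative, not the study's actual data:

```python
# Illustrative per-project record: the SEI core measures (size, time, effort,
# defects) plus qualitative and demographic fields. Field names are invented.
from dataclasses import dataclass, field

@dataclass
class ProjectRecord:
    name: str
    size_sloc: int              # size
    duration_months: float      # time
    effort_hours: float         # effort
    post_delivery_defects: int  # defects
    languages: list[str] = field(default_factory=list)
    complexity: str = ""        # qualitative: application complexity
    team_experience: str = ""   # qualitative: team skills and experience

example = ProjectRecord("intranet reporting", 12_000, 6, 1_900, 14, ["HTML", "Perl"])
```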
Data collection
Immediately following the kick-off meeting, three solid days of data collection began. A 90-minute time slot was allocated to each project, giving each PM a chance to tell his or her story. The PM and one or two key developers from the project brought their prepared project data to the data collection
interview. A consultant and a process improvement specialist also attended.

For each project, data collection began with SEI core measures. These quantitative measures included the project's size, time, effort, and defects. Additional qualitative and demographic data were also collected: application complexity, team skills, development environment, and technology used. For a Level 1 organization like us, this data was difficult to gather. Some projects did not establish a separate time-charging mechanism to allow measuring effort separately for each project. Other projects did not know when the project began because of numerous false starts.

To facilitate gathering this critical data, a simple staffing profile was produced early in the interview (see Figure 4). Drawing a monthly (or weekly) timeline showing the number of full-time-equivalent people working on the project helped the team remember what happened. Different colors indicated each phase, illustrating any phase overlap. This simple chart helped us determine phase-specific effort and duration information.

Figure 4. Staffing profile by phase: number of full-time-equivalent workers per month (months 1 through 12) for the functional design and main build phases.

In a perfect world, each PM would have arrived at the interview with a complete set of data to enter into the database. In reality, however, this was a painful and slow process, requiring several passes. More than half of the projects on the initial list had to be dropped from the study because of incomplete or inaccurate data. One issue that plagued data collection on several projects was PM staff turnover. Some projects ratcheted through two or three PMs during their six-month life cycle. The lack of a data collection infrastructure was also a cause of "project amnesia," whereby the entire project legacy left with the departing PM.
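A small sketch of how such a staffing profile can be turned into phase-level duration and effort; the monthly FTE numbers are invented, and the 152 hours-per-person-month conversion is an assumption borrowed from the preceding article:

```python
# Derive phase duration (months with any staffing) and effort from a monthly
# full-time-equivalent (FTE) staffing profile. Values are invented.
HOURS_PER_PERSON_MONTH = 152  # assumed conversion factor

staffing = {
    # phase -> FTEs in calendar months 1..6 (phases may overlap)
    "functional design": [1, 2, 2, 0, 0, 0],
    "main build":        [0, 1, 3, 4, 4, 2],
}

for phase, ftes in staffing.items():
    duration_months = sum(1 for f in ftes if f > 0)
    effort_hours = sum(ftes) * HOURS_PER_PERSON_MONTH
    print(f"{phase}: {duration_months} months, {effort_hours} hours")
# functional design: 3 months, 760 hours; main build: 5 months, 2128 hours
```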
The QSM data structure
QSM Incorporated, a sister organization to QSM Associates, has a well-established data structure and associated analysis methodology that is employed at benchmarking engagements. The data structure originated with an empirical study carried out by QSM researcher and chief scientist Larry Putnam.4 Putnam's study established a software equation that defined a metric called the Productivity Index in terms of size, time, and effort. This macro-level metric is primarily used to empirically determine a project's, or an organization's, relative development efficiency. The QSM data structure includes the SEI's recommended core measures as well as post-delivery defects, planned duration, planned effort, the number of requirements, the languages used, and demographic data (for example, project team members, application type, and tools used).

The consultants needed to carry out some simple transformations in order to use the QSM data structure. Size was the most difficult conversion. The two most popular size measures, source lines of code (SLOC) and function points, can be used directly in any combination in the QSM data structure. Because we did not have individuals trained in function point counting, we relied on physical measures such as modules, programs, and SLOC as our fundamental size measures. Line-based languages (for example, Pascal, ABAP, and HTML) could be counted directly, while other languages were converted to SLOC. Languages with means of expression other than lines of code (for example, Visual Basic, IMG, Crystal Reports, and Lotus Notes) were converted to SLOC.
Several QSM-provided gearing factors enabled conversion from these native measures to SLOC. Gearing factors are expressed as a range of values. For example, one frequently used language is Crystal Reports. These reports are made up of fields, which, on average, tend to be between 2 and 10 SLOC each, depending on complexity. Although QSM derived its gearing factors empirically through benchmarking studies such as this one, converting size units reduces the size measure's precision.

Using function points was an option for this study, and we used them on a few projects as data validation. However, function points still must be converted to SLOC in order to use QSM's data model. Furthermore, it would have been difficult to use function points at this stage of our maturity because of our lack of expertise in that area. In retrospect, SLOC was the best unit of size for this situation.
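A sketch of the gearing-factor conversion; only the Crystal Reports range (2 to 10 SLOC per field) comes from the text, and the other factors shown are invented placeholders rather than QSM's published values:

```python
# Convert native size measures to a SLOC range using gearing factors.
# Only the Crystal Reports factor is from the article; the rest are made up.
GEARING_FACTORS = {
    "crystal_reports_field": (2, 10),
    "lotus_notes_form": (15, 40),      # hypothetical
    "visual_basic_control": (5, 20),   # hypothetical
}

def sloc_range(counts: dict) -> tuple:
    """counts maps a native unit to how many of that unit the project has."""
    low = sum(n * GEARING_FACTORS[unit][0] for unit, n in counts.items())
    high = sum(n * GEARING_FACTORS[unit][1] for unit, n in counts.items())
    return low, high

print(sloc_range({"crystal_reports_field": 120}))  # (240, 1200)
```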
Another adjustment required to conform to QSM's data structure involved the development phase definitions. Data is requested by phase of development: feasibility study, functional design, main build (detailed design through delivery), and maintenance. If a project used life-cycle models such as the rapid-prototyping or evolutionary paradigms, the data had to be reconfigured to match the QSM phases. This was not, however, a difficult task. It consisted of combining effort and schedule information from the project data into the development phases defined by QSM.

Although the SEI and other organizations have tried to foster standard measurement definitions, such definitions must be made more precise and more widely used before the industry truly benefits. Core measures need standard definitions and empirical studies to help establish their use in business. One problematic example is the inadequate definition of SLOC. The Software Productivity Consortium has one of the only documented definitions.5 Unfortunately, the SPC definition is quite dated and does not adequately define common relationships such as the number of changed lines of code. With standards in place, tools would evolve to support the standard.

All project data found its way into QSM's data collection and analysis tool, Slim-Metrics.6 This database was instrumental in ensuring that project data was collected in a consistent fashion, and it greatly facilitated the analysis portion of the benchmarking study.

The most important step in the process is validating the collected data for correctness and consistency. Without valid data, the analysis is of little value. The consultants carried out this step, but the project team members supported it. As the data was collected, and when analysis began, issues about the data's validity arose. As these issues were clarified, confidence in the data's validity improved. The validation step was completed only after the data was scrutinized and cross-checked. The consultants then had a good, quantitative understanding of what happened on the project.

Data analysis
The benchmarking study gathered and validated data from 28 projects. The data was stratified in multiple dimensions to uncover strengths and weaknesses. Root cause analysis was the basic method used, but various statistical, trend, and demographic analyses were also part of this activity. Root cause analysis relies heavily on the experience of the analyst and attempts to find causes that explain the organization's behavior. Positive and negative causes were sought to deliver a balanced recommendation. Various analytical tools were used to find the root causes, including stratification, scatter plots, Pareto diagrams, and histograms.

The analysis becomes the department's legacy, so it should be representative. It explains the quality, efficiency, size, and cost of the organization's software development projects. The analysis is the basis of a deliverable, consisting of an annotated presentation and a database of project data, that is shared with the organization. The database is later used to support further analysis, project estimates, and tracking activities.
The findings
The response to the benchmarking study was positive and immediate. Developers and senior management alike resonated with the study's implications. Everyone involved felt the study accurately characterized the software department's development capability. Although the study didn't reveal any surprises, it did quantify our behavior. The applications development department expends
much less effort and staffing than the industry average on its projects, compared to like-sized projects in QSM's IT database. Consequently, project duration is generally longer than average. The Productivity Index of the 28 projects followed the industry trend line but showed much more volatility.

We decided to stratify the data to identify the performance differences of various project types. Stratification proved to be the most revealing part of the analysis. When completed, two significant factors emerged: team size and Web technology. Compared to non-Web-based team development projects, Web-based single-developer projects exhibited the following positive and negative differences. The positives were

■ 10 percent more efficiency (as measured by the Productivity Index),
■ 64 percent lower cost,
■ 45 percent shorter duration,
■ 47 percent higher field reliability, and
■ 77 percent smaller team size.

The negatives were

■ 7.5 percentage points more scope growth,
■ 19 percent more staff turnover,
■ 65 percent less experience on the application being developed,
■ 53 percent more logic and 28 percent greater logic complexity,
■ 11 percent more customer issues, and
■ 24 percent lower development tool capability.
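The stratified comparison boils down to comparing a summary statistic for each group; a minimal sketch with invented numbers:

```python
# Express the difference between the Web-based single-developer group and the
# team-based group as a percentage of the team-based group. Data is invented.
from statistics import median

def percent_difference(web_group, team_group):
    web, team = median(web_group), median(team_group)
    return 100.0 * (web - team) / team

web_costs = [14_000, 15_000, 13_000]   # invented project costs (dollars)
team_costs = [40_000, 35_000, 52_000]
print(f"{percent_difference(web_costs, team_costs):.0f}%")  # -65%, i.e. lower cost
```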
Some of the benefits of benchmarking your own organization are enhanced software estimates, better project tracking, and proof of the value of improvement initiatives. The vast majority of software estimation models were developed with the use of empirical data collected from hundreds or thousands of actual projects. However, all estimation models should be calibrated to an organization's own software development behavior before being used to forecast a new project. The difference between a calibrated and an uncalibrated estimation tool is
immense. A properly calibrated estimation model generates more accurate results than an uncalibrated one.7 That being said, calibration is best carried out only after a thorough quantitative study. Using a calibrated estimation model serves as a sanity check against the estimate. The historical data helps answer the all-important question, "compared to what?" Forty-eight percent of the projects in the study used various Web-based technologies. If an estimate of a new Web site were needed, it might make sense to use an estimation model calibrated with these projects and to compare the results with the historical project data.

Benchmarking data can also enhance tracking and oversight of in-flight projects. If, for example, your project is entering the testing phase but the original deadline has already elapsed and the customer is asking for an estimated delivery date, benchmarking data can help. Using a defect discovery rate for your project and a calibrated forecasting model, you can make this task much easier. Perhaps more importantly, you can avoid the embarrassment of delivering a defect-laden product too early.

Evidence of the benefits of improvement can often be difficult to demonstrate. Many times, however, historical data and a few statistical techniques can prove that value has been delivered as a result of an improvement initiative.8 Although this initiative is not yet complete, the existence of this benchmarking study will make the post-improvement analysis easier.

The department benefited from this study in many ways. Because the company was just starting its process improvement journey, the most immediate benefit received was the establishment of a performance baseline for the company from which it could measure improvements. Another benefit was the ability to compare our performance with industry trends. This comparison let us determine how competitive we are in our industry. In addition, the final analysis listed recommendations that included processes to improve the practices of requirements management, project planning, staff training, and metrics collection. These recommendations were combined with the goals of the SW-CMM initiative to constitute an action plan for improvement. The action plan was
a critical step for the department, which had never before based an improvement plan on actual project performance data. Next summer, the company plans to repeat the study and compare the two data sets. This comparison will clearly show how much improvement the company realized over the duration.
About the Author

James T. Heires is the principal project manager at Rockwell Collins. His professional experiences include design of electronic flight instrumentation systems, engine indicator and crew alerting systems, flight management systems, and consumer electronics. He is working to improve the state of the practice of project management through parametric cost estimation and quantitative tracking techniques. Contact him at
[email protected].
References
1. M.C. Paulk, C.V. Weber, and B. Curtis, The Capability Maturity Model: Guidelines for Improving the Software Process, SEI Series in Software Engineering, Addison-Wesley, Reading, Mass., 1995.
2. IT Organization, Benchmark Thyself, Cutter Consortium, Arlington, Mass., 2000, pp. 29–31 and 67–71.
3. A. Carleton et al., Software Measurement for DOD Systems: Recommendations for Initial Core Measures, tech. report CMU/SEI-92-TR-019, ADA 258 305, Software Eng. Inst., Carnegie Mellon Univ., Pittsburgh, 1992.
4. L.H. Putnam, "A General Empirical Solution to the Macro Software Sizing and Estimating Problem," IEEE Trans. Software Eng., vol. SE-4, no. 4, 1978, pp. 345–361.
5. R. Cruickshank and J. Gaffney, Code Counting Rules and Category Definitions/Relationships, Version 02.00.04, Software Productivity Consortium, Herndon, Va., Apr. 1991.
6. J.T. Heires, "Measuring QSM's Repository and Analysis Tool," Application Development Trends Magazine, vol. 5, no. 6, 1998; www.ADTmag.com/Pub/jun98/pr601-2.htm (current 15 Aug. 2001).
7. B. Boehm et al., Software Cost Estimation with COCOMO II, Prentice Hall, Upper Saddle River, N.J., 2000, pp. 150–151.
8. J.T. Heires, "The High Technology Historian: Historical Data Analysis," IT Metrics Strategies, vol. VI, no. 8, Aug. 2000.
focus
benchmarking
The Benchmarking Process: One Team's Experience
Sam Fogle, Carol Loulis, and Bill Neuendorf, Software Productivity Consortium
Recently, the Software Productivity Consortium conducted a benchmarking study of process asset libraries. The report provided a significant set of ideas for organizations seeking to create new PALs or improve existing ones. Even the participants with the most "best practices" found gems among other participants' best practices for use in their continuous improvement efforts. This article discusses why we chose a benchmarking approach, the process we used in conducting this study, and key lessons we learned for future studies.
This article argues that a formal benchmarking study will yield the best practices of measurable value for your organization.
Why a benchmarking approach
The Consortium conducts research; develops processes, methods, and tools; and performs a variety of consulting services for about 60 supporting member companies. Our members include companies from aerospace, defense, finance, commercial services, government contracting, telecommunications, management consulting, commercial software, and manufacturing. In the past, several member companies asked the Consortium to provide specific guidance on how to establish or improve a process asset library. A process asset library is a repository containing process documents, tailoring guidelines, templates, tools, examples, and other useful process information for a software or systems development organization. The use of a PAL is a required practice for
achieving Level 3 of the CMM and the CMMI.1,2 A well-designed, well-maintained PAL facilitates standardization and process improvement in organizations and is a key enabler for achieving higher capability maturity. But a poorly designed and maintained PAL degenerates into an attic of process clutter that discourages and frustrates users. The Consortium thus set out to identify and document what makes a PAL a vital, well-used source of guidance to an organization.

When we began our research on PALs, we found few published materials in the literature. So we considered other approaches. Ideally, we wanted to interview people in a handful of companies with excellent PALs and consolidate their good ideas. However, our dilemma was this: How do we know which companies have the best PALs? Do we rely on word of mouth? Ask for volunteers? Assume that a high CMM maturity
rating was a guarantee of a great PAL? This dilemma led us to consider a formal benchmarking study. If conducted properly, we believed, a formal benchmarking study would ensure that we got advice from the groups with the best process asset libraries and would yield best practices of measurable value. Because we conducted this benchmarking study on behalf of the entire Consortium membership, we did not use the practices of a specific company as the baseline for comparison. Instead, we selected a generic benchmarking framework as the baseline.

Benchmarking process
The process we used to conduct the benchmarking study was largely based on the Consortium's training course An Introduction to Process Benchmarking.3 We included six phases of a benchmarking project: project initiation, planning, benchmarking partner identification, data collection, data analysis, and reporting. While this article discusses all six phases, it emphasizes identification of benchmarking partners, data collection, and data analysis.

Project initiation phase
This phase involves identifying the benchmarking study's team leader, sponsors, and initial stakeholders, and approving the study's business requirements, objectives, and scope. In the PAL study, our member companies had the business requirement of successfully establishing organizational processes and achieving higher levels of process maturity against software and systems process capability maturity models. The benchmarking objectives included exploring how PALs are organized, what type of content they include, how they are implemented, and what processes are associated with managing and controlling them. The study's desired scope was to include benchmarking partners from inside and outside the Consortium membership. The scope also placed constraints on the study's budget and schedule.

Planning phase
The next phase involves selecting and training the benchmarking team members, identifying or establishing a benchmarking framework, developing the necessary infrastructure for conducting the study, and
devising a detailed project plan and having it reviewed and approved by the sponsor and key stakeholders before moving forward.

Benchmarking team. The key to forming a good benchmarking team is selecting individuals with the appropriate skills, not just the nearest warm body. The team as a whole must possess solid knowledge of the subject domain, understanding of the benchmarking process, ability to organize and analyze data, and ability to listen, question, and comprehend. Team members must be respected in the organization and have the willingness and time to commit to the study. Unless the organization is well experienced in benchmarking, team training in the benchmarking approach they'll use is a good idea.

This PAL study's benchmarking team consisted of four Consortium employees whose time commitment ranged from 10 to 50 percent. The team's combined skills included knowledge of the PAL subject area, benchmarking, project management, technical research, and data organization and analysis. We did not formally train the entire team in the benchmarking approach, although informal just-in-time training occurred as the study progressed.
Benchmarking framework. A benchmarking framework establishes the topic areas that the benchmarking partners will study and compare. As the benchmarking study progresses, the benchmarking team will add details for comparison to the framework's specific topic areas. For the PAL study, the topic areas in the benchmarking framework included

■ key characteristics of the organization,
■ human resources to support a PAL,
■ purposes for which the organization uses its PAL,
■ structure of the PAL,
■ types of content in the PAL,
■ PAL owners and users,
■ user interface with the PAL,
■ configuration management practices,
■ security practices,
■ measurement practices,
■ tools and hardware to support a PAL,
■ training and communication practices, and
■ enablers for a successful PAL.
Infrastructure. Strong organization is vital to a benchmarking project. Very quickly, the benchmarking team must properly categorize a tremendous volume of data and store it for future retrieval and analysis. An infrastructure should be established to accommodate the data before the need arises. In the PAL study, the infrastructure included a database for all contact information, status, activity logs, and benchmarking data, and file structures for keeping track of all data artifacts, both hardcopy and softcopy, received from participants in the study.

Detailed project plan. Similar to any well-managed project, a thorough project plan is essential. It should include the project's scope and requirements; the process to be used; the benchmarking framework's topic areas; estimates for cost, schedule, and labor; staffing plans; and key milestones, assumptions, dependencies, constraints, and risks.

Benchmarking partner identification phase
Here, we identify candidate partners, develop screening instruments, request that candidate partners complete the screening instruments, evaluate data from the completed screening instruments, and select the benchmarking partners. Identifying candidate partners is comparable to shoveling as much raw material as possible onto a sieve, then shaking the sieve until a reasonable number of potential gems remain.

Candidate partner identification. Identifying candidate partners requires extensive networking and research. Some resources to consider include organizations, such as the American Productivity and Quality Center (www.apqc.org), that maintain clearinghouses of benchmarking organizations. Electronic information databases are other excellent sources for conducting research. You should establish criteria for what you desire in a candidate partner, but avoid being overly selective at this early stage.4

In the PAL study, the initial place to look for candidate partners was right here: the Consortium membership itself. Further leads came through professional networking and research into databases and Web sites. We contacted 75 organizations and conducted a short phone survey to determine their qualifications for participating in the study. We
wanted candidates that had a PAL for at least two years, were interested in participating, and could invest time and effort in the benchmarking study. Of the 75 organizations contacted, 29 met these criteria.

Developing the screening instruments. The screening instruments are meant to help select benchmarking partners by screening out those candidates that do not meet your criteria. Early on, avoid stringent criteria that might rule out some candidates arbitrarily when the candidates actually have valuable practices to bring to the benchmarking table. Also, provide more than one type of screening instrument to approach candidate viability from different angles. As you develop screening instruments, consider how they will be scored. If certain questions will produce answers that you cannot easily measure, consider changing the questions.

For the PAL study, we developed two screening instruments: a detailed technical questionnaire and a PAL user satisfaction survey. The detailed technical questionnaire aimed to determine the types of practices (grouped according to the topic areas in the benchmarking framework) that the candidate partner organization performs. The entire detailed technical questionnaire had over 100 questions, some requiring short explanatory responses, but most needing only a yes or no response. A knowledgeable person in the responding organization would require about one to two hours to complete it. Questions in the questionnaire's Measurement topic area included

■ Are measurements taken on how often the PAL is accessed over a period of time?
■ Are measurements taken on PAL performance, such as response time, access time, and so forth?
■ Are measurements taken on the PAL's overall success or benefit?
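One simple way to picture such an instrument is to group the yes/no questions by framework topic area; the three Measurement questions below are quoted from the questionnaire, while the container itself is only an illustration:

```python
# Illustrative structure for the detailed technical questionnaire:
# framework topic area -> list of yes/no questions.
questionnaire = {
    "Measurement practices": [
        "Are measurements taken on how often the PAL is accessed over a period of time?",
        "Are measurements taken on PAL performance, such as response time, access time, and so forth?",
        "Are measurements taken on the PAL's overall success or benefit?",
    ],
    # ...the other 12 topic areas would hold their own questions...
}
print(sum(len(qs) for qs in questionnaire.values()), "questions loaded")
```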
The second instrument, the PAL user satisfaction survey, helped us determine what the users in the responding organization think of their organization's PAL. Low user satisfaction scores, we reasoned, would steer us away from organizations with PALs that were not particularly effective. Figure 1 shows the survey.
Figure 1. PAL user satisfaction survey.
Process asset library user satisfaction survey Using the following scale, please enter a number in the RESPONSE column to indicate your level of agreement with each of the statements below: Strongly disagree 1
Disagree 2
Neutral 3
Agree 4
Strongly agree 5
Don't know 6
My organization’s process asset library: Helps new projects start up Enhances standardization across the organization Increases awareness of best practices in the organization Is used regularly by managers and team leaders Is used regularly by technical members of the organization Is used regularly by people involved in process improvement Is used by most projects in my organization Has an increasing number of content items Contains no obsolete or useless content items Is well organized (I can find what I’m looking for readily.) Is easy to use (I can use it without a lot of help.) Has acceptable response time (I don’t get frustrated by the response time.) Makes good use of technology available to my organization Processes related to the use, management, and control of my organization’s process asset library: Exist Are well communicated to those in the organization with the need to know Are easy to follow Are used consistently in my organization Work well Overall, I believe that my organization’s process asset library: Enhances process improvement in my organization Helps me to do my job better I am active in my organization’s process improvement group.
Administering the screening instruments. Before distributing the screening instruments, ensure that the instructions for completing them are clear and that any associated tools are thoroughly tested. Once you have identified candidate partners, they have agreed to cooperate, and the screening instruments are ready, it is time to begin the screening process itself. Distribute the screening instruments and give the candidates ample time to return the completed instruments. In the PAL study, we asked the 29 qualified organizations to complete the detailed technical questionnaire. We also asked them to administer the user satisfaction survey to at least 10 people who represent the entire spectrum of their PAL users. Fourteen organizations completed and returned both screening instruments. Two others returned the detailed technical questionnaire but withheld the user satisfaction survey. We eliminated these two from consideration because we needed equivalent data from all benchmarking partners. The organizations that failed to return the required data within an ample window of time in effect eliminated themselves.

Evaluating screening instrument data. If
you establish scoring rules when you develop the screening instruments, evaluating the data from the screening instruments is simple. Otherwise, evaluating the data will be difficult and time consuming. At this point, you want to determine which candidates are strong overall and which might be especially strong in particular areas of the benchmarking framework. Ranking the total scores will indicate which candidate partners are strong overall. Examining the scores in each framework area might identify candidates with a medium overall score but the highest score in a particular area, indicating that they might have the best practices in that area. In the PAL study, scoring the user satisfaction survey was easy because it was based on quantitative responses to the survey statements. But the detailed technical questionnaire was a challenge because many responses required judgment to score. In retrospect, wording the questions differently could have minimized this problem. Once the scoring rules were resolved, however, determining and normalizing the scores was straightforward.

Figure 2 shows the normalized scores of the detailed technical questionnaires from the 14 candidate partners. Green cells indicate first-place scores in the particular benchmarking framework area. Blue cells indicate second- and third-place scores. Coloring the high scores this way shows where the strengths of particular organizations might lie. This analysis of the technical questionnaire results shows the potential for integrating best practices from multiple companies. For example, while Organization K is strong in many areas of the benchmarking framework, this organization could improve in other areas by considering the practices of Organizations D, M, and N.

Figure 2. Normalized scores from 14 detailed technical questionnaires.

Benchmarking framework area           A     B     C     D     E     F     G     H     I     J     K     L     M     N
Organization's key characteristics   0.60  0.60  0.40  0.00  0.40  0.60  0.60  0.00  0.60  0.00  1.00  0.60  0.80  1.00
Human resources                      0.69  0.85  0.92  0.85  0.77  0.54  0.54  1.00  0.32  0.69  0.85  0.85  0.85  1.00
Purposes of the PAL                  0.68  0.89  0.81  0.45  0.63  0.45  0.92  0.82  0.76  0.69  0.89  0.94  0.84  1.00
Owners and users of the PAL          0.86  0.91  0.85  0.71  0.59  0.36  0.55  0.58  0.75  0.59  0.78  0.73  1.00  0.64
Type of PAL content                  0.82  0.69  0.77  0.74  0.50  0.95  0.87  0.55  0.66  0.68  1.00  0.77  0.90  0.99
PAL structure                        0.30  0.73  0.76  0.88  0.00  0.00  0.76  0.48  0.39  0.15  0.45  0.09  0.45  1.00
User interface                       0.69  0.54  0.95  0.86  0.62  0.34  0.50  0.52  0.28  0.48  1.00  0.76  0.52  0.95
Configuration management             0.72  0.90  0.90  1.00  0.50  0.56  0.60  0.14  0.46  0.72  0.90  0.74  0.50  0.60
PAL measurement                      0.15  0.00  0.51  0.07  0.00  0.00  0.07  0.07  0.09  0.00  1.00  0.37  0.07  0.07
Security of PAL                      0.71  0.86  0.86  0.86  0.71  0.14  0.43  0.71  0.57  0.29  0.43  0.00  0.14  1.00
Hardware and tools                   1.00  0.75  1.00  1.00  0.25  0.25  0.50  0.25  0.00  0.25  0.75  0.50  0.00  0.75
Training and communication           0.63  0.63  0.75  0.50  0.13  0.65  0.85  0.75  0.50  0.63  1.00  0.63  0.50  0.88
Enablers                             0.00  0.00  1.00  1.00  0.00  0.00  0.00  0.00  0.00  0.00  1.00  0.00  1.00  0.00
Total normalized score               7.85  8.33 10.47  8.91  5.10  4.85  7.18  5.88  5.38  5.17 11.05  6.96  7.57  9.87
Rank by total normalized score          6     5     2     4    13    14     8    10    11    12     1     9     7     3

We performed a similar analysis for the user satisfaction survey rollups from the 14 candidate partner organizations. While most of the responding organizations' user satisfaction data showed rankings quite similar to those analyzed for the detailed technical questionnaire, several notable exceptions arose. For example, Organization C ranked last on the user satisfaction survey, but second on the detailed technical questionnaire. This example illustrates the value of obtaining multiple perspectives before selecting a particular benchmarking partner.

Selecting the benchmarking partners. Once you've evaluated the data using a quantitative scoring method, you can select the benchmarking partners. Consider taking as
benchmarking partners those with the highest overall scores, and if necessary, add to this group any candidate partner with scores that are highest in any of the areas of the benchmarking framework. Be sure to consider the benchmarking project’s budget and the amount of data to be collected when you decide how many benchmarking partners you will select. In the PAL study, we selected five benchmarking partners based on their scores on the screening instruments. However, in case one or more of these organizations might be unable or unwilling to fulfill the partnership, we also selected a sixth partner. It turned out that all six benchmarking partners could participate for the entire study. Although this was a rather large number of partners, particularly with the volume of data to be gathered, we found this to be a manageable group. Data collection phase This phase entails establishing detailed benchmarks relating to the benchmarking framework, developing a data collection plan, and holding benchmarking meetings with each partner. Establishing the benchmarks. A benchmark is a standard against which a measurement or comparison can be made. Take care to word the benchmarks such that participants can make clear, measurable comparisons. There should be a sufficient number of benchmarks to adequately cover all benchmarking framework areas. Each benchmark should carry a unique number or identifier. In the PAL study, we identified 239 benchmarks, and the combined set of benchmarks covered all areas of the benchmarking framework. Table 1 shows several
benchmarks relating to "Managing and Controlling the PAL."

Table 1. Several benchmarks relating to "Managing and Controlling the PAL"

ID   Benchmark
218  A documented procedure exists for selecting PAL content items.
219  A documented procedure exists for selecting best practices.
220  Currency of PAL content items is reviewed periodically.

Data collection plan. A data collection plan is a spreadsheet or table used to record, annotate, compare, and analyze the hundreds of pieces of information that will be collected. Down one side are the benchmarks. Across the top are columns, several for each of the benchmarking partner organizations, for recording the following information:
■ a yes or no indicating whether the organization performs the practice stated in the benchmark,
■ a brief description of how the organization performs the practice, and
■ the benchmarking team's comments or questions about this practice in this organization. (This is a temporary field that will be cleared when adequate information is obtained about whether and how the organization performs the practice.)
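A data collection plan of this shape can also be kept as a small in-memory structure rather than a spreadsheet. The Python sketch below is only an illustration under that assumption; the field names and example rows are ours, and the PAL study itself used a spreadsheet.

from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class PartnerEntry:
    performs: Optional[bool] = None   # yes/no, unknown until the benchmarking meeting
    how: str = ""                     # brief description of how the practice is performed
    comments: str = ""                # temporary team questions, cleared once resolved

@dataclass
class PlanRow:
    benchmark_id: int
    benchmark: str
    partners: Dict[str, PartnerEntry] = field(default_factory=dict)

plan = [
    PlanRow(218, "A documented procedure exists for selecting PAL content items."),
    PlanRow(220, "Currency of PAL content items is reviewed periodically."),
]

# Record what was learned about one partner during a benchmarking meeting.
plan[0].partners["Partner 1"] = PartnerEntry(performs=True, how="Quarterly SEPG review of the PAL index")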
Figure 3 shows a simplified template for a generic data collection plan. In the PAL study, we implemented the data collection plan as a spreadsheet. Where possible, we filled the data collection plan cells with data obtained from the detailed technical questionnaires. Where the detailed technical questionnaire responses were unclear, or where benchmarks not addressed previously by the detailed technical questionnaire appeared, we noted questions to ask in upcoming meetings with the benchmarking partners. Benchmarking meetings. This activity involves conducting individual meetings with each benchmarking partner to obtain the information necessary to fill in the data collection plan. Multiple techniques are available, such as observing demonstrations, reviewing documentation, and interviewing subject matter experts and users. Typically, benchmarking meetings are conducted at the benchmarking partner’s site, although virtual meetings via phone conference are possible. To optimize the available time with the benchmarking partner during these meetings, the benchmarking team must prepare thoroughly, including having a list of specific questions. Following a benchmarking meeting, the team conducts a debriefing to ensure
that the data obtained are documented, complete, and understood. Follow-up phone calls to the benchmarking partner are frequently needed to clarify data and fill in the gaps.

Figure 3. Simplified template for a generic data collection plan.

                    Organization 1                 Organization 2
ID   Benchmark      Perform?  How?  Comments       Perform?  How?  Comments

In benchmarking meetings, the discussion with one partner often will reveal an important practice that is not among the identified benchmarks. If the benchmarking team decides that this practice is important enough to be benchmarked, the team adds this practice to the benchmarks and follows up with any previously interviewed benchmarking partners to determine if they perform this practice as well. In the PAL study, the team conducted site visits to two benchmarking partners located nearby. Another benchmarking partner, located far away, had an affiliated organization located near the Consortium, so the team examined the partner's PAL on the corporate intranet at the nearby facility. For the remaining three partners, we conducted virtual site visits by phone with a simulation of the PAL on our computer screens—for example, using the HTML pages or a Lotus Notes database of the partner's PAL. After a guided tour of the benchmarking partner's PAL, we asked the questions noted on the data collection plan for this particular partner. These questions consumed the majority of the benchmarking meetings. Typical benchmarking meetings lasted about two hours, with several short follow-up phone calls to clarify data and fill in any missing data.

Data analysis phase
During this phase, we normalize, sanitize, and verify the raw data; establish criteria for best-practice designation; and have the benchmarking team select potential best practices, which are then validated.

Normalizing, sanitizing, and verifying the raw data. During the data collection phase, the benchmarking team gathers what seems like volumes of data. Before data analysis can begin, however, the data must be normalized, ensuring that the detail provided is appropriate and comparable across the board. Terms and acronyms specific to benchmarking partner organizations must be sanitized. Any changes to the data must be verified again with the benchmarking
partner to ensure it is still correct after normalization and sanitization. In the PAL study, normalizing involved ensuring that "not applicable" responses were replaced with yes or no, and variations in interpretations of the meaning of certain benchmarks over time were clarified. We replaced any terminology or acronyms specific to benchmarking partner organizations with neutral terminology.

Criteria for best-practice designation. Because best practice is an overused term that has come to mean many things, you must explain what it means for your benchmarking study. In determining that one practice is better than another, there should be some criteria or standard established for best practice. In our study, we defined a best practice as one that met the following criteria:
■ Existence. The practice must have been observed in at least one partner organization.
■ Importance. In the benchmarking team's opinion, the practice is important to an effective PAL.
■ Effectiveness. In the benchmarking team's opinion, the practice appears to work well where it is used.
■ Tangible benefit. In the benchmarking team's opinion, there is a tangible benefit to the organization that performs this practice.
■ Innovation. Where appropriate, the practice makes use of innovation, such as use of automation instead of manual methods for accomplishing a task.
■ High value perceived by PAL users. The practice must be rated highly on a validation survey of 18 PAL users representing the six partner organizations.
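The criteria above lend themselves to a simple checklist. The Python sketch below is our illustration only; representing each criterion as a boolean flag (and treating innovation as a plain yes/no) simplifies the team judgments the study actually relied on.

from dataclasses import dataclass

@dataclass
class PracticeAssessment:
    observed: bool           # Existence: seen in at least one partner organization
    important: bool          # Importance: matters to an effective PAL
    effective: bool          # Effectiveness: appears to work well where used
    tangible_benefit: bool   # Tangible benefit to the performing organization
    innovative: bool         # Innovation, where appropriate (simplified to a flag here)
    user_validated: bool     # High value perceived by PAL users (validation survey)

    def is_best_practice(self) -> bool:
        return all((self.observed, self.important, self.effective,
                    self.tangible_benefit, self.innovative, self.user_validated))

candidate = PracticeAssessment(True, True, True, True, True, False)
print(candidate.is_best_practice())   # False until the user validation survey passes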
Selection and validation of potential best practices. With clearly defined criteria, selecting best practices is straightforward but time consuming. Each benchmarking partner's practice for each benchmark must be evaluated and, if it meets the best-practice criteria, flagged as a potential best practice. If your best-practice criteria include a validation requirement, you must perform your selected method of validating potential best practices. In the PAL study, the benchmarking team identified 48 potential best practices. To satisfy the last of the best-practice criteria, we conducted individual phone surveys with three PAL users from each partner organization, asking such questions as

   On a scale of 1 to 5, where 1 means "of no value" and 5 means "of extreme value," how would you rate the value of providing both a graphical and text-based navigation to PAL artifacts?

We calculated the average and median scores for the entire user group for each of the potential best practices. Scores with an average of 3.5 or higher and a median of 4.0 or higher were high enough to be considered validated best practices. Pairing the two measures gives a clear picture of the range and concentration of the PAL user opinions. We identified 36 validated best practices.
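The average/median rule just described is easy to apply mechanically. The following sketch is ours (the practice names and ratings are invented); it simply flags a practice as validated when its ratings average at least 3.5 and have a median of at least 4.0.

from statistics import mean, median

def is_validated(ratings, avg_threshold=3.5, median_threshold=4.0):
    """Apply the average/median rule to one practice's 1-5 user ratings."""
    return mean(ratings) >= avg_threshold and median(ratings) >= median_threshold

ratings_by_practice = {
    "Graphical and text-based navigation to PAL artifacts": [5, 4, 4, 3, 5, 4],
    "Manual weekly review of PAL content": [3, 4, 2, 3, 4, 3],
}
validated = [name for name, ratings in ratings_by_practice.items() if is_validated(ratings)]
print(validated)   # only practices meeting both thresholds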
Reporting phase
Here, we document the entire work of the project in a report intended for the project's sponsors and stakeholders. Typically, a benchmarking report covers the following topics: benchmarking process with emphasis on benchmarking partner selection, data-gathering methods, and analysis techniques; benchmarking framework; data analysis, including observations and best practices; and recommendations for how to use the benchmarking report. Extensive appendices include all the sources consulted during the project, screening instruments, benchmarking agreements, and benchmarking data. The benchmarking team should include an executive summary to give senior managers a quick sense of the results. Most teams develop a presentation format for the benchmarking report as well. In our study, we published the Process Asset Library Benchmarking Report for the Consortium membership and also presented the results to several member company audiences.5

Important lessons learned
The benchmarking study taught us several important lessons for future benchmarking studies.
■ Use a proven benchmarking methodology. The benchmarking study was based on a proven method with well-understood phases successfully applied to a number of previous studies and described in Business Process Benchmarking and Benchmarking for Best Practices.6,7 Without a proven methodology, trial-and-error benchmarking approaches might waste time and money and yield results of questionable value.
■ Obtain user (or customer) satisfaction data. Without user satisfaction data, you might select a benchmarking partner with unhappy users and practices that are not as effective as yours. Sometimes, user satisfaction data for many organizations is available for benchmarking. But if it is not available, develop and administer a user satisfaction survey instrument. We found the tool set for conducting and analyzing the user satisfaction survey particularly efficient and will use it on our next benchmarking study.
■ Determine scoring criteria prior to finalizing the screening instruments. Without scoring criteria established in advance, time-consuming, subjective judgment is necessary to evaluate responses. As we've discussed, this problem was very challenging. We will avoid this one next time.
■ Thoroughly test and pilot all materials to be used by the candidate partners. Poor-quality benchmarking materials can affect the team's credibility and the candidate partner's willingness to participate. When the benchmarking team developed the screening instruments, we tested their functionality and piloted them. Even so, the candidate partners found a few minor problems with the instruments as they filled them out.
■ Be ready well in advance for the next interaction with candidate partners. Lack of readiness to proceed to the next step can stall a benchmarking study and cause partners to lose interest. When we conducted our initial phone survey to find organizations interested in participating in the study, several were eager to get started filling out the screening instruments. But the screening instruments did not exist yet. The delay in getting the screening instruments out probably contributed to the slow return of information back to us.
■ Allow extra time in the schedule for partners to respond to requests for data.
Unrealistic expectations for partner response time will cause the schedule to be changed repeatedly, which may undermine the team’s credibility. Even when a partner considers a benchmarking study a high-priority task, other tasks will preempt the data request and lengthen the partner’s response time. Additionally, we found that partners are particularly busy during November and December, so data requests during this time should be avoided.
Benchmarking for process improvement takes time but ultimately is very satisfying. Typically, all benchmarking partners, sponsors, and stakeholders discover something of surprise and value among the best practices identified. Consortium member companies are using the benchmarking report produced by our study as a source of ideas for improving their existing PALs or establishing their first PALs. The process, tools, and lessons learned from this study will boost the effectiveness of our future benchmarking studies. By sharing our experiences, we hope that your team will find useful ideas to enhance your benchmarking success.
References
1. M.C. Paulk, Capability Maturity Model for Software, Version 1.1, tech. report CMU/SEI-93-TR-24, Software Eng. Inst., Carnegie Mellon Univ., Pittsburgh, 1993.
2. CMMI for Systems Engineering/Software Engineering/Integrated Product and Process Development, version 1.02 (CMMI–SE/SW/IPPD, V1.02), Software Eng. Inst., Carnegie Mellon Univ., Pittsburgh; www.sei.cmu.edu/cmmi/ (current 15 Aug. 2001).
3. An Introduction to Process Benchmarking, tech. report SPC-99009-MC, version 01.00.01, Software Productivity Consortium, Herndon, Va., 1999.
4. V. Powers, "Selecting a Benchmarking Partner: 5 Tips for Success," Quality Digest, 1997; www.qualitydigest.com/oct97/html/benchmk.html (current 9 Aug. 2001).
5. Process Asset Library Benchmarking Report, tech. report SPC-2000035-MC, version 01.00.01, Software Productivity Consortium, Herndon, Va., 2001.
6. R. Camp, Business Process Benchmarking, ASQC Quality Press, Milwaukee, 1995.
7. C. Bogan and M. English, Benchmarking for Best Practices, McGraw-Hill, New York, 1994.
About the Authors
Sam Fogle is a process improvement engineer at the Software Productivity Consortium. His professional interests are centered on advancing the art and science of software project management. He earned a BS in electrical engineering from the University of Maryland, College Park, and an MS in technical management from Johns Hopkins University. He is an instructor for several of the Consortium's training courses and oversees their Managing Process Improvement course. He co-authored the Process Asset Library Benchmarking Report. Contact him at the Software Productivity Consortium, 2214 Rock Hill Rd., Herndon, VA 20170-4227;
[email protected]. Carol Loulis is a senior process improvement engineer at the Software Productivity Consortium. Her research interests are in the application of process frameworks such as the CMM and the CMMI. She holds a BA in mathematics from Vassar College. She co-authored the following Consortium guidebooks, reports, and courses: Interpreting the CMM for Diverse Environments guidebook and course, Training Program Guidebook, Introduction to a Disciplined Requirements Process course, Process Benchmarking course, Common Assessment Findings Report, and Process Asset Library Benchmarking Report. Contact her at the Software Productivity Consortium, 2214 Rock Hill Rd., Herndon, VA 20170-4227;
[email protected]. Bill Neuendorf is a senior member
of the technical staff at the Software Productivity Consortium. His research interests include software and systems engineering measurement, process improvement, and business intelligence. He holds a BS in mathematics and an MBA from Eastern Michigan University. He coauthored the Introduction to Process Benchmarking course and the Process Asset Library Benchmarking Report. Contact him at the Software Productivity Consortium, 2214 Rock Hill Rd., Herndon, VA 20170-4227;
[email protected].
For more information on this or any other computing topic, please visit our digital library at http://computer.org/publications/dlib.
focus
benchmarking
Using Structured Benchmarking to Fast-Track CMM Process Improvement Gareth C. Thomas and Howard R. Smith, Sikorsky Aircraft
Sikorsky Aircraft, in pursuit of a Capability Maturity Model rating, determined that software benchmarking would help its process improvement and assessment effort. The authors describe how Sikorsky prepared for benchmarking trips, structured its questionnaire, aggregated results, and evaluated lessons learned.
Sikorsky Aircraft designs and manufactures helicopters. In today's environment of glass cockpits, fly-by-wire control systems, and special missions, we are also in the integration business—which, from this article's perspective, translates into embedded software. We develop large systems in-house, such as the Comanche Mission Equipment Package, comprising hundreds of thousands of lines of code, and we are deeply involved in enhancing large legacy systems for our Black Hawk
and Seahawk platforms. Furthermore, we integrate in our labs and on our aircraft the results of our own work and that of many, varied suppliers. Our software population is thus best described as nonhomogeneous. The integration business is extremely competitive, and in early 1998, a military customer encouraged Sikorsky to achieve a CMM Level 3 rating within 18 months—a consciously ambitious schedule. Intuitively, we felt that our different software projects were at different maturity levels, so one of our objectives was to obtain consistency. The Capability Maturity Model for Software (SW-CMM) is a model for organizational capability that provides guidance for establishing process improvement programs. Carnegie Mellon University’s Software Engineering Institute developed it, with industry, government, and academic participation. The
model consists of five levels of increasing maturity, with different key process areas at each level. Achieving higher levels of maturity implies successfully addressing these KPAs and using them in a process capability to meet schedule, cost, quality, and functionality targets (see Figure 1).1 To reach its goal, Sikorsky created a team, selected a leader, approved a budget, and managed the effort as a stand-alone project. The team was cross-functional, with members from the Software Engineering, Software Quality Assurance, and Systems Engineering organizations. One of the first decisions was to perform formal benchmarking to help guide the planning, because the assessment process and SW-CMM model only present elements that the defined process should include—they don't explain how to implement the process.
Figure 1. The SW-CMM key process areas by maturity level.
Optimizing (5): Process change management; Technology change management; Defect prevention
Managed (4): Software quality management; Quantitative process management
Defined (3): Peer reviews; Intergroup coordination; Software product engineering; Integrated software management; Training program; Organization process definition; Organization process focus
Repeatable (2): Software configuration management; Software quality assurance; Software subcontract management; Software project tracking and oversight; Software project planning; Requirements management
Initial (1)
Structured benchmarking
Sikorsky decided to invite William Detlefsen, an expert from a sister division within United Technologies (the parent corporation), to assist with the process, because he is known for helping various organizations structure their approach to benchmarking. His approach, based on work by Robert Camp,2 involved creating a comprehensive questionnaire for selected companies. We could then perform the benchmarking, review their experiences, and collect best practices. The project's importance justified this structured team approach over the uncertainties inherent in an ad hoc effort. In a one-day session, Detlefsen led the team through a well-defined series of brainstorming steps, during which he solicited the goals and suggestions of the team members and helped organize the output. The goals were to learn how other organizations achieved Level 3 and to request assistance from them in the form of sample procedures, processes, and guidebooks. The exercise resulted in a single entity—a questionnaire of some 40 questions, grouped into five categories:
1. Philosophy of implementation—how each company achieved CMM compliance in terms of schedule, teams, and planning.
2. Management commitment—the strength of institutional support for the process improvement effort.
3. Cultural change and institutionalization—issues that arose regarding acceptance of the new process philosophy.
4. Definition of organization—because the SW-CMM assessment is for specific organizations, these questions assessed the scope of their effort (for example, a section, company, or corporation).
5. Objective evidence—the SW-CMM assessment process requires objective evidence that the new process is being followed, so these questions probed how each company collected evidence.

Candidate companies
Sikorsky then chose a diverse group of companies to benchmark. The group included several major Level 3 suppliers who were eager to help and with whom we had a good rapport. Geographic convenience factored into choosing a sister division and a supplier. In addition, our military customer wanted to
help. In all, we chose five companies: three at Level 3, one at Level 4, and one at Level 5. They consisted of two avionics and integration companies, an aircraft engine company, a software house, and a government software engineering facility.
Trips, reports, and data
The typical benchmarking trip lasted one day. The host company would usually lead off with an overview, and then the interview questions were divided by category among the team members. All team members took notes. The companies were cooperative, and extemporaneous questions were encouraged as appropriate, but the structure implied by the questionnaire set the visit's formal tone. Back at the plant, we aggregated the team's data. First, we wrote, approved, and distributed a trip report, to quickly disseminate the information to team members and other interested parties. Second, the completed questionnaire evolved into a matrix, with a column of responses from each company, which made it easy to compare companies and identify common or unique attributes. Table 1 illustrates the questionnaire and its matrix. For the sake of the benchmarked companies' confidentiality, we sanitized the questionnaire responses, and for the sake of brevity, we included only one illustrative question from each of the five categories.
Table 1. Sikorsky process improvement questionnaire—sample questions and responses (one question from each category; each question drew one response from each of the five companies, A–E)

Philosophy of implementation: How is the software development process structured? (A. By program; B. At a company-wide level; C. By specific department requirements; D. Other)
Responses:
B. One process for all government projects, tailored by project or department
B. Can be tailored for specific (large) customers
D. Division-wide implementation
D. Three product lines: new, transitioning, and maintaining (for deliverable software only)
B. Starts with one procedure that establishes the software engineering process group (SEPG) and another that defines the high-level process; common process implementation by program through the software development plan, with minimal tailoring

Management commitment: Who is the champion of the continuous improvement process for software development? (A. Company president; B. Engineering vice president; C. Software manager; D. Other)
Responses:
B. (with corporation influence)
A. President; D. General manager for QA
C. Also the director (the champion or sponsor) and discipline chiefs (the day-to-day champions)
D. Other, the director of company E
All of the above, plus program management (customer pressure); engineering VP was a key champion

Cultural change and institutionalization: How was the process improvement team structured? (A. Practitioners; B. SEPG members; C. Both A and B; D. Other)
Responses:
C.
C.
C. SEPG membership is long-term
C.
C.

Definition of organization: How did you define organization for the CMM? (A. The software engineering group; B. The software and systems engineering groups; C. All of engineering; D. The company; E. Other)
Responses:
A. The software product (we did not include test equipment software)
D.
A.
E. Embedded software, shipped in a product (not MIS, not test)
A. No MIS or systems, just deliverable software

Objective evidence: What artifacts are maintained for evidence of Level 3? (A. Procedures in place; B. Compliance to procedures; C. QA reports; D. Customer agreement; E. All of the above; F. Other)
Responses:
E.
A, B, and C. Also, D—per ISO requirement, customer sometimes specifies defect counts
E. Notebooks created for each program held artifacts (with common index); generic notebook; assessor provided list of desired artifacts (which were expanded)
A and B—procedures, plans, and QA reports
List of standards defines artifacts; separate standard for project notebook metrics
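If a team wants to automate this kind of comparison, the matrix described above can also be tallied programmatically. The Python sketch below is ours, with placeholder questions and answers rather than the sanitized Sikorsky data; it simply surfaces which responses are shared and which are unique.

from collections import Counter

# Matrix of responses: question -> {company -> answer}. Placeholder data only.
matrix = {
    "How was the process improvement team structured?": {
        "A": "C. Both practitioners and SEPG members",
        "B": "C. Both practitioners and SEPG members",
        "C": "D. Other",
    },
}

for question, answers in matrix.items():
    counts = Counter(answers.values())
    common = [answer for answer, n in counts.items() if n > 1]
    unique = [answer for answer, n in counts.items() if n == 1]
    print(question)
    print("  common:", common)
    print("  unique:", unique)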
Best practices
The final step of benchmarking is to create a best practices list. This emerged essentially out of the paragraph headings of the five trip reports. Example headings include the software engineering process group (SEPG); the
process asset library; artifacts; engineering and quality procedures; metrics; schedules; staffing levels; and the choice of assessor. From these, and from the matrix, we derived significant assistance in formalizing our improvement plans. Many members of our
Level 3 team intuitively thought they knew many of the answers in advance of the formal exercise, but it was an enormous confidence booster to be made aware of these top practices that had worked well for others. We learned new things and gained deeper perspective on others. We often found ourselves gravitating toward using the same acronyms as our benchmarkees. We would be remiss if we left an impression of structured benchmarking as a panacea. Its single biggest contribution was identifying risks—and our management was eager to support our risk abatement strategies. The best example of this was our schedule—we took aggressive actions to address it once we realized how ambitious it really was. For example, in addition to providing internal staff to work on the effort, management also supported hiring CMM-knowledgeable consultants to assist our process development effort. Furthermore, management funded a series of preliminary assessments by an authorized CMM lead assessor. Benchmarking results We had to assess each of the five categories. For the first category—software development process improvement and its philosophy—we realized that most of the organizations included software quality and systems engineering personnel on their teams, and some organizations involved purchasing, contracts, or program management in their process improvement. The organizations’ responses varied from minimal to major as to the extent of life cycle requirements in their procedures. For all the organizations, management actively approved or concurred with the project process steps during the program’s early stages. The level of detail of their procedures was split between concise descriptions and broad requirements. All companies tailored procedures and either provided guidelines in the procedures or documented them in the Software Development Plan. For the second category—management support for process improvement—each organization had a champion for the continuous improvement process at the vice president or director level. Management provided staff and funds and received status reports. The size of the SEPG varied by organization from two full-time to over 10 full-time staff members. All the organiza-
tions updated their procedures once or twice a year, even after their assessment. In the case of the third category—cultural change and institutionalization—some of the practitioners resisted change. Implementation techniques included cross-functional teams, the use of process consultants, and training. Either a customer or the competition drove cultural change. All the organizations developed an annual process improvement plan to manage and track their process improvement efforts. The fourth category—defining the organization for the SW-CMM effort—usually included only the software that the Software Engineering group developed, for a variety of product domains. The last category—objective evidence— addressed information the organizations collected for the assessment. They cited their procedures, metrics, and reports. The organizations varied from minor to major regarding the role process measurement played in achieving Level 3. The success of our fast-track effort was backed by the knowledge garnered from the benchmarking trips. Some of the answers provided guidance to our SEPG, whereas others provided assurance that our approach and concerns were valid.
Our Level 3 assessment was successful, and we achieved the rating on schedule. The benchmarking experience, for a modest investment, got us off on the right foot. It enabled good risk planning and helped pick the targets on which we focused. Among other things, we learned that management commitment is essential, including staffing, travel budgets, and oversight. We received full support, which included meeting with the company president. We were made to feel that software had come a long way in an erstwhile hardware-intensive world. Management support remains strong for continuing the effort. The cross-functional aspect of the benchmarking team also worked well. The different members had inherently different orientations—for example, people involved with quality assurance think of metrics, a technologist might think of tools, and a project manager's orientation is toward scheduling. Together, however, they are synergistic.
About the Authors Gareth C. Thomas is manager of Software Quality and Reliability at Sikorsky Aircraft.
He received a BSc and MSc in applied mathematics from the University of Wales, and an MBA from the University of Connecticut. He is on the adjunct faculty at Fairfield University, where he teaches mathematics. Contact him at
[email protected].
Howard R. Smith is the group leader for Advanced Software Technologies in the Soft-
ware Engineering Section at Sikorsky Aircraft and a member of the Software Engineering Process Group. He received a BS in mathematics from the University of Massachusetts, an MS in computer science from Indiana University, and a PhD in computer science from the University of Texas at Austin. Contact him at
[email protected].
The degree to which an organization wants to provide a formalized structure to its quest is a decision each group must make. In our case, we have been grateful for the contribution of an outside consultant. However, although communication among our team was quick and easy, we could have more aggressively publicized our results, as we currently do by posting process improvement metrics. With this software process improvement infrastructure in place, we are now investigating the CMMI—a combined software– systems maturity model—and embracing a SW-CMM Level 4 commitment.
Acknowledgments We thank William Detlefsen from United Technologies. While formally assigned to Pratt & Whitney, he has supported many benchmarking efforts throughout the corporation, in a variety of different areas. His guidance in advising us on a structured approach and help facilitating the development of the questionnaire were invaluable. Leigh Gray, head of Sikorsky’s Software Engineering Process Group, led the successful effort to achieve a SW-CMM Level 3 assessment in 18 months. She fully supported the benchmarking effort as we fed our information to the team members supporting our process improvement effort. Additional members of the Software Engineering, Software Quality Assurance, and Avionics Systems Engineering sections supported the benchmarking effort and participated on our process action teams.
References 1. M. Paulk et al., Capability Maturity Model for Software, Version 1.1, tech. report CMU/SEI-93-TR-24, Software Eng. Inst., Carnegie Mellon Univ., Pittsburgh, 1993. 2. R.C. Camp, Benchmarking: The Search for Industry Practices that Lead to Superior Performance, ASQC Quality Press, Milwaukee, Wisc., 1989.
For more information on this or any other computing topic, please visit our Digital Library at http://computer.org/publications/dlib.
focus
NSDIR: A Legacy beyond Failure
Early benchmarking effort was ahead of its time
Greg Goth, [email protected]

At the end of the 19th century, humorist Mark Twain penned a short story detailing a short and miserable stint serving the government (in this case, a few weeks in a ragtag unit of the US Confederate Army). He gave the story the catchy and descriptive title, "The Private History of a Campaign That Failed." At the end of the 20th century, yet another short-lived government program, the National Software Data and Information Repository, failed, and its history, like that of Twain's hapless militia unit, is little known beyond those directly involved. However, the NSDIR has more than failure as a legacy. The database, designed to ease software research time and cost burdens for Defense Department projects, also served as a proving ground for concepts measurement pioneers are now extending beyond the military.
Using NSDIR concepts
The latest effort to establish a comprehensive academic and industrial measurement database, the Center for Empirically Based Software Engineering (CeBASE), spearheaded by University of Maryland researcher Victor Basili and University of Southern California researcher Barry Boehm, is taking NSDIR concepts forward. "Some of the NSDIR working group results have been very helpful in defining the concept of operation for CeBASE's software experience base," Boehm says. "In particular, the work
on data definitions and multisource data integration has helped the two CeBASE principals, the University of Maryland and USC, in integrating their current software repositories and associated process models." Lorraine Martin, who managed NSDIR's operation, says the repository, which existed from 1994 to late 1998, was simply too far ahead of its time. "I don't think it failed because of any intrinsic problem," Martin, now vice president of aerospace information operations at Lockheed Martin, says. "It was ahead of its time. It was established at the early edge of the World Wide Web. It was a little new."

Developing the NSDIR
If any one individual should be given credit for pushing the NSDIR concept, it's Lloyd Mosemann II. Mosemann, currently a senior vice president at Science Applications International, was the Air Force's Deputy Assistant Secretary for communications, computers, and support systems in 1993. He organized a conference that year to address the need for a rational software measurement approach on a national level. "It wasn't anything grand," Mosemann says, "about 100 people from across the spectrum of software development." Mosemann's highest priority was establishing a baseline concept for software measurement. Even though the US was in the throes of recession and the military was seeing the worst of the post-Cold War budget cuts, the retrenching sociopolitical climate wasn't the driving force behind the confer-
ence, held in Cooperstown, New York. “It really had more to do with the embryonic state of software engineering practice,” Mosemann says. “I was pushing CMM (the SEI’s Capability Maturity Model) at the time. I was pushing Ada. Yet there was nothing to measure at the time. I made a speech about 10 years ago in which I said, ‘Look, the Pentagon isn’t looking for any particular approach. They’re looking for predictability.’” Those attending the Cooperstown conference agreed that a national repository could greatly streamline project analysis. Mosemann spearheaded a prototype that was developed by Loral Defense Systems. A year later at Cooperstown, Mosemann demonstrated the prototype, and working groups there drafted recommendations for a working repository. When operational, Lockheed managed the NSDIR under contract. Martin estimates managing the repository took about a quarter of her time. Scott Hissam, now technical staff senior member at SEI, was lead software engineer for the NSDIR project. “I’m not a measurement guru,” he says. “I build systems. But you didn’t have to be a guru to know measurement was coming into its own.” With a small staff, Hissam built a system based on Sun Solaris Unix that featured an Oracle database and a Visual Basic client using Open Database Connectivity (ODBC). “It was built from open standards,” he says. “It had an architecture I could pull pieces in and out of. It was totally modular. I didn’t want to bet the ranch on any particular commercial product. From a technological aspect, I’m proud of what NSDIR was able to do.”
Pioneering interactivity
Hissam and his team built a system capable of storing data pertaining to essential forms of information including
■ project profile data, such as the project's problem domain, software process used, and resources such as number of workers involved, skill levels, and organizational CMM level—this data was entered into the repository at least once and updated as needed; and
■ recurring performance data, such as project size, staff hours spent, defect density, and milestones reached—this data was submitted on a more regular basis, varying according to each project's milestones.
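As a rough illustration of these two kinds of records, the Python sketch below uses field names we invented from the descriptions above; it is not the actual NSDIR schema.

from dataclasses import dataclass
from typing import Optional

@dataclass
class ProjectProfile:
    """Submitted at least once per project and updated as needed."""
    project_id: str
    problem_domain: str
    software_process: str
    staff_count: int
    cmm_level: Optional[int] = None

@dataclass
class PerformanceReport:
    """Submitted on a recurring basis, tied to project milestones."""
    project_id: str
    milestone: str
    size_sloc: int
    staff_hours: float
    defect_density: float   # for example, defects per thousand lines of code

profile = ProjectProfile("P-001", "avionics", "Ada, CMM Level 3 practices", 42, 3)
report = PerformanceReport("P-001", "critical design review", 120_000, 8_500.0, 0.8)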
Hissam built flexibility into the submission format to encourage participation. Data could be submitted online, on paper, and through electronic forms transmitted via diskette, tape, FTP, or email. Data could be retrieved through an NSDIR-built Information Analysis Tool and a WWW-compatible version of IAT (WWWIAT). The tool supported several previously defined queries such as “How many projects use CMM as a criteria in source selection and award?” and “What is the effort over time, planned vs. actual, for projects that use Ada and are in the avionics domain?” The IAT also supported ad hoc queries. “We were one of the first programs to allow dynamic queries over the Web,” Martin noted about the project. Hissam says the idea of the database as groundbreaker went beyond its technology. “The point that somebody was trying to build an organizational benchmarking structure for the Air Force, let alone the Department of Defense, was new,” he says. Cultural roadblocks While the submission tools and database might have been technologically advanced, however, not much data came into NSDIR. By the end of 1996, the NSDIR had received profile data on 241 projects from 117 unique organizations. Yet, only 2 to 5 percent of these projects were submitting recurring performance data. Hissam believes that early NSDIR attempts to streamline data definitions unwittingly discouraged participation. The NSDIR naively established a rigid set of software measurement elements for projects to fulfill, he says. The upside to this approach was that
the NSDIR could perform analysis across data it considered “normalized.” The downside was that this approach placed the burden on the project developers to normalize data to NSDIR standards. This inhibited the data flow. “That did not work well at all,” Hissam says. Eventually, the NSDIR helped establish a submission form partially based on the metrics of Practical Software Measurement (PSM), a grassroots initiative that would remove the onus of normalizing data from the contractor and place it on NSDIR. Bill Florac, an SEI visiting scientist, concurs with Hissam. Florac, who served as a consultant to NSDIR, says the PSM initiative he undertook with SEI colleagues was a response to concerns that measurement was too far removed from contractor’s concerns about meeting budget requirements. “The thought was, ‘It’s too academic, we need to get practical data,’ and the whole notion of PSM came out of that,” Florac remembers. In retrospect, Hissam says, much of the cultural baggage attending project development simply overwhelmed the NSDIR’s good intentions. While Mosemann’s advocacy gave the NSDIR an advantage in obtaining Air Force projects data, other services were not so quick to participate. “It came down to factors as simple as people not liking each other and competition among services,” Hissam noted. “In the end, we were able to get an admiral to write a letter to the Navy, saying, ‘Participate.’ It all came down to who you knew, pressing the flesh.” In addition to political roadblocks, Hissam recounts that program managers, under pressure to complete projects on time and on budget, did not perceive participation in NSDIR would yield benefits equaling the effort. “Program managers are very short-term focused,” he observed. Because contractors were not required to submit software measurement data to their program office, the NSDIR had to rely on the program office to convince contractors to submit it or for contractors to submit data on their own accord. Mandated participation would have added cost to procurement and added considerable time as the required contract language worked its way through the acquisition chain. Contractors who were not already using measurement data internally, or who thought contributing existing data would not be worth the time and effort, did
not wish to add either to their budgets. Hissam notes there was pervasive fear among contractors that even scrubbed data would reveal too much about the health of a given project and organization. If projects met unexpected problems, would contractors meet reprisal? Would program offices lose funding or be suspended? Would the branches of the service try to use NSDIR data to get themselves bigger pieces of the pie if their projects were shown to be better performers over a given duration (a duration determined by themselves, of course)? Contractors were afraid that the sheer size and scope of many of the projects would reveal a project’s identity, says Hissam. “Imagine locating an Air Force project in the repository that was 17 years old, written predominantly in Ada, and encompassing nearly a billion lines of code,” he says. “If one such project actually existed, many decision makers in the Defense Department would be able to recognize the project by those characteristics alone. While the public might not grasp such details, the insiders most certainly could.” Eventually, the Air Force issued a barebones request for data, requesting that Air Force programs collect a set of core measures, including size, effort, schedule, and quality. It too failed, and by the end of 1998, NSDIR quietly faded away. Martin states she is certain the NSDIR effort would have paid off eventually, but it never got enough momentum to get off the ground. She believes that emphasis on measurement has succeeded within companies that consolidated in the post-Cold War retrenchment. “The consolidation enabled companies that could break down their own internal walls to build richer databases,” she says. “In some ways, companies that merged got the NSDIR challenge internally.” Lessons learned Martin still believes a resource such as NSDIR would save developers time and money, especially as the parameters of software development continue to expand. “We’re always reaching for information,” she says, “and a lot of times, that information is hard to come by. Truthfully, even today, when you want to reach out and find what the industry average is for a given question, there’s not a place where you can get that, except for what’s in existing estimating tools—
and that doesn’t cover the waterfront.” Hissam thinks that the climate for establishing a repository is better than when NSDIR was established. “I’ve participated in a number of program reviews, and I believe cooperation is getting better,” he says. “The military has gotten away from stovepipe systems that don’t interconnect.” He speculates the CeBASE concept may succeed where NSDIR failed, given the increased awareness of measurement’s importance and its founders’ reputation. “I don’t know how it will fare, honestly,” he says. “They’ve got to get entree into the government procurement and acquisition world, and Barry Boehm can make that happen.” Mosemann says he thinks the concepts behind CeBASE differ slightly from NSDIR, but also acknowledges the relationship between the two. “Some of Vic Basili’s work with NASA was one of the inspirations of NSDIR,” he says. Florac and SEI fellow Watts Humphrey are not as confident that the industry is ready to embrace a comprehensive repository. “Even within organizations, let alone
feature open source
Does Open Source Improve System Security? Brian Witten, Carl Landwehr, and Michael Caloyannides, DARPA and Mitretek Systems
An attacker could examine public source code to find flaws in a system. So, is source code access a net gain or loss for security? The authors consider this question from several perspectives and tentatively conclude that having source code available should work in favor of system security.
The current climate of functionality and performance-driven markets has created enormous code bases, which have helped drive growth in the US gross domestic product in recent years. However, these code bases have also created an information infrastructure whose vulnerabilities are so striking as to endanger national and economic security.1 Distributed denial of service attacks have demonstrated that such vulnerabilities can degrade the Internet's aggregate performance,2
and recurrent virus outbreaks have inflicted substantial repair and recovery costs on businesses worldwide.3 There is no guarantee that making a system’s source code publicly available will, by itself, improve that system’s security. Quantifiable arguments could help us make that decision, but we don’t have good measures for how secure systems are in the first place.4 There is substantial literature debating the merits of open source, but much of it is based on examining a few examples anecdotally. Hard data and rigorous analysis are scarce. Open review of algorithms, particularly cryptographic algorithms, has long been advocated,5 and its benefits are hardly debatable.6 But ultimately, Rijndael, the selected Advanced Encryption Standard algorithm, will operate as software and hardware. If we can openly review the algorithm but not
the software, there is ample room for doubt as to whether a particular implementation is trustworthy. There are many definitions and degrees of open source-ness.7 For this discussion, we simplify the question to whether making source code freely available for security review, and potentially repair, is more likely to improve or degrade system security. Revealing a system’s source code is not just a technical act. It has economic and other effects that we must consider.

Defender’s view: Closed doors
In restricting the release of source code, producers require consumers not only to blindly accept that code but also to trust the compiler employed to create it. Consumers thereby forfeit many abilities to enhance the final executable’s security. Ken Thompson demonstrated how to create a
compiler whose subverted code you cannot find by reviewing its source code.8 With such a compiler, even a well-intentioned developer would unwittingly produce flawed executables. Having the source code available creates opportunities for defending against both accidental and malicious faults. Although an open source system user is vulnerable to the tools he or she uses to generate the executable, the choice of tools is the user’s. If caution dictates, he or she can introduce variability by, for example, using different compilers to generate the executable. Moreover, automated tools can scan source code to identify potentially dangerous constructions,9 which can then be modified manually or, in some cases, automatically. Compilers can also help strengthen a system’s security without modifying source code. For example, the Immunix Stackguard compiler adds “canaries” that defeat many buffer overflow attacks.10 This technique adds the canaries into the executable code without modifying the source code, but it requires access to the source code. Similarly, compilers that randomize their output in various ways11 can defeat other attacks that exploit specific, otherwise predictable details of compiled programs. These approaches can remove vulnerabilities at a much lower cost than manual source code modification, but they cannot work unless the source code is available. Without access to the source, defenders cannot invest in having humans review their source code, which is still crucial to investigating specific hypotheses about back doors, race conditions, and other subtle flaws that continue to plague our systems.
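To make the idea of automated source scanning concrete, the following sketch shows a pattern-based scan in the spirit of such tools. It is not ITS4 or any other real scanner; the flagged C library calls, the file extension, and the report format are illustrative assumptions only.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical scanner: flags a few well-known risky C library calls so a
// human reviewer can inspect them. Real tools parse the code and rank risk.
public class RiskyCallScanner {
    private static final Pattern RISKY =
        Pattern.compile("\\b(strcpy|strcat|sprintf|gets)\\s*\\(");

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(p -> p.toString().endsWith(".c"))
                 .forEach(RiskyCallScanner::scan);
        }
    }

    private static void scan(Path file) {
        try {
            List<String> lines = Files.readAllLines(file);
            for (int i = 0; i < lines.size(); i++) {
                if (RISKY.matcher(lines.get(i)).find()) {
                    // Report file and line number for manual review.
                    System.out.println(file + ":" + (i + 1) + ": " + lines.get(i).trim());
                }
            }
        } catch (IOException e) {
            System.err.println("could not read " + file + ": " + e.getMessage());
        }
    }
}
```

The point is not the particular patterns but that source access makes this kind of independent, automated checking possible at all.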
Java Obfuscation/Decompilation URLs

C. Wang, A Security Architecture for Survivability Mechanisms, PhD dissertation, Univ. of Virginia, Dept. of Computer Science, Oct. 2000. Includes survey (pp. 178–180) of code obfuscation techniques as well as original research: ftp://ftp.cs.virginia.edu/pub/dissertations/2001-01.pdf.

Condensity, a commercial product for obfuscating Java byte code: www.condensity.com/support/whitepaper.html.

List of Java decompilers: www.meurrens.org/ip-Links/Java/CodeEngineering.
Attacker’s view: Open opportunities?
Providing public access to source code means the attacker has access to it as well. Relative to cryptographic algorithms, the source code for, say, an operating system is likely to be large and complex and will almost certainly contain some exploitable flaws. On the other hand, the attacker will in any case have access to the object code and, with suitable resources, can probably reconstruct any portion of it. Or, the attacker could obtain a copy of the source illicitly, if he or she is well funded or has appropriate connections to the developer. Closed source is a mechanism of commerce, not security, and it relies on law enforcement to prevent abuses. Compilers are not designed to provide cryptographic protection for source code, as evidenced by the market in tools that try to obfuscate Java bytecode as well as decompilers to defeat the obfuscators (see the sidebar). Thus the difference between open and closed source models might not be so great as imagined, serving primarily to deter those who are basically law-abiding or whose resources are limited. A second factor, considered in more detail later, is whether open or closed source systems are more likely to have known but unpatched vulnerabilities at any particular time. The recent Digital Millennium Copyright Act and UCITA legislation seem designed to discourage law-abiding citizens from reconstructing source code to improve its security but are unlikely to deter those with baser motives. Although we have primarily limited our concern to source code that is available for open review (or not), in some open source environments the attacker might have the opportunity to corrupt the code base by submitting subtly sabotaged programs to be included in the public system. Of course, we do not lack for examples in which the authorized employees of prominent software developers have inserted trapdoors and Easter eggs in systems without management’s apparent knowledge or consent.

Economics: Who pays the piper?
According to the old adage, he who pays the piper calls the tune. But, in the market
for closed source operating system software, the payers are many and small, and the pipers are few and large—the payers dance to the piper’s tune. Closed source preserves the producer’s economic interest and can allow recovery of development costs through product sales. Although certain licensing arrangements afford source release in a protected manner, it is commonly believed that restricting it helps preserve the intellectual property contained in an executable. Once the initial version of the product has saturated its market, the producer’s interest tends to shift to generating upgrades. Because software isn’t consumed or worn out by repeated use, the basis for future sales depends on producing new releases. Security is difficult to market in this process because, although features are visible, security functions tend to be invisible during normal operations and only visible when security trouble occurs. Anyone other than the producer who wants to invest in improving a closed source product’s security will have a hard time doing so without access to the source code. Some might be willing to invest in the producer and trust that their investments will go toward the product’s security. Justifying such investments is hard because the benefits accrue to the single producer, whose incentive is to keep producing upgrades, which in turn tend to require renewed investment in security. The economics of the open source model seems mystical—through the efforts of volunteers, a stable product appears and grows. How do these people eat? Yet, this market’s vitality is undeniable. Part of the answer is that the expertise individuals and companies develop in the open source product is salable, even if the product itself is freely available. Another part might be that the security market is roughly evenly split between products and services, which has created a community of security professionals capable of assisting the development and maintenance of open source products.12

Metrics and models: Describing the elephant
It would help to have empirical support for this discussion to a greater extent than we do. To decide objectively whether open
source will improve system security, we need to at least rank order the security of different systems (or of the same system, under open and closed source conditions). But, security is a little like the elephant of Indian lore—the appropriateness of a measure depends on the viewer’s perspective. What kinds of measures would be appropriate? A source of reliability—if not security—metrics is the “fuzz” tests of Unix utilities and services performed at the University of Wisconsin first in 1990 and again in 1995. The testers concluded that the GNU and Linux utilities had significantly lower failure rates than did the commercial Unix systems.13 Development of proper metrics for system security is a vast topic—often addressed but rarely with satisfying results—but it exceeds this brief article’s scope. Nevertheless, SecurityPortal has proposed a simple and plausible metric relevant to the case at hand. Arguably, systems are most vulnerable during the period after a security flaw becomes known to attackers and before the flaw is removed (such as by distributing a patch). So, one measure of system security might be this interval’s average length. Data measuring this interval are hard to come by, but an effort to characterize an open source system (Red Hat Linux) and two closed source systems (Sun and Microsoft) based on published security advisories is available at www.securityportal.com.14 The results on SecurityPortal’s Web site show an average number of days of vulnerability for Red Hat Linux as 11.2 with standard deviation 17.5, based on 31 advisories. For Microsoft, results show an average of 16.1 days with a standard deviation of 27.7, based on 61 advisories. Sun shows an average of almost 90 days, but only eight advisories were published during the year, making statistical inference questionable. Moreover, SecurityPortal concluded,
Red Hat could have … cut their turn around time in half, had they only been more attentive to their own community. There were instances when software had already been patched by the author, but Red Hat was slow in creating RPM distribution files and issuing advisories.14
Figure 1. Expected time to find next security flaw (E[t]) versus number of volunteer reviewers in open source community (Nv).

Open source advocates should not be surprised to find that the proprietary, centralized distribution system for the open source software seems to double the delay in fielding patches. Many caveats go with this kind of data, including

■ not all flaws are equally bad;
■ finding many flaws but repairing them quickly might not be better than finding few and taking longer to fix them;
■ normalizing results from different operating systems might affect results; and
■ the effectiveness of putative fixes might differ.
Nevertheless, making such measurements available on a routine basis could go far in helping consumers make better-informed purchasing decisions. Another approach is to look at a simplified statistical model of how the open and closed source review processes seem to work. In the closed source case, suppose there is a relatively fixed size set of paid source code reviewers. They are presumably knowledgeable and motivated. In the open source case, there is a more heterogeneous group of volunteers. We propose, with some justification from previous models15 and experiments,16 to use a Poisson process to describe the rate at which each individual finds security flaws. To account for presumed differences in motivation and expertise, we choose λp to represent the rate at which a paid reviewer finds flaws and λv to characterize the rate for volunteer reviewers. If the number of paid reviewers is Np,
then the expected time for the paid group to find the next flaw is simply 1/(Npλp), and the expected time for the volunteer group is similarly 1/(Nvλv). This model does not address the different coefficients of efficiency for group collaborations. Claims of advantages in emergent behavior weigh against claims of advantages of central control, however, and we have not yet found any hard data on which to base these coefficients. Moreover, this model doesn’t adjust for the rate decreases that might be associated with depletion of vulnerabilities. However, this model should be adequate for at least beginning to provide structure to some of the arguments on both sides. Figure 1 gives an idea of the potential effect of a larger group of security reviewers, even though they might be, on average, less expert than paid reviewers. This simple analysis highlights that past the point where Nvλv = Npλp, the open source approach detects errors more quickly. There’s a similar effect when groups of students review security requirements.17
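As a rough illustration of the model, the sketch below computes the expected time to the next discovered flaw for a paid team and for volunteer pools of increasing size. The rate values are invented for illustration; the article supplies no numbers, only the crossover condition Nvλv = Npλp.

```java
// Illustrative only: the rates (flaws found per reviewer per unit time) and
// the team sizes are assumptions, not data from the article.
public class ReviewModel {
    public static void main(String[] args) {
        double lambdaP = 1.0;   // assumed rate for a paid reviewer
        double lambdaV = 0.1;   // assumed (lower) rate for a volunteer
        int np = 10;            // assumed size of the paid review team

        System.out.printf("paid team of %d: E[t] = %.3f%n", np, 1.0 / (np * lambdaP));

        for (int nv : new int[] {10, 100, 1_000, 10_000}) {
            double expectedTime = 1.0 / (nv * lambdaV);
            // Once Nv*lambdaV exceeds Np*lambdaP, the volunteer pool is
            // expected to find the next flaw sooner than the paid team.
            System.out.printf("volunteers %6d: E[t] = %.4f%s%n", nv, expectedTime,
                nv * lambdaV > np * lambdaP ? "  (faster than the paid team)" : "");
        }
    }
}
```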
Today, it seems that all software has holes. We can draw four additional conclusions from this discussion. First, access to source code lets users improve system security—if they have the capability and resources to do so. Second, limited tests indicate that for some cases, open source life cycles produce systems that are less vulnerable to nonmalicious faults. Third, a survey of three operating systems indicates that one open source operating system experienced less exposure in the form of known but unpatched vulnerabilities over a 12-month period than was experienced by either of two proprietary counterparts. Last, closed and proprietary system development models face disincentives toward fielding and supporting more secure systems as long as less secure systems are more profitable. Notwithstanding these conclusions, arguments in this important matter are in their formative stages and in dire need of metrics that can reflect security delivered to the customer. More caveats are also necessary. There is little evidence that people (in particular, experienced security experts) review source code for the fun of it. Opening the source code creates the opportunity for individuals
or groups to understand how it works and what security it provides, but it cannot guarantee that such reviews will occur. Some vendors of closed source products have regulated internal processes for reviewing and testing software. Many will make their source code available to qualified reviewers under nondisclosure agreements. There is, in any case, no guarantee that human review will find any particular security flaw in a system of millions of lines of code—especially not if there is a serious effort to hide that flaw. Review will always be most effective on small, well-structured pieces of code. Still, closed development models force consumers to trust the source code and review process, the intentions and capabilities of developers to build safe systems, and the developer’s compiler. Such models also promise to deliver maintenance in a timely and effective manner, but consumers must also forfeit opportunities for improving the security of their systems.
Acknowledgments
This article has benefited from the comments of anonymous referees. In addition, we thank Rick Murphy of Mitretek Systems and Steve Lipner and Sekar Chandersekaran of Microsoft, who provided helpful reviews of a draft of this article, but who do not necessarily concur with its conclusions!
For more information on this or any other computing topic, please visit our Digital Library at http://computer.org/publications/dlib.
Disclaimer
The views expressed in this article are those of the authors and not the Department of Defense or Mitretek Systems.
References
1. US Government Working Group on Electronic Commerce, First Annual Report, US Dept. of Commerce, Washington, DC, Nov. 1998, pp. 1–2, www.doc.gov/ecommerce/E-comm.pdf (current 11 July 2001).
2. L. Garber, “Denial of Service Attacks Rip the Internet,” Computer, vol. 33, no. 4, Apr. 2000, pp. 12–17.
3. A.C. Lear, “Love Hurts: New E-Mail Worm Afflicts Millions,” Computer, vol. 33, no. 6, June 2000, pp. 22–24.
4. Computer Science and Telecommunications Board, Trust in Cyberspace, National Research Council, Washington, D.C., 1999, pp. 185, 189.
5. A. Kerckhoffs, “La Cryptographie Militaire,” J. des Sciences Militaires, vol. 9, Jan. 1883, pp. 5–38.
6. D. Wagner, B. Schneier, and J. Kelsey, “Cryptanalysis of the Cellular Message Encryption Algorithm,” Counterpane Systems, Mar. 1997, www.counterpane.com/cmea.pdf (current 11 July 2001).
7. V. Valloppillil, Open Source Software: A (New?) Development Methodology, Microsoft Corp., Redmond, Wash., 11 Aug. 1998; www.opensource.org/halloween/halloween1.html (current 11 July 2001).
8. K. Thompson, “Reflections on Trusting Trust,” Comm. ACM, vol. 27, no. 8, Aug. 1984, pp. 761–763.
9. J. Viega et al., A Static Vulnerability Scanner for C and C++ Code, Cigital, Dulles, Va., 2000; www.cigital.com/papers/download/its4.pdf (current 11 July 2001).
10. C. Cowan, “Automatic Detection and Prevention of Buffer-Overflow Attacks,” Proc. 7th Usenix Security Symp., Usenix, San Diego, 1998, pp. 63–78.
11. S. Forrest, A. Somayaji, and D. Ackley, “Building Diverse Computer Systems,” Proc. HotOS VI, IEEE CS Press, Los Alamitos, Calif., 1997, pp. 67–72.
12. C.J. Wilson, “Graphiti,” Red Herring, no. 77, Apr. 2000, pp. 68–70.
13. B.P. Miller et al., Fuzz Revisited: A Reexamination of the Reliability of Unix Utilities and Services, tech. report, Computer Science Dept., Univ. of Wisconsin, Madison, 1995, www.cs.wisc.edu/~bart/fuzz (current 11 July 2001).
14. J. Reavis, “Linux vs. Microsoft: Who Solves Security Problems Faster?” SecurityPortal, 17 Jan. 2000, www.securityportal.com/cover/coverstory20000117.html (current 11 July 2001).
15. B. Littlewood et al., “Towards Operational Measures of Computer Security,” J. Computer Security, vol. 2, no. 3, Apr. 1993, pp. 211–229.
16. E. Jonsson and T. Olovsson, “A Quantitative Model of the Security Intrusion Process Based on Attacker Behavior,” IEEE Trans. Software Eng., vol. SE-23, Apr. 1997, pp. 235–245.
17. R.J. Anderson, “How to Cheat at the Lottery,” Proc. 15th Ann. Computer Security Applications Conf., IEEE CS Press, Los Alamitos, Calif., 1999, pp. xix–xxvii; www.cl.cam.ac.uk/~rja14/lottery/lottery.html (current 11 July 2001).
About the Authors

Brian Witten is a program manager at DARPA, where he has directed several programs
including the Information Assurance, Autonomic Information Assurance, Partners in Experimentation, Survivable Wired, and Wireless Infrastructure for the Military programs. He received his BS in electrical and computer engineering from the University of Colorado. He is a member of the IEEE. Contact him at DARPA/ATOO, 3701 North Fairfax Dr., Arlington, VA 22203-1714;
[email protected].
Carl Landwehr is a senior fellow at Mitretek Systems, working closely with the director
of Mitretek’s Information Security and Privacy Center. He provides senior technical support to several program managers within DARPA’s Information Assurance and Survivability and Third Generation Security Programs and assists the Federal Infosec Research Council. He received his BS from Yale University and an MS and PhD in computer and communication sciences from the University of Michigan. He is a senior member of the IEEE and has served as an associate editor of IEEE Transactions on Software Engineering. Contact him at Mitretek Systems, MS Z285, 7525 Colshire Dr., McLean, VA 22102;
[email protected]. Michael Caloyannides is a senior fellow at Mitretek Systems concentrating on com-
puter forensics, encryption, and privacy technical problems. His interests include information security and telecommunications. He received a BSc and an MSc in electrical engineering and a PhD in electrical engineering, applied mathematics, and philosophy, all from Caltech. He holds one US patent on high-speed modems and is a senior member of the IEEE. Contact him at Mitretek Systems, MS Z285, 7525 Colshire Dr., McLean, VA 22102;
[email protected].
feature architecture
Visualizing and Analyzing Software Infrastructures Adam Buchsbaum, Yih-Farn Chen, Huale Huang, Eleftherios Koutsofios, John Mocenigo, and Anne Rogers, AT&T Labs—Research Michael Jankowsky, AT&T Business Services Spiros Mancoridis, Drexel University
Large corporations typically run complex infrastructures involving hundreds or thousands of software systems. As marketplaces change, these infrastructures must be redesigned. The Enterprise Navigator system lets architects visualize and analyze the system interconnections of selected products and services.
Companies frequently need to redesign their software infrastructures in response to marketplace changes, but they must do so carefully so that the new architecture will not disrupt existing operations or increase operating costs unnecessarily. To support these goals, system architects have long recognized the need to build a repository of information about all of their company’s systems and their interfaces. Using this information, architects create system interface diagrams to help them study the existing architecture. At AT&T, these diagrams used to be created and updated manually, published annually, and distributed throughout the business units. The process of manually drawing system interface diagrams is tedious and error-prone: a simple diagram showing all the interconnections to a single system could take 30 minutes or more to draw, and the diagram often becomes obsolete before it is published. Moreover, it is not easy, through the draw-and-publish mechanism, to get a system interface diagram in real time based on an ad hoc query because the need for the diagram might not have been anticipated. For example, a manager in charge of reengineering billing operations across the company might want to generate diagrams that show the systems involved in bill calculations for more than 200 products and service offerings. Since these queries are unexpected and therefore the diagrams not published, manually producing all these diagrams could take a long time. This situation would likely delay the reengineering decision process. We built a system called Enterprise Navigator (EN) to let users make ad hoc queries about an enterprise software architecture and then automatically generate the corresponding system interface diagram in real time on the Web. Figure 1 shows a typical diagram EN generated for a particular ad hoc query. Each node represents a system, and each link represents an interface between the two connected systems. With EN, users can
■ study a system architecture’s evolution over time,
■ find substructures embedded in complex diagrams, and
■ determine which systems dominate information flows.
EN runs as a collection of stand-alone tools using a set of database visualization tools, called Ciao,1 or as an integrated Web service. This article focuses on the latter. Our work builds on established research in source code analysis, graph drawing, and reverse engineering. Acacia2 and Chava3 are examples of reverse engineering tools for analyzing C, C++, and Java programs, respectively. These systems store source code analysis results in an entity–relationship database so that users can extract software structure information through ad hoc queries without relying on customized parsers. Software engineers often use visualization tools that employ automatic graph-drawing algorithms4,5 (see the “Graph Drawing” sidebar) to help them comprehend the results of their analyses. Many reverse engineering techniques, including techniques for software clustering6 and dominator analysis,7 have underpinnings based on optimization theory, statistics, and graph theory. To date, these techniques and tools have been applied mainly to individual software systems written in a variety of programming languages. The work described here takes the next step by showing how to model, query, analyze, and visualize the entire software infrastructure of a large enterprise such as AT&T when the infrastructure information is available in a database.

Architecture
Figure 2 presents a high-level view of EN’s architecture. You interact with EN by means of a Java applet (as shown in Figure 3). The applet establishes a two-way socket connection to a Java application running on a server. The Java application communicates with a database of software infrastructure specifics via a JDBC (Java Database Connectivity) connection (http://java.sun.com/products/jdbc). The applet passes your visualization requests to the server application, which formulates an SQL query to retrieve the necessary information from the database. The server application then constructs a system interface graph and opens a connection to a graph layout program to position the graph’s elements automatically. When the server application finishes the layout, it sends the graph using Java object serialization (http://java.sun.com/products/jdk/1.1/docs/guide/serialization) to the applet, which creates a visualization window and displays the requested system interface diagram. You can select nodes and edges to view their attributes, alter the graph display based on those attributes, and return the graph to the server for additional processing by graph clustering or graph dominator algorithms.

Figure 1. A typical system interface diagram generated by Enterprise Navigator. To protect proprietary information, we have replaced real system names with randomly generated ones and omitted certain interface names.

Graph Drawing
Graph drawing addresses the problem of visualizing structural information by constructing geometric representations of abstract graphs and networks. The automatic generation of graph drawings has important applications in key technologies such as database design, software engineering, VLSI, network design, and visual interfaces in other domains. In any setting, effective visualization should reveal interesting characteristics of data while avoiding distractions and irrelevancy. Most objective properties for graph layout algorithms correspond to a few simple visual principles:

• Favor recognition and readability of individual objects. Identifying objects should be easy—for example, by giving them legible text labels or by choosing certain shapes, colors, or styles. This principle implies efficient use of available layout area.
• Avoid aliases, including edge crossings, sharp bends, and the intersection of unrelated objects.
• Control eye movement to help users trace edges and paths in diagrams and find sources and sinks. Short and straight or at least monotonic edges are good.
• Reveal patterns by emphasizing symmetry, parallelism, and regularity. Layouts having these characteristics are often easier to read and memorize than ones lacking such organization.

Three families of graph layout algorithms have been particularly successful: hierarchical layouts of trees and directed acyclic graphs, virtual physical layouts of undirected graphs (for example, spring model layouts), and orthogonal grid layouts of planarized graphs. Graphviz, a set of tools for Unix, Windows, and OSX, has components for the first two families of layouts just mentioned. Source code and binary executables for common platforms are available at www.research.att.com/sw/tools/graphviz. One of the Graphviz tools, called Dot, was used to create most of Enterprise Navigator’s layouts. Grappa is a Java graph-drawing package that simplifies the inclusion of graph display and manipulation capabilities in Java applications and applets. Grappa does not have graph layout capabilities built into it, but integrates easily with tools such as Dot. Moreover, because Grappa stores graph structure information, it simplifies the coding of custom layout algorithms in Java. Grappa also enables questions about the graph structure to be answered easily (for instance, finding nodes that are directly connected to a given node). Grappa is available from the Graphviz Web site. It provided the interactive graph displays we used in Enterprise Navigator.

The glue holding EN together is the Java application on the server machine. The EN components linked by the server application include

■ System Profile Database, an infrastructure database;
■ Grappa, a graph manipulation and display tool;
■ Bunch, a graph clustering tool; and
■ Dominator, a graph dominator tool.

System profile database
SPDB, the underlying database supplying EN, contains key information about all system entities and interfaces within the enterprise of interest. In AT&T’s SPDB, EN primarily uses these three tables:

■ The system table contains basic information about each system in the entire business enterprise. It also includes such entities as work centers, network elements, databases, and Web sites as well as external systems that participate in data flows to or from other systems within the enterprise. Information about a system can include system type, name, owner, and status; business unit owner; phase-in and phase-out dates; and its parent system.
■ The interface table gives information about flows between systems and other entities described in the system table. This information can include interface type, owner, and status; the “from” and “to” systems; the business unit owner; and transmission media, frequency, and mode.
■ The mapping table links other entities such as products and services or business functions to systems in the system table.

In constructing a system interface diagram, EN constructs a graph of the components, setting the systems as nodes and the interfaces as edges. It stores additional pieces of information about those entities as attributes to the graph elements. Although this article focuses on the SPDB used at AT&T, you can easily modify EN to work with other databases as long as similar information on systems, interfaces, and mappings is available.
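As a sketch of how a server application could turn such tables into a graph, the following fragment queries a hypothetical interface table over JDBC and builds a plain adjacency map. The table and column names are assumptions rather than the real SPDB schema, and EN itself builds Grappa graph objects, not maps.

```java
import java.sql.*;
import java.util.*;

// Hypothetical schema: interface(from_system, to_system), mapping(system_id, product).
public class InterfaceGraphBuilder {
    public static Map<String, Set<String>> build(Connection db, String product)
            throws SQLException {
        Map<String, Set<String>> graph = new HashMap<>();
        String sql =
            "SELECT i.from_system, i.to_system " +
            "FROM interface i JOIN mapping m ON i.from_system = m.system_id " +
            "WHERE m.product = ?";
        try (PreparedStatement stmt = db.prepareStatement(sql)) {
            stmt.setString(1, product);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    // Each row becomes a directed edge: systems are nodes,
                    // interfaces are edges, as described above.
                    graph.computeIfAbsent(rs.getString(1), k -> new HashSet<>())
                         .add(rs.getString(2));
                    graph.putIfAbsent(rs.getString(2), new HashSet<>());
                }
            }
        }
        return graph;
    }
}
```

In EN the resulting graph is handed to a layout program such as Dot and then serialized back to the applet for display.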
Figure 2. The Enterprise Navigator architecture.

Figure 3. The Enterprise Navigator query interface.

Graph manipulation and display
A Java package called Grappa8 handles graph manipulation and display in EN in both the client applet and the server application. It can also build and manipulate graphs independent of display considerations. Although Grappa does not contain layout algorithms, it has methods for simplifying communication with graph layout programs, particularly the Dot layout program.5 Mouse interactions with the nodes and edges that Grappa displays can trigger additional actions, including initiating a new query specific to a selected element and viewing or storing additional data about an element. You can study architecture evolution over time because Grappa colors systems according to a reference date (see the Status Reference Date field in Figure 3) and each system’s status at that date. Grappa also colors systems that have been phased out of service and those yet to be introduced differently from active systems. Visualizing these changes helps architects determine the effects of business reengineering on various products and services. Grappa is designed to be extensible—it allows additional graph manipulation methods to be integrated. Without recoding anything, we integrated the next two tools described—Bunch and Dominator—into EN through the server application, which acts as a bridge between those applications and the display applet.

Graph clustering
EN uses the Bunch tool (available at http://serg.mcs.drexel.edu/bunch) to cluster components in a system interface diagram.6 Clustering is particularly useful to system architects who are trying to understand large and complex software infrastructures from their graph representations. Bunch accepts a graph as input and outputs it partitioned into a set of nonoverlapping clusters of nodes. Using Grappa, EN can show this partitioned system interface diagram as a graph with node clusters enclosed in rectangles. The Java server application communicates with Bunch through the latter’s application programming interface. Bunch attempts to partition the software graph so that system entities (nodes) in the same cluster are more closely related and system entities in different clusters are relatively independent of each other.
Figure 4. (a) A small system interface graph; (b) a partition of that graph; (c) a better partition.

Figure 5. (a) A system interface graph and (b) its dominator tree.

Creating a meaningful partition, however, is difficult because the number of possible partitions is large, even for a small graph. Also, small differences between two partitions can yield very different results. As an example, consider Figure 4a, which presents a graph with a small number of entities and relationships. The two partitions of the graph shown in Figures 4b and 4c are similar, with only two nodes (M3 and M4) swapped. Despite this seemingly small difference, the partition defined in Figure 4c better captures the graph’s high-level structure because it groups the more interdependent nodes. Bunch treats graph clustering as an optimization problem, in which the goal is to maximize an objective function that favors creating clusters with a high degree of intraedges, the edges between nodes of the same cluster. The same function penalizes pairs of clusters that exhibit a high degree of interedges, the edges between nodes that belong to different clusters. A large number of interedges indicates poor partitioning, which complicates software maintenance because changes to a software system might affect other systems in the software infrastructure. A low degree of interedges, indicating that the individual clusters are largely independent, is a desirable system architecture trait. Changes applied to such a system are likely to be localized to its cluster, thereby reducing the likelihood of introducing errors into other systems.
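The intuition behind the objective function can be shown with a toy graph and two candidate partitions, echoing the M3/M4 swap in Figure 4. The edges and the simple intra/inter count below are illustrative only; they are not Bunch’s actual measure.

```java
import java.util.*;

// Toy graph: two densely connected trios joined by a single edge.
public class PartitionScore {
    public static void main(String[] args) {
        String[][] edges = {
            {"M1", "M2"}, {"M2", "M3"}, {"M1", "M3"},
            {"M4", "M5"}, {"M5", "M6"}, {"M4", "M6"},
            {"M3", "M4"}
        };
        Map<String, Integer> good = Map.of("M1", 1, "M2", 1, "M3", 1, "M4", 2, "M5", 2, "M6", 2);
        Map<String, Integer> bad  = Map.of("M1", 1, "M2", 1, "M4", 1, "M3", 2, "M5", 2, "M6", 2);
        report("good partition", edges, good);   // intra=6, inter=1
        report("bad partition ", edges, bad);    // intra=2, inter=5
    }

    static void report(String name, String[][] edges, Map<String, Integer> part) {
        int intra = 0, inter = 0;
        for (String[] e : edges) {
            if (part.get(e[0]).equals(part.get(e[1]))) intra++; else inter++;
        }
        // More intra-edges and fewer inter-edges indicate a better partition.
        System.out.println(name + ": intra=" + intra + ", inter=" + inter);
    }
}
```

Bunch searches the space of such partitions to maximize its measure rather than scoring two fixed candidates.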
Graph dominators
EN uses the Dominator tool to determine a graph’s dominators. In a graph with a selected root node R, node X dominates node Y if every path from R to Y goes through X. When EN generates a system interface diagram, each interface link between systems represents an information flow. For example, in Figure 5a, the link from A to B means that information flows from A to B. Figure 5b shows the dominator tree derived from
that infrastructure graph. A link in the dominator tree between two nodes means that any information flow from the root node (selected from the original graph) to the target node must flow through the source of the link. For example, if A is the source node and there is a dominator link from B to C, then there is no way to get from A to C without going through B. In other words, if B were to be removed, C would be cut off from any information derived by A. On the other hand, consider the links from D to E and A to E in Figure 5a. The direct link from A to E provides a way to get to E from A without going through D; therefore, D is not a dominator of E. In fact, A is the sole dominator of E. The root node of a dominator tree represents the system where the information flow logically begins. In cases where a single root is not available, the user can manually choose multiple roots from the graph to act together as the global information source. Our tool uses the Dominators Algorithm devised by Thomas Lengauer and Robert Endre Tarjan.7 Adam Buchsbaum and his colleagues9 provide a history of dominator algorithms as well as theoretical improvements to the Lengauer-Tarjan algorithm. Here are two sample applications for dominator trees in EN:

■ Performing a sanity check on system evolution. Removing a system disconnects all systems dominated by it (and by extension systems dominated by those, and so on) from the original information source. Therefore, systems that are scheduled to retire should not dominate any systems that are not retiring. EN lets you check the situation visually by uniquely coloring systems to be retired on a dominator diagram.
■ Qualitatively assessing the dependency complexity. Flat dominator trees—that is, trees in which many systems are directly connected to root nodes—can represent highly interconnected systems, because there are few systems whose removal would disconnect the graph. Such high interconnectivity can be good due to replicated resources for system dependability, or bad due to unnecessary or duplicated information flows. On the other hand, deep dominator trees—that is, trees in which many systems are far from the root nodes—can represent less connected systems because many systems critically depend on many others for connectivity from the root. Again, this may be good or bad, depending on the application.
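To make the dominator relation concrete, here is a deliberately naive sketch that tests, for a small illustrative graph (not necessarily the exact edges of Figure 5a), whether removing a candidate node cuts a target off from the root. EN itself relies on the much more efficient Lengauer-Tarjan algorithm.

```java
import java.util.*;

// Naive dominator check: v dominates t if t is unreachable from the root
// once v is removed. The root dominates every reachable node by definition.
public class NaiveDominators {
    public static void main(String[] args) {
        Map<String, List<String>> g = Map.of(
            "A", List.of("B", "E"),
            "B", List.of("C", "D"),
            "C", List.of(),
            "D", List.of("E"),
            "E", List.of());
        String root = "A";
        for (String v : g.keySet()) {
            if (v.equals(root)) continue;
            for (String t : g.keySet()) {
                if (t.equals(root) || t.equals(v)) continue;
                if (reachable(g, root, t, null) && !reachable(g, root, t, v)) {
                    System.out.println(v + " dominates " + t);
                }
            }
        }
    }

    static boolean reachable(Map<String, List<String>> g, String from, String to, String removed) {
        Deque<String> stack = new ArrayDeque<>(List.of(from));
        Set<String> seen = new HashSet<>();
        while (!stack.isEmpty()) {
            String n = stack.pop();
            if (n.equals(removed) || !seen.add(n)) continue;
            if (n.equals(to)) return true;
            for (String next : g.getOrDefault(n, List.of())) stack.push(next);
        }
        return false;
    }
}
```

For these edges the output reports that B dominates C and D, while E, reachable directly from the root, is dominated only by the root, mirroring the D-versus-E discussion above.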
Case study
Figure 3 shows the query interface presented by the Java applet on the client side. The interface lets you select systems from different business units, owners (managers), products and services, business functions, and so on before generating a system interface diagram. The parameters you select in some categories constrain what choices are available in other categories. For example, when you click the Browse button next to the Business Unit category, the list of all business units appears (see Figure 6a). If you choose iHome and then click the Browse button next to the Product/Service item in Figure 3, Figure 6b appears and shows all products and services under the iHome business unit only. (All business unit names, product names, and system names have been replaced with imaginary names to protect proprietary information.) If you select iPhone and browse the systems under that service, the list of systems appears (see Figure 6c). You can refine the query further by setting the values for Business Process or Business Function, and so on. You can pick any system from the list to generate a system interface diagram. Figure 7 shows a typical diagram for the fictitious HLJ system. The picture clearly shows that HLJ collects reference data and then distributes accounting and customer data (among other things) to other systems.

Figure 6. Lists of (a) business units and (b) products and services under iHome; (c) systems under iHome that are involved with the iPhone service.

Figure 7. System interface diagram of HLJ (a fictitious system name).
Figure 8. Two diagrams generated from the system interface diagram shown in Figure 1: (a) the clustering diagram and (b) the dominator diagram.
In a system interface diagram, you can color each node according to different system attributes. In Figure 7, we use the status shading scheme, in which a yellow node indicates an active system and a gray node indicates insufficient information about that system. Using the reference date (1999-11-23) shown in the query interface page (Figure 3), a blue node indicates that this system is planned and will be introduced soon, and a purple node indicates that this system has retired. Because it is important to know how a business reengineering plan affects the system architecture, the architect can use different reference dates to see how the system interface diagram of a particular product or service has evolved or will change. Another shading scheme we used in the past was Y2K shading, in which nodes were colored according to whether they were Y2K compliant. This scheme let us quickly verify the Y2K readiness of a product or service (assuming the Y2K compliance data was available). The control bar in the bottom of the window shown in Figure 7 provides several other features. Hitting the “Convert to …” button converts the current interface diagram to various graphics and database formats so that other tools can import it easily. When you click on a system node, the default action is to generate another system interface diagram centered on that node. If, however, you check the Page Link checkbox, a Web browser is invoked to bring up a Web page showing all the system details (software, hardware, contacts, and so on) extracted from SPDB. Instead of focusing on a single system, you can choose to generate a system interface diagram for all systems involved in a product or service (or any systems that satisfy a particular query). Figure 1 shows a typical system interface diagram of a particular service. If you would like to discover clusters of systems embedded in a complex structure, you can invoke the Bunch tool to convert Figure 1 to a clustering diagram, shown in Figure 8a. The diagram shows two clusters with a similar architectural pattern: each has a data hub that receives data from several sources and distributes
processed data to many other destinations. Identifying clusters from complex system interface diagrams without the help of automated tools is not always easy. If you want to discover the dominating information flows starting from a particular system, you can select the node and perform a dominator analysis. Alternatively, EN can perform topological sorting to rank the nodes and add a virtual root to all the toplevel nodes before starting the dominator analysis. Figure 8b shows a dominator tree created with this method for the system interface diagram of Figure 1. Clearly, any attempt to remove the HLJ system will affect all the other systems that it dominates, because any information flows to the product or service represented by Figure 1 must go through HLJ. Such information can help system architects plan their reengineering efforts.
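Returning to the status shading used in Figure 7: the coloring comes down to comparing a system’s phase-in and phase-out dates with the chosen reference date. The sketch below is one plausible reconstruction of that rule; the exact comparisons EN applies are an assumption, not taken from the article.

```java
import java.time.LocalDate;

// Assumed rule: missing dates mean insufficient information, a future phase-in
// means planned, a past phase-out means retired, everything else is active.
public class StatusShading {
    enum Shade { YELLOW_ACTIVE, GRAY_UNKNOWN, BLUE_PLANNED, PURPLE_RETIRED }

    static Shade shade(LocalDate phaseIn, LocalDate phaseOut, LocalDate reference) {
        if (phaseIn == null && phaseOut == null) return Shade.GRAY_UNKNOWN;
        if (phaseIn != null && phaseIn.isAfter(reference)) return Shade.BLUE_PLANNED;
        if (phaseOut != null && phaseOut.isBefore(reference)) return Shade.PURPLE_RETIRED;
        return Shade.YELLOW_ACTIVE;
    }

    public static void main(String[] args) {
        LocalDate ref = LocalDate.of(1999, 11, 23);  // the reference date used in Figure 3
        System.out.println(shade(LocalDate.of(1995, 1, 1), null, ref));                      // active
        System.out.println(shade(LocalDate.of(2000, 6, 1), null, ref));                      // planned
        System.out.println(shade(LocalDate.of(1990, 1, 1), LocalDate.of(1998, 5, 1), ref));  // retired
    }
}
```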
About the Authors

Adam Buchsbaum is a principal technical staff member in the Network Services Research Lab at AT&T Labs. He specializes in the design and analysis of algorithms and data structures. His research interests also include graph problems, massive data sets, and combinatorics. He received his PhD in computer science from Princeton University. Contact him at Room E203, Building 103, 180 Park Ave., PO Box 971, Florham Park, NJ 07932-0971; [email protected]; www.research.att.com/info/alb.
Yih-Farn Chen is a technology consultant in the Software Systems Research Department, Network Services Research Center, at AT&T Labs. His research interests include mobile computing, software engineering, and the Web. His recent work includes a mobile service platform, Web-site tracking services, personal proxy servers, a reverse engineering portal, and Enterprise Navigator. He was a vice chair of the WWW10 and WWW11 conferences. He received his PhD in computer science from the University of California, Berkeley. Contact him at Room E219, Building 103, 180 Park Ave, PO Box 971, Florham Park, NJ 07932-0971;
[email protected]; www.research.att.com/info/chen. Huale Huang is a senior technical staff member in the Information Sciences Research
Lab at AT&T Labs, where he is working on mobile communications. He is interested in Web and mobile computing. He received an MS in mathematics from the Chinese Academy of Sciences, an MS in computer science from the New Jersey Institute of Technology, and a PhD in mathematics from City University of New York. Contact him at Room C258, Building 103, 180 Park Ave, PO Box 971, Florham Park, NJ 07932-0971;
[email protected].
Eleftherios Koutsofios is a technology consultant at AT&T. His research interests include interactive techniques, display and user interaction technologies, and information visualization. He has worked on graph layouts, programmable graphics editors, tools for visualizing large data sets, and program animation. He received his PhD in computer science from Princeton University. Contact him at Room E223, Building 103, 180 Park Ave, PO Box 971, Florham Park, NJ 07932-0971;
[email protected]; www.research.att.com/info/ek.
Enterprise Navigator’s usefulness depends heavily on the underlying data’s timeliness. We are working on facilities that would let architects update architecture data directly from graphs, thus eliminating the delays associated with the old data collection process. We also plan to add node and link operators that will let users examine in more detail the corresponding systems and data transmitted on a link. With the addition of operational data like system availability to the database, we might be able to perform end-to-end enterprise architecture simulations. Finally, we welcome the opportunity to apply the EN concept to other forms of enterprise data. Most of the tools and libraries we use are already available on the Web, and we plan to provide a reusable package that would simplify their integration with other infrastructure databases. For more information on EN and pointers to the software components it uses, visit www.research.att.com/~ciao/en.

John Mocenigo is a principal technical staff member with the Network Services Research Lab at AT&T Labs. He enjoys problem solving and writing code for graph visualization and database transaction logging. Currently he is working on a scripting language called Yoix that runs under Java (www.research.att.com/sw/tools/yoix). He received his PhD in electrical engineering–control theory from Brown University. Contact him at Room D225, Building 103, 180 Park Ave, PO Box 971, Florham Park, NJ 07932-0971;
[email protected]; www. research.att.com/info/john.
Anne Rogers is a technology consultant in the Network Services Research Lab at AT&T
Labs. She specializes in programming languages, compilers, and systems for processing large volumes of data. She received a BS from Carnegie Mellon University and an MS and a PhD from Cornell University. Contact her at Room E205, Building 103, 180 Park Ave, PO Box 971, Florham Park, NJ 07932-0971;
[email protected]; www.research.att.com/info/amr.
Michael Jankowsky is a senior technical staff member in the Enterprise IT Security
Group within AT&T Business Services. He specializes in database design and data management. He received a BS in mathematics from Montclair State University. Contact him at AT&T Business Services, Room E5-2A03, 200 Laurel Ave. South, Middletown, NJ 07748; jankowsky@ems. att.com.
Spiros Mancoridis is an associate professor in the Department of Mathematics and
Computer Science at Drexel University and founder and director of the Software Engineering Research Group there. His research involves reverse engineering of large software systems, program understanding, software testing, and software security. In 1998, he received a Career Award for young investigators from the US National Science Foundation. He received his PhD in computer science from the University of Toronto. Contact him at the Dept. of Math & CS, Drexel University, Philadelphia, PA 19104;
[email protected]; www.mcs.drexel.edu/ ~smancori.
References
1. Y. Chen et al., “Ciao: A Graphical Navigator for Software and Document Repositories,” Proc. Int’l Conf. Software Maintenance, IEEE CS Press, Los Alamitos, Calif., 1995, pp. 66–75.
2. Y. Chen, E.R. Gansner, and E. Koutsofios, “A C++ Data Model Supporting Reachability Analysis and Dead Code Detection,” IEEE Trans. Software Eng., vol. 24, no. 9, Sept. 1998, pp. 682–693.
3. J. Korn, Y. Chen, and E. Koutsofios, “Chava: Reverse Engineering and Tracking of Java Applets,” Proc. 6th Working Conf. Reverse Eng., IEEE CS Press, Los Alamitos, Calif., 1999, pp. 314–325.
4. G. Di Battista et al., Graph Drawing: Algorithms for the Visualization of Graphs, Prentice Hall, Upper Saddle River, N.J., 1999.
5. E.R. Gansner et al., “A Technique for Drawing Directed Graphs,” IEEE Trans. Software Eng., vol. 19, no. 3, Mar. 1993, pp. 214–230.
6. S. Mancoridis et al., “Bunch: A Clustering Tool for the Recovery and Maintenance of Software System Structures,” Proc. Int’l Conf. Software Maintenance (ICSM ’99), IEEE CS Press, Los Alamitos, Calif., 1999.
7. T. Lengauer and R.E. Tarjan, “A Fast Algorithm for Finding Dominators in a Flowgraph,” ACM Trans. Programming Languages and Systems, vol. 1, no. 1, 1979, pp. 121–141.
8. N. Barghouti, J. Mocenigo, and W. Lee, “Grappa: A Graph Package in Java,” Proc. 5th Int’l Symp. Graph Drawing, Springer-Verlag, Berlin, 1997, pp. 336–343.
9. A.L. Buchsbaum et al., “A New, Simpler Linear-Time Dominators Algorithm,” ACM Trans. Programming Languages and Systems, vol. 20, no. 6, Nov. 1998, pp. 1265–1296.
For more information on this or any other computing topic, please visit our Digital Library at http://computer.org/publications/dlib.
feature reuse
Design Reuse through Frameworks and Patterns Peter W. Fach, RWG
The approach to design reuse presented here integrates frameworks and design patterns even into the requirements engineering phase of the software development process, where they have proven useful in improving user participation. Moreover, this method lets us determine which units are suitable for reuse.
In large service organizations such as banks and insurance companies, the role of application software has changed tremendously, considering the evolution from data input terminals to today’s multimedia information and consulting instruments. Banks, for example, must offer their customers a broad range of services, from investment consulting to traveler’s checks. This places great demands on the usability of information systems—for instance, adaptability to task requirements, short response times, and ability to learn.
Because of the trend for banks to offer all-in-one services at all branches, standard software solutions are increasingly in demand. However, as standardization increases, users increasingly demand individual workstation adaptations to meet their specific needs. Frameworks and design patterns are an answer to this and to similar—in principle mutually exclusive— standardization and flexibility requirements. RWG, a company providing software and computing services to a heterogeneous group of about 400 banks in southwest Germany, develops such frameworks. The banks have successfully used RWG’s application software Gebos, which is based on these frameworks, in more than 17,000 work places over the last 10 years. From the perspective of human–computer interaction, this article focuses not only on the numerous advantages of using frameworks in design reuse but also on the problems and obstacles that can arise.
Frameworks typically are developed in an evolutionary manner, maturing (consolidating) over a number of years. So, usability problems must be detected and corrected in the early stages of the life cycle, because corrections become increasingly costly as development progresses.

Frameworks: A brief introduction
The key concept in using frameworks is design reuse. In contrast to past approaches that applied the term reuse to individual software functions (such as sine), the objective of frameworks is to reuse complete domain-specific units—for instance, customer records, accounts, or security accounts. In other words, we try to preserve our existing development work, such as task analysis and domain class design, by creating a skeleton frame representing, for example, the implementation of an account and its interface components. Then, application programmers need only tailor the frame to the specifics of a particular application domain.
Figure 1. Developing stable and reusable frameworks through sedimentation. Implementation chunks in the business domain layer that are shared by at least two frameworks sediment into the application domain layer.
To make the difference clear: In traditional approaches to software reuse, we create an application program using existing class libraries for database access, mathematical functions, and the user interface. In contrast, design reuse means that we adapt a prefabricated, fully developed hull—by subclassing, for instance. Thus, in frameworks, the program logic is already present.

Creating layers through sedimentation
When developing a framework, we first focus on a relatively small and clearly structured business domain, such as the handling of standing orders. In a team consisting of potential users (customers) and developers, we assess typical job activities such as task objects, task performance, and cooperative work. On the basis of the task models resulting from task analysis, we design domain frameworks (classes for standing orders, accounts, and so on). Potential users will be involved not only during task analysis but throughout the entire development process (for example, through reviews, prototyping, and so forth). This way of developing software is often referred to as a user-participative approach, or simply as user participation. In my experience, the Gebos method1–3 and similar procedures4,5 that implement such an approach are key to successful framework development. After we ship the first small business domain and the users approve it, we gradually add software solutions for more elements of the application domain following the same method. These frameworks will overlap in
part with existing ones but will also add new aspects. In simple terms, the “art” of creating frameworks consists of finding exactly those implementation chunks (in object-oriented jargon, operations and attributes) that you would otherwise have to design over and over again—for example, within each implementation of a new type of account. We then extract these chunks from each implementation and represent them as frameworks in the deeper of two layers, the application domain layer (ADL; see Figure 1). This process is called sedimentation. Taking up the example of an account again, the outcome of this procedure is that all classes in which the deposit and withdrawal of money are implemented settle into the ADL. Each specific type of account (such as checking or savings) is always connected to a specific type of business activity (such as loan processing or investment consulting) and has specific features particular to it—for example, checking credit limits is a specific feature of current accounts, and progressive interest rates are a specific feature of savings accounts. All the implementation chunks that deal with such tasks are subsumed under frameworks deposited in the business domain layer (see Figure 1).6 All frameworks in this layer must use the basic frameworks in the ADL, thus ensuring a high degree of reuse. As application development progresses, the sedimentation of implementation chunks decreases, until all the stable frameworks have sedimented in the ADL. (Note that sedimentation calls for an iterative product release schedule because only the objects that have proven to sufficiently address commonly occurring tasks should sediment into the ADL.) This layer, then, represents all those task objects (accounts, customers, customer records, and so on) that constitute the application domain of a universal bank. Of course, framework developers must aim at accelerating sedimentation and must avoid anything that destabilizes mature frameworks.
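A minimal sketch of what sedimentation yields for the account example, with invented class and method names rather than the real Gebos frameworks: the shared deposit and withdrawal behavior sits in an application-domain base class, while the credit-limit check and the progressive interest rate live in business-domain subclasses.

```java
// Application domain layer: behavior shared by every kind of account.
abstract class Account {
    private long balanceInCents;

    void deposit(long amountInCents) { balanceInCents += amountInCents; }

    void withdraw(long amountInCents) {
        checkWithdrawal(amountInCents);      // hook refined in the business domain layer
        balanceInCents -= amountInCents;
    }

    long balance() { return balanceInCents; }

    protected void checkWithdrawal(long amountInCents) { }
}

// Business domain layer: a current account adds a credit-limit check.
class CurrentAccount extends Account {
    private final long creditLimitInCents;

    CurrentAccount(long creditLimitInCents) { this.creditLimitInCents = creditLimitInCents; }

    @Override
    protected void checkWithdrawal(long amountInCents) {
        if (balance() - amountInCents < -creditLimitInCents) {
            throw new IllegalStateException("credit limit exceeded");
        }
    }
}

// Business domain layer: a savings account adds a progressive interest rate.
class SavingsAccount extends Account {
    void creditInterest() {
        double rate = balance() > 10_000_00 ? 0.03 : 0.02;   // invented rates
        deposit(Math.round(balance() * rate));
    }
}
```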
Intensive design reuse
You cannot directly assess the objects deposited in the ADL using task analysis because, for instance, initially there is no such task object as an "account" but only savings accounts, checking accounts, or credit accounts. Thus, we can obtain the sum of operations and attributes shared by all types of accounts only through the sedimentation process just described. Frameworks constituting the user interface develop in the same way as frameworks for task objects. For instance, filling out a form is in principle similar to entering data (paying a particular amount of money) into an account, because you can work on both task objects—the form and the account—with an editor. The same principle applies here as described earlier: In the course of several development cycles, the functionalities that are common to all editors in the application system (such as highlighting erroneous values and the structure of feedback messages) settle into the ADL. This is how an editor framework that can handle all common task objects develops. In the same way, other interactive frameworks develop that support recurring forms of cooperation, such as email mechanisms and electronic signatures. As Figure 1 shows, these frameworks are then assigned to the ADL as well.
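As a minimal Java sketch of the result (the names are illustrative, not Gebos classes), the behavior shared by every editor, such as collecting feedback about erroneous values, sits in an ADL base class, and only the material-specific hooks remain to be supplied by business domain frameworks.

    import java.util.ArrayList;
    import java.util.List;

    // ADL: behavior common to every editor in the application system.
    abstract class Editor<M> {
        private final List<String> feedback = new ArrayList<>();

        public final void enter(String field, String value, M material) {
            if (!isValid(field, value, material)) {
                feedback.add("Invalid value for " + field);  // uniform feedback structure
                return;
            }
            apply(field, value, material);
        }

        public List<String> feedbackMessages() {
            return feedback;
        }

        // Only these hooks differ per kind of material; their implementations
        // live with the corresponding business domain frameworks.
        protected abstract boolean isValid(String field, String value, M material);
        protected abstract void apply(String field, String value, M material);
    }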
Design patterns: Aids to facilitate user participation
From a technical perspective, a framework is a collection of cooperating classes (for example, in C++ or Java) that is used "from the outside" as a self-contained unit. However, the number of such classes can become quite large and cooperation among them rather intricate, so that documenting a framework's functioning or explaining it to new team members often becomes difficult. Design patterns are metaphors that describe the behavior and structure of a single framework or the interplay between two or more of them, thus illustrating conceptually how they work. On the other hand, design patterns offer excellent opportunities for user participation in the evolutionary process of framework development, where they are instrumental in improving the prerequisites for a highly usable system. Following are two examples of design patterns that we have used with great success in the Gebos application system.
The Tool–Material pattern
The Tool–Material design pattern, which describes the interplay between application domain objects (such as an account) and objects for interaction purposes (such as an editor), is derived from the tool–material metaphor.1,2 This metaphor—namely, that materials can be worked on with tools—defines the interdependencies between two (technical) objects: A tool is produced for working on specific materials. For example, an editor tool might enable users to work on an account (the material) to make deposits and withdrawals. Thus, a material and the user interface are not directly connected—rather, the tools themselves handle all interactions. Implementing a tool requires knowledge about the nature of the relevant materials; implementing a material remains independent of specific tools. This means that when an application system's software architecture is in accordance with the Tool–Material pattern, user interface changes will not affect the material frameworks contained in the business domain layer or the ADL. However, this feature is only one side of the coin, because we can also use the tool–material metaphor as the guiding principle in structuring the user interface. Users of such an interface consciously handle the materials and selectively apply tools to them; for example, a customer search tool searches the customer register and returns a list of customers fulfilling particular search criteria. This structuring of the user interface helps users precisely describe problems as they occur, including the specific work context (for example, "the customer search tool cannot be used in a certain situation, because…"). It also lets software engineers classify, locate, and assess these requirements efficiently:7 users and software engineers use the same terminology and even refer to the same entities. Thus, design patterns play a central role in supporting user participation in the evolutionary development of frameworks, which in turn is a prerequisite for rapidly converging sedimentation processes. Consequently, the Tool–Material design pattern is an excellent example of how we can get conceptual models and design models8 to converge. This pattern is completely integrated into the Gebos development life cycle.1,2 Thus, in this article, when I discuss domain-specific frameworks, I am speaking about materials; when I refer to frameworks with which materials are displayed, copied, deleted, or processed, I am speaking about tools.
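A minimal Java sketch of the dependency direction the pattern prescribes (the class names are illustrative, not Gebos code): the material carries no user interface code and knows nothing about tools, whereas the tool knows which kind of material it works on and handles all interaction.

    // Material: a task object from the ADL or business domain layer.
    // It contains no user-interface code and does not know which tools exist.
    class Account {
        private long balanceInCents;

        void deposit(long amountInCents)  { balanceInCents += amountInCents; }
        void withdraw(long amountInCents) { balanceInCents -= amountInCents; }
        long balanceInCents()             { return balanceInCents; }
    }

    // Tool: produced for working on a specific kind of material and
    // responsible for all interaction with the user.
    class AccountEditor {
        private final Account account;  // the material being worked on

        AccountEditor(Account account) { this.account = account; }

        void userEntersDeposit(long amountInCents) {
            account.deposit(amountInCents);
            refreshDisplay();
        }

        private void refreshDisplay() {
            System.out.println("Balance: " + account.balanceInCents());
        }
    }

Replacing or redesigning AccountEditor leaves Account untouched, which is exactly why user interface changes do not reach the material frameworks.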
Figure 2. Correspondence between the software architecture and the user interface. Workplace-specific roles appear in the lower left, and the materials available are listed on the lower right.
The Role pattern
In a full-service bank, customers frequently take on different roles: as investors, borrowers, guarantors, or customers of affiliated institutions such as investment or insurance companies. Therefore, the bank's employees have diverse task requirements. For example, loan processors must have a broad and precise overview of a customer's financial situation, whereas customer service representatives need only certain customer information but they need it immediately—quick and efficient customer service has priority. Moreover, it is important that only authorized persons have access to specific customer information. Maintaining diverse but adaptable employee workstations presents technical challenges:
■ Fast system response time. For example, customer service systems must give the workers immediate access to all relevant customer information.
■ Smooth sedimentation process. Change requests, made to comply with legal requirements or other application system extensions, often slow down—or in extreme cases even reverse—the sedimentation process. This could happen when a requirement forces an implementation chunk that has already settled into the ADL to move back into a business domain layer framework.
We can address such problems using the Role pattern.6 Unlike the case of the Tool–Material design pattern, which has a technical origin, here everyday business transactions serve as metaphors for structuring the software architecture. The idea behind the Role pattern is that the core implementation of a customer (used by all the customer roles) is a framework in the application domain. In contrast, role-specific customer implementations are frameworks in the business domain. If a role changes, it affects only that role’s framework. The customer core framework in the application domain as well as the frameworks of other roles remain unaffected, and—even more importantly—sedimentation is not obstructed. Also, users and software engineers can use the same terminology and refer to the same entities, the roles. Just like the Tool–Material pattern, the Role pattern is completely integrated into the development life cycle and is another excellent example of how design patterns can facilitate user participation. This approach works well when developing frameworks for traditional industries such as banking or insurance, where the application domain has been stable for several years. However, it might be less worthwhile to try to initiate a sedimentation process in young fields of business where, for instance, the application domain is being explored or even elaborated by the software development process itself. This can happen, for instance, when developing customer relationship management systems, which often are exposed to continuous change requirements. In such cases, it takes some release cycles until business objects become stable; only then are they candidates for sedimentation.
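A minimal Java sketch of this split (again with illustrative names): the customer core sits in the application domain layer and is shared by every role, while each role-specific framework in the business domain layer wraps that core and adds only its own state and operations.

    // Application domain layer: the customer core used by all roles.
    class CustomerCore {
        private final String customerId;
        private final String name;

        CustomerCore(String customerId, String name) {
            this.customerId = customerId;
            this.name = name;
        }

        String customerId() { return customerId; }
        String name()       { return name; }
    }

    // Business domain layer: one framework per role. A change request against
    // the borrower role stays inside this class; CustomerCore and the other
    // role frameworks remain untouched, so sedimentation is not obstructed.
    class Borrower {
        private final CustomerCore core;
        private long outstandingLoanInCents;

        Borrower(CustomerCore core) { this.core = core; }

        void grantLoan(long amountInCents) { outstandingLoanInCents += amountInCents; }
        long outstandingLoanInCents()      { return outstandingLoanInCents; }
        CustomerCore core()                { return core; }
    }

    class Investor {
        private final CustomerCore core;

        Investor(CustomerCore core) { this.core = core; }
        // ...operations specific to investment consulting
    }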
Application adaptability
Because individual frameworks all use the same customer core but operate independently, we can develop task-specific applications in a simple and elegant way. (This is realized by different DLLs; technical prerequisites include separating the frameworks into conceptual and implementational parts.6) Figure 2, for instance, shows a Gebos screen presenting a bank customer's general information. The top part of the screen displays invariant customer data provided by the customer core framework. The lower-left window lists workplace-specific roles; the lower-right part of the screen lists available materials. Clicking on a specific role lists all materials (current accounts, securities accounts, and so forth) that are typically connected with this role. As in the case of the Tool–Material pattern, the Role pattern structures the user interface and the software architecture using the same metaphor, thereby supporting convergence between the user's conceptual model and the design model.8,9

Furthermore, the Role pattern offers practical and elegant possibilities for tailoring an application to particular activities. For instance, at workstations where loan processing is the predominant task, the Bank Customer role would be extended by a further category, Capital Values, which provides detailed information about the customer's financial situation. Of course, service centers do not need this category because no in-depth consultations are intended to take place there. The Tool–Material pattern also provides further possibilities for adaptation by making a (limited or extended) set of tools and materials available, thus addressing the distinct requirements of different types of workplaces. For instance, a search tool that lets a manager access all of the bank's loan protocols might not be available at other workstations for data protection reasons.

Clearly, frameworks and design patterns have great potential for use in HCI. They offer the best prerequisites for competent user participation, adaptable workstation setups, and, most importantly, the reuse of good design. It is now possible to reuse entire tools, not just materials, single controls, and widgets—the common practice so far. This has drastically changed the role of style guides in developing frameworks.
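A hypothetical sketch of such tailoring is shown below; the article states only that Gebos realizes this with separate DLLs and a split into conceptual and implementational parts, so the shape of the configuration is an assumption. Each workstation type declares the roles, material categories, and tools it offers.

    import java.util.List;

    // Hypothetical description of a workstation type: the roles it shows,
    // the material categories listed under them, and the tools installed.
    final class WorkplaceProfile {
        final List<String> roles;
        final List<String> materialCategories;
        final List<String> tools;

        WorkplaceProfile(List<String> roles, List<String> materialCategories, List<String> tools) {
            this.roles = roles;
            this.materialCategories = materialCategories;
            this.tools = tools;
        }

        // Loan processing: the Bank Customer role is extended by Capital Values,
        // and the loan protocol browser is available.
        static WorkplaceProfile loanProcessing() {
            return new WorkplaceProfile(
                    List.of("Bank Customer"),
                    List.of("Current Accounts", "Securities Accounts", "Capital Values"),
                    List.of("Customer Search", "Loan Protocol Browser"));
        }

        // Service center: the same role with a reduced set of categories and tools.
        static WorkplaceProfile serviceCenter() {
            return new WorkplaceProfile(
                    List.of("Bank Customer"),
                    List.of("Current Accounts", "Securities Accounts"),
                    List.of("Customer Search"));
        }
    }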
Style guides: Catalogs of well-tried solutions
In traditional software development methods, the role of style guides has become increasingly important as applications have become larger. The reason is obvious: The number of applications programmers involved in any one project is increasing, new components are constantly being added, and the danger of losing a consistent look and feel is continuously growing. Style guides for large Windows or multimedia-based applications often contain more than 500 pages (over 800 pages at times).10 Experience shows that software engineers rarely use such voluminous documentation. However, when a developer uses frameworks that are based on the Tool–Material pattern, the role of style guides changes drastically due to tool reuse. As an application system evolves, the number of newly designed tools decreases; in most cases, reusing existing ones becomes possible (the simplest examples are browsers and editors). Because of this, standardization work is concentrated in the very early stage of application development and concerns only a small team of software engineers who build the original tool and are responsible for user interface design. All the other developers do not need the traditional type of style guide, because this knowledge becomes preserved in the tools themselves. Style guides for framework development must take into account these two issues:
■ Criteria for reuse. It is important to precisely define the situations in which developers may reuse a particular tool. A catalog, for example, might list all the tools available for reuse together with the prerequisites for using them. Otherwise, there is the danger that tools might be used outside the intended context just because they have already been implemented and are ready for reuse. Experience has shown that problems arise in such cases, because adapting tools for mid- or long-term use cancels out the benefits of reuse and destabilizes the business or even application layers.
■ Usage models. A usage model is the set of all metaphors (for instance, the tool–material and role metaphors) that guide how to use an application. The usage model must be standardized early in
the development process; otherwise, developers might unknowingly use different models as they simultaneously build tools for several business domains. At worst, this could necessitate the modification of application domain frameworks, thus slowing down or even reversing the framework maturation (sedimentation) process. Of course, when building frameworks, you still need to change the names of menu items, action buttons, hot keys, shortcuts, and so forth, because a new domain application might call for different terminology. However, in contrast to traditional style guides, framework style guides can be restricted to reuse criteria, usage models, and naming conventions. In the context of Gebos development, the style guide can be made available as a single file.
Human–computer interaction specialists have claimed for over 15 years that the most important usability work takes place long before software design and implementation.11 Still, developers using traditional approaches can repair bad usability problems in later versions of the software. However, framework design changes often have far-reaching consequences and can even destabilize an entire system under unfavorable conditions. Thus, changes for pure usability reasons have little chance of implementation in the later stages of framework development (for instance, the Role pattern framework needed more than three years to fully develop). HCI people must therefore be involved very early in framework development. In addition to their typical knowledge base, they must also understand the technical design process of frameworks, help develop suitable metaphors for design patterns, and above all ensure that these metaphors are realized consistently in
the development process. Of course, this approach calls not only for specially trained HCI consultants but also for highly experienced software architects who support and manage the sedimentation process. Complete catalogs of technical solutions, documented as design patterns using metaphors, are now available.12 Certainly, many of these design patterns will never leave the software laboratory; however, others have the potential to support the convergence of design models and conceptual models.8,9 The Tool–Material pattern and the Role pattern are good examples. Thus, with the help of frameworks and appropriate metaphors for design patterns, it should be possible to realize in software development what Christopher Alexander revealed in the field of architecture: a pattern language that provides well-designed and successful solutions.13

References
1. U. Bürkle, G. Gryczan, and H. Züllighoven, "Object-Oriented System Development in a Banking Project: Methodology, Experience, and Conclusions," Human–Computer Interaction, vol. 10, nos. 2–3, 1995, pp. 293–336.
2. P.W. Fach, "Design Patterns: Bridges between Application Domain and Software Labs," Proc. 8th Int'l Conf. Human–Computer Interaction, Lawrence Erlbaum, Mahwah, N.J., 1999, pp. 909–912.
3. P.W. Fach, "Interaction Games: Using Familiar Concepts in User Interface Design," J. Visual Languages and Computing, vol. 10, no. 5, Oct. 1999, pp. 509–521.
4. A.R. Puerta, "A Model-Based Interface Development Environment," IEEE Software, vol. 14, no. 4, July–Aug. 1997, pp. 40–44.
5. S. Wilson et al., "Beyond Hacking: A Model-Based Approach to User Interface Design," Proc. HCI '93, Cambridge Univ. Press, Cambridge, UK, pp. 217–231.
6. D. Bäumer et al., "Framework Development for Large Systems," Comm. ACM, vol. 40, no. 10, Oct. 1997, pp. 52–59.
7. W.A. Kellogg et al., "NetVista: Growing an Internet Solution for Schools," www.research.ibm.com/journal/sj/371/kellogg.html (current 20 Aug. 2001).
8. D.A. Norman, "Cognitive Engineering," User Centered System Design, D.A. Norman and S.W. Draper, eds., Lawrence Erlbaum, Mahwah, N.J., 1986.
9. M.B. Rosson and S. Alpert, "The Cognitive Consequences of Object-Oriented Design," Human–Computer Interaction, vol. 5, no. 4, 1990, pp. 345–379.
10. Sparkassen-Organisation SIZ, Gestaltungsleitlinien für Grafische Oberflächen [Design Guidelines for Graphical Interfaces], Deutscher Sparkassenverlag, Stuttgart, Germany, 1997.
11. B. Shackel, "Whence and Where—A Short History of Human–Computer Interaction," Human Aspects in Computing, H.J. Bullinger, ed., Elsevier, Amsterdam, 1991, pp. 3–18.
12. E. Gamma et al., Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley, Reading, Mass., 1994.
13. C. Alexander, The Timeless Way of Building, Oxford Univ. Press, Oxford, UK, 1979.

About the Author
Peter W. Fach is a consultant for human–computer interaction and the software development life cycle at SAP in Walldorf, Germany. Previously, he was a consultant for HCI and object-oriented methods at RWG in Stuttgart, where he conducted the research for this article. His work focuses on software development and usability. He was a program committee member and reviewer for the Mensch & Computer 2001 conference. He received a PhD in human–computer interaction from the University of Stuttgart and also studied computer science at the University of Pisa, Italy. Contact him at SAP AG IBS Financial Services, Neurottstraße 16, 69190 Walldorf, Germany; [email protected].
country report Editor: Deependra Moitra ■ Lucent Technologies ■ [email protected]
Japan: A Huge IT Consumption Market
Tomoo Matsubara
In early August 2000, officials from the Japanese Ministry of Economy, Trade, and Industry (METI) dined in Seattle, Washington, with a Boeing engineer. METI's head delegate was shocked when the engineer said, "We accept many visitors from Japanese software organizations, but none of them have tried to do business with us. Why is that? Indians try aggressively to do business with us. We are open to working with any country."

Strong domestic demand
Strong domestic demand is one of the primary reasons for not pursuing business abroad. In fact, although Japan is in a long recession, the software industry is not. Based on a recent survey, a Nikkei newspaper article stated that Japanese companies invest 14 percent of their total budget in IT. In the commercial sector, many companies prefer to build their own systems rather than buy something off the shelf. Because they have restructured—to cope with the severe economic situation in Japan—and have reduced the number of in-house software professionals, they rely on other domestic software companies to handle their outsourced work. Based on current e-government plans, central government agencies and municipal governments must build a large number of systems, worth at least US $84 billion by 2003. These are all sources of enormous demand.

The second reason for not pursuing business abroad is the language barrier. In 1981 and 1984, Denji Tajima and I published articles on software development in Japan.1,2 At that time, such a report written in English was a rarity. A few years later, Michael Cusumano wrote a book, in English, containing a superior subset of data from Japanese mainframe companies.3 These helped introduce Japanese industry to the Western software community. Nevertheless, the language barrier continues to impede international trade for both sides; it is a subtle protector of Japan's market.

The third reason might be Japan's old-fashioned software technologies and development style. Because many software companies have little outside stimulation to influence their work situations, most of the development companies at the lower level of the industry lag behind the US in technology. They thus hesitate to make development contracts with overseas companies.

Strong demand is not always good news, though. Large companies simply want to buy programming power rather than sign a development contract with specifications, because they can't or don't want to write good requirement specifications. In response to such a customer's request, medium and small software companies recruit high school graduates to program. They train them in programming languages and techniques and then throw them into customers' development projects—the body-shop approach. Because these companies are selling manpower, they don't need software process improvement, and they have no interest in the CMM or other SPI models. Still, software companies can earn decent money.
Figure 1. Hierarchical view of the Japanese software industry.
Software industry structure
Japan has a long history of industrialization, so the structure of its software industry is complex. Figure 1 shows the industry from a hierarchical point of view: it has many tiers, each with different characteristics and behaviors. Central and municipal government agencies at the top have the largest procurement power, but they usually do not employ software professionals. Consequently, they have to rely on companies in the second tier, such as Hitachi, Fujitsu, NEC, Japan IBM, and NTT Data—the largest custom software developers in Japan. Tier 2 companies subcontract subsystems to companies in Tier 3 and below. There is a clear difference between Tier 2 and Tier 3 companies. Most Tier 3 companies are direct descendants of Tier 2 companies, and their names usually reflect that—such as Hitachi Software and Omron Software. These companies inherit development processes from their parent companies. Lower-tier companies don't have such relationships with parent companies. Companies in the manufacturing sector treat software as an industrial product; those in the commercial sector treat
software as a service. Some companies have succeeded in expanding their business overseas and into other domestic markets. Japan's software development largely depends on Tier 4 and Tier 5 companies in terms of the number of programmers and their working hours. Almost all development projects include a high percentage of programmers from lower tiers. But people in lower-tier companies are generally poorly educated, and their primary skills focus on lower-stream tasks.
Figure 1 also shows the vertical structure of industry that is peculiar to Japan. For example, Hitachi, a prime contractor of METI, divides a system into subsystems that are then subcontracted again and again—down to companies in Tiers 4, 5, and 6. Profit margins get thinner and thinner as subcontracting cascades. Recently, embedded-software organizations (the central square in Figure 1) have realized the importance of the software process in increasing product quality. Many large manufacturing companies, such as Sony, Omron, and Panasonic, are implementing process improvement initiatives in Japan. In fiscal 1999 (April '99–March '00), Japan's software industry (534,000 people) produced US $85 billion after a three-year downturn (1993–1995) and three years of quick recovery (1996–1998), according to METI's Annual Survey for Service Industry, 1999. Custom software accounted for 54 percent of sales; product sales were 8 percent. Japan imports US $500 billion of COTS products but exports a minuscule US $73 million, with just a few exceptions—for example, its games are very popular, and a Japanese product for criminal DNA analysis has the biggest share of the global market, with the US Federal Bureau of Investigation as its primary customer.

Improving process improvement initiatives
In 1997, members of Japan's Software Engineering Association decided to improve their software processes. They initiated a project to translate the CMM v1.1 into Japanese, and it was published in 1999. The book clearly accelerated process improvement. When SEA began a software process improvement network, or SEA-SPIN, many software engineers participated. In October 2000, SEA established the Japan Software Process Improvement Consortium, whose objectives are to publish a Japanese version of the CMMI and promote SPI.
METI's CMM project
Upon surveying the US's CMM usage in government contracts, METI established the Improvement of Software Development and Procurement Committee, independent of SEA's CMM initiative. It compiled interim plans for process assessment and government procurement, put them on the METI Web site, and requested public comments. Based on the feedback, METI plans to promulgate new rules for government procurement and selection of eligible bidders by March 2002. The industry responded in opposite ways. Many companies reacted quickly and rushed to announce plans to get higher CMM-level certification, thinking that this would be a new entry ticket for e-government contracts. On the other hand, software engineers who had been tackling SPI in their organizations—many from the embedded-software companies producing high-quality home appliances and equipment—strongly criticized METI's plan. They complained that the CMM should be used for improvement first, not for selecting eligible organizations for bidding. If the government does the latter, they said, the results would be counterproductive—as we already see in ISO 9000 certification, where companies see assessment as a marketing tool that spoils authentic improvement efforts.

Real problems
The government plans to create an e-government by 2003—at a cost of US $84 billion—but worries about today's defect-prone computer systems. But the resulting low-quality systems are only the tip of the iceberg. Before the government tries to assess bidders' software processes, it must deal with some fundamental issues. Government software procurement has a lot of other problems as well:

■ Government agencies do not have software professionals analyzing operational environments and writing adequate specifications.
■ The government must rely heavily on big-name companies for writing its proposals, specifications, and designs. This is the primary reason that 80 percent of development contracts go to large companies such as NTT, Fujitsu, Hitachi, and NEC.
■ When a large company has won a contract, it is subcontracted to several medium and small software companies, but in fact it forces person/day work on them rather than autonomous work.
■ Engineers at the lower levels of the industry hierarchy are frequently forced to work under bad conditions.
■ In the cascading subcontracting environment, neither the procurer nor the bidder knows the people working on the project at the lower-tier companies. In fact, there was a case where people belonging to a dangerous cult group participated in a government system development project and nobody knew until a newspaper revealed it.
The Japanese software industry has other long-lived and deep-rooted problems:

■ Software organizations are relatively closed and isolated. This is also true of user organizations.
■ Despite many years of English education, most Japanese hesitate to communicate with people overseas, attend conferences overseas, or read technical books written in English. This delays international technology transfer by a few years.
■ Management tends to view software as a tangible rather than an intangible product and does not understand its high complexity. Managers simply think software is a big source of trouble, so they complain and refuse to commit resources for process improvement.
■ The software community lacks creative ideas and products. As a result, the relative amount of COTS imports and exports is lopsided.
■ Academia and industry are not tightly connected.
■ Almost all of the government projects for promoting the software industry, including the notorious Sigma project, failed because of their poor understanding of global IT trends.
The bright side
Nevertheless, there is hope. The SEA and other organizations have formed to assist Japanese software developers. SEA was established in 1986 and is open to members from both industry and academia. Many SEA members were active in introducing Unix and object-oriented technologies to the country. SEA holds the annual Software Symposium for promoting technology transfer, and it is active in exchanging ideas, discussing problems, translating important documents, and teaching new technologies through symposia, workshops, meetings, and a consortium (the Japan Software Process Improvement Consortium). There are other such societies, such as the Information Processing Society of Japan, but they are more formal. In 1987, the SEA and the Shanghai Software Center initiated the China–Japan Software Symposium, held annually in China. The symposium is now cosponsored by the United Nations University's Institute of Advanced Studies. The meeting has been renamed twice: first it was the International CASE Symposium
(1991–1995); now it's the International Symposium on Future Software Technology (1996–2001). A number of consortia have also formed to propagate new software-related technologies such as Linux, XML, Java, and Enterprise JavaBeans. Japan has a long history of manufacturing high-quality products—ships, cameras, automobiles, and the like. Naturally, our software organizations inherited quality control techniques from those industries, including a quality-driven improvement approach called Total Quality Control. However, by installing imported models such as ISO 9000 and the CMM, software engineers lost self-confidence in the maturity of their practices, and many adopted the false notion that their software quality was low. Yet, even in Japan's low-CMM-rated companies, the quality of delivered products is still high.4 By consistently applying a quality-driven, problem-focused approach to quality improvement, we can achieve high quality.
Business with Japan
As I mentioned earlier, Japan has many software systems to build. When the government initiates system development for building the e-government by 2003, the gap between demand and supply will widen drastically. Because of the aggregated demands of government and industry, Japan's IT industry must finally rely on overseas suppliers. This will be a good opportunity for Japan to expand its international business, but it will also stimulate its trading partners. If you want to do software business with Japan, you should overcome these problems:
■ Build your Japanese language skills for listening, speaking, reading, and writing. Reading and writing Japanese documents will be most difficult. If you want to do this in a hurry, hire Japanese software professionals.
■ Understand Japanese culture. Superficially, this society looks westernized, but beware—the Japanese way of thinking is different.
■ Understand Japan's specific business rules.
■ Choose the appropriate domain.
■ Provide a quick feedback loop to remedy processes.
References
1. D. Tajima and T. Matsubara, "The Computer Software Industry in Japan," Computer, vol. 14, no. 5, May 1981, pp. 89–96.
2. D. Tajima and T. Matsubara, "Inside the Japanese Software Industry," Computer, vol. 17, no. 3, Mar. 1984, pp. 34–43.
3. M. Cusumano, Japan's Software Factories, Oxford Univ. Press, Oxford, UK, 1991.
4. C. Jones, Software Quality: Analysis and Guidelines, Int'l Thomson Publishing, Stamford, Conn., 1999.
Tomoo Matsubara is an independent consultant on software technology, management, and business. His interests are in using theories, technologies, and best practices to solve problems in software organizations; software process improvement promotion; and system safety. He also serves as a Japanese delegate for international software standardization. He has a BS in mechanical engineering from Waseda University. He is a board member of the Japan Software Engineers Association and a member of the IEEE Software Industry Advisory Board. Contact him at
[email protected].
from your technical council Editor: Melody M. Moore ■ Georgia State University ■ [email protected]

Software Reuse: Silver Bullet?
Melody M. Moore
Software reuse has been touted as a panacea, the solution to the software crisis, and a crucial technique for improving software engineering productivity. In practice, however, software reuse techniques and technologies have been plagued with difficulties in categorizing, storing, and retrieving reusable components. In our continuing series spotlighting the TCSE committees, this month's TCSE column features an interview with the chair of the TCSE Software Reuse Committee, Bill Frakes of Virginia Polytechnic Institute and State University.

Q: Is there a new definition of reuse? What is the current state of the practice?
A: Reuse has evolved beyond simply reusing code or objects. Currently, we are focusing on reusing knowledge itself, not just life-cycle artifacts such as specifications, code, and test data.

Q: How do we reuse knowledge?
A: The key is in domain engineering and analysis. This involves determining common elements across a domain and capturing this information so that it can inform further domain development. Domain knowledge sources include code, specifications, and documents, as well as information that exists only in the heads of domain experts. Knowledge reuse is challenging because domain information is often not written down.

Q: Once we have collected knowledge about a domain, what do we do with it?
A: Domain models can assist in
forming generic templates for software architectures. Standard notations have recently been extended to describe multiple systems in a nonspecific fashion. Ontologies, or domain vocabulary models, can be extracted from documents using faceted classification. This technique involves extracting words and phrases from documents and analyzing them to determine which facets, or groups, are important. These facets can form the basis for a domain's high-level requirements definition. Feature tables are created during a step called feature analysis, which can be performed at either a user or builder level. Users and builders choose features from each of the facet groups (similar to choosing from a restaurant menu) to specify a system.

Q: Is reuse the panacea that the software engineering community has hoped for?
A: Reuse can work very well, but not in all circumstances. Initially the community held a naïve view of how long it would take to analyze each domain. For example, a very successful case of reuse is the lex and yacc code for creating compilers, which produces a 40-to-1 payoff from specification to code. However, this is a very narrow and specific domain. The domain must be well defined and well understood; attempting to form a cohesive domain model for the entire telecom field, for example, is too big and complicated to be effective.

Q: Are there key factors in the success of reuse efforts?
A: Perhaps surprisingly, the key factor
in making reuse work is support and commitment from upper management. Reuse practices pay off, but they require resources up front. Organizations that only receive support from technical personnel are typically not successful; the overhead costs of reuse need to be amortized across an entire project or organization.

Q: What's on the horizon for reuse technology?
A: The success of a systematic domain analysis approach to reuse has highlighted the need for domain engineering research. The development community has embraced object patterns, but this level is not high enough. We need to study the economics and metrics of reuse—when is it worth doing? Can we employ strategic economic models to determine whether reuse is cost-effective? Answering these questions will help make reuse more effective.

For More Information
The reuse information Web site is at http://frakes.cs.vt.edu/renews.html. The International Conference on Software Reuse (ICSR) is scheduled for 15–19 April 2002 in Austin, Texas.

Melody M. Moore is an assistant professor of computer
information systems at Georgia State University. Her research focuses on reengineering and assistive technology for people with severe disabilities, specifically brain–computer interfaces. Contact her at the Computer Information Systems Dept., College of Business Administration, Georgia State Univ., Atlanta, GA 30303-4013;
[email protected]; www.cis.gsu.edu/~mmoore.
design Editor: Martin Fowler ■ ThoughtWorks ■ [email protected]

Aim, Fire
Kent Beck

"Never write a line of functional code without a broken test case." —Kent Beck
"Test-first coding is not a testing technique." —Ward Cunningham
Craig Larman talked about koans—the sound of one hand clapping, mountains aren't mountains, and so on. Now I offer this: Test-first coding isn't testing. Okay, Ward Cunningham, what is it? Never mind what is it—what's it going to do to my code? We have to start with how it works. Let's say we want to write a function that adds two numbers. Instead of jumping to the function, we first ask, "How can we test this?" How about this code fragment:

    assertEquals(4, sum); // cf JUnit.org for more explanation
That's a good start, but how do we compute sum? How about int sum = ...? Hmmm.... What if we represent our computation as a static method on a Calculator class?

    int sum = Calculator.sum(2, 2);
    assertEquals(4, sum);
The fragment doesn’t compile? Pity. Now we go and write our Calculator class. There are a bunch of other test cases that also have to pass? Excellent! Make a list and we’ll get to them all, one at a time.
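To see the whole loop end to end, a minimal sketch in JUnit 3 style might look like the following; the test names and the extra zero case are made up for illustration, and the Calculator is simply the least code that makes the tests pass.

    import junit.framework.TestCase;

    // The tests are written first; they don't even compile until Calculator exists.
    public class CalculatorTest extends TestCase {
        public void testSumOfTwoAndTwo() {
            int sum = Calculator.sum(2, 2);
            assertEquals(4, sum);
        }

        // The next case from our list gets its own test, one at a time.
        public void testSumWithZero() {
            assertEquals(2, Calculator.sum(2, 0));
        }
    }

    // The simplest Calculator that makes both tests pass.
    class Calculator {
        static int sum(int a, int b) {
            return a + b;
        }
    }

Running the tests before Calculator exists produces the broken test case the opening quote demands; writing sum is then just filling in the shape.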
Analysis
So, test-first is an analysis technique. We decide what we are programming and what we aren't programming, and we decide what answers we expect. Picking what's in scope and, more importantly, what's out of scope is critical to software development. Following test-first religiously forces you to explicitly state what circumstances you were considering when you wrote the code. Don't expect any circumstance not implied by one of the tests to work.

History sidetrack
Test-first coding isn't new. It's nearly as old as programming. I learned it as a kid while reading a book on programming. It said that you program by taking the input tape (you know, like a long, skinny floppy disk) and typing in the output tape you expect. Then you program until you get the output tape you expect. Until I started coding test-first, I had no idea how often I was coding without knowing what the right answer should be. Crazy, isn't it? You wouldn't do such a thing, but perhaps you have a co-worker who does.

Design
But wait, there's more. Test-first is also a design technique. You remember when we decided that sum would be a static method on the Calculator class? That's logical, or
interface, design—just like my old college texts talked about. Of course, those texts never explained how to do logical design; they just made it clear that it must be done, it must be done before physical design and implementation, and heaven help anyone who mixes them up. Test-first is a technique for logical design. You don't have the implementation yet, so you have to stretch to work on physical design. What you begin typing is an expression of the outsides of the logic you are about to write. Robert Martin makes the analogy to a virus. Viruses have special shapes that are designed to attach to molecules in the cell wall. The shapes are the inverse of the shape of the molecule. Test-first tests are like that. They are the inverse of the shape of the code we are about to write. And, when the shapes match exactly, we've written the code we expected.

Not testing
How much would you pay for tests like this? Don't answer that question. There's more. I'm a worrier. When I don't have my test blankie, I'm forever fussing about what did I miss, what did I forget, what did I just screw up? With the tests, bang, I have instant confidence. Was that just a mistake? Push the button, run the tests, the bar is green, the code is clean. Ahhhh. Next test. All you testing gurus are probably sharpening your knives. "That's not testing. Testing is equivalence classes and edge cases and statistical quality control and…and…a whole bunch of other stuff you don't understand." Hold on there—I never said that test-first was a testing technique. In fact, if I remember correctly, I explicitly stated that it wasn't. However, reflecting on the difference between coding test-first and even automating tests immediately after coding, I think I have an order-of-magnitude fewer defects coding test-first. Do I have hard numbers? No, I don't. That's what PhDs are for, and I don't have one of those, either. (It
would make a great thesis, though, a la Laurie Williams' pair programming studies.) I'm not submitting this as something everyone should do all the time. I have trouble writing GUIs test-first, for example (another thesis, eager student). It's certainly worth a try, though.

Better design
We saw how test-first drives design decisions. Surprise! Not only do the decisions come at a different time, the decisions themselves are different. Test-first code tends to be more cohesive and less coupled than code in which testing isn't part of the intimate coding cycle. Some people worry that they can't even get their code done, and now they have to test, too. And we're asking them to design better, too? It can't be done. Relax. You don't have to be a great designer—you just have to be creatively lazy. Here's an example. Suppose we want to write a function that, given a person of a certain age, returns that person's mortality table (gruesome, I know, but that's life insurance for you). We might begin by writing

    assertEquals(expectedTable, actualTable);
How are we going to compute the actualTable? Ask a MortalityTable class.
    MortalityTable actualTable = new MortalityTable(bob);
    assertEquals(expectedTable, actualTable);
How do we create the person? There's a constructor that takes 11 parameters (or 11 setter methods, all of which have to be called, which amounts to the same thing).

    Person bob = new Person("Bob", "Newhart", 48, Person.NON_SMOKER,
We're tired of typing at about this point, and we don't feel like staring at the parameter list for the constructor any more. Is there some other solution? There might be a simpler way to construct a Person, but there's an even easier solution: change the interface to MortalityTable to take the age (assuming that's the only parameter of interest; remember, this is an example), not the whole person. Now we don't have to create a Person at all:

    MortalityTable actualTable = new MortalityTable(48);
    assertEquals(expectedTable, actualTable);
Hang on a doggone minute
What just happened? We made a snap design decision that created more flexibility than we needed by passing the whole Person when creating a MortalityTable. Without immediate feedback, we would probably have just ridden that decision for years or decades without questioning it. Writing the test gave us feedback seconds later that caused us to question our assumption. Here's the beauty of it—we didn't have to be brilliantly prescient designers to find a less tightly coupled design. We just had to be intolerant of pointless effort in writing the tests. We still had to demonstrate skill in finding and executing our alternative design. So, it's not that design skill goes away—the intensity of the feedback substitutes for the ability to guess right. We don't have to guess (well, not for long) about the right
design. We design something simple we think might work, and then seconds later we have confirmation. If the feedback shows we oversimplified, we make the design more complicated. If the feedback shows we overdesigned, we simplify.
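One plausible shape for the simplified class is sketched below; its internals are guesses, since only the constructor call appears above. Note that assertEquals(expectedTable, actualTable) compares objects with equals, which is why the sketch overrides it.

    // After the feedback: the table depends only on the age it actually needs,
    // not on a fully populated Person.
    class MortalityTable {
        private final int age;

        MortalityTable(int age) {
            this.age = age;
        }

        int age() {
            return age;
        }

        // assertEquals compares tables by value, so equality is defined here.
        @Override
        public boolean equals(Object other) {
            return other instanceof MortalityTable
                    && ((MortalityTable) other).age == age;
        }

        @Override
        public int hashCode() {
            return age;
        }
    }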
So, test-first

■ encourages you to be explicit about the scope of the implementation,
■ helps separate logical design from physical design from implementation,
■ grows your confidence in the correct functioning of the system as the system grows, and
■ simplifies your designs.
Is that all? I’ll leave you with two more points. First, test-first helps you communicate clearly with your pair-programming partner. Martin and I had a fabulous example of this one time when we were programming in Munich. He wanted the expected value to be 0.1 and I wanted it to be 10. Can you spot the disagreement? (It took us far too long to realize we were talking about different representations of a percentage, but that’s programming in public for you). Second, test-first tests leave a Rosetta stone for the future, answering the all-important question, “What was this idiot thinking when he wrote this?” When shouldn’t you test first? Writing user interfaces test-first is hard. I’d love to find a UI testing strategy where test-first actually helped me design better interfaces. Give it a spin. See how it goes.
Kent Beck is director of Three Rivers Institute, exploring the interaction of technology, business, and humanity. He has pioneered Extreme Programming, CRC cards, the TimeTravel patterns, the xUnit family of testing frameworks including the award-winning JUnit, and the Hillside Group. He wrote Planning Extreme Programming, Extreme Programming Explained: Embrace Change, Kent Beck’s Guide to Better Smalltalk: A Sorted Collection, and Smalltalk Best Practice Patterns. Contact him at PO Box 128, Merlin, OR 97532;
[email protected].