Advances in Computers
Volume 52
Advances in Computers
Fortieth Anniversary Volume: Advancing into the 21st Century

Edited by
Marvin V. Zelkowitz
Department of Computer Science and Institute for Advanced Computer Studies
University of Maryland, College Park, Maryland

Volume 52
ACADEMIC PRESS A Harcourt Science and Technology Company
San Diego  San Francisco  New York  Boston  London  Sydney  Tokyo
This book is printed on acid-free paper.

Copyright © 2000 by Academic Press

All Rights Reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher.

Academic Press
A Harcourt Science and Technology Company
525 B Street, Suite 1900, San Diego, California 92101-4495, USA
http://www.academicpress.com

Academic Press
A Harcourt Science and Technology Company
32 Jamestown Road, London NW1 7BY, UK
http://www.academicpress.com

ISBN 0-12-012152-2

A catalogue record for this book is available from the British Library
Typeset by Mathematical Composition Setters Ltd, Salisbury, UK Printed in Great Britain by Redwood Books, Trowbridge, Wiltshire 00 01 02 03 04 05 RB 9 8 7 6 5 4 3 2 1
Contents

Contributors
Preface

Eras of Business Computing
Alan R. Hevner and Donald J. Berndt
  1. A Half Century of Business Computing
  2. Business Computing Eras
  3. The Computation Platform: Hardware and Operating Systems
  4. Communication: Computer Networking
  5. Software
  6. Business System Architectures
  7. Conclusions and Future Directions
     References

Numerical Weather Prediction
Ferdinand Baer
  1. Introduction
  2. Computational Methods
  3. Data Analysis, Assimilation, and Initialization
  4. Regional Prediction Modeling
  5. Ensemble Prediction Techniques
  6. Conclusions
     References

Machine Translation
Sergei Nirenburg and Yorick Wilks
  1. Introduction
  2. Is Machine Translation Impossible?
  3. What Sort of Computation is MT?
  4. Main Paradigms for MT--Diverse Strategies for Solving or Neutralizing the Complexity of Language Use
  5. The Evolution of MT Over its 50-year History
  6. Choices and Arguments For and Against MT Paradigms
  7. MT in the Real World
  8. The Current Situation
  9. Conclusion
     References

The Games Computers (and People) Play
Jonathan Schaeffer
  1. Introduction
  2. Advances
  3. Advances in Computer Games
  4. Conclusions
     Acknowledgments
     References

From Single Word to Natural Dialogue
Niels Ole Bernsen and Laila Dybkjær
  1. Introduction
  2. Task-oriented Spoken Language Dialogue Systems
  3. Managing the Dialogue
  4. Conclusion
     References

Embedded Microprocessors: Evolution, Trends and Challenges
Manfred Schlett
  1. Introduction
  2. The 32-bit Embedded Marketplace
  3. General Microprocessor and Technology Evolution
  4. Basic Processor Classification
  5. Processor Architectures
  6. Embedded Processors and Systems
  7. The Integration Challenge
  8. Conclusion
     References

Author Index
Subject Index
Contents of Volumes in This Series
Contributors

Ferdinand Baer received his professional training from the Department of Geophysical Sciences at the University of Chicago, from which he graduated in 1961. He began his academic career as an Assistant Professor at Colorado State University, where he was one of the founding members of their Department of Atmospheric Science. In 1971 he took a position as Professor at the University of Michigan, and in 1977 he moved to the University of Maryland, where he took on the chairmanship of the newly created Department of Meteorology. In 1987 he retired from his administrative post to devote himself to teaching and research as a professor in the same department. During his tenure at the various universities, Professor Baer was a WMO (World Meteorological Organization) expert to India, a research fellow at the GFDL (Geophysical Fluid Dynamics Laboratory) of Princeton University, a Visiting Professor at the University of Stockholm and the Freie University of Berlin, and occasionally a summer visitor at NCAR (National Center for Atmospheric Research). His research interests span a variety of topics including atmospheric dynamics, numerical weather prediction, numerical analysis, initialization, spectral methods, atmospheric energetics, gravity waves, and high performance computing applications. He is a member of a number of professional societies and a fellow of the American Meteorological Society, the Royal Meteorological Society, and the American Association for the Advancement of Science. He has directed to completion the PhD research of 15 students and has several more in progress. He has, or has had, research support from NSF, NASA, DOE, NOAA (National Oceanic and Atmospheric Administration), and DOD. In support of his community he has served on a variety of boards and committees, including NAS/BASC (National Academy of Sciences/Board on Atmospheric Sciences and Climate), two terms as a UCAR trustee, member representative to UCAR (University Corporation for Atmospheric Research) from UMCP (University of Maryland at College Park), and, most recently, chair of the AAAS Section on Atmospheric and Hydrospheric Sciences.

Donald J. Berndt is an Assistant Professor in the Information Systems and Decision Sciences Department in the College of Business Administration at the University of South Florida. He received his MPhil and PhD in Information Systems from the Stern School of Business at New York University. He also holds an MS in Computer Science from the State
University of New York at Stony Brook and a BS in Zoology from the University of Rhode Island. Dr. Berndt's research and teaching interests include the intersection of artificial intelligence and database systems, knowledge discovery and data mining, data warehousing, software engineering methods, and parallel programming. He was a research scientist at Yale University and Scientific Computing Associates, where he participated in the development of commercial versions of the Linda parallel-programming environment. He also developed artificial intelligence applications in academic settings and at Cognitive Systems, Inc. From 1993 to 1995 Dr. Berndt was a lecturer in the Computer Science Department at the State University of New York at Stony Brook and taught several courses for the Stern School of Business at New York University. He is a member of Beta Gamma Sigma, AAAI, ACM, and AIS (Association for Information Systems).

Niels Ole Bernsen is Director of the Natural Interactive Systems Laboratory and Professor of Engineering at the University of Southern Denmark, Odense. His research interests include interactive speech systems and natural interactive systems more generally, systems for communities, design support tools, usability engineering, modality theory and multimodality, and best practice in systems development and evaluation. He is Coordinator of the European Network for Intelligent Information Interfaces (i3net) and a member of the Executive Board of the European Language and Speech Network (Elsnet), and takes part in several European collaborative research projects in the above-mentioned research areas. He has authored and edited 10 books and is the author of more than 300 papers and reports.

Laila Dybkjær is a senior researcher at the Natural Interactive Systems Laboratory at the University of Southern Denmark. Her research interests include intelligent user interfaces, interactive speech systems, usability design, best practice, evaluation, dialogue model development, dialogue theory, corpus analysis, and multimodal systems. She received an MS and a PhD in computer science from the University of Copenhagen.

Alan R. Hevner is an Eminent Scholar and Professor in the Information Systems and Decision Sciences Department in the College of Business Administration at the University of South Florida. He holds the Salomon Brothers/Hidden River Corporate Park Chair of Distributed Technology. Dr. Hevner's areas of research interest include information systems development, software engineering, distributed database systems, healthcare information systems, and telecommunications. He has published over 75 research papers on these topics and has consulted for several Fortune 500
companies. Dr. Hevner received a PhD in Computer Science from Purdue University. He has held faculty positions at the University of Maryland and the University of Minnesota. Dr. Hevner is a member of ACM, IEEE, AIS, and INFORMS (Institute for Operations Research and the Management Sciences).

Sergei Nirenburg is Director of the Computing Research Laboratory and Professor of Computer Science at New Mexico State University. Dr. Nirenburg has written or edited 6 books and has published over 130 articles in various areas of computational linguistics and artificial intelligence. He founded and is Steering Committee chair of a series of scientific conferences on Theoretical and Methodological Issues in Machine Translation, the eighth of which took place in August 1999 in Chester, UK. Between 1987 and 1996 he was Editor-in-Chief of the journal Machine Translation. He is a member of the International Committee on Computational Linguistics (ICCL).

Jonathan Schaeffer is a professor of computing science at the University of Alberta (Edmonton, Canada). His BSc is from the University of Toronto, and his MMath and PhD degrees are from the University of Waterloo. His major research area is artificial intelligence, using games as his experimental testbed. He is the principal author of the checkers program Chinook, which in 1994 became the first program to win a human world championship in any game. He received an NSERC E.W.R. Steacie Memorial Fellowship in 1998.

Manfred Schlett is currently working as Product Manager at Hitachi Europe GmbH. He received a diploma in technical mathematics in 1991 and a PhD in mathematics in 1994 for his work in numerical semiconductor device simulation, both from the University of Karlsruhe. In 1995 he joined hyperstone electronics as a VLSI design engineer working on the DSP integration into the 32-bit RISC hyperstone E1 architecture. Later he became a project and marketing manager at hyperstone. In 1998, he joined Hitachi Europe's marketing team, focusing on Hitachi's 32-bit SuperH series. His research interests include microprocessor architectures, advanced VLSI design, signal processing, and multimedia. He has published several articles on numerical semiconductor device simulation, design of embedded microprocessors, and the unification of RISC and DSP architectures.
Yorick Wilks is Professor of Computer Science at the University of Sheffield and Director of the Institute of Language, Speech and Hearing (ILASH). He has published numerous articles and five books in the area of artificial intelligence, of which the most recent are Artificial Believers (with Afzal Ballim) from Lawrence Erlbaum Associates (1991) and Electric Words: Dictionaries, Computers and Meanings (with Brian Slator and Louise Guthrie), MIT Press (1995). He is also a Fellow of the American Association for Artificial Intelligence, and a member of the International Committee on Computational Linguistics (ICCL).
Preface to Volume 52: 40th Anniversary Issue
Advancing into a new century
Humanity is often distinguished from other animals by its ability, even its need, to see patterns in everyday life. As the 20th century draws to a close and we enter a new millennium according to the calendar, all aspects of society seem to want to take stock of what has happened in the past and what is likely to happen in the future. The computer industry is no different from others. The Advances in Computers series has been published continuously since 1960, and this year's volume is the 50th technical volume (if you ignore the two index volumes that have been produced) in the series. It is the 40th year of publication, the volume appears in the year 2000, and, if you believe in numerology, 40 times 50 is 2000; all the signs point to something special for this edition.

As we enter the 21st century, we decided to look back on the changes that have occurred since Volume 1 of Advances in Computers appeared in 1960. We looked at the six chapters of that initial volume and decided that an appropriate anniversary volume for this series would be a series of papers on the same topics that appeared in 1960. What has happened to those technologies? Are we making the progress we thought we would, or are events moving more slowly? To refresh your memory, Volume 1 of the Advances contained the following chapters:

1. General-purpose programming for business applications, by Calvin C. Gotlieb
2. Numerical weather prediction, by Norman A. Phillips
3. The present status of automatic translation of languages, by Yehoshua Bar-Hillel
4. Programming computers to play games, by Arthur L. Samuel
5. Machine recognition of spoken words, by Richard Fatehchand
6. Binary arithmetic, by George W. Reitwiesner.

We hope that the chapters included in this present volume will give you an appropriate current perspective on these technologies.
In Volume 1, C. C. Gotlieb discussed business data processing. It is strange to think that this chapter predates the era of COBOL, while the first chapter of the present volume describes a post-COBOL world. Alan Hevner and Donald Berndt, in their chapter entitled "Eras of business computing," give a history of business data processing that goes through the evolution of technology from the large mainframe processor to today's World Wide Web-based electronic commerce (e-commerce). It seems clear that, at least in the short term, web-based applications using an object-oriented design process will dominate business applications.

In the second chapter, Ferdinand Baer updates our knowledge of numerical weather prediction. In 1960, weather prediction was rather primitive; lack of computing power was a known primary problem. Today's machines are orders of magnitude faster, and weather prediction up to 14 days in advance, long regarded as the maximum length of time to make a prediction, is becoming a reality. Today's models are becoming more explicit, and the reduction of forecasting errors is on the horizon.

In the third chapter, Sergei Nirenburg and Yorick Wilks update our knowledge of machine translation of natural language. In 1960 the prevailing opinion was that machine translation was important, but probably impossible to achieve in all cases. Although some people still hold that opinion, great advances have been made. Machine translation is now an economic necessity in our international economic community, and many routine documents can now be translated automatically.

In 1960, Arthur Samuel wrote about his famous checkers program, the first successful game-playing computer program. In the fourth chapter of this volume, Jonathan Schaeffer updates our knowledge of computer game playing. Chess programs are now rated at the grandmaster level and have even succeeded in competing successfully against grandmasters. Schaeffer describes current search strategies in game-playing programs, and updates our knowledge on computer programs that play games such as backgammon, bridge, checkers, chess, Othello, poker, and Scrabble.

In the fifth chapter, "From single word to natural dialogue" by N. O. Bernsen and L. Dybkjær, the authors discuss spoken dialogue systems. In 1960 a speech recognition system could recognize about 10 spoken words, whereas today you can purchase systems that can recognize about 5000 words, and with some training, the systems can be taught to recognize about 60 000 words. Here is one research area that may be on the verge of going out of business, as industrial competitiveness is leading to further and further advances in incredibly short periods of time.

In the final chapter in 1960, George Reitwiesner discussed the ability to build better algorithms to process binary arithmetic. How to do addition, multiplication, division, extraction of square roots, etc., faster and more
efficiently was the major hardware design issue in these early machines. Today, we are not so concerned about such issues; we believe we have optimal or almost optimal instructions for such procedures. Of more concern is embedding computers so that they can work efficiently as part of our industrial society. In the final chapter of this present volume, "Embedded microprocessors: Evolution, trends, and challenges," Manfred Schlett discusses the role of the microprocessor as part of an embedded architecture, and how technologies such as reduced instruction set computers (RISC) are allowing hardware designers to build ever faster processors.

We hope that you enjoy these chapters and that they will provide a graphic demonstration of how our industry has evolved from 1960 to today. If you have any suggestions for chapters in future volumes, please contact me at
[email protected].

MARVIN ZELKOWITZ
University of Maryland, College Park, Maryland
Fraunhofer Center for Experimental Software Engineering, College Park, Maryland
Eras of Business Computing

ALAN R. HEVNER AND DONALD J. BERNDT
Information Systems and Decision Sciences
College of Business Administration, University of South Florida
Tampa, FL 33620, USA
{ahevner, dberndt}@coba.usf.edu
Abstract

The past half-century has seen amazing progress in the use of information technology and computer systems in business. Computerization and communication technologies have truly revolutionized the business organization of today. This chapter presents a structured overview of the evolution of business computing systems through six distinct eras:

• Era of Calculation
• Era of Automation
• Era of Integration and Innovation
• Era of Decentralization
• Era of Reengineering and Alignment
• Era of the Internet and Ubiquitous Computing

Advances in each of the major computing technologies--Computational Platform, Communications, Software, and System Architecture--are surveyed and placed in the context of the computing eras. The focus is on how technologies have enabled innovative strategic business directions based on new business system architectures. A key observation is that around 1975 the principal role of the computer in business systems changed from a computation engine to a digital communications platform. We close the chapter by presenting a set of major conclusions drawn from this survey. Within each conclusion we identify key future directions in business computing that we believe will have a profound impact into the 21st century.
1. A Half Century of Business Computing
2. Business Computing Eras
   2.1 Era of Calculation (Before 1950)
   2.2 Era of Automation (1950-64)
   2.3 Era of Integration and Innovation (1965-74)
   2.4 Era of Decentralization (1975-84)
   2.5 Era of Reengineering and Alignment (1985-94)
   2.6 Era of the Internet and Ubiquitous Computing (1995 onwards)
3. The Computation Platform: Hardware and Operating Systems
   3.1 Three Classic Computer Hardware Generations
   3.2 The Role of Universities and Military Research
   3.3 Twin Roads: Scientific Calculation and Business Automation
   3.4 The Rise of the General Purpose Computer
   3.5 Computing Means Business
   3.6 Computers Leave the Computer Room
   3.7 Operating Systems
4. Communication: Computer Networking
   4.1 ARPA: Advanced Research Projects Agency
   4.2 Packet Switched Networking
   4.3 ARPANET
   4.4 Xerox PARC: The Office of the Future
   4.5 LANs
   4.6 Internetworking
   4.7 LANs, WANs, and the Desktop
5. Software
   5.1 Algorithmic Programming
   5.2 Data: File Systems and Database Systems
   5.3 Human-Computer Interaction (HCI)
   5.4 Software Development Processes and Methods
   5.5 Software Summary
6. Business System Architectures
   6.1 Manual Business Processes
   6.2 Mainframe Architectures
   6.3 On-Line, Real-Time Architectures
   6.4 Distributed, Client-Server Architectures
   6.5 Component-Based Architectures
   6.6 Web-Based Architectures
7. Conclusions and Future Directions
   7.1 Computers as Tools of Business
   7.2 Miniaturization of the Computational Platform
   7.3 Communications Inflexion Point
   7.4 Growth of Business Information and Knowledge
   7.5 Component-Based Software Development
   7.6 MIS and Business System Architecture Synergies
   References

1. A Half Century of Business Computing

Computers and the related arts and sciences of digital computing have changed our world in many ways. Over the past 50 years, business has been revolutionized by the introduction of computing machinery. Traditional business functions of manufacturing, accounting, marketing and sales, and management have been redefined and reengineered many times to make the most effective and innovative use of computing strategies and information
technologies. The great business visionaries of the first half of the 20th century (e.g., Ford, Rockefeller, Morgan, Vanderbilt, Mellon) would be overwhelmed by the pervasive use of computing technology in today's businesses. Computers and information technology are central to our understanding of how business is conducted.

The invention and development of the original electronic computers occurred during the final years of World War II and the late 1940s. At first, computers were employed as ultra-fast calculators for scientific equations and engineering applications. Only with the advent of stored programs and efficient data storage and file-handling capabilities did their use for business applications become apparent. Albeit slowly at first, computers became more and more accessible to businesses for the automation of routine activities, such as sorting, searching, and organizing business data. Then, with increasing rapidity, computers were applied to all functions of the business, from assembly line production (e.g., robotics) to the highest levels of executive management (e.g., decision support systems). Without question, the dominant use of computing technology today is for business applications.

This chapter presents a necessarily brief survey of how business computing has evolved over the past 50 years. We organize this survey along two dimensions:
Eras of Business Computing: We divide the timeline of business computing into six eras distinguished by one or more dominant themes of computer use during that era. Section 2 describes each of the eras and its dominant themes.

Business Computer System Technologies: An integrated business computer system is composed of three essential technology components:
- the computational platform (i.e., the hardware and systems software)
- telecommunications (i.e., data transmission and networking)
- the software (i.e., programming and application software).

A business system architecture integrates these three components into a
functional, effective business system. Figure 1 illustrates the inter-relationships of the technologies in a business system. Sections 3-6 cover the evolution of these respective technologies through the six business computing eras. We conclude the chapter by presenting a summary of the major conclusions we draw from this survey of business computing. Within each conclusion we identify key future directions in business computing that we believe will have a profound impact into the 21st century.
FIG. 1. Technology components of a business computer system (Computational platform, Telecommunications, Software).
2. Business Computing Eras
A business is a pattern of complex operations in the lives of people concerning all the functions that govern the production, distribution, and sale of goods and services for the benefit of the buyer and the profit of the seller. Businesses and essential business processes have existed since the beginning of civilization. The ability to manipulate and manage information has always been a critical feature of all business processes. Even before number systems were invented, shepherds would count their flocks by using a pebble to represent each animal. Each morning as animals left the fold to graze, the shepherd would place a pebble in his pocket for each animal. Then in the evening he would remove a pebble for each animal entering the fold. If any pebbles remained in his pocket, the shepherd knew that he would need to search for those who were lost.¹ Throughout the history of business, personnel and techniques for managing information have been central to all business processes.

In the early part of this century, the term computer was defined in the Oxford English Dictionary as "one who computes; a calculator, reckoner; specifically a person employed to make calculations in an observatory, in surveying, etc." Thus, computers, as people, have always been with us. Several important mechanical tools and techniques to support the task of computing have evolved over the millennia, such as the abacus, the slide rule, adding machines, punched cards, and filing systems. Charles Babbage's Difference Engine, circa 1833, was a particularly interesting and significant example of a mechanical device that aided in the calculation and printing of large tables of data [1]. A limited ability to program the Difference Engine to perform different tasks foreshadowed the stored program concept of modern computers.

The coming of the electronic computer during the 1940s changed the popular conception of a computer from a person to a machine. From that point on, the history of business computing has been one of rapid technical advances and continual evolution, marked by several revolutionary events that divide the past 50 years into 6 distinct eras. Table I proposes a classification of Business Computing Eras. In the remainder of this section we discuss the major business computing themes found in each of the eras.

¹ This example is taken from an exhibit in Boston's Computer Museum.
2.1 Era of Calculation (Before 1950)
It is possible to trace the links between the Moore School and virtually all the government, university, and industrial laboratories that established computer projects in America and Britain in the late 1940s. [2]

Before 1945, humans as computers performed manually all of the essential business processes in organizations. They managed business data, performed calculations on the data, and synthesized the data into business information for decision-making. Human computers were aided by sophisticated mechanical tools to perform both routine and complex business tasks. However, the limitations of human capabilities to deal efficiently with large amounts of information severely constrained business effectiveness. There are several excellent references on this era of business computing [3, 4, 5]. In addition to Babbage's Difference Engine, several other business computing milestones stand out:

• The 1890 Census and the Hollerith Card [6]: Herman Hollerith invented a punched-card record keeping system that was employed in tabulating the 1890 US census. The system was an overwhelming success. The quality of the data was improved, the census was completed in 2.5 years compared with 7 years for the previous census, and the cost was significantly reduced. Hollerith became a successful entrepreneur with his punched-card systems, founding a company that eventually led to the beginning of International Business Machines (IBM).

• The Mechanical Office: Essential business processes were performed on innovative devices such as adding machines, typewriters, and Dictaphones. The mechanical office was considered a marvel of efficiency in this era before electronic computers became business tools.
TABLE I
BUSINESS COMPUTING ERAS

Era of Calculation: Digital computers for scientific calculation (< 1950)
  Themes: Human computers; The mechanical office; IBM and punched-card machines; NCR and point-of-sale equipment; World War II and military research (Harvard Mark I, Whirlwind, ENIAC)

Era of Automation: General purpose computers for business (1950-64)
  Themes: UNIVAC and the Census Bureau; UNIVAC predicts 1952 election; IBM domination and the 700 series; Automation of basic business processes with high cost-benefit; IBM 1401 and the business solution

Era of Integration and Innovation: Hardware and software for business solutions (1965-74)
  Themes: IBM System/360 redefines the market; Software outlives hardware; Minicomputers enter the fray; Human-computer synergies exploited in business systems; Winds of change (integrated circuits, microprocessors, Xerox PARC and the office of the future)

Era of Decentralization: Communication dominates computation (1975-84)
  Themes: Powerful microprocessors; Personal computers; LANs; WANs and the Internet; Focus on the desktop; PC networking supported; All the pieces are in place

Era of Reengineering and Alignment: Effective utilization of the technology spectrum to solve business problems (1985-94)
  Themes: Using all the pieces; System and software architectures; WWW and ".com"; Total Quality Management initiatives; IT enables the reengineering of critical business processes; Alignment of business strategy with information technology strategy

Era of the Internet and Ubiquitous Computing: Reorganization for the wired world (> 1995)
  Themes: Critical mass of desktops; The Internet means business; Traditional organizational boundaries fall; The virtual organization; New business models; Electronic commerce and the digital economy
• Cash Registers and the National Cash Register Company (NCR): Under John Patterson and Charles Kettering, NCR developed advanced cash register technology and invented the key sales practices of the industry. NCR, through its sales force, introduced point-of-sale computing technology across the US.

• The Founding of IBM [7]: Thomas Watson, Sr. was an early graduate of the NCR sales training school. His business genius brought together a number of fledgling office machine companies into International Business Machines (IBM). By 1940, IBM was the leading seller of office machines in the world.

The rapid evolution of mechanical business machines was eclipsed by the intense scientific pressures of World War II, which resulted in the beginnings of the digital computer. Throughout history, the demands of war have often given rise to scientific advances, and the resulting military inventions often proved crucial on the battlefield. World War II, coupled with already advancing technologies in many fields, was a truly scientific war, and military research would play a central role in the field of digital computing. The goal of these early computing projects was scientific computing and the construction of flexible and fast calculators. The critical role of military research funding is easily recognized by the now famous projects that were initiated by the armed services. The Navy sponsored Howard Aiken (along with IBM) and the Harvard Mark I, the Air Force had Whirlwind at MIT under the direction of Jay Forrester, and the Army sponsored the construction of the ENIAC by Eckert and Mauchly at the University of Pennsylvania. All of these projects made significant contributions in both computer technology and the training of early computer pioneers.

Probably the most influential work was that of Eckert and Mauchly on the ENIAC and EDVAC. Through their work, and the consultation of others such as John von Neumann, the fundamental principles of the stored-program computer would emerge, providing the foundation for the modern computers we use today. From a business computing perspective, Eckert and Mauchly were among the first to recognize the general-purpose nature of computers and would pursue their vision in a commercial form by developing the UNIVAC--marking the transition to the Era of Automation.
2.2 Era of Automation (1950-64)
When Eckert and Mauchly established their Electronic Control Company in March 1946, they were almost unique in seeing the potential for computers in business data processing, as opposed to science and engineering calculations. [2]
The invention of the electronic computer and its realization in the ENIAC at the University of Pennsylvania Moore School by J. Presper Eckert and John Mauchly circa 1945 led to the first computer start-up firm and the UNIVAC. As the first UNIVAC became operational and passed the Census Bureau acceptance tests in 1951, the Era of Automation began. Over the next two decades, electronics companies (e.g., RCA, GE, Honeywell) and business machine companies (e.g., IBM, Burroughs, Control Data, NCR) scrambled to become leaders in the computer field. For a time, the UNIVAC became synonymous with modern business computing, enjoying public successes such as the 1952 Eisenhower election prediction televised on CBS. However, IBM would come to dominate the industry by the middle 1950s, when the IBM installed computer base first exceeded that of the UNIVAC. Soon the computer industry was characterized as IBM and the seven dwarves, later to become IBM and the BUNCH as acquisitions and retrenchments reduced the competitors to five: Burroughs, UNIVAC, NCR, Control Data, and Honeywell. IBM would remain at the pinnacle for decades, defining most business computing trends, and continues to this day to be a powerful force in the industry.

Two important themes emerged during this era of business computing:

• First, the essential technologies of hardware and software evolved rapidly as companies invested in computer-related research and development.

• Second, the application of technology was conservative, targeting high-payoff, low-risk business processes, due to the difficulties and expense of constructing hardware-intensive information systems.

The computing platforms of the day included the IBM 700 series, culminating in the 705 for business computing and the 704 (with FORTRAN) for scientific applications. The high-end 700 series would form the basis for the introduction of a transistorized machine, the IBM 7090, the quintessential room-sized mainframe. At the low end, the IBM 650 used reliable and inexpensive magnetic drum technology to offer an affordable business computer. IBM would apply the lessons learned in providing solutions to the emerging cost-conscious business computing market, developing the IBM 1401 and associated peripherals. Thus, the business solution became the computer. Legacy code developed for the successful model 1401 is still in use as we enter the new millennium.

A fascinating snapshot of software-based computer technology as it existed during the Era of Automation (in 1961) is provided by Calvin Gotlieb in his chapter, "General-Purpose Programming for Business Applications," which appeared in the first volume of Advances in Computers
[8]. Gotlieb considered the major data processing problems of the day. These problems still sound familiar:

• understanding and representing system requirements
• achieving sufficient speed and storage for business applications
• performing program checking and maintenance.

Gotlieb examined the 1961 state-of-the-art in programming techniques and file management methods. Several example systems were presented (e.g., the IBM 705, Flow-Matic) and the chapter concluded with two future directions of business programming--parallel operations (which we still have not mastered) and improved computer instruction sets.

As the era's second theme, organizations began to identify the critical business processes that would adapt most readily to automation by computer. The term automation entered the business lexicon in 1947 via the Ford Motor Company and was given widespread coverage in the business press [9]. For instance, John Diebold defined automation in a book by that title as the application of computer-based feedback mechanisms to business and industrial practice [10]. Resistance to being on the leading (or bleeding) edge of computerization was great, with the cost of new computers remaining high. Only the most obvious business processes with a high cost-benefit payback were considered for automation. Early adopters of computer technology to automate their business processes were the US Census Bureau, the Prudential Insurance Company, A.C. Nielsen, and Northrop.

The most dramatic initial application of computing was during the CBS broadcast of the 1952 election returns. A UNIVAC computer correctly predicted the Eisenhower landslide over Stevenson on the basis of preliminary voter results. However, election officials and network executives refused to announce the prediction because they did not trust the computer.

The initial successes of business automation led rapidly to the introduction of computers into a majority of organizations throughout the 1950s and early 1960s. However, scarcity of qualified computer operators, the cost of computer hardware, and limited application software dampened rapid dissemination of computers into all aspects of the business. Automation was applied to the business processes considered "low hanging fruit" on the basis of thorough cost analyses.
2.3 Era of Integration and Innovation (1965-74)
The architecture of IBM's System/360-370 series of compatible processors is one of the most durable artifacts of the computer age. Through two major revisions of the product line and 23 years of technological change, it has remained a viable and versatile interface between machine and user. [11]
As the title indicates, this era is characterized by two quite independent themes. The first is the introduction of the IBM System/360 family of compatible computers, which fostered the notion of true systems integration and platform-independent software. The second theme is the incredible changes in computing technologies, such as component miniaturization and new developments in data communications, that occurred even while IBM enjoyed its large-scale success in systems integration. The seeds of change were being sown.

The new era in business computing began with IBM's announcement of the System/360 in April 1964. The daring move by IBM to replace its entire product line with a new and uniform computer system architecture caught the business community off guard. But when organizations realized the advantages of an integrated product line that ranged from smaller departmental computers to large organizational mainframes, they rushed to purchase the System/360. Within the first 2 years of production, IBM could fill less than half the over 9000 orders [2]. As it turned out, the 360-370 computers remained the backbone of the IBM product line for nearly 30 years.

During the decade from 1965 to 1974, businesses matured in their use of computer technology to support their business functions. The simplistic automation of basic business processes gave way to the full computerization of critical business functions at the center of major organizations. To a large extent, the success of System/360 recognized that the software investment was the important cost and that a family of compatible systems minimized the risks. This maturity resulted from two dominant themes of the era--integrated systems approaches for solving business problems and technology innovations in hardware, software, and telecommunications.

The innovations that characterize the Era of Integration and Innovation include some important developments that laid the foundation for the radical changes of the next era. Though there were many innovations during this era, four stand out for the current discussion. The first is the development of integrated circuits, invented through pioneering projects at Texas Instruments and Fairchild Semiconductor begun in the late 1950s with patents granted in 1964. It is the astounding progress in integrated circuits as the technology moved through medium-scale integration (MSI), large-scale integration (LSI), and very large-scale integration (VLSI) that eventually led to the microprocessor--the veritable computer on a chip. Medium-scale integration provided the technological foundation for inexpensive, yet powerful minicomputers. As chip densities increased and components were miniaturized, the price/performance ratio of these machines followed the historic and favorable march that led to modern desktop computing. In fact, it was in 1964 that Gordon Moore, a
pioneer at Fairchild Semiconductor and a cofounder of Intel, noted that integrated circuit density was doubling every year, an observation that came to be known as Moore's law. However, it would take some time before computer researchers were able to internalize Moore's law and grow accustomed to developing computing technologies ahead of the performance levels that would be delivered by the staggering leaps in integrated circuit technology. Essentially, Moore's law predicted that by the end of this era in 1975, all of the circuits of a classic mainframe such as the IBM 7090 could be implemented on a single chip.

A second innovation was the developing model of interactive computing. That is, how should we use the new departmental computer and how do we get computing power to the desktop? While a wide range of researchers contributed to the vision, many of the pieces came together at Xerox PARC through their efforts to build the office of the future. The desktop metaphor and the WIMP (windows, icons, mouse, and pull-down menus) interface were realized on the Xerox Alto connected via a local area network. All these elements would find their way into commercial products in the next era.

The third and fourth innovations are in the realm of data communications. In the local area environment, Ethernet was developed at Xerox PARC, breathing life into their conception of the office of the future. In the wide area environment, ARPANET was successfully implemented and flourishing in the research community. The infrastructure for internetworking was being developed as the descriptions of the influential Transmission Control Protocol and Internet Protocol (TCP/IP) suite were being published. These developments would provide the range of connectivity options that have wired the world.

In summary, the Era of Integration and Innovation saw IBM and its flagship System/360 family of integrated computers and peripherals further dominate the industry. Even though minicomputers became a force in the commercial sector, they were perceived as co-existing rather than competing with the mainframe. Mainframe systems represented a business-sensitive balance of processor performance, critical peripherals, and data management through fast input/output channels. Herbert Grosch is associated with a less scientific law, Grosch's law [12, 13]. This stated that you would get more computer power for your money by buying a single large machine rather than two smaller machines. In fact, the rule was probably quite accurate at the time owing to the balance of strengths that mainframe systems embodied and the difficulties in connecting a group of smaller machines for work on coordinated tasks. Improvements in networking, software, and storage systems were necessary to make alternative information system architectures more competitive. Developments in many of these key technologies would characterize the Era of Decentralization and beyond.
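The arithmetic behind these two rules of thumb can be made concrete. The short Python sketch below is illustrative only and is not part of the original chapter: the one-year doubling period comes from the text, while the quadratic power-versus-cost form commonly attributed to Grosch, the function names, and the unit cost figures are outside assumptions.

# Illustrative sketch only (not from the chapter): the arithmetic behind
# yearly doubling of circuit density and Grosch's power-versus-cost rule.
# All names, periods, and cost figures here are assumptions for illustration.

def density_growth(years, doubling_period_years=1.0):
    """Growth factor after `years` when density doubles every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)

# Doubling every year from 1964 to 1975 gives roughly a 2000-fold increase,
# which is why a mid-1960s observer could project a mainframe's circuitry
# fitting on a single chip by the mid-1970s.
print(round(density_growth(1975 - 1964)))        # 2048

def grosch_power(cost, k=1.0):
    """Grosch's law in its usual quadratic form: computing power ~ k * cost**2."""
    return k * cost ** 2

# Under this rule, one machine costing 2 units delivers 4 units of power,
# while two machines costing 1 unit each deliver only 2 units in total.
print(grosch_power(2.0), 2 * grosch_power(1.0))  # 4.0 2.0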
2.4 Era of Decentralization (1975-84)
Ethernet was up against "sneakernet" from the very start. All that changed overnight in 1975 with the advent of SLOT, Starkweather's laser printer. The virtues of the combined system called EARS--the Ethernet, the Alto, the research character generator, and SLOT--were too powerful to ignore. One could now write a memo, letter, article, or dissertation and with the push of a button see it printed in professional-quality type. [14]

The four innovations described in the previous era--microprocessors, interactive computing, local area networking, and internetworking protocols--as well as many other innovations, would be commercially realized in a torrent of groundbreaking products in the Era of Decentralization. This leads us to highlight 1975 as the computation-communication inflexion point, when the role of the computer became more of a communication device than a computing machine (see Fig. 2). All of the technological pieces for effective computer communications came together in this era. Hardware that spans the range from large mainframes to minicomputers to desktop computers became available. Connectivity options in both the wide area and local area environment matured, moving into the business computing market. Both computation and communication technologies finally provided the flexibility to create a rich set of information systems architectures. The stage was set for a radical change in our model of computing, with traditional computation being supplanted by communication as the goal of the digital computer.

FIG. 2. The computation-communication inflexion (the computer's focus plotted over 1950-2000: computation declines while communication rises, crossing around 1975).

The 1970s were characterized by rapid developments on a host of fronts. The microprocessor was commercially realized in the early 1970s with Intel's introduction of the 4004 and 8008. However, the appearance of the
Altair 8800 on the cover of Popular Electronics in January 1975 has become synonymous with the dawn of personal computing. The Altair was based on the Intel 8080 microprocessor and was essentially a minicomputer available for under $400. This entry in the hobbyist market soon gave way to personal computers from Apple and IBM. It was the 1981 introduction of the IBM Personal Computer (PC), with an open architecture, primitive operating system, and low cost, that moved the personal computer from the hobbyist benches to corporate desktops running programs such as Lotus 1-2-3.

It was the legendary Xerox Palo Alto Research Center (PARC) that implemented the interactive desktop, developed the laser printer, and networked the components to form the office of the future around the experimental Alto computer in the 1970s. Though Xerox would delay commercial introduction of much of this technology, eventually introducing the Xerox Star, it would be the lower cost offerings by Apple and IBM that would bring the desktop computing metaphor to the business computing market. The IBM PC was a major gamble by IBM to enter a market previously dominated by hobbyists, game players, and educators. Another IBM gamble was to outsource the PC processor, the 8088 chip, to Intel and the operating system, MS-DOS, to a little-known firm named Microsoft. Contrary to IBM's traditional way of doing business, the IBM PC architecture was open to competitors to make their own PC clones. All of these features plus the low individual unit cost made the PC the right vehicle to allow businesses to distribute processing power throughout their organizations. In 1983 Time magazine named the personal computer its Machine of the Year. In 1984, Apple Computer introduced the Macintosh with a commercial during the American Football Super Bowl that was recently named the greatest commercial in TV history by TV Guide. By 1985, businesses had accepted the distribution of computing power to the desktop and were ready to redesign their business processes based on the new distributed computing architectures.

In the area of data communications, developments in both local area networking (LAN) and wide area networking (WAN) technologies would add to this revolutionary era. Again, Xerox PARC played a central role in fostering new technologies by developing and utilizing Ethernet as their LAN technology. Xerox would then license the technology for a nominal fee, spurring the development of a cost-effective LAN standard that remains the technology of choice to this day. Influential WAN technologies that had developed under the auspices of the ARPANET and stood the test of time became part of the computer research community infrastructure. In addition, the TCP/IP internetworking protocols were being refined. The ARPANET, which originally developed among a small group of pioneering institutions, moved to the TCP/IP protocol in 1983. The National Science
Foundation (NSF) created a national network, NSFNET, based on TCP/IP to broaden connectivity throughout the computer research community as the original ARPANET was transformed into a collection of new networks--a true Internet. So, by 1985 personal computers, the desktop metaphor, and a range of computer networking technologies were in place. The question became, how to use these technologies creatively in the business environment?
2.5 Era of Reengineering and Alignment (1985-94)

Reengineering is the fundamental rethinking and radical redesign of business processes to achieve dramatic improvements in critical, contemporary measures of performance, such as cost, quality, service, and speed. [15]

An impressive technological toolkit was invented and deployed during the previous two eras. The new Era of Reengineering and Alignment is characterized by efforts to apply the wide range of technologies to better support existing business processes, as well as to enable the design of entirely new processes. New business system architectures were developed to take advantage of inexpensive computing power and networking options for distributed information processing. In particular, client-server architectures provided the combined advantages of centralized storage of large business-critical data while moving the application processing to the client location. Technical advances in LANs and WANs supported efficient data transmission among clients and servers.

The mid-1980s saw the rise of total quality management (TQM) initiatives. TQM gurus, such as W. Edwards Deming and Phillip Crosby, brought the message of statistical quality control and continuous process improvement to business organizations all over the world [16, 17]. Business process reengineering (BPR) became an important activity to achieve higher levels of quality and productivity. Organizations took a clean, fresh look at their critical business processes in order to discover revolutionary new approaches. BPR was made possible by the new, flexible computer system architectures. Michael Hammer and Thomas Davenport championed the strategy of employing innovative information technologies to reengineer the business organization [18, 19].

By the early 1990s, businesses had accepted the importance of aligning their corporate strategies with their information technology strategies and vice versa [20, 21]. Computer systems no longer played an administrative support role in the organization. They became a primary factor in determining whether the organization succeeded or failed in a highly competitive business environment.
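To make the client-server division of labor described above concrete, here is a minimal, illustrative Python sketch that is not from the chapter: a server process holds a small central data store while the client sends a request and does its processing locally. The host address, port number, and the "order-42" record are hypothetical placeholders.

# Illustrative sketch only: a minimal client-server exchange in Python.
# The host, port, and the "order-42" record are hypothetical placeholders.
import socket
import threading

HOST, PORT = "127.0.0.1", 9090
ready = threading.Event()

def server():
    # The server holds the centralized, business-critical data store.
    data_store = {"order-42": "shipped"}
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((HOST, PORT))
        srv.listen(1)
        ready.set()                       # the server is now accepting connections
        conn, _ = srv.accept()
        with conn:
            key = conn.recv(1024).decode()                  # the client's request
            conn.sendall(data_store.get(key, "unknown").encode())

def client():
    # Application processing stays on the client; only data crosses the network.
    ready.wait()
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect((HOST, PORT))
        cli.sendall(b"order-42")
        print("status:", cli.recv(1024).decode())

t = threading.Thread(target=server)
t.start()
client()
t.join()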
2.6 Era of the Internet and Ubiquitous Computing (1995 onwards)
The Internet is quickly becoming the global data communications infrastructure. Whatever underlying changes are made to its technologies, the result will be called the Internet. [22]

Most of the past eras included the invention of new technologies as major themes. In a sense, the Era of the Internet and Ubiquitous Computing is more about reaching a critical mass than any single technological innovation. The range of technologies described earlier, such as the PC and LAN, were deployed throughout the corporate environment, as well as the home. Of course, the ceaseless march of price/performance improvements provided new and better platforms, but the main theme is that a critical mass of desktops had been reached.

In large part this latest era of business computing is centered on the revolutionary influence of the Internet and the World Wide Web (WWW). Again, the technological innovations had been developed in past eras, with the evolving precursor to the Internet used by the research community. In the early 1980s the Internet adopted the TCP/IP protocol suite and supported thousands of nodes. By the mid-1980s a consensus developed around the domain name system (DNS), giving us the now familiar naming conventions including ".com" for commercial entities. In 1989, Tim Berners-Lee at CERN in Geneva had the insight to develop a simple protocol to utilize the Internet as a global hypertext system to share information. These initial protocols have expanded to become the World Wide Web, again a development driven by the critical mass of computing stations reached in this new era. Future visions of the WWW would have it becoming a World Wide Computer or even a World Wide Brain for our global community [23].

The potential of the Internet for business was not fully realized until around 1995. Product sales on the Internet in 1996 were approximately $500 million. Estimated Internet sales for 2000 are in the range of $10 billion. The reengineering of critical business processes is no longer enough for business success and survival. The business environment has changed dramatically. The critical mass of connected desktops and "tabletops" has been highlighted in the business press, with a November 1994 issue of Business Week stating "How the Internet will change the way you do business" and the 70th anniversary issue of Business Week (October 4, 1999) announcing "The Internet Age." Traditional organizational boundaries and business models do not hold any more. Especially affected are the marketing and sales functions. The Internet provides connectivity to a large percentage of the world's population for dissemination of marketing information. However, important issues of information overload and privacy/security concerns must be understood and addressed.
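As an aside that is not part of the chapter, the "simple protocol" Berners-Lee layered on the Internet, mentioned above, amounts to a plain-text request and reply. The Python sketch below issues a raw HTTP/1.0 GET over a TCP socket; example.com is only a placeholder host and the snippet needs outbound network access to run.

# Illustrative sketch only: a raw HTTP/1.0 GET request over a TCP socket,
# showing how simple the Web's request-reply protocol is at its core.
# The host name is a placeholder; any reachable web server would do.
import socket

def fetch(host="example.com", path="/"):
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
    with socket.create_connection((host, 80), timeout=10) as sock:
        sock.sendall(request.encode("ascii"))
        response = b""
        while chunk := sock.recv(4096):
            response += chunk
    # The reply is plain text: a status line, headers, then the hypertext body.
    return response.decode("latin-1", errors="replace")

print(fetch()[:300])   # the status line and the first few headers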
As business boundaries dissolve, organizations must establish a recognizable and trusted presence on the Internet. In essence, the organization becomes a virtual organization. Most customer and client relationships are conducted via the push and pull of information on the Internet. Business computer systems during this Era of the Internet and Ubiquitous Computing will look very different from those in past eras. New business computing strategies will necessarily involve a major telecommunications component to support access to the Internet, internal intranets, and private networks with strategic partners. The opportunities and challenges facing organizations and their business computing systems are many. The future holds an exciting New World of business computing with future eras yet to be defined.
3. The Computation Platform: Hardware and Operating Systems
The subject of this section is the business history of computing machinery--the hardware and associated software, such as operating systems, that provide the computing platforms on which we construct business information systems. Historical accounts tell us that the first business computer was the LEO (Lyons Electronic Office), a British computer system that was used in the J. Lyons & Company catering firm [24, 25]. In 1951 the first routine office task, weekly bakery valuations, was automated via the LEO computer. LEO computers were used for other business applications in Britain during the 1950s, including payrolls and inventories. However, the story of business computing is largely a tale of the explosive growth of the American digital electronics industry, spurred on by World War II and fueled by a rapidly growing economy [26]. Although wartime demands and military funding played a critical role, an incredible array of research universities and corporate research laboratories developed technologies on many fronts, involving a truly international community of scholars who were drawn to a near-endless stream of leading-edge projects.

There is a rich history and body of literature that chronicles the rise of the computing engines, as well as their inventors, which have changed the direction of an industry and the world. Two excellent books on the history of computing are [2] and [9]. An interesting article-length treatment is found in [27]. We will draw upon classic historical milestones in the current discussion, but will focus our attention on how computing platforms evolved in support of business processes. The hardware and software that make up modern computing platforms were developed during the second through fourth eras: Automation,
Integration and Innovation, and Decentralization. The last two eras, Reengineering and Alignment, as well as the Internet and Ubiquitous Computing Era, focus on the application of technologies that matured during the earlier eras. This section begins by considering the scientific computing that characterized the Era of Calculation, then moves on to consider the Era of Automation that began with the UNIVAC. The Era of Integration and Innovation is marked by the introduction of IBM's long-lived System/360, but also includes innovations such as integrated circuits and other technologies that would lay the foundation for the following era. The Era of Decentralization focuses on the microprocessor, desktop computing platforms, and computer networking advances that made this one of the most revolutionary eras.
3.1 Three Classic Computer Hardware Generations
The history of computer hardware is often divided into three generations, corresponding to the transformational technologies that enabled new types of computers: vacuum tubes, transistors, and integrated circuits. A fourth generation is sometimes added with the advent of large-scale integrated circuits and microprocessors that provided the foundation for personal computers [28]. Each of these technologies allowed the next generation of computers to be more reliable, less expensive, smaller in size, and vastly more powerful. These technological milestones therefore define fundamental hardware generations. However, from a business computing perspective, the information systems that have developed result from a combination of computing platforms, communication technologies, and the ability to provide solutions through software applications. Even within the computer hardware industry, success in business computing did not always go to the most technologically advanced. The great business computing system products arose from strong computing platforms, well-designed peripheral equipment, knowledgeable sales and support teams, and a focus on business solutions and software applications. In viewing the history of computing from a business systems perspective, the combinations of technologies and the information systems architectures provide a more applicable set of business computing eras as outlined in Section 2.
3.2 The Role of Universities and Military Research
The development of computing technologies, clearly one of the most astounding industrial revolutions in history, was influenced at every early step by university and military research. World War II and the race for wartime advances drove the early development of computing technologies
and investments by the military and other government-sponsored programs would continue to pay off at key points in the rise of the computer. However, it is even more interesting that a loose collection of universities and individual research scientists would play a pivotal role in the developments in almost every area of computing. From the University of Pennsylvania and the work of Eckert and Mauchly to the rise of Silicon Valley and Boston-based computer firms, universities and research laboratories have played a central role in the generation of whole new industries. Although universities throughout the world have made substantial contributions to the field, American research universities have been at the center of information technology advances, drawing graduate students and researchers from around the globe. The tale of computing history provides one of the most powerful arguments for continued support of research universities, the incubator of the Information Age.
3.3 Twin Roads: Scientific Calculation and Business Automation
The initial research projects and commercial computing endeavors were scientific in nature. The early computers were the equivalent of weapons, intensively pursued in the laboratories of research universities, military organizations, and the defense industry. Therefore, the natural focus of early computer projects was scientific computing. In fact, even after computer pioneers such as Eckert and Mauchly, as well as the business-oriented IBM, saw the commercial potential of computers, there remained a distinction between scientific and business computing. The twin markets did not move in tandem, often requiring different capabilities to create the next generation. The origins and early applications of the computer were driven by the demands of scientific calculations--the Era of Calculation.
3.3.1 Punched-card Methods

As the digital computer was emerging, most large businesses relied on punched-card machines for business data processing, usually IBM products. The legendary IBM sales force built strong relationships with accounting departments in most industries, and business processes reflected the information handling capabilities of punched-card equipment. The typical installation employed a collection of special-purpose machines that could read, tabulate, or print reports from a deck of cards. The extraordinary benefits of the punched card are outlined in an IBM sales brochure from the early 1960s (Fig. 3) [9].
What the punched hole will do:
• It will add itself to something else.
• It will subtract itself from something else.
• It will multiply itself by something else.
• It will divide itself by something else.
• It will list itself.
• It will reproduce itself.
• It will classify itself.
• It will select itself.
• It will print itself on an IBM card.
• It will produce an automatic balance forward.
• It will file itself.
• It will post itself.
• It will reproduce and print itself on the end of a card.
• It will be punched from a pencil mark on the card.
• It will cause a total to be printed.
• It will compare itself to something else.
• It will cause a form to feed to a predetermined position, or to be ejected automatically, or to space one position to another.

FIG. 3. The benefits of punched cards.
Punched cards were a flexible and efficient medium for data processing. This technology was among the most successful attempts at automating business processes and was in widespread use in the 1930s. In many ways, the familiarity and reliability of punched-card technology stood in contrast with the fragile nature of early computers. One final observation from the IBM sales brochure points out another obvious benefit since "An IBM card--once punched and verified--is a permanent record" [9]. One of the most important aspects of the punched-card model of computation was that simple operations were performed on decks of cards, while human operators shuffled these decks from machine to machine. This meant that the order of data processing steps had to be carefully considered, with a single step being applied to an entire deck of cards. However, most business processes of the time had evolved in tandem with the punched-card model, and therefore were well suited to the technology. Applying a complex sequence of operations would have required moving a single card from machine to machine; a style of processing that would be a liberating characteristic of the newly emerging digital computers. The demands for complex sequences of operations--essentially the equivalent of our modern-day computer programs--came at first from the
scientific and engineering communities. Scientists began using punched-card equipment for scientific calculations, with one of the most influential centers being the IBM-sponsored Watson Computing Bureau established at Columbia University in 1934 [9]. Wallace Eckert of the Computing Bureau outlined the scientific use of punched-card machines in his book Punched Card Methods in Scientific Computation [29]. In time, the general-purpose computer would provide a uniquely flexible environment, becoming the dominant tool for both scientific calculations and business automation.
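The deck-at-a-time style of processing described above can be suggested with a small sketch in Python; the card layout, field names, and operations are hypothetical illustrations, not an actual IBM tabulator workflow. Each "machine" applies one simple operation to the entire deck, and the result is carried to the next machine.

# Each card carries an account number, a department, and an amount in cents
# (a hypothetical layout). A "pass" applies ONE operation to the WHOLE deck.
deck = [
    ("1001", "BAKERY", 1250),
    ("1002", "CATERING", 830),
    ("1003", "BAKERY", 410),
]

def sorter(deck):
    # Sorting machine: orders the entire deck on the department field.
    return sorted(deck, key=lambda card: card[1])

def tabulator(deck):
    # Tabulating machine: accumulates a running total per department.
    totals = {}
    for _account, dept, cents in deck:
        totals[dept] = totals.get(dept, 0) + cents
    return totals

def printer(totals):
    # Printing machine: lists the accumulated totals as a report.
    for dept, cents in sorted(totals.items()):
        print(f"{dept:10s} {cents / 100:10.2f}")

# One operation per pass, applied deck-wide; applying a complex per-card
# sequence would instead mean shuttling individual cards between machines.
printer(tabulator(sorter(deck)))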
3.3.2 IBM 601 Series and the Card Programmed Calculator
The demand for complex sequences of operations led to a series of innovations that allowed punched-card machines to "store" several steps and apply them as each card was read. The work of Wallace Eckert noted above included the development of control switches that would allow short sequences of operations to be performed by interconnected punched-card machines. IBM would develop similar capabilities in the Pluggable Sequence Relay Calculator (PSRC), with the first machines being specially built for the Ballistic Research Laboratory (BRL) at the Aberdeen Proving Ground in Maryland. The Ballistic Research Laboratory, wartime challenges, and later military research would play a continuing role in the development of modern computing technology. In fact, the ENIAC would soon take up residence at the BRL, alongside several more relay calculators.

Adaptable punched-card machines became a commercial reality with the IBM 601, a multiplying punch introduced in 1935 that found widespread use for scientific and statistical applications. In 1946, the IBM 603 was introduced, based on vacuum tube technology. Punched-card processing would reach its zenith with the IBM 604 and 605, among IBM's most popular products of the time, with an installed base of over 5000 machines [9]. These machines combined vacuum tube technology and support for short operation sequences to produce a reliable and flexible computing platform. Similar developments by scientific customers, such as Northrop Aircraft, led IBM to continue development of interconnected punched-card machines, eventually marketing the Card Programmed Calculator (CPC). Commercial digital computers would render these complex punched-card machines obsolete. However, the Era of Calculation includes several other historic efforts that mark the transition from calculator to computer.
3.3.3 Harvard Mark I: The IBM Automatic Sequence Controlled Calculator
As early as 1929, IBM had established a research relationship with Columbia University, providing calculating equipment and funds for statistical computing. IBM would fund another early computer researcher's effort at Harvard University. Howard Aiken became interested in computing through his research in theoretical physics and began searching for sponsors for a Harvard-based research project. Aiken presented the project to IBM in 1937 [2]. The fact that IBM became not just a sponsor, but a research collaborator, is indicative of IBM's early interest in computing and the pivotal role research universities would play in this emerging industry. Howard Aiken rediscovered the work of Babbage while developing the specifications for the calculating machine. In fact, fragments of Babbage's calculating engine were found in a Harvard attic, having been donated by Babbage's son [2]. IBM provided funds, equipment, and the long-term engineering expertise to actually construct the behemoth electromechanical calculator using the high-level specifications provided by Aiken.

The IBM Automatic Sequence Controlled Calculator, better known as the Harvard Mark I, could perform 3 addition/subtraction operations per second and store 72 numbers, with more complex operations taking substantially longer. The actual inauguration of the machine was somewhat controversial, as Aiken failed to acknowledge the financial and engineering contributions of IBM, alienating Watson and other IBM executives [30]. The rapid development of digital computers at the end of World War II made the Harvard Mark I obsolete almost as soon as it was completed. Though operational for 15 years, it served simply as a Navy calculator. However, it is a milestone since it was one of the first automatic calculating machines to be successfully built and was a fertile training ground for early computing pioneers such as Grace Murray Hopper and a host of IBM engineers. IBM would renew its commitment to Columbia University and gain further expertise by developing an even more advanced calculator, the Selective Sequence Electronic Calculator (SSEC), which was eventually installed in the lobby of IBM's New York City headquarters [2].
3.3.4 University of Pennsylvania, the ENIAC, and the EDVAC
J. Presper Eckert and John Mauchly pursued an intensive research program to deliver computing technology to the military during World War II at the University of Pennsylvania's Moore School of Electrical
Engineering [31, 32]. The computing projects at the Moore School led to the development and clarification of a computing model that would spark the development of commercial computing and remain relevant to this day. The first of the Moore School electronic computers, the Electronic Numerical Integrator and Computor (ENIAC), 2 was a massively complex device for its time, using roughly 18 000 vacuum tubes, 90 000 other devices, such as resistors, capacitors, and switches, and several miles of wire [33]. It is a tribute to Eckert's engineering talents and the commitment of all involved that the machine came into existence. However, before the machine was even completed, several design drawbacks became apparent. The most important from the current perspective is the difficulty in "setting up" or programming the machine for new computations. It was necessary to rewire the machine using patch cords to specify a computation, a painstaking task that took hours or even days. The wire-based approach was adopted in order to deliver the instructions at electronic speed. Typically, paper tape, punched cards, or some other input mechanism was used, which was adequate for speeds such as the 3 operations per second executed by the Harvard Mark I. However, the speed of the ENIAC processor, at 5000 operations per second, required a correspondingly fast mechanism for instruction delivery in order to keep the processor working efficiently, hence the use of wiring to specify the program [33]. The mismatch between processor speed and primary storage persists to this day and has resulted in a complex storage hierarchy in modern computers, with cache memory and other mechanisms used to deliver needed information at high speeds.

The careful hand-wired specification of a particular computation suited the numerical applications involved in the nascent scientific computing efforts, but was not an economical model for the everyday demands of business computing. Although the notion of storing "programs" or sequences was already recognized and being incorporated in a very limited fashion in punched-card machines, it was becoming clear that a more flexible and robust scheme was necessary for general computing. In addition to difficult programming and the mismatch between processor speed and instruction delivery, the machine also suffered from a limited memory size. Before the ENIAC was completed, the Moore School team was already at work on a new machine, the Electronic Discrete Variable Automatic Computer (EDVAC). One important innovation was Eckert's proposal for a mercury delay line, circulating microsecond-long pulses, to provide 1000 bits of permanent memory [34]. This would address the most obvious shortcoming of the ENIAC, a very small main memory.

2 The spelling is taken from the nameplate on a piece of equipment in the Smithsonian collection.
As the designers debated improvements to the ENIAC and focused on increasing the storage, one of the great insights of the computer age would emerge--the stored-program concept. The designers of the ENIAC recognized that programs, as well as data, could be stored in computer memory. Thus programs could be loaded into computer memory using a peripheral device, such as a paper tape, with both data and instructions delivered to the processor at electronic speeds during execution. In fact, the program itself could then be manipulated as data, enabling the future development of programming tools and new research areas such as artificial intelligence [35]. This fundamental stored-program concept, simple in hindsight, is one of the essential characteristics of all digital computers to this day.

John von Neumann learned of the ENIAC project through a chance meeting with Herman Goldstine, the liaison officer from the Ballistic Research Laboratory, and eventually joined the project as a consultant during the design of the EDVAC [36]. Von Neumann had an established reputation and his participation brought a new level of visibility to the project, which had originally engendered little confidence among the military sponsors [37]. He was very interested in the logical specification of these new computing machines and helped the original designers address the problems inherent in the ENIAC's design. In 1945, the EDVAC design was mature enough that von Neumann wrote his famous treatise, A First Draft of a Report on the EDVAC, clearly specifying the logical foundations of the machine, as well as the stored-program concept [38]. This famous report is recognized as one of the founding documents of modern computing. The initial draft report was meant for internal distribution, but it became widely available. John von Neumann's sole authorship is an unfortunate historical accident and created tensions between the original project leaders, Eckert and Mauchly, and the group around von Neumann that contributed to the abstract design [2, 9]. However, von Neumann's clear exposition laid out the major components of the EDVAC design at an abstract level, including the memory (for both data and program), control unit, arithmetic unit, and input/output units. This architecture, often called the von Neumann architecture, along with the stored-program concept, has served as the foundation for modern digital computers. In the summer of 1946, the Moore School held a series of lectures in which the stored-program concept and computer designs were laid out for a group of researchers representing most of the leading laboratories, thereby influencing almost all of the developing projects.
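A minimal sketch in Python may help make the stored-program concept concrete. The four-instruction machine below is an invention for illustration, not the EDVAC's actual instruction set; the point is simply that instructions and data occupy the same memory and are pulled from it by a single fetch-decode-execute loop.

# Instructions and data share ONE memory; the processor fetches, decodes,
# and executes whatever the program counter points at.
memory = [
    ("LOAD", 8),     # 0: acc <- memory[8]
    ("ADD", 9),      # 1: acc <- acc + memory[9]
    ("STORE", 10),   # 2: memory[10] <- acc
    ("HALT", None),  # 3: stop
    None, None, None, None,   # 4-7: unused
    5,               # 8: data
    7,               # 9: data
    0,               # 10: result goes here
]

def run(memory):
    acc, pc = 0, 0
    while True:
        op, operand = memory[pc]   # fetch: an instruction is just memory contents
        pc += 1
        if op == "LOAD":
            acc = memory[operand]
        elif op == "ADD":
            acc += memory[operand]
        elif op == "STORE":
            memory[operand] = acc
        elif op == "HALT":
            return

run(memory)
print(memory[10])  # 12; and because the program is data, it too could be rewritten in place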
3.3.5 The ABC and Z3
Although Eckert and Mauchly's work at the Moore School matured to become the most influential model for early general-purpose computers, other
efforts provided guidance in the evolution of the computer. Code-breaking and other military needs led to pioneering efforts in computing in the US, England, and Germany during World War II. A university-based project led by Professor John Vincent Atanasoff began in 1937 at Iowa State University, and with the help of Clifford Berry, a graduate student, resulted in a rudimentary electronic computing device by 1939. John Mauchly visited Atanasoff in 1940 at Iowa State University and stayed for several days at Atanasoff's home, learning about what we now call the Atanasoff-Berry Computer (ABC) [2]. The direct and indirect influence that Atanasoff's work had on the later Moore School project has been debated from both a historical and a legal perspective (for patent applications).

Other historians attribute the first working, fully programmable general-purpose computer to Konrad Zuse in Germany [39]. His Z1 programmable calculator was completed in 1938 and the nearly equivalent Z3 computer was completed in 1941. These machines performed calculations for the German war effort. The Z4 saw postwar operation at the Federal Technical Institute in Zurich [9]. A consensus has evolved that there is no one inventor of the computer, with more than enough credit in the birth of computing to recognize all involved.
3.4 The Rise of the General Purpose Computer

Eckert and Mauchly were among the early pioneers who recognized the commercial applications of the Moore School projects. They would leave the University of Pennsylvania to form one of the original computer start-up firms, the Eckert-Mauchly Computer Corporation (EMCC), ushering in the Era of Automation. Though EMCC was ultimately absorbed by Remington Rand, Eckert and Mauchly's perception of the computer market would prove to be correct. IBM would also capitalize on the early computer research efforts and a deep understanding of business processes honed by years of punched-card data processing sales. The fundamental insight of these early business computer companies was that manipulating data, essentially symbolic computing, could be accomplished using the same technologies that were first focused on scientific computing.
3.4.1 The Universal Automatic Computer: UNIVAC
In 1946, Eckert and Mauchly began work on the Universal Automatic Computer (UNIVAC), which would later become synonymous with the word "computer" until IBM came to dominate the industry. The Census Bureau was the first to order a UNIVAC from the fledgling company. As early as
1946 during the Moore School lectures, Mauchly was discussing sorting and other clearly business-oriented topics [2]. So, as Eckert and Mauchly began their commercial endeavor, they had a distinctly business-oriented focus, whereas most earlier projects had pursued the scientific computer. Even IBM, which maintained an early interest in the emerging computer, pursued the Defense Calculator (later marketed as the IBM 701) at the expense of data processing projects, giving the UNIVAC several extra years without serious competition in the business world. Of course, IBM faced the difficult dilemma of how to enter the emerging computer business while protecting the immensely successful punched-card product lines.

Technically, the UNIVAC was to be a commercial incarnation of the ENIAC and EDVAC, embracing the stored-program concept and using the mercury delay line for high-speed memory [34]. Perhaps the most innovative aspect of the new machine was the use of magnetic tape to replace punched cards and paper tape. The construction of a magnetic tape storage device involved the development of low-stretch metal tape and many mechanical engineering challenges. In addition to magnetic tape storage, other peripheral devices to input and print information were required. In 1951, the UNIVAC passed the Census Bureau acceptance tests and ushered in the world of business computing. Within a year two more machines would be delivered to government agencies and orders for several more machines were booked.
3.4.2 IBM and the Defense Calculator
The very public success of the UNIVAC and its clear focus on business data processing spurred IBM into action [40]. The firm had decided to focus on supplying several high-performance computers to the technology-oriented defense sector, drawing resources away from two important business projects, a less expensive drum-based machine and a magnetic tape machine that could compete directly with the UNIVAC. The IBM 702 data processing machine, based on the Tape Processing Machine (TPM), was announced in 1953 [9]. However, the actual machines would not be delivered until 1955, giving the UNIVAC a further advantage in the early business computing market. Once the IBM 700 series reached the market, it became the market leader with a larger installed base than the UNIVAC by 1956 [2]. The success of the IBM 702 was due in large part to the much more reliable magnetic tape technology, developed by the experienced IBM mechanical engineers. In addition, faster memory technology, modular construction, and the IBM sales force all contributed to the success. When core memory became available, models 701 and 702 were upgraded to models 704 and 705, maintaining a scientific computing and data processing dichotomy. One of the most successful machines, the IBM 704, had floating-point arithmetic,
core memory, and the FORTRAN programming language, as well as the support of IBM's sales and field engineering teams.
3.4.3 Drum Machines and the IBM 650
As early as the Atanasoff-Berry project, as well as during Eckert and Mauchly's work on the ENIAC, the idea of a rotating magnetic storage device was explored. The main drawback of the technology was the relatively slow speed of the electromechanical devices as compared with the electronic processors being developed--a relationship that still holds today between magnetic disks and memory/processor speeds. However, magnetic drums were much lower in cost and more reliable than other available memory technologies, such as mercury delay lines. The commercialization of magnetic drum machines is typified by Engineering Research Associates (ERA), a firm that had roots in Navy research and would go on to market scientific and business computers [9]. As with the Eckert-Mauchly Computer Corporation, ERA would need more capital to pursue the business computing market and would also become part of Remington Rand. The magnetic drum devices developed for internal use and direct component sales by ERA were very reliable and could store up to 2 million bits, with access times measured in milliseconds. Though the speeds did not match the digital processor speeds, inexpensive and reliable business computers with adequate storage could be built.

The IBM 650 would turn out to be a tremendous financial success. Whereas the 700 series provided technical leadership, the 650 became the business workhorse with roughly a thousand machines installed [9]. In developing the 650, IBM was forced to consider business solutions rather than computing power to attract its traditional punched-card users to business computing. This would be an important lesson for IBM and the rapidly growing computer industry. The IBM 650 was a less expensive computer based on the Magnetic Drum Computer (MDC) project within IBM. Though slower, the lower-cost machine was very reliable and had software that supported the business solution approach. In addition, IBM established an innovative program with universities, providing deep discounts on the IBM 650 for use in computing courses [2]. The result was the first generation of computing professionals being trained on IBM equipment, as well as an increasing role for universities in further research.
3.4.4 Magnetic Disk Secondary Storage

In 1957, IBM marketed the first magnetic disk, allowing large-scale random access storage to be attached to a wide range of computers. IBM
engineers would develop several innovations that allowed high capacity disk drives to be built at a reasonable price. The geometry of magnetic disks provided more surface area than drums, especially when the disks were stacked like platters. The IBM disk technology used read/write heads that could be positioned over the surface and basically flew on a thin film of air (i.e., the boundary layer) created by the motion of the disk. IBM produced the model 305 disk storage device, a 5 million character storage device using a stack of 50 disks, which became known as the Random Access Memory Accounting machine (RAMAC). The random access nature of the device was demonstrated at the Brussels World's Fair in 1958, where visitors could question "Professor RAMAC" via a keyboard and receive answers in 10 different languages [9]. Both magnetic disk storage devices, as well as the interactive style of computing ushered in by this technology, were lasting innovations that would flourish on future desktops.

3.5 Computing Means Business
This section discusses the classic business machines that developed at the end of the Era of Automation, and the introduction of the System/360 that marks the beginning of the Era of Integration and Innovation. The computers that evolved during the Era of Automation focused on providing business solutions. Two of the most successful machines were the classic IBM 7090/7094 mainframes and the less expensive IBM 1401 business computer. These machines were often installed together to create the batch processing systems that sprawled across computer rooms, with the 7094 providing the computing power and the IBM 1401 handling the cards and magnetic tape. The introduction of the System/360 redefined the market and ushered in the Era of Integration and Innovation. The robust System/360 architecture would last for decades and span an incredible range of products [41]. IBM's focus on a uniform family of machines for software compatibility highlighted the growing importance of software. Finally, the impressive range of System/360 peripherals demonstrated the importance of a solution-based approach, as opposed to a narrow focus on pure processor performance. Clearly, the System/360 is an important milestone in business computing.
3.5.1 IBM 1401: A New Business Machine

The IBM 1401, announced in 1959, represented a total business computing solution, employing modular construction and advanced peripherals, such as the innovative IBM 1403 "chain" printer. The high-speed printer, capable of printing 600 lines per minute, was probably one of the most important
features for the business market. The main goals of improved performance and reliability were achieved by simply replacing vacuum tubes with transistors and employing magnetic core memory, a technology pioneered by Jay W. Forrester during Project Whirlwind [2]. Magnetic core technology was the basis for reliable, high capacity, nonvolatile storage that supported the construction of random access memory (RAM), a perfect complement to the new transistorized processors.

However, it was the business lessons learned from the IBM 650 that guided the development and marketing of the IBM 1401, as well as firmly establishing IBM as the dominant force in the rapidly growing mainframe computer market. Approximately 10 000 model 1401 machines were installed, dwarfing the successful model 650 installed base and all other computers of the day [9]. Whereas many computer makers were fascinated by processor designs, IBM focused on supplying business solutions, reliable products, excellent support, and a strong relationship with the business customer.

In addition to a strong hardware product, the IBM 1401 was the computing platform for the Report Program Generator (RPG) programming language. This language was specifically designed for programmers with previous experience with patch cord setup and the IBM accounting machines. Unlike FORTRAN, the RPG language was designed for business computing and a smooth transition from previous platforms. Like FORTRAN, RPG remains in use to this day, with its punched-card machine heritage obscured by a long line of increasingly sophisticated computing machines.

While supporting customer software development, IBM also used its familiarity with business processes to develop applications that made its machines even more attractive. By virtue of IBM's dominant position in the market, the software applications were bundled with the hardware, spreading the software development costs over many customers. It was hard for many manufacturers to take advantage of similar economies of scale and the computing industry was characterized as "IBM and the seven dwarves." As applications became more complex and software costs escalated, the practice of bundling software could no longer be supported. Many customers were dissatisfied with the embedded software costs and this, coupled with government pressure, led to IBM's decision to "unbundle" software in 1968 [9]. This event would fuel the growth of the tremendous software industry that continues to thrive today.
3.5.2 IBM 7090: The Classic Mainframe
In addition to the low-cost IBM 1401, the scientific computers from the 700 series (701, 704, and the subsequent model 709) were strong products for
IBM. The 700 series evolved into the IBM 7090, the quintessential mainframe. The classic room-sized computer, complete with arrays of blinking lights and rows of spinning tape drives, defined the image of computing during the heyday of mainframe machines. The IBM 7090 was based on transistor technology and included all the significant peripheral devices that IBM had been so successful in engineering. The upgraded model 7094 was installed at hundreds of locations.

The mainframe model of computing remains relevant to this day, not so much for the "batch" mode of computing, but for the storage and manipulation of vast amounts of data. The mainframe computers were characterized, in part, by carefully designed input/output channels (often dedicated processors) that could handle large-scale data processing. Mainframe computers were full of data-processing tradeoffs, balancing raw processor performance, memory capacity, physical size, programming flexibility, and cost. Faster scientific supercomputers would be built for less data-intensive computations, but the mainframe typified large general purpose computing during the 1960s. The highly successful 704 and 7094 models, incorporating floating-point arithmetic, also showed that computers developed for scientific applications often doubled as business data processing machines. IBM would formally end the scientific computing and data processing dichotomy when it reinvented its entire product line with the development of System/360.
3.5.3 New Product Line: System/360

Despite IBM's tremendous success, the company suffered from an increasingly complex product line, a problem that grew more urgent as software became an essential component of business solutions that spanned the hardware spectrum. IBM was left to continually maintain a plethora of software versions in order to support customer migration paths. After lengthy internal debate, IBM decided on a revolutionary strategy to build a family of compatible computing platforms [42]. The so-called New Product Line, later marketed as System/360--evoking "all points of the compass" or "a full circle"--remains one of the largest commercial research and development efforts and was a gamble that put the entire company at risk. Although some of the System/360 technical developments were quite conservative with regard to actual computer architecture, it employed the successful business solution strategy that characterized the IBM 650, 1401, and 7094 [11]. However, it added two new elements that revolutionized the industry--the entire product line was developed as a compatible family, allowing software to run across the entire spectrum of platforms, and a wide
range of peripherals allowed a tremendous variety of configurations. The product announcement that accompanied the introduction of System/360 in 1964--initially 6 compatible computers and some 44 peripheral devices--redefined the computer industry. The System/360 would prove to be a tremendously successful product line, fueling IBM growth for decades [43]. The enhanced System/370, which employed integrated circuits, extended the dominant position of the 360-370 architecture for many more years [44]. The life of the System/360-370 series would continue even longer than planned as IBM's attempt at a second comprehensive architectural reinvention, the Future Series, foundered after several years of effort [2].
3.6 Computers Leave the Computer Room

The Era of Decentralization focuses on the incredible progress in integrated circuits and the advent of the microprocessor. The microprocessor would break the boundaries of computing and allow the creation of truly personal computers at both the low end (microcomputers) and the high end (workstations). In addition, advances in LANs and WANs provided the connectivity options necessary to begin linking the rapidly growing personal computer market. The technological foundations for these developments were laid in the Era of Integration and Innovation, an era that is characterized by two somewhat contradictory themes: the redefinition and success of large-scale computing marked by the IBM System/360, and innovations in integrated circuits and networking that would lead to the dramatic changes in the Era of Decentralization. These innovations led to a shift from the centralized computer room to a decentralized model of computing, vastly expanding our ability to create new information systems architectures.
3.6.1 Integrated Circuits and the Minicomputer

From 1965 through the mid-1970s, integrated circuits were developed and refined, providing the third classic transformational technology that would reshape the computer industry [45]. Integrated circuits would reduce the cost of computers by orders of magnitude, improve reliability, and eventually result in the microprocessor--"a computer on a chip." Like vacuum tubes and transistors, the integrated circuit arose in the electronics industry, outside the mainstream computer manufacturers, and was adopted by a new set of computer makers, some already existing and some newly formed. During this critical time, the entrepreneurial electronics industry was centered around Route 128 and MIT as well as Silicon Valley and Stanford University. In 1957 Harlan Anderson and Kenneth Olsen, an MIT-trained
engineer who had worked on several aspects of Project Whirlwind, formed Digital Equipment Corporation (DEC) [9]. The original aim of DEC was to enter the computer business and compete with the established mainframe manufacturers. Although DEC could certainly develop competitive processor designs, the costly barriers to entry included the business software applications, sales and support teams, and all the associated peripherals (e.g., printers, disk storage, tape drives). So, with venture funding from American Research and Development (ARD), one of the original postwar venture capital firms formed by Harvard Business School faculty member George Doriot, DEC began by making component circuit boards. The successful component business funded DEC's move into the computer business with the announcement of the first computer in the venerable Programmed Data Processor (PDP) series.

Though the earlier PDP models had some success, it was the 1965 introduction of the PDP-8 that would usher in the minicomputer era, with over 50 000 installed systems [9]. The PDP-8 was the first integrated circuit-based computer and was marketed to the scientific and engineering industry, often finding use in dedicated applications that would have been impossible for an expensive mainframe. The PDP-8 was small in size and cost, priced under $20 000. The PDP-8 machines brought a personal style of computing to university research environments, nurturing a new breed of computer professionals just as the IBM 650 had created the first wave of collegiate computing. By 1970, DEC was the third largest computer maker and minicomputers formed a new market, allowing DEC to become a serious IBM competitor.

From a business computing perspective, the minicomputer enabled department-level computing and more distributed information systems architectures. The use of a centralized computing model became a choice, rather than a dictate of the mainframe platform. It is worth noting that the technology for desktop or personal computing was simply a technological progression of the increasingly dense integrated circuits. Large-scale integrated circuits were remaking industries, fostering products such as digital calculators and watches. The first commercial microprocessors were developed in the early 1970s, typified by the Intel 4004 and 8008. Therefore, the technology for desktop computing was in place, but the market was not exploited until years later. It would take an eclectic mixture of hobbyists, engineers, venture capitalists, and visionaries to create a new market. In fact, many university researchers viewed the PDP-8 as a personal computer, but for most of the 1970s the minicomputer would be marketed to both the scientific and business market. In terms of information systems architecture, the minicomputer made the smaller department-level server a reality. The incredible growth in computing performance made these machines a powerful force.
DEC also introduced larger machines that began to challenge traditional IBM strongholds. One of the most successful high-end PDP systems was the PDP-10, a time-sharing system that became an important platform at universities and commercial research laboratories. Introduced in 1966, this machine was in use at many of the research centers during the development of the ARPANET. After the PDP-11, DEC would continue expansion into the large systems market with the introduction of the VAX (Virtual Address eXtension) architecture and the VAX 11/780. The VAX products were among the first 32-bit minicomputers [28]. The VAX architecture was meant to span a wide range of products, much as the earlier System/360 architecture had done for IBM, and was also introduced with the VMS operating system. Though the VAX line was successful, the computer industry was being redefined by the microprocessor and DEC would face stiff competition on several fronts [46].
3.6.2 Desktop Computing

Although the move from minicomputers to microcomputers was a natural technical progression as microprocessors became available in the early 1970s, the market had to be recognized and essentially created [47]. The personal computer industry grew out of the electronic hobbyist culture. Micro Instrumentation Telemetry Systems (MITS) developed electronics kits for the hobbyist market and launched the Altair 8800 into an ill-defined market with its appearance on the cover of the January 1975 issue of Popular Electronics. MITS received a flood of orders and the Altair galvanized a new community of hardware and software suppliers. The makings of a small computer system were readily available from component suppliers, with the microprocessor at the heart of the system. The personal computer market was soon a crowded mix of small start-up firms constructing machines from off-the-shelf components, and the Altair quickly vanished from the scene.

The rise of Microsoft and Apple Computer out of this market has been the subject of much fascination. Although these firms developed interesting technologies, their survival during the tumultuous birth of personal computing probably results from a combination of technical abilities, perseverance, timing, and good luck. Of course, many established computer firms would enter the personal computing market once it was established, with IBM and Intel defining the standard and Apple Computer supplying the spice. Probably the most sophisticated vision of personal computing emerged from Xerox PARC, with its Alto computer and graphical desktop [14]. This machine would directly influence Apple Computer, as well as Microsoft, using the now classic model of technology transfer through personnel recruitment.
However, the breakthrough was more a matter of cost than sophistication. IBM's 1981 introduction of an inexpensive machine with an open architecture transformed and legitimized the market. IBM followed with successive machines, such as the IBM PC/XT and IBM PC/AT, introducing the new PS/2 product line in 1987. Apple Computer introduced the Macintosh in 1984, raising personal computing standards with a powerful and intuitive user interface [48]. The emergence of desktop computers put in place the client computer, capable of interacting with minicomputers and mainframes, as well as performing local computations. A wide range of potential information systems architectures was now possible, with small-scale servers, intelligent clients, and centralized data processing facilities all contributing necessary components. All that was required was the set of interconnection mechanisms being supplied by the evolving computer networking and communications industry.
3.6.3 High-Performance Workstations

In addition to the personal computer, the Era of Decentralization saw the introduction of high-performance desktop computers (i.e., workstations) in the early 1980s, initially targeting the scientific and engineering markets. These were sophisticated desktop computers introduced by start-up firms such as Apollo Computer, Sun Microsystems, and Silicon Graphics, as well as more established companies like Hewlett-Packard. Three important factors set these early workstations apart from the low-cost personal computers exemplified by the IBM PC. Workstations were based on powerful new microprocessors such as the Motorola 68000, utilized sophisticated operating systems with UNIX being a natural choice, and embraced networking technology such as Ethernet from the outset. Whereas microcomputers evolved from a hobbyist culture and stressed low cost, workstations emerged from sophisticated computer firms and were purchased by equally sophisticated customers.

As has been the case throughout the business computing eras, the relentless and rapid progress in technology continually redefines the industry. The distinction between workstations and personal computers has blurred, with high-end Intel-based personal computers offering performance levels appropriate for the scientific and engineering tasks that defined the workstation class. To meet the need for a sophisticated operating system, Microsoft developed Windows NT to take advantage of these powerful new machines. There is a great deal of development activity in the high-end desktop market, and the only sure trend is that price and performance will continue to improve dramatically.
One of the most successful and influential workstation firms is Sun Microsystems, founded in 1982 by a group of business people and researchers associated with Stanford University in a now familiar tale of laboratory-to-market technology transfer [49]. The company commercialized workstation hardware and the UNIX operating system, taking its name from the Stanford University Network (SUN) workstation project and building high-performance desktop computers at a reasonable cost for the engineering market. Two key researchers who joined the firm were Andy Bechtolsheim on the hardware side and Bill Joy on the software side. With a strong commitment to UNIX, Sun Microsystems has gone on to market a series of successful UNIX versions (first SunOS and now SOLARIS) derived from the original Berkeley UNIX system that Bill Joy had helped to develop. In addition to support for local area networking in the form of Ethernet, the implementation of the TCP/IP protocol in UNIX linked the expanding workstation market and the development of the Internet [50].

Another important development associated with the rise of high-performance workstations is the adoption of reduced instruction set computer (RISC) technology [51]. Most computers developed throughout the previous business computing eras provided increasingly large sets of instructions, pursuing what has come to be known as complex instruction set computer (CISC) technology. Two factors initially favored CISC technology: the fairly primitive state of compiler optimization technology, and the cost or speed of memory access. If the machine provided a rich set of instructions, compilers could easily generate target code that used those high-level instructions without having to apply involved optimization strategies, reducing the "semantic gap" between programming and machine languages [52]. Secondly, fairly complex operations could be performed on processor-resident data, avoiding memory input/output. However, faster memory technology and improving compiler technology changed these underlying assumptions and led to work by IBM and others that focused on building simpler, but extremely fast, processors. Essentially, if the processor implemented a simpler and more consistent set of instructions, great care could be taken to ensure that these instructions executed quickly, and with more possibilities for exploiting parallelism. John Cocke pioneered the approach at IBM, where the experimental IBM 801 was used to test the approach [53, 54]. Additional early research initiatives were begun at Berkeley [51] and Stanford [55]. The approach generated considerable debate, but the experimental results were promising and RISC processors would rapidly evolve within the high-performance workstation market. For example, Sun Microsystems commercialized the technology as SPARC (Scalable Processor ARChitecture) and licensed the technology to other companies. MIPS Computer Systems also produced a RISC
processor, derived from the Stanford MIPS (Millions of Instructions Per Second) project, which was used by many workstation vendors. IBM produced the RT and the more successful RS/6000. Finally, Motorola joined with IBM and Apple to design the PowerPC as a direct competitor to the long-lived Intel product line [52].
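The contrast between the two styles can be suggested with a toy sketch in Python; this is an invented, simplified illustration rather than any real instruction set. The CISC-flavored routine performs a memory-to-memory add in one complex step, while the RISC-flavored routine expresses the same work as a sequence of simple register operations that are easier to make uniformly fast and to pipeline.

# A hypothetical memory image and register file for illustration.
memory = {"X": 5, "Y": 7, "Z": 0}
registers = {"r1": 0, "r2": 0}

def cisc_add_mem_to_mem(memory, dst, a, b):
    # One complex instruction: two memory reads, an add, and a memory write.
    memory[dst] = memory[a] + memory[b]

def risc_sequence(memory, registers):
    # The same work as four simple register instructions; each step is
    # small and regular, which is what lets the hardware run it quickly.
    registers["r1"] = memory["X"]                        # LOAD  r1, X
    registers["r2"] = memory["Y"]                        # LOAD  r2, Y
    registers["r1"] = registers["r1"] + registers["r2"]  # ADD   r1, r1, r2
    memory["Z"] = registers["r1"]                        # STORE Z, r1

cisc_add_mem_to_mem(memory, "Z", "X", "Y")
risc_sequence(memory, registers)
print(memory["Z"])  # 12 either way; the difference lies in what the hardware must implement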
3.7 Operating Systems

Though they are certainly software systems, operating systems are so closely associated with the underlying hardware that we often consider the combination as comprising the computing platform. Therefore, we have chosen to consider a brief history of operating systems in conjunction with the computing hardware discussed in this section. There is a rich history of operating systems, but we concentrate on a few systems that provided innovative business computing capabilities. The truly malleable nature of software has enabled operating system designers to borrow important ideas from previously successful systems, leading to an operating system history marked by cross-fertilization and common threads. The discussion will follow the eras outlined in Section 2, concentrating on the first four eras as did the hardware discussion. It is during the Eras of Calculation, Automation, Integration and Innovation, and Decentralization that mainstream operating system technologies matured.
3.7.1 Personal Calculators
The machines that developed in the Era of Calculation were the handcrafted ancestors of the modern digital computer. These machines were fragile, room-sized behemoths that routinely crashed as vacuum tubes failed during computations. The typical applications of these machines were well-understood numerical algorithms for computing important tables or solving engineering equations. These machines were mostly used by a close-knit group of experts that included the designers and developers, scientific colleagues, and other team members who maintained and operated the hardware. In fact, most users were highly skilled and very familiar with the one-of-a-kind computing environment they often helped create. These experts used the machine much like a personal calculator, signing up for a time slot during which the machine was completely dedicated to their tasks. Programming was usually accomplished by hand-wiring a plugboard, a task that could take hours or even days. However, the plugboard was one of the few early technologies that could deliver instructions at electronic speeds to the processor.
3.7.2 Batch Processing

The Era of Automation ushered in a separation between the designers and developers of the computer and the commercial customers who would be programming the machines for specific tasks. These first commercial machines were housed in special rooms and maintained by a group of trained operators. Though the operators added a level of indirection, the dedicated model of computing was maintained. Typically, the programmers would write programs on paper using COBOL, FORTRAN, or assembly language, transfer them to punched cards, and submit them to the operators for processing. Once a computation or "job" was complete, the output would be printed and placed in an output bin for later collection by the programmer. Though this style of interaction was a vast improvement over patch cords and plugboards, the machine was often idle, despite the bustling activity of the operators, as jobs were ushered around the computer room. The high cost of early commercial computers meant that any idle time was viewed as a serious waste of an expensive resource. Computer users quickly investigated alternative strategies to reduce idle time and manage their computing resources with more efficiency.

The strategy adopted by most computer installations was batch processing. A group or "batch" of jobs was collected and then written to magnetic tape, the tape was then transferred to the main computer where all the jobs were processed, and the output was also written to tape for later processing. A rudimentary operating system was used to read the next job off the tape and execute it on the processor. One of the first such operating systems was developed in the mid-1950s at General Motors for an IBM 701 [56]. The successful use of these early batch operating systems led to more in-house development by large customers and vendor offerings, including IBSYS from IBM, one of the most influential early operating systems for the IBM 7090/7094. In order to communicate with the computer operators to make requests such as mounting data tapes, messages were associated with a job, and eventually a complex Job Control Language (JCL) developed to communicate with operators and their software counterpart--operating systems. Desirable hardware support for emerging operating systems included memory protection, a hardware timer, privileged instructions, and interrupt handling [57].

Often relatively inexpensive machines were used to write the batch to an input tape and print the results from an output tape, thereby using the expensive mainframe only for running the jobs. A typical installation used a machine like the IBM 1401 to read card decks and produce input tapes, while an IBM 7094 might be used for the actual computations [28]. The IBM 1401 would again be used to print the output tapes, often using the
innovative high-speed IBM 1403 chain printer. Though the turnaround time was still measured in hours, at least each machine was used for the most appropriate tasks and the expensive mainframes were kept busy.
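A rudimentary batch monitor of the kind described above can be sketched in a few lines of Python; the job names and workloads are invented for illustration. Jobs collected on an "input tape" are run one after another without operator intervention, and their output accumulates on an "output tape" for later printing.

# Jobs are collected on an input tape; the monitor runs them back to back
# so the expensive processor is never idle between jobs.
input_tape = [
    {"name": "PAYROLL", "run": lambda: sum([30000, 42000, 51500])},
    {"name": "INVENTORY", "run": lambda: max([12, 7, 31])},
]
output_tape = []

def batch_monitor(input_tape, output_tape):
    for job in input_tape:
        result = job["run"]()                           # execute the next job off the tape
        output_tape.append(f"{job['name']}: {result}")  # spool its output for later printing

batch_monitor(input_tape, output_tape)
for line in output_tape:   # a smaller machine (the 1401's role) prints the output tape
    print(line)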
3.7.3 Multiprogramming and Time Sharing

With the IBM 360 series, IBM embarked on one of the most ambitious software engineering projects to build the operating system for this formidable product line--OS/360. The challenge facing OS/360 designers was to produce an operating system that would be appropriate for the low-end business market, high-end scientific computers, and everything in between. This wide spectrum of the computer market was characterized by very different applications, user communities, peripheral equipment requirements, and cost structures. In meeting these conflicting demands, OS/360 became an incredibly complicated software system, with millions of lines of assembly language written by thousands of programmers. The story of this massive system development effort has been told in many forms, including the classic book The Mythical Man Month by Frederick Brooks, one of the OS/360 designers [58]. Interacting with OS/360 required an equally complex JCL, described in book-length manuals [59]. Despite these obstacles, OS/360 managed to work and contribute to the success of System/360, even though bugs and new releases remained permanent fixtures.

Three important operating system innovations are associated with OS/360 and other operating systems from the classic third generation of computers (i.e., integrated circuits): multiprogramming, spooling, and time sharing [28]. The execution of a program entails many activities, including both arithmetic computations and input/output (I/O) to secondary storage devices. The order-of-magnitude difference in secondary storage access times makes I/O a much more costly operation than an arithmetic computation and may impose long waiting times during which the processor remains idle. Different types of computing tasks often have very different profiles with respect to I/O activity. Heavy computational demands on the processor typify scientific computing, whereas I/O operations often predominate during business data-processing tasks. Long I/O wait times and idle CPUs are factors that lead to inefficient resource usage, much like the human operator activities that first led to batch processing. For instance, on earlier systems, such as the IBM 7094, the processor remained idle during I/O operations. Multiprogramming or multitasking is a technique in which several jobs are kept in main memory simultaneously, and while one job is waiting for I/O operations to complete, another job can be using the processor. The operating system is responsible for juggling the set of jobs,
quickly moving from one job to another (i.e., a context switch) so that the processor is kept busy. This capability greatly increases job throughput and processor utilization. The second innovation was aimed at the need for supporting machines, such as the IBM 1401, for card reading and tape handling. Spooling (Simultaneous Peripheral Operation On Line) was used to read card decks directly to disk in preparation for execution, as well as for printing output results. This technique provided efficient job management without the need for supporting machines and a team of operators. Though multiprogramming and spooling certainly contributed to greater job throughput and processor utilization, it was still often hours from the time a programmer submitted a job until output was in hand. One of the benefits of the dedicated processing model was the immediate feedback. In order to provide a more interactive experience and meet response time goals, the concept of time-sharing was explored [60]. Essentially, time-sharing uses the same techniques as multiprogramming, switching quickly between jobs to efficiently use the processor. However, time-sharing users were connected via on-line terminals and the processor would switch between them, giving the illusion that the machine was dedicated to each of them. Since many of the interactive tasks included human thought time (or coffee breaks) with sporadic demands for processor time, the processor could switch quickly enough to keep a whole set of interactive users happy and even manage to complete batch jobs in the background. Among the first time-sharing systems to be developed was the Compatible Time-Sharing System (CTSS) at MIT, running on an IBM 7094 [61]. At first CTSS only supported a few simultaneous users, but it was a successful demonstration of a time-sharing system and provided the foundation for a larger project. Project MAC (Man and Computer) at MIT, with Bell Labs and General Electric as industrial partners, included development of a "computer utility" for Boston. The basic idea was to provide computing services using a model much like electric power distribution, where you plugged into any convenient outlet for computer time. Essentially, this was time-sharing on a grand scale with hundreds of simultaneous users on large-scale systems. At the center of this effort was the construction of the MULTICS (MULTiplexed Information and Computing Service) operating system, which eventually worked well enough to be used at MIT and a few other sites [62-65]. The MULTICS system was implemented on a GE 645 rather than an IBM System/360, since the System/360 did not support time-sharing well; this caused concern at IBM and gave GE an edge in the emerging time-sharing market. However, implementing MULTICS was a huge endeavor and both Bell Labs and GE dropped out of the project along the way. Though
MULTICS never did become widely adopted and the idea of a computer utility withered, MULTICS was an influential test-bed for many operating system concepts that would find a place in future systems. IBM would also develop a time-sharing capability for its System/360, but it would be the later System/370 using integrated circuit technology that handled time sharing using the Conversational Monitoring System (CMS) and Time Sharing Option (TSO). The System/360 architecture has endured through successive generations of IBM products and increasingly sophisticated operating systems were developed, including OS/SVS (Single Virtual Storage) to take advantage of virtual memory (16 MB) on the upgraded System/370. The interim SVS was replaced with MVS (Multiple Virtual Storage) to satisfy growing memory requirements, providing a 16 MB address space for each job. When the underlying hardware was upgraded to handle 31-bit addresses, MVS/XA (Extended Addressing) provided 2 GB per job. This was later extended further in MVS/ESA (Enterprise System Architecture) to allow up to 32 GB per job to be assembled from 2 GB address spaces. MVS is one of the most complex operating systems ever developed and has been a long-lived software component of the System 360/370 computing platform [57].
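The scheduling idea at the heart of multiprogramming and time sharing, switching the processor to another ready job whenever the current one blocks on I/O or exhausts its time slice, can be illustrated with a small simulation. The sketch below is a toy round-robin model in Python with invented job names and timings; it is not a description of OS/360, CTSS, or any other historical scheduler, and it simplifies matters by assuming each job's I/O has completed by its next turn.

    from collections import deque

    # Illustrative only: three hypothetical jobs, each needing a few units of
    # CPU time and starting an I/O operation after every couple of units of work.
    jobs = deque([
        {"name": "PAYROLL",   "cpu_left": 5, "work_before_io": 2, "run": 0},
        {"name": "INVENTORY", "cpu_left": 4, "work_before_io": 3, "run": 0},
        {"name": "REPORTS",   "cpu_left": 3, "work_before_io": 2, "run": 0},
    ])

    tick = 0
    while jobs:
        job = jobs.popleft()                # dispatch the next ready job
        job["cpu_left"] -= 1                # run it for one time slice
        job["run"] += 1
        tick += 1
        print(f"t={tick:02d}  {job['name']} on CPU")
        if job["cpu_left"] == 0:
            print(f"       {job['name']} done")
        elif job["run"] == job["work_before_io"]:
            # The job starts an I/O operation; rather than letting the CPU sit
            # idle, the system switches to another ready job.  In this toy model
            # the I/O is assumed to be complete by the job's next turn.
            job["run"] = 0
            jobs.append(job)
            print(f"       {job['name']} waits for I/O; context switch")
        else:
            jobs.append(job)                # time slice over; back of the queue

The point of the sketch is simply that the processor never sits idle while any job has work to do, which is the efficiency argument made above for multiprogramming.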
3.7.4 Desktop Computing and Network Operating Systems
The Era of Decentralization is the period when advances in microprocessors and computer networking provided the missing components in our information technology tool kit that enabled desktop computing, high-speed LANs, and global connectivity via internetworking. Most of these technologies are discussed elsewhere, but there were several related operating system developments during this era. Two of the most widely used operating systems, UNIX and MS-DOS, flourished in this period. In addition, both of these operating systems would eventually support networking, completing the connectivity picture and allowing individual desktops access to the Internet. Though UNIX was developed somewhat earlier on a DEC PDP-7 minicomputer (the First Edition dates from 1969), the early versions were intended for internal Bell Labs use [66]. It was not until Ritchie and Thompson published an influential paper on UNIX in 1974 that AT&T began shipping UNIX, along with the source code, to interested universities and research organizations [67]. The nearly free distribution of the UNIX operating system and source code led to a most interesting software development project. An informal worldwide community of developers
adapted and extended the UNIX system, creating one of the most powerful and influential operating systems ever written [68]. The UNIX system began when Bell Labs pulled out of the MULTICS project and Ken Thompson set out to re-create some of the functionality in a programming environment for use at Bell Labs. The UNICS (UNIplexed Information and Computing Service) system, a pun on the MULTICS acronym, was later called UNIX [28]. Thompson designed UNIX to be a lean system that provided only the essential elements of a good programming environment without the grand-scale baggage required to implement a metropolitan computer utility. The strengths of the UNIX system have influenced subsequent operating systems and include the following points.
• The system was designed with simplicity in mind. It had the ability to connect components together to construct more complex programs via pipes that carry streams of data.
• The system and source code were freely distributed and became the focus of decentralized development efforts. One of the most influential versions of UNIX was developed by graduate students and researchers at Berkeley, released as a series of Berkeley Software Distributions (BSD) [69, 70].
• UNIX was implemented using a high-level system programming language, C. The C implementation of UNIX meant that the operating system was more easily ported to other hardware platforms, as long as there was a C compiler. Of course, C has become one of the most widely used programming languages.
• Later versions of UNIX included support for networking, with Ethernet in the local area environment and the TCP/IP protocol suite for the Internet. The adoption of TCP/IP in Berkeley UNIX made the protocol suite a de facto standard. Once TCP/IP became freely available, both UNIX and the Internet developed together in many ways.
Probably the most significant shortcoming of UNIX has been the lack of a standard, a flaw due to the loosely organized worldwide community that was responsible for so many of its strengths. The UNIX system was adopted by many computer makers and subsequently modified to include incompatible extensions that have contributed to divisions within the UNIX community. There have been several attempts to standardize the UNIX operating system, including AT&T standards such as the System V Interface Definition (SVID). The IEEE Standards Board began an effort to reconcile the two main branches of the UNIX family tree under the POSIX
(Portable Operating System) project.3
3 The "IX" in POSIX was added to give the Portable Operating System project a sort of UNIX-like sound [28].
The POSIX committee eventually produced a set of standards, including the 1003.1 standard that defines a set of core system calls which must be supported. Unfortunately, a subsequent division between vendors such as IBM, DEC, HP, and others that formed the Open Software Foundation (OSF) and AT&T with its own UNIX International (UI) consortium exacerbated the problem of creating a unified UNIX system. Many vendors developed their own UNIX versions such as IBM's AIX, DEC's ULTRIX, Hewlett-Packard's HP-UX, and Sun Microsystems' SOLARIS. The UNIX system has dominated in the non-Intel workstation market, especially RISC-based systems, and has more recently been a force in the database and web server arena [71]. In addition, there have been several freely distributed versions such as the LINUX system, which has recently become a popular UNIX system for personal computer platforms. The Intel-based personal computer market has been dominated by Microsoft's MS-DOS since the introduction of the first IBM PC. Though the first MS-DOS versions were primitive single-user systems lacking many of the features that had evolved since the early batch processing systems, subsequent releases added functionality, drawing heavily on UNIX.4
4 Originally, Microsoft was licensed by AT&T to distribute UNIX.
For the introduction of the PC, IBM went to Microsoft to license the BASIC interpreter and planned to use the CP/M-86 operating system developed by Gary Kildall at Digital Research. When the CP/M schedule slipped, Microsoft was asked to supply an operating system as well. In what would have been an interesting alternative computing history, Microsoft might have turned to UNIX, but the requirements for at least 100 K of memory and a hard disk prevented its initial use. Microsoft bought 86-DOS (Disk Operating System), internally code named QDOS for Quick and Dirty Operating System, from Seattle Computer Products and hired its original developer, Tim Paterson, to enhance it for the IBM PC launch [9]. The renamed MS-DOS was to become the software counterpart to IBM PC hardware and the Intel-based microcomputers. IBM called the system PC-DOS and Microsoft retained the rights to market it to the clone market as MS-DOS. If anyone had been able to predict the explosive growth in the microcomputer market, more thought would have been given to both the hardware and software components of this revolutionary computer platform. For instance, the IBM PC did not incorporate hardware protection or privileged instructions to prevent programs from bypassing
MS-DOS, leading to a huge number of bug-infested and non-portable software applications. Within a decade there were over 50 million PC-compatible platforms using versions of MS-DOS [9]. IBM found itself in the strange position of leading the personal computer industry with a product composed of components designed by others and using an operating system over which it had only indirect control. MS-DOS version 2.0 was a major rewrite that was delivered with the IBM PC/XT. This version drew heavily on UNIX for the file system structure, shell I/O redirection and interaction, as well as system calls, though it still remained a single-user system without any of the time-sharing or protection features. The 1984 introduction of the IBM PC/AT brought version 3.0. Though the Intel 80286-based system could address 16 MB of memory (versus 640 K on earlier systems) and offered kernel and user modes, hardware protection features, and support for multiprogramming, MS-DOS used none of these features and ran in an 8086 emulation mode [28]. However, version 3.0 provided networking support, a critical event that contributed to the Era of Decentralization. In 1987, IBM introduced the PS/2 family and planned to produce a more robust operating system in partnership with Microsoft--OS/2. OS/2 was one of the few new operating systems to be commercially engineered without concern for backward compatibility, providing many advanced features (e.g., extended memory addressing, protected modes, and multiprogramming) in a clean and elegant design. Unfortunately, the market did not respond. Although technically inferior, MS-DOS with its huge collection of applications held the superior market position. Microsoft abandoned OS/2 in 1991 and in response IBM formed a software development alliance with Apple Computer. Under market pressure, IBM itself developed MS-DOS version 4.0, which Microsoft reverse-engineered for the clone market. Finally, Microsoft released MS-DOS 5.0 in 1991. This major new version provided strong support for extended memory, a separate command shell, help facilities, a screen editor, and a simple user-initiated multiprogramming feature. It was also the first version to be sold directly to users, rather than supplied solely to computer manufacturers. MS-DOS remained an idiosyncratic operating system that provided a very difficult programming environment, but the inertia provided by millions of personal computers made it one of the most long-lived operating systems. At the same time, there were several technically superior operating systems available in the personal computer market. UNIX became the system of choice on high-performance workstations, using a window system to support the all-important WIMP interface. IBM continued the development of OS/2. The Apple Macintosh System 7 extended the model pioneered at Xerox PARC and provided one of the most intuitive and
well-engineered operating system environments. Microsoft embarked on a strategy to provide a pleasing interface on MS-DOS by developing Microsoft Windows, a WIMP-style interface in the tradition of Xerox PARC and Apple. In the most recent clean-slate operating system project to be undertaken in the commercial sector, Microsoft began development of Windows NT. NT provides a Windows-like interface on top of a newly implemented and sophisticated operating system [57]. Like OS/2 and Macintosh System 7, Windows NT is an operating system capable of exploiting the tremendous power of current microprocessors, which have blurred the line between workstations and traditional personal computers, providing the performance of the not-too-distant minicomputers and mainframes of the past. Windows NT is based in large part on UNIX, especially a more recent incarnation called Mach, which has been used to explore important new operating system capabilities for threads (i.e., lightweight processes) and multiple processors [72, 73]. It will be interesting to watch the evolution of operating systems as new leaps in performance provide incredibly powerful desktop systems, based on both single and multiple processors.
4. Communication: Computer Networking
Without computer networking technologies, the role of the computer would have remained a simple extension of the original calculating machines. The revolutionary nature of computing and our ability to construct a rich variety of information system architectures are inextricably linked with the growth of computer networking. Indeed, as we enter the age of a wired world, new forms of communication and a vast new digital economy are arising from the synergy of computing and networking technologies. The rapid growth of computing technology during the postwar years and the unprecedented leaps in performance enabled by large-scale integrated circuits overshadowed emerging networking technologies. Although the early steps in networking technology may appear to have lagged behind growth in pure processing power, recent gains in networking protocols and raw speed have been dramatic. A paper by several leading database researchers discussed these trends in computing and communications, noting that each of the following measures of computing performance had improved by an order of magnitude (i.e., a factor of 10) or more every decade [74]:
• the number of machine instructions per second
• the cost of a typical processor
• the amount of secondary storage per unit cost
• the amount of main memory per unit cost.
The expectation was that these measures would continue to improve and that two new measures would join the march: the number of bits transmitted per unit cost as well as bits per second. Networking has clearly joined computing in the climb toward ever-higher levels of performance. Most textbooks on computer networking draw a coarse division between Wide Area Network (WAN) and Local Area Network (LAN) technologies [75]. Though this distinction is blurred somewhat by switched technologies that can play a role in both situations, such as Asynchronous Transfer Mode (ATM) or even mobile networking, the essential characteristics and historical development are highlighted by this separation. WAN and internetworking technologies arose through efforts to connect geographically dispersed centers of computing excellence at university and military research centers. Although the proprietary interconnection schemes for peripherals and terminal devices could be considered the earliest form of local area networking, LAN technologies, such as Ethernet, developed in response to the demands for interconnecting large numbers of desktop machines. At an abstract level the twin themes of resource sharing and communication underlie both WANs and LANs, but under closer scrutiny the details reinforce the distinction. WANs developed to share the expensive computing resources and expertise of the mainframe, with the somewhat unexpected side effect of spawning the now ubiquitous electronic mail (email) technology. In fact, email would become the largest component of traffic on the early research networks. LANs developed to allow office users to share both resources, such as printers and servers, and information or documents for collaborative work. Essentially, LANs made economic sense when inexpensive computers became the norm. Therefore, local area networking required a cost-effective technology, without the reliance on expensive dedicated networking equipment (e.g., routers) that characterized wide area networking. For example, Ethernet provides a simple protocol and continually improving levels of performance, where all the machines on a particular network share the interconnection media. The use of shared media, rather than the dedicated point-to-point links that are used in WANs, is an artifact of the different economics of the environments. In the sections that follow, the growth of wide area networking and the fundamental technology of packet switching are discussed. The use of packet switching, enabled by digital computing technologies, is at the heart of current WANs and the most publicly visible manifestation, the Internet. In the local area network arena, Ethernet has become a dominant technology. With the more recent introductions of Fast Ethernet and
Gigabit Ethernet, the technology promises to be an important factor in the years ahead. The phenomenon of global networks is made possible by both technologies working in concert. Like a tree, the wide area packet switched networks provide the trunk and major branches, but the millions of leaves or personal computers are connected using LANs. The story begins, as does so much of computing history, with the interaction of military funding and university research.
4.1 ARPA: Advanced Research Projects Agency
On the heels of the 1957 launch of the Soviet Sputnik satellite, President Eisenhower formed the Advanced Research Projects Agency (ARPA) as an agile organization, free of the normal armed services bureaucracy and with the mission of ensuring continued American leadership in scientific research [76]. Military funding fueled the evolution of computing hardware. ARPA's mission was to foster similar innovations outside the individual branches of the armed services. As a general during World War II, Eisenhower was clearly familiar with the military and had little patience for the bickering and redundancies that sometimes plagued the armed services. Therefore, he sought civilian leadership for the newly formed ARPA, funded it well, and had the Director of ARPA report directly to the Secretary of Defense [77]. If America was going to invest heavily in a scientific Cold War, an independent agency with the necessary contractual power and nearly unlimited research scope seemed an appropriate vehicle. Initially, a large part of ARPA's agenda was focused on developing the space program and exploring the military uses of space. However, Eisenhower had already planned for an independent space agency and the National Aeronautics and Space Administration (NASA) was formed in 1958. Responsibility for space programs moved to NASA, with military applications being once again the province of the individual armed services. Although still a fledgling agency, ARPA was at a crossroads and would either redefine itself or disband. The staff at ARPA developed a new plan for the agency to foster American basic research and high-risk projects that stood little chance of being funded by the armed services. The focus of ARPA changed from the traditional military-industrial complex to the nation's research universities. Under successive directors the agency remained small, but both the budget and scope of research projects grew [77]. ARPA's support of computing research started when time-sharing emerged as a promising technology that might allow expensive computing resources to be used for both military and civilian research. In addition, the late 1950s saw the emergence of the minicomputer. ARPA management
recognized the increasing importance of computing and Jack Ruina, the first scientist to lead ARPA, decided to form a group charged with supporting research in computing. J.C.R. Licklider, a psychologist with a growing interest in computing, became the first director of what would eventually become ARPA's Information Processing Techniques Office (IPTO) [76]. Licklider had a vision of the computer as a powerful tool, working in synergy with the human mind, to leverage the potential of the human intellect. His vision stood in contrast to the more traditional scientific computing focus and after only a few years of experience with computing, he authored the influential paper, "Man-computer symbiosis" [78]. Under his enthusiastic leadership ARPA would invest heavily in computing research and begin pursuing projects in the newly emerging field of computer networking. Licklider's handpicked successor in 1964 was Ivan Sutherland, a leading computer graphics researcher, who would continue to emphasize similar research in computing. Sutherland would hire Bob Taylor, who would then fund and directly administer the first efforts to interconnect the isolated islands of high-performance computers that existed in the military and research universities. He would eventually convince Larry Roberts from the Lincoln Laboratory to administer the project that would evolve into the Internet.
4.2 Packet Switched Networking
Paul Baran, who started his career as a technician at the Eckert-Mauchly Computer Corporation, went on to pursue a graduate engineering degree at UCLA and became a researcher at the RAND Corporation in 1959 [77]. RAND was formed in 1946 to continue developing the nation's wartime operations research capabilities and evolved into a leading research institution. At RAND, Baran became interested in the survivability of communication systems during nuclear attack at a time when both the US and the Soviet Union were assembling nuclear ballistic missile arsenals. Baran recognized that digital computing technology might serve as the basis for a survivable communications system and began refining his ideas, authoring a series of RAND technical reports.5
5 Baran's original RAND reports are available at the Internet Society web site (www.isoc.org) under the sections on Internet history.
Two of the most powerful ideas that Baran developed form the basis of what we now call packet switching networks [79].
• The first idea was to introduce redundancy into the network, moving away from centralized or decentralized topologies to what he called
distributed networks. The mesh-like distributed network loosely modeled the interconnections between neurons in the brain.
• The second idea was to divide a message up into small message blocks or packets that could take different paths through the network and be reassembled at the destination. In the face of network failures, this would allow message blocks to continue to be delivered over surviving portions of the network. (A small illustrative sketch of this message-block idea follows the list.)
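In modern terms the message-block idea amounts to cutting a message into small blocks, each carrying a destination address and a sequence number, so the blocks can travel by different routes and still be reassembled in order at the destination. The fragment below is an illustrative Python sketch with a hypothetical block size and message; it is not a rendering of Baran's actual RAND designs.

    import random

    def to_blocks(message: bytes, dest: str, size: int = 16):
        """Cut a message into addressed, numbered message blocks (packets)."""
        chunks = [message[i:i + size] for i in range(0, len(message), size)]
        return [{"dest": dest, "seq": n, "payload": c} for n, c in enumerate(chunks)]

    def reassemble(blocks):
        """Reorder blocks by sequence number and rebuild the original message."""
        return b"".join(b["payload"] for b in sorted(blocks, key=lambda b: b["seq"]))

    message = b"Message blocks may arrive out of order over different paths."
    blocks = to_blocks(message, dest="node-7")
    random.shuffle(blocks)                # simulate blocks taking different routes
    assert reassemble(blocks) == message  # the destination restores the message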
These two fundamental ideas are among the defining characteristics of packet switched networks and were explored by Baran as he developed a model for a distributed communications network. By dividing up the message into pieces or blocks, a postal service model could be used to deliver electronic data and voice communications using separately addressed blocks. A key insight was that digital computers could disassemble, route, and reassemble the messages fast enough to meet the communication requirements. Survivability would be introduced by building redundant links and forming a distributed network. After running simulation experiments, Baran estimated that the surprisingly low "redundancy level" of between 3 and 4 connections per node would provide very high reliability. Essentially his experiments showed that low-cost and somewhat unreliable links could be used to build a highly reliable network by introducing a modest amount of redundancy. Baran also pictured each node in the network as a digital computer that would route the message blocks at each juncture over the best surviving links. He outlined a scheme that called for a "self-learning policy at each node, without need for a central, and possibly vulnerable, control point" [79]. The routing decisions would be made dynamically in what he called "hot potato routing," building a high-speed store-and-forward network with reliability through redundancy. Much like the early telegraph system, the nodes would be capable of storing the message blocks and forwarding them along the most appropriate path, depending on the state of network links or connections. Each node would contain a routing table that described the current network topology, indicating the number of links and preferred routes to other nodes. The uniform message blocks would allow for efficient handling of all types of data. These ideas, taken together, constituted a revolutionary new communications model enabled by the digital computer. Baran had difficulty convincing both RAND colleagues and later AT&T management that his ideas represented a practical approach to designing a digital communications network. AT&T never did recognize the value in Baran's early outline of packet switching technology, despite his considerable efforts in trying to convince the dominant communications company
of the benefits. AT&T was firmly entrenched as the monopoly telephone company across the nation and viewed communications technology from a circuit switching perspective. The telephone system relied on switching equipment that would form a dedicated physical circuit over which voice communications would travel. The circuit exists for the duration of the call and is dedicated to the communicating parties, whether they are speaking or not. The idea that you could break up communications into little pieces, route them along various paths, and reassemble them at their destination was a foreign concept to telephone company management. In fact, packet switching networks would turn out to be flexible enough to provide circuit-switching-like behavior using virtual circuits. Virtual circuits are negotiated paths that allow the bulk of the packets to follow the same route, providing sequenced delivery and simplified routing decisions [75]. Baran's fundamental ideas would await rediscovery by ARPA and ultimately form the basis for one of the greatest breakthroughs in computing history.
4.3 ARPANET
Robert Taylor served as a deputy director under Sutherland and became the third director of the IPTO in 1966. Thus, he was already familiar with many of ARPA's activities in computing research. Mainframe computing was the dominant model, with the leading research universities and laboratories each committing substantial resources to remain at the leading edge of this rapidly growing field. ARPA was funding a lot of these initiatives and Taylor was struck by the amount of duplication that was developing throughout the research centers. Computers were large and expensive, with each ARPA-sponsored investigator aspiring to own a state-of-the-art machine. Each new investment created an advanced, but isolated computing center. Taylor was able to quickly convince the current ARPA director Charles Herzfeld that an electronic network linking the computing centers would make financial sense and enable a new level of collaborative work. So, Taylor now needed to assemble a technical team to administer and implement the project [77]. In 1966, Lawrence Roberts left Lincoln Laboratory to head the networking project that Taylor had initiated. Roberts had many connections within the computing research community and began to identify key contributors to the project. Essentially, ARPA needed to identify principal investigators to actually conduct the research and build the prototype network. Taylor convened a meeting with many of the leading researchers, but there was not a tremendous amount of enthusiasm for resource sharing among these well equipped centers, as well as some hesitation to commit valuable computing resources to the network itself. It was an important idea
put forward by Wesley Clark after the meeting that would change Roberts' conception of the network [76]. Clark suggested a network of small identical computers be used to provide the actual network infrastructure, with simple links connecting the computing centers to the network computers. This arrangement would simplify the network compatibility problems and lighten the burden on host computers. He also suggested Frank Heart as a computer engineer capable of building the system. Roberts knew Frank Heart from Lincoln Laboratory, but Heart had moved to the consulting firm of Bolt, Beranek, and Newman (BBN) in Cambridge, Massachusetts. Also among the early advisors was Leonard Kleinrock, Roberts' long-time friend from Lincoln Laboratory who would head the Network Measurement Center at UCLA. At a subsequent meeting, Roberts would learn of the on-going work of Donald Davies, as well as the initial work of Paul Baran. The pieces were in place to try a packet switching network on a grand scale. Roberts drafted a request for proposals and circulated it among the informal advisors that had attended the early meetings. The proposal outlined a network of dedicated Interface Message Processors (IMPs) that would manage the network links, routing packets and handling transmission errors. The newly emerging minicomputers would be small and inexpensive enough to serve as IMP computer platforms, making the use of dedicated network computers a revolutionary, yet practical idea. Proposals were submitted by dozens of leading computer companies, including IBM, Control Data Corporation (CDC), and Raytheon. The small consulting firm of BBN would draft a very detailed proposal, exploring many of the issues that would be faced in constructing the network. BBN was often called "the third university" among Boston's formidable collection of research universities and was well connected within the computing research community. BBN brought an innovative research focus to this novel project and would succeed brilliantly in designing and implementing the early network.
4.3.1 The IMPs: Interface Message Processors
BBN assembled a small, talented staff and began the process of designing the Interface Message Processors, the heart of the network. These machines would have to handle the packet traffic and make all the routing decisions, while still providing performance that would make the network a viable real-time store-and-forward system. BBN decided to base the IMPs on "ruggedized" versions of Honeywell DDP 516 minicomputers. Since these computers were made to withstand battle conditions, the theory was that they could also be safely installed in the laboratories of research universities.
Once implemented, the IMPs, the first packet switching routers, exceeded performance expectations. The first IMPs began arriving at the research centers in 1969, with IMP 1 going to Kleinrock's group at UCLA, IMP 2 going to SRI, IMP 3 going to UC Santa Barbara, and IMP 4 going to the University of Utah [77]. The ARPANET was taking shape. BBN would get IMP 5 and the first cross-country connection. Using IMP 5, BBN would go on to develop remote network management features that would lay the foundation for modern network operations centers. Once the IMPs were in place, the research centers were charged with writing host-to-IMP connections for their specific machines. At BBN, Bob Kahn would write the specification for host-to-IMP interconnections that would serve as a blueprint for sites to connect to the network. BBN would provide the IMPs and the basic packet handling facilities, but the host-to-IMP and later host-to-host connections would be left to the research centers to design.
4.3.2 Network Protocols
An informal group of research center members, mostly graduate students, evolved into the Network Working Group (NWG) [80]. The NWG began debating the type of host-to-host services that should use the underlying packet switching network being implemented by BBN. The NWG became an effective, yet democratic organization that started the immense job of charting a long-term course for the network. Among the early participants were three graduate students from Kleinrock's group at UCLA: Steve Crocker, Vint Cerf, and Jon Postel. In 1969, the NWG issued its first Request for Comment, or RFC 1, setting the inclusive tone for the organization that still permeates many of the Internet's governing committees. RFC 1 addressed the basic host-to-host handshake that would be used to initiate communications. The NWG continued to issue RFCs and would arrive at the notion of a layered protocol, where new, more advanced services are built upon lower-level common services. In 1971, BBN took a more active role in the NWG hoping to accelerate the work on network protocols. The ability to log in to remote machines and a mechanism for transferring files were identified as two important services that were necessary for resource sharing. The relatively simple TELNET protocol was the first higher level service that supported remote logins across the network, and it is still in widespread use today. The second protocol was the file transfer protocol (FTP), again a protocol that has remained an important service. These two protocols were the nucleus for resource sharing and made the ARPANET a success on many levels. At the first International Conference on Computer Communication in 1972, the
team of ARPANET investigators hosted a large-scale interactive display of the network, generating widespread interest and galvanizing a host of new participants in the rapidly growing network field [76].
4.3.3 Electronic Mail: Communication not Computation
Though resource sharing was the initial motivation for the ARPANET, electronic mail would quickly account for the majority of traffic. On many of the large time-sharing systems, ad-hoc facilities for depositing electronic messages were developed. For example, the MAILBOX system was available on the Compatible Time-Sharing System at MIT in the early 1960s. However, in the post-ARPANET world the problem was to scale up these simple electronic mailboxes to operate in a true network environment. Electronic mail started out as a simple program written by Ray Tomlinson at BBN to provide simple mailbox facilities on PDP-10s running the BBN-developed operating system TENEX [77]. Tomlinson enhanced his locally oriented programs to allow delivery between two BBN PDP-10s, shifting the focus of electronic mail from the local machine to the world of networking. The use of FTP to carry messages provided the bridge for Tomlinson's programs to provide electronic mail over the ARPANET. These early communication programs soon spawned a host of increasingly sophisticated electronic mail handlers. Tomlinson's initial program may have been simple, but electronic mail offered a new type of communication that has remained at the center of network usage. In fact, the rather unexpected growth in email traffic was among the first pieces of evidence that communication, not computation, may be the most important benefit of the digital computer.
4.4 Xerox PARC: The Office of the Future
Xerox's famed Palo Alto Research Center (PARC) opened in 1970 near Stanford University. (A detailed history of this important center is fascinating reading [14].) Several alternative locations were explored, including New Haven and Yale University for its proximity to the Stamford, Connecticut headquarters of Xerox. George Pake, PARC's first director, persuaded other executives that the dynamic Palo Alto area would be the perfect location. Unlike many universities at the time, Stanford University was eager to build relationships with industry and viewed PARC as an important model of cooperation. Xerox PARC would assemble one of the most impressive groups of researchers and generate advances on multiple technological fronts. Though the story of how Xerox itself failed to capitalize on PARC technologies has become legend, the innovative work of
the research center helped to guide many of the key technologies of business computing. In fact Xerox did find valuable products, such as the laser printer, among the technologies pursued under the PARC creative umbrella. The commercialization of computing technology has been unpredictable and the rewards have not always gone to the technological leaders. However, within Xerox PARC several key technologies would come together to form a lasting vision of the office of the future and the role of personal computing. Xerox PARC was able to attract an incredible array of talented individuals owing to its location, unrestricted research agenda, and attractive salary levels. Cutbacks in research funding by the military, formalized in part by the Mansfield Amendment, and general economic conditions led to a buyers' market for research and development talent [14]. In this environment, Xerox PARC was able to hire many of the top computer science researchers. In fact, PARC almost directly inherited the computing research mantle from ARPA as federal funding diminished. ARPA had concentrated its funding in a few universities, such as CMU, MIT, Stanford, UC Berkeley, UCLA, and the University of Utah. Researchers from almost all these laboratories as well as Robert Taylor, the ARPA Information Processing Techniques Office (IPTO) director, would end up at PARC. PARC would lead the way on many fronts, but in particular four important technologies would come together and offer a glimpse of the office of the future: the laser printer, the personal computer, the graphical user interface, and the LAN.
4.4.1 The Laser Printer: Images not Characters
Gary Starkweather, a Xerox optical engineer, began using lasers to paint the surface of xerographic drums on an experimental basis at the traditional Rochester-based Xerox research facility. His initial work did not find acceptance at the more product-focused laboratory and he managed to get transferred to Xerox PARC where he was free to pursue the ideas that would lead to the laser printer. In building the earlier prototypes, Starkweather used a spinning disk with mirrored facets that redirected the laser beam across the surface of the xerographic drum. By modulating the laser, millions of dots could be etched on the drum surface to form a complete image. These early experiments relied on technologies that would be too expensive to produce commercially, such as extremely precise mirrors. A return to simpler technologies would provide the breakthrough. Starkweather used a lens to passively correct any imperfections in the mirrored surfaces and the whole arrangement worked. The newly developed Scanning Laser Output Terminal (SLOT) was a flexible output device capable of producing images, not simply
the character output of mechanical printers [14]. The translation from a computer representation to SLOT input was a difficult task at the time as well. It is solved easily now with abundant and inexpensive memory, but Butler Lampson and Ron Rider would develop the Research Character Generator (RCG) at PARC using wire-wrapped memory cards, essentially a high performance print buffer. With the RCG in place, the laser printer would become an operational technology within PARC, even though Xerox would not market a commercial version (the Xerox 9700) until 1977. Just as the IBM 1403 "chain printer" made the IBM 1401 a successful business computing solution, the laser printer would provide the bridge from the digital to the physical world in the office of the future.
4.4.2 The Alto: Computing Gets Personal
Charles Thacker and Butler Lampson would start a somewhat unofficial project to implement Alan Kay's vision of a small personal computer with a high-resolution display for graphical interaction. The tight timetable and unofficial nature of the project forced Thacker and fellow designer Edward McCreight to avoid unnecessary complexity. The final design incorporated an innovative technique in which the processor would play many roles, mediating between all the peripheral devices and the memory while implementing a "microparallel processing" strategy that simplified other control functions for the disk drive, keyboard, and graphical display. Another innovation was to use memory to store and manipulate a bitmap for the high-resolution display. The first Alto was built in 1973 and made its now famous public debut by animating Cookie Monster, a Sesame Street favorite, for a gathering of computer researchers. The final design was so simple that sometimes people would simply requisition the parts and assemble their own machine. Somewhere in the neighborhood of 2000 Altos were eventually produced, far surpassing the original plan for 30 machines [14]. Although at over $12 000 an Alto was quite expensive, the march from current medium-scale integration (MSI) to large-scale integration (LSI), and then to very large-scale integration (VLSI) would make the technology affordable in a few short years. These machines provided a model for the future and became essential equipment throughout PARC, forming the nexus for the other three critical technologies.
4.4.3 The Graphical User Interface: A Digital Desktop
The desktop metaphor and overlapping windows that characterize virtually all the personal computers and workstations of today were first
fully realized at Xerox PARC. Much of the software for the Alto would be implemented using Smalltalk, an object-oriented programming language developed by Alan Kay and other researchers at PARC. The first "killer application" would be Bravo, a "What You See is What You Get" (WYSIWYG) text processor. Documents could be readily composed on the Alto and sent to the laser printer for publication. Indeed, Lynn Conway and Carver Mead would use the internal Xerox text processing system to quickly publish their influential textbook that ushered in the widespread use of very large-scale integration (VLSI) and the microprocessor. Today the windows, icons, mouse, and pull-down menus (WIMP) style interface has become familiar to all. These technologies would directly influence developments at two fledgling personal computer companies, Apple and Microsoft, as employees left PARC to work at these firms, and, in the case of Apple, through a series of demonstrations at PARC itself authorized by Xerox headquarters [14].
4.4.4 Ethernet: The Network is the Office
Robert Metcalfe was called on to develop networking technology for the Altos and, together with David Boggs, implemented an elegant shared media solution that they named Ethernet [81]. Though it began as an option on the Altos, Ethernet quickly became another crucial technology, allowing the Alto to serve as a communication device and to make effective use of the developing laser printer technology. This essential piece allowed Xerox PARC to use the office of the future and demonstrate it to others.
4.4.5 The Xerox Star
These four technologies, working in concert, created the networked office environment populated by personal computers that offer an intuitive desktop metaphor--a lasting standard. Xerox PARC faced many challenges in trying to commercialize these technologies within the corporate framework. Eventually, a complete "office of the future" was constructed for the 1977 Xerox World Conference in Boca Raton, Florida and demonstrated to all of the corporate executives [14]. Xerox formed a new research division to commercialize the technology and the resulting Xerox Star was a dazzling system that made its debut at the National Computer Conference in 1981. It was an impressive embodiment of all the PARC technologies, but finally reached the market at a cost of more than $16 000 in a year when IBM introduced its low-cost personal computer with an open architecture that invited third-party development. The model was sound, but the technologies would be exploited by others.
4.5 LANs
4.5.1 University of Hawaii and ALOHANET
Norm Abramson was one of the principal designers of an innovative network at the University of Hawaii [82]. In 1969 ARPA funded the ALOHANET, which used small radio transmitters to exchange data between computers. Two fundamental ideas were developed during the ALOHANET project. The first was that wireless networks were possible and remain of growing interest today. The second fundamental insight was that the network could use the identical transmission frequency for all the nodes. Rather than construct point-to-point or particular links between each pair of computers, a shared radio frequency would be used. However, a shared frequency approach meant that collisions would occur occasionally and that some means of recovery would be necessary. If a collision did occur, the message would be undecipherable and no acknowledgement would be received. So, a node would simply re-transmit after a random interval, hoping to avoid any further collisions. This shared media approach would form the basis of a new hardwired LAN at Xerox PARC [75].
4.5.2 Xerox PARC and Ethernet
Robert Metcalfe, a PARC researcher specializing in computer networking, got involved in the effort to build an effective way of connecting computers. He had been part of the ARPANET team at MIT and had become a network facilitator, an experienced member of the early ARPANET sites who assisted new sites. After being introduced to the ALOHANET project and incorporating an analysis in his thesis, Metcalfe drew upon several ideas to develop a short-distance network or LAN at PARC. The fundamental insight was to use a shared media approach, replacing ALOHANET radio transmitters with network interfaces that would allow the machines to broadcast on a shared piece of cable [81]. A shared cable meant the network was inexpensive to set up and no complex and expensive routers were required. Cost was an important design constraint since the machines themselves were inexpensive and would hopefully become a fixture on everyone's desk. A shared media approach implies that there may be collisions when two or more nodes try to transmit simultaneously. The Ethernet protocol involves listening for an idle line before attempting to transmit a message. If a collision is detected during the transmission, the node waits a random interval and begins re-transmitting the message. Random wait times mean
that nodes are unlikely to continue to demand the line at the same time [75]. The protocol resembles a human conversation when two people try to speak at once, except the events are measured in microseconds. Xerox developed the Ethernet standard along with DEC and Intel for commercial release in 1980, licensing the technology for a nominal fee. Ethernet is still the dominant LAN technology and has kept pace with the demand for faster speeds through the introduction of Fast Ethernet (100 Mbps) and Gigabit Ethernet. The LAN provided the missing link that allowed massive numbers of personal computers to join large computers on the long-distance packet switching networks.
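The listen-then-back-off behavior described above can also be sketched as a small simulation. The fragment below is a deliberately simplified Python illustration with two hypothetical stations and invented slot counts; the real Ethernet specification defines precise slot times, carrier sensing, and a binary exponential backoff rule, none of which are modeled here.

    import random

    random.seed(1)  # fixed seed so the illustrative run is repeatable

    # Two hypothetical stations, each with one frame to send on a shared cable.
    stations = {"A": {"wait": 0, "sent": False}, "B": {"wait": 0, "sent": False}}

    slot = 0
    while not all(s["sent"] for s in stations.values()):
        slot += 1
        ready = [name for name, s in stations.items()
                 if not s["sent"] and s["wait"] == 0]
        for s in stations.values():             # count down backoff timers
            if s["wait"] > 0:
                s["wait"] -= 1
        if len(ready) == 1:                     # the line is free for one sender
            stations[ready[0]]["sent"] = True
            print(f"slot {slot}: {ready[0]} transmits successfully")
        elif len(ready) > 1:                    # both talk at once: collision
            print(f"slot {slot}: collision between {' and '.join(ready)}")
            for name in ready:                  # each waits a random interval
                stations[name]["wait"] = random.randint(1, 4)
        else:
            print(f"slot {slot}: line idle while stations back off")

Because each colliding station picks its waiting time independently, the stations quickly desynchronize and both frames get through, which is the essence of the shared-media design choice.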
4.6 Internetworking
Bob Kahn and Vint Cerf collaborated on one of the most important protocol developments, the Transmission Control Protocol and the Internet Protocol (TCP/IP) suite. The protocol was intended to support internetworking, the interconnection of independent networks--a network of networks, or what we now call an internet. The most important constraint was that the underlying packet switched networks should remain unchanged, so their scheme was based on the idea of a gateway that would provide a bridge between dissimilar networks [80]. Their 1974 paper laid out a design for an internetworking protocol [83]. Xerox PARC had an influence here as well since Metcalfe was both a PARC employee and an ARPANET facilitator. Since PARC had large networks of Altos already running, an interconnection scheme was being pursued in a fully operational setting. Through the open ARPANET forum, ideas from PARC and other research groups contributed to the development of TCP/IP. The layered protocol approach allowed important capabilities to be situated at an appropriate level. Common functions were implemented lower in the protocol "stack," and more specialized functions were implemented at a higher level. Therefore, some of the more difficult debates revolved around where to locate specific capabilities. TELNET and FTP, along with SMTP for electronic mail and the newer HTTP protocol for the WWW, all implement special functions that rely on the TCP/IP foundation. The IP provides the bedrock and, as a common service used by all higher level components, represents a level from which any unnecessary functionality must be removed. The IP protocol provides a "best-effort" delivery mechanism that does not offer any guarantees. The internetworking scheme called for gateways to be able to speak two network dialects that would provide a route from one network to the next. The individually addressed packets or datagrams are propagated from gateway to gateway
using only the IP header or address information, leaving the interpretation of the contents to the destination. This simple hop-by-hop propagation is a lightweight approach without costly error handling techniques since delivery is not guaranteed. The TCP provides a reliable service for those applications willing to pay for the overhead. The TCP protocol is the responsibility of the source and destination, giving the protocol an end-to-end orientation. That is, any problems are resolved over the entire path by sender and receiver. This makes errors more costly, but the redundancy of the packet switched approach and vast improvements in reliability provided by fiber optics have made this a powerful approach. Direct access to the inexpensive "best-effort" service is provided through the User Datagram Protocol (UDP), allowing higher level applications to choose the appropriate level of service [84]. The official ARPANET transition to TCP/IP took place in 1983 and the growing network was split, with MILNET formed to handle military traffic [80].
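In today's socket interfaces this choice between the reliable TCP service and the lightweight best-effort datagram service surfaces simply as the type of socket an application opens. The fragment below is a minimal Python illustration of that choice; the host name and port are hypothetical placeholders, and the actual send calls are left commented out since no server is assumed to be listening.

    import socket

    HOST, PORT = "example.invalid", 9999   # hypothetical endpoint

    # Reliable, connection-oriented service: TCP (SOCK_STREAM).  Sequencing,
    # acknowledgement, and retransmission are handled end to end by the two hosts.
    tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # tcp.connect((HOST, PORT))
    # tcp.sendall(b"order record 42")

    # Best-effort datagram service: UDP (SOCK_DGRAM).  Each datagram is routed
    # independently and delivery is not guaranteed.
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # udp.sendto(b"status ping", (HOST, PORT))

    tcp.close()
    udp.close()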
4.6.1 NSFNET
By the late 1970s, the ARPANET was ingrained in the research community, but remained restricted to research groups receiving Department of Defense (DoD) funding. The small close-knit computing research community that existed when ARPA first started funding research operated informally on many levels. However, in the new environment of rapidly growing computing research, ARPANET membership threatened to become a divisive factor. The National Science Foundation (NSF) stepped in to create a virtual network, CSNET, based on dial-up lines hosted by a BBN machine that provided inexpensive connectivity to the ARPANET and other networks [77]. This would suffice until a more permanent open-access network could be constructed. The NSF had already funded six supercomputing centers at universities across the country, so a network connecting these centers could form the backbone for regional networks. The NSFNET was patterned after the ARPANET with small, dedicated computers handling the networking as well as links to the supercomputers. The NSFNET was a spectacular success, calling for several backbone upgrades to accommodate the rapidly growing traffic. The MERIT consortium took over the administration of the network. In 1990, NSF began the process of transferring the network to the commercial sector, starting with Advanced Network Services (ANS), a non-profit corporation formed by MERIT, MCI, and IBM. The unprecedented growth of the Internet from these beginnings surpassed all expectations.
4.7 LANs, WANs, and the Desktop
The theme of the communication section is that three important technologies converged in the mid-1970s, providing the complementary pieces that together allowed the computer to become a tool for communication, not just computation.
• The first technology was the packet switched networks outlined by Baran around 1960 and successfully implemented in the ARPANET.
• The second important technology is local area networking, which has allowed countless desktops to be inexpensively connected to an increasingly networked world.
• The third technological piece is the personal computer and the model of interactive use that was so successfully demonstrated by the Alto at Xerox PARC, and quickly made affordable by Apple and IBM.
These three technologies combined to form the nascent computing environment that we see evolving today: one in which you sit at a personal computer interacting with windows, icons, mouse, and pull-down menus (i.e., the WIMP interface), sharing laser printers and servers across a LAN, yet capable of communication across the world via interconnected packet switched networks.
5. Software
The third component of the business computing system is the application software. From its fairly primitive beginnings, software has come to dominate the cost of a typical business system. Annual software sales run in the tens of billions of dollars and are growing rapidly. This section presents a brief overview of the progress made in business software development during the business computing eras. The presentation is structured based on the software triangle shown in Fig. 4. Software is composed of three basic parts--the algorithmic procedure in a programming language; the data in a structured format; and the human-computer interaction interface. Bringing these parts together in a functional business system requires well-defined software development processes and development methods.
5.1 Algorithmic Programming
Solving a business problem first requires the creation of an algorithm that provides a step-by-step procedure for accepting inputs, processing data, and producing outputs.
FIG. 4. Components of business software: algorithms, data, and human-computer interaction.
Algorithms are typically represented in natural language or some form of structured format such as pseudocode or flowcharts. The objective of business programming is to code this algorithm in a form such that an important business problem can be solved via the use of a computer system. The history of computer programming languages has been documented in several excellent sources [85, 86].
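As a concrete illustration of this input-process-output pattern, the short sketch below expresses a simple payroll calculation as an algorithm in Python. The records, pay rules, and names are invented for the example and do not come from the chapter.

    # Input: hypothetical timesheet records of (employee, hours worked, hourly rate).
    timesheets = [("ADAMS", 42, 18.50), ("BAKER", 38, 22.00), ("CHEN", 45, 17.25)]

    def gross_pay(hours: float, rate: float) -> float:
        """Process: straight time up to 40 hours, time-and-a-half beyond that."""
        overtime = max(hours - 40, 0)
        return round((hours - overtime) * rate + overtime * rate * 1.5, 2)

    # Output: a simple payroll report, one line per employee.
    for name, hours, rate in timesheets:
        print(f"{name:<10} {gross_pay(hours, rate):>9.2f}")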
5.1.1 Early Business Programming Languages
Programming in the Era of Calculation usually involved wiring plugboards, setting toggle switches on the side of the computer, or punching holes in paper tape. The wiring, switch settings, and holes represented instructions that were interpreted and executed by the computer. Each time a program was run the boards had to be rewired, switches had to be reset, or the paper tape rerun. The stored-program computer changed this onerous task by storing the program in internal memory and executing it on command. The Era of Automation brought the major advances of program compilers and assemblers. Grace Murray Hopper worked with Howard Aiken on programming the Harvard Mark I via paper tape. She applied this knowledge to the development of the first program compiler, the A-0, for the UNIVAC. The role of a compiler is to translate a high-level language in which humans can write algorithms into a computer's internal set of instructions. The assembler then translates the computer instructions into binary machine code for placement in the computer's memory. The research and development of compilers and assemblers led rapidly to the creation of the first computer programming languages.
IBM developed FORTRAN (FORmula TRANslation) for the 704 computer in early 1957, contributing to the popularity of the product line. John Backus and his team defined the language to support engineering applications that required fast execution of mathematical algorithms [87]. The handling of large data files was a secondary consideration. FORTRAN has gone through many generations and remains popular today for engineering applications.

Business computing had different requirements. Efficient data handling was critical, and business programmers needed a more user-friendly, English-like language. In 1959, a team of Department of Defense developers, including Grace Murray Hopper, defined a business-oriented programming language, COBOL (COmmon Business Oriented Language). COBOL is a highly structured, verbose language with well-defined file handling facilities. The English-like syntax makes programs more readable and self-documenting. It was also the first language to be standardized so its programs could run on different hardware platforms. COBOL has survived as the primary business programming language through all the eras of business computing. More legacy business computing systems run on COBOL programs than on programs in any other language. Even today, COBOL advocates extol its advantages over more recent languages. COBOL provides sophisticated features for heterogeneous data structures, decimal arithmetic, powerful report generators, and specialized file and database manipulation [88]. Its most important advantage may be its impressive portability across nearly all hardware and software platforms. A new COBOL 2000 standard is being prepared with features for object-orientation, component-based development, web programming, and other state-of-the-art language features. It appears that COBOL will be around for a long while yet.

IBM introduced RPG (Report Program Generator) as a report definition language in the early 1960s. With RPG the programmer defined a business form with data fields. RPG then produced reports by executing the forms based on underlying data files in the computer system.
5.1.2 Structured Programming Languages

As business programming moved into the Era of Integration and Innovation (circa 1965), a crisis in software development was beginning to be noticed. Large application programs (e.g., 50 000-100 000 lines of code) were being developed. These programs were very difficult to read, debug, and maintain. Software routinely failed and repairs were difficult and time-consuming. The worldwide nature of the software problem was reflected in the NATO software engineering conferences held in 1968 (Garmisch, Germany)
and 1969 (Rome) [89, 90]. The term "software engineering" was coined to generate discussion as to whether the development of software was truly an engineering discipline. The issues of software development and how to solve the software crisis debated at these early conferences remain relevant. Edsger Dijkstra's influential 1968 paper, "Go-to statement considered harmful," [91] addressed a major problem in existing programming languages. Flow of logical control through a program was often haphazard, leading to "spaghetti code" programs. Software researchers, such as Dijkstra and Harlan Mills, proposed structured programming as the answer to out-of-control program flow [92]. The use of only three simple structures--sequence, selection, and iteration--can express the control flow of any algorithm [93]. This understanding led to the development of new structured programming languages. The languages Pascal and ALGOL-68 initiated some of the principal structured programming concepts. However, they had little commercial impact. Meanwhile, new versions of FORTRAN-IV and COBOL integrated new structured features. IBM attempted to combine the best features of these two languages in PL/1 (Programming Language/One). Although some business systems were written in PL/1, the language never really caught on in the business world. The effectiveness of structured programming was clearly demonstrated on IBM's New York Times project, delivered in 1971. Structured programming techniques were used to build an on-line storage and retrieval system for the newspaper's archives. The completed system contained around 85 000 lines of code and was of a complexity well beyond previous IBM projects. Structured programming techniques gave the program team enhanced control of the system development process and produced a highly reliable system that crashed only once during the first year and reported only 25 errors during that year [94].
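As an illustration of the three-structure claim above, here is a minimal sketch (the account data and names are hypothetical, not drawn from the text) of a small business routine built only from sequence, selection, and iteration:

    // A minimal sketch showing a routine built only from the three
    // structured-programming control forms: sequence, selection (if/else),
    // and iteration (while). There is no goto; control enters at the top
    // and leaves at the bottom of each construct.
    public class OverdueReport {

        // Returns the total amount of the invoices that are past due.
        static double totalOverdue(double[] amounts, int[] daysOutstanding, int termDays) {
            double total = 0.0;                        // sequence: simple statements in order
            int i = 0;
            while (i < amounts.length) {               // iteration
                if (daysOutstanding[i] > termDays) {   // selection
                    total = total + amounts[i];
                }
                i = i + 1;
            }
            return total;
        }

        public static void main(String[] args) {
            double[] amounts = { 120.00, 75.50, 310.25 };
            int[] days = { 45, 10, 90 };
            System.out.println("Overdue total: " + totalOverdue(amounts, days, 30));
        }
    }

Because control enters and leaves each construct at a single point, the logic can be read top to bottom without tracing jumps--the property that made structured programs easier to debug and maintain.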
5.1.3 Recent Programming Languages

In the early 1970s, Ken Thompson and Dennis Ritchie developed a new systems programming language called C, using the language to implement UNIX. By allowing access to low-level machine operations, this language defied many of the tenets of structured programming. Nevertheless, C has become a very popular programming language, particularly within the last two eras of business programming as personal computers have dominated the desktops. The language C++ evolved from C for the programming of object-oriented business applications [95]. Visual programming languages, such as Visual Basic (VB), incorporate facilities to develop Graphical User Interfaces (GUIs) for business applications.
Such languages are particularly effective in the development of client-server distributed applications where the end-user at the client site needs an efficient, friendly interface. The advent of the Internet during the 1990s has provided the impetus for efficient, platform-independent programs that can be distributed rapidly to Internet sites and executed. The Java programming language was developed at Sun Microsystems to fit this need [96]. A consortium of industry proponents is working toward the standardization of Java for the programming of business applications on the Internet.

5.2 Data: File Systems and Database Systems
The management of data in systems predates recorded history. The first known writing was done on Sumerian clay tablets and consisted of a collection of data on royal assets and taxes. Writing on papyrus and eventually on paper was the predominant manner of manual data management up to the beginning of the 20th century. First mechanical and then electronic machinery rapidly changed the ways in which businesses managed data.
5.2.1 Punched-card Data Management

Although automated looms and player pianos used punched cards to hold information, the first true automated data manager was the punched-card system designed by Herman Hollerith to tabulate the 1890 US census. Automated equipment for handling and storing punched cards was the primary means of managing business data until the Era of Automation. An entire data management industry, whose leader was IBM, grew up around punched cards.
5.2.2 Computerized File Management

The use of the UNIVAC computer system for the 1950 US census heralded the Era of Automation in business computing. To replace punched cards, magnetic drums and tapes were developed to store data records. Without the constraints of an 80-column card format, new, innovative data structures were devised to organize information for fast retrieval and processing. Common business applications for general-ledger, payroll, banking, inventory, accounts receivable, shipping invoices, contact management, human resources, etc. were developed in COBOL. All of these programs were centered on the handling of large files of data records. The prevailing system architecture during this era was that of batch-oriented
processing of individual transactions. Files of transactions were run against a master file of data records, usually once a day or once a week. The problems with this architecture were the inherent delays of finding and correcting errors in the transactions and the lack of up-to-date information in the master file at any point in time [97].
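As a sketch of this batch style (the accounts and amounts are made up for illustration only), the following applies a day's transaction file against a master file in a single pass:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // A minimal sketch of a batch run in the master-file style described above:
    // the accumulated transaction file is applied against the master file in one
    // pass, so the master balances are only as current as the last batch run.
    public class BatchRun {

        record Transaction(String account, double amount) {}

        public static void main(String[] args) {
            // Master file: account -> balance as of the previous run.
            Map<String, Double> masterFile = new HashMap<>();
            masterFile.put("1001", 250.00);
            masterFile.put("1002", 80.00);

            // Transaction file accumulated since the last run.
            List<Transaction> transactionFile = List.of(
                new Transaction("1001", -40.00),
                new Transaction("1002", 15.00),
                new Transaction("1001", 100.00));

            // The batch pass: apply every transaction, then write the new master file.
            for (Transaction t : transactionFile) {
                masterFile.merge(t.account(), t.amount(), Double::sum);
            }
            System.out.println("Updated master file: " + masterFile);
        }
    }

Until the next run, the master file still reflects the previous day's balances, which is exactly the currency problem noted above.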
5.2.3 On-Line Data Processing
Direct access storage devices, such as magnetic disks, and improved terminal connection devices opened the way for more effective business processes based on on-line processing. Users of the business system could conduct a complete transaction, reading and writing data records, from an on-line terminal connected directly to the mainframe computer. The computer was able to handle many terminals simultaneously via multiprocessing operating system controls. On-line transaction processing dominated the Era of Integration and Innovation from 1965 to the mid-1970s.

Direct access to data in magnetic storage led to improved structures for rapidly locating a desired data record in a large data file while still allowing efficient sequential processing of all records. Hierarchical data models presented data in hierarchies of one-to-many relationships. For example, a Department record is related to many Employee records and an Employee record is related to many Project records. Sophisticated indexing techniques provided efficient access to individual data records in the file structure. Popular commercial file systems, such as IBM's Indexed Sequential Access Method (ISAM) and Virtual Storage Access Method (VSAM), provided very effective support for complex business applications based on large data sets.

Hierarchical data models lacked a desired flexibility for querying data in different ways. In the above example, the hierarchy as designed would not efficiently support the production of a report listing all employees working on a given project. A more general method of modeling data was needed. An industrial consortium formed the Data Base Task Group (DBTG) to develop a standard data model. Led by Charles Bachman, who had performed research and development of data models at General Electric, the group proposed a network data model. The DBTG network model was based on the insightful concepts of data independence and three levels of data schemas:
• External Subschema: Each business application had its own subset view of the complete database schema. The application subschema was optimized for efficient processing.
• Conceptual Schema: The global database schema represented the logical design of all data entities and the relationships among the entities.
• Physical Schema: This schema described the mapping of the conceptual schema onto the physical storage devices. File organizations and indexes were constructed to support application processing requirements.
Data independence between the schema levels allowed a developer to work at a higher level while remaining independent of the details at lower levels. This was a major intellectual advance that allowed many different business applications, with different sub-schemas, to run on a single common database platform. Business architectures centered on large hierarchical and network databases were prevalent throughout the 1970s and well into the 1980s. Many such systems still perform effectively today.
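To make the hierarchical example above concrete, here is a minimal sketch (hypothetical names only; this is not a real hierarchical DBMS) of the Department-Employee-Project hierarchy. Note how the cross-hierarchy question "which employees work on a given project" forces a traversal of every department and employee:

    import java.util.List;

    // A minimal sketch of the hierarchical one-to-many model described above:
    // each Department owns Employees, and each Employee owns Project assignments.
    // Answering "all employees on a given project" requires walking the whole
    // hierarchy, which is the inflexibility noted in the text.
    public class HierarchicalModel {

        record Project(String name) {}
        record Employee(String name, List<Project> projects) {}
        record Department(String name, List<Employee> employees) {}

        static void employeesOnProject(List<Department> departments, String projectName) {
            for (Department d : departments) {          // must visit every department
                for (Employee e : d.employees()) {      // ... and every employee
                    for (Project p : e.projects()) {
                        if (p.name().equals(projectName)) {
                            System.out.println(e.name() + " (" + d.name() + ")");
                        }
                    }
                }
            }
        }

        public static void main(String[] args) {
            List<Department> db = List.of(
                new Department("Accounting", List.of(
                    new Employee("Ada", List.of(new Project("Payroll"), new Project("Audit"))))),
                new Department("Shipping", List.of(
                    new Employee("Grace", List.of(new Project("Payroll"))))));
            employeesOnProject(db, "Payroll");
        }
    }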
5.2.4 Relational Databases
E.F. Codd, working at IBM Research Laboratory, proposed a simpler way of viewing data based on relational mathematics [98]. Two-dimensional relations are used to model both data entities and the relationships among entities based upon the matching of common attribute values. The mathematical underpinnings of the relational data model provided formal methods of relational calculus and relational algebra for the manipulation and querying of the relations. A standard data definition and query language, Structured Query Language (SQL), was developed from these foundations. Commercialization of the relational model was a painstaking process. Issues of performance and scalability offset the advantages of easier conceptual modeling and standard SQL programming. Businesses were reluctant to abandon their mission-critical mainframe database systems for the new relational technology. Advances in query optimization led to relational systems that could meet reasonable performance goals. An impetus to change came with the Era of Reengineering and Alignment. The relational model fit nicely with new client-server architectures. The migration of processing power to distributed client sites called for more user-friendly GUIs and end-user query capabilities. At the same time, more powerful processors for the servers boosted performance for relational processing. Oracle and IBM are the leaders in providing the commercial relational database systems that are at the heart of many of today's most interesting business applications.
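The following is a minimal sketch (with illustrative data, not a real database system) of the same Employee and Project information held as flat relations in the style Codd proposed: tuples related only by matching attribute values. The report that was awkward in the hierarchy becomes a simple join; in SQL it would read roughly SELECT e.name FROM Employee e JOIN Assignment a ON a.emp_id = e.emp_id WHERE a.project = 'Payroll'.

    import java.util.List;

    // A minimal sketch of the relational view: two flat relations joined on a
    // common attribute (empId), with a selection on the project of interest.
    public class RelationalModel {

        record Employee(int empId, String name, String dept) {}
        record Assignment(int empId, String project) {}

        public static void main(String[] args) {
            List<Employee> employee = List.of(
                new Employee(1, "Ada", "Accounting"),
                new Employee(2, "Grace", "Shipping"));
            List<Assignment> assignment = List.of(
                new Assignment(1, "Payroll"),
                new Assignment(1, "Audit"),
                new Assignment(2, "Payroll"));

            // Join Employee and Assignment on the matching empId attribute,
            // then select the tuples for the desired project.
            for (Assignment a : assignment) {
                if (a.project().equals("Payroll")) {
                    for (Employee e : employee) {
                        if (e.empId() == a.empId()) {
                            System.out.println(e.name() + " (" + e.dept() + ")");
                        }
                    }
                }
            }
        }
    }

The point of the design is that no access path is wired into the data: any attribute can serve as the basis for a query or a join.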
5.2.5 Future Directions in Data Management
Effective management of data has always been and will remain the center of business computing systems. The digital revolution has drastically expanded our definition and understanding of data. Multimedia data includes audio, pictures, video, documents, touch (e.g., virtual reality), and maybe even smells. Business applications will need to find effective ways of managing and using multimedia data. Object-oriented methods of systems development attempt to break the boundary between algorithmic procedures and data [99]. Objects are real world entities that encapsulate internal data states and provide procedures that interact with the environment and modify internal states. Two main themes characterize the use of object technology in database management systems: object-relational technology integrates the relational model and support for objects, whereas object-oriented systems take a more purist approach. Commercial object-oriented database systems have had limited success but have demonstrated promise in several important business applications [100].

A major challenge for the future of data management will be how to manage the huge amounts of information flowing over the WWW. It is estimated that a majority of business (e.g., marketing, sales, distribution, and service) will be conducted over the Internet in the near future. New structures for web databases must support real-time data capture, on-going analyses of data trends and anomalies (e.g., data mining), multimedia data, and high levels of security. In addition, web-enabled business will require very large databases, huge numbers of simultaneous users, and new ways to manage transactions.

5.3 Human-Computer Interaction (HCI)
The effectiveness of any business computer system is determined by the quality of its interfaces with the external world. This area of research and development has been termed Human-Computer Interaction (HCI).
5.3.1 Early Computer Interfaces

The first computer interfaces were toggle switches, blinking lights, and primitive cathode-ray tubes. The need for more effective input/output devices quickly brought about the use of paper tape drives and Teletype printers. Up to 1950, however, interaction with the computer was the domain of computer specialists who were trained to handle these arcane and unwieldy interfaces.
The use of computers in business systems required more usable, standard HCIs. During the Era of Automation, the standard input medium was the Hollerith card. Both the program and data were keypunched on cards in standardized formats. The cards were organized into card decks, batched with other card decks, and read into the computer memory for execution. Output was printed on oversized, fan-fold computer paper. The IBM 1401 computer system, introduced in 1959, was a major business success primarily because of its high-speed (600 lines per minute) 1403 chain printer. Businesses were required to hire computer operations staff to maintain the computer systems and to control access to the computer interfaces. End-user computing was rare during this era.
5.3.2 Text-Based Command Interfaces
As business uses of the computer grew during the 1970s, the demand from end-users for more effective, direct interaction with the business computer systems grew correspondingly. Moving from batch computer architectures to on-line distributed architectures necessitated new terminal-based interfaces for the application users. Computer terminals were designed to combine a typewriter input interface with a cathode-ray tube output interface. Terminals were connected to the mainframe computer via direct communication lines. The design of the computer terminal was amazingly successful and remains with us today as the primary HCI device. HCI interfaces for on-line applications were either based on scrolling lines of text or on predefined bit-mapped forms with fields for text or data entry. Standard business applications began to proliferate in the environment of on-line computing. For example:
• Text Editing and Word Processing: The creation, storage, and manipulation of textual documents rapidly became a dominant use of business computers. Early text editors were developed at Stanford, MIT, and Xerox PARC. Commercial WYSIWYG word processing packages came along in the early 1980s with LisaWrite, a predecessor to MacWrite, and WordStar.
• Spreadsheets: Accounting applications are cornerstone business activities. Commercial accounting packages have been available for computers since the 1950s. The spreadsheet package VisiCalc became a breakthrough product for business computing when it was introduced in 1979. Lotus 1-2-3 and Microsoft Excel followed as successful spreadsheet packages.
• Computer-Aided Design: The use of computers for computer-aided design (CAD) and computer-aided manufacturing (CAM) began
during the 1960s and continues today with sophisticated application packages.
• Presentation and Graphics: Research and development on drawing programs began with the Sketchpad system of Ivan Sutherland in 1963. Computer graphics and paint programs have been integrated into business applications via presentation packages, such as Microsoft's PowerPoint.

Text-based command languages were the principal forms of HCI during the 1970s and 1980s for the majority of operating systems, such as UNIX, IBM's MVS and CICS, and DEC's VAX VMS. The users of these systems required a thorough knowledge of many system commands and formats. This type of text-based command language carried over to the first operating systems for personal computers. CP/M and MS-DOS constrained users to a small set of pre-defined commands that frustrated end-users and limited widespread use of personal computers.
5.3.3 The WIMP Interface
Many years of research and development on computer GUIs have led to today's WIMP HCI standards. Seminal research and development by J.C.R. Licklider at ARPA, Douglas Engelbart at the Stanford Research Institute, and the renowned group at Xerox PARC led to the many innovative ideas found in the WIMP interface [101]. The first commercial computer systems popularizing WIMP features were the Xerox Star, the Apple Lisa, and the Apple Macintosh. The X Window system and the Microsoft Windows versions made the WIMP interface a standard for current business computer systems. More than any other technology, the WIMP interface and its ease of use brought the personal computer into the home and made computing accessible to everyone. Advantages to businesses included an increase in computer-literate employees, standard application interfaces across the organization, decreased training time for new applications, and a more productive workforce.
5.3.4 Web Browser Interfaces and Future Directions
As with all computer technologies the Internet has brought many changes and new challenges to HCI. The WWW is based on the concept of hypertext whereby documents are linked to related documents in efficient ways. Documents on the Internet use a standard coding scheme (HTML and URLs) to identify the locations of the linked documents. Specialized web
68
ALAN R. HEVNER AND DONALD J. BERNDT
browsers provide the interfaces for viewing documents on the web. Mosaic from the University of Illinois was the first popular web browser. Currently, Netscape and Microsoft provide the most widely used web browsers. Improvements, such as XML, the successor to HTML, will support new browsing features and capabilities for the future of the WWW. There are numerous future directions in the field of HCI. For example
[101]:
• Gesture Recognition: The recognition of human gestures began with light pens and touch-sensitive screens. The recording and recognition of handwriting is a subject of on-going research.
• Three-dimensional Graphics: Research and development on 3D interfaces has been an active area, particularly in CAD-CAM systems. 3D visualization of the human body has the potential to revolutionize surgery and healthcare.
• Virtual Reality: Scientific and business uses of virtual reality are just now being explored. Head-mounted displays and data gloves, funded by NASA research, will become commercially viable in the near future for marketing demonstrations and virtual design walkthroughs.
• Voice Recognition and Speech: Audio interfaces to computer systems have been available for the past decade. However, the limited vocabulary and requirements for specific voice pattern recognition remain problems to overcome before widespread use.
5.4 Software Development Processes and Methods
The three software technologies of algorithmic programming, data, and HCI are brought together in the design and implementation of a business application via software development processes and methods. A software development process is a pattern of activities, practices, and transformations that support managers and engineers in the use of technology to develop and maintain software systems. A software development method is a set of principles, models, and techniques for effectively creating software artifacts at different stages of development (e.g., requirements, design, and implementation). Thus, the process dictates the order of development phases and the criteria for transitioning from one phase to the next, while the method defines what is to be done in each phase and how the artifacts of the phase are represented. The history of business computing has seen important advances in both software processes and software methods. We briefly track the evolution of these advances in this section.
5.4.1 Software Development Processes

Throughout the Era of Automation very little attention was paid to organizing the development of software systems into stages. Programmers were given a problem to solve and were expected to program the solution for computer execution. The process was essentially "code and fix." As problems became more complex and the software grew in size, this approach was no longer feasible.

The basic "waterfall process" was defined around 1970 [102]. A well-defined set of successive development stages (e.g., requirements analysis, detailed design, coding, testing, implementation, and operations) provided enhanced management control of the software development project. Each stage had strict entrance and exit criteria. Although it had several conceptual weaknesses, such as limited feedback loops and overly demanding documentation requirements, the waterfall process model served the industry well for over 20 years into the 1990s. The principal Department of Defense process standard for the development of software systems during this period, DOD-STD-2167A, was based on the waterfall approach.

Innovative ideas for modeling software development processes include the spiral model [103] and incremental development [104]. The spiral model shows the development project as a series of spiraling activity loops. Each loop contains steps of objective setting, risk management, development/verification, and planning. Incremental development emphasizes the importance of building the software system in well-defined and well-planned increments. Each increment is implemented and certified correct before the next increment is started. The system is thus grown in increments under intellectual and management control.

A standard, flexible software development process is essential for management control of development projects. Recent efforts to evaluate the quality of software development organizations have focused on the software development process. The Software Engineering Institute (SEI) has proposed the Capability Maturity Model (CMM) to assess the maturity of an organization based on how well key process areas are performed [105]. The CMM rates five levels of process maturity:
• initial: ad hoc process
• repeatable: stable process with a repeatable level of control
• defined: effective process with a foundation for major and continuing progress
• managed: mature process containing substantial quality improvements
• optimized: optimized process customized for each development project.
The principal goal of the CMM is for organizations to understand their current process maturity and to work toward continuous process improvement. The international ISO-9000 standards contain similar provisions to evaluate the effectiveness of the process in software development organizations.
5.4.2 Early Development Methods

Early methods of program design were essentially ad-hoc sketches of logic flow leading to the primary task of writing machine code. These design sketches evolved into flowcharting methods for designing program logic. Basic techniques also evolved for the development activities of requirements analysis, software design, and program testing. The creation of software was initially considered more of a creative art than a science. As business system requirements became more complex into the 1960s, development organizations quickly lost the ability to manage software development in a predictable way. Defined development processes and methods brought some controls to software construction.

Structured methods for the analysis and design of software systems appeared during the late 1960s and early 1970s. Two primary approaches were defined--procedure-oriented methods and data-oriented methods. The development of procedure-oriented methods was strongly influenced by the sequential flow of computation supported by the dominant programming languages, COBOL and FORTRAN. The focus of software development under this paradigm is to identify the principal functions (i.e., procedures) of the business system and the data flows among these functions. Persistent data stores are identified along with the data flows among functions and data stores. The system functions are hierarchically decomposed into more detailed descriptions of sub-functions and data flows. After sufficient description and analysis, the resulting functions and data stores are designed and implemented as software modules with input-output interfaces. Primary examples of procedure-oriented system development methods include structured analysis and structured design methods [106, 107] and the Jackson development methods [108].

Data-oriented system development places the focus on the required data. The data-centric paradigm is based on the importance of data files and databases in large business applications. System data models are developed and analyzed. System procedures are designed to support the data processing needs of the application. The design and implementation of the application software is constructed around the database and file systems. Primary data-oriented methods included the Warnier-Orr methods [109] and information engineering [110].
5.4.3 Object-Oriented Methods

In the early 1980s, object-oriented (OO) methods of software development were proposed for building complex software systems. Object-orientation is a fundamentally different view of a system as a set of perceptible objects and the relationships among the objects. Each object in the application domain has a state, a set of behaviors, and an identity. A business enterprise can be viewed as a set of persistent objects. Business applications are developed by designing the relationships and interactions among the objects. Advocates point out several significant advantages of object-oriented system development, including increased control of enterprise data, support for reuse, and enhanced adaptability to system change. Risks of object-oriented development include the potential for degraded system performance and the startup costs of training and gaining object-oriented experience. Early object-oriented languages included Simula-67 and Smalltalk. Today, C++ and Java are the most popular object-oriented languages. The plethora of object-oriented development methods proposed in the 1980s has converged into the Unified Modeling Language (UML) standards [111]. A significant percentage of new business applications are being developed as object-oriented software systems.
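As a minimal sketch of the object view described above (the account and its numbering scheme are hypothetical), the following class encapsulates state, exposes behaviors, and carries an identity that is independent of its current state:

    // An object with encapsulated state (the balance), behaviors (deposit,
    // withdraw), and an identity (the account number).
    public class Account {
        private final String accountNumber;   // identity
        private double balance;               // encapsulated state

        public Account(String accountNumber, double openingBalance) {
            this.accountNumber = accountNumber;
            this.balance = openingBalance;
        }

        public void deposit(double amount) {       // behavior that changes state
            balance += amount;
        }

        public boolean withdraw(double amount) {   // behavior with a business rule
            if (amount > balance) {
                return false;                      // insufficient funds
            }
            balance -= amount;
            return true;
        }

        public double balance() {
            return balance;
        }

        public static void main(String[] args) {
            Account acct = new Account("ACME-001", 500.00);
            acct.deposit(250.00);
            acct.withdraw(100.00);
            System.out.println(acct.accountNumber + " balance: " + acct.balance());
        }
    }

Callers interact with the account only through its behaviors; the internal state cannot be modified except by those operations, which is the encapsulation advantage cited by OO advocates.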
5.4.4 Formal Development Methods

The requirement for highly reliable, safety-critical systems in business, industry, and the public sector has increased interest in formal software development methods. Formal methods are based on rigorous, mathematics-based theories of system behavior [112, 113]. Formal methods, such as the Cleanroom methods [114], support greater levels of correctness verification on all artifacts in the development process: requirements, design, implementation, and testing. The use of formal methods requires the use of mathematical representations and analysis techniques entailing significant discipline and training in the development team. Although anecdotal evidence of the positive effects (e.g., improved quality and increased productivity) of formal methods is often reported [115, 116], more careful study is needed [117, 118]. However, we are seeing a move to require the use of formal methods on all safety-critical software systems by several countries and standards bodies [119].
5.4.5 Component-Based Development (CBD) Methods

The latest emerging trend for the development of business systems is component-based development (CBD). CBD extends the ideas of software
reuse into a full-scale development process whereby complete business applications are delivered based upon the interaction and integration of software components. Wojtek Kozaczynski gives a business-oriented definition of a component in [120]:

    A business component represents the software implementation of an autonomous business concept or business process. It consists of the software artifacts necessary to express, implement, and deploy the concept as a reusable element of a larger business system.

The component is essentially a black box with well-defined interfaces to the external world. Thus, each component provides a service to the business system. New products and standards for middleware provide the "glue" for building systems from individual components. The technologies of DCOM, CORBA, and Enterprise JavaBeans are a start for enabling CBD processes and methods. Object-oriented concepts, such as encapsulation and class libraries, and emphasis on system architectures, such as n-tier client-server, support the realization of CBD in business environments. As the underlying technologies mature and the software development industry accepts and adopts enabling standards, component-based development will become a dominant paradigm for building business software systems [121].
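The black-box idea can be sketched as follows (the names are hypothetical and no real middleware is involved; in a CORBA, DCOM, or Enterprise JavaBeans setting the middleware would generate the plumbing between the two sides):

    // A minimal sketch of a component as a black box: clients see only the
    // published interface, while the implementation behind it can be replaced,
    // upgraded, or bought off the shelf without touching client code.
    public class ComponentSketch {

        // The published interface: the component's only contract with the outside world.
        interface CreditCheck {
            boolean approve(String customerId, double amount);
        }

        // One interchangeable implementation; a COTS product could stand in its place.
        static class SimpleCreditCheck implements CreditCheck {
            public boolean approve(String customerId, double amount) {
                return amount <= 5000.00;   // placeholder business rule
            }
        }

        // Client code is written against the interface, never the implementation.
        static void placeOrder(CreditCheck creditCheck, String customerId, double amount) {
            if (creditCheck.approve(customerId, amount)) {
                System.out.println("Order accepted for " + customerId);
            } else {
                System.out.println("Order referred for review");
            }
        }

        public static void main(String[] args) {
            placeOrder(new SimpleCreditCheck(), "CUST-42", 1200.00);
        }
    }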
5.5 Software Summary
Table II summarizes the mapping of the software technologies to the eras of business computing.

TABLE II
SOFTWARE TECHNOLOGIES IN THE BUSINESS COMPUTING ERAS

• Era of Calculation (< 1950) -- Programming: switches, paper tapes, punched cards. Data: punched cards, paper documents. HCI: human-mechanical interaction. Development process: none.
• Era of Automation (1950-64) -- Programming: FORTRAN, COBOL, RPG. Data: magnetic tapes, data file organization. HCI: punched-card input, printer output. Development process: code and fix.
• Era of Integration and Innovation (1965-74) -- Programming: structured programming, programming teams. Data: magnetic drums and disks, hierarchical databases, network databases (DBTG). HCI: on-line terminals, text command language. Development process: waterfall model.
• Era of Decentralization (1975-84) -- Programming: visual programming. Data: relational databases. HCI: WIMP interfaces. Development process: prototyping, spiral model, incremental.
• Era of Reengineering and Alignment (1985-94) -- Programming: object-oriented programming. Data: optical disk, object-oriented databases. HCI: extended WIMP interfaces. Development process: CMM and ISO 9000.
• Era of the Internet and Ubiquitous Computing (> 1995) -- Programming: Internet programming, Java, XML. Data: multimedia data, data warehousing, data mining. HCI: web browsers. Development process: optimized, adaptive process models.
6. Business System Architectures
The three technology components of a business computer system--computing platform, communications, and software--are integrated via system architecture into a functional, effective business system. We explore the evolution of business system architectures by bringing together two separate streams of research and practice--computer system and software architectures and the management of information systems (MIS). Strong synergies can be seen in the interplay of computer technology architectures and the MIS models of business computing over the eras of business computing during the past 50 years. A classic paper by Zachman [122] presented an information systems architecture framework made up of the elements of data, process, and networking. The Zachman IS architecture combines representations of these elements into an architectural blueprint for a business application. The
framework of this chapter differs by including the computational platform (hardware and OS) as a fundamental element and combining algorithmic procedure (i.e., process) and data into the software element. However, our framework objectives are similar to Zachman's in providing a basis for understanding how information technology elements are brought together into effective business systems via architectural descriptions. The importance of software architectures for the development of complex systems has been emphasized in the software engineering literature [123, 124]. Architectural styles are identified based on their organization of software components and connectors for the transmission of data and control among components.

We structure our presentation of business system architectures by discussing major computer architecture milestones. We find a close correspondence among these milestones, the eras of business computing in which they occurred, and the MIS focus prevalent during that era. Figure 5 summarizes the presentation. The business system architectures reflect the cumulative effect of the MIS foci over the years. Business systems have evolved and become more complex due to the requirements for meeting many, sometimes conflicting, business objectives. We will see how the architectural solutions attempt to satisfy these multiple objectives.
FIG. 5. Business system architectures. (The figure maps the progression from manual processes to mainframe data-flow, on-line real-time, distributed client-server, event-driven and component-based, and web-based architectures onto the six eras, together with the prevailing MIS focus of each era: business process, technology, performance, customer, alignment, and WWW.)
6.1 Manual Business Processes
In the centuries leading up to the invention of the computer, businesses focused their creative energies on the development of effective business processes for production, personnel management, accounting, marketing, sales, etc. Standard operating procedures (SOPs) and workflow processes were widely used throughout business history. The concept of a "General Systems Theory" guided the structure and application of these business processes. The following passage from Barnard [125] shows organizational systems thinking: A cooperative system is a complex of physical, biological, personal, and social components which are in a specific systematic relationship by reason of the cooperation of two or more persons for at least one definite end. Such a system is evidently a subordinate unit of larger systems from one point of view; and itself embraces subsidiary systems--physical, biological, etc.--from another point of view. One of the systems within a cooperative system, the one which is implicit in the phrase "cooperation of two or more persons" is called an "organization". Even before the advent of computers, intellectual leaders such as Herbert Simon and C. West Churchman were extending the ideas of systems thinking into business organizations [126]. Such systemic business processes were performed manually during the Era of Calculation up to 1950. However, the business focus of getting the critical business processes right before automation remains an underlying tenet of all successful organizations today and for the foreseeable future.
6.2 Mainframe Architectures
The automation of business processes with the original large mainframe computer systems occurred slowly at first. Management focus was on how computer technology could best be introduced into the organization. Gorry and Scott Morton [127] suggested a framework for the development of management information systems. Nolan [128] proposed a widely cited six-stage model of data processing (DP) growth within an organization. His six stages of growth with suggested technology benchmarks are:
• Stage 1--Initiation: 100% batch processing
• Stage 2--Contagion: 80% batch processing, 20% remote job entry
• Stage 3--Control: 70% batch processing, 25% database processing, 5% time-sharing
• Stage 4--Integration: 50% batch and remote job entry, 40% database/data communications processing, 10% minicomputer and microcomputer processing
• Stage 5--Data Administration: 20% batch and remote job entry, 60% database/data communications processing, 15% minicomputer and microcomputer processing, 5% personal computers
• Stage 6--Maturity: 10% batch and remote job entry, 60% database/data communications processing, 25% minicomputer and microcomputer processing, 5% personal computers.

Growth benchmarks are also provided for applications portfolio, planning and control, organizational structure, and user awareness. Although other researchers have pointed out weaknesses in Nolan's stage model (e.g., [129]), the technology benchmarks cited above clearly demonstrate a management focus on the evolution of business system architectures.

The 1950s and early 1960s saw a vast majority of business application programs written in COBOL based on basic data flow architectures. During this Era of Automation, computer systems consisted primarily of the computational platform (e.g., mainframe and operating system) and early application software systems. In a data flow architecture, data in the form of variables, records, or files move from one computer system application to the next until the required business process is completed. The simplest form of a data flow architecture is known as a batch sequential architecture. Data is batched into large files and the application programs are batched for sequential runs on the data files. The classic Master File-Transaction File applications are based on batch sequential processing.

The pipe and filter architecture is a more general model of data flow. Pipes carry data from one filter to the next in a network data flow. A filter accepts streams of data as input and produces streams of data as output. The filter performs some local transformation of an input into an output on a continuing basis. Each filter is independent of all other filters in the data flow architecture. The pipe and filter structure provided the underlying computational model for the UNIX operating system [68].
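The following is a minimal sketch of the pipe-and-filter style (the filters and the sample records are hypothetical): each filter is an independent transformation of an input stream into an output stream, and the pipeline simply connects them in order.

    import java.util.List;
    import java.util.function.Function;

    // Each filter transforms a stream of records and knows nothing of the others;
    // the "pipes" feed the output of one filter into the input of the next.
    public class PipeAndFilter {

        public static void main(String[] args) {
            Function<List<String>, List<String>> trim =
                records -> records.stream().map(String::trim).toList();
            Function<List<String>, List<String>> dropEmpty =
                records -> records.stream().filter(r -> !r.isEmpty()).toList();
            Function<List<String>, List<String>> upperCase =
                records -> records.stream().map(String::toUpperCase).toList();

            // Compose the pipeline from independent filters.
            Function<List<String>, List<String>> pipeline =
                trim.andThen(dropEmpty).andThen(upperCase);

            List<String> masterFile = List.of("  acme corp ", "", " widget ltd");
            System.out.println(pipeline.apply(masterFile));   // [ACME CORP, WIDGET LTD]
        }
    }

Because each filter is independent, filters can be reordered, replaced, or reused in other pipelines, which is the flexibility that made the style attractive.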
6.3 On-Line, Real-Time Architectures
During the Era of Integration and Innovation from 1965 to 1974, businesses began to realize the competitive advantages of on-line, real-time processing. The evolving technologies of databases, data communications, and the computational platform (e.g., minicomputers and real-time operating systems) enabled sophisticated real-time business applications to
be developed. Business computer systems moved from strictly operational, back office systems to initial forays into truly strategic information systems [130]. Examples of seminal real-time business applications included:
• American Hospital Supply placing on-line order entry terminals in hospitals
• Merrill Lynch introducing its Cash Management Account
• American Airlines developing the Sabre computerized reservation system.

Recognizing the overwhelming success of these ventures, businesses moved to new computer system architectures that would provide the performance to support on-line, real-time applications. On-line processing required important new advances in data communications (e.g., remote job entry, real-time data queries and updates), database repositories (e.g., hierarchical and network databases), and operating systems (e.g., multiprogramming, real-time interrupts, resource allocation). The critical need was to integrate these new technologies into a computer architecture with sufficient performance to meet rigorous response time and data capacity requirements.

The principal business system architecture used to meet these requirements was a repository architecture. A central repository of business data in file and database formats represents the current state of the application system. Multiple sources of independent transactions (i.e., agents) perform operations on the repository. The interactions between the central repository and the external agents can vary in complexity, but in the early days of on-line applications they consisted mostly of simple queries or updates against the repository. The real-time operating system provides integrity and concurrency control as multiple transactions attempt to access the business data in real-time. The data-centric nature of most business applications has made the repository architecture with real-time requirements a staple of business application development.

6.4 Distributed, Client-Server Architectures
The data communication inflexion point occurring around 1975 ushered in the Era of Decentralization. For businesses, the ability now to decentralize the organization and move processing closer to the customer brought about major changes in thinking about business processes and the supporting business computer systems. The new customer focus was exemplified by the critical success factor (CSF) method for determining business requirements. Critical success factors are the limited number of
areas in which results, if they are satisfactory, will ensure successful competitive performance for the organization [131]. Key executives from the business are interviewed and CSFs are identified from four prime sources: 9 9 9 9
structure of the particular industry competitive strategy, industry position, and geographical location environmental factors temporal factors.
The CSFs are collected, organized, and prioritized based on feedback from the executives. Business processes are then developed to support realization of the organization's critical success factors. The results of CSF studies invariably found that the decentralization of information and processing closer to the customer location is an effective business strategy. The technology components to support true distributed processing were available during this era to support these decentralized business strategies. Networks of communicating computers consisted of mainframes, minicomputers, and increasingly popular microcomputers. Distributed architectures became the norm for building new business computer systems [132, 133]. Distributed computing provided a number of important advantages for business systems. Partitioning the workload among several processors at different locations enhanced performance. System availability was increased due to redundancy of hardware, software, and data in the system. Response time to customer requests was improved since customer information was located closer to the customer site. The ability to integrate minicomputers and microcomputers into the distributed architecture provided significant price-performance advantages to the business system. The potential disadvantages of the distributed system were loss of centralized control of data and applications and performance costs of updating redundant data across the network. An important variant of the distributed architecture is the client-server architecture. A server process, typically installed in a larger computer, provides services to client processes, typically distributed on a network. A server can be independent of the number and type of its clients, while a client must know the identity of the server and the correct calling sequence to obtain service. Examples of services include database systems and specialized business applications. To aid in handling the complexity of the new distributed business systems, the architectural style of layering was applied. Layered architectures were proposed for managing data communication protocols on networks. The
three most popular communication architectures are:
• International Standards Organization (ISO) Open Systems Interconnection (OSI) Seven Layer Architecture [75]
• IBM's Systems Network Architecture (SNA) [134]
• TCP/IP Protocol Suite [84].

These layered architectures allowed distributed applications to be written at high levels of abstraction (i.e., higher layers) based on services provided by lower layers in the architecture.
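The layering idea can be sketched as follows (toy headers only, not real protocol formats): each layer treats the message from the layer above as an opaque payload and adds its own header, so an application can be written against the top layer alone.

    // A minimal sketch of layering: business code calls only the top layer;
    // the layers below are invisible to it.
    public class LayeredStack {

        static String applicationLayer(String message) {
            return transportLayer("APP[order=42] " + message);
        }

        static String transportLayer(String segment) {
            return networkLayer("TCP[port=1234] " + segment);
        }

        static String networkLayer(String packet) {
            return "IP[host=10.0.0.7] " + packet;   // lowest layer in this toy stack
        }

        public static void main(String[] args) {
            System.out.println(applicationLayer("reserve 12 units of part 7731"));
        }
    }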
6.5 Component-Based Architectures

The Era of Reengineering and Alignment was predicated on the principles of total quality management (TQM) as discussed in Section 2. Longstanding business processes are reengineered to take greatest advantage of new information technologies and computer system architectures. An equally important management focus that occurred during this era was the alignment of business strategy with information technology (IT) strategy in the organization. A strategic alignment model proposed by Henderson and Venkatraman [20] posits four basic alignment perspectives:
1. Strategy Execution: The organization's business strategy is well defined and determines the organizational infrastructure and the IT infrastructure. This is the most common alignment perspective.
2. Technology Transformation: The business strategy is again well defined, but in this perspective it drives the organization's IT strategy. Thus, the strategies are in alignment before the IT infrastructure is implemented.
3. Competitive Potential: The organization's IT strategy is well defined based upon innovative IT usage to gain competitive advantage in the marketplace. The IT strategy drives the business strategy, which in turn determines the organizational infrastructure. The strategies are aligned to take advantage of the IT strengths of the organization.
4. Service Level: The IT strategy is well defined and drives the implementation of the IT infrastructure. The organizational infrastructure is formed around the IT infrastructure. The business strategy does not directly impact the IT strategy.

Although all four perspectives have distinct pros and cons, the alignment of the business strategy and the IT strategy before the development of the organizational and IT infrastructures in perspectives 2 and 3 provides a consistent vision to the organization's business objectives.
The period from 1985 to 1994 brought continuous and rapid change for business, from the proliferation of desktop workstations throughout the organization to the globalization of the marketplace [135]. The fundamental business strategy of "make and sell" was transformed into a strategy of "sense and respond" [136]. Two new computer system architectures were devised to meet these changing environmental demands.

Event-driven architectures have become prevalent in business systems that must react to events that occur in the business environment [137]. When an important event occurs, a signal is broadcast by the originating component. Other components in the system that have registered an interest in the event are notified and perform appropriate actions. This architecture clearly performs well in a "sense and respond" business environment. Note that announcers of events are unaware of which other components are notified and what actions are generated by the event. Thus, this architecture supports implicit invocation of activity in the system. This architecture provides great flexibility in that actions can be added, deleted, or changed easily for a given event.

The important new development ideas of component-based development have led naturally to the design and implementation of component-based architectures. Business systems are composed of functional components glued together by middleware standards such as CORBA, DCOM, and Enterprise JavaBeans. In many cases the components are commercial off-the-shelf (COTS) products. Thus, organizations are able to build complex, high-performance systems by integrating COTS components via industry standard middleware protocols. This minimizes development risk while allowing the business organization to effectively align its IT strategy with its business strategy via judicious selection of best practice functional components. Enterprise Resource Planning (ERP) business systems from vendors like SAP, Baan, PeopleSoft, and Oracle utilize component-based architectures to allow clients to customize their business systems to their organization's requirements.
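Returning to the event-driven style described above, the following is a minimal sketch (the event names and handlers are hypothetical) of implicit invocation: the announcer broadcasts to whoever has registered, without knowing which components respond or what they do.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.Consumer;

    // Handlers register with the bus; the announcing component never names them,
    // so actions can be added, removed, or changed without touching the announcer.
    public class EventDrivenSketch {

        static class EventBus {
            private final List<Consumer<String>> listeners = new ArrayList<>();

            void register(Consumer<String> listener) {
                listeners.add(listener);
            }

            void announce(String event) {
                for (Consumer<String> listener : listeners) {
                    listener.accept(event);
                }
            }
        }

        public static void main(String[] args) {
            EventBus bus = new EventBus();
            bus.register(e -> System.out.println("Inventory component reacts to: " + e));
            bus.register(e -> System.out.println("Billing component reacts to: " + e));

            // The originating component only announces the event.
            bus.announce("ORDER_PLACED #1017");
        }
    }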
6.6 Web-Based Architectures
The influence of the WWW has required businesses to rethink their business and IT strategies to take greatest advantage of its revolutionary impact [138]. This Era of the Internet and Ubiquitous Computing will generate new web-based architectures for integrating the Internet into business functions of marketing, sales, distribution, and funds transfer. The rapid exchange of information via push and pull technologies to any point of the globe will eliminate most boundaries and constraints on international commerce. However, critical issues of security, privacy, cultural differences
(e.g., language), intellectual property, and political sensitivities will take many years to be resolved.
7. Conclusions and Future Directions
The past half-century has seen amazing progress in the use of information technology and computer systems in business. Computerization has truly revolutionized the business organization of today. This chapter has presented a structured overview of the evolution of business computing systems through six distinct eras:
• Era of Calculation
• Era of Automation
• Era of Integration and Innovation
• Era of Decentralization
• Era of Reengineering and Alignment
• Era of the Internet and Ubiquitous Computing.
Advances in each of the major computing technologies--Computational Platform, Communications, Software, and System Architecture--have been surveyed and placed in the context of the computing eras. We close the chapter by presenting a summary of the major conclusions we draw from this survey of business computing. Within each conclusion we identify key future directions in business computing that we believe will have a profound impact.
7.1 Computers as Tools of Business
Businesses have been most successful in their use of computing technologies when they have recognized the appropriate roles of computers as tools of business. The underlying business strategies and the effective business processes that implement the strategies are the critical success factors. Computing systems provide levels of performance, capacity, and reach that are essential for competitive advantage in the business environment. Recent popular management theories of business process reengineering and business strategy-information systems strategy alignment have emphasized the close, mutually dependent relationships between business goals and computer systems support for these goals. The expansion and application of these theories will be important future directions for forward moving business organizations. Smart businesses realize that innovative technologies, such as the Internet, have the potential to enable new forms of business.
Thus, new business processes must be defined that take greatest advantage of the capabilities of the new technologies. However, it is important to keep in mind that it is the horse (business strategy and processes) that pulls the cart (computer systems support).
7.2 Miniaturization of the Computational Platform
One of the most astonishing trends over the past 50 years is the consistent growth in computer chip performance as governed by Moore's law, which predicts a doubling of chip capacity every 18 months. Just as important for business applications, as performance has skyrocketed, the size of the computing platform has grown smaller and smaller. Miniaturization of the computing platform has essentially erased physical boundaries of computer proliferation. Embedded computing systems now reside in nearly every appliance we own--automobiles, washing machines, microwaves, and audio/video equipment. Current research on wearable computers is expanding the range of computerization to the human body [139]. Miniature information processing devices can be embedded in eyeglasses, wristwatches, rings, clothing, and even within the body itself. The future research direction of biocomputing holds intriguing possibilities for business applications. We still have a great deal to learn about how the brain processes information and makes decisions [140]. New computer architectures based on biological brain patterns have the potential to revolutionize the future of computing platforms. Connecting these miniature computer brains with the Internet as a worldwide brain could result in quite interesting business application scenarios.

7.3 Communications Inflexion Point
A major trend throughout the eras of business computing has been the increasing dominance of communications technology in business systems. We have noted the progression from on-line systems to distributed systems to pervasive networking (e.g., WANs and LANs) to the WWW. We observed that a communications inflexion point occurred around 1975 where computing systems changed from being defined by their hardware (i.e., computing platform) architecture to being defined by their communications architecture. Since this point in time businesses have increasingly relied upon their telecommunications infrastructure to support essential business applications. Even basic office applications such as email and FTP have dramatically enhanced business processes by speeding the communication paths among employees [141]. It is clear that electronic commerce will have a pervasive role in the future marketplace of products and services. It is
projected that sales of business-to-business electronic commerce applications will grow to $93 billion in year 2000. Thus, we believe that the trend for communications technologies to increase in importance as a critical component in business computing systems will continue into the future.
7.4 Growth of Business Information and Knowledge
The amount of information in the world is growing at an exponential rate. In parallel, the means for accessing this information is expanding rapidly via the Internet and other digital media. Businesses are faced with numerous challenges and opportunities in deciding how to access, filter, and use the massive amount of information most effectively in critical business processes. The study of business information and rules for applying the information make up the emerging research area of knowledge management. The inherent knowledge and acquired intelligence of the business organization is being recognized as its most important asset. The history of business computing systems has seen many forms of knowledge management support systems.
• Decision support systems support the transformation of raw data into business information that is used for managerial decision making. Decision models are used to structure and transform data into information.
• Expert systems utilize expert business rules and inference engines to generate predictions, diagnoses, and expert decisions.
• More recently, intelligent agents have been applied to the collection, filtering, and presentation of business information. An intelligent agent is an autonomous software component that can react to events in business environments and deliver information results to a decision-maker.

The requirements for storing and manipulating huge amounts of business information have led to the exciting fields of data warehousing and knowledge discovery (i.e., data mining). A data warehouse is distinctly different from an operational business database system. The data warehouse is a repository of information drawn from many operational databases to provide a more comprehensive view of the business enterprise. The information is time-stamped to allow trend analyses over periods of interest. Thus, information is not updated once it is in the data warehouse. Instead, new data values with a timestamp are added to the warehouse on top of previous data values. Sophisticated querying and reporting capabilities allow users efficient access to the business information.
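A minimal sketch of this warehouse convention (the regions, dates, and figures are purely illustrative): facts are never updated in place; each new value carries its own timestamp, so trend questions can be asked over any period of interest.

    import java.time.LocalDate;
    import java.util.List;

    // Append-only, time-stamped facts: the March figure does not overwrite the
    // February one, so a trend over time can always be reconstructed.
    public class WarehouseSketch {

        record SalesFact(LocalDate asOf, String region, double revenue) {}

        public static void main(String[] args) {
            List<SalesFact> facts = List.of(
                new SalesFact(LocalDate.of(1999, 2, 28), "East", 110_000),
                new SalesFact(LocalDate.of(1999, 3, 31), "East", 125_000),
                new SalesFact(LocalDate.of(1999, 3, 31), "West", 90_000));

            // A simple trend query: revenue per snapshot date for one region.
            for (SalesFact f : facts) {
                if (f.region().equals("East")) {
                    System.out.println(f.asOf() + " East revenue: " + f.revenue());
                }
            }
        }
    }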
Data mining strategies are applied to the data warehouse in order to discover new and interesting relationships (i.e., knowledge) among the information entities. Business organizations are just now beginning to reap the benefits of data warehousing and data mining. This future direction provides great potential for businesses to capitalize on their business knowledge and to enhance their business strategies.
7.5 Component-Based Software Development

The size and complexity of business application software have grown at an exponential rate over the past 50 years. Maintaining intellectual control over the development of such complex systems is difficult at best, and nearly impossible without disciplined development approaches. Underlying principles of system hierarchies, data encapsulation, referential transparency [104], and structured processes and methods support evolving best practices to control the development of complex business systems. An important future direction of software development is component-based software development as discussed in Section 5.4.5. The effective use of software architecture concepts provides a blueprint for the construction of large systems made up of smaller, pre-developed system components. The components are selected, connected, and verified to provide required business system functionality. A goal of component-based software development is to have an open marketplace for vendors to develop state-of-the-art components that will plug-and-play in open system architectures. Although it is still something of a utopian dream, several research and development activities portend this future direction of software development:
• Open System Standards: Software development organizations are becoming more amenable to the need for open standards in order to support the integration of software components. Software engineering standards come from international (e.g., ISO) and national (e.g., ANSI) regulatory bodies [142], from industrial consortia (e.g., the Object Management Group), and from organizations with dominant market presence (e.g., Microsoft). The principle of natural selection allows the best standards to survive and flourish in the marketplace.
• Open Source Software: The appeal of open source software, such as the LINUX operating system, lies in the continuity of support across a large user base. Since no one vendor "owns" the source code, business applications are shared and supported by the open source software community and culture.
• Commercial Off-The-Shelf Software (COTS): The majority of businesses cannot afford to staff and support an internal software development organization. An effective information systems strategy for such businesses is to evaluate, select, and purchase COTS application systems. A COTS strategy requires an enterprise architecture foundation for the integration of the various systems into an effective whole.
• Enterprise Resource Planning (ERP) Systems: The promise of ERP systems is that a single, integrated software system can provide total control of an organization's processes, personnel, and inventory. Moreover, the integration of business information across multiple functions will produce new business intelligence and will support new business applications. ERP vendors, such as SAP, Baan, Oracle, and PeopleSoft, provide a component-based strategy where clients can customize their business systems by selecting needed business functions for integration into the vendor's ERP architecture. The ERP market is estimated to grow to just under $30 billion in the year 2000.
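The plug-and-play goal described in this section can be made concrete with a small sketch. The interface and component names below are hypothetical illustrations (they are not drawn from the chapter or from any vendor's architecture); the point is only that callers depend on a common contract, so any conforming component can be selected and connected:

from typing import Dict, Protocol

class BusinessComponent(Protocol):
    """Common contract that any vendor's component must satisfy."""
    def handle(self, request: dict) -> dict: ...

class InvoicingComponent:
    def handle(self, request: dict) -> dict:
        return {"invoice_total": sum(request.get("line_items", []))}

class PayrollComponent:
    def handle(self, request: dict) -> dict:
        return {"net_pay": request.get("gross_pay", 0.0) * 0.7}

# The architecture selects and connects components by name; any implementation
# of the BusinessComponent contract can be plugged in without changing callers.
registry: Dict[str, BusinessComponent] = {
    "invoicing": InvoicingComponent(),
    "payroll": PayrollComponent(),
}

print(registry["invoicing"].handle({"line_items": [100.0, 250.0]}))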
7.6 MIS and Business System Architecture Synergies
An intriguing contribution of this survey is the relationship found between the management of information systems literature and the prevalent business system architectures during the eras. As shown in Fig. 5 and described in Section 6, a new management focus in each era created expanded systems requirements that were met by innovative architectural designs. The current management foci on electronic commerce and ubiquitous computing will lead to future business computing architectures that will fully integrate organizational intranet and WWW components.

REFERENCES

[1] Swade, D. (1991). Charles Babbage and His Calculating Engines, Science Museum, London.
[2] Campbell-Kelly, M. and Aspray, W. (1996). Computer: A History of the Information Machine, Basic Books, New York.
[3] Aspray, W. (ed.) (1990). Computing Before Computers, Iowa State University Press, Ames, IA.
[4] Cortada, J. (1993). Before the Computer: IBM, NCR, Burroughs, and Remington Rand and the Industry They Created, 1865-1956, Princeton University Press, Princeton, NJ.
[5] Williams, F. (1985). A History of Computing Technology, Prentice-Hall, Englewood Cliffs, NJ.
[6] Austrian, G. (1982). Herman Hollerith: Forgotten Giant of Information Processing, Columbia University Press, New York.
[7] Rodgers, W. (1969). THINK: A Biography of the Watsons and IBM, Stein and Day, New York.
[8] Gotlieb, C. (1961). General-purpose programming for business applications. Advances in Computers 1, Academic Press, Boston, MA.
[9] Ceruzzi, P. (1998). A History of Modern Computing, MIT Press, Cambridge, MA.
[10] Diebold, J. (1952). Automation, Van Nostrand, New York.
[11] Gifford, D. and Spector, A. (1987). Case study: IBM's System/360-370 architecture. Communications of the ACM 30(4).
[12] Grosch, H. (1991). Computer: Bit Slices from a Life, Third Millennium Books, Novato, CA.
[13] Ein-Dor, P. (1985). Grosch's law re-revisited. Communications of the ACM 28(2).
[14] Hiltzik, M. (1999). Dealers of Lightning: Xerox PARC and the Dawn of the Computer Age, HarperCollins, New York.
[15] Hammer, M. and Champy, J. (1993). Reengineering the Corporation: A Manifesto for Business Revolution, HarperCollins, New York.
[16] Crosby, P. (1979). Quality is Free: The Art of Making Quality Certain, McGraw-Hill, New York.
[17] Gabor, A. (1990). The Man Who Discovered Quality: How W. Edwards Deming Brought the Quality Revolution to America, Penguin, London.
[18] Davenport, T. (1992). Process Innovation: Reengineering Work Through Information Technology, Harvard Business School Press, Cambridge, MA.
[19] Hammer, M. (1990). Reengineering work: don't automate, obliterate. Harvard Business Review, July-August.
[20] Henderson, J. and Venkatraman, N. (1993). Strategic alignment: leveraging information technology for transforming organizations. IBM Systems Journal 32(1).
[21] Luftman, J. (ed.) (1996). Competing in the Information Age: Strategic Alignment in Practice, Oxford University Press, Oxford.
[22] Crocker, D. (1997). An unaffiliated view of internet commerce, in Readings in Electronic Commerce, R. Kalakota and A. Whinston (eds.), Addison-Wesley, Reading, MA.
[23] Berners-Lee, T. (1997). World-wide computer. Communications of the ACM 40(2).
[24] Bird, P. (1994). LEO: The First Business Computer, Hasler Publishing, Berkshire, UK.
[25] Caminer, D. (ed.) (1997). LEO: The Incredible Story of the World's First Business Computer, McGraw-Hill, New York.
[26] Cortada, J. (1997). Economic preconditions that made possible application of commercial computing in the United States. IEEE Annals of the History of Computing 19(3).
[27] Cortada, J. (1996). Commercial applications of the digital computer in American corporations 1945-1995. IEEE Annals of the History of Computing 18(2).
[28] Tanenbaum, A. (1992). Modern Operating Systems, Prentice Hall, Englewood Cliffs, NJ.
[29] Eckert, W. (1940). Punched Card Methods in Scientific Computation, Thomas J. Watson Astronomical Computing Bureau, Columbia University, New York.
[30] Comrie, L. (1946). Babbage's dream comes true. Nature 158, October.
[31] Eckert, J. P. (1976). Thoughts on the history of computing. IEEE Computer, December.
[32] McCartney, S. (1999). ENIAC: The Triumphs and Tragedies of the World's First Computer, Walker and Co., New York.
[33] Winegrad, D. (1996). Celebrating the birth of modern computing: the fiftieth anniversary of a discovery at the Moore School of Engineering of the University of Pennsylvania. IEEE Annals of the History of Computing 18(1) [an introductory article from a special issue on the ENIAC].
[34] Eckert, J. P. (1998). A survey of digital computer memory systems. IEEE Annals of the History of Computing 20(4). Originally published in the Proceedings of the IRE, October 1953.
[35] Feigenbaum, E. and Feldman, J. (eds.) (1963). Computers and Thought, McGraw-Hill, New York.
[36] Goldstine, H. (1972). The Computer from Pascal to von Neumann, Princeton University Press, Princeton, NJ.
[37] Aspray, W. (1990). John von Neumann and the Origins of Modern Computing, MIT Press, Cambridge, MA.
[38] von Neumann, J. (1993). A first draft of a report on the EDVAC. IEEE Annals of the History of Computing 15(4). An edited version of the original 1945 report.
[39] Rojas, R. (1997). Konrad Zuse's legacy: the architecture of the Z1 and Z3. IEEE Annals of the History of Computing 19(2).
[40] Pugh, E. (1995). Building IBM: Shaping an Industry and Its Technology, MIT Press, Cambridge, MA.
[41] Pugh, E., Johnson, L. and Palmer, J. (1991). IBM's 360 and Early 370 Systems, MIT Press, Cambridge, MA.
[42] Amdahl, G. (1964). The structure of the SYSTEM/360. IBM Systems Journal 3(2).
[43] Flynn, M. (1998). Computer engineering 30 years after the IBM Model 91. IEEE Computer, April.
[44] Case, R. and Padegs, A. (1978). Architecture of the System/370. Communications of the ACM 21(1).
[45] Wolff, M. (1976). The genesis of the IC. IEEE Spectrum, August.
[46] Bell, G. (1984). The mini and micro industries. IEEE Computer 17(10).
[47] Goldberg, A. (1988). A History of Personal Workstations, ACM Press, New York.
[48] Guterl, F. (1984). Design case history: Apple's Macintosh. IEEE Spectrum, December.
[49] Hall, M. and Barry, J. (1990). Sunburst: The Ascent of Sun Microsystems, Contemporary Books, Chicago.
[50] Segaller, S. (1998). Nerds 2.0.1: A Brief History of the Internet, TV Books, New York.
[51] Patterson, D. (1985). Reduced instruction set computers. Communications of the ACM 28.
[52] Diefendorff, K. (1994). History of the PowerPC architecture. Communications of the ACM 37(6).
[53] Radin, G. (1983). The 801 minicomputer. IBM Journal of Research and Development 27(5).
[54] Agerwala, T. and Cocke, J. (1987). High performance reduced instruction set processors. IBM Technical Report, March.
[55] Hennessy, J. and Jouppi, N. (1991). Computer technology and architecture: an evolving interaction. IEEE Computer, September.
[56] Weizer, N. (1981). A history of operating systems. Datamation, January.
[57] Stallings, W. (1995). Operating Systems, 2nd edn, Prentice Hall, Englewood Cliffs, NJ.
[58] Brooks, F. (1975). The Mythical Man-Month: Essays on Software Engineering, Addison-Wesley, Reading, MA.
[59] Cadow, H. (1970). OS/360 Job Control Language, Prentice Hall, Englewood Cliffs, NJ.
[60] Watson, R. (1970). Timesharing System Design Concepts, McGraw-Hill, New York.
[61] Corbato, F., Merwin-Daggett, M. and Daley, R. (1962). An experimental time-sharing system. Proceedings of the AFIPS Fall Joint Computer Conference.
[62] Corbato, F. and Vyssotsky, V. (1965). Introduction and overview of the MULTICS system. Proceedings of the AFIPS Fall Joint Computer Conference.
[63] Corbato, F., Saltzer, J. and Clingen, C. (1972). MULTICS--the first seven years. Proceedings of the AFIPS Spring Joint Computer Conference.
[64] Organick, E. (1972). The Multics System, MIT Press, Cambridge, MA.
[65] Saltzer, J. (1974). Protection and control of information sharing in MULTICS. Communications of the ACM 17(7).
[66] Ritchie, D. and Thompson, K. (1978). The UNIX time-sharing system. Bell System Technical Journal 57(6). This special issue included a series of articles describing aspects of the UNIX time-sharing system.
[67] Ritchie, D. and Thompson, K. (1974). The UNIX time-sharing system. Communications of the ACM 17(7).
[68] Bach, M. (1987). The Design of the UNIX Operating System, Prentice Hall, Englewood Cliffs, NJ.
[69] Quarterman, J., Silberschatz, A., and Peterson, J. (1985). 4.2BSD and 4.3BSD as examples of the UNIX system. Computing Surveys 17, December.
[70] Leffler, S., McKusick, M., Karels, M. and Quarterman, J. (1989). The Design and Implementation of the 4.3BSD UNIX Operating System, Addison-Wesley, Reading, MA.
[71] Salus, P. (1994). A Quarter Century of UNIX, Addison-Wesley, Reading, MA.
[72] Black, D. (1990). Scheduling support for concurrency and parallelism in the Mach operating system. IEEE Computer, May.
[73] Tanenbaum, A. (1995). Distributed Operating Systems, Prentice Hall, Englewood Cliffs, NJ.
[74] Silberschatz, A., Stonebraker, M., and Ullman, J. (1996). Database systems: achievements and opportunities. SIGMOD Record 25(1).
[75] Stallings, W. (1997). Data and Computer Communications, 5th edn, Prentice-Hall, Englewood Cliffs, NJ.
[76] O'Neill, J. (1995). The role of ARPA in the development of the ARPANET 1961-1972. IEEE Annals of the History of Computing 17(4).
[77] Hafner, K. and Lyon, M. (1996). Where Wizards Stay Up Late: The Origins of the Internet, Simon & Schuster, New York.
[78] Licklider, J. (1965). Man-computer symbiosis. IRE Transactions on Human Factors in Electronics, March. Reprinted in [47].
[79] Baran, P. (1964). On distributed communications networks. IEEE Transactions on Communications Systems, March.
[80] Leiner, B., Cerf, V., Clark, D. et al. (1997). The past and future history of the Internet. Communications of the ACM 40(2). A more detailed version of the paper is available at the Internet Society web site (www.isoc.org) in the section on Internet history.
[81] Metcalfe, R. and Boggs, D. (1976). Ethernet: distributed packet switching for local computer networks. Communications of the ACM 19(7).
[82] Abramson, N. (1985). Development of the ALOHANET. IEEE Transactions on Information Theory 31, March.
[83] Cerf, V. and Kahn, R. (1974). A protocol for packet-network intercommunication. IEEE Transactions on Communications, May.
[84] Comer, D. (1995). Internetworking with TCP/IP, 3rd edn, Prentice-Hall, Englewood Cliffs, NJ.
[85] Wexelblat, R. (ed.) (1981). History of Programming Languages I, Academic Press, Boston, MA.
[86] Bergin, T. and Gibson, R. (eds.) (1996). History of Programming Languages II, ACM Press, Addison-Wesley, Reading, MA.
[87] Backus, J. (1979). The history of FORTRAN I, II, and III. IEEE Annals of the History of Computing 1(1).
[88] Glass, R. (1999). Cobol: a historic past, a vital future? IEEE Software 16(4).
[89] Naur, P. and Randell, B. (1969). Software Engineering: Report on a Conference Sponsored by the NATO Science Committee, Garmisch, Germany, 1968.
[90] Buxton, J. and Randell, B. (1970). Software Engineering Techniques: Report on a Conference Sponsored by the NATO Science Committee, Rome, Italy, 1969.
[91] Dijkstra, E. (1968). The go-to statement considered harmful. Communications of the ACM 11(3).
[92] Mills, H. (1986). Structured programming: retrospect and prospect. IEEE Software 3(6).
[93] Boehm, C. and Jacopini, G. (1966). Flow diagrams, Turing machines, and languages with only two formation rules. Communications of the ACM 9(5).
[94] Baker, T. (1972). System quality through structured programming. Proceedings of the AFIPS Conference, Part 1.
[95] Stroustrup, B. (1994). The Design and Evolution of C++, Addison-Wesley, Reading, MA.
[96] Gosling, J., Yellin, F. and the Java Team (1996). The Java Application Programming Interface, Addison-Wesley, Reading, MA.
[97] Gray, J. (1996). Evolution of data management. IEEE Computer 29(10).
[98] Codd, E. (1970). A relational model of data for large shared data banks. Communications of the ACM 13(6).
[99] Stonebraker, M. (1996). Object-Relational DBMSs: The Next Great Wave, Morgan Kaufmann, San Francisco, CA.
[100] Chaudhri, A. and Loomis, M. (eds.) (1998). Object Databases in Practice, Prentice-Hall PTR, Englewood Cliffs, NJ.
[101] Myers, B. (1998). A brief history of human-computer interaction technology. Interactions, March/April.
[102] Royce, W. (1970). Managing the development of large software systems: concepts and techniques. Proceedings of WESTCON, August.
[103] Boehm, B. (1988). A spiral model of software development and enhancement. IEEE Computer, May.
[104] Trammell, C., Pleszkoch, M., Linger, R., and Hevner, A. (1996). The incremental development process in cleanroom software engineering. Decision Support Systems 17(1).
[105] Paulk, M. (ed.) (1994). The Capability Maturity Model: Guidelines for Improving the Software Process, Addison-Wesley, Reading, MA.
[106] Stevens, W., Myers, G. and Constantine, L. (1974). Structured design. IBM Systems Journal 13(2).
[107] Yourdon, E. (1989). Modern Structured Analysis, Yourdon Press/Prentice-Hall, Englewood Cliffs, NJ.
[108] Cameron, J. (1989). JSP & JSD: The Jackson Approach to Software Development, IEEE Computer Society Press, Washington, DC.
[109] Orr, K. (1977). Structured Systems Development, Yourdon Press/Prentice-Hall, Englewood Cliffs, NJ.
[110] Martin, J. (1989). Information Engineering, Books 1-3, Prentice-Hall, Englewood Cliffs, NJ.
[111] Booch, G., Rumbaugh, J., and Jacobson, I. (1999). The Unified Modeling Language User Guide, Addison Wesley Longman, Reading, MA.
[112] Wing, J. (1990). A specifier's introduction to formal methods. IEEE Computer 23(9).
[113] Luqi and Goguen, J. (1997). Formal methods: promises and problems. IEEE Software 14(1).
[114] Linger, R. (1994). Cleanroom process model. IEEE Software 11(2).
[115] Gerhart, S., Craigen, D., and Ralston, A. (1993). Observations on industrial practice using formal methods. Proceedings of the 15th International Conference on Software Engineering, Computer Society Press, Los Alamitos, CA.
[116] Sherer, S., Kouchakdjian, A., and Arnold, P. (1996). Experience using cleanroom software engineering. IEEE Software 13(3).
[117] Pfleeger, S. and Hatton, L. (1997). Investigating the influence of formal methods. IEEE Computer 30(2).
[118] Glass, R. (1999). The realities of software technology payoffs. Communications of the ACM 42(2).
[119] Bowen, J. and Hinchey, M. (1994). Formal methods and safety-critical standards. IEEE Computer 27(8).
[120] Brown, A. and Wallnau, K. (1998). The current state of CBSE. IEEE Software 15(5).
[121] Butler Group (1998). Component-Based Development: Application Delivery and Integration Using Componentised Software, UK, September.
[122] Zachman, J. (1987). A framework for information systems architecture. IBM Systems Journal 26(3).
[123] Shaw, M. and Garlan, D. (1996). Software Architecture: Perspectives on an Emerging Discipline, Prentice-Hall, Englewood Cliffs, NJ.
[124] Bass, L., Clements, P., and Kazman, R. (1998). Software Architecture in Practice, Addison-Wesley, Reading, MA.
[125] Barnard, C. (1938). The Functions of the Executive, Harvard University Press, Cambridge, MA.
[126] Kast, F. and Rosenzweig, J. (1972). General systems theory: applications for organization and management. Academy of Management Journal, December.
[127] Gorry, G. and Scott Morton, M. (1971). A framework for management information systems. Sloan Management Review, Fall.
[128] Nolan, R. (1979). Managing the crises in data processing. Harvard Business Review, March-April.
[129] Benbasat, I., Dexter, A., Drury, D. and Goldstein, R. (1984). A critique of the stage hypothesis: theory and empirical evidence. Communications of the ACM 27(5).
[130] Rackoff, N., Wiseman, C. and Ullrich, W. (1985). Information systems for competitive advantage: implementation of a planning process. MIS Quarterly 9(4).
[131] Rockart, J. (1979). Chief executives define their own data needs. Harvard Business Review, March/April.
[132] Peebles, R. and Manning, E. (1978). System architecture for distributed data management. IEEE Computer 11(1).
[133] Scherr, A. (1978). Distributed data processing. IBM Systems Journal 17(4).
[134] McFadyen, J. (1976). Systems network architecture: an overview. IBM Systems Journal 15(1).
[135] Peters, T. (1987). Thriving on Chaos: Handbook for a Management Revolution, Knopf, New York.
[136] Haeckel, S. and Nolan, R. (1996). Managing by wire: using IT to transform a business from make and sell to sense and respond. Chapter 7 in Competing in the Information Age: Strategic Alignment in Practice, ed. J. Luftman, Oxford University Press, Oxford.
[137] Barrett, D., Clarke, L., Tarr, P., and Wise, A. (1996). A framework for event-based software integration. ACM Transactions on Software Engineering and Methodology 5(4).
[138] Whinston, A., Stahl, D., and Choi, J. (1997). The Economics of Electronic Commerce, Macmillan Technical Publishing, New York.
[139] Dertouzos, M. (1997). What Will Be: How the New World of Information Will Change Our Lives, HarperCollins, New York.
[140] Pinker, S. (1997). How the Mind Works, W. W. Norton, New York.
[141] Gates, B. (1999). Business @ the Speed of Thought: Using a Digital Nervous System, Warner Books, San Francisco, CA.
[142] Moore, J. (1998). Software Engineering Standards: A User's Road Map, IEEE Computer Society, New York.
Numerical Weather Prediction

FERDINAND BAER
Department of Meteorology, University of Maryland, College Park, MD 20742, USA
baer@atmos.umd.edu
Abstract

Astounding advances in numerical weather prediction have taken place over the last 40 years. Atmospheric models were rather primitive in the early 1960s and were able to yield modest forecasts at best for one day over a very limited domain at only one to three levels in the atmosphere and for only one or two variables. Today reliable forecasts are produced routinely not only over the entire globe, but over many local regions for periods up to 5 days or longer on as many as 80 vertical levels and for a host of variables. This development is based on dramatic improvements in the models used for prediction, including the numerical methods applied to integrate the prediction equations and a better understanding of the dynamics and physics of the system. Most important is the growth of computing power during this era which allowed the models to expand by more than five orders of magnitude, thus significantly reducing errors and increasing the number and range of variables that can be forecast. Concurrently, the processing of data used by models as initial conditions has also benefited from this explosion in computing resources through the development of highly sophisticated and complex methodology to extract the most information from accessible data (both current and archived). In addition, increased communication speeds allow the use of more data for input to the models and for rapid dissemination of forecast products. Numerous regional models have sprung up to provide numerical forecasts of local weather events with a level of detail unheard of in the past, based on the rapid availability of appropriate data and computational resources. New modeling techniques and methods for reducing forecast errors still further are on the horizon and will require a continuation of the present acceleration in computer processing speeds.
1. Introduction
2. Computational Methods
   2.1 Spectral Methods
   2.2 The Finite-Element Method
   2.3 Spherical Geodesic Grids
   2.4 Time Truncation and the Semi-Lagrange Method
3. Data Analysis, Assimilation, and Initialization
4. Regional Prediction Modeling
5. Ensemble Prediction Techniques
6. Conclusions
References

1. Introduction
Before the advent of computers, weather prediction was more an art than a science. Forecasters used what little data was available to them to prepare charts (maps) on which they analyzed that data to provide fields of variables, such as temperature or wind, from which they could forecast the evolution of those fields based on intuitive concepts learned subjectively from observations of many such patterns or from heuristic formulae derived from the theory of fluids. In addition, some statistical formulae were available which were derived from limited data archives. In all, although some of these forecasters were brilliant, neither the product nor the prospects were encouraging. The one shining light in this cloudy view was the recommendation of Richardson [1] who clearly saw weather forecasting as a computational problem based on fundamental prediction equations, but did not have the computing resources to prove his point.

Then with the advent of computers in the late 1940s, numerical weather prediction began in earnest. Von Neumann selected the weather prediction problem as his primary choice to demonstrate the feasibility of the digital computer as the tool for solving complex nonlinear problems [2]. Richardson's equations were resurrected and simplified to meet the computer's capabilities, and the march to successful forecasts was at last on a road from which it has not yet deviated, nor is there any indication that the quest will cease. Phillips [3] summarized the state of this development by 1960 in a comprehensive article in this series, and since that article will be referenced frequently in this chapter, further numerical reference to it is omitted. References to other work by Phillips will be cited in the usual way.

Since 1960, numerical weather prediction knowledge and capability have mushroomed. Terminologically, the weather prediction problem is approached by defining a "model," and solutions to this model are sought. Thus Richardson's equations which described the atmosphere and which he postulated for use in weather prediction served as one of the earliest models, of which there are now countless numbers. One can find a model to predict almost any aspect of the weather. Thus it seems appropriate to summarize briefly how our atmosphere is constituted and what might be predictable. The Earth's atmosphere is a thin fluid shell composed of a number of gases that are in constant motion. Since the human population lives at the bottom
of this mass and interacts with it constantly, it is hardly surprising that we are interested in its evolution. The description of the atmosphere is given by a number of variables that represent molecular composites of the gases such as velocity, temperature, water content in all three phases, and aerosols. These variables are distributed continuously in three-dimensional space and vary with time. One can either determine the evolution in time of these variables at each point in space (defined as an Eulerian approach) or follow the particles through time (defined as a Lagrangian approach). Although the Eulerian approach has long been favored, recent developments have suggested advantages to the Lagrangian approach. More will be said on this subsequently.

The model from which the future state of the variables is assessed is based on known physical and dynamical principles. These principles take the form of mathematical equations, including the equations of motion (also known as the Navier-Stokes equations), an equation for conservation of mass, an equation to determine the change in entropy, equations which determine the changes in water substance in its various phases, and chemical equations for movement and changes of aerosols. Phillips presented some of these equations, and they are repeated here for ease of reference. The additional equations are presented only formally since they have many forms depending on the particular model with which they are associated. Details may be given subsequently if relevant. The equations are written in their Eulerian form, so the time derivative is taken locally at a given point in the fluid. To determine the motion of the fluid, the equation of motion for the vector velocity V in all three space dimensions and relative to the rotating Earth is

\frac{\partial \mathbf{V}}{\partial t} = -\mathbf{V}\cdot\nabla\mathbf{V} - 2\boldsymbol{\Omega}\times\mathbf{V} - \frac{1}{\rho}\nabla p - g\mathbf{k} + \mathbf{F} \qquad (1.1)
where Ω is the angular velocity of the Earth, ρ and p are the density and pressure at an atmospheric point respectively, g is the gravitational acceleration in the k (unit vertical vector) direction, and F comprises all the frictional forces per unit mass. Conservation of mass is represented by an equation of continuity:

\frac{\partial \rho}{\partial t} = -\nabla\cdot(\rho\mathbf{V}) \qquad (1.2)
The thermodynamics of the system are described by changes in entropy:
\frac{\partial s}{\partial t} = -\mathbf{V}\cdot\nabla s + \frac{1}{T}\, q(q_v, q_l, q_i, a_j, \ldots) \qquad (1.3)
where s denotes specific entropy, q is the rate of heating per unit mass, and T is the temperature. Note that q depends on the heating rates associated with water vapor, q_v, ice, q_i, liquid water, q_l, aerosols, a_j, and other factors depending on the dependent variables, such as radiation. Each of the variables q_k and a_k will have its own prediction equation,

\frac{\partial q_k}{\partial t} = Q_k \qquad (1.4)
where the Q_k represent complex formulae relating some or all of the dependent variables. As noted earlier, these representations vary significantly from model to model. This entire system of equations constitutes the "model" that is integrated in time to predict the future state of the fluid. The complications associated with the solution process of this system are enormous and have spun off countless research endeavors. The model clearly needs boundary conditions both at the top of the atmosphere and at the surface where the model interfaces with either the oceans or the landmasses of the Earth. The model needs initial conditions, which come from data that is observed. But is the data sufficient, is it suitable to represent the model variables, or is it even possible to observe some of the dependent variables? The model represents an infinite range of scales from the global to the microscale; the system must therefore be truncated to represent both meaningful scales and realistically computable ones. Indeed, since it is impossible to encompass too wide a range of scales in one model, separate models have been designed to represent particular scale ranges. Extreme examples might be models to describe hurricanes or tornadoes. The forces that drive the models, in addition to the boundary conditions, are represented by the terms that make up the q_k and a_k, and these functions also vary depending on scale.

It should be evident from this discussion that weather prediction raises enormous problems both physically and computationally. Indeed, computational resources may be the limiting factor in our ability to improve predictions. Shuman [4] proposed a tantalizing hypothesis in this regard. He noted that over the years it has been possible to correlate the improvement in forecast skill with enhanced computing power. From this information he speculates on the possibility of predicting future improvements in forecasting skill based on estimated increases in computing power. As an example of how computers have evolved over the period since Phillips' 1960 review, Bengtsson [5] discusses the evolution of forecasting in Europe by comparing forecasts made in Sweden in 1954 with a simple prediction model on a Besk machine to a recent product (1998) from the European Center for Medium-Range Weather Forecasts (ECMWF) using
one of the world's most advanced models on a 16-processor Fujitsu VPP 700. He notes that the number of calculations per unit time during this 45-year period has increased by over five orders of magnitude! Comparable increases in computing capability have been noted by Kalnay et al. [6] for predictions produced by the National Centers for Environmental Prediction (NCEP) of NOAA. In 1960 their model ran on an IBM 7090 with a performance speed of 67 Kflops, whereas in 1994 the model (clearly much more sophisticated) ran on a 16-processor Cray C90 at 15 Gflops.

The models and their forecasting skill, as presented by Phillips in 1960, were indeed impressive when one considers that hardly a decade had passed since the process of using models in conjunction with computers had begun. However, by today's standards, those models and the features surrounding them appear primitive. Of course computing power itself was then correspondingly primitive. Consider that the models of the time could reasonably represent only the largest significant horizontal weather scales with forecasts limited to 24 h. The model domain ranged from continental regions to hemispheric and had at most two or three levels in the vertical. Moisture was not considered in great detail. Data management for initial conditions was fragile, the observing network was limited except for the heavily populated regions of the northern hemisphere, and the analysis of this data for input into the models was carried out using methods that were elementary and not carefully tuned to the models' needs. Computations were made only with classic finite difference methods, although the numerical limitations of the techniques were well understood.

The changes that have occurred in the science of numerical weather prediction over the past 40 years, when measured against what Phillips reported, are staggering. Enormous progress has been achieved in the application of countless research developments that have been incorporated into models, and these models are now flourishing at numerous prediction centers. Whereas only a few numerical forecast centers existed in 1960 and these were primarily experimental, many countries now have their own numerical forecast center. Many of these national centers produce forecasts for local consumption with regional models, using global forecast data created by larger centers. An example of such a large center is the European Center for Medium-Range Weather Forecasts (ECMWF), a center created jointly by some 19 European nations during the 1970s. The ECMWF is one of the primary centers for numerical weather prediction (NWP) in the world today, and is competitive in producing some of the best global forecasts. Of course the US has kept pace with numerous forecast products on many scales generated by the National Centers for Environmental Prediction (NCEP) as part of the National Weather Service of NOAA. Other large centers that provide valuable global predictions as well as additional
forecast products include the United Kingdom Meteorological Office (UKMO), the Japanese, and the Canadians. Although many of the products delivered by these centers share a common purpose, the tools used in the form of models reflect the enormous advances that have taken place. Thus no two centers use exactly the same model for global predictions, although some of the more basic features in the models, which the community has agreed are well understood, do recur. Those aspects of models that are still subject to research evaluation may appear in various representations in the models of these larger centers. Competition for high quality forecasts is keen among the centers and, although their products are uniformly good, the best forecast on any given day will not always come from the same center. For those centers using regional models to forecast smaller scale events, the differences among these models are still pronounced.

Perhaps the most apparent difference amongst models is in the technique selected to convert the basic nonlinear differential equations that describe the forecast system, i.e., (1.1-1.4), to a numerical form suitable for computation and integration on a digital computer. As noted from Phillips' discussion, the method of choice at the time was to apply finite differences in both the time and space dimensions. Since the vertical dimension has unique properties when compared to the horizontal dimensions in the atmosphere, advances in representing the equations in these two domains developed apace. For the horizontal representation, the spherical nature of the domain led to the application of a Galerkin approach known as the spectral method. Considering that at any given height in the atmosphere there exists a closed spherical surface on which the dependent variables describing the fluid are prescribed and predicted, the spectral method assigns a set of continuous orthogonal functions over the domain to represent these variables; only the coefficients of these functions vary depending on time and the particular variable represented. When all the variables are described in this way, the resulting equations are integrated over the global domain, leading to a set of ordinary nonlinear differential equations in time and vertical level. Differentiation in the vertical coordinate has continued to be transformed to finite differences. As computational problems with the method were resolved, most global models adopted some form of this spectral method, and until recently, it has remained the most popular method for solving the forecast equations.

With the evolution of models from the basic geostrophic forecast system in which the divergence field is not predicted to the hydrostatic forecast system (primitive equations) where the divergence is predicted (see Phillips for definitions of these systems), the representation of the vertical coordinate became more important. In particular, the inclusion of surface topography
in the models created computational complications that were partly resolved with the introduction of the sigma (σ) coordinate. If pressure is taken as the independent vertical coordinate using hydrostatic considerations, σ is the pressure normalized by the surface pressure. Utilization of this coordinate in the vertical has become very popular and successful and additional variants have been explored. Modelers have developed a system whereby the σ coordinate is used in the lower atmosphere and gradually transforms to pressure with height in the atmosphere. Indeed, some modelers have advocated entropy as a coordinate for the upper atmosphere.

Other methods of representing the prediction equations in their horizontal coordinates have also been explored over the years, but have not gained popularity because they may have been too demanding of computers. These include the finite element method, which allows for variable grid sizes over the domain, and spherical geodesic grids, which provide exceptionally homogeneous gridding over the entire spherical surface and use an integration technique to solve the equations. Both methods have seen a resurgence of interest in the last few years with the advent of more suitable computing hardware, i.e., parallel processors.

Since the forecast equations represent an initial value problem as well as a boundary value problem, the truncation of the time dimension is equally as important as that of the space dimension. The transform to finite differences in time had been thoroughly studied by Phillips' time and he noted many of its properties. The selection of explicit schemes to solve the atmospheric model equations has remained popular to this day, in particular the three-level leapfrog scheme. However, with the ascendancy of the primitive equations, high frequency gravity waves were unleashed in the models, and because of the computational stability requirements inherent in explicit methods (note the CFL criterion [7]--see also Phillips for details on computational stability), very short time-steps were required to perform stable integrations. Indeed these computational requirements relative to available computing resources slowed progress in numerical weather prediction for a significant period of time. This impasse led researchers to develop composite schemes where the implicit and explicit methods were combined, and this approach was called the semi-implicit scheme. The nonlinear terms in the equations, which could be shown to propagate with relatively slow frequencies, were integrated using an explicit scheme whereas the linear terms, which propagated with high frequency, were integrated using an implicit scheme. Since the implicit scheme is always stable, the system of prediction equations in this form could be integrated using the larger time-step chosen from the explicitly integrated terms, thereby saving substantial computing time. Needless to say, the results of forecasts using this method proved satisfactory.
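The explicit leapfrog scheme and the CFL stability constraint noted above can be illustrated with a toy one-dimensional advection problem. The following sketch is a minimal illustration under assumed parameter values (the grid spacing, advection speed, and time-step are hypothetical choices, not taken from any operational model):

import numpy as np

nx, c, dx = 100, 10.0, 1.0e5          # grid points, advection speed (m/s), spacing (m)
dt = 0.5 * dx / c                     # chosen so the Courant number stays below 1 (CFL)
assert c * dt / dx <= 1.0, "explicit leapfrog would be unstable at this time-step"

x = np.arange(nx) * dx
u_old = np.exp(-((x - 0.3 * nx * dx) / (5 * dx)) ** 2)   # initial wave packet
# First step: a forward (Euler) step to start the three-level leapfrog scheme.
u_now = u_old - c * dt * (np.roll(u_old, -1) - np.roll(u_old, 1)) / (2 * dx)

for _ in range(200):
    # Leapfrog: centred in time and space, periodic boundaries via np.roll.
    u_new = u_old - c * (2 * dt) * (np.roll(u_now, -1) - np.roll(u_now, 1)) / (2 * dx)
    u_old, u_now = u_now, u_new

print("max|u| after 200 steps:", float(np.abs(u_now).max()))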
As computing resources increased over the years, model complexity grew rapidly. In particular, modelers were not content to predict only the largest scales; with increased resources they decreased the horizontal grid length in the models to improve the predictability at more regional scales. Not only did this increase the computational size of the model, it also required a decrease in the time increment to minimize computational error. Thus, despite the advantages of the semi-implicit scheme, numerical weather predictions continued to tax the limits of computing resources on the largest available machines. In this environment researchers reassessed the advantages of using the Lagrangian approach which had proved disastrous in early studies because particles tended to accumulate in regions of high vorticity [8]. The more recent studies solved this problem by interpolating particles to a fixed grid after each time incremental cycle. Indeed it was found that the method was computationally stable for all time increments (like the implicit method), yet experiments with state-of-the-art models indicated that a significantly longer time-step could be used to achieve the same quality of forecast when compared to other schemes. Thanks to this, the method has become extremely popular and is now incorporated to some extent in the working models of most large centers.

Concurrent with these developments in numerical processing of prediction models, other features of the models that affect predictive ability have been studied in detail and significant improvements have been incorporated into operational models. These features include the assessment and processing of data for establishing model initial conditions, parameterization of physical processes which act as external forces to alter the model dependent variables in time and space, and the modification of the general prediction equations to suit the unique characteristics of particular spatial scales. These issues are deeply interrelated, yet must be investigated independently. Different forces in the prediction system dominate depending on the scales of interest; note that the principal forces which drive the cloud systems on the hurricane scale will not play a major role on the synoptic scale and vice versa. Additionally, the initial conditions must be closely tuned to the selected forecast scales. The interrelationships will become evident from the ensuing discussion.

As the numerical representation of models developed, it became evident that initial conditions also play an equally important role in the quality of predictions. Forecasters realized that it was not sufficient to use simple interpolation to adjust observations as the initial state for integration. Several factors were at play here. First, new observations became available which were not conveniently distributed either in time or space to be simply used, but were exceptionally useful to fill in data gaps when treated appropriately. More importantly, however, was the gradual recognition that
data analysis as input for initial conditions to a model must be uniquely tuned to the model itself and not analyzed by arbitrary and independent formulae. This awareness arose from the realization that no model is a perfect representation of reality but is merely an approximation. Thus data which represents reality perfectly if used in a model which is merely an approximation to that reality may not produce the optimum prediction from that model. Indeed, data that is suitably tuned to a model may well optimize the output of that model when used as initial conditions. As this awareness took hold, intense effort was applied to the data analysis problem starting with what was known as data initialization. The initial data fields for the relevant prediction variables were adjusted to match the equations from which they would be predicted. For example, if the model imposed geostrophic constraints, the initial conditions were adjusted to satisfy these constraints. Of course one could argue that with time the model would ultimately adjust any dataset, but nonlinear systems are notorious for going awry and are not infrequently unstable if they are perturbed slightly out of equilibrium. Moreover, a model is expected to give an optimum forecast rather than act as an analysis tool. But as research evolved, it became clear that a model could do both. Thus a new process for analysis known as variational analysis took hold and is still being refined. The procedure uses the model to adjust input data by forward integration for a finite time, returning to the original time by backward integration with the adjoint of the model. During this process asynchronous data may also be incorporated; indeed statistical data may be applied in regions where there is a severe paucity of data. This development has been incorporated into the models of most large forecast centers and has substantially improved their forecast products. Additionally it has led to the reanalysis of old data, making available long records of data archives that have consistency built into them and are valuable for testing new methodology. At the time Phillips was writing, numerical weather prediction was the exclusive domain of fluid dynamicists. Given the computational resources available, details on the physics and chemistry of the fluid could not have been effectively incorporated into the models. Thus developments in model improvements remained focused on both the numerics and dynamics of the computational system. Although the fundamentals of radiation were understood, lack of observational data inhibited research progress in that discipline for many years. Studies on the microphysics of clouds were in their infancy and progress was slow. It was many years before clouds were considered in models, and at first only the liquid phase of water substance was incorporated. That clouds grew and dissipated through entrainment, that they passed through freezing levels and changed phase, thereby releasing or gaining heat, was well understood but not added to models for
decades. Concurrently, detailed boundary conditions at the surface of the atmosphere such as the oceans and the biosphere were not applied to the models until their need in climate models was demonstrated. Interactive models in which predictions are made of all systems that interface at the boundary and are appropriately coupled are gradually beginning to surface, primarily in climate modeling efforts, but have not yet been sufficiently studied to be systematically utilized in short-term weather prediction. Atmospheric chemistry, the study of chemical species and aerosols which make up the atmosphere and which interact so as to impact on the evolution of the total fluid, is perhaps the most neglected of the physical processes that may play a role in the forecasting of weather. Even now, no large NWP model interactively involves details of the chemistry of the fluid. During recent decades, studies have begun using the results of prediction models to provide winds to move chemical species and aerosols about the fluid as passive components; only recently have efforts been made to consider actual changes in these tracers with time and how those changes might affect the primary predictive variables of the models. Although it is beyond the scope of this review to discuss the developments in our understanding of these physical processes in detail, it is noteworthy that as interest in prediction products expands to the smaller space scales, more emphasis will fall on the chemistry and physics of the fluid. Indeed when forecasts are made on urban scales of tens of kilometers, people will want to know not only that it might rain, but something of the intensity of the rain and its composition; i.e., whether it might contain hail. Furthermore, the concentration of ozone in the air and its persistence will become one of the primary products of a summertime forecast on these scales. Hypothetically, the prediction equations as they are written should describe all horizontal scales from the global to the microscale, but obviously this is not computationally feasible. In selecting scale ranges, which is done by truncating the computational form of the equations to the desired spatial increment, the planetary scales are the easiest to predict both because they require the fewest initial conditions and because they can be integrated with a larger time increment to maintain computational stability, thus using less computing time. Unfortunately, fully global models are best formulated by the primitive equations, which do not create problems at the equator. Thus the benefit of scaling is offset by the complexity of the model construction. Historically one notes that the earliest models were quasi-geostrophic and not global; they were hemispheric or nearly so, depending on computer limitations. As computers became more powerful, the models became global and used the primitive equations. As demand grew for predictions with more resolution, i.e., on smaller spatial scales, the method for providing
regional and smaller scale predictions became a critical issue. Various approaches emerged. One perspective was to use only the basic system of equations and truncate them at the desired scale even if that scale required the interaction in the model of many scales from the planetary to the regional, say 50-100 km or less. This is not difficult to formulate but presents a formidable computational effort, although now within the capacity of the largest computers. Another approach, which has attracted a large following, is to generate a regional model which focuses on the smaller scales, assuming that neither the interactions with the larger scales (those not incorporated in the model) nor their changes during the integration period have much impact on the forecast scales during the prediction. A variant of this approach is to locally embed the regional model in a global model, thus providing time varying boundary conditions from the global model to the regional model during the integration period. A recent development, not yet in use for routine predictions, is the concept of grid stretching. In this application a variable grid is used which has finer resolution over the region requiring more detailed spatial prediction, and the grid is systematically enlarged away from that region. The approach allows for complete interaction of all scales over the entire global domain, but substantially reduces the computational requirement of integrating the model everywhere with the smallest scales. A finite element modeling approach which can accomplish a similar effect but with several regional domains over the globe is currently under development and is also used in ocean modeling. With reference to developments for predicting weather on various scales, there exist currently a variety of models to forecast tornadoes, hurricanes, dust devils, and convective complexes, and there are others in various stages of construction. Most of these models are nonhydrostatic and can be adapted to give predictions over any selected geographic domain. The output from these models is gradually being entrained into the products distributed by the largest forecast centers. Finally, perhaps the most insightful observation on numerical weather prediction since the time of Phillips' presentation was the theoretical limit on predictability enunciated by Lorenz [9]. Using an elegantly simple model of the atmosphere, Lorenz demonstrated by an error propagation method that all forecasting ability must cease within about 2 weeks after the start of an integration, no matter how small the initial error. Numerical weather prediction experts have taken this assessment to heart. In the last few years they have developed an ensemble forecasting approach. Since errors will grow in any forecast and these errors are random, a suite of forecasts is undertaken with this procedure, each of which has a slight perturbation in the initial state. Rather sophisticated methods have been developed to create
these perturbations. The ultimate forecast is a statistical composite taken from this ensemble of forecasts. The method improves on individual numerical forecasts and is gradually being adopted by the large forecast centers.

Section 2 presents most of the more popular methods that have been developed and exploited to solve the prediction system on available computing systems. As computational resources have improved, these methods have risen or fallen in the popularity stakes. Each has its own advantages and limitations, but without the thorough study of these methods, the level of accuracy in the forecast products now available from numerical prediction models would not exist.

In Section 3 the preparation of data needed to activate the prediction models as initial conditions is assessed and the procedures that have been developed to help ensure accurate model input are enumerated. It is evident that if models begin a forecast with less than high quality input data they are doomed to failure, since initial errors will propagate in time during the calculation. Improvements in data assimilation over the last 40 years, as will be shown, have indeed been astounding.

It is unreasonable to expect that all space scales can be accurately predicted with one model. As success with prediction of planetary scales using global models accelerated, some modelers turned their attention to creating regional models to provide more local predictions. This development has been stimulated by the dramatic advances in available computing resources and is discussed in Section 4.

Finally, no single model integration can provide a perfect forecast, no matter how accurate the initial state or how good the dynamics and numerics of the model are. This is simply due to the fact that the exact differential equations must be approximated. Numerous model integrations have demonstrated that minuscule perturbations of the initial state lead to identifiable changes in a model's forecast. These differences are known as model variability and apply to all models. Section 5 discusses this issue and suggests that application of recent developments in ensemble forecasting may help to reduce such errors.
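As a hedged illustration of the ensemble idea, the sketch below integrates the well-known three-variable Lorenz system (standing in here for a forecast model, in the spirit of the simple model attributed to Lorenz [9]) from a set of slightly perturbed initial states and forms a statistical composite; the perturbation size, ensemble size, and time-step are illustrative assumptions rather than operational values:

import numpy as np

def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz (1963) equations."""
    x, y, z = state
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return state + dt * np.array([dx, dy, dz])

rng = np.random.default_rng(0)
base = np.array([1.0, 1.0, 1.0])                    # "analysis" initial state
ensemble = [base + 1e-3 * rng.standard_normal(3)    # slight initial perturbations
            for _ in range(20)]

for _ in range(1500):                               # integrate each member forward
    ensemble = [lorenz_step(m) for m in ensemble]

members = np.array(ensemble)
print("ensemble mean:  ", members.mean(axis=0))
print("ensemble spread:", members.std(axis=0))      # spread indicates forecast uncertainty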
2. Computational Methods
Although the equations describing the atmosphere presented in the previous section are differential equations, their nonlinear nature implies that a numerical representation is essential to their solution. Indeed, Phillips discussed in detail the numerical features of the finite difference method applied to the system, including computational errors inherent in difference
procedures, stability requirements, and solutions to associated numerical boundary value problems. The success with these techniques to date is astounding insofar as the uniqueness of solutions to the difference systems cannot be established. Tradition led early modelers to use finite differencing, but once predictions with these systems became established, certain inherent problems became apparent. It was Phillips [10] himself who brought the problem of nonlinear instability to the attention of the modeling community; it is an issue requiring careful attention lest the evolving solutions become unstable.

A cursory view of (1.1) shows that nonlinear terms of the form u ∂u/∂x exist and must be calculated. For simplicity let us assume that u is one of the velocity components and is merely a function of one space dimension (x). Since the variables must be represented on a grid of points truncated at some Δx, one could equally well represent u as a Fourier series with the shortest resolvable wave length of 2Δx and corresponding wave numbers m ≤ π/Δx. If for simplicity only sine functions are considered, u can be written as

u = \sum_{i=1}^{M} u_i \sin m_i x \qquad (2.1)

over M grid points. The nonlinear product of any two waves, m_i and m_j, becomes

u\,\frac{\partial u}{\partial x} = u_i u_j m_j \sin m_i x \cos m_j x + \cdots = \tfrac{1}{2}\, u_i u_j m_j \left[\sin(m_i + m_j)x + \sin(m_i - m_j)x\right] + \cdots \qquad (2.2)

Consider now that u can be predicted from this term and others as

u(t + \Delta t) = u(t) + \Delta t \left(u\,\frac{\partial u}{\partial x}\right) + \cdots \qquad (2.3)
So long as m_i + m_j does not exceed the largest resolvable wave number, the product can be represented on the grid. When the sum exceeds that limit, however, the grid cannot carry the new wave and its energy is folded back (aliased) onto a lower, resolvable wave number, progressively corrupting those coefficients as the time integration proceeds; this spurious feedback is the source of the nonlinear instability noted above.
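This folding of an unresolvable product wave back onto the grid can be seen directly in a few lines of code. The short Python sketch below is a toy illustration only (the grid size and the wave numbers 6 and 5 are arbitrary choices, not taken from the text): the product sin(m_i x) cos(m_j x) with m_i + m_j = 11 is sampled on a grid whose highest resolvable wave number is 8, and its spectrum shows the aliased energy appearing at wave number 16 − 11 = 5.

    import numpy as np

    M = 8                              # highest wave number the grid resolves
    N = 2 * M                          # grid points; shortest resolvable wave is 2*dx
    x = 2 * np.pi * np.arange(N) / N

    mi, mj = 6, 5                      # m_i + m_j = 11 > M: the product is unresolvable
    product = np.sin(mi * x) * np.cos(mj * x)   # = 0.5*[sin(11x) + sin(1x)] analytically

    spectrum = np.abs(np.fft.rfft(product)) / N
    print(np.round(spectrum, 3))
    # amplitude 0.25 appears at wave numbers 1 and 5: the grid cannot distinguish
    # sin(11x) from -sin(5x) at these points, so the m = 11 energy is aliased to m = 5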
It should be apparent how (2.28) can be extended to involve more dependent variables and any number of levels in the vertical. However, if more variables exist in the system, these variables will be coupled nonlinearly through the coefficients F_α. Suppose that the series is truncated at α = M_e as suggested. This implies that all values of f_α for α > M_e vanish. However, on calculating the
nonlinear product F(f_α) one could get coefficients F_α for α > M_e once a calculation has begun. Several properties of the barotropic vorticity equation (BVE) exist that make the comparison of the spectral method with the finite difference method convenient. It is seen from (2.28) that no error in the linear phase speeds of waves propagating in the fluid need be incurred using the spectral method since the linear solution can be computed exactly. Indeed, it is a simple matter to transform the variables so that the linear terms do not appear in the equations [23]. However, many studies have shown that this is not true for finite difference equations. The errors that result from linear wave dispersion can have a dramatic effect when they are included in the nonlinear wave interactions, setting up systematic errors as the time integration evolves. Moreover, for variables such as the stream function that converge rapidly, the truncated representation in spectral form converges to the true solution whereas this is not the case for finite difference truncation. As noted earlier, the BVE conserves several second-order moments including kinetic energy, enstrophy, and momentum which are generally not conserved in a truncated system. However, it has been demonstrated by both Platzman [23] and Lorenz [24] that the truncated form of the spectral equations as represented by (2.28), including the truncation of the nonlinear terms as indicated, does maintain these conservation conditions and does not require a special process as is the case for the finite difference equations [13]. This is of course not true for the more general truncated primitive equations, but research with the shallow water equations indicates that the errors are small [25]. An additional and closely connected consequence of the spectral equations is their freedom from the nonlinear instability discussed at (2.2). Since the products with α > M_e are discarded, they cannot fold back into the domain of α < M_e to corrupt the coefficients in that range. This allows for stable computations with no requirement for artificial viscosity, regardless of the complexity of the prediction system. The truncation at α = M_e is somewhat intricate since two real indices are involved, n and m. Because of the structure of the expansion functions (2.21), n ≥ 0 and n ≥ |m| whereas −m_max ≤ m ≤ m_max. The set of all allowed indices may best be described as the intersections of integers in a grid on an n,m plane; such a plane is depicted in Fig. 2. Although the allowed points fall on an infinite triangle bounded by the lines n = ±m, it is sufficient to present only the triangle for m ≥ 0. All sequential values of n and m beginning at the origin are generally selected to satisfy convergence
FIG. 2. The domain and allowable range of indices m and n for triangular and rhomboidal truncations.
requirements for the dependent variables that they represent, but a relationship between maximum values must be chosen. Two options have become dominant over the years. The first, rhomboidal truncation, has a specified maximum value of m_max = M and allows for all values of n ≤ |m| + M for each |m| ≤ M. The corresponding figure (this configuration actually describes a parallelogram) is represented in Fig. 2 and the notation is written as, for example, R30 if M = 30. The advantage of this truncation is that each planetary wave m is represented by the same number of expansion coefficients, thereby allowing equal resolution for all waves. However, since the energy of atmospheric flow decreases rapidly with increasing wave number (m), resolution of the shorter waves may not be as important as for the longer waves. This observation leads to triangular truncation, perhaps the most popular form of truncation, in which n ≤ N, a predetermined integer. Usually N is selected equal to M and this option is described as a triangle in Fig. 2 with the notation given as T30 if N = 30 for
example. In terms of scaling, it would be advantageous to truncate the expansion at a fixed scale, but it appears that in two dimensions there are two scales (in the current case these are the indices m and n). However, the eigenvalues of the expansion functions when operated on by the Laplacian represent in some sense this two-dimensional index, but as seen from (2.24) depend on only the one index n, and do so linearly. Thus truncation at fixed N seems appropriate and efficient [26]. Moreover, Baer [26] has analyzed data archives of atmospheric winds and demonstrated that for decreasing scale, the kinetic energy decreases as a function of the n index alone, essentially independent of the m index. This result is an additional justification for selecting triangular truncation. The ultimate choice for truncation is to optimize the resolution of the model in terms of the number of scales included and to minimize the computing requirements by selecting the least degrees of freedom compatible with resolution. The degrees of freedom here are clearly measured by the total number of indices allowed. If one accepts the argument that n is a measure of scale, triangular truncation seems the obvious choice to meet these conditions. Nevertheless, rhomboidal truncation has been utilized successfully in a number of research studies, but may be less efficient in its use of computing resources. Thus it is not often employed in large forecast models which are run every day. Since all prediction models are computationally intensive, despite the apparent advantages of the spectral method in reducing errors it must compete in the efficient utilization of available computing resources if it is to be selected as the method of choice. It is apparent from (2.28) that most of the computing time required involves the calculation of the coefficients F_α and much effort has gone into optimizing this calculation. The earliest attempts [18, 27, 28] followed the obvious procedure of substituting the expansion series (2.25) for ψ into (2.26) to represent F(ψ) and calculating F_α from (2.28). This results in the following relation:
   F_α(t) = ½ Σ_β Σ_γ ψ_β(t) ψ_γ(t) I_{α,β,γ}                                (2.29)

   I_{α,β,γ} = ∫_S (c_β − c_γ) ( m_β Y_β ∂Y_γ/∂μ − m_γ Y_γ ∂Y_β/∂μ ) Y_α dS
The indices β and γ go over the same range as α, which is determined by the selected truncation, and the integration is over the unit sphere. The integrals I_{α,β,γ} are called interaction coefficients and have exact solutions which were first presented by Gaunt [29]. Applying (2.29) in (2.28) shows that the time
change of any expansion coefficient of the set α depends on the coupling of all the coefficients allowed in the spectral domain (see Fig. 2) and each couple is weighted by its own interaction coefficient. Since each index consists of two real numbers, the set of interaction coefficients can be as large as the largest allowed index to the sixth power. In practice, because of the simple addition rules of trigonometric functions, the integration over longitude reduces this by one order, requiring m_α = m_β + m_γ for nonvanishing interaction coefficients. The vector of these coefficients can be stored and need be computed only once. However, the number of multiplications that must be performed at each time-step is daunting, especially as the truncation limit, say N, becomes large. Careful study of these coefficients indicates that the number of non-significant or vanishing values is negligible and thus the calculation of (2.29) cannot be reduced significantly [23]. Although (2.28) is a demonstration for using this technique with the BVE, the more complex system (2.18) can be represented identically by simply increasing the number of expansion coefficients to include additional variables. Early attempts to integrate such a system proved prohibitive on the computers available at the time, and consequently application of the procedure with production forecast models languished in favor of the finite difference method, with which computations are considerably more efficient. An additional limitation of this process, and a shortcoming not yet resolved, concerns the convergence rate of a few dependent variables included in the general set B_n. As noted, the expansion of the flow variable included in the BVE converges rapidly and truncation creates no problem for computation. This is also true for most of the variables represented in the primitive equations such as temperature, density, etc. However, liquid water and its related precipitation do not converge rapidly when expanded in a series of global functions. Indeed, if one considers a grid of points on the sphere with typical separation of a few hundred kilometers, a typical grid interval used for many years in global models, it is quite probable that only one point among a sequence of many may have a nonvanishing value of precipitation. Any attempt to represent such a near-singular function by the expansion (2.25) would lead to a serious Gibbs phenomenon where the shortest scales would contain most of the variance of the function. It is evident from the foregoing discussion on nonlinear products, as represented by the example given in (2.29), that significant truncation errors will ensue with time integration utilizing such functions. Until a physically meaningful procedure is identified and employed to smooth functions that oscillate rapidly between observations, the interaction coefficient method will be seriously compromised when using nonsmooth functions as part of the dependent variable set describing a prediction model.
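Setting the convergence question aside, the computational shape of (2.29) is a single large multiply-and-reduce over stored coefficients. The Python sketch below expresses it that way; the dense three-index array, its random placeholder values, and the sizes are illustrative assumptions only (real implementations index α by the pair (n, m) and exploit the m_α = m_β + m_γ selection rule to avoid storing vanishing coefficients).

    import numpy as np

    def tendency_from_ics(psi, I):
        """F_alpha = 0.5 * sum over (beta, gamma) of psi_beta * psi_gamma * I[alpha, beta, gamma],
        following the form of (2.29) above, written as one elementwise multiply
        followed by a reduction over the two summation indices."""
        return 0.5 * np.einsum('abc,b,c->a', I, psi, psi)

    # illustrative sizes and placeholder values only
    A = 64
    rng = np.random.default_rng(0)
    psi = rng.standard_normal(A) + 1j * rng.standard_normal(A)   # expansion coefficients
    I = rng.standard_normal((A, A, A))                           # stored interaction coefficients
    F = tendency_from_ics(psi, I)

The work grows with the cube of the number of retained indices, which is the redundancy discussed below when the transform method is introduced.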
A breakthrough occurred in 1970 when, independently, Orszag [30] and Eliasen et al. [31] developed what is known as the transform method, an alternate procedure for calculating the coefficients F_α but yielding the same results as the interaction coefficient method (or better). Their technique involves the transformation of the integrand in (2.28) onto a special grid and solving the integral by quadrature. If the grid is selected appropriately, the integral will be evaluated exactly and at a great reduction in computing cost. In the longitudinal direction the quadrature is most conveniently done by a trapezoidal formula since it is known that

   ∫_0^{2π} e^{imλ} dλ = (2π/J) Σ_{j=1}^{J} e^{imλ_j}                        (2.30)
The summation is taken over an equally spaced grid of points λ_j, and uses twice the number of points as the maximum wave number. Since the functions in latitude are Legendre polynomials, a Gaussian quadrature is preferred. In this case the quadrature is such that

   ∫_{−1}^{1} H(μ) dμ = Σ_{k=1}^{K} G_k(μ_k, K) H(μ_k)                       (2.31)
and is exact if the polynomial H is of degree 2K − 1 or less. The G_k are called Gaussian weights (their definition may be found in [32] or [16]) and the grid points μ_k are the roots of the Legendre polynomial P_K(μ). Thus the appropriate grid for this calculation is all allowed values (λ_j, μ_k) as specified. The range of the grid points is determined by the functions of the integrand in (2.28). The derivatives in F(ψ), see (2.26), must be taken before evaluating the function on the grid. Based on (2.22) and (2.21) the differentiation with λ is straightforward, but the μ-derivative requires more information. The Legendre polynomials can be shown to satisfy the following differential equation:

   (1 − μ²)^{1/2} dP_n/dμ = b_n P_{n−1} − b_{n+1} P_{n+1}                    (2.32)
where the coefficients b_n are constants and this can be used to evaluate the latitudinal derivatives. Finally, (2.19) can be used to evaluate the Laplacian. Following this procedure, F(ψ) may be reduced to a quadratic series over the indices (β, γ) in terms of the complex exponential functions in longitude and the associated Legendre polynomials in latitude, both of which can be evaluated on the specified grid from known formulas. The actual calculation
proceeds as follows. First the quadrature over longitude is taken:
   F_{m_α}(μ, t) = (1/2π) ∫_0^{2π} F(ψ(λ, μ, t)) e^{−i m_α λ} dλ
                 = (1/J) Σ_{j=1}^{J} F(ψ(λ_j, μ, t)) e^{−i m_α λ_j}          (2.33)
where the sum goes over the value J = 3M + 1 if triangular truncation is chosen, since the quantity under summation in (2.33) is proportional to e^{i(m_β + m_γ − m_α)λ_j}. The calculation is made over those latitudes μ specified from the quadrature over latitude, which is

   F_α(t) = ½ ∫_{−1}^{1} F_{m_α}(μ, t) P_α(μ) dμ
          = ½ Σ_{k=1}^{K} G_k(μ_k, K) F_{m_α}(μ_k, t) P_α(μ_k)               (2.34)
Since the polynomial under summation in (2.34) is H(μ) of (2.31) and is the product of three Legendre polynomials less one order, as each has a maximum order of N, it can easily be shown that K = (3N + 1)/2. An analysis of the computing requirements for (2.33) and (2.34) indicates that the maximum number of calculations is proportional to N³, which is a remarkable reduction from the N⁵ needed by the interaction coefficient method. Indeed, as N increases this disparity suggests that there is a dramatic redundancy in the interaction coefficient method since the final integrals from both methods are identical. This demonstration of the reduction in the need for computational resources with the transform method was the basis for the almost universal acceptance of the spectral method for numerical weather prediction. Additionally, when using the method with a number of variables (the primitive equations, for example) some of which have unacceptable convergence properties over the spherical domain yet contribute to the right-hand side of (2.11), their representation in terms of solid harmonics is not essential. Given their distribution on the transform grid, their input may be included directly into the quadrature formulae. Since all the forcing functions may be summed over the grid before quadrature is completed, any singularities from individual terms will be smoothed out and their effects will be minimized. Indeed, this procedure has proved highly successful (a one-dimensional illustration of the transform procedure is given at the end of this subsection). As a historical sidelight, Baer [17] first applied the transform method to the BVE over a channel with constant boundaries at latitudes away from the poles and periodic conditions in longitude. Using a latitude and longitude grid, it was possible to expand the stream function in complex exponential functions in both dimensions. Given the simplicity of the trapezoidal quadrature in both dimensions which was employed, however,
no insights were provided for application to the spherical surface. Since the transform method took hold, many advances have taken place. It was soon discovered that if both velocity components were used in the spectral expansion, and if a nonvanishing wind existed at the poles, a discontinuity existed there and the expansion does not converge. This problem was solved by using the transformation of wind to vorticity and divergence (see 2.5) and converting the prediction equations to those variables [33]. Although this procedure is not recommended for the finite difference method because an additional boundary value problem must be solved, that boundary value problem has a trivial solution in the spectral domain. Most spectral models currently in use employ these transformed variables, but other modifications to the wind components that eliminate the polar singularities have been recommended and applications with them in prediction models have been successful. Perhaps the first successful primitive equation spectral model may be attributed to Bourke [34]. Since that time, most prediction centers have adopted the method. The Canadians and Australians implemented the method in 1976, the National Meteorological Center of NOAA did so in 1980, the French in 1982, and the ECMWF in 1983. As an example of how computing power has evolved, production spectral models at ECMWF have grown in resolution from T63 in 1983 to T213 in 1998, with experiments currently running at T319. Before writing off the interaction coefficient scheme permanently, it is worthwhile to relate its potential application to computing hardware. To understand how the machine time needed to compute a model integration might be reduced, define a computing cycle to be that time required to do once all calculations which are systematically repeated to complete the entire calculation. For conventional marching problems of the type discussed here, the computing cycle is one complete time-step. On a serial machine which handles only one computation at a time, the computing cycle is the total time for that operation and can only be reduced by a faster machine. On a massively parallel processor (MPP), the computing cycle is reduced insofar as many computations can be performed simultaneously provided that no calculations depend on others made during the cycle. The time required to complete this cycle could hopefully be made to converge to the time needed by the machine to perform one computation (the machine cycle) as the number of processors is increased. From this viewpoint, the exploitation of MPPs is clearly desirable. Consider a SIMD parallel processor. This machine was one of the first MPP designs and was comparatively simple and economical, with all its processors performing the same operation. Integrations using such a machine with a very large number of processors could approach the ideal
computing cycle discussed above. With reference to (2.29), note that each quadratic product of expansion coefficients is multiplied by an interaction coefficient (IC). Since the vector of ICs is very large, the computation on a serial processor is extremely time consuming. If the ICs are distributed each to a SIMD processor however, each processor can perform the product of the expansion coefficients times their IC provided only that the two expansion coefficients have been delivered to it. Thus all processors perform the same task. A subsequent sweep and sum over all processors yields F_α(t). Given a computer containing enough processors, this step can be made to approach one machine cycle. Under such conditions, the significance of the redundancy inherent in the interaction coefficient method disappears. The process, although simple and straightforward, currently has a drawback arising from limitations in communication to and from the processors. At the end of each cycle, new values must be communicated to the processors so that they can produce a new product. Innovative programming is speeding this activity, but the data distribution currently takes significant computer time. In an unpublished experiment, the writer has performed integrations with the BVE on a CM-200 and a CM-5 at Los Alamos National Laboratory (LANL), using both the IC method and the transform method for comparison. For various truncations the IC method has run as fast as or faster than the transform method, despite the limitations of communication. Additional experiments with a baroclinic model on the CM-5 showed the IC method to be comparable to the transform method in speed. Unfortunately, MPP developments have moved in the direction of MIMD machines, and SIMD machines with a large number of processors (the largest had 64K) are no longer available. Thus this development must be set aside for the future.
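Whatever the hardware, the transform procedure of (2.33)-(2.34) itself is easy to demonstrate in one dimension. The Python sketch below is an illustration only: a cosine series on a periodic line stands in for the spherical harmonics, and the coefficient values are random placeholders. The truncated fields are evaluated on a grid of J = 3M + 1 points, the nonlinear product is formed pointwise, and the trapezoidal quadrature returns the spectral coefficients of the product exactly, with no interaction coefficients ever computed.

    import numpy as np

    M = 10                               # spectral truncation: wave numbers 0..M retained
    J = 3 * M + 1                        # transform-grid points (alias-free for quadratic terms)
    lam = 2 * np.pi * np.arange(J) / J   # equally spaced "longitudes"

    rng = np.random.default_rng(1)
    c_u = rng.standard_normal(M + 1)     # cosine coefficients of u (illustrative values)
    c_v = rng.standard_normal(M + 1)     # cosine coefficients of v

    # 1. transform: evaluate the truncated series on the grid
    u = sum(c_u[m] * np.cos(m * lam) for m in range(M + 1))
    v = sum(c_v[m] * np.cos(m * lam) for m in range(M + 1))

    # 2. form the nonlinear product pointwise on the grid
    w = u * v

    # 3. quadrature back to spectral space (exact here: the product contains wave
    #    numbers up to 2M, and the projections involve wave numbers up to 3M < J)
    w_m = np.array([(2.0 / J) * np.sum(w * np.cos(m * lam)) for m in range(M + 1)])
    w_m[0] /= 2.0                        # the mean carries half the weight in a cosine series

Only the coefficients with m ≤ M are kept; the wave numbers between M and 2M generated by the product are simply discarded, exactly as in the truncation of (2.28).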
2.2 The Finite-Element Method
The finite-element method shares some of its features with the finite difference method and some with the spectral method. Although it is based on a selected grid of points in the space domain, it differs from the finite difference method insofar as the functions to be represented on the grid (the dependent variables B_n) are also specified everywhere on the domain, as in the spectral method. This is accomplished by representing the variables at each grid point by a basis function which is defined over the entire domain but is nonvanishing only over the subdomain containing the neighboring points. Thus the basis functions may be considered local to each grid point, unlike the spectral basis functions which are truly global. A different basis function could be defined for each point but, for simplicity, the basis functions
selected are generally the same for each point. Moreover, they are usually low-order polynomials. For variables which may change with time, the local basis functions are weighted by a time-dependent coefficient which varies from point to point. A detailed description of the method may be found in Strang and Fix [35] and applications to the atmospheric prediction problem are given by Temperton [36]. Since each of the dependent variables is represented locally by an amplitude and a basis function, (2.12) applies to the finite-element representation provided that m is any grid point, M_e is the total number of grid points, which may include all three dimensions, and Z_m are the basis functions. Noting that the basis functions have zero values at all points except m, the sum for each B_n is uniquely defined everywhere in the domain. To solve the prediction system (2.13), the error (2.14) may again be written. A variety of techniques to solve this equation have been suggested [37], including point collocation, where the error ε_m is set to zero at each grid point, and a least squares minimization procedure. However, it has been found that the Galerkin technique is most successful and it is generally used. In this instance the prediction equations become identical to (2.15) using the grid point notation given above. Additionally, as with the spectral method, the test functions are generally selected as the basis functions used in the expansion. To elaborate on the application of the method, it is sufficient to apply it to the BVE (2.26) which contains the basic operations of derivatives and products used in the general prediction system (2.15). Although the basis functions may be selected arbitrarily, linear functions are frequently chosen because of their simplicity, their computational efficiency and, ironically, their accuracy. The typical linear functions used are denoted as hat functions in one dimension and pyramids in two dimensions, as may be seen in Fig. 3. The one-dimensional functions are defined as
   Z_m(λ) = (λ_{m+1} − λ)/(λ_{m+1} − λ_m)   if λ_m ≤ λ ≤ λ_{m+1}
   Z_m(λ) = (λ − λ_{m−1})/(λ_m − λ_{m−1})   if λ_{m−1} ≤ λ ≤ λ_m
   Z_m(λ) = 0                               elsewhere                        (2.35)
where the intervals between points need not be equal. It is convenient to do functional operations sequentially. Thus for derivatives, if W(λ) = ∂ψ(λ)/∂λ, expand both W and ψ in basis functions,

   Σ_m W_m Z_m(λ) = Σ_m ψ_m ∂Z_m(λ)/∂λ                                       (2.36)
FIG. 3. Finite-element basis functions: (a) hat functions in one dimension; (b) pyramids on triangles in two dimensions [37].
and using (2.35) for the derivative, multiply by the test functions (2.35), and integrate over the domain. Applying the Galerkin procedure, this yields

   Σ_m W_m ∫ Z_m Z_k dλ = Σ_m ψ_m ∫ (∂Z_m(λ)/∂λ) Z_k dλ                      (2.37)
Note the formal similarity to the spectral method. If higher-order polynomials are selected for the basis functions, it may be more efficient to solve for the weights by quadrature. Assuming linear functions (2.35), the weights given by the integrals lead to a three-point formula on the left-hand side and a two-point formula on the right. In vector terms, the vector W for the derivatives at the points can be calculated by the inversion of a tridiagonal matrix of weights, which is highly efficient on modern computers. It has been shown that for a uniform grid, this solution is of fourth-order accuracy. Higher-order derivatives can be considered in an identical fashion provided that the basis functions are differentiable. This is clearly not the case for the linear functions defined by (2.35). For second order derivatives, however, as needed by the Laplacian operator (2.26), because the Galerkin technique requires multiplication by the test function before integration,
integration by parts will reduce the order of differentiation by one and allow the use of linear basis functions. Thus, for example,

   Σ_m W_m ∫ Z_m Z_k dλ = Σ_m ψ_m ∫ (∂²Z_m/∂λ²) Z_k dλ
                        = −Σ_m ψ_m ∫ (∂Z_m/∂λ)(∂Z_k/∂λ) dλ + const           (2.38)
and the solution for W involves the inversion of only a tridiagonal matrix if linear basis functions are used. Unfortunately, the accuracy here is only of second order, similar to the finite difference method. The final operation involved in the solution of (2.26) is multiplication. The terms in the BVE which have this form come from the Jacobian F(ψ). Assuming that the derivatives have been calculated such that W₁ = ∂ψ/∂λ and W₂ = ∂(∇²ψ)/∂μ and only variations in λ are considered, W = W₁W₂ must be evaluated as part of the Jacobian. Expanding in the basis functions of (2.35) and using the Galerkin technique, the solution for the vector elements of W on the grid is

   Σ_m W_m ∫ Z_m Z_k dλ = Σ_{m₁} Σ_{m₂} W_{1,m₁} W_{2,m₂} ∫ Z_{m₁} Z_{m₂} Z_k dλ     (2.39)
Note again the formal similarity to the interaction coefficient method. However, whereas most interaction coefficients are nonzero, only a few integrals on the right-hand side of (2.39) are non-zero. With the definition of these operations utilizing finite elements, the BVE may be readily solved provided that a suitable time extrapolation procedure is defined (a short numerical sketch of the hat-function derivative solve (2.37) is given at the end of this subsection). By analogy, the more complex system of primitive equations may be solved using this method. The general solution again takes the form of (2.11) although, as noted from the foregoing discussion, the matrices are defined differently. As noted, the method allows for arbitrary shapes and sizes of grid domains which may be of value for special problems. However, from a computational point of view, Staniforth [38] has demonstrated that selecting a rectangular grid in two dimensions, as compared for example to a triangular grid, has significant computational advantages. This arises from the fact that the basis functions can be written as product functions in the individual dimensions and the resulting integrations can be performed in one dimension at a time, thereby reducing the computational demands dramatically. The principal advantage of this method is that it allows for an arbitrary distribution of points to accommodate both nonuniform lateral
boundaries and variable grid spacing over the domain. If one elects to model the atmosphere with a vertical coordinate which does not provide a continuous horizontal surface at each level, it has been noted that the spectral method is not suitable. This can come about because near the surface of the Earth the mountains create a barrier. The finite-element method is ideal for such surfaces. Indeed to predict the evolution of the oceans, the domain must have boundaries at the continental coastlines and the finite-element method has been noted as highly suitable for application in such models. But perhaps the most promising use of this method is for generating high-resolution regional forecasts embedded in a lower-resolution global domain. Since this is an area of intense development in current numerical weather prediction, its discussion is deferred to a later section. The method also shows promise as a tool to represent the vertical coordinate in models, although the finite difference method is still most frequently used there. Despite numerous studies with this method and demonstrations indicating its advantages, it has not yet achieved any measure of popularity. However, as will be discussed later, this may change.
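To make the Galerkin derivative calculation of (2.37) concrete, the sketch below implements it in Python for a uniform periodic grid with hat functions. With these choices the weak form reduces to the tridiagonal (here circulant) system (W_{k−1} + 4W_k + W_{k+1})/6 = (ψ_{k+1} − ψ_{k−1})/(2Δλ); the dense solve, the test field, and the grid size are illustrative simplifications rather than anything a production model would use.

    import numpy as np

    def fe_derivative(psi, dx):
        """Galerkin finite-element estimate of d(psi)/dx on a uniform periodic grid
        with piecewise-linear (hat) basis functions.  The mass matrix couples each
        point to its two neighbours, giving a circulant tridiagonal system."""
        n = psi.size
        A = np.zeros((n, n))
        for k in range(n):
            A[k, k] = 4.0 / 6.0
            A[k, (k - 1) % n] = 1.0 / 6.0
            A[k, (k + 1) % n] = 1.0 / 6.0
        rhs = (np.roll(psi, -1) - np.roll(psi, 1)) / (2.0 * dx)
        return np.linalg.solve(A, rhs)

    # quick check: the derivative of sin(x) should be very close to cos(x)
    x = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
    w = fe_derivative(np.sin(x), x[1] - x[0])
    print(np.max(np.abs(w - np.cos(x))))   # a few times 1e-6: fourth-order accuracy

A production code would of course exploit the tridiagonal structure directly rather than forming and factorizing a dense matrix.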
2.3 Spherical Geodesic Grids
Various other methods have been tested to solve the atmospheric prediction problem, stimulated by the apparent shortcomings of the more popular methods. The method using spherical geodesic grids was formulated in the late 1960s [39, 40] in an effort to use as uniform a grid as possible over a spherical surface. Finite difference models gradually drifted to using a latitude-longitude grid following the period discussed by Phillips, because that representation avoided the need to transform from Earth-based coordinates to projection coordinates, a process which both created additional computational errors and consumed valuable computer time. However, it is evident that in latitude-longitude coordinates the elementary spatial dimension in longitude (distance between neighboring grid points) shrinks significantly as one approaches the pole. This causes dramatic variability in the spatial dimensions of grid boxes from the pole to the equator and sets up the well-known polar problem, where waves in the polar region can propagate with higher frequency than those in lower latitudes and require a shorter model integration time-step to avoid linear instability. Using shorter time-steps to solve this problem wastes valuable computer time but does not provide an improved solution since errors from space truncation significantly overshadow those from time truncation [41]. In addition, because it is much more cost-efficient to predict velocity components in the primitive equation model using finite difference techniques, the polar singularity problem remains. The polar
problem does not occur in the spectral method, although the spatial dimension also shrinks rapidly on approaching the pole from the equator. The polar singularity arising from using velocity components is easily solved in the spectral method by using vorticity and divergence with no significant computational cost. However, the problems arising from the representation of poorly converging functions such as liquid water and topography when using the spectral method, sometimes denoted as spectral ringing, have no obvious solution and can create difficulties with predictions. The spherical geodesic grids approach uniformity of grid elements closely over the entire domain and models using the representation can be formulated to maintain conservation of most integral constraints. Moreover, the method allows for model representation in true scalars (vorticity and divergence) thereby avoiding polar singularities, and very efficient solvers can be employed to minimize the cost of these additionally required computations. The grid is set up in a straightforward manner. A regular icosahedron is selected and is inscribed in a sphere, the surface of which represents a horizontal surface in the atmosphere; normalizing to the unit sphere is customary. The figure has twelve vertices and their connection yields 20 equilateral triangles or faces (see Fig. 4a). Note that each vertex is surrounded by five neighbor vertices creating a pentagonal element or grid box. The connecting lines may be projected onto the spherical surface, creating spherical triangles. To expand the number of elements, each connecting segment between vertices is subdivided and all points created by the subdivisions are connected, forming a new mesh with many more triangles. These new points are projected onto the unit sphere unless spherical triangles are used. Each point in this new mesh, except for the original points of the icosahedron, is now surrounded by six neighboring points, thus forming hexagonal elements, and the result is often denoted as an icosahedral-hexagonal grid (see Fig. 4b). A systematic way to increase the grid size is to sequentially bisect all lines connecting existing points, and to draw connecting lines between all the new points. This almost quadruples the original number of triangles and resulting grid elements. The process may be continued until the desired number of grid elements is created. Figure 4c shows an expanded mesh. As an example, if the division process is carried out six times, the number of elements becomes 40 962 and the mean distance between elements (the distance between closest neighbor points) is about 120 km. The distances between grid elements are not uniform over the surface nor are the element areas, but the variations are not extreme and are much more uniform than the latitude-longitude grid. More details may be found in Ringler et al. [42] and Thuburn [43].
FIG. 4. Icosahedral grids: (a) a regular icosahedron with 12 vertices; (b) the regular icosahedron with first subdivision; (c) the regular icosahedron with further subdivisions [46].
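The growth of the mesh under repeated bisection is easy to verify. The short sketch below is a worked check rather than part of any model code; it uses the standard counts for a subdivided icosahedron (faces 20·4^k, edges 30·4^k, vertices 10·4^k + 2) and reproduces the 40 962 grid elements quoted above for six subdivisions.

    def icosahedral_counts(k):
        """Counts after k successive edge bisections of the icosahedron."""
        faces = 20 * 4 ** k            # spherical triangles
        edges = 30 * 4 ** k
        vertices = 10 * 4 ** k + 2     # grid cells: hexagons plus the original 12 pentagons
        assert vertices - edges + faces == 2   # Euler's formula for the sphere
        return vertices, edges, faces

    print(icosahedral_counts(6))       # (40962, 122880, 81920)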
The first atmospheric model to be solved on this grid was the BVE (2.7) by Williamson [40] and Sadourny et al. [41]. Following these experiments, subsequent efforts were made with the shallow water equations model (SWE). This system, which represents a homogeneous fluid with a free surface, allows for divergence as well as rotational flow and thus contains many of the properties of the most general model represented by the primitive equations. It is thus a convenient and illustrative model to use for a demonstration of the method applied to solve prediction equations on the spherical geodesic grid. In the notation of (2.4)-(2.7), the SWE may be represented by tendency equations for absolute vorticity, divergence, and the depth of the homogeneous fluid denoted by h.
   ∂η/∂t = −J(ψ, η) − ∇·(η∇χ)
   ∂δ/∂t = −J(χ, η) + ∇·(η∇ψ) − ∇²(h + K)
   ∂h/∂t = −J(ψ, h) − ∇·(h∇χ)                                                (2.40)
where the kinetic energy K = ½ V·V. To predict this system, the three dependent variables must be evaluated at each cell center on the grid and the right-hand sides of (2.40) must be evaluated. Some procedure for calculating the operators J, ∇· and ∇² = ∇·∇ on the grid therefore needs to be established. One first notes that by virtue of the definitions of these
operators, integration of the terms involving them over a closed spherical surface vanishes. Thus, the numerical procedure selected to represent these operators should, if possible, preserve this conservative feature. Additional conservative quantities for this predictive system may be defined and their numerical representation should also be conserved. For example, the prediction equation for potential vorticity, which for the SWE model is simply Q = η/h, as well as for kinetic energy, is

   ∂Q/∂t = −J(ψ, Q) − ∇·(Q∇χ)
   ∂E/∂t = −J(ψ, E) − ∇·(E∇χ)                                                (2.41)
and the operators are the same as those for the basic dependent variables (2.40). Consider integration over an individual hexagonal (or pentagonal) grid box. The sum of all such integrals will then satisfy the conservation condition over the closed spherical surface. The integration of all terms on the right-hand sides of (2.40) and (2.41) over the surface of the grid box can be converted to line integrals along the perimeter of the box as follows:
   ∫_A J(ψ, η) dA = ∮_l ψ (∂η/∂l) dl
   ∫_A ∇·(η∇χ) dA = ∮_l η (∂χ/∂n) dl                                         (2.42)
where A represents the area of the grid box, l is the box's bounding curve, ∂l is an elementary distance along this curve, and ∂n is normal to the curve. It is evident that the Laplacian operator can be calculated from the second equation of (2.42) by simply setting η to unity. A variety of computational schemes have been devised to calculate the line integrals numerically over the grid box and, for simplicity of computation, most techniques are based on linear interpolation along appropriate lines (a small numerical illustration of such a line-integral evaluation is given at the end of this subsection). Given a hexagonal box for example, the line integral is approximated by a sum over all the six lines bounding the box. Values needed along the lines are interpolated from neighboring box values and derivatives are approximated by linear finite differences generally with second order accuracy. Since all values of the required dependent variables are available at any given time-step, the calculations are straightforward. For details on this process, see [40-44]. Following the application of the method to the BVE by Williamson and Sadourny et al., subsequent successful experiments involving the SWE were
undertaken by Masuda and Ohnishi [44] and Williamson [45]. Thuburn [43] and Heikes and Randall [46] had the advantage of performing integrations on the SWE test suite (a set of different and selectively chosen initial conditions) which was provided to the community by Williamson et al. [47], and which allowed direct comparisons with the same model but using the spectral method and the finite-difference method. In summary, both of these experiments suggested that the method yields results comparable to those from the finite difference integrations but somewhat inferior to those from the spectral method. It should be emphasized that the model (SWE) is driven purely by dynamics and does not include external forces that might induce spectral ringing. Finally, Ringler et al. [42] presented a fully three-dimensional model of the primitive equations using a twisted icosahedral grid on the horizontal spherical surfaces and the conventional σ-coordinate in the vertical. The modification to the conventional icosahedral grid described above allows for much more uniformity of area amongst the grid boxes, especially at high resolution, although the variability of box geometry remains. The model without most external forcing (now known as the dynamical core) was tested using simple Newtonian forcing and compared satisfactorily to results from the same model run using the spectral method. Indeed, for high-resolution integrations, the computer time required by the model was also competitive with the spectral method. Given these favorable results from a technique that has languished for many years, it may yet emerge as a viable competitor to the currently preferred methods.
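The conversion of cell-averaged operators into perimeter line integrals, as in (2.42), is simple to illustrate numerically. The Python sketch below evaluates the area-averaged divergence of a two-dimensional vector field over a single hexagonal cell using midpoint values on each edge; the planar geometry, the analytic test field, and the unit hexagon are illustrative assumptions, not the spherical grid machinery of an actual model.

    import numpy as np

    def cell_divergence(flux, verts):
        """Area-averaged divergence over a polygonal grid box, computed as
        (1/A) times the closed line integral of F.n dl, with midpoint values
        on each edge.  `flux(x, y)` returns (Fx, Fy); `verts` lists the
        polygon vertices counter-clockwise."""
        verts = np.asarray(verts, dtype=float)
        nxt = np.roll(verts, -1, axis=0)
        area = 0.5 * np.sum(verts[:, 0] * nxt[:, 1] - nxt[:, 0] * verts[:, 1])
        total = 0.0
        for p, q in zip(verts, nxt):
            mid = 0.5 * (p + q)
            edge = q - p
            outward = np.array([edge[1], -edge[0]])   # outward normal times edge length
            fx, fy = flux(mid[0], mid[1])
            total += fx * outward[0] + fy * outward[1]
        return total / area

    # check with F = (x, y), whose divergence is exactly 2, on a regular hexagon
    hexagon = [(np.cos(t), np.sin(t)) for t in np.linspace(0.0, 2.0 * np.pi, 7)[:-1]]
    print(cell_divergence(lambda x, y: (x, y), hexagon))   # 2.0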
2.4 Time Truncation and the Semi-Lagrange Method
As noted from the foregoing discussion in this section, considerable effort has been expended to find more effective methods to solve the space truncation aspect of the prediction system, but little consideration has been given to the time truncation part of the solution. It is of course necessary to meet any stability conditions that arise, and for explicit time integration schemes that limitation is known as the CFL condition (see Phillips). This distribution of effort has been justified by Robert [41], who demonstrated by linear analysis that selecting time and space increments which satisfied the stability criterion for explicit time integrations yielded solutions in which the space truncation errors were two orders of magnitude larger than the time truncation errors. Since no such condition arises for implicit schemes, it has been suggested that terms inducing high-frequency modes such as gravity waves should be integrated with an implicit time scheme whereas the explicit scheme could be used for the advective terms which give rise to the slower moving Rossby waves. Implicit schemes,
although stable, can yield significant amplitude and phase errors; thus they should be applied only to terms whose amplitude is small relative to other terms in the prediction system. As gravity waves tend to have small amplitude in the atmosphere, this splitting of integration techniques is effective when the implicit method is confined to the adjustment terms, those terms giving rise to gravity wave motions. These terms generally appear linearly in the system of prediction equations. Given the CFL constraint on the advective terms, there is a dramatic difference in the errors imposed by the spatial differencing and this suggests that a method might be found which would allow for a substantial increase in time-step with no significant loss of prediction quality provided that the calculation remains stable. If Robert's estimate is realistic, computation speeds for fixed prediction periods could be enhanced dramatically. The semi-Lagrange method may satisfy this condition, although this feature of the method was not of primary concern to its pioneering investigators. In the Lagrangian integration scheme, particles of fluid identified at a distribution of points at some initial time are advected with their velocity to new locations, and this new position of each particle and its properties are recorded in time. Note the contrast to the Eulerian scheme wherein particles move past fixed points in the fluid and values at those points are recorded at regular intervals (time-steps). Because particles tend to converge, as noted by Welander [8], the pure application of the method has severe limitations for longer time integrations. In application of the method to the BVE, Wiin-Nielsen [48] found it necessary to select a new set of particles to advect periodically during his integration period to avoid the distortion that developed from the particle trajectories. Also working with the BVE and the semi-Lagrange method, Sawyer [49] made an adjustment at every time-step. After selecting a uniform grid over the domain of interest initially, he advected all the particles for one time-step and then interpolated their values to the surrounding grid points, weighting by the distance from those points and the number of particles involved. With these new values at the original grid, he repeated the procedure until the integration was concluded for the required prediction time. He noted not only that the results were comparable to those of a corresponding Eulerian integration, but he was also able to use a longer time-step. On the basis of these promising results, further experiments with the method accelerated during the following decades. An additional stimulus was the ever-increasing complexity of the prediction models with its concomitant need for more computational resources. By 1981, when Robert [50], using the SWE, successfully demonstrated the time-saving advantages of splitting the advective from adjustment terms, integrating the advective terms with the semi-Lagrange method and the adjustment terms with the semi-implicit method, the stage was set for substantial advancement and
application of the technique. The research community had by this time converged on a preferred process to be used for semi-Lagrange integration. Once a grid of points is selected to define the positions of the initial particles together with their properties, those points become the arrival points for particles at future time-steps. In this way the grid never changes, although the particles at the points at any time-step are different from the original particles at those points. Moreover, it is necessary to find a departure point for each arrival point at each time-step. The particle which arrives at a given grid point at time t is advected with its velocity from a position at the previous time-step t − Δt, a position which generally does not coincide with a grid point. Several approximations are thus needed. The advecting velocity of the particle from the departure point must be determined. It may be considered constant during the interval Δt or it may change during the time interval. Given this velocity and the grid and time increments, any departure point may be found and the values associated with the particle there may be determined by interpolation from neighboring grid points. A simple example will demonstrate this process. Consider the BVE as given in (2.7), which states that the absolute vorticity η is conserved following a particle moving in time on a trajectory through the fluid. If one remains on the particle trajectory, clearly the absolute vorticity remains the same. For simplicity assume that the particle is moving in only one dimension (s) and the velocity propelling it is U = constant. Let the spatial domain be represented by finite differences with the basic grid interval of Δs. Then it is evident that if a particle arrives at a grid point jΔs at time t + Δt, it must come from the point [j − (r + δ)]Δs at time t, since the advecting velocity is constant, and the arrival value η(jΔs, t + Δt) = η([j − (r + δ)]Δs, t), the departure value. This is demonstrated by the trajectory (often called a characteristic) in Fig. 5. The value of r + δ = UΔt/Δs, where r is an integer and 0 ≤ δ < 1.

More recently, the MTD(f) algorithm uses only minimal windows to determine the value of the root. This algorithm has been shown to be superior to alpha-beta in terms of number of nodes expanded in the search tree [12, 13].
2.1.4 Search Depth

Although alpha-beta is usually described as a fixed-depth search, better performance can be achieved using a variable search depth. The search can be compared to a stock portfolio: don't treat all stocks as being equal. You should invest in those that have the most promise, and reduce or eliminate your holdings in those that look like losers. The same philosophy holds true in search trees. If there is some hint in the search that a sequence of moves looks promising, then it may be a good idea to extend the search along that line to get more information. Similarly, moves that appear to be bad should have their search effort reduced. There are a number of ways that one can dynamically adjust the depth to maximize the amount of information gathered by the search. Most alpha-beta-based programs have a number of application-dependent techniques for altering the search depth. For example, chess programs usually extend checking moves an additional ply since these moves indicate that something interesting is happening. Most programs have a "hopeless" metric for reducing the search depth. For example, in chess if one side has lost too much (e.g., a queen and a rook), it is very unlikely this sub-tree will eventually end up as part of the principal variation. Hence, the search depth may be reduced. There are a number of techniques that may be useful for a variety of domains. In chess, null-move searches have been very effective at curtailing analysis of poor lines of play. The idea is that if one side is given two moves in a row and still can't achieve anything, then this line of play is likely bad. Hence, the search depth is reduced. This idea can be applied recursively throughout the search [22, 23]. Another important idea is ProbCut [24]. Here the result of a shallow search is used as a predictor of whether the deeper search would produce a value that is relevant to the search window. Statistical analysis of the program's searches is used to find a correlation between the values of a shallow and deep search. If the shallow search result indicates that the deeper search will not produce a value that is large enough to affect the node's value, then further effort is stopped. Although both the null-move and ProbCut heuristics purport to be application independent, in fact they both rely on game-specific properties. Null-move cut-offs are only effective if the consequences of giving a side two moves in a row are serious. This causes problems, for example, in checkers where giving a player an extra move may allow them to escape from a position where having only one move loses (these are known as zugzwang positions). ProbCut depends on there being a strong correlation between the values of shallow and deep searches. For games with low variance in the leaf
node values, this works well. If there is high variance, then the evaluation function must be improved to reduce the variance. In chess programs, for example, the variance is generally too high for ProbCut to be effective. The most common form of search extension is the quiescence search. It is easier to get a reliable evaluation of a leaf position if that position is quiet or stable (quiescent). Hence, a small search is done to resolve immediate capture moves or threats [11]. Since these position features are discovered by search, this reduces the amount of explicit application-dependent knowledge required in the evaluation function. A search-extension idea that has attracted a lot of attention is singular extensions [25]. The search attempts to identify forced (or singular) moves. This can be achieved by manipulating the search window to see if the best move is significantly better than the second-best move. When a singular move is found, then the search along that line of play is extended an additional ply (or more). The idea is that forcing moves indicate an interesting property of the position that needs to be explored further. In addition, various other extensions are commonly used, most based on extending the search to resolve the consequences of a threat [26, 27].
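The flavour of these depth-altering heuristics can be conveyed with a short sketch. The Python pseudocode below combines a null-move cut-off with a capture-only quiescence search inside a plain fail-hard alpha-beta; the position interface (make/undo, legal_moves, evaluate, in_check) and the reduction R = 2 are illustrative assumptions, not the design of any particular program, and mate/stalemate handling is omitted.

    import math

    R = 2  # null-move depth reduction (2 or 3 are common choices)

    def search(pos, depth, alpha, beta):
        if depth <= 0:
            return quiesce(pos, alpha, beta)
        # null move: give the opponent two moves in a row; if a reduced-depth
        # search still fails high, this line is judged good enough to cut off
        if depth > R and not pos.in_check():
            pos.make_null()
            score = -search(pos, depth - 1 - R, -beta, -beta + 1)
            pos.undo_null()
            if score >= beta:
                return beta
        best = -math.inf
        for move in pos.legal_moves():
            pos.make(move)
            score = -search(pos, depth - 1, -beta, -alpha)
            pos.undo(move)
            best = max(best, score)
            alpha = max(alpha, score)
            if alpha >= beta:
                break                      # beta cut-off
        return best

    def quiesce(pos, alpha, beta):
        stand_pat = pos.evaluate()         # static score of the (possibly noisy) leaf
        if stand_pat >= beta:
            return beta
        alpha = max(alpha, stand_pat)
        for move in pos.legal_moves(captures_only=True):
            pos.make(move)
            score = -quiesce(pos, -beta, -alpha)
            pos.undo(move)
            if score >= beta:
                return beta
            alpha = max(alpha, score)
        return alpha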
2.1.5 Close to Perfection?
Numerous studies have attempted to quantify the benefits of alpha-beta enhancements in fixed-depth searches (for example, [10, 16]). Move ordering and the transposition table usually make the biggest performance difference, with other enhancements generally being much smaller in their impact. The size of trees built by game-playing programs appears to be close to that of the minimal alpha-beta tree. For example, in chess, Belle is reported to be within a factor of 2.2 [28], Phoenix within 1.4 [28], Hitech within 1.5 [28] and Zugzwang within 1.2 [29]. These results suggest that there is little room for improvement in fixed-depth alpha-beta searching. The above comparisons have been done against the approximate minimal search tree. However, finding the real minimal tree is difficult, since the search tree is really a search graph. The real minimal search should exploit this property by:

• selecting the move that builds the smallest tree to produce a cut-off, and
• preferring moves that maximize the benefits of the transposition table (i.e., re-using results as much as possible).

Naturally, these objectives can conflict. In contrast to the above impressive numbers, results suggested that chess programs are off by a factor of 3 or
more from the real minimal search graph [12, 13]. Thus, there is still room for improvements in alpha-beta search efficiency. Nevertheless, given the exponential nature of alpha-beta, that programs can search within a small constant of optimal is truly impressive. Forty years of research into alpha-beta have resulted in a recipe for a finely tuned, highly efficient search algorithm. Program designers have a rich set of search enhancements at their disposal. The right combination is application dependent and a matter of taste. Although building an efficient searcher is well understood, deciding where to concentrate the search effort is not. It remains a challenge to identify ways to selectively extend or reduce the depth in such a way as to maximize the quality of the search result.
2.1.6 Alternative Approaches

Since its discovery, alpha-beta has been the mainstay of computer games development. Over the years, a number of interesting alternatives to alpha-beta-based searching have been proposed. Berliner's B* algorithm attempts to prove the best move, without necessarily determining the best move's value [30, 31]. In its simplest form, B* assigns an optimistic (upper bound) and a pessimistic (lower bound) value to each leaf node. These values are recursively backed up the tree. The search continues until there is one move at the root whose pessimistic value is as good as all the alternative moves' optimistic values. In effect, this is a proof that the best move (but not necessarily its value) has been found. There are several drawbacks with B*, most notably the non-standard method for evaluating positions. It is difficult to devise reliable optimistic and pessimistic evaluation functions. B* has been refined so that the evaluations are now probability distributions. However, the resulting algorithm is complex and needs considerable application tuning. It has been used in the Hitech chess program, but even there the performance of alpha-beta is superior [31]. McAllester's conspiracy numbers algorithm tries to exploit properties of the search tree [32]. The algorithm records the minimal number of leaf nodes in a search tree that must change their value (or conspire) to change the value of the root of the tree. Consider a Max node having a value of 10. To raise this value to, say, 20, only one of the children has to have its value become 20. To lower the value to, say, 0, all children with a value greater than 0 must have their value lowered. Conspiracy numbers works by recursively backing up the tree the minimum numbers of nodes that must change their value to cause the search tree to become a particular value. The algorithm terminates when the effort required to change the value at the root of the search (i.e., the conspiracy number) exceeds a predefined threshold.
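The backup rule just described can be stated compactly in code. The sketch below is a schematic Python rendering of the conspiracy-number computation for a tree held in memory; the Node class and the use of exact thresholds are simplifying assumptions, and the full algorithm's choice of which leaf to expand next is not shown.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Node:
        value: float = 0.0                  # static value (meaningful at leaves)
        is_max: bool = True                 # True at Max nodes, False at Min nodes
        children: List["Node"] = field(default_factory=list)

    def conspirators(node: Node, target: float, raise_value: bool) -> int:
        """Minimum number of leaves that must change ("conspire") for the node's
        minimax value to reach `target` (raise_value=True) or drop to it."""
        if not node.children:               # a leaf conspires by itself, or not at all
            if raise_value:
                return 0 if node.value >= target else 1
            return 0 if node.value <= target else 1
        if node.is_max == raise_value:
            # raising a Max node (or lowering a Min node) needs only one child
            return min(conspirators(c, target, raise_value) for c in node.children)
        # lowering a Max node (or raising a Min node) needs every offending child
        return sum(conspirators(c, target, raise_value) for c in node.children)

For the Max node of the example above, conspirators(node, 20, True) is 1, while conspirators(node, 0, False) counts one conspirator for every child whose value exceeds 0.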
Conspiracy numbers caused quite a stir in the research community because of the innovative aspect of measuring resistance to change in the search. Considerable effort has been devoted to understanding and improving the algorithm. Unfortunately it has a lot of overhead (for example: slow convergence, cost of updating the conspiracy numbers, maintaining the search tree in memory) which has been an impediment to its usage in high-performance programs. A variation on the original conspiracy numbers algorithm has been successfully used in the Ulysses chess program [33]. There are other innovative alternatives to alpha-beta, each of which is worthy of study. These include BPIP [34], min/max approximation [35], and meta-greedy algorithms [36]. Although all these alpha-beta alternatives have many desirable properties, none of them is a serious challenger to alpha-beta's dominance. The conceptual simplicity of the alpha-beta framework makes it relatively easy to code and highly efficient at execution time. The alpha-beta alternatives are much harder to code, the algorithms are not as well understood, and there is generally a large execution overhead. Perhaps if the research community devoted as much effort to understanding these algorithms as they did in understanding alpha-beta, we would see a new algorithm come to the fore. Until that happens, alpha-beta will continue to dominate as the search algorithm of choice for two-player perfect information games.
2.1.7 Conclusions
Research on understanding the alpha-beta algorithm has dominated games research since its discovery in the early 1960s. This process was accelerated by the discovery of the strong correlation of program performance with alpha-beta search depth [37]. This gave a simple formula for success: build a fast search engine. This led to the building of special-purpose chips for chess [38] and massively parallel alpha-beta searchers [29]. Search alone is not the answer. Additional search eventually leads to diminishing returns in the benefits achievable [39]. Eventually, there comes the point where the most significant performance gains are to be had by identifying and implementing missing pieces of application knowledge. This was evident, for example, in the 1999 world computer chess championship, where the deep-searching, large multiprocessor programs finished behind the shallower-searching, PC-based programs that used more chess knowledge. For many popular games, such as chess, checkers, and Othello, alpha-beta has been sufficient to achieve world-class play. Hence, there was no need to look for alternatives. For artificial-intelligence purists, this is an
unsatisfactory result. By relying on so-called brute-force searching, these programs can minimize their dependence on knowledge. However, for other games, most notably Go, search-intensive solutions will not be effective. Radically different approaches are needed.
2.2 Advances in Knowledge
Ideally, no knowledge other than the rules of the game should be needed to build a strong game-playing program. Unfortunately, for interesting games the game tree is usually too deep to search to its end, so the game-theoretic value of a position cannot be computed directly. Hence knowledge for differentiating favorable from unfavorable positions has to be added to the program. Nevertheless, there are some cases where the program can learn position values without using heuristic knowledge. The first example is the transposition table. This is a form of rote learning. By saving information and reusing it, the program is learning, allowing it to eliminate nodes from the search without searching. Although the table is usually thought of as something local to an individual search, "important" entries can be saved to disk and used for subsequent searches. For example, by saving some transposition table results from a game, they may be used in the next game to avoid repeating the same mistake [40, 41]. A second example is endgame databases. Some games can be solved from the end of the game backwards. One can enumerate all positions with one piece on the board, and record which positions are wins, losses, and draws. These results can be backed up to compute all positions with two pieces on the board, and so on. The result is an endgame database containing perfect information. For chess, most of the five-piece and a few six-piece endgames have been computed [6]. This is of limited value, since most games are over before such a simplified position is reached. In checkers, all eight-piece endgames have been computed [42]. The databases play a role in the search of the first move of a game! Endgame databases have been used to solve the game of Nine Men's Morris [43]. A third form of knowledge comes from the human literature. Most games have an extensive literature on the best opening moves of the game. This information can be collected in an opening book and made available to the program. The book can either be used to select the program's move, or as advice to bias the program's opening move selection process. Many programs modify the opening book to tailor the moves in it to the style of the program. When pre-computed or human knowledge is not available, then the game-playing program must fall back on its evaluation function. The function assigns scores to positions that are a heuristic assessment of the likelihood of
winning (or losing) from the given position. Application-dependent knowledge and heuristics are usually applied to a position to score features that are indicators of the true value of the position. The program implementor (usually in consultation with a domain expert) will identify a set of features (f) that can be used to assess the position. Each feature is given a weight (w) that reflects how important that feature is in relation to the others in determining the overall assessment. Most programs use a linear combination of this information to arrive at a position value:
value = \sum_{i=1}^{n} w_i \times f_i     (1)
where n is the number of features. For example, in chess two features that are correlated with success are the material balance and pawn structure (f_1 and f_2). Material balance is usually much more important than pawn structure, and hence has a much higher weighting (w_1 >> w_2). Identifying which features might be correlated with the final result of the game is still largely done by hand. It is a complex process that is not well understood. Usually the features come from human experience. However, human concepts are often vague and hard to define algorithmically. Even well-defined concepts may be impractical because of the computational overhead. One could apply considerable knowledge in the assessment process, but this increases the cost of performing an evaluation. The more expensive the evaluation function is to compute, the smaller the search tree that can be explored in a fixed amount of time. Thus, each piece of knowledge has to be evaluated on what it contributes to the accuracy of the overall evaluation, and the cost (both programmer time and execution time) of having it.

Most evaluation functions are carefully tuned by hand. The knowledge has been judiciously added, taking into account the expected benefits and the cost of computing the knowledge. Hence, most of the knowledge that is used is of a general-purpose nature. Unfortunately, it is the exceptions to the knowledge that cause the most performance problems. As chess grandmaster Kevin Spraggett said [42]:

I spent the first half of my career learning the principles for playing strong chess and the second half learning when to violate them.

Most game-playing programs' evaluation functions attempt to capture the first half of Spraggett's experience. Implementing the second half is often too difficult and computationally time consuming, and generally has a small payoff (except perhaps at the highest levels of play).

Important progress has been made in setting the weights automatically. Although this seems as if it should be much easier than building an
evaluation function, in reality it is a laborious process when done by hand. Automating this process would result in a huge reduction in the effort required to build a high-performance game-playing program. Temporal difference learning has come to the fore as a major advance in weighting evaluation function features. Samuel pioneered the idea [3, 4], but it only became recognized as a valuable learning algorithm after Sutton extended and formalized this work [44]. Temporal difference learning is at the heart of Tesauro's world-championship-caliber backgammon program (see Section 3.1), and has shown promising results in chess (discussed later in this section).

Temporal difference learning (TDL) is a reinforcement learning algorithm. The learner has an input state, produces an output action, and later receives feedback (commonly called the reward) on how well its action performed. For example, a chess game consists of a series of input states (positions) and actions (the move to play). At the end of the game, the reward is known: win, loss, or draw. In between the start and the end of the game, a program will use a function to map the inputs onto the outputs (decide on its next move). This function is a predictor of the future, since it is attempting to maximize its expected outcome (make a move that leads to a win). The goal in reinforcement learning is to propagate the reward information back along the game's move sequence to improve the quality of actions (moves) made. This is accomplished by attributing the credit (or blame) to the outputs that led to the final reward. By doing so, the learner's evaluation function will change, hopefully in such a way as to be a better predictor of the final reward.

To achieve the large-scale goal of matching inputs to the result of the game, TDL focuses on the smaller goal of modifying the learner so that the current prediction is a better approximation of the next prediction [44, 45]. Consider a series of predictions P_1, P_2, ..., P_N on the outcome of a game. These could be the program's assessment of the likelihood of winning from move to move. In chess, the initial position of a game, P_1, has a value that is likely close to 0. For a win P_N = 1, while a loss would have P_N = -1. For the moves in between, the assessments will vary. If the likelihood of winning for position t (P_t) is less (more) than that of position t + 1 (P_{t+1}), then we would like to increase (decrease) the value of position t to be a better predictor of the value of t + 1. The idea behind temporal difference learning is to adjust the evaluation based on the incremental differences in the assessments. Thus,

\Delta = P_{t+1} - P_t
measures the difference between the prediction for move t + 1 and that for move t. This adjustment can be done by modifying the weights of the evaluation function to reduce the Δ from move to move.
Temporal difference learning is usually described with a variable weighting of recency. Rather than considering only the previous move, one can consider all previous moves with non-uniform weights (usually exponential). These moves should not all be given the same importance in the decision-making process, since the evaluation of moves made many moves previously is less likely to be relevant to the current evaluation. Instead, previous moves are weighted by λ^p, where p reflects how far back the move is. The parameter λ controls how much credit is given to previous moves, giving exponentially decaying feedback of the prediction error over time. Hence, this algorithm is called TD(λ). Figure 3 gives the temporal difference relation used by TD(λ).

A typical application of TDL is for a program with an evaluation function, but unknown weights for the features. By playing a series of games, the program gets feedback on the relative importance of features. TDL propagates this information back along the move sequence played, causing incremental changes to the feature weights. The result is that the evaluation function values get tuned to be better predictors. In addition to Tesauro's success in backgammon (Section 3.1), there are two recent TDL data points in chess. First, Cilkchess, currently one of the strongest chess programs, was tuned using temporal difference learning and the results are encouraging. Don Dailey, a co-author of Cilkchess, writes that [46]:

Much to my surprise, TDL seems to be a success. But the weight set that comes out is SCARY; I'm still afraid to run with it even though it beats the
\Delta w_t = \alpha (P_{t+1} - P_t) \sum_{k=1}^{t} \lambda^{t-k} \nabla_w P_k

where:
• w is the set of weights being tuned,
• t is the time step being altered, in a sequence of moves from 1, 2, ..., N - 1,
• \Delta w_t is the change in the set of weights at step t as a result of applying temporal differences,
• P_t is the prediction at time step t (for the end of the game, P_N, the final outcome is used),
• λ (0 ≤ λ ≤ 1) controls how much credit is given to earlier predictions, with the feedback decaying exponentially over time, and
• \nabla_w P_k is the gradient of the prediction P_k with respect to the weights w.
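As an illustration, here is a minimal sketch of how the Figure 3 update could be applied to a linear evaluation function of the form of equation (1). It is not taken from any of the programs discussed here; the feature vectors, learning rate α, and λ are illustrative assumptions, and eligibility traces are used to accumulate the Σ λ^{t-k} ∇_w P_k term incrementally.

```python
# A minimal TD(lambda) sketch for a linear evaluation (equation (1)).
# feature_seq holds the feature vector of each position met during one
# game; final_reward is +1 for a win, 0 for a draw, -1 for a loss.
# Eligibility traces accumulate sum_k lambda^(t-k) * grad_w P_k, which
# for a linear evaluation is just a decayed sum of feature vectors.

def evaluate(weights, features):
    # P_t: linear combination of weighted features, as in equation (1).
    return sum(w * f for w, f in zip(weights, features))

def td_lambda_update(weights, feature_seq, final_reward,
                     alpha=0.01, lam=0.7):
    traces = [0.0] * len(weights)
    pred = evaluate(weights, feature_seq[0])              # P_1
    for t in range(1, len(feature_seq) + 1):
        # grad_w P_t of a linear evaluation is the feature vector itself.
        traces = [lam * e + f for e, f in zip(traces, feature_seq[t - 1])]
        if t < len(feature_seq):
            next_pred = evaluate(weights, feature_seq[t])  # P_{t+1}
        else:
            next_pred = final_reward                       # P_N = outcome
        delta = next_pred - pred                           # the Delta above
        weights = [w + alpha * delta * e for w, e in zip(weights, traces)]
        pred = next_pred
    return weights

# Toy usage: two hypothetical features (say, material balance and pawn
# structure) over a three-position game that was eventually won.
print(td_lambda_update([0.5, 0.1], [[1.0, 0.2], [0.8, 0.1], [1.5, 0.3]], +1))
```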
optional:
• how many people will travel?
• their age categories
• will they need return tickets?
• will they all need return tickets or only some of them?
• will they need round-trip tickets?
• do they need information on the notions of green and red (i.e. different kinds of discount) departures, etc.
As, in such cases, letting users speak freely may not be feasible, the challenge for the dialogue designer becomes that of eliciting all relevant pieces of information in as elegant a way as possible.
3.5.1.2 Ill-structured tasks
What is the task structure, if any? Some tasks are ill-structured or have no structure at all. Consider, for instance, an SLDS whose database contains all existing information on flight travel conditions and regulations for a certain airline company. This information tends to be quite comprehensive both because of the many practicalities involved in handling differently made up groups of travellers and their luggage, but also because of the legal ramifications of travelling which may surface if the flight crashes, for example. Some users may want many different individual pieces of information from the database whereas others may want only one piece. Which piece(s) of information a particular user wants is completely unpredictable. We call such user tasks ill-structured tasks: the user may want one, two, or several pieces of information from the database, and the order in which the user may want the information is completely arbitrary as seen from the system's point of view. The system must be prepared for everything from the user all the time.

One way to try to reduce the complexity of large ill-structured tasks is to use, or invent, domain structure. That is, the domain may be decomposable into a number of sectors which themselves may be hierarchically decomposed, etc. So the system asks, for instance: "Do you want to know about travelling with infants, travelling with pets, travelling for the elderly and the handicapped, hand luggage, or luggage for storage?" And if the user says "Hand luggage," the system asks: "Do you want to know about volume of permitted luggage, electronic equipment, fragile items, or prohibited luggage?," etc. In principle, any information hierarchy, however deep and/or broad, can be handled by SLDSs in this way. However, few users will find it acceptable to navigate through many hierarchical levels prompted by the system in order to find a single piece of information at the bottom of some deep domain hierarchy. Making the hierarchy shallower
will often make matters even worse. No user will find it acceptable to have to go through a shallow but broad hierarchy prompted by the system in order to find a single piece of information at the end of it. Just imagine a Danish train timetable inquiry system which asks the user: "Do you want to go from Aabenrå, Aalborg, Aarhus ..." mentioning the country's 650 or so train stations in alphabetical order and in chunks of, say, 10 at a time.

Another important problem about ill-structured tasks is that their vocabularies tend to differ considerably from one user to another. Unlike, say, train timetable inquiries which can start uncontroversially from the "official" city names, or used car sales catalogues which can start from the car models and production years, there is no "official" vocabulary, used by all or most users, for inquiring about pet transportation, high-volume luggage, or staff support for the elderly. We are all familiar with this problem from the (paper) yellow pages where the categories are often labeled differently from how we would have labeled them ourselves. SLDSs in fact offer an elegant solution to this problem, i.e. an electronic dictionary of equivalents. The user speaks freely about the entry of interest. The system has only to spot a keyword in the user's input corresponding to an entry in its electronic dictionary in order to direct the user to the right entry in its hierarchy, and the user does not have to worry about knowing or remembering the headings used to structure the hierarchy itself.
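To illustrate the dictionary-of-equivalents idea, the following minimal sketch scans free user input for keywords that map onto entries of a domain hierarchy like the one quoted above. The keywords and mappings are invented for illustration, and a real system would work on the recognizer's output rather than on typed text.

```python
# A sketch of an "electronic dictionary of equivalents": free user input
# is scanned for keywords, and each keyword points to an entry in the
# system's domain hierarchy. Entries and synonyms are illustrative only.

EQUIVALENTS = {
    "dog": "travelling with pets",
    "cat": "travelling with pets",
    "pet": "travelling with pets",
    "stroller": "travelling with infants",
    "baby": "travelling with infants",
    "wheelchair": "travelling for the elderly and the handicapped",
    "laptop": "hand luggage",
    "suitcase": "luggage for storage",
}

def route_to_entry(user_utterance):
    """Return the hierarchy entries whose keywords occur in the input."""
    words = user_utterance.lower().split()
    return sorted({EQUIVALENTS[w] for w in words if w in EQUIVALENTS})

# Example: the user never needs to know the official heading.
print(route_to_entry("Can I bring my dog in the cabin?"))
# -> ['travelling with pets']
```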
3.5.1.3 Well-structured tasks
Other tasks have some structure to
them. Consider the flight ticket reservation task handled by the Danish Dialogue System. This task is a partially ordered one. It would normally make little sense, for instance, to ask for one of the morning departures until one has specified the date; and it normally makes little sense to expect the system to tell whether or not flights are fully booked on a certain date until one has indicated the itinerary. Moreover, ordinary users know that this is the case. Task structure is helpful if the task complexity makes it advisable to control the user's input (see 3.5.2). It is important to note, however, that partial-order-only is what one is likely to find in most cases. Moreover, sub-task interdependencies may interfere with the designer's pre-conceived ideas about "the" task order. For instance, some users may want to know which departures are available at reduced fares before wanting to know about departure times, whereas others do not care about fare reductions at all. In this way, sub-task interdependencies may also introduce an element of negotiation into the dialogue. For instance, before accepting a certain departure, the caller may want to know if the subsequent departure is the cheaper of the two. This makes the dialogue a two-way exchange where the caller and the system take turns in asking questions of one another and answering those questions.
Both elements, negotiation and two-way information flow, complicate task model design. Moreover, even the partial order that the task has may be by default only. If someone simply wants to leave town as soon as possible, the itinerary matters less than the departure time of the first outbound flight that has a free seat.
3.5.1.4 Negotiation tasks
Does the task require substantial negotiation? Some tasks involve a relatively low volume of information and in addition have some structure to them. Still, they may be difficult to manage if they involve a considerable amount of negotiation. The Verbmobil meeting-scheduling task is an example. Fixing a meeting simply requires fixing a date, a time or a time interval, and possibly a place, so the volume of information to be exchanged is relatively low. Unless one knows the date, it can be difficult to tell if one is free at a certain time, so the task has some structure to it. And for busy people, the date-time pair may matter more than the exact venue. The problem inherent to the Verbmobil task is that fixing meetings may require protracted, and ill-structured, negotiations of each sub-task. The outcome of a meeting date-time-venue negotiation is not a simple function of calendar availability and prior commitments but also depends on the importance of the meeting, the importance of the prior commitments, the possibilities of moving, canceling, or sending apologies for other meetings, the professional and personal relationships between the interlocutors, etc. In addition, Verbmobil does nothing to impose structure on the (human-human) dialogue for which it provides translation support, allowing the dialogue to run freely wherever the interlocutors want it to go. In such cases, it makes little sense to judge the task complexity in terms of the volume of information to be exchanged or in terms of the task structure that is present, because the real problem lies in negotiating and eventually agreeing to meet. Had Verbmobil been a human-machine dialogue system product, with the machine representing the diary of someone absent from the conversation, state-of-the-art engineering practice would probably have dictated much stronger system control of the dialogue, effectively turning meeting date-time-venue negotiation into a booking exercise (see 3.5.2).

Note also that negotiation is a natural human propensity. It seems likely that most SLDSs can be seen to potentially involve an element of negotiation; Verbmobil just involves a considerable amount of it! In the case of the Danish Dialogue System, for instance, the system may propose a list of morning departures and ask if the user wants one of them. If not, the joint search for a suitable departure time continues. So the reason why negotiation is less obvious or prominent in the dialogue conducted with the
Danish Dialogue System when compared to the dialogue conducted with Verbmobil is not that the task of the Danish system by itself excludes negotiation. Rather, the reason is that the system's dialogue behavior has been designed to constrain the way in which negotiations are done.
3.5.1.5 Summary
When developing SLDSs we want to let the users speak freely. However, if the task is a complex one, the SLDS project either has to be given up as a commercial undertaking, turned into a research project, such as Verbmobil, or requires some or all of the following strategies:

1. input prediction (see 3.4.2.1)
2. input language processing control (see 3.4.2.2)
3. output control (see 3.4.2.3)
4. control of user input (see 3.5.2).
To the extent that they are applicable, all of 1-4 are of course useful in any system but their importance grows with increasing task complexity. 4 subsumes 3 as shown in the next section. Obviously, 1-4 can also be used for very simple tasks, making dialogue engineering easier to do for these tasks than if the users were permitted to speak freely in a totally unconstrained fashion. In fact, as we shall see in section 3.5.2, this language of "not permitting" users certain things, "constraining them," etc. is misleading. Elegant dialogue management solutions can often be found which do not in the least reduce the user's sense of being engaged in natural spoken communication.
3.5.2 Controlling User Input
The dialogue manager can do many things to control the user's input in order to keep it within the system's technical capabilities of recognition and understanding. Among task-oriented SLDSs, extreme lack of user input control may be illustrated by a system which simply tells the user about the task it can help solve and then invites the user to go ahead. For instance: "Freddy's Used Cars Saloon. How can I help you?" Lack of user input control allows the user to produce unconstrained input. Clear signs of unconstrained user input are:

• very long input sentences
• topics are being raised, or sub-tasks addressed, in any order
• any number of topics is being addressed in a single user utterance.
Even for comparatively simple tasks, the handling of unconstrained user input is a difficult challenge which most SLDS projects cannot afford to try to meet. What user input control does is to impose, through a variety of explicit and implicit means, more or less strong constraints on what would otherwise have been unconstrained user input. Some of the user input control mechanisms available to the SLDS developer are:
3.5.2.1 Information on system capabilities
This section is about explicit information to users on what the system can and cannot do. For all but the simplest SLDSs, and for all but very special user populations, it is advisable that the system, up-front, clearly and succinctly tells the user things like what is its domain, what are its task(s), etc. This helps to "tailor" the user's expectations, and hence the user's input, to the knowledge the system actually has, thereby ultimately reducing the task of the dialogue manager. For instance, the following two systems both qualify in the generic sense as ferry timetable information systems (Ss): system S1 knows everything about the relevant, current, and planned timetables; S2 knows, in addition, about significant delays and is thus able to distinguish between planned and actual arrivals and departures. It is important to tell users whether the system they are about to use is an S1 or an S2.

System capability information does not have to be given to users through speech, of course. Waxholm, for instance, provides this information as text on the screen as well as in synthetic speech. If the information is given through speech, it is virtually always important that it is expressed briefly and clearly because users tend to lose attention very quickly when they have to listen to speech-only information. It is often an empirical issue how much users need to be told about the system's capabilities in order that their mental model of the system's capabilities roughly matches what the system actually can and cannot do. If the story is longer than a few facts, it is advisable to make it possible for regular users to skip the story. Also, those parts of the story which only relate to some optional loop in the dialogue might be better presented at the start of that loop than up-front. Brief on-line information on the system's capabilities cannot be made redundant by any amount of paper information about the system.

3.5.2.2 Instructions on how to address the system
This section is about explicit instructions to users on how to address the system. By issuing appropriate instructions, the SLDS may strongly increase its control of the way it is being used. For instance, the Danish Dialogue System tells its users that it will not be able to understand them unless they answer the
system's questions briefly and one at a time. If used at all, such operating instructions should be both very brief and eminently memorable. Otherwise, they will not work because too many users will forget the instructions immediately. Indications are that the quoted instruction from the Danish Dialogue System worked quite well. However, as part of the operating instructions for the Danish system, users were also instructed to use particular keywords when they wanted to initiate meta-communication (see 3.6.2). This worked less well because too many users forgot the keywords they were supposed to use.
3.5.2.3 Feedback on what the system understood
Feedback on what the system has understood from what the user just said helps ensure that, throughout the dialogue, the user is left in no doubt as to what the system has understood (see also 3.6.6). All SLDSs need to provide this form of (information) feedback. A user who is in doubt as to whether the system really did understand what the user just said is liable to produce unwanted input.
3.5.2.4 Processing feedback
Processing feedback is about what the system is in the process of doing. When the system processes the information received from the user and hence may not be speaking for a while, processing feedback keeps the user informed on what is going on (see also 3.6.6). Most SLDSs can benefit from this form of (processing) feedback. A user who is uncertain about what is going on inside the system, if anything, is liable to produce unwanted input.

3.5.2.5 Output control
The aim of output control (see also 3.4.2) is to "prime" the user through the vocabulary, grammar, and style adopted by the system. There is ample evidence that this works very well for SLDSs. Humans are extremely good at (automatically, unconsciously) adapting their vocabulary, grammar, and style to those of their partners in dialogue or conversation [33, 34]. Just think about how easily we adapt linguistically to small children, the hard of hearing, or foreigners with little mastery of our mother tongue. It is therefore extremely useful to make the system produce output which only uses the vocabulary and grammar which the system itself can recognize, parse, and understand, and to make the system use a style of dialogue that induces the user to provide input which is terse and to the point. The unconscious adaptation performed by the users ensures that they still feel that they can speak freely without feeling hampered by the system's requirements for recognizable vocabulary, simple grammar, and terse style. A particular point to be aware of in this connection is that if the system's output to the user includes, for example, typed text on the screen, then the
textual output should be subjected to the same priming strategy as has been adopted for the system's spoken output. It is not helpful to carefully prime the user through the system's output speech and then undercut the purpose of the priming operation through a flourishing style of expression in the textual output.
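One simple way to support this priming discipline during development is to check every candidate system prompt against the vocabulary that the recognizer and parser actually cover, as in the sketch below. The lexicon is purely illustrative; an actual check would use the application's recognition lexicon and grammar coverage.

```python
# A small sketch of one way to enforce the priming strategy: before a
# system prompt (spoken or on-screen text) is deployed, check that every
# word in it belongs to the vocabulary the system itself can handle.

RECOGNIZER_LEXICON = {
    "which", "day", "do", "you", "want", "to", "travel", "on",
    "please", "answer", "yes", "or", "no", "monday", "tuesday",
}

def out_of_vocabulary(prompt):
    """Return the words in a candidate prompt that the system itself
    could not recognize, i.e. words that would prime the user badly."""
    words = prompt.lower().replace("?", "").replace(",", "").split()
    return [w for w in words if w not in RECOGNIZER_LEXICON]

print(out_of_vocabulary("Which day do you want to travel on?"))    # []
print(out_of_vocabulary("Kindly state your preferred itinerary"))  # all flagged
```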
3.5.2.6 Focused output and system initiative
If the system determines the course of the dialogue by having the initiative (see 3.5.3) all or most of the time, for instance through asking questions of the user or providing the user with instructions which the user has to carry out, a strong form of user input control becomes possible. The system can phrase its questions or instructions in a focused way so that, for each question, instruction, etc., the user has to choose between a limited number of response options. If, for instance, the system asks a question which should be answered by a "yes" or "no," or by a name drawn from a limited set of proper names (of people, weekdays, airports, train stations, streets, car model names, etc.), then it exerts a strong influence on the user's input. Dialogue managers using this approach may be able to handle even tasks of very large complexity in terms of information volume (see 3.5.1.1).

Note that, in itself, system initiative is far from sufficient for this kind of input control to take place. If the system says, for instance, "ADAP Travels, how can I help you?," it does, in principle, take the initiative by asking a question. However, the question is not focused at all and therefore does not restrict the user's input. An open or unfocused system question may therefore be viewed as a way of handing over the dialogue initiative to the user. Note also that the method just described is better suited for some tasks than for others, in particular for tasks consisting of a series of independent pieces of information to be provided to the system by the user. Beyond those tasks, the strong form of input control involved will tend to give the dialogue a rather mechanical, less than natural quality. Moreover, some tasks do not lend themselves to system initiative only, and large unstructured tasks cannot be handled through focused output combined with system initiative in a way that is acceptable to the users.

3.5.2.7 Textual material
The term "textual material" designates information about the system in typed or handwritten form and presented graphically, such as on screen or on paper, or haptically, such as using Braille. Typically, this information tells users what the system can and cannot do and instructs them on how to interact with the system. For particular purposes, such as when the users are professionals and will be using the SLDS extensively in their work, strong user input control can be exerted through
textual material which the user is expected to read when, or before, using the system. For obvious reasons, text-based system knowledge is difficult to rely on for walk-up-and-use systems unless the system, like Waxholm, includes a screen or some other text-displaying device. Users are not likely to have textual material on paper at hand when using the system: it is somewhere else, it has disappeared, or they never received it in the first place.
3.5.2.8 Barge-in
Barge-in means that users can speak to the system, and expect to be recognized and understood by it, whenever they so wish, such as when the system itself is speaking or is processing recent input. Barge-in is not, in fact, an input control mechanism. Rather, it is something which comes in handy because full input control is impossible. In particular, it is impossible to prevent enough users from speaking when the system is not listening, no matter what means are being adopted for this purpose. The likelihood of users speaking freely "out of order" varies from one application and user group to another. In some applications, it may even be desirable that the users can speak freely among themselves whilst the system is processing the spoken input. Still, barge-in technology is advisable for very many SLDSs, so that the system is able to recognize and process user input even if it arrives when the system is busy doing something other than just waiting for it.

In Waxholm, the system does not listen when it speaks. However, the user may barge in by pressing a button to interrupt the system's speech. This is useful when, for instance, the user already feels sufficiently informed to get on with the task. For instance, users who are experts in using the application can use the button-based barge-in to skip the system's introduction. The Danish Dialogue System does not allow barge-in when the system speaks. This turned out to cause several transaction failures, i.e. dialogues in which the user did not get the result asked for. A typical case is one in which the system's feedback to the user (see 3.6.6) shows that the system has misunderstood the user, for instance by mistaking "Saturday" for "Sunday." During the system's subsequent output utterance, the user says, e.g.: "No, Saturday." The system's lack of reaction to what the user said is easily interpreted by the user as indicating that the system had received the error message, which of course it hadn't, and couldn't have. As a result, the user will later receive a flight ticket which is valid for the wrong day.

3.5.3 Who Should Have the Initiative?
Dialogue in which the initiative lies solely with the system was discussed as an input control mechanism in Section 3.5.2.6. This section generalizes the discussion of initiative initiated there.
Dialogue management design takes place between two extremes. From the point of view of technical simplicity, one might perhaps wish that all SLDSs could conduct their transactions with users as a series of questions to which the users would have to answer "yes" or "no" and nothing else. Simpler still, "yes" or "no" could be replaced by filled pauses ("grunts") and unfilled pauses (silence), respectively, between the system's questions, and speech recognition could be replaced by grunt detection. From the point of view of natural dialogue, on the other hand, users should always be able to say exactly what they want to say, in the way they want to say it, and when they want to say it, without any restrictions being imposed by the system. Both extremes are unrealistic, of course. If task complexity is low in terms of, among other things, information volume and negotiation potential, then it is technically feasible today to allow the users to say what they want whilst still using some of the input control mechanisms discussed in Section 3.5.2. As task complexity grows in terms of information volume, negotiation potential and the other factors discussed in Section 3.5.1, it really begins to matter who has the initiative during the dialogue.

We may roughly distinguish three interaction modes, i.e. system-directed dialogue, mixed initiative dialogue, and user-directed dialogue. This distinction is a rough-and-ready one because initiative distribution among user and system is often a matter of degree depending upon how often which party is supposed to take the initiative. In some cases, it can even be difficult to classify an SLDS in terms of who has the initiative. If the system opens the dialogue by saying something like: "Welcome to service X, what do you want?", it might be argued that the system has the initiative because the system is asking a question of the user. However, since the question asked by the system is a completely open one, one might as well say that the initiative is being handed over to the user. In other words, only focused questions clearly determine initiative (cf. 3.5.2.6). The same is true of other directive uses of language in which partner A tells partner B to do specific things, such as in instructional dialogues.
3.5.3.1 System-directed dialogue
As long as the task is one in which the system requires a series of specific pieces of information from the user, the task may safely be designed as one in which the system preserves the initiative throughout by asking focused questions of the user. This system-directed approach would work even for tasks of very large complexity in terms of information volume, and whether or not the tasks are well structured. Note that if the sub-tasks are not mutually independent, or if several of them are optional, then system initiative may be threatened (cf. 3.5.1). Still, system-directed dialogue is an effective strategy for reducing
user input complexity and increasing user input predictability (cf. 3.4.2.1). In addition, system-directed dialogue actually is relatively natural for some tasks. From a dialogue engineering perspective, it may be tempting to claim that system-directed dialogue is generally simpler to design and control than either user-directed dialogue or mixed initiative dialogue, and therefore should be preferred whenever possible. This claim merits some words of caution. Firstly, it is not strictly known if the claim is true. It is quite possible that system-directed dialogue, for all but a relatively small class of tasks, is not simpler to design and control than its alternatives because it needs ways of handling those users who do not let themselves be fully controlled but speak out of turn, initiate negotiation, ask unexpected questions, etc. Secondly, products are already on the market which allow mixed initiative dialogue for relatively simple tasks, such as train timetable information [6], and it is quite likely that users generally tend to prefer such systems because they let the users speak freely to some extent.
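To illustrate the system-directed, focused-question style, the following minimal sketch keeps the initiative with the system and accepts, for each question, only a small set of answers. The questions, answer sets, and canned user answers are invented; a deployed system would receive answers from the speech and language layers rather than from a text function.

```python
# A minimal sketch of system-directed dialogue: the system keeps the
# initiative by asking focused questions, each admitting only a small
# set of answers, and re-asks when the answer falls outside that set.

QUESTIONS = [
    ("return_trip", "Do you want a return ticket? Please answer yes or no.",
     {"yes", "no"}),
    ("weekday", "On which weekday do you want to leave?",
     {"monday", "tuesday", "wednesday", "thursday", "friday",
      "saturday", "sunday"}),
]

def system_directed_dialogue(get_user_answer):
    """Ask each focused question in turn; re-ask on unexpected answers
    (a simple form of user input control)."""
    collected = {}
    for slot, question, expected in QUESTIONS:
        answer = get_user_answer(question).strip().lower()
        while answer not in expected:
            answer = get_user_answer("Sorry, " + question).strip().lower()
        collected[slot] = answer
    return collected

# Example with canned answers instead of a real speech front end.
canned = iter(["maybe", "yes", "Sunday"])
print(system_directed_dialogue(lambda q: next(canned)))
# -> {'return_trip': 'yes', 'weekday': 'sunday'}
```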
3.5.3.2 Mixed initiative dialogue
In mixed initiative dialogue, any one of the participants may have, or take, the initiative. A typical case in which a mixed initiative approach is desirable is one in which the task is large in terms of information volume and both the user and the system need information from one another. Whilst asking its questions, the system must be prepared, sometimes or all of the time, that the user may ask a question in return instead of answering the system's own question. For instance, the user may want to know if a certain flight departure allows discount before deciding whether that departure is of any interest.

In practice, most SLDSs have to be mixed initiative systems in order to be able to handle user-initiated repair meta-communication. The user must have the possibility of telling the system, at any point during dialogue, that the system has misunderstood the user or that the user needs the system to repeat what it just said (see 3.6.2). Only systems which lack meta-communication altogether can avoid that. Conversely, even if the user has the initiative throughout the dialogue, the system must be able to take the initiative to do repair or clarification of what the user has said. When thinking about, or characterizing, SLDSs, it may be useful, therefore, to distinguish between two issues:

• who has the initiative in domain communication?
• who has the initiative in meta-communication? (more in 3.6.2).

The need for mixed initiative dialogue in domain communication is a function of bidirectionality of the flow of information needed to complete
the task, sub-task interdependencies, the potential for negotiation occurring during task completion, the wish for natural and unconstrained dialogue, etc. (cf. 3.5.1). Other things being equal, mixed initiative dialogue is harder to control and predict than system-directed dialogue.
3.5.3.3 User-directed dialogue
In user-directed dialogue, the user has the initiative all or most of the time. User-directed dialogue is recommended for ill-structured tasks in which there is no way for the system to anticipate which parts of the task space the user wants to address on a particular occasion. The flight conditions information task (see 3.5.1) is a case in point, as is the email operation task (see 3.3.3). User-directed dialogue is also useful for tasks with regular users who have the time to learn how to speak to the system to get the task done. However, user-directed dialogue is harder to control and predict than system-directed dialogue. For the time being, user-directed dialogue is not recommended for walk-up-and-use users except for very simple tasks.

3.5.4 Input Prediction/Prior Focus
In order to support the system's speech recognition, language processing, and dialogue management tasks, the dialogue manager developer should investigate if selective prediction of the user's input is possible at any stage during the dialogue (see 3.4.2). This may be possible if, for example, the system asks a series of questions each requesting specific pieces of information from the user. If the task has some structure to it, it may even be possible to use the structure to predict when the user is likely to ask questions of the system, thus facilitating mixed initiative (3.5.3) within a largely system-directed dialogue. For instance, the Daimler-Chrysler dialogue manager and the Danish Dialogue System use various forms of input prediction. Another way of describing input prediction is to say that the dialogue manager establishes a (selective) focus of attention prior to the next user utterance.

Useful as it can be, input prediction may fail because the user does not behave as predicted. In that case, the dialogue manager must be able to initiate appropriate meta-communication (see 3.6.2). This is not necessarily easy to do in case of failed predictions because the system may not be aware that the cause of failed recognition or understanding was its failure to predict what the user said. Unless it relaxes or cancels the prediction, the risk is that the dialogue enters an error loop in which the system continues to fail to understand the user. Input prediction can be achieved in many different ways. It may be useful to distinguish between the following two general approaches.
3.5.4.1 Knowledge-based input prediction
In knowledge-based input prediction, the dialogue manager uses a priori knowledge of the context to predict one or more characteristics of the user's next utterance. Note that the a priori nature of knowledge-based input prediction does not mean that implemented predictions should not be backed by data on actual user behavior. It is always good practice to test the adopted knowledge-based input prediction strategy on user-system interaction data.
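A minimal sketch of the knowledge-based variant: the dialogue manager maps its current dialogue state onto the vocabulary and grammar it expects the next user utterance to draw on, and hands that focus to the speech and language layers. The state names, vocabularies, and grammar labels are invented for illustration.

```python
# Knowledge-based input prediction: a priori knowledge of the dialogue
# state determines what the speech and language layers should expect.

PREDICTIONS = {
    "ask_departure_airport": {"vocabulary": ["copenhagen", "aalborg", "karup"],
                              "grammar": "airport_name"},
    "ask_travel_date":       {"vocabulary": ["monday", "tuesday",
                                             "today", "tomorrow"],
                              "grammar": "date_expression"},
    "confirm_booking":       {"vocabulary": ["yes", "no"],
                              "grammar": "yes_no"},
}

def predict_input(dialogue_state):
    """Return the prior focus (expected vocabulary and grammar) for the
    next user utterance, or None if no useful prediction can be made."""
    return PREDICTIONS.get(dialogue_state)

focus = predict_input("ask_travel_date")
print(focus["grammar"], focus["vocabulary"])
```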
3.5.4.2 Statistical input prediction
In statistical input prediction, the dialogue manager uses corpus-based information on what to expect from the user. Given a corpus of user-system dialogues about the task(s) at hand, it may be possible to observe and use regularities in the corpus, such as that the presence of certain words in the user's input makes it likely that the user is in the process of addressing a specific subset of the topics handled by the system, or that the presence of dialogue acts DA5 and DA19 in the immediate dialogue history makes it likely that the user is expressing DA25. Waxholm uses the former approach, Verbmobil the latter.
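In its very simplest form, the statistical variant can be sketched as a bigram model over dialogue acts estimated from an annotated corpus. The corpus and act labels below are invented, and real systems such as Verbmobil condition on considerably richer context than the single preceding act.

```python
# Statistical input prediction over dialogue acts: count which act tends
# to follow which in an annotated corpus, then predict the user's next act.

from collections import Counter, defaultdict

corpus = [
    ["greet", "request_info", "answer", "request_info", "answer", "bye"],
    ["greet", "request_info", "clarify", "answer", "bye"],
]

follows = defaultdict(Counter)
for dialogue in corpus:
    for prev_act, next_act in zip(dialogue, dialogue[1:]):
        follows[prev_act][next_act] += 1

def predict_next_act(previous_act):
    """Most likely next dialogue act given the previous one."""
    counts = follows.get(previous_act)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next_act("request_info"))   # -> 'answer'
```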
3.5.5 Sub-task Identification
It is a useful exercise for the dialogue manager developer to consider the development task from the particular point of view of the dialogue manager. The dialogue manager is deeply embedded in the SLDS, is out of direct contact with the user, and has to do its job based on what the speech and language layers deliver. This happens in the context of the task, the target user group, and whatever output and input control the dialogue manager may have imposed. Basically, what the speech and language layers can deliver to the dialogue manager is some form of meaning representation. Sometimes the dialogue manager does not receive any meaning representation from the speech and language layers even though one was expected. Even if a meaning representation arrives, there is no guarantee that this representation adequately represents the contents of the message that was actually conveyed to the system by the user because the speech and language layers may have got the user's expressed meaning wrong. Still, whatever happens, the dialogue manager must be able to produce appropriate output to the user.

Current SLDSs exhibit different approaches to the creation of a meaning representation in the speech and language layers as well as to the nature of the meaning representation itself. An important point is the following: strictly speaking, the fact that a meaning representation arrives at the dialogue manager is not sufficient for the dialogue manager to carry on with
the task. First, the dialogue manager must identify to which sub-task(s), or
topics, if any, that incoming meaning representation provides a contribution.
Only when it knows that, or believes that it knows, can the dialogue manager proceed to sort out which contribution(s), if any, the incoming meaning representation provides to the sub-task(s). In other words, many task-oriented SLDSs require the dialogue manager to do sub-task identification or topic identification. The task solved by most SLDSs can be viewed as consisting in one or several sub-tasks or topics to be addressed by user and system. One or several of these sub-tasks have to be solved in order that the user and the system succeed in solving the task. Other sub-tasks may be optional, i.e. their solution is sometimes, but not always, required. An example class of optional sub-tasks is the meta-communication sub-tasks (see 3.6.2): if the dialogue proceeds smoothly, no meta-communication sub-tasks have to be solved. Basically, dialogue managers can be built so as to be in one of two different situations with respect to sub-task identification. In the first case, the dialogue manager has good reason to assume that the user is addressing a particular domain sub-task; in the second case, the dialogue manager does not know which domain sub-task the user is addressing. In both cases, the dialogue manager must start from the semantic representations that arrive from the speech and language layers, look at the semantically meaningful units, and seek to figure out which sub-task the user is addressing.
3.5.5.1 Local focus
The dialogue manager may have good reason to assume that the user's utterance addresses a specific sub-task, such as that of providing the name of an employee in the organization hosting the system. Depending on the task and the dialogue structure design, there can be many different reasons why the dialogue manager knows which sub-task the user is currently addressing: there may be only one sub-task, as in an extremely simple system; the task may be well-structured; the system just asked the user to provide input on that sub-task, etc. Generally speaking, this is a good situation for the dialogue manager to be in, as in the Danish Dialogue System. This system almost always knows, or has good reason to believe, that the user is either addressing a specific domain sub-task or has initiated meta-communication. Since it has good reason to believe which sub-task the user is addressing, the task of the dialogue manager reduces to that of finding out exactly what is the user's contribution to that sub-task (or one of those sub-tasks, if we count in the possibility that the user may have initiated meta-communication). In such cases, the system has a local focus. The system may still be wrong, of course, and then it becomes the joint task of the system and the user to rectify the situation through meta-communication.
3.5.5.2 Global focus
The dialogue manager does not know which of several possible sub-tasks the user is addressing. The main reason why the dialogue manager does not know which sub-task the user is currently addressing is that the dialogue manager has given the user the initiative, for instance by asking an open question in response to which the user may have addressed any number of possible sub-tasks. Alternatively, the user has unexpectedly taken the initiative. In such situations, the dialogue manager has to do sub-task identification, or topic identification, before it can start processing the user's specific contribution to the sub-task. Sub-task identification is crucial in systems such as RailTel/ARISE, Waxholm, and Verbmobil. Waxholm uses probabilistic rules linking semantic input features with topics. Given the rules and a particular set of semantic features in the input, Waxholm infers which topic the user is actually addressing. For sub-task identification, Verbmobil uses weighted default rules to map from input syntactic information, keywords, and contextual information about which dialogue acts are likely to occur, into one or several dialogue acts belonging to an elaborate taxonomy of approximately 54 speech acts (or dialogue acts).

An important additional help in sub-task identification is support from a global focus, for instance when the dialogue manager knows that the task history (see 3.7.1) contains a set of as yet unsolved sub-tasks. These tasks are more likely to come into local focus than those that have been solved already. Another form of global focus can be derived from observation of dialogue phases. Most task-oriented dialogues unfold through three main phases: the introduction phase with greetings, system introductions, etc., the main task-solving phase, and the closing phase with closing remarks, greetings, etc. Sometimes it may be possible to break down the main task-solving phase into several phases as well. If the system knows which phase the dialogue is in at the moment, this knowledge can be used for sub-task identification support. Knowing that can be a hard problem, however, and this is a topic for ongoing research. For instance, the joint absence of certain discourse particles called topic-shift markers and the presence of certain dialogue acts may suggest that the user has not changed dialogue phase.

Generally speaking, the more possible sub-tasks the user might be addressing in a certain input utterance, the harder the sub-task identification problem becomes for the dialogue manager. When doing sub-task identification, the dialogue manager may follow one of two strategies. The simpler strategy is to try to identify one sub-task only in the user's input, even if the user may have been addressing several sub-tasks, and continue the dialogue from there as in Waxholm. The more demanding strategy is to try to identify each single sub-task addressed by the user, as in
RailTel/ARISE. The latter strategy is more likely to work when task complexity is low in terms of volume of information.
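To make the single-topic strategy concrete, here is a minimal sketch in which each semantic feature found in the user's input votes, with a weight, for the topics it is associated with, and the best-scoring topic is selected. The topics, features, and weights are invented for illustration and are not taken from Waxholm or the other systems mentioned.

```python
# A sketch of weighted, keyword-style sub-task (topic) identification:
# every semantic feature found in the input contributes a weighted vote
# to the topics it is associated with.

TOPIC_WEIGHTS = {
    "timetable":   {"departure": 0.9, "arrival": 0.9, "when": 0.6},
    "fares":       {"price": 0.9, "discount": 0.8, "cheap": 0.7},
    "reservation": {"book": 0.9, "ticket": 0.5, "reserve": 0.9},
}

def identify_topic(semantic_features):
    """Score every topic against the features found in the user's input
    and return the best-scoring one (the 'single topic' strategy)."""
    scores = {
        topic: sum(w for feat, w in weights.items() if feat in semantic_features)
        for topic, weights in TOPIC_WEIGHTS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(identify_topic({"when", "departure"}))   # -> 'timetable'
```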
3.5.5.3 After sub-task identification
Depending on what arrives from the speech and language layers, and provided that the dialogue manager has solved its sub-task identification task, the dialogue manager must now determine the users' specific contribution(s) to the sub-task(s) they are addressing (see 3.5.5). Following that, the dialogue manager must do one of five things as far as communication with the user is concerned:

1. advance the domain communication (see 3.6.1) including the provision of feedback (see 3.6.6)
2. initiate meta-communication with the user (see 3.6.2 and 3.6.5)
3. initiate other forms of communication (see 3.6.3)
4. switch to a fall-back human operator
5. end the dialogue (see 3.6.7).

Advancing the domain communication means getting on with the task. Initiating meta-communication means starting a sub-dialogue (or, in this case, a meta-dialogue) with the user in order to get the user's meaning right before advancing the domain communication any further. No SLDS probably can do without capabilities for 1 and 2. If 2 fails repeatedly, some systems have the possibility of referring the user to a human operator (4). Otherwise, calling off the dialogue is the only possibility left (5). In parallel with taking action vis-a-vis the user, the dialogue manager may at this stage take a smaller or larger series of internal processing steps which can be summarized as:

6. updating the context representation (see 3.7.1)
7. providing support for the speech and language layers to assist their interpretation of the next user utterance (see 3.4.2).
3.5.6 Advanced Linguistic Processing
The determination of the user's contribution to a sub-task requires more than, to mention just one example, the processing of semantic feature structures. Processing of feature structures often implies value assignment to slots in a semantic frame even though these values cannot be straightforwardly derived from the user's input in all cases. If the system has to decide on every contribution of a user to a sub-task, something which few systems do at present, advanced linguistic processing
is needed. It may involve, among other things, cross-sentence co-reference resolution, ellipsis processing, and the processing of indirect dialogue acts. In nearly all systems, the processing of most of these phenomena is controlled and carried out by one of the natural language components--by the parser in the Daimler-Chrysler dialogue manager, by the semantic evaluation component in the Verbmobil speech translation system--but never without support from the dialogue manager.
3.5.7 Co-reference and Ellipsis Processing
In case of cross-sentence co-reference and ellipsis processing, the natural language system component in charge is supported by the dialogue manager which provides a representation of contextual information for the purpose of constraining the relevant search space. The contextual information is part of the dialogue history (see 3.7.1). Dialogue history information consists in one or several data structures that are being built up incrementally to represent one or more aspects of the preceding part of the dialogue. In principle, the more aspects of the preceding dialogue are being represented in the dialogue history, the more contextual information is available for supporting the processing done in the language layer, and the better performance can be expected from that layer. Still, co-reference resolution and ellipsis processing remain hard problems.
3.5.8 Processing of Indirect Dialogue Acts
Advanced linguistic processing also includes the processing of indirect dialogue acts. In this case, the central problem for the system is to identify the "real" dialogue act performed by the user and disguised as a dialogue act of a different type. In contrast to direct dialogue acts, indirect dialogue acts cannot be determined on the basis of their surface form, which makes the frequently used keyword spotting techniques used for the identification of direct dialogue acts almost useless in such cases. Clearly, the processing of indirect dialogue acts calls for less surface-oriented processing methods involving semantic and pragmatic information associated with input sequences. This is a hard problem.
3.6 Communication

3.6.1 Domain Communication
The primary task of the dialogue manager is to advance the domain communication based on a representation of the meaning-in-task-context
of the user's input (cf. 3.5.5 and 3.5.6). Let us assume that the dialogue manager has arrived at an interpretation of the user's most recent input and decided that the input actually did provide a contribution to the task. This means that the dialogue manager can now take steps towards advancing the domain communication with the user. Obviously, what to do in a particular case depends on the task and the sub-task context. The limiting case is that the dialogue manager simply decides that it has understood what the user said and takes overt action accordingly, such as connecting the caller to a user who has been understood to want to accept a collect call, replaying an email message, or displaying a map on the screen. Some other cases are:
3.6.1.1 More information needed
The dialogue manager inserts the user's input meaning into a slot in the task model, discovers that more information is needed from the user, and proceeds to elicit that information.

3.6.1.2 Database look-up
The dialogue manager looks up the answer to the user's question in the database containing the system's domain knowledge and sends the answer to the language and speech generation components (or to the screen, etc.).

3.6.1.3 Producing an answer
The dialogue manager inserts the user's input meaning into a slot in the task model, verifies that it has all the information needed to answer the user's query, and sends the answer to the language and speech generation components (or to the screen, etc.).

3.6.1.4 Making an inference
The dialogue manager makes an inference based on the user's input meaning, inserts the result into a slot in a database and proceeds with the next question. In Waxholm, for instance, the system completes the user's "on Thursday" by inferring the appropriate date, and replaces qualitative time expressions, such as "this morning," by well-defined time windows, such as "6 a.m.-12 noon." Verbmobil does inferencing over short sequences of user input, such as that a counterproposal (for a date, say) implies the rejection of a previous proposal; a new proposal (for a date, say) implies the acceptance of a (not incompatible but less specific) previous proposal; and a change of dialogue phase implies the acceptance of a previous proposal (for a time, say).

It is important to note that such domain-based inferences abound in human-human conversation. Without thinking about it, human speakers expect their interlocutors to make those inferences. The dialogue manager has no way of replicating the sophistication of human inferencing during conversation and dialogue. Most current systems are able to process only
relatively simple inferences. The dialogue manager developer should focus on enabling all and only those inferences that are strictly necessary for the application to work successfully in the large majority of exchanges with users. Even that can be a hard task. For instance, should the system be able to perform addition of small numbers or not? Simple as this may appear, it would add a whole new chapter to the vocabulary, grammar, and rules of inference that the system would have to master. In travel booking applications, for instance, some users would naturally say things like "two adults and two children;" or, in travel information applications, some users may want to know about the "previous" or the "following" departure given what the system has already told them. The developer has to decide how important the system's capability of understanding such phrases is to the successful working of the application in real life. In many cases, making an informed decision will require empirical investigation of actual user behavior. Finally, through control of user input (see 3.5.2), the developer must try to prevent the user from requiring the system to do inferences that are not strictly needed for the application or which are too complex to implement.
3.6.1.5 More constraints needed
The dialogue manager discovers that the user's input meaning is likely to make the system produce too much output information and produces a request to the user to provide further constraints on the desired output.
3.6.1.6 Inconsistent input
The dialogue manager discovers that the user's input meaning is inconsistent with the database information, infeasible given the database, inconsistent with the task history, etc. The system may reply, for example, "There is no train from Munich to Frankfurt at 3.10 p.m.," or "The 9 o'clock flight is already fully booked."
3.6.1.7 Language translation
The dialogue manager translates the user's input meaning into another language.

3.6.1.8 Summary
Among the options above, the first four and the last one illustrate straightforward progression with the task. The two penultimate options illustrate domain sub-dialogues. Quite often, the system will, in fact, do something more than just advancing the domain communication as exemplified above. As part of advancing the domain communication, the system may provide feedback to the user to enable the user to make sure that what the user just said has been understood correctly (see 3.6.6).
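The first three cases above can be sketched together as a simple frame-filling loop: the user's input meaning is merged into a task frame, and the dialogue manager either elicits a missing slot or, once the frame is complete, answers from the database. The slot names, prompts, and database stub are invented for illustration.

```python
# Frame (slot) based domain communication: merge the user's input
# meaning into a task frame, then either ask for missing information
# (3.6.1.1) or produce an answer from the database (3.6.1.2/3.6.1.3).

FRAME_SLOTS = ["origin", "destination", "date"]

PROMPTS = {
    "origin": "Where do you want to travel from?",
    "destination": "Where do you want to travel to?",
    "date": "On which date do you want to travel?",
}

def advance_domain_communication(frame, input_meaning, lookup):
    """Merge new input meaning into the frame and decide what to say
    next: elicit a missing slot, or answer via a database look-up."""
    frame.update(input_meaning)
    for slot in FRAME_SLOTS:
        if slot not in frame:
            return PROMPTS[slot]            # more information needed
    return lookup(frame)                    # database look-up / answer

def fake_lookup(frame):
    return (f"There are three departures from {frame['origin']} to "
            f"{frame['destination']} on {frame['date']}.")

frame = {}
print(advance_domain_communication(frame, {"origin": "Copenhagen"}, fake_lookup))
print(advance_domain_communication(frame,
                                   {"destination": "Aalborg", "date": "Monday"},
                                   fake_lookup))
```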
3.6.2 Meta-communication
Meta-communication, although secondary to domain communication, is crucial to proper dialogue management. Meta-communication is often complex and potentially difficult to design. In meta-communication design, it is useful to think in terms of distinctions between
• system-initiated and user-initiated meta-communication
• repair and clarification meta-communication.

These are all rather different from each other in terms of the issues they raise, and the distinction between them gives a convenient breakdown of what otherwise tends to become a tangled subject. In general, one of the partners in the dialogue initiates meta-communication because that partner has the impression that something went wrong and has to be corrected. Note that we do not include system feedback under meta-communication. Some authors do, and there does not seem to be any deep issue involved here one way or the other. We treat feedback as a separate form of system-to-user communication in 3.6.6. For the user in particular, feedback from the system is the most important way of discovering that something has gone wrong and has to be corrected.
3.6.2.1 System-initiated repair meta-communication
System-initiated repair meta-communication is needed whenever the system has reason to believe that it did not understand the user's meaning. Such cases include:

• Nothing arrived for the dialogue manager to process, although input meaning was expected from the user. In order to provide appropriate output in such cases, the dialogue manager must get the user to input the meaning once more. It is worth noting that this can be done in many different ways, from simply saying "Sorry, I did not understand," or "Please repeat," to asking the user to speak louder or more distinctly. The more the system knows about the probable cause of its failing to understand the user, the more precise its repair meta-communication can be. Any such gain in precision increases the likelihood that the system will understand the user the next time around and thus avoid error loops (see 3.6.5).
• Something arrived for the dialogue manager to process but what arrived was meaningless in the task context. For instance, the user may be perceived as responding "London" to a question about departure date. In order to provide appropriate output in such cases, the dialogue manager may have to ask the user to input the meaning again.
However, as the system actually did receive some meaning representation, it should preferably tell the user what it did receive and that this was not appropriate in the task context. This is done by Waxholm and the Danish Dialogue System, for example. For instance, if the Danish Dialogue System has understood that the user wants to fly from Aalborg to Karup, it will tell the user that there is no flight connection between these two airports. Another approach is taken in RailTel/ARISE. These systems would take the user's "London" to indicate a change to the point of departure or arrival (see below, this section). Verbmobil uses a statistical technique to perform a form of constraint relaxation in case of a contextually inconsistent user input dialogue act (cf. 3.5.6). A core problem in repair meta-communication design is that the user input that elicits system-initiated repair may have many different causes. The dialogue manager often has difficulty diagnosing the actual cause. The closer the dialogue manager can get to correctly inferring the cause, the more informative repair meta-communication it can produce, and the more likely it becomes that the user will provide comprehensible and relevant input in the next turn.
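The point that a more precise diagnosis permits more precise repair can be illustrated with a small sketch that maps an assumed failure diagnosis onto a repair question. The cause labels and prompt wordings are purely illustrative; in a real system the diagnosis would come from the speech and language layers.

# Illustrative sketch: the better the cause of a failure is diagnosed, the more
# focused the repair question can be. Cause labels and prompts are assumptions.
REPAIR_PROMPTS = {
    "no_input":       "Sorry, I did not hear anything. Please repeat.",
    "low_volume":     "Please speak a little louder.",
    "no_parse":       "Sorry, I did not understand. Please rephrase.",
    "out_of_context": "I understood '{value}', but I was asking about the {expected}. "
                      "Please state the {expected}.",
}

def repair_question(cause, value=None, expected=None):
    prompt = REPAIR_PROMPTS.get(cause, "Sorry, I did not understand.")
    return prompt.format(value=value, expected=expected)

print(repair_question("out_of_context", value="London", expected="departure date"))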
3.6.2.2 System-initiated clarification meta-communication
System-initiated clarification meta-communication is needed whenever the system has reason to believe that it actually did understand the user's meaning which, however, left some kind of uncertainty as to what the system should produce in response. Such cases include:

• A representation of the user's meaning arrived with a note from the speech and/or language processing layers that they did not have any strong confidence in the correctness of what was passed on to the dialogue manager. The best approach for the dialogue manager to take in such cases is probably to get the user to input the meaning again, rather than to continue the dialogue on the basis of dubious information, which may easily lead to a need for more substantial meta-communication later on. Alternatively, as the system actually did receive some meaning representation, it might instead tell the user what it received and ask for the user's confirmation.
• A representation of the user's meaning arrived which was either inherently inconsistent or inconsistent with previous user input. In cases of inherent inconsistency which the system judges on the basis of its own domain representation, the system could make the possibilities clear to the user and ask which possibility the user prefers, for instance
by pointing out that "Thursday 9th" is not a valid date, but either "Thursday 8th" or "Friday 9th" would be. Cases of inconsistency with previous user input are much more diverse, and different response strategies may have to be used depending on the circumstances.
• A representation of the user's meaning arrived which was (semantically) ambiguous or underspecified. For instance, the user asks to be connected to Mr. Jack Jones and two gentlemen with that name happen to work in the organization; or the user wants to depart at "10 o'clock," which could be either a.m. or p.m. In such cases, the system must ask the user for information that can help resolve the ambiguity. The more precisely this can be done, the better. For instance, if the system believes that the user said either "Hamburg" or "Hanover," it should tell the user just that instead of broadly asking the user to repeat. Experience indicates that it is dangerous for the system to try to resolve ambiguities on its own by selecting what the system (i.e. the designer at design-time) feels is generally the most likely interpretation. The designer may think, for instance, that people are more likely to go on a flight at 10 a.m. than at 10 p.m. and may therefore assign the default interpretation "10 a.m." to users' "10 o'clock." If this approach of interpretation by default is followed, it is advisable to ask the user explicitly for verification through a "yes/no" feedback question (see 3.6.6).

Although user meaning inconsistency and (semantic) ambiguity are probably the most common and currently relevant illustrations of the need for system clarification meta-communication, others are possible, such as when the user provides the system with an irresolvable anaphor. In this case, the system should make the possible referents clear to the user and ask which of them the user has in mind. As the above examples illustrate, system-initiated clarification meta-communication is often a "must" in dialogue manager design. In general, the design of system clarification meta-communication tends to be difficult, and the developer should be prepared to spend considerable effort on reducing the amount of system clarification meta-communication needed in the application. This is done by controlling the user's input and by providing cooperative system output. However, as so often is the case in systems design, this piece of advice should be counter-balanced by another. Speaking generally, users tend to lose attention very quickly when the system speaks. It is therefore no solution to let the system instruct the user at length on what it really means, or wants, on every occasion where there is a risk that the user might go on to say something which is ambiguous or (contextually) inconsistent. In other words, careful prevention of user
behavior which requires system-initiated clarification meta-communication should be complemented by careful system clarification meta-communication design. One point worth noting is that, for a large class of system-initiated clarifications, yes/no questions can be used.
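The advice on interpretation by default combined with explicit verification can be sketched as follows. The default reading ("10 o'clock" taken as 10 a.m.) and the wording of the yes/no question are illustrative assumptions only.

# Sketch of clarification meta-communication for an ambiguous time expression:
# a default interpretation is used only together with an explicit yes/no check.
def interpret_clock_time(hour_said, default_half="a.m."):
    """Return (interpretation, clarification_question)."""
    interpretation = f"{hour_said} {default_half}"
    question = f"Did you mean {hour_said} in the morning? Please answer yes or no."
    return interpretation, question

def resolve(user_yes_no, interpretation, hour_said):
    if user_yes_no.strip().lower().startswith("y"):
        return interpretation
    # The default was wrong: fall back to the other reading.
    return f"{hour_said} p.m."

interp, question = interpret_clock_time(10)
print(question)                      # Did you mean 10 in the morning? ...
print(resolve("no", interp, 10))     # 10 p.m.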
3.6.2.3 User-initiated repair meta-communication
User-initiated repair meta-communication is needed whenever the system has demonstrated to the user that it has misunderstood the user's intended meaning. It also sometimes happens that users change their minds during the dialogue, whereupon they have to go through the same procedures as when they have been misunderstood by the system. In such cases, the user must make clear to the system what the right input is. Finally, users sometimes fail to hear or understand what the system just said. In this case they have to ask the system to repeat, just as when the system fails to get what the user just said. These three (or two) kinds of user repair meta-communication are mandatory in many systems. User-initiated repair meta-communication can be designed in several different ways:
• Uncontrolled repair input: Ideally, we would like the users just to speak freely whenever they have been misunderstood by the system, changed their minds with respect to what to ask or tell the system, or failed to get what the system just said. Some systems do that, such as Waxholm, but with varying success, the problem being that users may initiate repair in very many different ways, from "No, Sandhamn?" to "Wait a minute. I didn't say that. I said Sandhamn?"
• Repair keywords: Other systems require the user to use specifically designed keywords, again with varying success. In the Danish Dialogue System, users are asked to use the keyword "change" whenever they have been misunderstood by the system or changed their minds, and to use the keyword "repeat" whenever they failed to get what the system just said. Keywords are simpler for the system to handle than unrestricted user speech. The problem is that users sometimes fail to remember the keywords they are supposed to use. The more keywords users have to remember, the higher the risk that they forget them. For walk-up-and-use systems, 2-3 keywords seem to be the maximum users can be expected to remember.
• Erasing: A third approach is used in RailTel/ARISE. This approach is similar to using an eraser: one erases what was there and writes something new in its place. For instance, if the system gets "Frankfurt to Hanover" instead of "Frankfurt to Hamburg," the user simply has to repeat "Frankfurt to Hamburg" until the system has received the
message. No specific repair meta-communication keywords or meta-dialogues are needed. The system is continuously prepared to revise its representation of the user's input based on the user's latest utterance. This solution may work well for low-complexity tasks, but it will not work for tasks involving selective input prediction (see 3.5.4) and may be difficult to keep track of in high-complexity tasks.
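A minimal sketch of the eraser-style approach might look as follows: the task representation is simply overwritten by whatever the latest utterance supplies, with no repair keywords and no separate repair sub-dialogue. The slot names are illustrative.

# Minimal sketch of "erasing" repair: the latest user input overwrites any slot
# value mentioned again, with no dedicated repair meta-communication.
class EraserTaskModel:
    def __init__(self):
        self.slots = {}

    def update(self, **new_values):
        """Latest input wins: overwrite any slot the user mentions again."""
        self.slots.update(new_values)
        return self.slots

model = EraserTaskModel()
model.update(origin="Frankfurt", destination="Hanover")   # mis-recognized value
model.update(destination="Hamburg")                        # user repeats; value erased
print(model.slots)   # {'origin': 'Frankfurt', 'destination': 'Hamburg'}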
3.6.2.4 User-initiated clarification meta-communication
User-initiated clarification meta-communication is probably the most difficult challenge for the meta-communication designer. Just like the user, the system may output, or appear to the user to output, inconsistent or ambiguous utterances, or use terms which the user is not familiar with. In human-human conversation, these problems are easily addressed by asking questions such as: "What do you mean by green departure?" or "Do you mean scheduled arrival time or expected arrival time?" Unfortunately, most current SLDSs are not being designed to handle such questions at all. The reasons are (a) that this is difficult to do and, often more importantly, (b) that the system developers have not discovered such potential problems in the first place. If they had, they might have tried to avoid them in their design of the system's dialogue behavior, i.e. through user input control. Thus, they would have made the system explain the notion of a green departure before the user is likely to ask what it is, and they would have made the system explicitly announce when it is speaking about scheduled arrivals and when it is speaking about expected arrivals. In general, this is one possible strategy for the dialogue manager developer to follow: to remove in advance all possible ambiguities, inconsistencies, and terms unknown to users, rather than to try to make the system handle questions from users about these things. We have developed a tool in support of cooperative system dialogue design [35, www.disc2.dk]. Part of the purpose of this tool is to avoid situations in which users feel compelled to initiate clarification meta-communication. There is an obvious alternative to the strategy recommended above of generally trying to prevent the occurrence of user-initiated clarification meta-communication. The alternative is to strengthen the system's ability to handle user-initiated clarification meta-communication. The nature of the task is an important factor in determining which strategy to follow or emphasize. Consider, for instance, users inquiring about some sort of "Yellow Pages" commodity, such as electric guitars or used cars. Both domains are relatively complex. In addition, the inquiring users are likely to differ widely in their knowledge of electric guitars or cars. A flight ticket reservation system may be able to address its domain almost without using terms that are unknown to its users, whoever these may be. Not so with a
used cars information system. As soon as the system mentions ABS brakes, racing tyres, or split back seats, some users will be wondering what the system is talking about. In other words, there seems to be a large class of potential SLDSs which can hardly start talking before they appear to speak gibberish to some of their intended users. In such cases, the dialogue manager developers had better prepare for significant user-initiated clarification meta-communication. It is no practical option for the system to explain all the domain terms it is using as it goes along. This would be intolerable for users who are knowledgeable about the domain in question.
3.6.2.5 Summary
To summarize, the dialogue manager is several steps removed from direct contact with the user. As a result, the dialogue manager may fail to get the user's meaning or may get it wrong. Therefore, both the system and the user need to be able to initiate repair metacommunication. Even at low levels of task complexity, users are able to express themselves in ways that are inconsistent or ambiguous. The system needs clarification meta-communication to handle those user utterances. In some tasks, user clarification meta-communication should be prevented rather than allowed. In other tasks, user clarification meta-communication plays a large role in the communication between user and system.
3.6.3 Other Forms of Communication
Domain communication including feedback (see 3.6.6) and meta-communication are not the only forms of communication that may take place between an SLDS and its users. Thus, the domain-independent opening of the dialogue by some form of greeting is neither domain communication nor meta-communication. The same applies to the closing of the dialogue (see 3.6.7). These formalities may also be used in the opening and closing of sub-dialogues. Another example is system time-out questions, such as "Are you still there?", which may be used when the user has not provided input within a certain time limit. If the SLDS's task-delimitation is not entirely natural and intuitive to users (cf. 3.5.2.1), users are likely to sometimes step outside the system's unexpectedly limited conception of the task. By the system's definition, the communication then ceases to be domain communication. For some tasks, users' out-of-domain communication may happen too often for the comfort of the dialogue manager developer, who may therefore want to do something about it. Thus, Waxholm is sometimes able to discover that the user's input meaning is outside the domain handled by the system. This is a relatively sophisticated thing to do because the system must be able to understand out-of-domain terms. Still, this approach may be worth considering in cases
where users may have reason to expect that the system is able to handle certain sub-tasks which the system is actually unable to deal with. When the users address those sub-tasks, the system will tell them that, unfortunately, it cannot help them. In the Verbmobil meeting scheduling task, users are prone to produce reasons for their unavailability on certain dates or times. Verbmobil, although unable to understand such reasons, nevertheless classifies them and represents them in the topic history (see 3.7.1).
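A very rough sketch of out-of-domain spotting is given below: input concepts are checked against the sub-tasks the system supports, so that the system can tell the user it cannot help rather than mishandling the request. The concept sets are illustrative assumptions and far simpler than what Waxholm or Verbmobil actually do.

# Illustrative sketch of out-of-domain spotting against a list of supported
# sub-tasks. Concept labels are assumptions for illustration only.
SUPPORTED = {"timetable", "departure", "arrival", "price"}
KNOWN_BUT_UNSUPPORTED = {"seat_reservation", "lost_property"}

def classify_topic(concepts):
    if SUPPORTED & set(concepts):
        return "in_domain"
    if KNOWN_BUT_UNSUPPORTED & set(concepts):
        return "out_of_domain_known"     # "Unfortunately, I cannot help with that."
    return "unknown"                     # trigger repair meta-communication instead

print(classify_topic(["lost_property"]))   # out_of_domain_known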
3.6.4 Expression of Meaning
Once the system has decided what to say to the user, this meaning representation must be turned into an appropriately expressed output utterance. In many cases, this is being done directly by the dialogue manager. Having done its internal processing jobs, the dialogue manager may take one of the following approaches, among others:
3.6.4.1 Pre-recorded utterances
The dialogue manager selects a stored audio utterance and causes it to be played to the user by sending a message to the player.

3.6.4.2 Concatenation of pre-recorded words and phrases
The dialogue manager concatenates the output utterance from stored audio expressions or phrases and causes it to be played to the user by sending a message to the player.
3.6.4.3 Filling in a template used by a synthesizer
The dialogue manager selects or fills an output sentence template and causes it to be synthesized to the user.
3.6.4.4 Producing meaning
A more sophisticated approach is to have the dialogue manager produce the what, or the meaning, of the intended output and then have the output language layer determine the how, or the form of words to use, in the output. In this approach, the how is often co-determined by accompanying constraints from the dialogue manager's control and context layers, such as that the output should be a question marked by rising pitch at the end of the spoken utterance.
3.6.4.5 Summary
The first two options are closely related and are both used in, e.g., the Danish Dialogue System. Waxholm and the Daimler-Chrysler dialogue manager use the third option. This option is compatible with relatively advanced use of control layer information for determining
the prosody of the spoken output, for example. This can also be done in the first approach but is difficult to do in the second approach because of the difficulty of controlling intonation in concatenated pre-recorded speech.
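The template-filling option (3.6.4.3), including the passing of a control-layer constraint such as "this output is a question," can be sketched as follows. The template texts and the prosody flag are illustrative assumptions, not the interface of any of the systems discussed.

# Sketch of template-based output: the dialogue manager fills a template and
# attaches a control-layer hint for the speech generation layer.
TEMPLATES = {
    "confirm_trip":   "You want to go from {origin} to {destination} {when}?",
    "give_departure": "The first train from {origin} to {destination} leaves at {time}.",
}

def make_output(template_id, is_question=False, **values):
    text = TEMPLATES[template_id].format(**values)
    # The prosody hint would be consumed by the speech generation layer.
    return {"text": text, "prosody": "rising" if is_question else "neutral"}

print(make_output("confirm_trip", is_question=True,
                  origin="Frankfurt", destination="Hamburg", when="tomorrow morning"))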
3.6.5 Error Loops and Graceful Degradation
An important issue for consideration by the dialogue management developer is the possibility that the user simply repeats the utterance which caused the system to initiate repair meta-communication. The system may have already asked the user to speak louder or to speak more distinctly but, in many such cases, the system will be in exactly the same uncomprehending situation as before. The system may try once more to get out of this potentially infinite loop but, evidently, this cannot go on forever. In such cases, the system might either choose to fall back on a human operator or close the dialogue. To avoid that, a better strategy is in many cases for the system to attempt to carry on by changing the level of interaction into a simpler one, thereby creating a "graceful degradation" of the (domain or meta-) communication with the user [36]. Depending on the problem at hand and the sophistication of the dialogue manager, this can be done in many different ways, including:
3.6.5.1 Focused questions
The user may be asked focused questions one at a time instead of being allowed to continue to provide one-shot input which may be too lengthy or otherwise too complex for the system to understand. For instance, the system goes from saying "Which information do you need?" to saying "From where do you want to travel?"

• Asking for rephrasing: The user may be asked to rephrase the input or to express it more briefly, for instance when the user's answer to a focused question is still not being understood.
• Asking for a complete sentence: The user may be asked to produce a complete sentence rather than grammatically incomplete input, as in Waxholm.
• Yes/no questions: The user may be asked to answer a crucial question by "yes" or "no."
• Spelling: The user may be asked to spell a crucial word, such as a person name or a destination.

It is important to note that the levels of interaction/graceful degradation approach can be used not only in the attempt to get out of error loops but also in combination with system-initiated clarification meta-communication (cf. 3.6.2.2). So, to generalize, whenever the system is uncertain about the
user's meaning, graceful degradation may be considered. The Daimler-Chrysler dialogue manager standardly accepts three repetitions of a failed user turn before applying the graceful degradation approach. This seems reasonable.
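A sketch of such a degradation policy is given below: a counter tracks consecutive failures on the current question and, after a threshold of three repetitions (mirroring the Daimler-Chrysler behaviour mentioned above), the system steps down to a simpler level of interaction. The concrete levels and prompts are illustrative assumptions.

# Sketch of graceful degradation driven by a failure counter.
DEGRADATION_LEVELS = [
    "Which information do you need?",                   # open question
    "From where do you want to travel?",                # focused question
    "Please answer with a single word: the city you want to travel from.",
    "Please spell the name of the city you want to travel from.",
]

class DegradingPrompter:
    def __init__(self, max_repeats=3):
        self.failures = 0
        self.level = 0
        self.max_repeats = max_repeats

    def next_prompt(self, understood):
        if understood:
            self.failures = 0                 # stay at the current level once degraded
        else:
            self.failures += 1
            if (self.failures >= self.max_repeats
                    and self.level < len(DEGRADATION_LEVELS) - 1):
                self.level += 1               # degrade to a simpler interaction level
                self.failures = 0
        return DEGRADATION_LEVELS[self.level]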
3.6.6 Feedback
System feedback to users is essential to successful dialogue management. In order to be clear about what system feedback involves, it is convenient to distinguish between two kinds of feedback: information feedback and process feedback.
3.6.6.1 Information feedback
The user must have the opportunity to verify that the system has understood the user's input correctly. In general, the user should receive feedback on each piece of information which has been input to the system. The feedback need not be communicated through speech. Waxholm, for instance, provides a textual representation on the screen of what the system has recognized as well as of the system's output response. The important thing is that the user can perceive the feedback and verify whether what the system did was what the user intended the system to do by providing a certain input. So the system's feedback may consist in presenting a particular map on the screen, or a table packed with information of some kind or other, or in playing a certain voice mail which it believes that the user has asked for. In many cases, however, the feedback will be speech produced by the system. Imagine the following dialogue:

Dialogue 1
S1: "ADAP Travels, can I help you?"
U1: "When is the first train from Frankfurt to Hamburg tomorrow morning?"
S2: "5.35 a.m."
U2: "Thank you. Goodbye."
S3: "Goodbye."
Superficially, this conversation looks OK. However, the problem is that the system mis-recognized "Hamburg" as "Hanover"! Dialogue 1, therefore, is a transaction failure. Moreover, the transaction failure is in this case a result of sloppy dialogue design. A better solution would be:
Dialogue 2
S1: "ADAP Travels, can I help you?"
U1: "When is the first train from Frankfurt to Hamburg tomorrow morning?"
S2: "The first train from Frankfurt to Hanover on 3rd May 1999 leaves at 5.35 a.m."
U2: [Initiates repair meta-communication.]
S3: ...
The system feedback in the example above is relatively straightforward to design. One thing to consider is whether the system should repeat some or all of its core information to the user twice, in this case by saying "... 5.35 a.m., 5.35 a.m." However, the feedback strategy in Dialogue 2 cannot always be trusted. What may happen is the following transaction failure:
Dialogue 3
S1: "ADAP Travels, can I help you?"
U1: "When is the first train from Frankfurt to Hamburg tomorrow morning?"
S2: "The first train from Frankfurt to Hanover on 3rd May 1998 leaves at 5.35 a.m."
U2: "Thank you."
S3: "Do you need any other information?"
U3: ...
In Dialogue 3, the user simply ignores the system's feedback and happily walks away with what is likely to be false information, just as in Dialogue 1. If the users turn out to be too prone to make the error illustrated in Dialogue 3, the more burdensome strategy shown in Dialogue 4 might be considered:
Dialogue 4
S1: "ADAP Travels, can I help you?"
U1: "When is the first train from Frankfurt to Hamburg tomorrow morning?"
S2: "You want to go from Frankfurt to Hanover tomorrow morning?"
U2: [Initiates repair meta-communication.]
S3: ...
Experience has shown that the strategy illustrated in Dialogue 4 is more robust than the one in Dialogue 3 [7]. The price to be paid for adopting the Dialogue 4 strategy is that the user and the system have to use more dialogue turns to solve the task. The Dialogue 4 strategy can be considered an early step in graceful degradation (see 3.6.5). The four dialogues above may suffice to illustrate the subtleties that dialogue designers have to face. The amount and nature of the feedback the system should give to the user also depends on factors such as the cost and risk involved in the user-system transaction. Feedback on important bank
transfers or costly journeys is obviously more critical than feedback on which email the system should be reading to the user next. Current opinion probably is that the dialogue manager developer should prefer the safer of the two most relevant feedback options. Even travel information, if the user gets it wrong, can have serious consequences for that user. For important transactions, an additional safeguard is to give the user a full summary of the agreed transaction at the end of the dialogue, preceded by a request that the user listens to it carefully. If this request is not there, the user, who has already ignored crucial feedback once, may do so again. The additional challenge for the dialogue designer in this case is, of course, to decide what the system should do if the user discovers the error only when listening to the summarizing feedback. One solution is that the system goes through the core information item by item, asking yes/no questions of the user until the error(s) have been found and corrected, followed by summarizing feedback once again.
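The two feedback tactics just discussed, echoing all understood values and, for high-risk transactions, verifying the items one by one with yes/no questions, can be sketched as follows. Slot names and wording are illustrative assumptions.

# Sketch of the safer feedback strategy: echo understood values back to the user
# and, for high-risk transactions, verify item by item with yes/no questions.
def feedback_utterance(slots):
    return ("You want to go from {origin} to {destination} on {date}?"
            .format(**slots))

def itemwise_verification(slots, answers):
    """answers: dict mapping slot name to the user's 'yes'/'no' reply."""
    to_correct = [name for name, reply in answers.items()
                  if not reply.lower().startswith("y")]
    confirmed = {k: v for k, v in slots.items() if k not in to_correct}
    return confirmed, to_correct          # re-ask only the rejected items

slots = {"origin": "Frankfurt", "destination": "Hanover", "date": "3rd May"}
print(feedback_utterance(slots))
print(itemwise_verification(slots, {"origin": "yes", "destination": "no", "date": "yes"}))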
3.6.6.2 Process feedback
SLDS dialogue manager developers may also consider providing process feedback. Process feedback is meant to keep the user informed that the system is "still in the loop," i.e., that it has not gone down but is busy processing information. Otherwise, the user may, for instance, believe that the system has crashed and decide to hang up, wonder what is going on and start asking questions, or believe that the system is waiting to receive information and start inputting information which the system does not want to have. All of these user initiatives are, or can be, serious for the smooth proceeding of the dialogue. Process feedback in SLDSs is still at an early stage. It is quite possible for today's dialogue manager designers to come up with new, ingenious ways of providing the required feedback on what the system is up to when it does not speak to the user and is not waiting for the user to speak. The best process feedback need not be spoken words or phrases but, perhaps, grunts or uhm's, tones, melodies, or appropriate "earcons." Waxholm tells the user, for instance, "I am looking for boats to Sandhamn," thereby combining information feedback and process feedback. In addition, Waxholm uses its screen to tell the user to wait while the system is working.
3.6.7 Closing the Dialogue
Depending on the task, the system's closing of the dialogue may be either a trivial matter, an unpleasant necessity, or a stage to gain increased efficiency of user-system interaction. Closing the dialogue by saying something like "Thank you. Good bye." is a trivial matter when the task has been solved and the user does not need to
continue the interaction with the system. Users often hang up without waiting for the system's farewell. In some cases, however, when the user has solved a task, the dialogue manager should be prepared for the possibility that the user may want to solve another task without interrupting the dialogue. This may warrant asking the user if the user wants to solve another task. Only if the user answers in the negative should the system close the dialogue. Closing the dialogue is a dire necessity when the system has spent its bag of tricks to overcome repeated error loops (see 3.6.5) and failed, or when the system hands over the dialogue to a human operator. In the former case, the system might ask the user to try again.
3.7 History, Users, Implementation

3.7.1 Histories
As soon as task complexity in terms of information volume exceeds one piece of information, the dialogue manager may have to keep track of the history of the interaction. Dialogue history is a term which covers a number of different types of dialogue records which share the function of incrementally building a dialogue context for the dialogue manager to use or put at the disposal of the language and speech layers (see 3.4.2, 3.5.6). Note that a dialogue history is not a log file of the interaction but a dedicated representation serving some dialogue management purpose. Note also that a dialogue history may be a record of some aspect of the entire (past) dialogue or it may be a record only of part of the dialogue, such as a record which only preserves the two most recent dialogue turns. In principle, a dialogue history may even be a record of several dialogues whether or not separated by hang-ups. This may be useful for building performance histories (see below) and might be useful for other purposes as well. It is useful to distinguish between several different types of dialogue history.
3.7.1.1 Task history
Most applications need a task history, i.e., a record of which parts of the task have been completed so far. The task history enables the system to:

• focus its output to the user on the sub-tasks which remain to be completed
• avoid redundant interaction
• have a global focus (cf. 3.5.5)
• enable the user to selectively correct what the system has misunderstood without having to start all over with the task.
The task history does not have to preserve any other information about the preceding dialogue, such as how the user expressed certain things, or in which order the sub-tasks were resolved. If an output screen is available, the task history may be displayed to the user. If the user modifies some task parameter, such as altering the departure time, it becomes necessary to remove all dependent constraints from the task history.
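A task history with this dependency behaviour can be sketched as a small record keeper: completed slots are stored, and modifying a parameter removes the constraints that depend on it. The dependency structure shown is an illustrative assumption.

# Sketch of a task history that removes dependent constraints when a parameter
# is modified. The dependency relations are illustrative only.
DEPENDENTS = {"departure_time": ["connection", "price"],
              "date": ["departure_time", "connection", "price"]}

class TaskHistory:
    def __init__(self):
        self.completed = {}               # slot -> value

    def record(self, slot, value):
        self.completed[slot] = value

    def modify(self, slot, value):
        # Remove everything derived from the old value before storing the new one.
        for dependent in DEPENDENTS.get(slot, []):
            self.completed.pop(dependent, None)
        self.completed[slot] = value

    def remaining(self, all_slots):
        return [s for s in all_slots if s not in self.completed]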
3.7.1.2 Topic history
A topic history is in principle more complex than a task history. It is a record of the topics that have come up so far during the dialogue and possibly in which order they have come up. Even low-complexity systems can benefit from partial or full topic histories, for instance for detecting when a miscommunication loop has occurred (cf. 3.6.5) which requires the system to change its dialogue strategy, or for allowing users to do input repair arbitrarily far back into the preceding dialogue. Another thing a topic history does is to build a context representation during dialogue, which can be much more detailed than the context built through the task history. In spoken translation systems, such as Verbmobil, the context representation provided by the topic history is necessary to constrain the system's translation task.
3.7.1.3 Linguistic history
A linguistic history builds yet another kind of context for what is currently happening in the dialogue. The linguistic history preserves the actual linguistic utterances themselves (the surface language) and their order, and is used for advanced linguistic processing purposes (see 3.5.6). Preserving the linguistic history helps the system interpret certain expressions in the user's current input, such as co-references. For instance, if the user says: "I cannot come for a meeting on the Monday," then the system may have to go back through one or more of the user's previous utterances to find out which date "the Monday" is. The Daimler-Chrysler dialogue manager and Verbmobil, for instance, use linguistic history for co-reference resolution. Compared to the task history and the topic history, a linguistic history is a relatively sophisticated thing to include in one's dialogue manager at present.

3.7.1.4 Performance history
A performance history is rather different from any of the above histories. The system would build a performance history in order to keep track of, or spot relevant phenomena in, the users' behavior during dialogue. So a performance history is not about the task itself but about how the user handles the task in dialogue with the system. For instance, if the system has already had to resolve several miscommunication loops during dialogue with a particular user, it might be advisable to connect that user with a human operator rather than continue the agony.
One way or another, performance histories contribute to building models of users, whether during a single dialogue or during a series of dialogues with a particular user.
3.7.1.5 Summary
Future systems solving high-complexity tasks are likely to include a task history, a topic history, and a linguistic history, as is the case in Verbmobil. For increased usability and adaptivity, they may need a performance history as well.
3.7.2 Novice and Expert Users, User Groups
The discussion above has focused on the central importance of the task to the dialogue manager developer. However, developers also have to take a close look at the intended users of the application as part of designing the dialogue manager. An important issue common to many different dialogue management tasks is the difference between novice and expert users. In most cases, this is of course a difference continuum, rather than an either/or matter. Furthermore, it may sometimes be important to the dialogue manager developer that there are, in fact, two different distinctions between novice and expert users. In particular, someone may be an expert in the domain of the application but a novice in using the system itself. Depending on for which of four user groups the system is to be developed (system expert/domain expert, system expert/domain novice, system novice/domain expert, system novice/domain novice), the dialogue manager may have to be designed in different ways.
3.7.2.1 Domain and system experts
If the target user group is domain and system experts only, the developer may be able to impose strict task performance order, a relatively large number of mandatory command keywords, etc., and support use of the system through written instructions, all of which makes designing the dialogue manager much easier.

3.7.2.2 System novices
If the target group is walk-up-and-use users who can be expected to be novices in using the system, a much more user-tailored design is required.

3.7.2.3 Domain and system novices
The need for elaborate, user-tailored design increases even further if the system novices are also domain novices, so that any domain technicality has either to be removed or explained at an appropriate point during dialogue. For instance, even though virtually every user comes close to being a domain expert in travel
timetable information, many users do not know what a "green departure" is and therefore have to be told.
3.7.2.4 Other user groups
Depending on the task and the domain, the dialogue manager developer(s) may have to consider user groups other than novices and experts, such as the visually impaired, users speaking different languages, or users whose dialects or accents create particular problems of recognition and understanding. In the case of dialects or accents, performance history information might suggest that the dialogue manager makes use of the graceful degradation approach (cf. 3.6.5).

3.7.2.5 Mixing user groups
The "downside" of doing elaborate dialogue design for walk-up-and-use users can be that (system) expert users rightly experience that their interaction with the system becomes less efficient than it might have been had the system included special shortcuts for expert interaction. Given the relative simplicity of current SLDSs, users may quickly become (system) experts, which means that the short-cut issue is a very real one for the dialogue manager developer to consider. The Danish Dialogue System, for instance, allows (system) expert users to bypass the system's introduction and avoid getting definitions of green departures and the like. Waxholm allows its users to achieve the same thing through its barge-in button. The acceptance of unrestricted user input in RailTel/ARISE means that experienced users are able to succinctly provide all the necessary information in one utterance. Novice users who may be less familiar with the system's exact information requirements may provide some of the information needed and be guided by the system in order to provide the remaining information.
3.7.3 Other Relevant User Properties
If the task to be solved by the dialogue manager is above a certain (low) level of complexity, the dialogue manager designer is likely to need real data from user interactions with a real or simulated system in order to get the design right at design-time. Important phenomena to look for in this data include:
3.7.3.1 Standard goals
What are the users' standard goal(s) in the task domain? If the users tend to have specific standard goals they want to achieve in dialogue with the system, there is a problem if the system is only being designed to help achieve some, but not all, of these goals. Strict user input control (see 3.5.2) may be a solution--but do not count on it to work in all possible circumstances! Deep-seated user goals can be difficult or
impossible to control. Of course, another solution is to increase the task and domain coverage of the system.
3.7.3.2 User beliefs
Do the users tend to demonstrate that they have specific beliefs about the task and the domain, which may create communication problems? It does not matter whether these user beliefs are true or false. If they tend to be significantly present, they must be addressed in the way the dialogue is being designed.

3.7.3.3 User preferences
Do the users tend to have specific preferences which should be taken into account when designing the dialogue? These may be preferences with respect to, for instance, dialogue sub-task order.
3.7.3.4 Cognitive loads
Will the dialogue, as designed, tend to impose any strong cognitive loads on users during task performance? If this is the case, the design may have to be changed in case the cognitive load makes the users behave in undesirable ways during dialogue. One way to increase users' cognitive load is to ask them to remember to use specific keywords in their interaction with the system; another is to use a feedback strategy which is not sufficiently heavy-handed, so that users need to concentrate harder than they are used to doing in order not to risk ignoring the feedback.
3.7.3.5 Response packages
Other cognitive properties of users that the dialogue manager developer should be aware of include the response package phenomenon. For instance, users seem to store some pieces of information together, such as "from A to B." Asking them from where they want to go therefore tends to elicit the entire response package. If this is the case, then the dialogue manager should make sure that the user input prediction enables the speech and language layers to process the entire response package.

3.7.4 Implementation Issues
The issue of dialogue management can be addressed at several different levels of abstraction. In this chapter we have largely ignored low-level implementation issues such as programming languages, hardware platforms, software platforms, generic software architecture, database formats, query languages, data structures which can be used in the different system modules, etc. Generic software architectures for dialogue management are still at an early stage, and low-level implementation issues can be dealt with in different
ways with little to distinguish between them in terms of efficiency, adequacy, etc. Good development tools appear to be more relevant at this point. A survey of existing dialogue management tools is provided in [37].
3.7.4.1 Architecture and modularity
There is no standard architecture for dialogue managers, their modularity, or the information flow between modules. Current dialogue manager architectures differ with respect to their degree of domain and task independence, among many other things. The functionality reviewed above may be implemented in any number of modules, the modularity may be completely domain and task dependent or relatively domain and task independent, and the dialogue manager may be directly communicating with virtually every other module in an SLDS or it may itself be a module which communicates only indirectly with other modules through a central processing module. As to the individual modules, Finite State Machines may be used for dialogue interaction modeling, and semantic frames may be used for task modeling. However, other approaches are possible and the ones just mentioned are among the simplest approaches.
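To illustrate the two simple techniques just named, the following sketch drives a tiny dialogue with a finite state machine while collecting the task information in a semantic frame. The states, transitions, and slot names are illustrative and far simpler than those of any of the systems discussed.

# Sketch: a finite state machine for the dialogue flow plus a semantic frame
# as the task model. States, transitions, and slots are illustrative.
FRAME_SLOTS = ["origin", "destination", "date"]

TRANSITIONS = {                   # state -> (slot to fill, next state)
    "start":            ("origin", "got_origin"),
    "got_origin":       ("destination", "got_destination"),
    "got_destination":  ("date", "done"),
}

def run_turn(state, frame, user_value=None):
    slot, next_state = TRANSITIONS[state]
    if user_value is not None:
        frame[slot] = user_value        # fill the frame slot asked about
        state = next_state
    if state == "done":
        return state, None
    return state, f"Please state the {TRANSITIONS[state][0]}."

state, frame = "start", {}
state, prompt = run_turn(state, frame)                 # "Please state the origin."
state, prompt = run_turn(state, frame, "Frankfurt")    # asks for the destination next
state, prompt = run_turn(state, frame, "Hamburg")      # asks for the date next
print(frame, prompt)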
3.7.4.2 Main task of the dialogue manager
If there is a central task which characterizes the dialogue manager as a manager, it is the task of deciding how to produce appropriate output to the user in view of the dialogue context and the user's most recent input as received from the speech and language layers. Basically, what the dialogue manager does in order to interpret user input and produce appropriate output to the user is to:

• use the knowledge of the current dialogue context and local and global focus of attention it may possess to:
  - map from the semantically significant units in the user's most recent input (if any), as conveyed by the speech and language layers, onto the sub-task(s) (if any) addressed by the user
  - analyze the user's specific sub-task contribution(s) (if any)
• use the user's sub-task contribution to:
  - execute a series of preparatory actions (consistency checking, input verification, input completion, history checking, database retrieval, etc.) usually leading to the generation of output to the user, either by the dialogue manager itself or through output language and speech layers.

The dialogue management activities just described were discussed in sections 3.5.5-3.6.7 and 3.7.2-3.7.3 above. The analysis of the user's
specific sub-task contribution is sometimes called "dialogue parsing" and may involve constraints from most of the elements in the speech input, language input, context, and control layers in Fig. 1. In order to execute one or more actions that will eventually lead to the generation of output to the user, the dialogue manager may use, for example, an AI dynamic planning approach as in Verbmobil, a Finite State Machine for dialogue parsing as in Verbmobil, an Augmented Transition Network as in Waxholm, or, rather similarly, decide to follow a particular branch in a node-and-arc dialogue graph as in the Danish Dialogue System. As the dialogue manager generates its output to the user, it must also:

• change or update its representation of the current dialogue context; and
• generate whatever constraint-based support it may provide to the speech and language layers.

These dialogue management activities were described in sections 3.4.2, 3.5.2, 3.5.4, and 3.7.1 above. At a high level of abstraction, what the dialogue manager thus has to do is to apply sets of decision-action rules, possibly complemented by statistical techniques, to get from (context + user input) to (preparatory actions + output generation + context revision + speech and language layer support). For simple tasks, this may reduce to the execution of a transformation from, e.g., (user input keywords from the speech layer) to (minimum preparatory actions + dialogue manager-based output generation including repair meta-communication) without the use of (input or output) language layers, context representations, and speech and language layer support. Different dialogue managers might be represented in terms of which increased-complexity properties they add to this simple model of a dialogue manager. Waxholm, for instance, adds semantic input parsing; a statistical topic spotter ranging over input keywords; out-of-domain input spotting; two-level dialogue segmentation into topics and their individual sub-structures; preparatory actions, such as dialogue parsing including consultation of the topic history, and database operations including temporal inferencing; user-initiated repair meta-communication; a three-phased dialogue structure of introduction, main phase, and closing; an output speech layer; advanced multimodal output; and topic history updating.
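The high-level mapping from (context + user input) to (preparatory actions + output generation + context revision + speech and language layer support) described above can be caricatured in a few lines of Python. The rule set, the structure of the context, and the form of the layer support are illustrative assumptions, not a description of any of the exemplar dialogue managers.

# Highly simplified sketch of one dialogue manager turn: revise the context,
# choose preparatory actions, generate output, and pass support to the layers.
def dialogue_manager_turn(context, user_input):
    actions, layer_support = [], {}

    if user_input is None:                               # nothing arrived
        output = "Sorry, I did not understand. Please repeat."
    else:
        slot, value = user_input                         # e.g. ("destination", "Hamburg")
        context["task"][slot] = value                    # context revision
        actions.append(("consistency_check", slot))
        missing = [s for s in ("origin", "destination", "date")
                   if s not in context["task"]]
        if missing:
            output = f"Please state the {missing[0]}."
            layer_support["expected_slot"] = missing[0]  # prediction for the recognizer
        else:
            actions.append(("database_lookup", dict(context["task"])))
            output = "One moment, I am looking up your connection."

    context["history"].append(output)
    return actions, output, layer_support

context = {"task": {}, "history": []}
print(dialogue_manager_turn(context, ("origin", "Frankfurt")))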
3.7.4.3 Order of output to the user
As for the generation of output to the user, a plausible default priority ordering could be:

1. if system-initiated repair or clarification meta-communication is needed, then the system should take the initiative and produce it as a matter of priority
2. even if 1 is not the case, the user may have initiated meta-communication. If so, the system should respond to it
3. if neither 1 nor 2 is the case, the system should respond to any contribution to domain communication that the user may have made; and
4. then the system should take the domain initiative.

In other words, in communication with the user, meta-communication has priority over domain communication; system-initiated meta-communication has priority over user-initiated meta-communication; user domain contributions have priority over the system's taking the next step in domain communication. Note that feedback from the system may be involved at all levels (cf. 3.6.6). Note also that the above default priority ordering 1-4 is not an ordering of the system states involved. As far as efficient processing by the dialogue manager is concerned, the most efficient ordering seems to be to start from the default assumption that the user has made a contribution to the domain communication. Only if this is not the case should 1, 2, and 4 above be considered.
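The default priority ordering can be expressed directly as a cascade of checks, as in the sketch below. The flag names are illustrative; in a real dialogue manager they would be set by the preceding analysis of the user's input.

# Sketch of the default output priority ordering 1-4 described above.
def choose_next_move(turn_state):
    if turn_state.get("system_meta_needed"):        # 1. system-initiated meta-communication
        return "produce_repair_or_clarification"
    if turn_state.get("user_meta_request"):         # 2. respond to user meta-communication
        return "answer_user_meta"
    if turn_state.get("user_domain_contribution"):  # 3. respond to the user's contribution
        return "respond_to_domain_input"
    return "take_domain_initiative"                 # 4. system takes the next domain step

print(choose_next_move({"user_domain_contribution": True}))  # respond_to_domain_input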
3.7.4.4 Task and domain independence
The fact that dialogue management, as considered above, is task-oriented does not preclude the development of (relatively) task-independent and domain-independent dialogue managers. Task and domain independence is always independence in some respect or other, and it is important to specify that respect (or those respects) in order to state a precise claim. Even then, the domain or task independence is likely to be limited or relative. For instance, a dialogue manager may be task independent with respect to some, possibly large, class of information retrieval tasks but may not be easily adapted to all kinds of information retrieval tasks, or to negotiation tasks. Dialogue managers with a modular architecture and domain and task independence are highly desirable, for several reasons (cf. 3.3.3). For instance, such dialogue managers may integrate a selection of the dialogue management techniques described above while keeping the task model description and the dialogue model description as separate modules. These dialogue managers are likely to work for all tasks and domains for which this particular combination of dialogue management techniques is appropriate. The Daimler-Chrysler dialogue manager and the RailTel/ARISE dialogue manager are cases in point. Much more could be done, however, to build increasingly general dialogue managers. To mention just a few examples, it would be extremely useful to have access to a generalized meta-communication dialogue manager component, or to a domain-independent typology of dialogue acts.
4. Conclusion
Spoken language dialogue systems represent the peak of achievement in speech technologies in the 20th century and appear set to form the basis for the increasingly natural interactive systems to follow in the coming decades. This chapter has presented a first general model of the complex tasks performed by dialogue managers in state-of-the-art spoken language dialogue systems. The model is a generalization of the theory of dialogue management in [3] and aims to support best practice in spoken language dialogue systems development and evaluation. To provide adequate context, dialogue management has been situated in the context of the processing performed by the spoken language dialogue system as a whole. So far, the dialogue management model presented here has been used to systematically generate a full set of criteria for dialogue manager evaluation. Preliminary results are presented in [38].
ACKNOWLEDGMENTS
The work was carried out in the EU Esprit Long-Term Concerted Action DISC, Grant No. 24823, on Spoken Language Dialogue Systems and Components: Best practice in development and evaluation [www.disc2.dk]. The support is gratefully acknowledged. We would also like to thank the DISC partners Jan van Kuppevelt and Uli Heid, who also analyzed some of the exemplar dialogue managers (see 3.1). This generated heated theoretical discussions and greatly improved our understanding of the intricacies of natural language processing in SLDSs.
REFERENCES
[1] Fatehchand, R. (1960). Machine recognition of spoken words. Advances in Computers, 1, 193-229.
[2] Sharman, R. (1999). Commercial viability will drive speech research. Elsnews, 8(1), 5.
[3] Bernsen, N. O., Dybkjær, H. and Dybkjær, L. (1998). Designing Interactive Speech Systems. From First Ideas to User Testing. Springer Verlag, Berlin.
[4] Bossemeyer, R. W. and Schwab, E. C. (1991). Automated alternate billing services at Ameritech: speech recognition and the human interface. Speech Technology Magazine, 5(3), 24-30.
[5] Aust, H., Oerder, M., Seide, F. and Steinbiss, V. (1995). The Philips automatic train timetable information system. Speech Communication, 17, 249-262.
[6] Peng, J.-C. and Vital, F. (1996). Der sprechende Fahrplan. Output 10.
[7] Sturm, J., den Os, E. and Boves, L. (1999). Issues in spoken dialogue systems: experiences with the Dutch ARISE system. Proceedings of ESCA Workshop on Interactive Dialogue in Multi-Modal Systems, Kloster Irsee, Germany, pp. 1-4.
[8] DARPA: Speech and Natural Language. Proceedings of a Workshop. (1989). Morgan Kaufmann, San Mateo, CA.
[9] DARPA: Speech and Natural Language. Proceedings of a Workshop held at Hidden Valley, Pennsylvania. (1990). Morgan Kaufmann, San Mateo, CA.
[10] DARPA: Speech and Natural Language. Proceedings of a Workshop. (1991). Morgan Kaufmann, San Mateo, CA.
[11] DARPA: Proceedings of the Speech and Natural Language Workshop. (1992). Morgan Kaufmann, San Mateo, CA.
[12] Iwanska, E. (1995). Summary of the IJCAI-95 Workshop on Context in Natural Language Processing, Montreal, Canada.
[13] Gasterland, T., Godfrey, P. and Minker, J. (1992). An overview of cooperative answering. Journal of Intelligent Information Systems, 1, 123-157.
[14] Grosz, B. J. and Sidner, C. L. (1986). Attention, intentions, and the structure of discourse. Computational Linguistics, 12(3), 175-204.
[15] Heid, U., van Kuppevelt, J., Chase, L., Paroubek, P. and Lamel, L. (1998). Working paper on natural language understanding and generation current practice. DISC Deliverable D1.4.
[16] Baggia, P., Gerbino, E., Giachin, E. and Rullent, C. (1994). Spontaneous speech phenomena in naive-user interactions. Proceedings of TWLT8, 8th Twente Workshop on Speech and Language Engineering, Enschede, The Netherlands, pp. 37-45.
[17] Waibel, A. (1996). Interactive translation of conversational speech. IEEE Computer, 29(7), 41-48.
[18] del Galdo, E. M. and Nielsen, J. (1996). International User Interfaces. Wiley, New York.
[19] Thomson, D. L. and Wisowaty, J. L. (1999). User confusion in natural language services. Proceedings of ESCA Workshop on Interactive Dialogue in Multi-Modal Systems, Kloster Irsee, Germany, pp. 189-196.
[20] Wyard, P. J. and Churcher, G. E. (1999). The MUeSLI multimodal 3D retail system. Proceedings of ESCA Workshop on Interactive Dialogue in Multi-Modal Systems, Kloster Irsee, Germany, pp. 17-20.
[21] Heisterkamp, P. and McGlashan, S. (1996). Units of dialogue management: an example. Proceedings of ICSLP96, Philadelphia, pp. 200-203.
[22] Lamel, L., Bennacef, S., Bonneau-Maynard, H., Rosset, S. and Gauvain, J. L. (1995). Recent developments in spoken language systems for information retrieval. Proceedings of the ESCA Workshop on Spoken Dialogue Systems, Vigsø, Denmark, pp. 17-20.
[23] den Os, E., Boves, L., Lamel, L. and Baggia, P. (1999). Overview of the ARISE project. Proceedings of Eurospeech, Budapest, pp. 1527-1530.
[24] Bub, T. and Schwinn, J. (1996). Verbmobil: the evolution of a complex large speech-to-speech translation system. DFKI GmbH Kaiserslautern. Proceedings of ICSLP96, Philadelphia, pp. 2371-2374.
[25] Alexandersson, J., Reithinger, N. and Maier, E. (1997). Insights into the dialogue processing of Verbmobil. Proceedings of the Fifth Conference on Applied Natural Language Processing, ANLP 97, Washington, DC, pp. 33-40.
[26] Bertenstam, J., Blomberg, M., Carlson, R. et al. (1995). The Waxholm system--a progress report. Proceedings of ESCA Workshop on Spoken Dialogue Systems, Vigsø, pp. 81-84.
[27] Carlson, R. (1996). The dialog component in the Waxholm system. Proceedings of the Twente Workshop on Language Technology (TWLT11): Dialogue Management in Natural Language Systems, University of Twente, the Netherlands, pp. 209-218.
[28] Failenschmid, K., Williams, D., Dybkjær, L. and Bernsen, N. O. (1999). Draft proposal on best practice methods and procedures in human factors. DISC Deliverable D3.6.
[29] Bernsen, N. O. (1995). Why are analogue graphics and natural language both needed in HCI? In Interactive Systems: Design, Specification, and Verification. Focus on Computer Graphics, ed. F. Paterno, Springer Verlag, Berlin, pp. 235-251.
[30] Bernsen, N. O. (1997). Towards a tool for predicting speech functionality. Speech Communication, 23, 181-210.
[31] Bernsen, N. O. and Luz, S. (1999). SMALTO: speech functionality advisory tool. DISC Deliverable D2.9.
[32] Fraser, N. M., Salmon, B. and Thomas, T. (1996). Call routing by name recognition: field trial results for the Operetta(TM) system. IVTTA96, New Jersey.
[33] Zoltan-Ford, E. (1991). How to get people to say and type what computers can understand. International Journal of Man-Machine Studies, 34, 527-547.
[34] Amalberti, R., Carbonell, N. and Falzon, P. (1993). User representations of computer systems in human-computer speech interaction. International Journal of Man-Machine Studies, 38, 547-566.
[35] Dybkjær, L. (1999). CODIAL, a tool in support of cooperative dialogue design. DISC Deliverable D2.8.
[36] Heisterkamp, P. (1993). Ambiguity and uncertainty in spoken dialogue. Proceedings of Eurospeech93, Berlin, pp. 1657-1660.
[37] Luz, S. (1999). State-of-the-art survey of dialogue management tools. DISC Deliverable D2.7a.
[38] Bernsen, N. O. and Dybkjær, L. (2000). Evaluation of spoken language dialogue systems. In Automatic Spoken Dialogue Systems, ed. S. Luperfoy, MIT Press, Cambridge, MA.
Embedded Microprocessors: Evolution, Trends, and Challenges

MANFRED SCHLETT
Hitachi Europe
Dornacherstr. 3
85622 Feldkirchen
Germany
Abstract
Embedded into nearly all electrical goods, micro-controllers and microprocessors have become almost commodity products in the microelectronics industry. However, unlike in the desktop computer sector, many different processor architectures and vendors are on the market in the embedded sector. The embedded system market itself is very fragmented, with no standardized embedded platform similar to PC motherboards currently available. Besides the traditional embedded control and desktop computer segments, many new classes of processors have been derived to meet the market needs of this embedded world. Due to still impressive technology advances, makers of embedded microprocessor systems are facing new challenges in the design and implementation of new devices. This chapter describes the major changes and trends occurring in the microprocessor area through the years, emphasizing embedded microprocessors, and gives an outlook on "what might happen".
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
2. The 32-bit Embedded Marketplace . . . . . . . . . . . . . . . . . . . . . . . 332
3. General Microprocessor and Technology Evolution . . . . . . . . . . . . . . 337
4. Basic Processor Classification . . . . . . . . . . . . . . . . . . . . . . . . . . 342
   4.1 System Level Approach . . . . . . . . . . . . . . . . . . . . . . . . . . 342
   4.2 Embedded Controller Systems . . . . . . . . . . . . . . . . . . . . . . . 344
   4.3 Embedded Processor Systems . . . . . . . . . . . . . . . . . . . . . . . 346
   4.4 Computer Processor Systems . . . . . . . . . . . . . . . . . . . . . . . 346
   4.5 System Approach Conclusion . . . . . . . . . . . . . . . . . . . . . . . 347
5. Processor Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
   5.1 Architecture Definition . . . . . . . . . . . . . . . . . . . . . . . . . . 348
   5.2 Core Implementation Methodologies . . . . . . . . . . . . . . . . . . . 351
   5.3 RISC Architecture Evolution . . . . . . . . . . . . . . . . . . . . . . . 357
   5.4 Core Function Extensions . . . . . . . . . . . . . . . . . . . . . . . . . 358
   5.5 Future Performance Gains . . . . . . . . . . . . . . . . . . . . . . . . . 361
6. Embedded Processors and Systems . . . . . . . . . . . . . . . . . . . . . . . 363
   6.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363
   6.2 Differentiating Factors and Implementation Trends . . . . . . . . . . . 365
   6.3 System Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
   6.4 Scalable Systems and Classes of Embedded Processors . . . . . . . . . 370
   6.5 Embedded Software and Standardization Efforts . . . . . . . . . . . . . 372
7. The Integration Challenge . . . . . . . . . . . . . . . . . . . . . . . . . . . 374
8. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
   References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377

1. Introduction
One of the current buzzwords in the microelectronics arena is the embedded system. After the memory price collapse of the late 1990s and given Intel's dominance in the microprocessor arena, the embedded world still has no dominant vendor and promises high growth rates. It is a market full of new developments and trends and, because there is no dominant vendor at the moment, analysts see a lot of opportunities there. Also, because the embedded market is not as unified as the PC and general computer market, a wide variety of different approaches can currently be found. Hot topics in this area are embedded microprocessors, upcoming standard operating systems, and the system-on-chip integration challenge. Devices with very high performance but extremely low cost and low power consumption point the way things will go. At the center of each embedded system sits an embedded microprocessor offering very high computational power. Traditionally, the market was divided into embedded control and desktop sectors, but microprocessor vendors have created a lot of new classes of processors to match the requirements of today's applications. In the past five decades, there have been many new trends in the computer and microelectronics community. Looking at articles about microelectronics published 50 years ago, we can see that on one hand a great deal has changed, but on the other hand we still use binary logic and switching elements. Today, of course, we use far more transistors than in the past. We still design microcomputers based on the von Neumann principles, and we still use computers to execute arithmetic algorithms. What has really changed is the extended use of microelectronics: nearly all of today's products are implemented using logic ICs or some form of computer. In the early 1950s, integrated microelectronics were used mainly in "calculation machines" or telecommunications equipment. Now, owing to the incredibly fast-growing computer and microelectronics industry and the immense improvements in enabling technology, the design challenge of today can no longer be compared with the challenges electronic system designers faced 50 years ago. This chapter describes the most important aspects of the changes we have witnessed and provides an overview of current microprocessor design challenges with a focus on the rapidly growing embedded world. Starting
with Intel's 4004 device in the 1970s, microprocessors have evolved to become one of the most important driving factors of the industry. Controllers are embedded, for example, into phones, refrigerators, ovens, TV sets, or printers, and are therefore used in nearly all electrical goods. When discussing microcontrollers and microprocessors, we tend to think of the Intel x86 architecture and its competitors: the IBM/Motorola PowerPC, Digital Alpha, Sun Sparc, Hewlett-Packard PA-RISC, or MIPS Technologies' MIPS architectures [1-7]. Designed primarily for the desktop computer market, these processors have dominated the scene, with the x86 being the clear winner. During the past two decades, these desktop architectures have been drastically further developed, and architects are striving to deliver even more computational power to the desktop [8]. But in concentrating on the desktop, we may be missing the next step in microprocessor design: embedded CPUs. As David Patterson argued, Intel specializes in designing microprocessors for the desktop PC, which in five years may no longer be the most important type of computer. Its successor may be a personal mobile computer that integrates the portable computer with a cellular phone, digital camera, and video game player. Such applications require low-cost, energy-efficient microprocessors, and Intel is far from a leader in that area. [9] The evolution of the embedded microprocessor is the focus of this chapter. The chapter starts with a discussion of the driving forces of the embedded system market. What is happening in that market, and which vendors and processor architectures are market leaders? Before we look more closely at current microprocessor trends, one section illustrates the history of the microprocessor and its evolution during the past 25 years. It is always useful to look at the history first, before trying to estimate future trends. The next section defines the worlds of embedded and computer systems, providing basic definitions of the microprocessor itself and the differences between embedded control, embedded processing, and computer systems. This will allow a better understanding of the individual worlds and the needs driving new embedded concepts. The following section is dedicated to instruction set architectures (ISA), their definition, evolution, implementation, and future. On the basis of these discussions, a more detailed look at embedded processors and systems is then taken. Can embedded systems be classified the way PCs are? Finally, the last section introduces the integration challenge, which promises further cost reduction, higher performance, and new integration capabilities for developing advanced systems. This chapter cannot mention all the new trends in the embedded system market, but it provides at least a basic overview of the past, current, and future situation and evolution.
2. The 32-bit Embedded Marketplace
A lot of excellent research and development work has never been exploited because it was done too early or too late or was not applicable to market needs. Since it is almost impossible to cover all trends in microprocessor design or to mention all new developments, we should look more closely at the market trends driving new embedded processor developments. What is happening in the 32-bit embedded domain? Compared to the workstation and PC computer market, the 32-bit embedded processor market is currently extremely fragmented. Several different processor architectures from more than 100 vendors are competing with each other to be used in new embedded applications (see, e.g., [10]). This fragmentation is perhaps the most important difference from the standard computer market. Hardware and software companies active in this embedded field face a situation which differs substantially from the PC market. For an embedded software vendor it is very difficult and resource intensive to support all architectures and devices on the market. The consequence is that vendors normally focus on a single specific architecture, but even so support is very resource intensive because there is no standard embedded platform comparable to a PC motherboard, with fixed resources and a defined system architecture. This situation arises because every embedded application has a completely different system-level architecture as a result of different requirements, which explains the huge number of architectures and devices available on the embedded market. Looking at microprocessors, the main players in today's 32-bit embedded market are the MIPS, ARM, SuperH, PowerPC, 68000, and Sparc architectures [11-13]. In fact, most of these architectures were developed for the desktop computer market. The MIPS architecture from MIPS Technologies, Inc. as well as Sun Microelectronics' Sparc were used initially in high-end workstations before being re-used in the embedded domain. The same is true for the Advanced RISC Machines ARM architecture, used early on in the so-called Acorn RISC PC. The volume business of the PowerPC, developed jointly by IBM and Motorola, is still the desktop computer market. Although it has been on the market for quite a few years, Motorola's 68000 architecture is still one of the best-selling 32-bit architectures in the embedded domain, which is even more impressive given that the evolution of the 68000 has now ended. The above-mentioned architectures differ a lot in the way they are marketed. ARM processors, for example, are licensed as a core to a large number of semiconductor vendors offering Application Specific ICs (ASICs) or standard devices based on the ARM core. The most successful ARM core is the ARM7TDMI, the so-called Thumb extension of the ARM7 [14].
While the ARM cores have been licensed quite widely, the MIPS architecture follows a similar approach but has been licensed only to a limited number of global semiconductor players such as NEC, Toshiba, LSI Logic, IDT, Philips, and Siemens. Hitachi's SuperH architecture has so far been manufactured primarily by Hitachi itself, but recently the SuperH architecture was licensed as well, and the next-generation SH-5 architecture was developed jointly by Hitachi and ST Microelectronics. All of the above architectures are leaders in specific market areas. Now, which applications and markets are actually driving embedded processor developments? Some examples:

• Video game consoles: In the 1990s, the video game market was one of the most vital. The main players in that area were the SuperH and MIPS architectures. Driven by improving graphics capabilities, these processors and systems can now nearly compete with high-end graphics workstations. Turley [10] describes this segment as the main driver of high-volume business in the 32-bit embedded RISC arena. Because of new announcements and alliances, we will also see other architectures and vendors quite active in this field in the future.

• Cellphones: This market shows incredible growth rates and has become a very important driving factor of the industry. Whereas earlier cellphones used 8- and 16-bit microcontrollers plus a dedicated Digital Signal Processing (DSP) processor executing mainly audio compression functionality, today's cellphones are switching to 32-bit embedded controllers, one of the leading architectures in this field being the ARM7TDMI.

• PC companions: The growing mobility in daily business life requires a new class of personal computer equipment. Computers are getting smaller and smaller and could be described as mobile equipment, but a complete PC requires too many peripherals and is not convenient as a personal organizer for accessing data quickly. New products, so-called PC companions, which can store personal information plus a subset of normal PC data, have been introduced to the market. For example, hand-held or palm-size PCs with a simple PC data synchronization mechanism can make information available right in our hands. The leading architectures in that market are again MIPS and SuperH, but ARM, PowerPC, and 68000 devices are also used.

• Set-top boxes: Digital set-top boxes are changing the way we use TV sets. Video-on-demand, internet access through the TV network, or interactive movies require advanced boxes between conventional TV sets and the network. As entertainment becomes a more and more important factor, set-top boxes are now driving several developments
in the semiconductor industry. A typical block diagram of a digital set-top box includes many different blocks, making it typical of the ASIC market, where numerous architectures and cores are integrated into complex devices.

• Internet appliances: The internet, defined as a network enabling the fast exchange of information, could be interpreted as a revolution similar to the computer revolution, which started some 50 years ago. The pure physical network with routers and switches initiated the development of new networking products such as Ethernet cards, routers, or servers. But the internet is much more. Services associated with the internet, such as electronic commerce, allow easy and fast access to information, personal electronic communication, or other services, and offer a huge potential for new applications. For example, ensuring secure access to internet banking or trading over the internet requires new access procedures such as fingerprint recognition or cryptographic hardware and software tools. Other examples are products providing easy and simple access to these internet services, such as combinations of phones and internet terminals, telephony over the internet, or mobile equipment.

• Digital consumer: The digital consumer market, covering products such as photography, audio, or video equipment, has for many years been a huge market requiring embedded controllers and processors, and it will remain a major market driver in the future. New applications such as mobile MP3 players (MP3 is a compression/decompression algorithm that allows music tracks and audio to be stored on relatively low-capacity storage media) have been developed and introduced, and offer excellent market potential. Another example is the Digital Versatile Disc (DVD) player. One challenge of that market is to connect all these new devices with PCs, but it will still take considerable effort to establish a simple and standard data exchange mechanism between cameras, players, and PCs.

• In-car information systems: Partly defined as digital consumer products, car information and entertainment systems are becoming increasingly popular. Requiring the performance of PCs but under very special price and feature conditions such as extremely high reliability, this market offers excellent opportunities for high-end embedded processors. The market leader in this emerging area is the SuperH architecture.

• Printers, networking, and office equipment: Following the huge potential of internet use, networking applications such as links, switches, and routers are all driving factors of the embedded processor market. In
conjunction with printers, fax machines, and other office equipment, this market is still full of opportunities. In fact, printers and networking were the first foothold of 32-bit embedded microprocessors [10]. Motorola's Coldfire and Intel's i960 devices [15] are the household names of this market.

• Industrial control: The 32-bit industrial control market segment has been clearly dominated for years by Motorola's 68000 architecture. In the industrial control segment no extremely high-volume product is driving new developments as in the video game console market, but there is a huge number of individual products. As the 68000 family evolution has come to an end, new players and vendors are entering that market. Typical applications are motor control, industrial automation, etc.
Basically, we can identify two categories of products driving the embedded controller and processor market:

• new emerging products which have not yet been introduced to the market
• convergent products combining two existing products into a new, more advanced, combined product.
For example, in the mobile computing area so-called smartphones and personal communicators are being developed. These products are combinations of a PC companion and a cellphone, as indicated in Fig. 1. The first approach in designing such a system is simply to combine two individual systems via a simple interface mechanism; the next-generation approach is normally to unify them by using a single processor whose peripherals integrate both systems into one.
FIG. 1. Future combined PC/PC hand-held/cellphone growth market.
In this context, it is also worth discussing the Net Computer (NC) initiative, which illustrates that even global players can face difficulties introducing new technology. In the early 1990s, a great deal of effort was put into replacing the conventional PC with the NC. The NC was initially thought of as an extremely lean computer without a hard disk, running application programs mainly via the internet or servers; in fact, the concept was very close to the server-client concept of the mainframe era, running a main computer with many "stupid" clients. The Java concept was very much intended for that purpose. Obviously, whether because of the dramatic price reduction of computers or for other reasons, the NC did not replace the PC. Another market started to become important in the early 1990s: the embedded core market became visible on the horizon, and the first players were ARM and MIPS. As embedded processors were increasingly used in high-volume applications, it became a necessity to integrate the basic microprocessor into an ASIC, mainly to reduce overall costs further. At the end of the 1990s, it was not only a race between different semiconductor vendors, but also a race between the most popular architectures. Most of the global players in that market entered this business by licensing their processor core technology to other players in the field (for example, SPARC, x86, SuperH, PowerPC), but new processor cores were also developed to meet the needs of that market; for example, the Argonaut RISC Core (ARC) and hyperstone's 32-bit RISC/DSP engine were introduced [16]. In addition to all the above-mentioned general trends, most of these cores have been further developed to meet market trends, the most visible trend being the integration of multimedia or DSP capabilities into the chip. This trend occurred at the same time in the computer processor and embedded processor markets. For example, Intel introduced the x86's MMX extension, MIPS followed with MDMX, SPARC with VIS, and ALPHA with MVI, while among the embedded processors SuperH followed with the SH-DSP approach, ARM with the Piccolo concept, hyperstone with the unified RISC/DSP specification, and finally Infineon with the TriCore development (see, e.g., [17]). Awaiting the multimedia decade, new players developed completely new devices intended, for example, for use as multimedia PC peripherals, featuring several modem data streams, graphics acceleration, voice and image (de-)compression, and many more integrated capabilities. Unfortunately, the market for these devices has not taken off yet; products such as Chromatic Research's MPACT or Philips' Trimedia never became household names in the computer or embedded market. Alongside all these developments and market trends, the 32-bit embedded market is probably the market with the highest growth rate at the moment,
but as yet there is no dominant market leader comparable to the x86 in the computer and PC market. The big questions in the embedded domain are whether standardization will occur as it did in the PC market, and whether a single processor architecture and vendor will come to dominate it. Before discussing this subject further, let us have a look at the evolution of microprocessors and technology generally during the past five decades.
3. General Microprocessor and Technology Evolution
To predict future developments and trends, it makes sense to look to the past and see how it was possible to start with very simple logic and then, a few decades later, be able to integrate several million transistors into a single piece of silicon. At the center of a microelectronic system resides a programmable microcontroller or microprocessor, basically defined as a device consecutively executing a set of arithmetic and other logical operations on a defined set of data. A programmer of such a system can define the consecutive order in which these operations are executed by the microprocessor. Another definition is given on Intel's website (www.intel.com): A microprocessor is an integrated circuit built on a tiny piece of silicon. It contains thousands, or even millions, of transistors, which are interconnected via superfine traces of aluminium. The transistors work together to store and manipulate data so that the microprocessor can perform a wide variety of useful functions. The particular functions a microprocessor performs are dictated by software. For further details see, e.g., [18]. Looking at systems controlled by microprocessors from a very general point of view, there is no difference between a modern computer and the microelectronic control system of a washing machine. If we look at the details, these systems differ, of course, but the principal concept is the same. So where does this concept come from? First of all, modern computing could be interpreted as a further development of the calculating machines invented to perform basic mathematical operations such as addition, multiplication, division, or finding prime numbers. Blaise Pascal, for example, invented an adding machine called the Pascaline in 1643, the first mechanical adding machine. Another famous example is Charles Babbage's Difference Engine, dating from 1832. All these developments were invented to accelerate basic arithmetical calculations. One of the most important steps leading to
the computer of today was George Boole's work on a "system for symbolic and logical reasoning," which we still use today when designing a 1 GHz processor. Of course, these early computers were not really freely programmable, and they were too big and too slow to be integrated into a washing machine. But the computer technology evolution of the past five decades has made this possible: the same system controlling a washing machine could be reused to run a pocket calculator or a cash-register system. The most important steps toward the microelectronic system of today happened around 1950. These steps were the von Neumann machine, the invention of the transistor, the first logic integrated circuits (ICs), and the first commercial computer. The huge majority of processors and microelectronic systems on the market still follow the concept of the "von Neumann machine" introduced by John von Neumann in 1946 [19, 20]. Many people active in the computer field believe that this term gives too much credit to von Neumann and does not adequately reflect the work of the engineers involved in the development and specification phase. Eckert and Mauchly at the Moore School of the University of Pennsylvania built the world's first electronic general-purpose computer, called ENIAC. In 1944, von Neumann was attracted to that project, which was funded by the US Army. ENIAC was the first programmable computer, which clearly distinguished it from earlier computers. The group wanted to improve the concept further, so von Neumann helped to develop the idea of storing programs as numbers and wrote a memo proposing a stored-program computer called EDVAC. The names of the engineers were omitted from that memo, and so the common term "von Neumann computer" resulted. In 1946, Wilkes from Cambridge University visited the Moore School; when he returned to Cambridge he decided to run the EDSAC project. The result was a prototype called Mark-I, which might be called the first operational stored-program computer. Previously, programming was done manually by plugging up cables and setting switches, and data were provided on punched cards. The ENIAC and EDVAC projects developed the idea of storing programs, not just data, as numbers and made the concept concrete. Around the same time, Aiken designed an electromechanical computer called Mark-I at Harvard. The subsequently developed Mark-III and Mark-IV computers had separate memories for instructions and data, and the term "Harvard architecture" is still used to describe architectures following that approach. In the von Neumann concept, the memory used for program code and data is unified. Quite often, processor white papers emphasize a Harvard architecture, but in fact the difference between the von Neumann and Harvard architectures is not as great as is often implied. As
instructions and data have to be separated at a certain stage anyway, it is more a question of when they are separated. Several more computer pioneers deserve credit for their contributions. Atanasoff, for example, demonstrated the use of binary arithmetic, and Zuse in Germany made another important development during the late 1930s and early 1940s. Following these scientific developments, the first commercial computer, called UNIVAC I, was introduced in 1951. Actually, UNIVAC's predecessor, called BINAC, had also been developed by Eckert and Mauchly. UNIVAC I was sold for $250 000 and, with 48 units built, it was the first successful commercial computer. The first IBM computer (the 701) followed in 1952. At that time the microprocessor of today had not been invented; other technologies such as vacuum tubes were used to implement the basic logic equations necessary to control the system and to execute the logical operations. However, the basic concept of modern computers and computing had already been established. For further details and information see, e.g., [21]. The next important development was the invention of the transistor in 1947. The transistor soon displaced the vacuum tube as the basic switching element in digital designs, the first logic IC using transistor technology being fabricated in 1959 at Fairchild and Texas Instruments. But it took another 10 years until the world saw the first microprocessor. In 1971 the Intel Corporation introduced the first microprocessor, the 4004; see, e.g., [22]. The 4004 was the starting point of a race toward higher processor speeds and higher integration. The 4004 integrated approximately 2300 transistors, measured 3.0 mm x 4.0 mm, and typically executed an instruction in 10.3 µs. It was a 4-bit CPU used in the Busicom calculator, a 14-digit, floating- and fixed-point, printing calculator featuring memory and an optional square-root function. The calculator used one 4004 CPU, two 4002 RAM chips, four 4001 ROM chips, and three 4003 shift-register chips. The 4004 had 16 instruction types and contained 16 general-purpose 4-bit registers, one 4-bit accumulator, a 4-level, 12-bit push-down address stack containing the program counter and 3 return addresses for subroutine nesting, an instruction register, a decoder, and control logic. Finally, some bus and timing logic was required to complete the entire CPU. During the past 30 years, the microprocessor and the enabling process technology have been further developed to an incredible level. The development of microprocessors still seems to follow Moore's law; for further details see, e.g., [23]. Moore's law posits that microprocessor performance, defined by the number of transistors on a chip, will double
every 18 months (see Fig. 2). By the year 2005, further technology advances will allow at least 10 of today's microprocessors to fit onto a single chip; at least, this is the projection of the 1998 update of the International Technology Roadmap for Semiconductors, reported in [24]. For high-performance microprocessors, this consortium projects technology parameters as shown in Table I. This means that Moore's law will be valid for another period of dramatic technological improvements. Intel's 4004 was the first commercially available microprocessor, but just a year later Intel introduced the 8008 8-bit processor, followed by numerous other 8-bit processors such as Intel's 8080, Motorola's MC6800, and the Fairchild F8 [25]. These microprocessors were not used as CPUs in computers, but primarily in embedded applications. At that time (the early 1970s), microprocessors were seen as a means of accelerating arithmetic calculations, which explains why, in the early days of microcomputers, the focus of research and development was on improving the raw calculation speed and the arithmetical binary methodology. This difference best illustrates the most important change: today, we focus on what a computer does instead of how a computer works. Desktop computers such as the Apple II in 1976 or, 5 years later, the IBM Personal Computer (PC) introduced this change.
FIG. 2. Moore's law: transistor count per chip versus year of introduction (1971-2006), from the 8080, 8085, and 8086 through the 80286, i386, and i486 to the Pentium II and Pentium III.
EMBEDDED MICROPROCESSORS TABLE I THE VLSI CHIP IN THE YEAR 2005 PROJECTED BY THE 1998 UPDATE OF THE INTERNATIONAL TECHNOLOGY ROADMAP FOR SEMICONDUCTORS: SEE, E.G., [24] Minimum feature size of process technology Total number of transistors Number of logic transistors Chip size Clock frequency Number of I / O connectors Number of wiring levels Supply voltage Supply current Power dissipation
0.1 l~m 200 million 40 million 520 mm 2.0-3.5 G H z 4000 7-8 0.9-1.2 V ~ 160 A ~ 160 W
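To make the doubling rule quoted above concrete, the following minimal C sketch projects a transistor count forward under an assumed 18-month doubling period. The starting point of roughly 9.5 million transistors for a 1999 Pentium III is an assumption used only for illustration; with four doublings it yields about 150 million transistors by 2005, the same order of magnitude as the roadmap projection in Table I.

#include <math.h>
#include <stdio.h>

/* Projects a transistor count forward in time, assuming the count
   doubles once every `doubling_years` years (an 18-month period is
   1.5 years). */
static double project(double start_count, double start_year,
                      double target_year, double doubling_years)
{
    double doublings = (target_year - start_year) / doubling_years;
    return start_count * pow(2.0, doublings);
}

int main(void)
{
    /* Assumed starting point: roughly 9.5 million transistors in 1999. */
    double projected = project(9.5e6, 1999.0, 2005.0, 1.5);
    printf("Projected transistor count in 2005: %.0f million\n",
           projected / 1.0e6);
    return 0;
}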
In the early 1980s, the workstation market was established, and for at least the next 15 years the PC and workstation market drove microprocessor developments; see also [26]. This paradigm shift may be the most important one in the history of computer and microelectronic systems. If Ken Olsen (CEO of Digital Equipment Corporation, 1977) had seen this shift coming, he probably would not have stated that "There is no reason for any individual to have a computer in their home."
FIG. 3. Mainstream trends of processor and logic IC implementation during the past five decades (level of logic integration over time: basic transistor technology, logic and arithmetic functions, functional logic block developments, ISA optimization with caches and superscalarity, ASIC integration and optimization, and finally system-on-chip with complete system-level integration).
Looking at the history of microprocessors and semiconductor technology from the point of view of complexity, it is possible to identify five steps in the evolution of IC logic technology, as illustrated in Fig. 3. The development started with simple logic equations and has now reached system-level silicon integration. Every step has a special focus. For example, during the implementation of arithmetic functions, scientific research focused very much on "how to implement arithmetic equations best with semiconductor technology." Today, this chapter of science is almost closed: optimized CMOS implementations of a multiplier or an adder can be found in standard texts [27]. Of course, the individual steps overlap each other, and even in the very early days complete system-level integration was sometimes carried out. But, looking at the evolution from a mainstream point of view, the focus of each decade is expressed in the number of scientific research projects and commercial products introduced into the market.
4. Basic Processor Classification
After looking at the basic market trends in the 32-bit embedded world and the basic technology evolution during the past five decades, this section introduces a basic processor and system classification.
4.1 System Level Approach

Embedded controllers and processors are driven by completely different parameters than the processors of PCs, workstations, or supercomputers [28, 29], although in the early days both worlds were pretty much the same. There are still some fields where the approaches are very similar, such as industrial PCs or some industrial control applications. To understand the differences in detail, it is necessary to define the individual worlds. First of all, a traditional embedded system can be defined as a logic system embedded into a larger system and executing a predefined set of functions. Usually, these functions are executed when a predefined situation occurs or are started through a simple user interface (e.g., realized by simple buttons). An embedded system is in most cases invisible; typical examples are washing machines, air-conditioning systems, car brake systems, etc. The other world is the computer system, defined as a freely programmable system with a focus on the user interface; the system's behavior is determined by downloaded software. Typical applications are office automation, general data manipulation, or specific numerical scientific algorithms. Examples are PCs, workstations, supercomputers, or office machines.
An embedded system does not necessarily include a microcontroller or microprocessor, but, as mentioned above, this application area became one of the most important for the microelectronics industry. In this area, we find a differentiation into microcontrollers and microprocessors. A microcontroller is normally associated with controlling a system, whereas a microprocessor's focus is processing data as fast as possible. Controlling a system requires different approaches than processing data. Microcontrollers are mainly 4-, 8-, and 16-bit devices and microprocessors 32- and 64-bit devices. Thanks to changes we are about to describe, this distinction is no longer valid except for 64-bit devices; but things change quickly in the semiconductor business, so even that may no longer be true in a few years. In the early days, there was no difference between microcontrollers and microprocessors. Zilog's well-known Z80 [30] or Motorola's first 68000 device were used in computers as well as embedded systems. This was possible because at that time a single CPU was not sufficient to implement the entire system; additional supporting chips were necessary, but they could be replaced by others as long as the system specification was met. With increasing integration capabilities and thus reduced transistor costs, devices became more specialized. Today, we differentiate mainstream CPUs into the following classes:

• embedded controllers
• embedded processors
• computer processors.

This classification is not intended to cover all aspects of logic IC development. There are many devices available for dedicated specific applications such as supercomputers or dedicated office machines which are not covered by this approach. In principle, a computer or an embedded system consists of some basic components such as a processing core, peripheral functions, supporting system modules, and the system bus architecture, as illustrated in Fig. 4. This basic system is called the micro-system. Besides the micro-system, an additional context is necessary to define the entire system. This context includes visual display units, keyboards, loudspeakers, analogue measuring equipment, etc. The processing core includes the instruction execution unit; peripheral functions include interfaces for serial or parallel communication, analogue-to-digital converters, display controllers, etc. The supporting modules include bus interfaces and controllers, all kinds of memories, additional interfaces, etc. Of course, systems are often much more complex than the one shown in Fig. 4, but the basic configuration is the same.
FIG. 4. General system level approach: a processing/CPU module, peripheral modules (PM), core-close support modules, and system support modules connected by a bus, surrounded by the context (display, connectors, etc.).
The general system level approach does not by itself reveal the differences between the various systems. The separation into embedded systems on the one hand and desktop and computer systems on the other is introduced by the peripheral functions, which define the input and output of data and the communication with other systems. As mentioned above, we can classify the realizations into three main categories defined by their applications: embedded control, embedded processing, and computer systems. Following this approach, it is possible to identify the three types of microcontrollers and microprocessors illustrated in Fig. 5. The following sections introduce typical system configurations of these three types.
4.2 Embedded Controller Systems
Khan [31] called microcontrollers the "workhorses of the electronic era." These embedded microcontrollers are self-contained ICs designed to perform specific functions by themselves. Typical state-of-the-art controllers include ROM, RAM, and a set of specific peripherals executing special functions. Today, such an embedded control system is quite often a single-chip system in which one microcontroller, integrating memory and peripherals, controls the application. Current examples are motor control, airbags, microwave ovens, washing machines, etc. The main features of these microcontrollers are low cost and a vast amount of control functionality; the calculation performance does not have the highest priority. A typical system configuration is shown in Fig. 6.
FIG. 5. Basic processor classification.
FIG. 6. Typical self-contained embedded control system: a microcontroller core with on-chip RAM/ROM and on-chip peripheral functions and I/O, connected to the context.
The instruction set of a typical microcontroller core includes instructions supporting control functionality, which normally means fast single-bit manipulation instructions such as mask-bit instructions. Such a system is not intended to be reprogrammed or updated by the end user; the functionality, once set, will not be modified further. The typical performance requirement of such a system is between 1 and 10 million instructions per second (MIPS). For further details about typical microcontrollers see, e.g., [32].
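To illustrate the kind of single-bit manipulation such control-oriented instruction sets are meant to support, here is a minimal C sketch that sets, clears, and tests individual bits of a memory-mapped peripheral register. The register address and the bit layout are invented purely for illustration; on a real microcontroller the compiler typically maps such read-modify-write sequences onto dedicated bit-manipulation instructions.

#include <stdint.h>

/* Hypothetical memory-mapped 8-bit control register of a motor
   peripheral; the address and bit assignments below are assumptions. */
#define MOTOR_CTRL (*(volatile uint8_t *)0x40001000u)

#define MOTOR_ENABLE (1u << 0)  /* bit 0: enable the motor           */
#define MOTOR_DIR    (1u << 1)  /* bit 1: direction, 1 = reverse     */
#define MOTOR_FAULT  (1u << 7)  /* bit 7: fault flag set by hardware */

void motor_start_forward(void)
{
    MOTOR_CTRL &= (uint8_t)~MOTOR_DIR; /* clear a single bit */
    MOTOR_CTRL |= MOTOR_ENABLE;        /* set a single bit   */
}

int motor_has_fault(void)
{
    return (MOTOR_CTRL & MOTOR_FAULT) != 0; /* test a single bit */
}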
4.3 Embedded Processor Systems

This class includes devices which still belong to the embedded world but which focus on processing data instead of controlling an entire system. Typical systems include a limited number of additional peripheral chips, for example an additional embedded microcontroller handling dedicated I/O functionality, but the main system functionality is still performed directly by the embedded processor (Fig. 7). Nevertheless, the subsystem might be quite complex: additional support chips could implement an entire graphics accelerator or even a video decoder that could not be handled by the embedded microprocessor itself. Examples of extended I/O capabilities are video or graphics controllers. To a certain degree, such a system can be reprogrammed or updated in the field, but, compared to the embedded control system described above, a more open approach is necessary, and program download by the end user may be possible. Typically, such systems offer a performance between 50 and 500 MIPS. Examples of these systems are video game consoles, PC companions, car information or navigation systems, etc.
4.4 Computer Processor Systems

The third category includes so-called computer processor systems. Control functionality is carried out by dedicated modules and chips rather than by the main CPU itself; PC mainboards are a very good example of this approach. The main CPU's most important role is the execution of general application programs such as games, financial or office software, databases, etc. Figure 8 shows a conceptual block diagram of a standard PC motherboard from the 1998-99 era [33]. A substantial part of the entire system is covered by support chips (mainly the North and South bridges).
FIG. 7. Typical embedded processor system configuration: an embedded microprocessor with memory and a support chip (ASIC, companion chip, ...), connected to the context.
FIG. 8. Conceptual PC motherboard block diagram [33]: a microprocessor with backside and frontside caches, a North bridge connecting the main memory, the advanced graphics port, and the PCI bus (with PCI bus card adapters), and a South bridge providing the legacy bus bridge, card bus bridge, and super I/O.

These additional devices or support chips integrate the communication with additional components such as display, hard disk, sound cards, graphics accelerators, mouse, keyboard, etc. This illustrates that the most important differentiating factor of computer processors is the execution speed of the above-mentioned general programs and the possibility for the end user to extend and upgrade the system. Compared to embedded controllers and processors, the PC motherboard has to be standard and predefined. Software developed on another or a previous system must execute without any rewriting or porting work. This requires upward compatibility of systems and processors, and strict adherence to the system standard. This is maybe the most obvious difference from the embedded system world; in classical embedded systems no such standard exists. The performance of such systems is no longer measured in MIPS; several benchmark programs have been developed to measure the overall system performance.
4.5 System Approach Conclusion

The system level approach to embedded control, embedded processor, and computer processor systems does not reveal any detailed differences
between the processor architectures, but it does disclose differences in the target systems. The main difference between these systems lies in the extent of the bus structure. Embedded control applications do not feature complicated external bus structures, but in the case of embedded processor systems an external bus system has been introduced, as the embedded processor does not include all the peripherals required to run the entire system. In the case of computer processor systems, several additional bus structures are necessary to perform the required functionality, which leads to a kind of distributed processing requiring sophisticated methods to avoid bottlenecks. Typically, the computer processor system includes several embedded control systems, with the super I/O part of the South bridge illustrated in Fig. 8 being a typical embedded control system. The different system approaches lead to different implementations of the basic processor core. In Section 5 basic implementation methodologies are discussed to illustrate these differences.
5. Processor Architectures
This section introduces and discusses the basics of microprocessor architectures: how a processor architecture works, how it can be implemented, and how processing speed can be increased.
5.1 Architecture Definition
According to Federico Faggin [34], one of the pioneers of microprocessors, it is possible to divide microprocessor progress into three phases, each lasting approximately 25 years: during the first phase, semiconductor technology was the dominant factor of progress; during the next phase, both technology and architecture will play a major role; and finally, during the last stage, architecture will be the dominant factor. This statement indicates that the choice of architecture itself is becoming a crucial factor in the future worlds of semiconductors and microprocessors. This is especially true for the fragmented embedded system world with its various system requirements. As described earlier, a microprocessor executes a set of arithmetic, logical, and data transfer operations on a defined set of data. This leads immediately to two subjects defining an architecture: the instruction set and the data format. The instruction set and the data format define the resources
needed to execute an operation in a certain time slot. The time needed to complete an instruction, or the time needed to execute a set of instructions, provides a means to compare different architectures. Because every architecture has its own instruction set, a comparison of instruction timing has to be tied to a certain task, for example executing a special arithmetic operation such as a 10-dimensional vector multiplication. A comparison based solely on "how many instructions can be executed per second" is useless; the entire arithmetic or logical operation or program has to be defined, measured, and compared. The processor's clock determines when a transition from one logical state to the other is performed, and the time from one clock tick to the next is referred to as the clock cycle. The clock speed or frequency alone does not indicate the overall speed of an architecture implementation; it has to be combined with the information being processed. Basically, there are architectures executing an entire instruction between two clock ticks, whereas others execute only a small part of the entire instruction, for example only decoding an instruction, with the basic arithmetic operation performed during another clock cycle. The reason for doing this is described below. The instruction set of a typical microprocessor architecture includes the following instruction types:

• move instructions: these are used to move data from one memory location to another
• arithmetic instructions: executing arithmetic operations such as add, subtract, multiply, divide, or shift
• logical instructions: executing logical Boolean bit manipulations of one or more operands, such as OR, XOR, NOT
• program flow control instructions: controlling the program flow, such as branches, or exception processing interrupting the normal program flow
• processor state control instructions: changing the state of the processor, such as reset control.

The control and program flow instructions ensure a proper handling of interruptions of the normal instruction stream. These could be caused, for example, by a signal from an external control device requiring special treatment, which means an interruption of the normal instruction sequence. Every processor state is defined or controlled by a set of data stored in the control registers. This data ensures a defined transition from one state to the other (Fig. 9). The instructions are encoded in a binary format; when the control unit of the architecture gets an instruction to execute, it interprets that binary number and changes the processor's state accordingly.
FIG. 9. Basic instruction stages: F, fetch; D, decode; E, execute.
Usually, an instruction is executed within three steps: the instruction is first fetched, i.e., transferred from a memory location to the control unit; next, the instruction is decoded; finally, the instruction is executed. All instructions operate on data with a predefined format. An architecture can include several different data formats, for example 16- or 32-bit integer and floating-point or fixed-point numbers. All these numbers have a binary representation but are interpreted in different ways: given just the binary representation, it is not possible to decide whether it is a floating-point or an integer number; only the interpretation by the execution units and the instructions makes the distinction clear. The size of the integer data format and the bus structure connecting the different units determine what kind of n-bit architecture it is. On the market we find mainly architectures for n = 4, 8, 16, 32, or 64. In conjunction with the data bus transporting the binary values of instructions and data, an additional address bus is necessary to identify the memory locations used to store this data. These features define the basics of an architecture at a very high level. Going more into detail, data paths and memory locations for the data have to be defined, as well as the processing states. The instructions are performed by changing the state of the processor at a certain time in a completely defined way. This is done until the instruction is completed, and all instructions are executed in a sequentially defined way. Of course, as illustrated below, several mechanisms have been developed to accelerate the processing of an instruction stream. This can lead to an automatic reorganization of the instruction stream or to parallel execution, but of course the logical result of the program is not affected. To understand the mechanisms of a processor it is important to describe the program flow and how software is executed. The software is separated into tasks, which are collections of instructions executing certain functionality. The main task is normally used to organize the entire system and the entire program flow. Sub-tasks are started and controlled within the main task. A task could also be a collection of no-operation (NOP) instructions representing an idle task. This is necessary, for example, if the system is waiting and has nothing else to do. Tasks can be interrupted by
so-called interrupts or by other tasks (task switch). Interrupts can be interpreted in a very general way, including exceptions caused by instructions or by external devices requesting processing power from the processor. This causes an interruption of the normal program flow and is handled either by software or by hardware; the mechanisms for handling this can differ a lot between architectures. Every task has its own context, defined by resources such as operand registers or memory locations, and also by control registers. If a running task is interrupted, either by an interrupt or by another, more important task, the context of the old task has to be stored, and when the old task is restarted its context has to be restored. Controlling tasks and interrupts is done by the operating system (OS), which makes the OS a crucial part of every system; this is true for PC systems as well as embedded systems.
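To make the fetch-decode-execute cycle and the stored-program idea concrete, the following C sketch interprets a tiny, invented four-instruction machine. The opcodes, register count, and memory layout are assumptions used only for illustration and do not correspond to any real instruction set.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical opcodes of a toy stored-program machine. */
enum { OP_HALT = 0, OP_LOAD_IMM = 1, OP_ADD = 2, OP_STORE = 3 };

int main(void)
{
    uint8_t memory[256] = {
        /* program: r0 = 5; r1 = 7; r0 += r1; memory[100] = r0; halt */
        OP_LOAD_IMM, 0, 5,
        OP_LOAD_IMM, 1, 7,
        OP_ADD, 0, 1,
        OP_STORE, 0, 100,
        OP_HALT,
    };
    uint8_t reg[4] = { 0 };
    uint8_t pc = 0;                      /* program counter */

    for (;;) {
        uint8_t opcode = memory[pc];     /* fetch the next instruction */
        switch (opcode) {                /* decode, then execute       */
        case OP_LOAD_IMM:
            reg[memory[pc + 1]] = memory[pc + 2];
            pc += 3;
            break;
        case OP_ADD:
            reg[memory[pc + 1]] += reg[memory[pc + 2]];
            pc += 3;
            break;
        case OP_STORE:
            memory[memory[pc + 2]] = reg[memory[pc + 1]];
            pc += 3;
            break;
        case OP_HALT:
        default:
            printf("memory[100] = %u\n", memory[100]); /* prints 12 */
            return 0;
        }
    }
}

Program and data share the same memory array, as in the von Neumann concept described earlier; a Harvard-style variant would simply keep two separate arrays for instructions and data.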
5.2 Core Implementation Methodologies

If we look at the mainstream processors, we find two main realization philosophies: the Reduced Instruction Set Computer (RISC) and the Complex Instruction Set Computer (CISC). The main difference between the two implementation methodologies is that RISC processors are based on simple (thus "reduced") instructions operating on a defined set of general-purpose registers, whereas CISC processors feature an instruction set executing more complex operations. In the early 1980s, many publications discussed the pros and cons of both approaches. Although the RISC approach allows much higher performance, CISC architectures currently dominate the market, with the x86 architecture dominating the desktop computer segment and the 68000 the 32-bit embedded segment. However, nearly all new architectures now being developed are based on RISC principles. As many new concepts have been introduced to the market during the past 25 years, it is now difficult to give clear definitions of CISC and RISC: many features evolved so long ago that the initial ideas are now unclear. Maybe a more useful approach is to list some of the basic features of the first computer implementations. IBM's System/360 architecture is a typical example of the CISC approach. Because the underlying technology was changing rapidly, it was very useful to further develop and implement architectures while keeping the instruction set compatible. The x86 architecture is probably the best-known representative of that philosophy: software has to run on different implementations without any further work. This approach led to the concept of microprogramming, introduced in 1951 by Wilkes [35]. These microprograms were a collection of basic machine instructions which were
invisible to the application software programmer and which included all necessary information to define the processor's states and its transitions. As the programmer only sees this programming interface, the underlying hardware realization could be different. This gave vendors the opportunity to realize the processor in different versions and therefore create a family concept. The difference between the various implementations was basically the execution speed of the instructions: the more expensive the hardware, the faster the program was executed, but the logical result was the same. With the advance of technology and the development of very efficient compilers, however, microprogramming was no longer the most effective approach. The introduction of the RISC philosophy was sometimes described as a paradigm change, but if we look at the details, it was more a transition and a further development of the basic concepts. The RISC approach could be seen as a transfer of microprograms to pure software, which could be developed and maintained much more easily. This change from microprograms to simple instruction execution engines opened the door to several new trends and architectural advances. Because many designers called their architecture RISC it is difficult to define exactly what the term means, but typical features of RISC architectures are:
• Simple instructions: By using strictly simple instructions, much higher clock speeds became possible. The execution time of an instruction depends directly on the number of logic gates that have to switch before completion.
• Load/store architecture: All arithmetic and logical instructions operate on fast-switching registers; to move data from external storage to these registers, additional move operations are necessary.
• Instruction pipelining: Each step during the execution of an instruction, such as fetch or decode, is encapsulated and can thus be overlapped with the next instruction; for example, while one instruction is being decoded, the next instruction to be executed can be fetched.
• Fixed instruction length: This was introduced to unify the time needed to decode an instruction.
• Very high data throughput.

One of the most typical characteristics of RISC architectures is the classic five-stage pipeline illustrated in Fig. 10a. Pipelining is a simple mechanism to parallelize and thus accelerate instruction execution. As this can be done in different ways, the literature is varied. The control unit of a processor encapsulates certain stages of an instruction, for example, into
FIG. 10. (a) Basic five-stage RISC pipeline; (b) two-way superscalar pipeline.
five stages: Fetch (F), Decode (D), Operand Read (R), Logic Execution (E), and Result Write Back (W). Every stage is processed by a certain unit, for example the fetch unit, the decode unit, etc. The execution of an instruction requires the processing of all five stages, but when the fetch unit is finished with the first instruction, the next instruction can be fetched in the next clock cycle. In that next clock cycle, the first instruction is decoded, and the same overlapping applies to all the stages. This means that at every clock cycle an instruction can (ideally) be fetched, and at every clock cycle one instruction can be completed. Every instruction requires five clock cycles to be fully completed, but the user does not see these five clock cycles. Of course, when a program sequence is started it takes five cycles until the user sees the result of the first instruction, but after these cycles an instruction is completed at every cycle. Sometimes, special instructions using dedicated units, or load and store instructions, have different pipeline structures. Thus, for different instructions, the pipelining looks different and includes memory accesses that could require more than one clock cycle. This leads to one of the most important challenges in the definition of an architecture: avoiding and managing so-called pipeline breaks, where the execution of an instruction has to be stopped because a previous instruction changes the processor state. This could happen, for example, if an external memory access requires a nondeterministic number of clock cycles: when an external memory access is necessary, a refresh of the external DRAM could be in progress, and the access has to wait until the refresh is completed. As the architecture does not know how many cycles that requires, further processing of the instruction execution has to be stopped. Depending on resource conflicts with the next instructions, the next instruction might be further processed or might also have to be stopped.
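As a rough illustration of the ideal case described above (and ignoring pipeline breaks), n instructions on a k-stage pipeline need k + (n - 1) cycles in total; the short C sketch below, with illustrative numbers only, compares this with unpipelined execution.

#include <stdio.h>

/* Ideal cycle count for n instructions on a k-stage pipeline with no
   pipeline breaks: the first instruction needs k cycles to fill the
   pipeline, after which one instruction completes per cycle. */
static unsigned long ideal_cycles(unsigned long k, unsigned long n)
{
    return (n == 0) ? 0 : k + (n - 1);
}

int main(void)
{
    /* 1000 instructions on the five-stage pipeline of Fig. 10a:
       5 + 999 = 1004 cycles, close to one instruction per cycle,
       versus 5 * 1000 = 5000 cycles without any overlapping. */
    printf("pipelined:   %lu cycles\n", ideal_cycles(5, 1000));
    printf("unpipelined: %lu cycles\n", 5ul * 1000ul);
    return 0;
}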
As mentioned above, the development of optimizing compilers was crucial for the success of RISC architectures. In the very early days of RISC architectures, there were two main approaches.

• The Stanford-RISC approach kept the architecture extremely simple and left resource interlocking to the compiler. There was no hardware support for resolving resource conflicts; the compiler had to detect possible conflicts in advance and add NOP instructions (a minimal sketch of such a check follows below). Resource conflicts can occur because of the pipelining concept: when an instruction has to access a certain register, this register may still be in use by one of the preceding instructions. Without the parallelism introduced by pipelining, this would not occur. The archetype of this approach is the MIPS architecture, whose idea was to be truly reduced and simple, at the price of more complex or "smart" compilers.

• The Berkeley-RISC approach offers additional hardware support in the event of resource conflicts. An example of this approach is the Sparc architecture. Interlock mechanisms and context saving mechanisms were implemented to support the programmer and to simplify compiler development. Of course, this approach leads to a more complex hardware design.

The load/store principle is also typical of RISC architectures. To avoid bottlenecks when accessing data, operands are stored locally in a register file (see Fig. 11). All arithmetic and logical instructions operate on these registers, so additional instructions are necessary to load operands from memory into the registers and to store results back. These load and store operations are independent of the other instructions. This approach has two major benefits:

• the pipeline stages can be kept similar for all instructions
• if a result is reused soon, it can be kept in the local register file and no external (slow) memory access is necessary.

CISC architectures normally have only a few local registers. These architectures are normally stack-based, putting operands onto a stack, i.e., storing the operand values in certain memory locations. Depending on the program that has to be executed, both approaches have advantages and disadvantages. The load/store principle causes additional instructions if a result is not immediately processed further, and the register file is part of the task context described above, which has to be saved when an interrupt or task switch occurs.
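The compiler's part in the Stanford approach can be illustrated with a deliberately simplified C sketch. The instruction encoding, the register numbers, and the rule of checking only the immediately preceding instruction are assumptions made for illustration; real hazard rules depend on the concrete pipeline.

#include <stdio.h>

/* Simplified model of the "Stanford" division of labor: the compiler,
   not the hardware, detects register conflicts between neighboring
   instructions and inserts NOPs.                                     */
struct instr {
    const char *text;
    int dest;        /* register written, -1 if none */
    int src1, src2;  /* registers read,   -1 if none */
};

static int needs_nop(const struct instr *prev, const struct instr *cur)
{
    if (prev->dest < 0)
        return 0;
    return cur->src1 == prev->dest || cur->src2 == prev->dest;
}

int main(void)
{
    /* Hypothetical sequence: r3 = r1 + r2; r5 = r3 - r4 */
    struct instr code[] = {
        { "add r3, r1, r2", 3, 1, 2 },
        { "sub r5, r3, r4", 5, 3, 4 },
    };
    size_t n = sizeof code / sizeof code[0];

    for (size_t i = 0; i < n; i++) {
        if (i > 0 && needs_nop(&code[i - 1], &code[i]))
            printf("nop              ; inserted by the compiler\n");
        printf("%s\n", code[i].text);
    }
    return 0;
}

Because the second instruction reads r3, which the first one writes, the sketch emits a NOP between them; a Berkeley-style design would instead stall the pipeline in hardware.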
FIG. 11. Register bank as an intermediate fast operand storage medium.
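To make the role of the register bank in Fig. 11 concrete, the following C sketch models a tiny load/store machine. The register count, memory size, and the three "instructions" are illustrative assumptions only, not an existing instruction set: arithmetic touches only registers, and a value that is reused (here the intermediate result in r2) never leaves the register bank.

#include <stdio.h>

enum { NREGS = 16, MEMSIZE = 64 };

static int reg[NREGS];
static int mem[MEMSIZE];

/* Only load and store access memory; add works purely on registers. */
static void load (int rd, int addr)       { reg[rd] = mem[addr]; }
static void store(int rs, int addr)       { mem[addr] = reg[rs]; }
static void add  (int rd, int ra, int rb) { reg[rd] = reg[ra] + reg[rb]; }

int main(void)
{
    mem[0] = 2;      /* operand b */
    mem[1] = 3;      /* operand c */

    /* a = b + c; d = a + c;  'a' stays in r2, no extra memory access */
    load (0, 0);
    load (1, 1);
    add  (2, 0, 1);
    add  (3, 2, 1);
    store(3, 2);

    printf("mem[2] = %d\n", mem[2]);   /* prints 8 */
    return 0;
}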
The CISC approach has disadvantages when operands are often reused and immediately processed further. Using the load/store approach allows higher clock frequencies and thus leads to better performance. To resolve the context saving issue, several RISC architectures introduced mechanisms that avoid extra cycles for saving and restoring the context. This can be done by introducing additional register banks and simple switch mechanisms. Of course, as resources are limited, it can still happen that the context has to be stored on an external stack. Many different strategies, such as moving register windows or shadow registers, have been implemented to accelerate the interrupt response time or subroutine calls and returns. Which strategy is best normally depends on the application, and it is difficult to identify one that is generally superior.

As mentioned earlier, most of the embedded 32-bit architectures were not developed with the explicit goal of becoming solely embedded processors. This is especially true for the 68000 series, the x86 family, the Sparc architecture, and the MIPS architecture. This leads to the interesting question of how embedded architectures and desktop computer architectures differ. In fact, there are only a few requirements that determine whether an architecture is useful for embedded applications; they are driven by the demands of embedded applications such as cost, power consumption, and performance.
Basically, desktop computer applications were mainly driven by increasing the calculation performance as much as possible, whereas embedded processor architectures require a cost-efficient implementation, the integration of peripherals, and fast processing of interrupts. If we look at the processor core, the main difference is the handling of interrupts and exception processing. Because embedded processors control a system rather than merely accelerate computation, the fast processing of events becomes a key feature. This directly affects the implementation of the register set introduced earlier. If an interrupt occurs, indicating a change in the normal program flow, the context has to be saved and subsequently re-established. At the same time, offering as many registers as possible for temporarily storing operands helps to increase the arithmetic performance of an architecture. This leads to a trade-off between the number of registers available to a program task and the time needed to save the context. In architectures designed primarily for the embedded market, a typical number of general-purpose registers is 16, representing (together with the control registers) the context of a task. To make a fast context switch possible, many architectures integrate shadow registers, allowing an extremely fast response to interrupt requests by a simple switch mechanism. Architectures primarily designed for the desktop or workstation domain feature more registers, allowing nested subroutine calls, but this slows down the task switching time. For example, several architectures integrate a much higher number of general-purpose registers but feature a windowing mechanism: a specific partial block of registers can be addressed from a single routine; by moving this window to the next block and overlapping the register windows, a fast parameter transfer to the next routine and a fast jump to that routine can be realized. The main difference between desktop and embedded architectures is thus the focus of the implementation: desktop architectures focus on fast subroutine jumps, whereas embedded architectures focus mainly on fast interrupt processing. In fact, interrupt processing requires slightly different parameter transfers. In the case of an interrupt, the processor jumps to the address associated with the interrupt and stores the context; in the case of a subroutine call, additional parameters have to be passed from the calling routine to the new routine. Looking at the various architecture implementations, different flavors can be identified, ranging from single register banks to multiple or windowing register banks. There is no ideal implementation: every implementation trades the task context saving time against fast exception processing.
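The shadow-register idea mentioned above can be sketched in a few lines of C. The two banks of 16 registers and the single bank-select flag are assumptions chosen only to show why an interrupt entry needs no memory traffic; real implementations differ in bank count and switching rules.

#include <stdio.h>

enum { NREGS = 16 };

static int bank[2][NREGS];
static int active;                  /* 0 = task bank, 1 = shadow bank */

static int *current_regs(void)     { return bank[active]; }

/* Entering and leaving the interrupt only flips the bank-select flag
   instead of copying the task context to memory.                     */
static void enter_interrupt(void)  { active = 1; }
static void leave_interrupt(void)  { active = 0; }

int main(void)
{
    current_regs()[0] = 42;         /* task context lives in bank 0 */

    enter_interrupt();
    current_regs()[0] = 7;          /* handler freely uses bank 1   */
    leave_interrupt();

    printf("task register r0 after interrupt: %d\n", current_regs()[0]);
    return 0;
}

The task value (42) survives the interrupt untouched, which is exactly the effect a shadow bank buys at the cost of extra register hardware.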
The implementation of a general-purpose register bank for intermediate data storage is based on the assumption that operands are reused throughout a program task; otherwise, such intermediate storage would not make sense. This is basically the case for DSP applications: typical DSP processors do not feature a general-purpose register bank; the functional units access data directly from the main memory, normally fast on-chip memory. For further reading, [36] is recommended.
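The following small C kernel illustrates why this is so; the filter length and data values are arbitrary. In a typical multiply-accumulate loop each coefficient and each sample is fetched once and used once, so an intermediate general-purpose register bank would add little.

#include <stdio.h>

#define TAPS 4

int main(void)
{
    const int coeff[TAPS]  = { 1, 2, 3, 4 };
    const int sample[TAPS] = { 5, 6, 7, 8 };
    long acc = 0;

    /* MAC loop: operands stream directly from (on-chip) memory,
       only the accumulator is reused from iteration to iteration. */
    for (int i = 0; i < TAPS; i++)
        acc += (long)coeff[i] * sample[i];

    printf("accumulator = %ld\n", acc);   /* 1*5 + 2*6 + 3*7 + 4*8 = 70 */
    return 0;
}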
5.3 RISC Architecture Evolution
The architectures adopting the RISC approach drove, for example, the cache discussion, superscalar instruction execution extensions, out-of-order instruction execution, branch prediction, and very long instruction word (VLIW) processing. The introduction of caches was necessary and is still very important today, as the clock frequency of the microprocessor core itself is much higher than the speed of external memory. Fast on-chip memory was necessary to avoid data bottlenecks between external memory, internal memory, and the registers. A cache can be defined as an intelligent RAM storing the instructions and/or data needed to ensure fast program flow. The idea behind cache memories is to reduce the number of external memory accesses by storing frequently used data and instructions in a memory block close to the microprocessor core. There are two main approaches:

• unified cache architectures, offering a reasonable speed penalty at low cost
• separated cache structures, offering higher speed but requiring more hardware resources.
Unified caches use the same memory for instructions and data, whereas separated caches have two different blocks, one for instructions and one for data, which makes additional buses and additional memory cells necessary. The cache memory itself is basically defined by its refill strategy and the organization of its information elements (associativity). A useful introduction to cache design is given in [37]. A cache memory is not necessarily the best strategy if an extremely deterministic program flow is required; this is, for example, true for DSP architectures [38]. In general, if the program flow is deterministic, a simple on-chip RAM structure can be more useful. If a general-purpose routine has to be executed, a cache structure is often the better solution, as a prediction of the program flow is not possible.

To further improve the execution speed, several concepts for the parallel execution of instructions have been introduced. Besides the basic pipelining concept, further parallelism has been added by superscalar instruction issuing, which means starting more than one instruction in parallel [39]. Figure 10b illustrates the basic pipelining and instruction issue timing of a five-stage, two-way superscalar pipelined architecture.
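The dual-issue timing of Fig. 10b can be approximated with a very rough C model. The dependence rule (the second instruction of a pair must not read the result of the first) and the instruction encoding are simplifying assumptions made only for this illustration; real issue logic checks many more resources.

#include <stdio.h>

struct instr { int dest, src1, src2; };

static int depends(const struct instr *a, const struct instr *b)
{
    return a->dest >= 0 && (b->src1 == a->dest || b->src2 == a->dest);
}

int main(void)
{
    struct instr code[] = {
        { 1, 2, 3 },   /* r1 = r2 op r3                                */
        { 4, 5, 6 },   /* independent of the previous one: dual issue  */
        { 7, 4, 1 },   /* writes r7 ...                                */
        { 8, 7, 2 },   /* ... which this one reads, so no dual issue   */
    };
    int n = (int)(sizeof code / sizeof code[0]);
    int cycles = 0;

    for (int i = 0; i < n; ) {
        if (i + 1 < n && !depends(&code[i], &code[i + 1]))
            i += 2;                 /* two instructions start this cycle */
        else
            i += 1;                 /* single issue                      */
        cycles++;
    }
    printf("issue cycles for %d instructions: %d\n", n, cycles);
    return 0;
}

In this toy trace the four instructions need three issue cycles instead of four; the extra hardware a real design needs lies in detecting such dependences, and exceptions, on the fly.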
5.4 Core Function Extensions
Other approaches are VLIW and single-instruction-multiple-data (SIMD) implementations. Basically, VLIW and SIMD are much simpler to realize but require additional compiler optimizations for efficient use. Superscalar approaches require a lot of additional hardware, as, for example, register conflicts can occur during the execution of an instruction: a floating-point instruction may produce an overflow exception, which cannot be predicted in advance. Deeply pipelined architectures such as Digital's Alpha architecture (now Compaq) are affected by this situation. A lot of additional hardware is also required in the decode unit. Other implementations, such as the VLIW approach of Hewlett-Packard's PA-RISC architecture, require highly optimized compilers to use the available resources efficiently. VLIW and SIMD basically combine a defined number of operations into a single long instruction, which requires a reorganization of the code at compilation time. These approaches are currently being revived by, for example, multimedia and DSP processors such as Texas Instruments' C60 architecture or Chromatic Research's MPACT chip [40, 41]. To use all these architecture extensions efficiently, it is necessary to combine them in the right way: cache size, superscalarity, branch prediction, and other resources have to be used in a balanced way to end up with a useful architecture. The raw implementation of as many features as possible does not normally lead to fast, powerful, or cost-efficient architectures.
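As an illustration of the SIMD idea, the following C function emulates what a single packed-add instruction of a multimedia extension would do: four 8-bit lanes in one 32-bit word are added in one operation, with carries kept from crossing lane boundaries. The masking technique is a generic software emulation, not taken from any of the architectures named above.

#include <stdio.h>
#include <stdint.h>

/* Add four 8-bit lanes packed into a 32-bit word without letting
   carries spill from one lane into the next.                      */
static uint32_t packed_add8(uint32_t a, uint32_t b)
{
    uint32_t sum_low = (a & 0x7f7f7f7fu) + (b & 0x7f7f7f7fu);
    return sum_low ^ ((a ^ b) & 0x80808080u);
}

int main(void)
{
    uint32_t a = 0x01020304u;   /* lanes 0x01, 0x02, 0x03, 0x04 */
    uint32_t b = 0x10203040u;   /* lanes 0x10, 0x20, 0x30, 0x40 */
    printf("packed sum = 0x%08x\n", (unsigned)packed_add8(a, b));  /* 0x11223344 */
    return 0;
}

A hardware SIMD unit performs this lane-wise addition in a single instruction; the compiler's task is to reorganize loops so that operands actually arrive packed in this form.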