Editor-in-Chief Hossein Bidgoli, California State University, Bakersfield, California
Senior Editors Sean B. Eom, Southeast Missouri State University Andrew Prestage, Director of Technology, Kern County School District
Associate Editors Maryam Alavi, Emory University Alan Dennis, Kelley School of Business, University of Indiana Paul Gray, Claremont Graduate University Clyde W. Holsapple, University of Kentucky William R. King, University of Pittsburgh Catherine M. Ricardo, Iona College Barbara Haley Wixom, University of Virginia
International Advisory Board Jay Aronson, University of Georgia Sven Axsäter, Lund University, Sweden Georgios I. Doukidis, Athens University of Economics and Business George C. Fowler, Texas A&M University Gary Grudnitski, San Diego State University Robert R. Harmon, Portland State University Den Huan Hooi, Nanyang Technological University Kee Young Kim, Yonsei University Ken Laudon, Stern School of Business, New York University Patrick McKeown, University of Georgia Motofusa Murayama, Chiba University Effy Oz, Pennsylvania State University David B. Paradice, Texas A&M University J. P. Shim, Mississippi State University Fatemeh Zahedi, University of Wisconsin, Milwaukee Ahmed Zaki, College of William and Mary
Dedication To so many fine memories of my brother, Mohsen, for his uncompromising belief in the power of education.
Preface

The Encyclopedia of Information Systems is the first comprehensive examination of the core topics in the information systems field. We chose to concentrate on fields and supporting technologies that have widespread applications in academic and business worlds. To develop this encyclopedia, we carefully reviewed current academic research in the management information systems (MIS) field in leading universities and research institutions. MIS, decision support systems (DSS), and computer information systems (CIS) curricula recommended by the Association of Information Technology Professionals (AITP) and the Association for Computing Machinery (ACM) were carefully investigated. We also researched the current practices in the MIS field carried out by leading IT corporations. Our work assisted us in defining the boundaries and contents of this project. Its articles address technical as well as managerial, social, legal, and international issues in information systems design, implementation, and utilization. Based on our research we identified 10 major topic areas for the encyclopedia:

• Theories, methodologies, and foundations
• Hardware and software
• Database design and utilization
• Data communications, the Internet, and electronic commerce
• Social, legal, organizational, and international issues
• Systems analysis and design
• Office automation and end-user computing
• Management support systems
• Artificial intelligence
• Applications
Although these 10 categories of topics are interrelated, each addresses one major dimension of information systems design, implementation, and utilization. The articles in each category are also interrelated and complementary, enabling readers to compare, contrast, and draw conclusions that might not otherwise be possible. Though the entries have been arranged alphabetically, the light they shed knows no bounds. The encyclopedia provides unmatched coverage of fundamental topics and issues for successful design, implementation, and utilization of information systems. Its articles can serve as material for a wide spectrum of courses, such as systems theories, artificial intelligence, data communications and networking, the Internet, database design and implementation, management support systems, office automation, end-user computing, group support systems, systems analysis and design, electronic commerce, hardware and software concepts, programming languages, software design, and social, legal, organizational, and international issues of information systems. Successful design, implementation, and utilization of information systems require a thorough knowledge of several technologies, theories, and supporting disciplines. Information systems researchers and practitioners have had to consult many sources to find answers. Some of these sources concentrate on technologies and infrastructures, some on social and legal issues, and some on applications of information systems. This encyclopedia provides all of this relevant information in a comprehensive four-volume set with a lively format. Each volume incorporates core information systems topics, practical applications, and coverage of the emerging issues in the information systems field. Written by scholars and practitioners from around the world, the articles fall into 10 major subject areas:
Theories, Methodologies, and Foundations Articles in this group examine a broad range of topics, theories, and concepts that have a direct or indirect effect on the understanding, role, and the impact of information systems in public and private organizations. They also highlight some of the current research issues in the information systems field. These articles explore historical issues and basic concepts as well as economic and value chain topics. They address fundamentals of systems theory, decision theory, and different approaches in decision making. As a group they provide a solid foundation for the study of information systems.
Hardware and Software

These articles address important hardware and software concepts. The hardware articles describe basic hardware components used in the information systems environment. Software articles explain a host of concepts and methodologies used in the information systems field, including operating systems, high level programming languages, fourth generation languages, web programming languages, and methodologies for developing programs and commercial software.
Database Design and Utilization The authors in this cluster cover database technologies within information systems. They examine popular database models, including relational, hierarchical, network, and object-oriented data models. They also investigate distributed database concepts, data warehousing, and data mining tools.
Data Communications, the Internet, and Electronic Commerce Articles in this group explore several fundamental technologies, infrastructures, and applications of the Internet and data communications and networking. LANs, WANs, and client-server computing are discussed and security issues and measures are investigated. Fundamentals of e-commerce technologies and their applications are summarized as are business models on the Web. This collection of articles also presents several applications of data communications and networking, including group support systems, electronic data interchange, intranets, and extranets.
Social, Legal, Organizational, and International Issues These articles look at important issues (positive and negative) in information systems design and implementation. These issues include social, organizational, legal, and ethical factors. They also describe applications of information systems in globalization and developing nations and introduce the obstacles involved in introducing information systems in a global environment. A thorough examination of these important topics should help decision makers guard against the negative aspects of information systems.

Systems Analysis and Design

Articles in this group address tools, techniques, and methodologies for successful analysis and design of information systems. Among their subjects are traditional as well as modern systems analysis and design, software and program design, testing and maintenance, prototyping, and user/system interface design. Project management and control tools, techniques, and methodologies for measuring the performance and quality of information systems are also introduced.
Office Automation and End-User Computing The articles in this category examine ubiquitous information systems applications and technologies such as word processing, spreadsheets, long distance conferencing, desktop publishing, and electronic mail. They also discuss issues and technologies that affect methods for managing these productivity tools, including ergonomic factors and end-user computing.
Management Support Systems These articles examine information systems technologies containing significant decision-making capabilities, such as decision support systems, group support systems, and geographic information systems. They also look at modeling analysis and the model building process which is essential for effective design and utilization of management support systems.
Artificial Intelligence Articles in this range address the fundamentals of artificial intelligence and knowledge-based systems. This collection of articles highlights tools and techniques for design and implementation of knowledge-based systems and discusses several successful applications of these systems, including expert systems, machine learning, robotics, speech and pattern recognition, and heuristic search techniques.
Applications Information systems are everywhere. In most cases they have improved the efficiency and effectiveness of managers and decision makers. Articles included here highlight applications of information systems in several fields, such as accounting, manufacturing, education, and human resource management, and their unique applications in a broad section of service industries, including law, marketing, medicine, natural resource management, and accounting firms. Although these disciplines are different in scope, they all utilize information systems to improve productivity and in many cases to increase customer service in a dynamic business environment. Specialists have written this collection for experienced and not so experienced readers. It is to these contributors that I am especially grateful. This remarkable collection of scholars and practitioners have distilled their knowledge into a one-stop knowledge base in information systems that "talks" to readers. This has been a massive effort, but one of the most rewarding experiences I have ever undertaken. So many people have played a role that it is difficult to know where to begin.
I thank the members of the editorial board and my associate editors for participating in the project and for their expert advice and help with the selection of topics, recommendations for authors, and reviews of the materials. Many thanks to the countless reviewers who devoted their time to advising me and the authors on how to improve the coverage of these topics. I thank Dr. J. Scott Bentley, my executive editor, who initiated the idea of the encyclopedia back in 1998. After several drafts and a dozen reviews, the project got off the ground and was then managed flawlessly by Scott and Christopher Morris. They both made many recommendations for keeping the project focused and maintaining its lively coverage. I thank Ron Lee and Nicholas Panissidi, my superb support team at Academic Press, who took paper, diskettes, and e-mail attachments and made them into this final project. Ron and I exchanged several hundred e-mail messages to keep the project on schedule. I am grateful for all their support. Last, but not least, I thank my wonderful wife Nooshin and my two lovely children Mohsen and Morvareed for being so patient during this venture. They provided a pleasant environment that expedited the completion of this project. Also, my two sisters Azam and Akram provided moral support throughout my life. To this family, any expression of thanks is insufficient.

Hossein Bidgoli
Accounting
Uday S. Murthy
Texas A&M University
I. ACCOUNTING INFORMATION SYSTEMS DEFINED
II. TRADITIONAL AUTOMATED ACCOUNTING INFORMATION SYSTEMS
III. MODERN DATABASE-ORIENTED INTEGRATED ACCOUNTING SYSTEMS
IV. ACCOUNTING SYSTEMS FOR REVENUE (SALES)
V. ACCOUNTING SYSTEMS FOR PROCUREMENT (PURCHASING)
VI. ENTERPRISE-WIDE ACCOUNTING SYSTEMS
GLOSSARY
accounting software package Software comprising standardized modules for processing accounting transactions for each of the major accounting cycles, such as sales, purchases, inventory, receivables, payables, payroll, and general ledger.
business process modeling Creation of templates or diagrams with standardized symbols for representing various aspects of business activities.
control procedures Mechanisms or activities for preventing or detecting the occurrence of errors and irregularities in data.
database An organized repository of data with functionality for adding, deleting, updating, and retrieving the data.
enterprise-wide system An information system spanning all the major functional areas within an organization, enabling seamless integration of business processes across the functional areas.
entity-relationship diagram A widely accepted modeling convention for representing business entities and the relationships between them.
I. ACCOUNTING INFORMATION SYSTEMS DEFINED Accounting information systems primarily focus on fulfilling the accounting information needs of an organization's internal and external users. The accounting information system in an organization is designed to take business transactions and events as data inputs and generate a variety of financial reports as information outputs. Until the advent of computers and the information technology revolution of the last few decades, the accounting process was performed manually, using the centuries-old double-entry bookkeeping system. With the information technology revolution of the past few decades, however, manual bookkeeping has become defunct. Even very small organizations can afford to automate their accounting system using low-cost, off-the-shelf software. However, many automated accounting systems still use the double-entry model as the basis for accounting. If accounting is seen as a system of providing information useful for decision making, traditional accounting information systems that focus only on financial information and provide only periodic highly aggregated data can fall short of completely meeting the needs of users internal and external to an organization. Internal users include employees at all levels from top management to the lowest level worker who has a legitimate need for information. External users comprise investors, stockholders, creditors, customers, government and regulatory agencies, and financial institutions. Both internal and external users are increasingly demanding instantaneous, user-friendly access to relevant and reliable information. Modern accounting systems must be designed to fulfill a wide range of users' needs with reliable information that is available on-line on demand.
II. TRADITIONAL AUTOMATED ACCOUNTING INFORMATION SYSTEMS The first generation of computerized accounting information systems continued using the traditional accounting methodology, except that computer technology replaced manual paper-and-pencil journals and ledgers. Computer files replaced journals and ledgers, while printed reports and screen displays replaced manually created reports. Thus, rather than "reengineering" the accounting system, computers were used simply to automate manual accounting systems. Of course, computerization resulted in much greater speed and accuracy in the accounting process. In the early days of computer-based accounting (the 1960s and early 1970s), the typical approach was to develop custom transaction processing applications using common business-oriented language, or COBOL. Programmers developed COBOL programs for processing the accounting transactions in each of the organization's accounting cycles—revenue (sales, billing, collections, returns), procurement (ordering, receiving, recording purchase liability, payment), conversion (production), payroll, and general ledger. Users could not easily modify existing reports and had no ability to generate ad hoc reports on an "as needed" basis.

In the late 1970s and even more so in the 1980s and 1990s, organizations increasingly adopted off-the-shelf accounting software packages rather than custom building their own COBOL-driven systems. An accounting software package has a predefined user interface, chart of accounts system, various transaction processing options, and report generators. While most packages can be customized and modified to varying degrees, an organization implementing an accounting package must adapt somewhat to the idiosyncrasies of the package. For example, the package may generate sales analysis reports broken down by salesperson or by region, but not broken down by salesperson within a region. Thus, an off-the-shelf package may satisfy a firm's bookkeeping requirements, but typically will not satisfy the need for nonstandard or ad hoc reports. The chief advantage of accounting packages, relative to custom-developed accounting systems, is their low cost and superior reliability.

Both custom-developed COBOL programs and off-the-shelf accounting packages are geared toward automating the process of recording accounting transactions. For example, an accounting package will typically accept the following revenue cycle accounting transactions: cash sales, credit sales, collections on account, and sales returns and allowances. In a custom-developed environment, the same transactions are typically handled by a series of COBOL programs. Thus, computerized bookkeeping retains an "accounting transactions orientation" with a focus on financial data alone. Business events such as a salesperson contact with a potential customer or vendor quotations for products will find no place in a standard accounting software package. Figure 1 depicts the functioning of the sales and accounts receivable portion of an accounting software package. A custom-developed COBOL program for handling sales and accounts receivable would function in a similar manner. As shown in Fig. 1, the "Sales and Accounts Receivable Module" is designed to accept the typical set of accounting transactions—credit sales, cash receipts from customers, and sales returns and allowances. Through proprietary file management software (the most common of which is the Btrieve structure for microcomputer-based packages), input transactions are recorded in computer files. However, it is important to recognize that the data in these files are accessible only through the file management software. Thus, reports that can be generated are only those that are already programmed into the software package, or designed by the programmer (in the case of custom-developed software). The basic structure of the system, particularly the separate books (files) for each cycle, is not fundamentally altered. Users still did not have the ability to generate custom reports that accessed data in separate subsystems. For example, since customer and vendor data are typically maintained in separate subsystems, it is difficult to determine which customers are also vendors and vice versa.
Figure 1 Structure of traditional accounting software packages.
Table I  Advantages and Drawbacks of Traditional Automated Accounting Systems

Advantages:
• Automation of tedious manual accounting tasks (e.g., posting to ledgers)
• Speed and accuracy
• Low cost (accounting packages only)
• Automatic generation of standard accounting reports for common needs (e.g., sales analysis reports, customer statements, financial statements)
• Redundant data storage permits efficient generation of certain standard accounting reports

Drawbacks:
• Accounting transactions orientation only; non-financial events not recorded
• Periodic rather than "real-time" reports
• Limited flexibility in generating "ad-hoc" reports
• Data accessible only through proprietary file management systems
• Cross-functional queries difficult to answer
The advantages and drawbacks of computerized bookkeeping are summarized in Table I.
III. MODERN DATABASE-ORIENTED INTEGRATED ACCOUNTING SYSTEMS The need for real-time information about enterprise-wide business events and the advent of database technology have led to a more modern view of accounting information systems. The new view of accounting reverts to the fundamental definition of accounting as the main information-providing function within organizations. Ideally, as business events transpire, the accounting system should collect and store data about all aspects of those events. Users at all levels in the organization could then obtain instant real-time reports regarding the results of those business events. Data should be stored at the most elemental level, with all aggregation and summarization being left to individual users. The data stored should not be limited to financial data. Most importantly, all event data should be stored in a single integrated repository (i.e., a database). This approach visualizes an enterprise-wide information system for an organization spanning departmental and functional lines. Indeed, the so-called enterprise resource planning (ERP) systems that have been adopted by a large number of Fortune 500 companies implement exactly such a vision. There are two distinct approaches to constructing a database-driven enterprise information system. The first approach is to use relational database tools to custom-develop a suite of applications to meet the organization's information needs, with all data being stored in a relational database like Oracle, Sybase,
IBM's DB2, or Microsoft's SQL Server. The second approach is to implement ERP software from vendors such as SAP, Oracle, J.D. Edwards, and PeopleSoft. ERP systems like SAP's R/3 system provide the benefits of an off-the-shelf package—they are reliable and cost less than a completely custom-developed system. However, installing an ERP system in a large multinational organization can literally take years and several million dollars, since there are thousands of configurable switches that must be painstakingly set to meet the needs of the business. Moreover, businesses often reengineer their business processes as they replace their age-old ("legacy") systems with an ERP system, increasing the cost and complexity of the ERP implementation even further. With either the custom-developed path or the ERP path, the key advantages of this modern approach to accounting information systems are (1) the capturing of data about all significant business events and (2) the storage of that data in an integrated enterprise-wide database accessible by all users and that provides all information necessary to run the business.
A. Key Features of Modern Accounting Systems Events orientation—In contrast to the “transactions” orientation of traditional manual and automated systems, modern accounting systems focus on capturing data about business events. All relevant business events should be captured, preferably as the events transpire. A call from a potential customer, a complaint from an existing customer, and a suggestion from an employee are some examples of relevant business events that have no place in a traditional transactions-oriented accounting system. However, most managers would argue that information about customer complaints is
valuable and should be stored in the organization's information system. The "transactions" mentality in traditional systems focuses on dollar amounts and the accounts to be debited and credited. The events orientation simply seeks to record data about all events in a system that is not constrained by the debit-credit rules of double-entry bookkeeping.

Enterprise-wide repository—A key feature in modern accounting systems is the storage of all entity and event data in a single integrated repository. Such a repository, in practical terms, is the enterprise's database. The database is "integrated" in a cross-functional sense—data about events occurring in one functional area (e.g., sales order processing) are captured in the database and can automatically trigger events in another functional area (e.g., shipping). Furthermore, all organizational users that need certain information (e.g., a customer's contact information) have instant access to that information, which is stored only once and at one location. The design of the enterprise's database is a critical issue that would determine the degree of success of the database approach. Is the database comprehensive enough? Have appropriate links been established between related sets of data within the database? Are data about all aspects of events stored in the database? Affirmative answers to these questions would indicate a successful design of
the event repository. The enterprise repository approach, which is at the heart of modern accounting systems, is illustrated in Fig. 2.
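To make the cross-functional idea concrete, the sketch below (a minimal illustration, not from the original text; the table and column names are hypothetical) stores a sales-order event once in a shared SQLite repository and lets a separate shipping function discover unshipped orders by querying the same tables, rather than receiving a copy of the data through a file hand-off.

```python
import sqlite3

# One shared repository (in-memory for the sketch) used by every functional area.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE SALES_ORDERS (order_no INTEGER PRIMARY KEY, customer_no INTEGER, order_date TEXT);
    CREATE TABLE SHIPMENTS    (shipment_no INTEGER PRIMARY KEY, order_no INTEGER, ship_date TEXT);
""")

def record_sales_order(order_no, customer_no, order_date):
    """Order-entry function: captures the business event once, in the shared database."""
    db.execute("INSERT INTO SALES_ORDERS VALUES (?, ?, ?)", (order_no, customer_no, order_date))

def orders_awaiting_shipment():
    """Shipping function: works directly from the same repository -- no data hand-off needed."""
    return db.execute("""
        SELECT o.order_no, o.customer_no
        FROM SALES_ORDERS o
        LEFT JOIN SHIPMENTS s ON s.order_no = o.order_no
        WHERE s.shipment_no IS NULL
    """).fetchall()

record_sales_order(1001, 7, "2000-04-03")
print(orders_awaiting_shipment())   # [(1001, 7)] -- the sales event is instantly visible to shipping
```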
B. Core Concepts and Building Blocks of Database-Oriented Accounting Systems In creating a database-oriented, enterprise-wide accounting information system, a number of design decisions must be made relating to field formats, keys, and table types. Some core concepts and basic building blocks of database-oriented accounting systems are now discussed.
1. Hierarchy of Data At the lowest level, all computer systems store data in the form of bits. A bit is a binary digit and can take a value of either 0 (turned off) or 1 (turned on). In essence, a bit is turned on by a tiny electrical impulse and is turned off when the impulse is discharged. A group of bits forms a byte. In most computer systems, eight bits form a byte. A byte is roughly equivalent to a character, which is the lowest element understandable by humans. A group of related bytes forms a field. Thus, the group of bytes that forms the word “Jones”
makes up a "name" field. A group of related fields makes up a record, or in relational database terminology, a row in a table. Tom Jones, Marketing, Sales Associate, 9/1/99, $2800 are a group of fields which make up an employee record showing the employee's name, department, grade, date hired, and monthly salary. A group of logical records constitutes a file, which is equivalent to a table in relational database terminology. All employee records (rows) taken together would make up the employee file (table). A collection of sales invoices for April 2000 would form the sales file, or table, for April. Finally, a collection of logically related files/tables is a database. All files/tables relating to Division A's operations would constitute the "Division A database."

Figure 2 Enterprise repository concept.
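The hierarchy can also be sketched directly in code. The fragment below is an illustrative sketch only (the names and values are hypothetical); it builds up the same levels described above: field values, a record of related fields, a table of records, and a database of related tables.

```python
# Field values: the smallest units of data meaningful to users.
name, department, grade = "Jones", "Marketing", "Sales Associate"

# A record (a row in a table): a group of related fields describing one employee.
employee_record = {"name": name, "department": department, "grade": grade,
                   "date_hired": "9/1/99", "monthly_salary": 2800}

# A file/table: a group of logically related records.
employee_table = [employee_record]   # all employee rows together
sales_april_2000 = []                # e.g., all sales invoices for April 2000

# A database: a collection of logically related files/tables.
division_a_database = {"EMPLOYEES": employee_table, "SALES_APR_2000": sales_april_2000}
```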
2. Field Formats Although field format decisions are usually not irreversible, giving some forethought to the kind of data that might be stored in each field is time well spent. Most database systems support the following formats for fields in a table: numeric, text, currency, date/time, and Boolean. Some systems also support counter fields and fields capable of storing binary objects such as a graphical image, a video clip, or a sound recording. As the name suggests, numeric fields can only store numbers. In contrast, text fields can store any keyboard character. Moreover, these text fields can store any character in the complete character set used for a particular computer. All the characters (numeric and non-numeric) in the character set for a computer are called alphanumeric characters. Currency fields are used for storing dollars and cents, appropriately formatted. Date/time fields can store dates or times in a variety of user-specified formats (e.g., MM/DD/YYYY or DD/MM/YYYY). Boolean fields can hold only one of two possible values—either true or false, or in some computer systems, yes or no. Counter fields are automatically incremented by one whenever a new record in the file is created. The user cannot update the value in the counter field. Fields capable of storing binary large objects, or BLOBs, are very useful for present-day information systems, which are increasingly becoming multimedia. No longer are information systems limited to storing text and numbers. Other forms of data such as a photograph of an employee or a video image of a product can also be stored in tables.
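As an illustrative sketch (the table and column names are hypothetical, not taken from the article's figures), the formats described above map onto column declarations such as the following. SQLite is used here, so the currency and Boolean columns are stored with numeric affinity, a counter field is obtained with AUTOINCREMENT, and binary large objects use the BLOB type.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE EMPLOYEES (
        employee_no    INTEGER PRIMARY KEY AUTOINCREMENT,  -- counter field: assigned automatically
        name           TEXT,                               -- text (alphanumeric characters)
        monthly_salary DECIMAL(10, 2),                     -- currency-style numeric field
        date_hired     DATE,                               -- date/time field (e.g., 'YYYY-MM-DD')
        is_full_time   BOOLEAN,                            -- true/false (stored as 1/0 in SQLite)
        photograph     BLOB                                -- binary large object, e.g., an image
    )
""")

db.execute(
    "INSERT INTO EMPLOYEES (name, monthly_salary, date_hired, is_full_time, photograph) "
    "VALUES (?, ?, ?, ?, ?)",
    ("Tom Jones", 2800.00, "1999-09-01", True, b"\x89PNG..."),   # hypothetical values
)
print(db.execute("SELECT employee_no, name, date_hired FROM EMPLOYEES").fetchall())
```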
3. Keys In addition to selecting appropriate formats for fields in a table, another critical design decision is the designation of the primary key for the table. The primary key is the field or set of fields that uniquely identifies the rows (records) in a table. For example, in an employee table, the social security number (SSN) could serve as the primary key because every employee would have a unique SSN. In most tables, the primary key is obvious. Customer tables typically have a customer number field that would be the primary key. The sales invoice number would be the primary key in a table of sales invoices. Some tables may require more than one field to uniquely identify records in the file. Consider a table used to store customer complaints. Assume that the customer number of the customer who makes each complaint is recorded. In addition, the date and time of the complaint is also recorded. However, no separate "complaint number" is generated. Assuming that a customer could have more than one complaint, then the customer number alone would not uniquely identify rows (records) in the table. However, if we assume that a customer will not have more than one complaint in a day, then the combination of customer number and date would uniquely identify the rows in the complaints table. If we assume that customers could make more than one complaint in a day, then the combination of customer number, date, and time would uniquely identify the rows in the table. When a combination of fields is needed to uniquely identify records in a table, the primary key is called a composite key or a concatenated key.

While primary keys uniquely identify records in a table, the other fields or attributes in the table, called "nonkey attributes," may be used to facilitate user queries. Consider a table of sales invoices with the following fields: invoice number, date, customer number, salesperson number, sales region, and sales amount. The primary key of this table is the invoice number. Each of the remaining fields is a nonkey attribute that can be used to sort the table to respond to a user request for information. Thus, the sales manager might want a listing of sales invoices by date, by salesperson number, or by sales region. These requests could be handled by sorting the table by each of the relevant fields.

The enterprise database of a large organization may contain thousands of tables, each of which is related to at least one other table. In a relational database system, the interrelationships between tables are implemented using common fields across tables. These common fields are referred to as foreign keys. A foreign key is a field in a table, which is the primary key in a related table, referred to as the "master table" in the relationship. It is important to identify foreign keys in a database because they enable linking of tables that are related to one another.
From the viewpoint of database integrity, foreign key values must always refer to an existing value in the "master table" for that foreign key. In database terminology, this requirement is referred to as "referential integrity" and can usually be enforced by the database management system itself.
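The sketch below (illustrative only; the table and column names are hypothetical, not taken from the article's figures) shows these ideas in SQLite via Python: a single-field primary key, a composite (concatenated) key on a customer-complaints table made up of customer number, date, and time, and a foreign key whose referential integrity the database engine enforces once foreign-key checking is switched on.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")          # ask SQLite to enforce referential integrity

db.executescript("""
    -- Master table with a simple primary key.
    CREATE TABLE CUSTOMERS (
        customer_no INTEGER PRIMARY KEY,
        name        TEXT,
        balance     REAL
    );

    -- Composite (concatenated) primary key: no single field identifies a complaint.
    CREATE TABLE COMPLAINTS (
        customer_no    INTEGER REFERENCES CUSTOMERS(customer_no),
        complaint_date TEXT,
        complaint_time TEXT,
        details        TEXT,
        PRIMARY KEY (customer_no, complaint_date, complaint_time)
    );
""")

db.execute("INSERT INTO CUSTOMERS VALUES (101, 'Jones', 0.0)")
db.execute("INSERT INTO COMPLAINTS VALUES (101, '2000-04-12', '09:30', 'Late delivery')")

# A foreign key value with no matching master-table row violates referential integrity.
try:
    db.execute("INSERT INTO COMPLAINTS VALUES (999, '2000-04-12', '10:00', 'No such customer')")
except sqlite3.IntegrityError as err:
    print("Rejected by the DBMS:", err)         # FOREIGN KEY constraint failed
```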
4. Relationship Types Relationships between entities and events, both of which are represented in a relational database by means of tables, can be of three types, referred to as the relationship cardinality. One-to-one (1:1), one-to-many (1:M), and many-to-many (M:M) relationships are the three relationship types. Consider the relationship between the "department" and "manager" entities. A 1:1 relationship between departments and managers implies that each department can have one and only one manager and each manager can manage one and only one department. Now consider the relationship between the "salespersons" and "customers" entities. A 1:M relationship between salespersons and customers means that each salesperson can have many customers but every customer is assigned to exactly one salesperson. Note that a 1:M relationship can be interpreted as an M:1 relationship when read from the opposite direction. Thus, the relationship from customers to salespersons is an M:1 relationship (many customers have one salesperson). An M:M relationship between salespersons and customers indicates that each salesperson can have many customers and each customer can work with many salespersons.
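As a rough sketch (hypothetical table names, not the article's), the three cardinalities map onto relational structures as follows: a 1:1 relationship can be carried by a unique foreign key, a 1:M relationship by a plain foreign key on the "many" side, and an M:M relationship by a separate relationship table keyed on both participants. The 1:M and M:M designs for salespersons and customers are alternatives shown side by side for illustration.

```python
schema = """
-- 1:1  department <-> manager: a UNIQUE foreign key allows at most one manager per department.
CREATE TABLE DEPARTMENTS (dept_no INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE MANAGERS    (mgr_no  INTEGER PRIMARY KEY,
                          dept_no INTEGER UNIQUE REFERENCES DEPARTMENTS(dept_no));

-- 1:M  salesperson -> customers: the foreign key sits on the "many" side (CUSTOMERS).
CREATE TABLE SALESPERSONS (sp_no INTEGER PRIMARY KEY, sp_name TEXT);
CREATE TABLE CUSTOMERS    (customer_no INTEGER PRIMARY KEY,
                           sp_no INTEGER REFERENCES SALESPERSONS(sp_no));

-- M:M  salespersons <-> customers: a relationship table with a composite primary key.
CREATE TABLE SALESPERSON_CUSTOMER (
    sp_no       INTEGER REFERENCES SALESPERSONS(sp_no),
    customer_no INTEGER REFERENCES CUSTOMERS(customer_no),
    PRIMARY KEY (sp_no, customer_no)
);
"""

import sqlite3
sqlite3.connect(":memory:").executescript(schema)   # the schema is valid SQLite DDL
```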
5. Table Types Depending on the contents of a table it can be classified under one of the following categories: master, transaction, reference, history, and backup. Master tables contain relatively permanent information and are analogous to ledgers in a manual accounting system. Customer, employee, vendor, and inventory tables are examples of master tables. Transaction tables contain relatively temporary information and are analogous to journals in a manual accounting system. Examples of transaction tables include those for storing typical business transaction data, such as sales, purchases, payroll vouchers, and cash receipts, and also nonaccounting business event data such as customer suggestions and complaints, equipment failure reports, etc. Reference tables are used to store relatively permanent information that is needed only for reference
purposes during the process of updating one or more master tables. Tax tables and price list tables are examples of reference tables. History tables are old transaction and master tables that are maintained only for reference purposes and for legal reasons. An example of a history table would be the July 1998 sales invoices table. These tables are usually maintained off-line as archive files. The last table type is backup tables. As the name suggests, backup tables are duplicate copies of transaction, master, and reference tables (since history tables are typically maintained only for legal reasons, most organizations would typically not maintain backup copies of history tables).
C. The Data Processing Cycle Given a basic understanding of the building blocks of database-oriented enterprise accounting systems as presented above, the data processing cycle can now be discussed. All information systems applications undergo a sequence of steps from data input to information output. This process, called the "data processing cycle," comprises the sequence of steps from data capture to the generation of meaningful information for end users. Specifically, the steps in the data processing cycle are data input, data preparation, data processing, table maintenance, and information output. Data input involves collection of data and converting data into computer-readable form. Data input results in the creation of transaction data. While in the past transaction data were first captured on paper and then keyed into the computer system, newer technologies such as bar code scanners facilitate automatic entering of data. Data preparation may be needed to facilitate data processing and maintenance of tables. Data preparation entails two main steps: (1) validating input data to filter out erroneous transactions, and (2) sorting input data to facilitate the process of updating master tables, if the update is to be performed in "batch" mode on a periodic basis rather than "on-line" instantaneously. Data processing represents the next step in the data processing cycle. This step includes all the calculations, comparisons, and manipulations that are undertaken in the particular application. For example, in a sales application system, data processing might involve calculating the sales tax payable on each invoice. Table maintenance is the next step in the data processing cycle. This is the step where the master table(s) is(are) actually updated using transaction data. For example, customer balances in a customer table would be updated
(increased) to reflect new credit sales transactions for customers. Although data processing and table maintenance are shown as distinct steps in the data processing cycle, they are performed simultaneously in on-line database-oriented systems. Information output is the last step in the data processing cycle. This step is where reports are generated either on paper or on the user's computer screen. For example, in sales application systems, the information output step could result in the following reports: sales analysis report, salesperson performance report, product turnover analysis, and customer statements of account. The data processing cycle is shown in Fig. 3, along with the table type(s) accessed at each stage. Note in Fig. 3 that the transaction table is accessed during the data input step in order to store input transactions and the master table is accessed during the table maintenance step in order to read and update the appropriate row in the master table.

Figure 3 Data processing cycle.
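A compressed sketch of the cycle for a batch sales application is shown below (illustrative only; the record layout, tax rate, and function names are hypothetical). It walks through the same five steps: data input, data preparation (validation and sorting), data processing, table maintenance, and information output.

```python
from datetime import date

# --- Data input: transactions captured in computer-readable form -------------
transactions = [
    {"invoice_no": 501, "customer_no": 101, "amount": 200.0, "date": date(2000, 4, 2)},
    {"invoice_no": 502, "customer_no": 999, "amount": -50.0, "date": date(2000, 4, 1)},  # erroneous
    {"invoice_no": 503, "customer_no": 102, "amount": 120.0, "date": date(2000, 4, 1)},
]

# Master table: customer balances (keyed by customer number).
customers = {101: {"name": "Jones", "balance": 0.0}, 102: {"name": "Lee", "balance": 0.0}}

# --- Data preparation: validate, then sort for a batch update ----------------
valid = [t for t in transactions if t["customer_no"] in customers and t["amount"] > 0]
valid.sort(key=lambda t: (t["customer_no"], t["date"]))

# --- Data processing: calculations on each transaction (e.g., sales tax) -----
SALES_TAX_RATE = 0.08                      # assumed rate for the sketch
for t in valid:
    t["tax"] = round(t["amount"] * SALES_TAX_RATE, 2)

# --- Table maintenance: update the master table from the transactions --------
for t in valid:
    customers[t["customer_no"]]["balance"] += t["amount"] + t["tax"]

# --- Information output: a simple customer statement listing -----------------
for cust_no, row in customers.items():
    print(f"Customer {cust_no} ({row['name']}): balance {row['balance']:.2f}")
```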
IV. ACCOUNTING SYSTEMS FOR REVENUE (SALES) Using the basic building blocks discussed above, we focus now on accounting systems for meeting specific business needs. Every business will have slightly different business processes and it is the function of a firm’s accounting system to record, track, and report on the business processes unique to that firm. Presented next are an illustrative set of business processes for revenue, assuming a retailing organization. Generating revenue is one of the main processes at all business organizations. The primary events related to generating revenue for most retailing firms are to sell
merchandise and to obtain payment from customers. Secondary events for a retailing firm would typically include contacting potential customers, accepting sales orders, shipping merchandise to customers, and dealing with sales returns and allowances. Note that these events comprise both economic events (selling merchandise and collecting payment) and noneconomic events (contacting potential customers). The sales, collections, and returns processing functions may be carried out by one system or a set of related subsystems. In either case, the data underlying the system should ideally reside in one repository, consistent with the notion of the enterprise wide repository discussed earlier.
A. Business Processes Related to Revenue Generation At a very general level, questions that the information system for a retailing firm should provide answers for include: How much did we sell? How much do customers owe us? What is the value of sales returns and allowances? What is the total amount of collections we have received from customers? In addition to these general questions, other information needs include reports detailing the performance of salespersons, aging of accounts receivable, evaluating the performance of shippers, determining which products are selling well and which are not selling as well, and determining which products are being returned the most. The flexibility and power of the database approach allows both financial and nonfinancial information needs to be served by one integrated repository of organizational data.
Revenue business processes actually begin with the advertising of merchandise for sale, salesperson contacts with potential customers, and unsolicited customer inquiries. The recording of revenue business processes within the system, however, generally begins with a customer order (i.e., a sales order) for merchandise. However, prior to receipt of a customer order there could be salesperson contacts with potential customers. It is therefore necessary to keep track of contacts between salespersons and potential customers. The extent to which these contacts result in sales orders would be one dimension of salesperson performance in which management would be interested. Contacts with existing and potential customers would also likely be a source of feedback about the company's products and services.

As a sales order is being input by a salesperson, it is necessary to first verify that the customer has sufficient credit available. (The assumption that all sales are on credit is for ease of exposition.) If the customer's credit status does not permit the order to be placed, then the customer is simply referred to the credit department for further action. Each customer could obviously place many sales orders. However, each sales order must relate to only one customer. Every sale is executed by one salesperson, who can obviously input many sales orders. Each sales order is for at least one but possibly many items of merchandise inventory. Obviously, a merchandise inventory item could be sold on many orders. It is of course possible that a new inventory item has not as yet been sold on any order, but the company would still like to keep track of that inventory item. Before the order is recorded it is necessary to check whether the on-hand quantity of inventory is sufficient for each requested item. Upon recording of the sale it would be necessary to decrease the quantity on hand of the merchandise inventory items being sold. The accounting system would generate several copies of the sales order document. The first copy is given (or sent) to the customer as a confirmation of the order. One copy is forwarded to the shipping department.

A warehouse clerk in the shipping department packs and ships out merchandise inventory. Each warehouse clerk can process either none or many shipments, but each shipment must be processed by exactly one warehouse clerk. The warehouse clerk prepares a shipping document (e.g., a bill of lading) for each shipment. However, there will be some time lag between the sales order and the shipping of that order. Thus, it is possible that a recorded sales order has not as yet been shipped.
Every shipping document, once prepared, must relate to exactly one sales order which has already been recorded. It is necessary to keep track of the inventory items and the quantity of each item actually shipped out on a shipment. Each shipment is assigned to one of several approved shippers with which the company does business. Each shipper could obviously be involved with many shipments. When the shipment is recorded by the warehouse clerk, a sales invoice is automatically generated and sent to the customer along with the merchandise. Note that when the sales invoice is generated, the sale is recorded in the system, but this recording of the sale is simply an information process and is not a "significant business event" in its own right. For simplicity, it is assumed that all sales orders are on credit (i.e., no immediate cash sales). Along with each shipment, the customer is sent a sales invoice and it is then necessary to update the customer's account to increase the outstanding balance (i.e., increase "accounts receivable").

Customers make payments subsequently and each payment relates to a specific invoice resulting from a specific shipment. Collections are taken by cashiers. Each cashier can handle either none (i.e., a newly hired cashier) or many collections. Since collections from customers are received several days or weeks after the sale, it is possible that certain shipped orders do not have corresponding collections. However, a collection must relate to one sales invoice, i.e., one shipment received by the customer. Of course, there can be many collections for a given customer. Upon receipt of a collection from a customer it is necessary to update the customer's account to decrease the outstanding balance. Cash collected from customers must be deposited in the bank.

Finally, revenue business processes must also consider the possibility of returns and allowances. It is conceivable that customers could return merchandise. All returns and allowances are processed by returns clerks. Each return clerk can handle either none (newly hired clerk) or many returns. Customers could also be granted an allowance on a sale. Every return or allowance would relate to one shipment that has taken place, and it is possible that a shipment could be associated with none or many returns and allowances. The difference between a return and an allowance is that a return is associated with merchandise inventory whereas an allowance is not. For example, a customer may receive an allowance for slightly damaged merchandise, which the customer decides to keep. In the case of returns, at least one but possibly many items of inventory may be returned by the customer. Returns and allowances
also necessitate updates to the customer's account to decrease the outstanding balance. For returns it would be necessary to keep track of the inventory items returned. Let us assume that the company would like to keep track of regular ("good") inventory and returned (possibly defective) merchandise separately.
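The order-entry step described above can be sketched as a small routine (a hypothetical illustration, not the article's system): it checks the customer's available credit and the on-hand quantity of each requested item before the sales order is recorded, and decreases the quantities on hand once the order is accepted.

```python
# Simplified master data for the sketch (hypothetical values).
customers = {101: {"credit_limit": 1000.0, "balance": 250.0}}
inventory = {"A-10": {"price": 40.0, "qty_on_hand": 12},
             "B-22": {"price": 15.0, "qty_on_hand": 3}}
sales_orders = []   # transaction table

def record_sales_order(order_no, customer_no, salesperson_no, items):
    """items: list of (item_no, quantity). Returns True if the order is accepted."""
    cust = customers[customer_no]
    order_total = sum(inventory[i]["price"] * qty for i, qty in items)

    # Credit check: refer the customer to the credit department if the limit is exceeded.
    if cust["balance"] + order_total > cust["credit_limit"]:
        return False

    # Inventory check: every requested item must have sufficient quantity on hand.
    if any(inventory[i]["qty_on_hand"] < qty for i, qty in items):
        return False

    # Record the order (one customer, one salesperson, one or more items) and update stock.
    sales_orders.append({"order_no": order_no, "customer_no": customer_no,
                         "salesperson_no": salesperson_no, "items": list(items)})
    for i, qty in items:
        inventory[i]["qty_on_hand"] -= qty
    return True

print(record_sales_order(5001, 101, 7, [("A-10", 2), ("B-22", 1)]))   # True: order accepted
```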
B. Entity-Relationship Data Model for Revenue Processes Based on the above narrative description of revenue business processes, the entity-relationship (ER) model shown in Fig. 4 is developed. The purpose of ER modeling is to develop a formal representation of the proposed accounting information system, in terms that both nontechnical users and information systems designers can understand. The ensuing ER model can then be implemented in a database system, as discussed later in the chapter. The ER model follows McCarthy's "REA" framework proposed in 1982, which shows three categories of entities (rectangles)—resources on the left, events in the middle, and agents on the right. The lines connecting the entities indicate the type of relationship between the entities, including the relationship cardinality and whether the entity's participation in the relationship is optional or mandatory. Recall that relationships can be 1:1, 1:M, or M:M. In the model below, the "many" side of a relationship is shown using crow's feet and the "one" side of a relationship is indicated with the absence of crow's feet. Also, note that the "|" at the end of a relationship line indicates a mandatory participation in a relationship and an "○" indicates an optional participation in a relationship. The entities in the middle column in Fig. 4 represent revenue-related events in chronological order from top to bottom. The entities in the column on the left are the organization's resources and the entities in the column on the right are the various agents,
internal and external to the organization, who are involved with events. Taken together, the crow's feet, the "|" for mandatory participation, and the "○" for optional participation make it possible to interpret the relationship between entities. For example, the relationship between the CONTACT-CUSTOMER event and the SALESPERSON entity would be interpreted as follows: each customer contact must be performed by exactly one salesperson; each salesperson may be involved with either none or many customer contact events. As another example, the relationship between CUSTOMERS and SALES-ORDERS would be interpreted as follows: each sales order must be placed by exactly one customer; each customer may place either none or many sales orders.

Figure 4 ER diagram for revenue business processes.
C. Data Repository for Storing Revenue-Related Information The ER diagram in Fig. 4 depicts the various entities and relationships that must be represented in the enterprise database for storing information related to revenue business processes for the illustrative retailing firm scenario. A standard set of conversion rules is applied to deduce the relational tables that should result from the ER diagram. The conversion rules are as follows:

1. A separate table is created for each entity.
2. Attributes are created for each entity and the primary key in each entity is identified.
3. For the purpose of conversion, all "optional" relationships are treated as "mandatory many" relationships.
4. The primary key of the entity on the "one" side of a relationship is posted to the table of the entity on the "many" side of a relationship.
5. A separate table is created for the relationship itself for entities participating in an M:M relationship, with the primary key of each table being posted to the new relationship table to form a composite key.

Attributes unique to many-to-many relationships are posted as nonkey attributes in the composite key table. Applying the conversion rules and streamlining the resulting tables to eliminate redundancies and inconsistencies, we arrive at the set of tables shown in Fig. 5. Primary keys are underlined and foreign keys are indicated with an asterisk at the end of the field.
Figure 5 Tables for revenue processing subsystem.
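Applying the conversion rules to one slice of the model yields DDL along the following lines (a hedged sketch: the column names are illustrative and do not reproduce Fig. 5 exactly). The 1:M relationships from CUSTOMERS and EMPLOYEES to SALES-ORDERS become foreign keys posted to the "many" side (rule 4), and the M:M relationship between SALES-ORDERS and INVENTORY becomes a separate ITEMS-ORDERED table with a composite primary key and the relationship's own attributes as nonkey fields (rule 5).

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")
db.executescript("""
    CREATE TABLE CUSTOMERS (customer_no INTEGER PRIMARY KEY, name TEXT, balance REAL);
    CREATE TABLE EMPLOYEES (employee_no INTEGER PRIMARY KEY, name TEXT, department TEXT);
    CREATE TABLE INVENTORY (item_no TEXT PRIMARY KEY, description TEXT,
                            price REAL, qty_on_hand INTEGER);

    -- Event table: foreign keys posted from the "one" side of each 1:M relationship.
    CREATE TABLE SALES_ORDERS (
        order_no       INTEGER PRIMARY KEY,
        order_date     TEXT,
        customer_no    INTEGER REFERENCES CUSTOMERS(customer_no),
        salesperson_no INTEGER REFERENCES EMPLOYEES(employee_no)
    );

    -- Relationship table for the M:M between SALES_ORDERS and INVENTORY,
    -- with the relationship's own attributes (quantity, price) as nonkey fields.
    CREATE TABLE ITEMS_ORDERED (
        order_no INTEGER REFERENCES SALES_ORDERS(order_no),
        item_no  TEXT    REFERENCES INVENTORY(item_no),
        qty_sold INTEGER,
        price    REAL,
        PRIMARY KEY (order_no, item_no)
    );
""")
```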
D. Explanation of Tables The tables listed above should correspond to the entities on the revenue ER model. Data in the form of rows (records) in tables will be added, deleted, and updated via application programs. These programs, which could be written in languages such as C and Visual Basic, capture business event data, process the data, and update all affected tables. The various screens in a high-end ERP system such as SAP R/3 invoke programs that take data entered on the screen and update the relevant event, resource, and agent tables. The fields in the EMPLOYEES table should be self-explanatory. There are no separate tables for salespersons, warehouse clerks, shipping clerks, and cashiers, although those agents were shown separately on the ER diagram. Data pertaining to all these internal agents can be stored in one employees table, with the “department” field indicating the particular department in which the employee works. The CUSTOMERS table contains typical customer-related information of which the company would like to keep track. The “balance” field in the CUSTOMERS table shows the current balance owed by the customer. Updates to this field occur as follows: (1) increases resulting from credit sales; (2) decreases resulting from collections from customers; and (3) decreases resulting from adjustments such as returns, allowances, and perhaps bad debt write-offs (which we are not considering in our simplified example). The sum total of all “balances” in the CUSTOMERS table equals the company’s “Accounts Receivable” at any point in time. The CONTACTS table shows details about contacts between salespersons and customers. The FEEDBACK table is also used to record the actual feedback received from customers (i.e., complaints and suggestions). The SALES-ORDERS table can be thought of as being equivalent to a sales journal in a manual accounting environment. A listing of sales orders for a particular period can be generated by performing a query on the SALES-ORDERS table, specifying an appropriate criterion in the DATE field. The customer to whom the sale is made, and the salesperson who made the sale, can be found out using the foreign keys as the basis for linking the SALES-ORDERS table with the CUSTOMERS and EMPLOYEES tables. The INVENTORY table shows all the items of inventory available for sale, the current price, and the quantity on hand.
11 Three additional tables are needed for the M:M relationships. These are the ITEMS-ORDERED table (for the M:M relationship between SALES-ORDERS and INVENTORY), the ITEMS-SHIPPED table (for the M:M relationship between SHIPMENTS and INVENTORY), and the ITEMS-RETURNED table (for the M:M relationship between RETURNS-ANDALLOWANCES and RETURNED-MERCHANDISE). Attributes unique to each of these composite key tables are listed as nonkey attributes in each table. For example, the ITEMSORDERED table has two nonkey attributes: QTY-SOLD, and PRICE. These fields show the quantity of each item sold on each invoice and the price of each item sold on each invoice. The COLLECTIONS table shows payments on account received in settlement of invoices. Note that the COLLECTIONS table has SHIPMENT-NO as a foreign key. By linking the COLLECTIONS, SHIPMENTS, SALES-ORDERS, and CUSTOMERS tables it is possible to identify the customer who made each payment. As entries are made in the COLLECTIONS table, the appropriate BALANCE field for the customer who made the payment will be updated (decreased). The SHIPMENTS table also indicates the shipper assigned to make the shipment (via SHIPPERNO). The weight being shipped and shipping charges that apply are nonkey attributes in this table. The ITEMS-SHIPPED table indicates which items were actually shipped on each shipment and the quantity of each item that was shipped. The SHIPPERS table is used to keep track of shipping service providers. Amounts payable to shippers are reflected in the BALANCE field in the SHIPPERS table. Returns and allowances are recorded in the RETURNS-ANDALLOWANCES table, which has SHIPMENT-NO as a foreign key to indicate the shipment against which the return is being accepted. If the “returns and allowances” entry is simply an allowance, then no entries will be made in the ITEMS-RETURNED table. The amount of the allowance as shown in the AMOUNT field will be used to decrease the customer’s balance in the CUSTOMERS table. The CASH table can be thought of as the “cash at bank” resource of the firm. Moneys collected from customers result in an increase in the “cash at bank” resource. Finally, the ITEMS-RETURNED and RETURNED-MERCHANDISE tables are used to keep track of items returned by customers. The quantity returned and the reason for the return (i.e., defective merchandise, customer not satisfied, etc.) are indicated in the ITEMS-RETURNED table. Note that the item description is not repeated in the RETURNED-MERCHANDISE table because the de-
scription can be obtained by joining the RETURNED-MERCHANDISE table with the INVENTORY table.
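Two of the retrieval patterns described in this section can be expressed directly as SQL against a schema like the one sketched earlier (again, the table and column names are illustrative, not the article's): total accounts receivable as the sum of customer balances, and a dated listing of sales orders joined to the related customer and salesperson through the foreign keys.

```python
# Assumes a connection `db` to a database containing the illustrative tables sketched above.
ACCOUNTS_RECEIVABLE = """
    SELECT SUM(balance) AS accounts_receivable
    FROM CUSTOMERS;
"""

SALES_LISTING_FOR_PERIOD = """
    SELECT o.order_no, o.order_date, c.name AS customer, e.name AS salesperson
    FROM SALES_ORDERS o
    JOIN CUSTOMERS c ON c.customer_no = o.customer_no
    JOIN EMPLOYEES e ON e.employee_no = o.salesperson_no
    WHERE o.order_date BETWEEN '2000-04-01' AND '2000-04-30'
    ORDER BY o.order_date;
"""
```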
V. ACCOUNTING SYSTEMS FOR PROCUREMENT (PURCHASING) The main business processes or events related to procurement for a retailing organization are to order merchandise from vendors and to pay vendors to settle accounts payable. Secondary events include processing requests for merchandise from departments, obtaining information about pricing and availability of merchandise from vendors, receiving and inspecting incoming merchandise, and returning defective or unneeded merchandise back to vendors. As with the discussion pertaining to revenue processes, the procurement business processes outlined here are also illustrative; actual processes would vary across companies. As with the systems for processing revenue activities, the various procurement activities could be performed by one system or a set of tightly integrated systems. In either case, the data underlying the procurement system resides in one repository, with appropriate links to the relevant revenue-related tables that were presented earlier. The essence of an enterprise wide repository is that all information needs are met via one integrated database (rather than a set of loosely coupled systems).
A. Business Processes Related to Procurement/Purchasing The procurement information system for a retailing organization should be able to answer the following questions: At what prices are various items being offered for sale by various vendors? What have user departments requested for purchase? How much did we purchase? How much do we owe vendors? What is the value of purchase returns? What are the total payments made to vendors? Which purchase orders are outstanding (merchandise not yet received)? Are there any items received for which vendor’s invoices have not been received? In addition, management would also like reports about which vendors supply the lowest cost items of each type, vendor performance in terms of on-time delivery of merchandise, quality of merchandise supplied as gauged by purchase returns, and trends in requests for items received from departments within the retail store. Many of these information items do not have a strict “financial” orientation. The design of the database for
meeting the needs of procurement business processes should consider all information needs and not just financial or accounting needs. At the same time, it is important to ensure that the database design can provide all the necessary accounting reports. The flexibility of the database approach allows both financial and nonfinancial information needs to be easily served by one integrated repository of organizational data.

Procurement business processes for a retailing firm generally begin with a need for merchandise on the part of a department within the store. Let us assume that each department within the store has an employee who is responsible for keeping track of the quantity on hand of each item. This tracking would be done by accessing the INVENTORY table, which was presented in the discussion of revenue-related business processes. In the earlier presentation, the INVENTORY table was being accessed and updated to reflect the sale of merchandise. Here, the very same table is being accessed and updated to reflect purchases of merchandise. This is the essence of cross-functional integration, which is at the heart of complex ERP systems like SAP, PeopleSoft, and J.D. Edwards. The INVENTORY table has "quantity on hand" and "reorder point" fields. In essence, when the quantity on hand falls below the reorder point it is necessary to initiate a purchase requisition for that item. This tracking of the quantity on hand of each item could easily be done by means of a programmed procedure that scans each row in the INVENTORY table. Whether initiated by a programmed procedure or by an employee in each department, a purchase requisition, or a formal request for items to be purchased, must be prepared and authorized. It is this requisition (request for items) that documents a legitimate need for purchases. A purchase requisition would list at least one item needed but possibly many items needed by the department.

Independent of the procedures involved in creating purchase requisitions, personnel in the purchasing department obtain information about the availability of items from different vendors. Quotations from vendors indicate the price of each item being offered for sale. It is important to keep track of this information even though it has no direct "accounting" implication. This information would be valuable when a department issues a purchase requisition. Specifically, the purchasing agent can scan the "items available for sale" table to determine whether any vendor has a special discount on the needed item(s). Keeping such information current is a proactive measure in contrast to a reactive process of soliciting quotations from vendors after a purchase requisition has been received.
Accounting tions from vendors after a purchase requisition has been received. The next step in the purchasing process is to actually issue a purchase order for a particular requisition on an approved vendor. One purchase order is placed for every purchase requisition. Due to the time lag between placing the requisition and the order it is possible that a requisition does not have an associated purchase order. A purchase order can be placed with only one vendor, but a vendor can obviously have many purchase orders. Each purchase order would contain at least one item but possibly many items to be purchased. When the vendor delivers the merchandise a receiving report must be prepared. While there may be many receiving reports for a purchase order, each receiving report can relate to only one purchase order on a specific vendor. However, due to the time lag between placement of the purchase order and receipt of the merchandise it is possible that there is no receiving report for a purchase order. In effect, the purchase orders for which there are no receiving reports constitute “open” purchase orders (i.e., outstanding orders for which the merchandise has not as yet been received). Each receiving report would have at least one but possibly many items received. Upon receipt of the vendor’s invoice for items delivered, and after the merchandise has been received, a purchase liability can be recorded. As was the case with the sales invoicing process, note that the actual recording of the purchase liability is simply an information process and is not a “significant business event” in its own right. The three prerequisites for the recognition of a purchase liability are (1) a vendor’s invoice, (2) a receiving report, and (3) a purchase order. The invoice represents a claim for payment for merchandise delivered. The receiving report acknowledges that the merchandise was in fact received. Finally, the purchase order indicates that the items received were in fact ordered. In a manual system, the three documents are physically matched before a purchase liability is recorded. In a relational database system, the existence of a valid receiving report and purchase order are determined through foreign keys. Foreign keys in a relational database system in effect constitute an “electronic audit trail.” Vendors must periodically be paid, and these payments result in a decrease in the firm’s cash resource. Every payment to a vendor is associated with one merchandise receipt, which in turn relates to one purchase order. Of course, since there will be a time lag between receipt of merchandise and the actual payment; it is possible that a merchandise receipt does not yet have an associated payment. However, when
merchandise is received and the vendor has submitted an invoice, as already discussed, a purchase liability exists. In effect, the purchase liabilities that are "open" (i.e., unpaid) constitute the company's accounts payable at any point in time. The only other procedures related to procurement of merchandise have to do with returns. Defective or damaged merchandise is returned to vendors, and a debit memorandum is issued to that effect. The debit memoranda represent decreases in the accounts payable liability. Each debit memorandum for a purchase return relates to only one merchandise receipt and thus one purchase order. Since a merchandise receipt may include many items, different items from that receipt can be returned on several different purchase returns. Obviously, it is possible that a merchandise receipt has no associated purchase returns. Each purchase return will have one or more items returned, and each item can be returned on several different purchase returns. Since a debit memorandum could be issued only to request an allowance without actually returning merchandise, it is possible that a purchase return does not actually have any items returned associated with it. The increases and decreases to the vendor's balance and to the inventory quantity on hand are simply the "debits" and "credits" to the vendor and inventory accounts.
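The open purchase orders and open purchase liabilities described above correspond directly to queries against the underlying tables. The following sketch, written in Python with the SQLite library purely for illustration, uses simplified, hypothetical table and column names rather than the exact names in Fig. 7.

```python
# Illustrative only: simplified stand-ins for the PURCHASE-ORDERS and
# RECEIVING-REPORTS tables described in the text (names are hypothetical).
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE purchase_orders (
    po_no     INTEGER PRIMARY KEY,
    vendor_no INTEGER,
    vendor_invoice_no TEXT,        -- NULL until the vendor's invoice arrives
    vendor_invoice_amount REAL
);
CREATE TABLE receiving_reports (
    rr_no INTEGER PRIMARY KEY,
    po_no INTEGER REFERENCES purchase_orders(po_no)  -- foreign key "audit trail"
);
""")
con.executemany("INSERT INTO purchase_orders VALUES (?, ?, ?, ?)",
                [(1001, 55, "INV-881", 2500.00),  # received and invoiced
                 (1002, 55, None, None),          # received, not yet invoiced
                 (1003, 60, None, None)])         # ordered, nothing received yet
con.executemany("INSERT INTO receiving_reports VALUES (?, ?)",
                [(9001, 1001), (9002, 1002)])

# "Open" purchase orders: no receiving report exists yet for the order.
open_pos = con.execute("""
    SELECT po_no FROM purchase_orders po
    WHERE NOT EXISTS (SELECT 1 FROM receiving_reports rr WHERE rr.po_no = po.po_no)
""").fetchall()
print("Open purchase orders:", open_pos)          # -> [(1003,)]

# Recognizable purchase liabilities: order, receiving report, and invoice all exist.
payables = con.execute("""
    SELECT po.po_no, po.vendor_no, po.vendor_invoice_amount
    FROM purchase_orders po
    JOIN receiving_reports rr ON rr.po_no = po.po_no
    WHERE po.vendor_invoice_no IS NOT NULL
""").fetchall()
print("Purchase liabilities:", payables)          # -> [(1001, 55, 2500.0)]
```

The same anti-join pattern (orders with no matching receiving report) answers the earlier question about which purchase orders are outstanding.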
B. ER Data Model for Procurement/Purchasing Processes

Based on the above narrative description of procurement business processes, the following ER model is developed. Again, note that the purpose of ER modeling is to develop an easily understandable formal representation of the proposed accounting information system as the first step toward building a functioning database-driven system. The ER model shown in Fig. 6 follows the same conventions described earlier, with resource entities on the left, event entities in the middle, and agent entities on the right. Relationships between entities also use the same conventions for indicating mandatory and optional relationships, with crow's feet being used to indicate the "many" side of a relationship. The ER model in Fig. 6 is interpreted much the same as the model shown earlier for revenue business processes. The entities in the middle column represent procurement-related events in chronological order from top to bottom, those on the left are the organization's resources that are affected by procurement activities, and the entities in the column on the right are the agents involved with procurement-
Figure 6 ER diagram for procurement business processes.
related events. The crow's feet, together with the symbols for mandatory (|) and optional participation, make it possible to interpret the relationships between entities. For example, the relationship between the VENDORS agent and the PURCHASE-ORDERS event would be interpreted as follows: vendors place zero or many purchase orders; each purchase order must be placed by exactly one vendor.
C. Data Repository for Storing Procurement-Related Information

Using the same rules outlined earlier, the ER model for procurement activities is converted to a set of tables, as shown in Fig. 7. As before, primary keys are underlined and foreign keys are indicated with an asterisk at the end of the field name. The tables shown in Fig. 7, together with the tables shown earlier as a result of converting the revenue ER model, would be implemented in a relational database management system such as Oracle 8i or Microsoft SQL Server 2000.
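To make the conversion rules concrete, the sketch below declares two entities and one M:M relationship from the procurement model as relational tables. It is offered only as an illustration; the column lists are abbreviated and the data types are assumptions, so Fig. 7 remains the authoritative definition of the tables.

```python
# A minimal sketch of converting ER constructs to relations:
#  - each entity becomes a table with its primary key;
#  - a foreign key (shown with an asterisk in Fig. 7) implements a 1:M relationship;
#  - an M:M relationship becomes a table with a composite primary key.
# Column lists are abbreviated and data types are assumptions for illustration only.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE vendors (
    vendor_no INTEGER PRIMARY KEY,
    name      TEXT,
    balance   REAL
);

CREATE TABLE purchase_orders (
    po_no     INTEGER PRIMARY KEY,
    po_date   TEXT,
    vendor_no INTEGER NOT NULL REFERENCES vendors(vendor_no)  -- each PO has exactly one vendor
);

-- M:M between purchase orders and inventory items (ITEMS-ORDERED in the text):
-- the composite key pairs the two foreign keys; quantity is a nonkey attribute.
CREATE TABLE items_ordered (
    po_no    INTEGER REFERENCES purchase_orders(po_no),
    item_no  INTEGER,
    quantity INTEGER,
    PRIMARY KEY (po_no, item_no)
);
""")
print(con.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall())
```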
D. Explanation of Tables

As with the tables resulting from conversion of the revenue ER model, the tables shown in Fig. 7 should correspond to the entities on the procurement ER model. In an ERP system, the tables in Fig. 7 would be updated by application programs that process data entered on user screens. Note that the EMPLOYEES, INVENTORY, and CASH tables, highlighted in bold type, are the same tables shown earlier for revenue business processes. This is the essence of cross-functional integration—related business processes share information by accessing the same tables from different applications or modules. Most of the fields in the tables in Fig. 7 should be self-explanatory. Note that although the various internal agents involved with procurement processes are shown separately on the ER diagram, there is only one EMPLOYEES table. The EMPLOYEE-NO field in the various tables indicates the employee who added the entry to the table. How is it possible to ensure that, say, only employees who are purchasing agents in the
Figure 7 Tables for procurement processing subsystem.
purchasing department add orders to the purchase orders table? A control could be programmed into the system to verify that the “grade” of the employee entering the purchase order is in fact “purchasing agent” and that the employee is in the “purchasing” department. The “balance” field in the VENDORS table shows the current amount owed to each vendor. The sum total of all “balance” fields in the VENDORS table constitutes current accounts payable. Updates to this field occur as follows: (1) increases resulting from purchase liabilities executed by means of an “update query” (using Access terminology); (2) decreases resulting from payments recorded in the PAYMENTS table; and (3) decreases resulting from purchase returns recorded in the PURCHASE-RETURNS table. The balance field is an example of a logical field (calculated field) whose value can be derived based on other values in the database. Thus, rather than actually storing “balance” as a physical field, its value can
be recomputed whenever it is needed by running a query. For expediency, however, most organizations will find it convenient to store this value, to avoid overloading the information system especially when purchase transactions number in the millions. The ITEMS-SUPPLIED table shows which vendor supplies which items and the current price of the item. It is this table that would be accessed by purchasing department personnel upon receipt of a purchase requisition from a department. The RECEIVINGREPORTS and related ITEMS-RECEIVED tables would be accessed upon receipt of merchandise. The PAYMENTS table is accessed periodically whenever payments are made to vendors. Note the connection between PAYMENTS and CASH—payments to vendors result in a decrease in available cash. The PURCHASE-RETURNS and related PURCHASEDITEMS-RETURNED tables are accessed when items are returned to vendors. Note that, in the PURCHASEORDERS table, the vendor-invoice-no and vendor-
invoice-amount fields would be left blank when the purchase order itself is created. These fields are relevant only when the vendor's invoice is actually received. As discussed above, recording the fact that the vendor's invoice is received and that a purchase liability exists is simply an information process. The vendor's invoice should indicate the purchase order to which the invoice relates—this purchase order number is then used to retrieve the appropriate row in the PURCHASE-ORDERS table to update the vendor-invoice-no and vendor-invoice-amount fields. Periodically, a query can be run to determine the rows in the PURCHASE-ORDERS table that have values in the vendor-invoice-no and vendor-invoice-amount fields. The resulting rows can then be matched with the RECEIVING-REPORTS table using the PO-NO foreign key in the RECEIVING-REPORTS table to ensure that the items on those purchase orders have in fact been received. The ITEMS-REQUISITIONED, ITEMS-ORDERED, ITEMS-RECEIVED, and PURCHASED-ITEMS-RETURNED tables all represent M:M relationships. Attributes unique to each of these composite key tables, such as the quantity needed field in the ITEMS-REQUISITIONED table, are listed as nonkey attributes in each table. You might observe that the PAYMENTS table does not include VENDOR-NO as a foreign key. How can the vendor for that payment be identified? The answer is to follow the trail of foreign keys linking the PAYMENTS, RECEIVING-REPORTS, PURCHASE-ORDERS, and VENDORS tables. A whole host of queries can similarly be answered by following the trail of foreign keys linking related tables.
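The trail of foreign keys from PAYMENTS back to VENDORS can be followed with a single multi-table join. The sketch below, again using simplified and hypothetical table and column names, shows one way such a query might be written.

```python
# Hypothetical sketch: identify the vendor behind each payment by following the
# foreign-key chain PAYMENTS -> RECEIVING-REPORTS -> PURCHASE-ORDERS -> VENDORS.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE vendors           (vendor_no INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE purchase_orders   (po_no INTEGER PRIMARY KEY,
                                vendor_no INTEGER REFERENCES vendors(vendor_no));
CREATE TABLE receiving_reports (rr_no INTEGER PRIMARY KEY,
                                po_no INTEGER REFERENCES purchase_orders(po_no));
CREATE TABLE payments          (payment_no INTEGER PRIMARY KEY, amount REAL,
                                rr_no INTEGER REFERENCES receiving_reports(rr_no));

INSERT INTO vendors VALUES (55, 'Acme Supply');
INSERT INTO purchase_orders VALUES (1001, 55);
INSERT INTO receiving_reports VALUES (9001, 1001);
INSERT INTO payments VALUES (501, 2500.00, 9001);
""")

rows = con.execute("""
    SELECT p.payment_no, p.amount, v.vendor_no, v.name
    FROM payments p
    JOIN receiving_reports rr ON rr.rr_no = p.rr_no
    JOIN purchase_orders  po ON po.po_no  = rr.po_no
    JOIN vendors          v  ON v.vendor_no = po.vendor_no
""").fetchall()
print(rows)   # -> [(501, 2500.0, 55, 'Acme Supply')]
```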
VI. ENTERPRISE-WIDE ACCOUNTING SYSTEMS

To achieve an integrated enterprise-wide accounting information system, all of the organization's business processes should be modeled together. Models for sales and purchase business processes were shown above. Now let us examine how these closely related business processes are modeled together, especially in a retailing organization. As the ER diagram below shows, a comprehensive model covering all purchase- and sales-related business processes can be somewhat cumbersome. The diagram shown in Fig. 8 covers sales and purchase related processes, as well as business processes related to employees, other expenses, fixed assets, and loans. An enterprise-wide model for a manufacturing organization would be much more complex since it would have to encompass all processes related
to procurement of raw materials, manufacturing, labor, work in progress, finished goods inventory, and all sales related processes. In the cross-functional ER model shown in Fig. 8, note in particular the integration points between business processes. For example, inventory and cash are the two key resources that are affected by both purchasing and selling activities. Inventory increases as a result of purchases and decreases as a result of sales; cash increases as a result of sales and decreases as a result of purchases. Another key aspect of the crossfunctional ER model to note in Fig. 8 is that every resource both increases and decreases in value. Resources are at the heart of every business and the value of an organization’s resources would fluctuate over time. In the context of the enterprise ER model above, the resource inventory increases as a result of purchases and decreases as a result of sales. By contrast, the resource cash decreases as a result of purchases and increases as a result of sales. Thus, each increase in a resource because of an event will eventually result in a corresponding decrease in another resource through another event. For example, the purchasing set of events results in an increase in the inventory resource and a corresponding decrease in the cash resource. Similarly, the sales set of events results in a decrease in the inventory resource and a corresponding increase in cash. Note also that each resource increase or decrease occurs in response to an event and there is one internal and (usually) one external agent involved in that event. In addition to revenue (sales) and procurement (purchasing) processes, the ER model also shows (1) human resources related processes—hiring, compensating (i.e., paying), and terminating employees; (2) processes for recording and paying other expenses such as utilities, maintenance expenses, phones, etc.; (3) fixed asset processes—acquiring and retiring/selling fixed assets; and (4) loan related processes—obtaining and paying off loans. The addition of these processes results in a comprehensive enterprise-wide model for meeting both accounting and other information needs.
A. Controls in Enterprise Accounting Systems

A critical issue regarding enterprise accounting systems is the accuracy and integrity of the data stored within the system. Users of the enterprise system must have assurance that the information they receive from the system is error free. Recall that in an enterprise system data are stored in a single reposi-
Figure 8 ER diagram for other business processes (beyond purchases and sales).
tory or database. Control and security concerns are heightened in database environments because of the single shield protecting the entire database, i.e., the database management system. However, database technology also provides opportunities to build controls into the system itself such that errors and irregularities are prevented from ever occurring. Control procedures are mechanisms designed to prevent, detect, or correct errors and irregularities. The hardware inside computer systems will usually process transactions and perform calculations in a flawless manner. However, the software that directs the functioning of computer hardware is designed and cre-
ated by humans. It is the software component of computer-based information systems, and the human component that interacts with computer-based systems, that can cause errors and irregularities in data and thus bring the need for good controls. Control procedures may be applied at various organizational and data processing levels. Control procedures that affect all information systems and subsystems within the organization are categorized as general control procedures, while controls designed to prevent or detect errors within each information system or sub-system are categorized as application control procedures.
18 General control procedures are the methods and measures adopted within a business to promote operational efficiency and encourage adherence to prescribed managerial policies. Segregation of duties, maintenance of proper documents and records, and making sure that all accounting transactions are appropriately authorized are some of the common general control procedures that should exist in all accounting systems, regardless of the presence or extent of computerization. For computer-based accounting systems, general controls are those controls that facilitate the effective operation and management of the organization’s computer systems and all its applications. The six categories of general control procedures are (1) proper organization of the information systems department to ensure adequate supervision over employees and segregation of duties, (2) system development and program change controls including procedures to authorize, test, and document new systems and changes to existing systems, (3) hardware controls including scheduled maintenance of computing and related equipment, (4) access controls to ensure an appropriate match between the duties of employees and the type of access they have to specific files and programs, (5) computer operations controls that cover the functioning of the organization’s computing center, and (6) backup and recovery procedures, including a disaster recovery plan, to protect against accidental or intentional loss of data. Application control procedures are focused on ensuring the accuracy and reliability of data within each subsystem of the enterprise-wide information system. Computer-based application control procedures include input controls, processing controls, and output controls. As their names suggest, these three sets of control procedures are applicable during the input, processing, and output stages of the data processing cycle. Input control procedures are essentially procedures to validate the data. In an enterprise system employing a relational database, a number of data validation rules can be defined at the table level within the database. In addition, the field type designated for each field in a table can itself serve as a control mechanism. For example, fields defined as Date/Time will accept only date and time data appropriately formatted. In addition, validation rules, which are enforced automatically, can be designed to check whether data entered into a field in a table (1) fall within a certain range, (2) are of the correct length, and (3) match one of the acceptable values for the field. In on-line systems, if a field can have only one of several acceptable values, then the user can be presented with a “pick list” of acceptable values from
which a selection can be made. Another powerful feature of on-line systems is the ability to program the system to automatically enter data in certain fields. This control procedure, referred to as system-generated data, can, for example, enter the current date and the next order number on an order entry form.
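Many of these input controls, such as range checks, length checks, acceptable-value ("pick list") checks, and system-generated dates, can be declared at the table level, as suggested above. The sketch below shows one possible form such declarations might take; the table, its fields, and the specific rules are invented for illustration.

```python
# Illustrative table-level input controls (names and rules are assumptions):
#  - a range check on quantity,
#  - a length check on the item code,
#  - a "pick list" style check restricting status to acceptable values,
#  - system-generated data: the entry date defaults to the current date.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE order_lines (
    order_no   INTEGER,
    item_code  TEXT    CHECK (length(item_code) = 8),
    quantity   INTEGER CHECK (quantity BETWEEN 1 AND 999),
    status     TEXT    CHECK (status IN ('open', 'shipped', 'cancelled')),
    entry_date TEXT    DEFAULT CURRENT_DATE
)""")

con.execute("INSERT INTO order_lines (order_no, item_code, quantity, status) "
            "VALUES (1, 'AB123456', 10, 'open')")          # passes all checks

try:
    con.execute("INSERT INTO order_lines (order_no, item_code, quantity, status) "
                "VALUES (2, 'AB123456', 5000, 'open')")    # quantity out of range
except sqlite3.IntegrityError as err:
    print("Rejected:", err)                                # CHECK constraint failed

print(con.execute("SELECT * FROM order_lines").fetchall())
```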
B. Beyond Transaction Processing Systems

For large organizations, several gigabytes of data may be recorded in the enterprise database within a week or even a day. Moving beyond simply recording transactions, organizations are seeking to obtain business intelligence from their large data repositories. In essence, the goal is to find the "gems" of information in the gigabytes of transaction data. Data warehousing, data marts, and data mining are three concepts aimed at allowing an organization to leverage its data to obtain a competitive advantage. A data warehouse is a repository of historical business transaction data, organized in a manner that facilitates efficient querying for reaching marketing, tactical, and strategic decisions. The key point is that the data warehouse is separate from the organization's "live" database that captures and stores current transaction data. A data mart is a closely related concept. It can be defined as a repository of data gathered from operational data and other sources that is designed to serve certain users' needs. The data in a data mart may actually come from a data warehouse, or it may be more specialized in nature. The emphasis of a data mart is on meeting the specific demands of a particular group of users in terms of analysis, content, presentation, and ease of use. Each term, data mart and data warehouse, often implies the presence of the other in some form. Data mining is the analysis of data for relationships that have not previously been discovered. For example, data mining of sales records for a large grocery store may reveal that consumers who buy diapers also tend to buy ice cream and apple juice. Data mining, also referred to as knowledge discovery, looks for associations (correlated events—beer purchasers also buy peanuts), sequences (one event leading to a series of related events—someone buying tile for a new home later buys other home products), classification of data (generating profiles of consumers showing their interests), clustering of data (finding and visualizing groups of related facts), and forecasting (making predictions about future events based on an analysis of patterns of existing events).
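In its simplest form, the association analysis described above amounts to counting how often pairs of items appear together in the same transaction. The toy sketch below, using invented data, illustrates only that counting step; commercial data mining tools are considerably more sophisticated.

```python
# A toy illustration of association analysis: count how often pairs of items
# appear in the same sales transaction (data and threshold are made up).
from collections import Counter
from itertools import combinations

transactions = [
    {"diapers", "ice cream", "apple juice"},
    {"diapers", "ice cream"},
    {"beer", "peanuts"},
    {"diapers", "apple juice", "bread"},
    {"beer", "peanuts", "chips"},
]

pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Report pairs that co-occur in at least two transactions.
for pair, count in pair_counts.most_common():
    if count >= 2:
        print(pair, count)
# e.g. ('apple juice', 'diapers') 2, ('diapers', 'ice cream') 2, ('beer', 'peanuts') 2
```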
SEE ALSO THE FOLLOWING ARTICLES Computer-Aided Manufacturing • Control and Auditing • Human Resource Information Systems • Operations Management • Procurement • Productivity • Public Accounting Firms • Supply Chain Management • Transaction Processing Systems
BIBLIOGRAPHY

Hollander, A., Denna, E., and Cherrington, J. O. (1999). Accounting, information technology, and business solutions, 2nd edition. Boston, MA: McGraw-Hill.
Murthy, U. S. (2000). Database systems design and development, 2nd edition. Bloomington, IN: CyberText Publishing.
Murthy, U. S., and Groomer, S. M. (2000). Accounting information systems: A database approach, 5th edition. Bloomington, IN: CyberText Publishing.
Perry, J. T., and Schneider, G. P. (2000). Building accounting systems using Access 2000. Cincinnati, OH: Thomson Learning/Southwestern.
Potter, D. A. (1993). Automated accounting systems and procedures handbook. New York, NY: John Wiley.
Romney, M., and Steinbart, P. (1999). Accounting information systems, 8th edition. Reading, MA: Addison-Wesley.
Advertising and Marketing in Electronic Commerce
Brenda J. Moscove and Robert G. Fletcher
California State University, Bakersfield
I. MARKETING AND ADVERTISING DECISION AREAS II. MECHANICS OF ADVERTISING AND MARKETING FOR E-COMMERCE
III. IDENTIFYING OPPORTUNITIES AND PROBLEMS IV. NEW DEVELOPMENTS
GLOSSARY
advertising Paid communications to the masses by an identifiable sponsor about a product, service, or idea. It should be a persuasive message.
banner A miniature billboard that appears when an Internet user calls up a specific web page.
button Smaller than a banner, providing a link to the advertiser's home page. It can be an icon and generally costs less than a banner because of its smaller size.
demographics The statistical characteristics of a group (population, target market, age, income, etc.).
electronic commerce (e-commerce) A buying and selling process that is supported by the use of the Internet.
interstitials Animated screens that pop up briefly as information is downloaded from a web site. Quite often these screens, sometimes known as "splash pages," are advertisements.
lifestyle Describes how customers live. It is helpful in planning advertising and marketing strategies attractive to users of the product/service category.
marketing concept Managing a company based upon determining the wants and needs of target markets and delivering satisfaction to these customers at a profit. In other words, the profit is derived from customer satisfaction. In the case of a not-for-profit organization, the same definition applies except that the words "at a profit" are omitted.
product life cycle Defines the stages of the product category (introductory, growth, maturity, and decline) and determines the way in which the product/service can be advertised and marketed.
psychographics Used to group target market segments according to psychological characteristics (attitudes, values, personality, lifestyle, etc.).
spam Unsolicited e-mail contact with present and potential customers. It is like junk mail.
target market A group of customers with common wants and needs that the company or organization chooses to serve.
I. MARKETING AND ADVERTISING DECISION AREAS

Several decisions must be made before designing an information system for e-commerce. The most important of these decisions are discussed in the following sections. This discussion is not meant to be comprehensive but is included to illustrate the types of questions and issues that an organization's management should consider before setting up an information system. The information system is built to satisfy the organization's goals and objectives for the system; thus, every information system for e-commerce advertising and marketing may be unique to the sponsoring organization. The organization's goals and objectives may change and become more complex over time. Conversely, an organization may find that a simpler system would be satisfactory. Therefore, it is necessary to revise, simplify, or expand the system to keep it practical and up to date in terms of carrying out management's objectives.
The primary emphasis of this article is the information system needed for e-commerce, with a focus on advertising and marketing specific goods and services. The issues involved in building a support system to facilitate advertising and marketing strategies are twofold: a consideration of the target market information needed for planning and implementing the strategies, and the ability to take orders, process orders, and deliver the goods and services in a satisfactory manner. A fundamental consideration is whether or not the organization's marketing and advertising strategies should include e-commerce. In other words, would the organization become more efficient and/or profitable by engaging in e-commerce?
A. Target Market Selection The organization needs to determine the audience for the e-commerce message. Is the audience the general public looking for information such as free information dispensed by a local, state, or national government? Another example is free information that is distributed by trade organizations, unions, and cause groups. Is the desired target a present or potential consumer? Consumers are people who will utilize or consume the product or service for their own personal use. Ski equipment, clothing, and books are examples of consumer products. Airline tickets, concert tickets, and on-line investments are consumer services. Is the target audience likely to be a business customer? A business customer uses the product and service to carry on or maintain the business activities. Examples are raw materials, component parts, equipment, supplies, and services like bookkeeping or advertising. Another category for defining the target audience is the institutional or nonprofit segment. The customer may be a governmental entity, a nonprofit hospital, a museum, or a charity, for example. Both business and institutional customers can be combined under the umbrella of “organizational” customers. A critical issue for defining target markets is the marketing strategy chosen by the organization wishing to engage in e-commerce advertising and marketing. If they choose to market to only one segment, the strategy is called concentration. For instance, if the seller identified three potential segments in the hotel industry such as budget, family, and status, the organization may decide to target only the family segment. Thus, the seller concentrates its strategies on this sin-
Advertising and Marketing in Electronic Commerce gle target market. All advertising and marketing efforts, including e-commerce, are designed to attract and satisfy this single customer group—the family. Another strategy the organization could choose is to select the family and budget segments and devise target advertising messages and marketing efforts for each group. This strategy involves multiple segments; i.e., targeting more than one segment, but not all the segments identified, and developing advertising and marketing cues for each target served. A third strategy is the combination method. The organization may consider the wants and needs of the budget and family market(s) to be fairly similar. Therefore, it combines, or aggregates, these two segments into one target market. It would reject involvement with the status segment. Advertising and marketing e-commerce efforts are designed to appeal to both the family and budget combined target market. These efforts must be cleverly designed to attract and satisfy the combined sector with one set of strategies. Of course, the final option in overall target audience identification is mass marketing. The organization decides to treat all possible buyers the same and develops only one set of strategies for the entire range of present and potential customers. Mass marketing is the least expensive strategy but can be costly in terms of lost sales because the mass marketing strategy lacks focus. A complete discussion of target markets and advertising and marketing strategies is beyond the scope of this article. However, additional, detailed information is presented in marketing and advertising textbooks such as: Principles of Marketing, 9th Edition, by Philip Kotler and Gary Armstrong, and Contemporary Advertising, 7th Edition, by William F. Arens. Further insights into business to business on-line commerce is available in “Let’s Get Vertical” in Business 2.0 magazine. Marketing and advertising information systems differ according to the type of audience sought. The starting place for determining the type of information system needed for e-commerce is defining the target market(s) the organization wants to serve and designing the system to incorporate the information needed by the target audience(s).
B. Differences in Reaching the Target Markets with E-Commerce

In conventional advertising and marketing, the marketer and/or advertiser can select the method of reaching the target markets. Thus, the initiative rests
Advertising and Marketing in Electronic Commerce with the marketer to choose the media and message that will best attract the attention of the desired target market(s). Chances are that, with the right combination of media and messages, a portion of the target audience(s) will at least be exposed to the product, service, and/or advertising message. With e-commerce, the initiative is in the hands of the target market (prospective buyer). The prospects elect to visit the home pages for the various companies. This means they can elect not to view the home pages as well. The shift in balance for e-commerce, placing the selection power in the hands of the prospective buyer, creates new challenges for the advertisers and marketers. How do they get the prospects to visit their web pages? For specific hints about turning first-time visitors into repeat customers for a web site see “Success Is a Repeat Visitor,” in Fast Company, June 1998. What information is sought once the web site is visited? First, the organizations wishing to attract buyers must advertise their web sites in a way that gets the prospective customers to visit the web site. Simply having a site and posting information on it is no guarantee that the prospective customers will find any relevant information on the site. Many of the dotcom ads do not explain what the company does or why the audience should visit the site. They are not memorable. They only add to the clutter of messages that constantly bombard the public and organizational target markets. What would encourage a site visit? What would make the target audience remember the dotcom address, company, and products/services? The dotcom ads should be precise and detail the benefits to the audience for visiting the web site. Then, the target market actually must visit the web site. Often, the organization has not clearly defined what the target market wants in terms of information; and the web pages fail to attract interest and attention. Furthermore, often the web pages are not user friendly; and the prospect exits the advertiser’s site without getting the desired information. The advertisers and marketers must understand why the prospective customers are visiting the web site and make certain that the proper information and appeals are included in the messages posted on the site in a user-friendly, attractive format. For instance, do prospective customers need to read the company history or do they simply want product/service information? Do they care how many dealerships/retail outlets a seller has? Do they need to know about the size, location, and capacity of the company’s production plants? Further, do they need to know how the prod-
ucts are made? Such information is usually considered image rather than product/service information. Image information is aimed at constituencies like shareholders, potential investors, and lending institutions rather than customers and potential buyers. The real question is how to emphasize information and appeals that will stimulate action by the party visiting the web site instead of concentrating on the items in the above paragraph that have little impact on purchasing decisions. What kind of response action does the organization wish to elicit? Are the materials designed to stimulate the desired response? Part of the system established must be information gathered directly from prospective customers about what is desired (see Section I.D, Scope and Purpose of E-Commerce Marketing and Advertising Activities).
C. Audience Demographics and Behavioral Characteristics Another series of questions involves whether the target market(s) is a specific demographic group of consumers, i.e., sex, age, homeowner, etc., or, a specific group of organizational customers like banks, hospitals, retailers, etc. Furthermore, final consumers can be defined in terms of life cycles, lifestyles, psychographics, and other behavioral characteristics in addition to demographic categories. Examples include: athletic, active/passive, outdoor person, couch potato, etc., lifestyles; type A or type B personalities; and nonuser, light user, medium user, and heavy user consumption patterns. Businesses can be defined in terms of organizational characteristics as well as types of business. A useful way to identify possible target markets in the United States is to consult the Standard Industrial Classification (SIC) codes issued by the federal government. These numerical codes categorize organizations into specific industries and give information about each organization like the industry type and number of organizations in the industry. It also identifies individual firms, sales volume, number of employees, location, and other information similar to demographic information for final consumers. A North American Industry Classification System (NAICS) is being developed for North American Free Trade countries. Other countries, like Singapore, have similar classification systems. For example, this type of data may be useful in determining the size of the potential business or organizational customer. Size must be defined. Is size the number of employees, the
24 total annual revenue, the yearly gross or net profit, the annual earnings per share, or any other factor that is meaningful in identifying which potential customer group(s) should be targeted? For instance, if a company is selling cardboard boxes, the potential number of cardboard boxes used per year by specific businesses may be helpful in determining its target group. The classification codes’ detailed lists of organizations for certain industries, if helpful, could be built into the information system for advertising and marketing using e-commerce techniques. In addition, target organizations may be defined in terms of organizational behavioral characteristics like always looking for new technology, the lowest price, or status products and services. Usage patterns also are important. These behavioral characteristics of organizational buyers and potential purchasers can be determined best by primary marketing research information that must be gathered directly from the desired target markets. The information cannot be found in secondary reports like government census and other statistical data banks. Behavioral characteristics are vital to the information systems for planning good marketing and advertising strategies for reaching reseller and not-for-profit markets. Another issue for consideration is the geographical boundary for the target market(s) regardless of whether it is a final consumer, business, or institutional customer. Is the market area a city, a region, a country, a group of countries, or worldwide market? Should a Geographic Information System or Global Positioning System be integrated with the advertising and marketing information system for planning effective advertising and marketing strategies? In determining the audience, these geographical factors must be considered in order to define the target market(s) and location(s) before the information system can be defined and constructed. See Section II.B for additional information. Any information useful in identifying potential target segments and their wants and needs should be built into the support system for e-commerce. The information must be timely and accurate. The success of using e-commerce for advertising and marketing is largely dependent on how well the e-commerce advertising and marketing strategies satisfy the target audience(s).
D. Scope and Purpose of E-Commerce Marketing and Advertising Activities

It is important to clearly state the purpose for incorporating e-commerce into marketing and advertising
Advertising and Marketing in Electronic Commerce strategy. There are many examples of alternative purposes. Is the purpose to dispense information to the public without seeking a sale like facts about health, community information, or a listing of charities? Is the information geared for building the image (goodwill) of the sponsor rather than immediate sales? Or, is the purpose to attract investors for the sponsoring organization? Is the purpose to generate an inquiry for the sponsoring firm so that a sales person can close the sale? Is the message designed to generate a direct sale without human contact? Also, e-commerce can be used to influence future sales like providing information that may lead to a sale direct from the Internet or through an inquiry during the next year. For example, an automobile manufacturer may sponsor messages about various vehicles knowing that it may take the buyer nine months to decide to actually purchase a new vehicle. The manufacturer wants to ensure that its vehicle models will be considered when the actual purchase occurs. The e-commerce efforts can include support activities designed to influence purchases by enhancing the competitive position. Offering an e-mail service line to solve the most common problems with the product or service for the buyer may give the company a competitive edge in the market place. This strategy can also be a cost-saving measure on the part of the sponsor. Information about recurring problems and complaints must be built into the information system so that the service line can be responsive. In addition, the service line must be advertised to attract users.
E. Purpose of the Web Site There are many purposes for e-commerce efforts. Too often, organizations design web sites without considering what the purpose is for the web site. The purpose helps define the type of information needed to support the web site. The target audience(s) and purpose(s) should determine the type of information, amount of information, and mechanics for accessing the information from the support system. For instance, the prospective buyer is looking for benefits. When he/she finds the product or service that provides the desired benefits, a transaction will take place. If the seller’s message includes a complete company history, product features, and attributes, and other irrelevant information instead of benefits, the prospective buyer will quickly lose interest and visit another site. Also, the web site must be designed to provide easy access to the desired information through a logical series of linkages. The user must be able to
Advertising and Marketing in Electronic Commerce understand the linkages and quickly access only the information desired. A major issue, often overlooked, is whether the web site is designed to replace the organization’s current marketing and advertising strategies or supplement present efforts. A company can use e-commerce as its only means of generating sales or as a means to enhance its present traditional advertising and marketing endeavors. Different marketing and advertising strategies apply for brick-and-mortar companies versus those organizations that do business only over the Internet. For example, a retail chain can use e-commerce to add to its existing customer base supplementing its instore sales. E-commerce could be used to generate an inquiry for the retailer. Then, the retailer or dealer could close the sale, or, sales may be completed online. The on-line sales supplement the organization’s traditional in-store sales. Part of the information system consists of getting existing customers to visit the web site to view additional merchandise: clearance items, items that are no longer stocked by the retailers (bulky items like furniture, special orders, etc.). Establishing an interface between in-store customers and Web contacts is essential. An organization can replace its brick-and-mortar outlets by choosing to do business only by e-commerce. In this case, the existing dealerships, retail establishments, etc., are closed. For example, retail chain Z closes all stores converting all marketing operations to on-line activities. A catalog business has the same options as a brick-and-mortar operation: to continue to do business by catalog and supplement its orders through e-commerce or to convert entirely to e-commerce and dispense with the catalog. Clearly, the place strategy shifts from emphasis on the retail location to emphasis on the on-line location. However, the distribution strategy becomes more important in terms of the ability to fulfill customer orders on time and in a satisfactory manner. Promotion strategies emphasize attracting the target audience to the web site and obtaining on-line orders instead of the store or dealership visits where merchandise can be seen, touched, tasted, smelled, etc., before purchasing and, often, serviced after the sale. The information base of present and desired customers, their wants and needs, a means of tracking orders, deliveries, and customer satisfaction must be established. Finally, a new business can decide to maintain both e-commerce and brick-and-mortar sites (supplemental) or to rely exclusively on e-commerce techniques for advertising and marketing its products and services.
F. Interface with Other Marketing and Functional Activities (Production, Product/Service Offerings, Distribution, Pricing, and Other Promotion Activities) The information system for advertising and marketing using e-commerce techniques must consider the impact of e-commerce activities on the other functional areas of the company. A vital consideration for the information system is the type of goods and services to market on-line. For example, does the company wish to market their full line of products and services or only those that cannot be obtained elsewhere like clearance merchandise, custom-designed products/services, help-line services to back up dealer products/services, business-to-business networking services? Business networking services include those linkages with business or institutional customers requiring regular repurchasing contacts and establishing procedures that can be on-line functions. The mix of products and services most often featured online includes computer, travel, entertainment, books, gifts and flowers, apparel, and food and beverage goods and/or services. If the desired outcome of the web site is an on-line order, the information system must be designed to accommodate all the needed information for taking, processing, and delivering the order in a satisfactory manner. In terms of order fulfillment, can the present structure used for brick-and-mortar operations accommodate sales stimulated by e-commerce? Many organizations use Just-In-Time (JIT) inventory procedures. Would an expansion in sales through e-commerce interfere with the operation of these procedures? For example, would a retail organization be able to obtain and deliver the large quantities of merchandise that are ordered just before the Christmas holiday? Would producers utilizing JIT inventory procedures be able to produce the items? It is possible that JIT may have to be supplemented with inventory buildup and storage procedures needed to provide satisfactory order fulfillment. Alliances with Federal Express and United Parcel Service solved the distribution problems experienced by many companies sending individual packages to on-line purchasers. Toys R Us allows on-line customers to avoid shipping charges by using its existing stores as pick up and/or return sites. Wal-Mart hired Fingerhut, a company with a long history in logistics and distribution, to assist in handling on-line orders and deliveries. Wal-Mart associated with Books-A-Million to supply and deliver books ordered on-line. The
26 alliance also increases Wal-Mart’s ability to compete with amazon.com. These alliances require changes in the distribution and information systems appropriate to support the seller’s activities and service levels. For a more detailed discussion of the implications of e-commerce on the spatial aspects of organizations, see the working paper by Fletcher and Moscove (2000). The costs of e-commerce must be factored into the costs of doing business. In some instances, replacing brick-and-mortar outlets with e-commerce results in savings to the sellers. Cisco, an Information technology manufacturer, estimates a savings approximating $300 million per year in staff, software distribution, and paperwork costs due to e-commerce transactions. Conversely, the costs of doing business may actually increase. For instance, reorganizing distributions systems to include warehouse facilities needed to service the e-commerce customers satisfactorily may increase the total costs of transactions. Such additional costs must be taken into account when establishing prices for e-commerce offerings. Pricing philosophies of the seller’s organization must be factored into the information system as well as changes in the cost/profit relationships produced by the e-commerce activities.
G. Customer Satisfaction and Relationships

Good marketing and advertising focus on customer and potential-customer satisfaction. This is true regardless of whether the customer is a final consumer, a business customer, or an institutional buyer. Satisfied customers demand more products and services and create more sales transactions. More sales transactions usually result in increased profits. Good advertising encourages people to try the product/service and to keep repurchasing it. It helps purchasers determine that they have made the right choice, defend the purchase, and persuade others to buy the same products/services. The information system must contain helpful data and information, from the customers' perspectives rather than the sellers', about what benefits the buyer pool expects from the products and services being marketed. The system also must contain information about how customers evaluate their purchases after use. Thus, market research must become part of the information base, including customers' reactions to e-commerce efforts. The information base includes primary research to determine the customers' perceptions—the customers' viewpoints and opinions about the marketing communications, products/services, price, and place.
Advertising and Marketing in Electronic Commerce E-commerce marketing and advertising efforts are geared toward building relationships with current customers. It is easier and less expensive to sell more to an existing customer than to obtain a new customer. Many business relationships, therefore, do not end with the transaction itself but are built on establishing strong relationships with the customers that extend to after sale services and feedback. It is essential that the information system includes methods to track after sale customer satisfaction and relationships. The system also should incorporate schedules for recontacting customers about the company, activities, products and service offerings, rewards to present customers, etc. In this way, advertising can serve as a reminder to the customers about the company and its offerings. An organization considering marketing and advertising through e-commerce may find the Cisco Internet Quotient Test helpful in assessing its ability of managing customer relations for e-commerce buyers. Organizations advertising and marketing through e-commerce offer various services to customers in order to build relationships. A greeting card or gift seller could provide reminders of birthdays, anniversaries, and other special days. Reminders of maintenance needed could be provided to a purchaser of computer equipment and/or vehicles. New product information can be regularly e-mailed to present customers or posted on the web site. These are just a few of the ways in which e-commerce can be used to establish relationships. The information supporting these activities linking the product/service order to the customers must be considered concurrent to the advertising and promotion strategies and integrated into the support system.
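As one illustration of the relationship-building services mentioned above, the sketch below flags customers who are due for a periodic reminder based on their original purchase date. The customer records, field names, and seven-day window are invented for the example.

```python
# Hypothetical sketch: find customers due for a reminder (e.g., annual
# maintenance) within the next 7 days, based on their original purchase date.
from datetime import date, timedelta

customers = [
    {"name": "R. Jones", "email": "rjones@example.com", "purchased": date(2001, 6, 20)},
    {"name": "T. Patel", "email": "tpatel@example.com", "purchased": date(2000, 12, 1)},
]

def next_anniversary(purchased: date, today: date) -> date:
    """Return the next yearly anniversary of the purchase date."""
    this_year = purchased.replace(year=today.year)
    return this_year if this_year >= today else this_year.replace(year=today.year + 1)

today = date.today()
window = timedelta(days=7)
for c in customers:
    due = next_anniversary(c["purchased"], today)
    if due - today <= window:
        print(f"Send maintenance reminder to {c['name']} <{c['email']}> (due {due})")
```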
II. MECHANICS OF ADVERTISING AND MARKETING FOR E-COMMERCE

A. Advertising and Buying Time and Space for E-Commerce

Ways to advertise and market using e-commerce are limited only by existing technology. Today's technology offers web sites, banners, buttons, sponsorships, and interstitials. The web site is a storefront where customers and prospects, shareholders, and other stakeholders can discover more about the company, what the company does and believes, and its product and service offerings. Thus, the company may regard the entire web site as an ad. Because this article assumes the perspective of customers and potential buyers of the firm's products and services, product/ser-
Advertising and Marketing in Electronic Commerce vice linkages and information inducing a sale are the primary focus. From a usage standpoint, clicking a banner sends the user to the advertiser’s location on the Web. The costs of banners vary depending on the number and types of visitors the site attracts. A typical charge for placing a simple banner ad is $10–$40 per thousand viewers. The banner approach can be supplemented by keyword purchases offered by most search engines. If a user requests information under the keyword, the seller’s ad will appear along with ads from any other seller paying for the same keyword. Buttons are less expensive than banners because they are much smaller. They are about the size of an icon. Software packages such as Java, Shockwave, Acrobat, and Enliven add motion and animation to the buttons and banners. Search engines, like Excite and Webcrawler, provide audio capabilities. Interactive banners and buttons are available. Remember that the banners and buttons should provide easy to use links with the product/service information for the customer pool. Sponsorships of web pages are available to companies. For instance, IBM sponsors the Superbowl web page. Because of the high costs of sponsorships, the sponsors’ banners, buttons, and other information are integrated with the web page messages and visuals. Interstitials (sometimes called intermercials) are emerging as an effective e-commerce technique. These visuals and messages occur while the user downloads a site. Because they are more intrusive than buttons and banners, it is likely that usage will increase. Spam, unsolicited e-mail to present and potential customers, is another form of advertising that can have less than positive effects. A number of e-mail providers offer services to eliminate spam (junk) messages. Essentially, these services block any further contact with the potential customers thus eliminating these viewers from the contact pool. If an organization wishes to engage in spam as a way of advertising, access lists and networks can be obtained and incorporated into the information system. The above techniques are offered as examples of ways to guide buyers and potential target market customers to the marketing information provided online. The techniques do not represent an all-inclusive use of advertising possibilities, search engine selection, and the e-commerce system which are beyond the scope of this article. Once the user accesses the site, the appropriate products/services and proper information and appeals must be conveyed to evoke a purchase response. Also, customer relationships should be solidified by the materials presented on-line.
The costs of using the various techniques to access the marketing communications, and the effectiveness of those techniques, should become vital parts of the e-commerce information system for marketing and advertising. Advertising on the Web emerged in the last half of the 1990s. Since on-line advertising is relatively new, there still are many questions about the costs associated with e-commerce. A usual billing technique is cost per thousand based upon the number of page requests for a banner ad. If specific target markets are sought by purchasing space in a search engine's categories and subcategories (finance, games, etc.), various charges result depending on the category and tier. Click-throughs are a less popular way of billing. A user actually clicking on the banner to visit the advertiser's home page is a click-through. Ad networks allow pooling web pages for ad placement, but the costs are difficult for the advertiser to calculate and verify. Also, various discounts may be negotiated. In addition to the costs of advertising on-line, the costs of designing and producing the home page and other pages, maintaining the pages, ensuring the currency and accuracy of information, and establishing the proper linkages should be included in the information system for e-commerce. For example, a professional Web page designer may charge $300–$1500 depending on the extent of information and the complexity of the page(s). Gathering and accessing information about on-line service suppliers, costs, and media effectiveness is another consideration. These costs include, but are not limited to, home page design and updates, directory service, networking, and newsletters mailed or e-mailed to on-line customers. Typical costs can be quite low for less complex on-line advertisers: $475 for one year including Internet hosting fees, web page design, registration of the web site with various web indexes, and monthly updates. Charges for larger and more complex sites could range from $995 for 6 pages to about $2395 for 20 pages, with service provider fees of $35–$55 per month, domain registration of $100, plus setup fees.
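Because banner space is typically billed on a cost-per-thousand basis, budgeting for a banner campaign is straightforward arithmetic. The sketch below applies the $10 to $40 cost-per-thousand range quoted above to an invented impression volume.

```python
# Estimate banner advertising cost at cost-per-thousand (CPM) rates.
# The CPM range comes from the text; the impression count is an invented example.
def banner_cost(impressions: int, cpm: float) -> float:
    """Cost of serving `impressions` ad views at `cpm` dollars per thousand."""
    return impressions / 1000 * cpm

impressions = 250_000                       # hypothetical monthly page requests
for cpm in (10, 40):                        # low and high end of the quoted range
    print(f"{impressions:,} impressions at ${cpm} CPM: ${banner_cost(impressions, cpm):,.2f}")
# -> 250,000 impressions at $10 CPM: $2,500.00
# -> 250,000 impressions at $40 CPM: $10,000.00
```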
B. Service Providers, Software, Databases, and Tracking

A complete listing of the various types of service providers and databases would be extensive. Therefore, the illustrations given here are intended to stimulate further exploration of the topic on the part of the reader. Search engines such as Yahoo, AOL.com, Excite Network, Lycos, HotBot, and Netscape are e-commerce facilitators.
28 These companies (publishers) also provide demographic and other information allowing the marketing organization to track visitors to various web-sites (www.relevantknowledge.com). An organization must advertise its web site to web search engines that index the Web. Submitit! (http://www.submit-it.com) and All4oneSubmission Machine (http://www.aall4one. com/all4submit/) are examples of ways to advertise the organization’s services to the search engine providers. The web site is registered with the most important indexes. New software developments can also assist firms in their marketing and advertising efforts. For example, OnDisplay provides infrastructure software to build supply and revenue channels; engage and retain customers; and open on-line trading network partners, distributors, and suppliers. An e-commerce development platform that allows on-line retailers to promote products to customers is Art Technology Group Inc. (ATG). A key feature of this platform is the ability to collect information on individual customers and use this information to create marketing rules based upon their shopping patterns. Establishing links to important industry sites (free or paid for) are essential for the success of e-commerce advertising and marketing strategies. Obtaining listings of trade association members can be a starting point for establishing these linkages. Obtain targeted mailing lists and news groups by using services such as Dejanews (http://www.deganews.com) to find appropriate sources. Consider joining a shopping mall. For a detailed listing of Banner Exchange Programs see http://bannertips.com/exchangenetworks.shmtl. Media brokers provide appropriate and cost-effective locations for placing banner ads especially helpful to boosting on-line sales for well-known branded products and services. General information about the Internet demographics and usage is available from Nua Internet Surveys (www.nua.ie), including research reports and demographic surveys produced by major research companies. Another source information about demographics, and other aspects of the Internet (surveys, statistical sources, articles, and reports) is www.teleport.com. Additional general statistical information about the Internet usage can be accessed at the following web site: www.marketinginc.com. For specific information about visitors to individual web sites, one source of information can be found at http://geocities.yahoo.com/home. Methods for tracking and evaluating the success of on-line marketing and advertising efforts should be built into the information system. A standardized sys-
A standardized system for tracking the ads is lacking. Some basic tracking issues that should be considered are: Do potential and present customers see the ads? How effective are the ads? Simply counting the advertising exposures (hits) from web pages is not a satisfactory way of evaluating the effectiveness of the exposure. A more fundamental question is whether the advertising evokes the desired response, such as an order, an inquiry, or additional customer satisfaction. Tracking considerations when establishing an information system include:
1. Who will provide the tracking information
2. What is the cost of obtaining the tracking information
3. What type of information will be used for tracking
• Number of visitors
• Number and frequency of repeat visitors
• Location of site prior to visit (search engine used, etc.)
• Length of visit
• Pages visited
• Items examined by visitors
• Domain names of visitors
• Geographic location of visitors
• Purchases made by visitors
• Inquiries made by visitors
According to the Internet Advertising Bureau (IAB), ad requests can be measured. An ad request is the opportunity to offer the advertisement to a web site visitor. When the user loads a web page containing ads, the browser pulls the ad from the server. The ads appear as banners, buttons, or interstitials. The number of ad requests generated can be converted to standard cost per thousand measures. However, an ad request is no assurance that the user actually reads the materials or pursues further linkages. The click rate is a more refined measure indicating that the viewer actually clicks the linkage to another web page. The click rate is the number of clicks on an ad divided by the number of ad requests. It is used to evaluate the frequency with which users try to obtain additional information about a service or product. However, the click rate still does not measure whether or not a user ever sees the ad or retains the message in a positive way. Cookies are pieces of information that record a user’s activities. Leading browsers, such as Netscape Navigator and Microsoft Internet Explorer, support cookies. The use of cookies enables tracking of user demographics and preferences.
Internet tracking, profile, and/or rating services are provided by Internet Profiles Corp. (I-PRO) in partnership with A. C. Nielsen, Media Metrix, BPA Interactive, and Relevant Knowledge. Alternatively, organizations can build their own tracking devices into the information system. The tracking devices and costs must be factored into the information support systems.
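To make these measures concrete, here is a brief worked example (the figures are hypothetical and used only for illustration). Suppose a banner generates 200,000 ad requests in a month and the publisher charges a cost per thousand of $30; the media cost is then 200,000/1,000 × $30 = $6,000. If those requests produce 1,400 click-throughs, the click rate is 1,400/200,000 = 0.007, or 0.7%, and the effective cost per click is $6,000/1,400, or roughly $4.29. Tracking both figures over time allows placements to be compared on cost as well as on response.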
III. IDENTIFYING OPPORTUNITIES AND PROBLEMS
There are many advantages to organizations resulting from Internet advertising and marketing. Among the advantages are
• An interactive marketing and advertising medium
• Accessibility from around the world
• Immediate response
• Selective targeting
• Reaching affluent, sophisticated users
• Ability to reach organizational users
• Providing detailed information
• Replacing brick-and-mortar operations
• Constant technological advances
Likewise, there are many disadvantages cited for Internet marketing efforts:
• Unclear definition of the purpose of using on-line advertising and marketing strategies
• Security and privacy risks
• Lack of knowledge and standards for measuring advertising effectiveness
• Costs of targeting specific markets
• Other costs associated with on-line marketing
• Inappropriately placed ads
• Inability to fill orders and deliver goods as promised
• Geographic limitations according to the economic development and infrastructure of various countries
• Spamming (the on-line equivalent of junk mail)
• Ever-changing technology
There are many issues connected to e-commerce that influence information systems designed to support marketing and advertising activities. Organizations need to consider the opportunities and problems associated with e-commerce marketing and the extent to which the information systems should accommodate the opportunities and risks involved. For a detailed listing of electronic resources for researching and evaluating e-commerce activities, see Fletcher, Moscove, and Corey (2001). See Corey (1998) for a sampling of web sites, e-mail services, and list servers that influence the devising of marketing, advertising, and e-commerce strategies for urban regions, derived from a comparative analysis of information technology policy for various locations.
IV. NEW DEVELOPMENTS
The technology of e-commerce is dynamic, expanding exponentially. In fact, technology is advancing so rapidly that there are few predictions about the final outcome. A few trends that warrant mention are
1. Wireless technology, including handheld devices. Wireless devices connecting “to the Internet will increase 728% . . . That’s an increase from 7.4 million United States users in 1999 to 61.5 million users in 2003.”
2. Mobile commerce. Mobile commerce, which negates the necessity for PCs, is very popular in Japan and other countries where mobile phone and handheld device usage often surpasses PC usage.
3. Broadband technologies. Improvements in bandwidth technology increase the speed with which content can be delivered across a network.
4. Push technology. Push technology allows messages to be delivered to web users of e-mail and other services, for example. Often, spamming has undesirable results.
The emergence of new technology and techniques for advertising and marketing for e-commerce and the absence of industry standards make it impossible to predict the final structure of information systems supporting such activities. However, the technological changes do present exciting challenges for organizations involved in or initiating e-commerce activities. The issues raised in this article are not intended to be all inclusive; they are intended to stimulate further thought and action by organizations building information systems to support e-commerce advertising and marketing strategies.
SEE ALSO THE FOLLOWING ARTICLES Business-to-Business Electronic Commerce • Electronic Commerce • Electronic Commerce, Infrastructure for • Enterprise Computing • Marketing • Sales • Service Industries, Electronic Commerce for
BIBLIOGRAPHY
Arens, W. F. (1999). Contemporary advertising, 7th Edition, pp. 515–523. New York: Irwin/McGraw-Hill.
Beck, R. (July 1999). Competition for cybershoppers on rise. Bakersfield Californian, p. E1. Bakersfield, CA.
Cisco. (June 2000). Cisco internet quotient test. www.cisco.com/warp/public/750/indicator/quiz.html.
Corey, K. E. (1998). Information technology and telecommunications policies in southeast Asian development: Cases in vision and leadership. The naga awakens: Growth and change in southeast Asia, pp. 145–200. Singapore: Times Academic Press.
Fletcher, R. G., and Moscove, B. J. (February 2000). E-commerce and regional science. A working paper. Western Regional Sciences Association Conference, Kauai, Hawaii.
Fletcher, R. G., Moscove, B. J., and Corey, K. E. (2001). Electronic commerce: Planning for successful urban and regional development. International urban settings: Lessons of success, pp. 431–467. Amsterdam: Elsevier Publishers.
Greenstein, M., and Feinman, T. M. (2000). Electronic commerce: Security, risk management and control, pp. 384–385. New York: Irwin/McGraw-Hill.
ISP-Planet Staff. (February 2000). Wireless to outstrip wired net access. E-mail: [email protected].
Norris, M., West, S., and Gaughan, K. (May 2000). E-business essentials, pp. 252–254. New York: John Wiley & Sons.
Quain, J. R. (1998). Success is a repeat visitor. Fast Company, No. 15, p. 194.
Artificial Intelligence Programming Günter Neumann German Research Center for Artificial Intelligence
I. ARTIFICIAL INTELLIGENCE PROGRAMMING LANGUAGES II. FUNCTIONAL PROGRAMMING III. FUNCTIONAL PROGRAMMING IN LISP
IV. LOGICAL PROGRAMMING IN PROLOG V. OTHER PROGRAMMING APPROACHES
GLOSSARY
clauses Prolog programs consist of a collection of statements, also called clauses, which are used to represent both data and programs.
higher order function A function definition which allows functions as arguments or returns a function as its value.
lists Symbol structures are often represented using the list data structure, where an element of a list may be either a symbol or another list. Lists are the central structure in Lisp and are used to represent both data and programs.
recursion An algorithmic technique where, in order to accomplish a task, a function calls itself with some part of the task.
symbolic computation Artificial intelligence programming involves (mainly) manipulating symbols and not numbers. These symbols might represent objects in the world and relationships between those objects. Complex structures of symbols are needed to capture our knowledge of the world.
term The fundamental data structure in Prolog is the term, which can be a constant, a variable, or a structure. Structures represent atomic propositions of predicate calculus and consist of a functor name and a parameter list.

PROGRAMMING LANGUAGES IN ARTIFICIAL INTELLIGENCE (AI) are the major tools for exploring and building computer programs that can be used to simulate intelligent processes such as learning, reasoning, and understanding symbolic information in context. Although in the early days of computer language design the primary use of computers was for performing calculations with numbers, it was found out quite soon that strings of bits could represent not only numbers but also features of arbitrary objects. Operations on such features or symbols could be used to represent rules for creating, relating, or manipulating symbols. This led to the notion of symbolic computation as an appropriate means for defining algorithms that processed information of any type and thus could be used for simulating human intelligence. Soon it turned out that programming with symbols required a higher level of abstraction than was possible with those programming languages which were designed especially for number processing, e.g., Fortran.
I. ARTIFICIAL INTELLIGENCE PROGRAMMING LANGUAGES In AI, the automation or programming of all aspects of human cognition is considered from its foundations in cognitive science through approaches to symbolic and subsymbolic AI, natural language processing, computer vision, and evolutionary or adaptive systems. It is inherent to this very complex problem domain that in the initial phase of programming a specific AI problem it can only be specified poorly. Only through interactive and incremental refinement does more precise specification become possible. This is also due to the fact that typical AI problems tend to be very domain specific; therefore, heuristic strategies have to be
developed empirically through generate-and-test approaches (also known as rapid prototyping). In this way, AI programming notably differs from standard software engineering approaches where programming usually starts from a detailed formal specification. In AI programming, the implementation effort is actually part of the problem specification process. Due to the “fuzzy” nature of many AI problems, AI programming benefits considerably if the programming language frees the AI programmer from the constraints of too many technical constructions (e.g., low-level construction of new data types, manual allocation of memory). Rather, a declarative programming style is more convenient, using built-in, high-level data structures (e.g., lists or trees) and operations (e.g., pattern matching) so that symbolic computation is supported on a much more abstract level than would be possible with standard imperative languages such as Fortran, Pascal, or C. Of course, this sort of abstraction does not come for free, since compilation of AI programs on standard von Neumann computers cannot be done as efficiently as for imperative languages. However, once a certain AI problem is understood (at least partially), it is possible to reformulate it in the form of detailed specifications as the basis for reimplementation using an imperative language. From the requirements of symbolic computation and AI programming, two new basic programming paradigms emerged as alternatives to the imperative style: the functional and the logical programming styles. Both are based on mathematical formalisms, namely, recursive function theory and formal logic. The first practical and still most widely used AI programming language is the functional language Lisp developed by John McCarthy in the late 1950s. Lisp is based on mathematical function theory and the lambda abstraction. A number of important and influential AI applications have been written in Lisp, so we will describe this programming language in some detail in this article. During the early 1970s, a new programming paradigm appeared, namely, logic programming on the basis of predicate calculus. The first and still most important logic programming language is Prolog, developed by Alain Colmerauer, Robert Kowalski, and Philippe Roussel. Problems in Prolog are stated as facts, axioms, and logical rules for deducing new facts. Prolog is mathematically founded on predicate calculus and the theoretical results obtained in the area of automatic theorem proving in the late 1960s.
II. FUNCTIONAL PROGRAMMING
A mathematical function is a mapping of one set (called the domain) to another (called the range).
A function definition is the description of this mapping, either explicitly by enumeration or implicitly by an expression. The definition of a function is specified by a function name followed by a list of parameters in parentheses, followed by the expression describing the mapping, e.g., CUBE(X) = X × X × X, where X is a real number. Alonzo Church introduced the notation of nameless functions using the lambda notation. A lambda expression specifies the parameters and the mapping of a function using the lambda (λ) operator, e.g., λ(X) X × X × X. It is the function itself, so the notation of applying the example nameless function to a certain argument is, for example, (λ(X) X × X × X)(4). Programming in a functional language consists of building function definitions and using the computer to evaluate expressions, i.e., function application with concrete arguments. The major programming task is then to construct a function for a specific problem by combining previously defined functions according to mathematical principles. The main task of the computer is to evaluate function calls and to print the resulting function values. In this way the computer is used like an ordinary pocket calculator, but at a much more flexible and powerful level. A characteristic feature of functional programming is that if an expression possesses a well-defined value, then the order in which the computer performs the evaluation does not affect the result of the evaluation. Thus, the result of the evaluation of an expression is just its value. This means that in a pure functional language no side effects exist. Side effects are connected to variables that model memory locations. Thus, in a pure functional programming language no variables exist in the sense of imperative languages. The major control flow methods are recursion and conditional expressions. This is quite different from imperative languages, in which the basic means for control are sequencing and iteration. Functional programming also supports the specification of higher order functions. A higher order function is a function definition which allows functions as arguments or returns a function as its value. All these aspects together, but especially the latter, are major sources of the benefits of functional programming style in contrast to imperative programming style, viz. that functional programming provides a high degree of modularity. When defining a problem by dividing it into a set of subproblems, a major issue concerns the ways in which one can glue the (sub)solutions together. Therefore, to increase one’s ability to modularize a problem conceptually, one must provide new kinds of glue in the programming language; this is a major strength of functional programming.
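To connect this mathematical notation to an executable program, the following minimal sketch uses Common Lisp syntax (anticipating the next section; the name CUBE is chosen purely for illustration): a named definition of the cube mapping, followed by the application of its nameless lambda counterpart to the argument 4.
(DEFUN CUBE (X) (* X X X))        ; named definition of the mapping X -> X * X * X
((LAMBDA (X) (* X X X)) 4)        ; applying the nameless version to 4 evaluates to 64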
III. FUNCTIONAL PROGRAMMING IN LISP Lisp is the first functional programming language. It was invented to support symbolic computation using linked lists as the central data structure (Lisp stands for List processor). John McCarthy noticed that the control flow methods of mathematical functions—recursion and conditionals—are appropriate theoretical means for performing symbolic computations. Furthermore, the notions of functional abstraction and functional application defined in lambda calculus provide for the necessary high-level abstraction required for specifying AI problems. Lisp was invented by McCarthy in 1958, and a first version of a Lisp programming environment was available in 1960 consisting of an interpreter, a compiler, and mechanisms for dynamic memory allocation and deallocation (known as garbage collection). A year later the first language standard was introduced, named Lisp 1.5. Since then a number of Lisp dialects and programming environments have been developed, e.g., MacLisp, FranzLisp, InterLisp, Common Lisp, and Scheme. Although they differ in some specific details, their syntactic and semantic core is basically the same. It is this core which we wish to introduce in this article. The most widely used Lisp dialects are Common Lisp and Scheme. In this article we have chosen Common Lisp to present the various aspects of Lisp with concrete examples. The examples, however, are easily adaptable to other Lisp dialects.
A. The Syntax and Semantics of Lisp

1. Symbolic Expressions

The syntactic elements of Lisp are called symbolic expressions (also known as s-expressions). Both data and functions (i.e., Lisp programs) are represented as s-expressions, which can be either atoms or lists. Atoms are word-like objects consisting of sequences of characters. Atoms can further be divided into different types depending on the kind of characters which are allowed to form an atom. The main subtypes are
• Numbers: 1 2 3 4 –4 3.14159265358979 –7.5 6.02E23
• Symbols: Symbol Sym23 another-one t false NIL BLUE
• Strings: ”This is a string” ”977?” ”setq” ”He said: \”I’m here.\” ”
Note that although a specific symbol such as BLUE is used because it has a certain meaning for the programmer, for Lisp it is just a sequence of letters or a symbol. Lists are clause-like objects. A list consists of an open left round bracket ( followed by an arbitrary number of list elements separated by blanks and a closing right round bracket ). Each list element can be either an atom or a list. Here are some examples of lists:
(This is a list)
((this)((too)))
()
(((((((())))))))
(a b c d)
(john mary tom)
(loves john ?X)
(* (+ 3 4) 8)
(append (a b c) (1 2 3))
(defun member (elem list) (if (eq elem (first list)) T (member elem (rest list))))
Note that in most examples the list elements are lists themselves. Such lists are also called nested lists. There is no restriction regarding the depth of the nesting. The examples also illustrate one of the strengths of Lisp: very complex representations of objects can be written with minimal effort. The only thing to watch for is the right number of left and right round brackets. It is important to note that the meaning associated with a particular list representation or atom is not “entered” into the list representation. This means that all s-expressions (as described above) are syntactically correct Lisp programs, but they are not necessarily semantically correct programs.

2. Semantics

The core of every Lisp programming system is the interpreter whose task is to compute a value for a given s-expression. This process is also called evaluation. The result or value of an s-expression is also an s-expression which is returned after the evaluation is completed. Note that this means that Lisp actually has operational semantics, but with a precise mathematical definition derived from recursive function theory.

a. READ-EVAL-PRINT LOOP
How can the Lisp interpreter be activated and used for evaluating s-expressions and therefore for running real Lisp programs? The Lisp interpreter is actually also defined as a function usually named EVAL and is part of any Lisp programming environment (such a function is called a built-in function). It is embedded into a Lisp system by means of the so-called read-eval-print loop, where an s-expression entered by the user is first read into the Lisp system (READ is also a built-in function). Then the Lisp interpreter is called
via the call of EVAL to evaluate the s-expression, and the resulting s-expression is returned by printing it to the user’s device (not surprisingly, by calling a built-in function PRINT). When the Lisp system is started on the computer, this read-eval-print loop is automatically started and signaled to the user by means of a specific Lisp prompt sign starting a new line. In this article we will use the question mark (?) as the Lisp prompt. For example,
? (+ 3 4)
7
means that the Lisp system has been started and the read-eval-print loop is activated. The s-expression (+ 3 4) entered by a Lisp hacker is interpreted by the Lisp interpreter as a call of the addition function, and the resulting s-expression 7 is printed at the beginning of a new line.
b. EVALUATION
The Lisp interpreter operates according to the following three rules:
1. Identity: A number, a string, or the symbols T and NIL evaluate to themselves. This means that the value of the number 3 is 3 and the value of “house” is “house.” The symbol T returns T, which is interpreted to denote the true value, and NIL returns NIL meaning false.
2. Symbols: The evaluation of a symbol returns the s-expression associated to it (how this is done will be shown below). Thus, if we assume that the symbol *NAMES* is associated to the list (JOHN MARY TOM), then evaluation of *NAMES* yields that list. If the symbol COLOR is associated with the symbol GREEN, then GREEN is returned as the value of COLOR. In other words, symbols are interpreted as variables bound to some values.
3. Lists Every list is interpreted as a function call. The first element of the list denotes the function which has to be applied to the remaining (potentially empty) elements representing the arguments of that function. The fact that a function is specified before its arguments is also known as prefix notation. It has the advantage that functions can simply be specified and used with an arbitrary number of arguments. The empty list () has the s-expression NIL as its value. Note that this means that the symbol NIL actually has two meanings: one representing the logical false value and one representing the empty list. Although this might seem a bit
odd, in Lisp there is actually no problem in identifying which sense of NIL is used. In general, the arguments are evaluated before the function is applied to the values of the arguments. The order of evaluation of a sequence of arguments is left to right. An argument may represent an atom or a list, in which case it is also interpreted as a function call and the Lisp interpreter is called for evaluating it. For example, consider the following evaluation of a function in the Lisp system: ? (MAX 4 (MIN 9 8) 7 5) 8 Here, the arguments are 4, (MIN 9 8), 7, and 5, which are evaluated in that order before the function with the name MAX is applied on the resulting argument values. The first argument 4 is a number so its value is 4. The second argument (MIN 9 8) is itself a function call. Thus, before the third argument can be called, (MIN 9 8) has to be evaluated by the Lisp interpreter. Note that because we have to apply the Lisp interpreter for some argument during the evaluation of the whole function call, it is also said that the Lisp interpreter is called recursively. The Lisp interpreter applies the same steps, so the first argument 9 is evaluated before the second argument 8. Application of the function MIN then yields 8, assuming that the function is meant to compute the minimum of a set of integers. For the outermost function MAX, this means that its second argument evaluates to 8. Next the arguments 7 and 5 are evaluated, which yields the values 7 and 5. Now, the maximum function named MAX can be evaluated, which returns 8. This final value is then the value of the whole function call. a. QUOTING Since the Lisp interpreter always tries to identify a symbol’s value or interprets a list as a function call, how can we actually treat symbols and lists as data? For example, if we enter the list (PETER WALKS HOME), then the Lisp interpreter will immediately return an error saying something like ERROR: UNKNOWN FUNCTION PETER (the Lisp interpreter should be clever enough to first check whether a function definition exists for the specified function name before it tries to evaluate each argument). Or, if we simply enter HOUSE, then the Lisp interpreter will terminate with an error such as ERROR: NO VALUE BOUND TO HOUSE. The solution to this problem is quite easy: since every first element of a list is interpreted as a function name, each Lisp system comes with a built-in function QUOTE which expects one s-expression as argument and returns this expression without evaluating it. For exam-
ple, for the list (QUOTE (PETER WALKS HOME)) QUOTE simply returns the value (PETER WALKS HOME), and for (QUOTE HOUSE) it returns HOUSE. Since the function QUOTE is used very often, it can also be expressed by the special character ’. Therefore, for the examples above we can equivalently specify ’(PETER WALKS HOME) and ’HOUSE.
b. PROGRAMS AS DATA
Note that QUOTE also enables us to treat function calls as data by specifying, for example, (QUOTE (MAX 4 (MIN 9 8) 7 5)) or ’(MAX 4 (MIN 9 8) 7 5). We already said that the Lisp interpreter is also a built-in unary function named EVAL. It explicitly forces its argument to be evaluated according to the rules mentioned above. In some sense, it can be seen as the opposite function to QUOTE. Thus, to explicitly require that a list specified as data to the Lisp system be interpreted as a function call, we can specify (EVAL ’(MAX 4 (MIN 9 8) 7 5)), which returns the value 8 as described above. In the same way, specifying (EVAL ’(PETER WALKS HOME)) will cause a Lisp error because Lisp tries to call a function PETER. The main advantage of being able to treat programs as data is that we can define Lisp programs (functions) which are able to construct or generate programs such that they first build the corresponding list representation and then explicitly call the Lisp interpreter using EVAL in order to evaluate the just created list as a function. It is not surprising that due to this characteristic Lisp is still the dominant programming language in the AI area of genetic programming.
c. ASSIGNING VALUES TO SYMBOLS
When programming real-life practical programs, one often needs to store a value computed by some program in a variable to avoid costly recomputation of that value if it is needed in another program at some later time. In a purely functional version of Lisp, the value of a function only depends on the function definition and on the value of the arguments in the call. In order to make Lisp a practical language (practical at least in the sense that it can run efficiently on von Neumann computers), we need a way to assign values to symbols. Common Lisp comes with a built-in function called SETQ. SETQ expects two arguments: the symbol (called the variable) to which a value is bound and an s-expression which has to provide the value. The Lisp interpreter treats the evaluation of SETQ in a special way, such that it explicitly suppresses evaluation of SETQ’s first argument (the variable), but rather binds the value of SETQ’s second argument to the variable
(to understand how Lisp internally binds a value to a symbol would require too many technical details which we cannot go into in this short article). The value of the second argument of SETQ is returned as the value of SETQ. Here are some examples:
? COLOR
ERROR: UNBOUND SYMBOL COLOR
? (SETQ COLOR ’GREEN)
GREEN
? (SETQ MAX (MAX 3 2.5 1))
3
Note that SETQ actually changes the status of the Lisp interpreter because the next time the same variable is used, it has a value and therefore the Lisp interpreter will be able to return it. If this effect did not occur, then the Lisp interpreter would signal an error because that symbol would not be bound (cf. step 2 of the Lisp interpreter). Thus, it is also said that SETQ produces a side effect because it dynamically changes the status of the Lisp interpreter. When making use of SETQ one should, however, be aware of the fact that one is leaving the proper path of semantics of pure Lisp. SETQ should therefore be used with great care.
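As a small combined illustration of QUOTE, SETQ, and EVAL (a sketch in the interactive style used above; the variable name FORM is chosen arbitrarily), a quoted function call can first be stored as data and then explicitly handed to the interpreter:
? (SETQ FORM ’(MAX 4 (MIN 9 8) 7 5))
(MAX 4 (MIN 9 8) 7 5)
? (EVAL FORM)
8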
B. The List Data Type Programming in Lisp actually means defining functions that operate on lists, e.g., create, traverse, copy, modify, and delete lists. Since this is central to Lisp, every Lisp system comes with a basic set of primitive built-in functions that efficiently support the main list operations. We will briefly introduce the most important ones now.
1. Type Predicate First, we have to know whether a current s-expression is a list or not (i.e., an atom). This job is accomplished by the function LISTP, which expects any s-expression EXPR as an argument and returns the symbol T if EXPR is a list and NIL if it is otherwise. Examples are [we will use the right arrow (⇒) for pointing to the result of a function call] the following: (LISTP ’(1 2 3)) ⇒ T (LISTP ’()) ⇒ T (LISTP ’3) ⇒ NIL
2. Selection of List Elements Two basic functions exist for accessing the elements of a list: CAR and CDR. Both expect a list as their
argument. The function CAR returns the first element in the list or NIL if the empty list is the argument, and the function CDR returns the same list from which the first element has been removed or NIL if the empty list was the argument. For example,
(CAR ’(A B C)) ⇒ A
(CDR ’(A B C)) ⇒ (B C)
(CAR ’()) ⇒ NIL
(CDR ’(A)) ⇒ NIL
(CAR ’((A B) C)) ⇒ (A B)
(CDR ’((A B) C)) ⇒ (C)
By means of a sequence of CAR and CDR function calls, it is possible to traverse a list from left to right and from outer to inner list elements. For example, during evaluation of (CAR (CDR ’(SEE THE QUOTE))) the Lisp interpreter will first evaluate the expression (CDR ’(SEE THE QUOTE)), which returns the list (THE QUOTE), which is then passed to the function CAR, which returns the symbol THE. Here are some further examples:
(CAR (CDR (CDR ’(SEE THE QUOTE)))) ⇒ QUOTE
(CAR (CDR (CDR (CDR ’(SEE THE QUOTE))))) ⇒ NIL
(CAR (CAR ’(SEE THE QUOTE))) ⇒ ???
What will happen during evaluation of the last example? Evaluation of (CAR ’(SEE THE QUOTE)) returns the symbol SEE. This is then passed as argument to the outer call of CAR. However, CAR expects a list as argument, so the Lisp interpreter will immediately stop further evaluation with an error such as ERROR: ATTEMPT TO TAKE THE CAR OF SEE WHICH IS NOT LISTP. A short historical note: the names CAR and CDR are old-fashioned because they were chosen in the first version of Lisp on the basis of the machine code operation set of the computer on which it was implemented (CAR stands for “contents of address register” and CDR stands for “contents of decrement register”). In order to write more readable Lisp code, Common Lisp comes with two equivalent functions, FIRST and REST. We have used the older names here as it enables reading and understanding of older AI Lisp code.
3. Construction of Lists
Analogously to CAR and CDR, a primitive function CONS exists which is used to construct a list. CONS expects two s-expressions and inserts the first one as a new element in front of the second one. Consider the following examples:
(CONS ’A ’(B C)) ⇒ (A B C)
(CONS ’(A D) ’(B C)) ⇒ ((A D) B C)
(CONS (FIRST ’(1 2 3)) (REST ’(1 2 3))) ⇒ (1 2 3)
In principle, CONS together with the empty list suffice to build very complex lists, for example, (CONS ’A (CONS ’B (CONS ’C ’()))) ⇒ (A B C) (CONS ’A (CONS (CONS ’B (CONS ’C ’())) (CONS ’D ’()))) ⇒ (A (B C) D) However, since this is quite cumbersome work, most Lisp systems come with a number of more advanced built-in list functions. For example, the function LIST constructs a list from an arbitrary number of s-expressions, and the function APPEND constructs a new list through concatenation of its arguments which must be lists. EQUAL is a function which returns T if two lists have the same elements in the same order, otherwise it returns NIL. For example, (LIST ’A ’B ’C) ⇒ (A B C) (LIST (LIST 1) 2 (LIST 1 2 3)) ⇒ ((1) 2 (1 2 3)) (APPEND ’(1) (LIST 2)) ⇒ (1 2) (APPEND ’(1 2) NIL ’(3 4)) ⇒ (1 2 3 4) (EQUAL ’(A B C) ’(A B C)) ⇒ T (EQUAL ’(A B C) ’(A C B)) ⇒ NIL
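Because CONS, LIST, and APPEND are easily confused, the following illustrative sketch shows how the three constructors combine the same two arguments:
(CONS ’(1 2) ’(3 4)) ⇒ ((1 2) 3 4)
(LIST ’(1 2) ’(3 4)) ⇒ ((1 2) (3 4))
(APPEND ’(1 2) ’(3 4)) ⇒ (1 2 3 4)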
C. Defining New Functions
Programming in Lisp is done by defining new functions. In principle this means specifying lists in a certain syntactic way. Analogously to the function SETQ, which is treated in a special way by the Lisp interpreter, there is a special function DEFUN which is used by the Lisp interpreter to create new function objects. DEFUN expects as its arguments a symbol denoting the function name, a (possibly empty) list of parameters for the new function, and an arbitrary number of s-expressions defining the body of the new function. Here is the definition of a simple function named MY-SUM which expects two arguments from which it will construct the sum using the built-in function +:
(DEFUN MY-SUM (X Y) (+ X Y))
This expression can be entered into the Lisp system in the same way as a function call. Evaluation of a function definition returns the function name as value, but will create a function object as a side effect and add it to the set of function definitions known by the Lisp system when it is started (which is at least the set of built-in functions). Note that in this example the body consists only of one s-expression. However, the body might consist of an arbitrary sequence of s-expressions. The value of the last s-expression of the body determines the value of the function. This means that all other elements of the body are actually irrelevant, unless they produce intended side effects. The parameter list of the new function MY-SUM tells us that MY-SUM expects exactly two s-expressions as arguments when it is called. Therefore, if you enter (MY-SUM 3 5) into the Lisp system, the Lisp interpreter will be able to find a definition for the specified function name and then process the given arguments from left to right. When doing so, it binds the value of each argument to the corresponding parameter specified in the parameter list of the function definition. In our example, this means that the value of the first argument 3 (which is also 3 since 3 is a number which evaluates to itself) is bound to the parameter X. Next, the value of the second argument 5 is bound to the parameter Y. Because the value of an argument is bound to a parameter, this mechanism is also called CALL BY VALUE. After having found a value for all parameters, the Lisp interpreter is able to evaluate the body of the function. In our example, this means that (+ 3 5) will be called. The result of the call is 8, which is returned as the result of the call (MY-SUM 3 5). After the function call is completed, the temporary bindings of the parameters X and Y are deleted. Once a new function definition has been entered into the Lisp system, it can be used as part of the definition of new functions in the same way as built-in functions are used, as shown in the following example:
(DEFUN DOUBLE-SUM (X Y) (+ (MY-SUM X Y) (MY-SUM X Y)))
which will double the sum of its arguments by calling MY-SUM twice. Here is another example of a function definition, demonstrating the use of multiple s-expressions in the function body:
(DEFUN HELLO-WORLD () (PRINT ”HELLO WORLD!”) ’DONE)
This function definition has no parameters because the parameter list is empty. Thus, when calling
(HELLO-WORLD), the Lisp interpreter will immediately evaluate (PRINT ”HELLO WORLD!”) and will print the string “Hello World!” on your display as a side effect. Next, it will evaluate the symbol ’DONE, which returns DONE as result of the function call.
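Assuming the definitions above have been entered into the Lisp system, a possible interactive session might look as follows (a sketch; the exact printed form of the string may vary slightly between Lisp systems):
? (DOUBLE-SUM 3 5)
16
? (HELLO-WORLD)
”HELLO WORLD!”
DONE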
D. Defining Control Structures
Although it is now possible to define new functions by combining built-in and user-defined functions, programming in Lisp would be very tedious if it were not possible to control the flow of information by means of conditional branches, perhaps iterated many times until a stop criterion is fulfilled. Lisp branching is based on function evaluation: control functions perform tests on actual s-expressions and, depending on the results, selectively evaluate alternative s-expressions. The fundamental function for the specification of conditional assertions in Lisp is COND. COND accepts an arbitrary number of arguments. Each argument represents one possible branch and is represented as a list where the first element is a test and the remaining elements are actions (s-expressions) which are evaluated if the test is fulfilled. The value of the last action is returned as the value of that alternative. All possible arguments of COND (i.e., branches) are evaluated from left to right until the first branch is positively tested. In that case the value of that branch is the value of the whole COND function. This sounds more complicated than it actually is. Let us consider the following function VERBALIZE-PROP, which verbalizes a probability value expressed as a real number:
(DEFUN VERBALIZE-PROP (PROB-VALUE)
(COND ((> PROB-VALUE 0.75) ’VERY-PROBABLE)
((> PROB-VALUE 0.5) ’PROBABLE)
((> PROB-VALUE 0.25) ’IMPROBABLE)
(T ’VERY-IMPROBABLE)))
When calling (VERBALIZE-PROP 0.33), the actual value of the argument is bound to the parameter PROB-VALUE. Then COND is evaluated with that binding. The first expression to be evaluated is ((> PROB-VALUE 0.75) ’VERY-PROBABLE). > is a built-in predicate which tests whether the first argument is greater than the second argument. Since PROB-VALUE is 0.33, the test (> PROB-VALUE 0.75) evaluates to NIL, which means that the test is not fulfilled. Therefore, evaluation of this alternative branch is terminated immediately, and the next alternative ((> PROB-VALUE 0.5) ’PROBABLE) is evaluated. Here
the test function also returns NIL, so evaluation of this branch is also terminated. Next, ((> PROB-VALUE 0.25) ’IMPROBABLE) is evaluated. Applying the test function now returns T, which means that the test is fulfilled. Then all actions of this positively tested branch are evaluated and the value of the last action is returned as the value of COND. In our example, only the action ’IMPROBABLE has been specified, which returns the value IMPROBABLE. Since this defines the value of COND, and because the COND expression is the only expression of the body of the function VERBALIZE-PROP, the result of the function call (VERBALIZE-PROP 0.33) is IMPROBABLE. Note that if we enter (VERBALIZE-PROP 0.1), the returned value is VERY-IMPROBABLE because the test of the third alternative will also fail and the branch (T ’VERY-IMPROBABLE) has to be evaluated. In this case, the symbol T is used as the test which always returns T, so the value of this alternative is VERY-IMPROBABLE.
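Summarizing the behavior of VERBALIZE-PROP over its four branches, representative calls would return the following values:
(VERBALIZE-PROP 0.9) ⇒ VERY-PROBABLE
(VERBALIZE-PROP 0.6) ⇒ PROBABLE
(VERBALIZE-PROP 0.33) ⇒ IMPROBABLE
(VERBALIZE-PROP 0.1) ⇒ VERY-IMPROBABLE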
E. Recursive Function Definitions The second central device for defining control flow in Lisp is recursive function definitions. A function which partially uses its definition as part of its own definition is called recursive. Thus seen, a recursive definition is one in which a problem is decomposed into smaller units until no further decomposition is possible. Then these smaller units are solved using known function definitions, and the sum of the corresponding solutions form the solution of the complete program. Recursion is a natural control regime for data structures which have no definite size, such as lists, trees, and graphs. Therefore, it is particularly appropriate for problems in which a space of states has to be searched for candidate solutions. Lisp was the first practical programming language that systematically supported the definition of recursive definitions. We will use two small examples to demonstrate recursion in Lisp. The first example is used to determine the length of an arbitrarily long list. The length of a list corresponds to the number of its elements. Its recursive function is as follows: (DEFUN LENGTH (LIST) (COND ((NULL LIST) 0) (T (+ 1 (LENGTH (CDR LIST)))))) When defining a recursive definition, we have to identify the base cases, i.e., those units which cannot be decomposed any further. Our problem size is the list. The smallest problem size of a list is the empty
list. Thus, the first thing we have to do is to specify a test for identifying the empty list and to define what the length of the empty list should be. The built-in function NULL tests whether a list is empty, in which case it returns T. Since the empty list is a list with no elements, we define the length of the empty list as 0. The next thing to be done is to decompose the problem size into smaller units so that the same problem can be applied to smaller units. Decomposition of a list can be done by using the functions CAR and CDR, which means that we have to specify what is to be done with the first element of a list and the rest until the empty list is found. Since we already have identified the empty list as the base case, we can assume that decomposition will be performed on a list containing at least one element. Thus, every time we are able to apply CDR to get the rest of a list, we have found one additional element which should be used to increase the number of the already identified list elements by 1. Making use of this function definition, (LENGTH ’()) will immediately return 0, and if we call (LENGTH ’(A B C)), the result will be 3, because three recursive calls have to be performed until the empty list can be determined. As a second example, we consider the recursive definition of MEMBER, a function which tests whether a given element occurs in a given list. If the element is indeed found in the list, it returns the sublist which starts with the first occurrence of the found element. If the element cannot be found, NIL is returned. The following are example calls:
(MEMBER ’B ’(A F B D E B C)) ⇒ (B D E B C)
(MEMBER ’K ’(A F B D E B C)) ⇒ NIL
Similarly to the recursive definition of LENGTH, we use the empty list as the base case. For MEMBER, the empty list means that the element in question is not found in the list. Thus, we have to decompose a list until the element in question is found or the empty list is determined. Decomposition is done using CAR and CDR. CAR is used to extract the first element of a list, which can be used to check whether it is equal to the element in question, in which case we can directly stop further processing. If it is not equal, then we should apply the MEMBER function on the remaining elements until the empty list is determined. Thus, MEMBER can be defined as follows:
(DEFUN MEMBER (ELEM LIST)
(COND ((NULL LIST) NIL)
((EQUAL ELEM (CAR LIST)) LIST)
(T (MEMBER ELEM (CDR LIST)))))
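To see the recursion at work, the call (LENGTH ’(A B C)) discussed above can be unfolded by hand as follows (a manual trace, not actual Lisp output):
(LENGTH ’(A B C))
= (+ 1 (LENGTH ’(B C)))
= (+ 1 (+ 1 (LENGTH ’(C))))
= (+ 1 (+ 1 (+ 1 (LENGTH ’()))))
= (+ 1 (+ 1 (+ 1 0)))
= 3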
F. Higher Order Functions
In Lisp, functions can be used as arguments. A function that can take functions as its arguments is called a higher order function. There are a lot of problems where one has to traverse a list (or a tree or a graph) such that a certain function has to be applied to each list element. For example, a filter is a function that applies a test to the list elements, removing those that fail the test. Maps are functions which apply the same function on each element of a list, returning a list of the results. Higher order function definitions can be used for defining generic list traversal functions such that they abstract away from the specific function used to process the list elements. In order to support higher order definitions, there is a special function, FUNCALL, which takes as its arguments a function and a series of arguments and applies that function to those arguments. As an example of the use of FUNCALL, we will define a generic function FILTER which may be called in the following way:
(FILTER ’(1 3 -9 -5 6 -3) #’PLUSP) ⇒ (1 3 6)
PLUSP is a built-in function which checks whether a given number is positive or not. If so, it returns a true value (T), otherwise NIL is returned. The special symbol # is used to tell the Lisp interpreter that the argument value denotes a function object. The definition of FILTER is as follows:
(DEFUN FILTER (LIST TEST)
(COND ((NULL LIST) LIST)
((FUNCALL TEST (CAR LIST))
(CONS (CAR LIST) (FILTER (CDR LIST) TEST)))
(T (FILTER (CDR LIST) TEST))))
If the list is empty, then it is simply returned. Otherwise, the test function is applied to the first element of the list. If the test function succeeds, CONS is used to construct a result list using this element and all elements that are determined during the recursive call of FILTER using the CDR of the list and the test function. If the test fails for the first element, this element is simply skipped by recursively applying FILTER on the remaining elements, i.e., this element will not be part of the result list. The filter function can be used for many different test functions, e.g.,
(FILTER ’(1 3 A B 6 C 4) #’NUMBERP) ⇒ (1 3 6 4)
(FILTER ’(1 2 3 4 5 6) #’EVENP) ⇒ (2 4 6)
As another example of a higher order function definition, we will define a simple mapping function, which applies a function to all elements of a list and returns a list of all values. If we call the function MY-MAP, then the definition looks like the following:
(DEFUN MY-MAP (FN LIST)
(COND ((NULL LIST) LIST)
(T (CONS (FUNCALL FN (CAR LIST)) (MY-MAP FN (CDR LIST))))))
If a function DOUBLE exists which just doubles a number, then a possible call of MY-MAP could be
(MY-MAP #’DOUBLE ’(1 2 3 4)) ⇒ (2 4 6 8)
Often it is the case that a function should only be used once. Thus, it would be quite convenient if we could provide the definition of a function directly as an argument of a mapping function. To do this, Lisp supports the definition of LAMBDA expressions. We have already informally introduced the notation of LAMBDA expressions in Section II as a means of defining nameless or anonymous functions. In Lisp, LAMBDA expressions are defined using the special form LAMBDA. The general form of a LAMBDA expression is
(LAMBDA (parameter . . .) body . . .)
A LAMBDA expression allows us to separate a function definition from a function name. LAMBDA expressions can be used in place of a function name in a FUNCALL, e.g., the LAMBDA expression for our function DOUBLE may be
(LAMBDA (X) (+ X X))
For example, the above function call of MY-MAP can be restated using the LAMBDA expression as follows:
(MY-MAP #’(LAMBDA (X) (+ X X)) ’(1 2 3 4)) ⇒ (2 4 6 8)
A LAMBDA expression returns a function object which is not bound to a function name. In the definition of MY-MAP we used the parameter FN as a function name variable. When evaluating the lambda form, the Lisp interpreter will bind the function object to that function name variable. In this way, a function parameter is used as a dynamic function name. The # symbol is necessary to tell Lisp that it should not only bind a
function object, but should also maintain the bindings of the local and global values associated to the function object. This would not be possible by simply using the QUOTE operator alone (unfortunately, further details cannot be given here due to the space constraints).
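Combining the two ideas, a nameless test function can also be handed directly to the FILTER function defined above (an illustrative sketch):
(FILTER ’(1 2 3 4 5 6) #’(LAMBDA (X) (> X 3))) ⇒ (4 5 6)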
G. Other Functional Programming Languages Than Lisp
We have introduced Lisp as the main representative functional programming language (especially the widely used dialect Common Lisp) because it is still a widely used programming language for a number of AI problems such as Natural Language Understanding, Information Extraction, Machine Learning, AI planning, or Genetic Programming. Besides Lisp, a number of alternative functional programming languages have been developed. We will briefly mention two well-known members, viz. ML and Haskell. Meta-Language (ML) is a static-scoped functional programming language. The main differences from Lisp are its syntax (which is more similar to that of Pascal) and a strict polymorphic type system (i.e., using strong types and type inference, which means that variables need not be declared). The type of each declared variable and expression can be determined at compile time. ML supports the definition of abstract data types, as demonstrated by the following example:
DATATYPE TREE = L OF INT | N OF INT * TREE * TREE;
which can be read as “every binary tree is either a leaf containing an integer or it is a node containing an integer and two trees (the subtrees).” An example of a recursive function definition applied on a tree data structure is shown in the following example:
FUN DEPTH(L _) = 1
| DEPTH(N(I,L,R)) = 1 + MAX(DEPTH L, DEPTH R);
The function DEPTH maps trees to integers. The depth of a leaf is 1 and the depth of any other tree is 1 plus the maximum of the depths of the left and right subtrees. Haskell is similar to ML: it uses a similar syntax, it is also static scoped, and it makes use of the same type inferencing method. It differs from ML in that it is purely functional. This means that it allows no side effects and includes no imperative features of any kind, basically because it has no variables and no assignment statements. Furthermore, it uses a lazy evalua-
tion technique, in which no subexpression is evaluated until its value is known to be required. Lists are a commonly used data structure in Haskell. For example, [1,2,3] is the list of three integers 1, 2, and 3. The list [1,2,3] in Haskell is actually shorthand for the list 1:(2:(3:[])), where [] is the empty list and : is the infix operator that adds its first argument to the front of its second argument (a list). As an example of a user-defined function that operates on lists, consider the problem of counting the number of elements in a list by defining the function LENGTH: LENGTH :: [A] -> INTEGER LENGTH [] = 0 LENGTH (X:XS) = 1 + LENGTH XS which can be read as “The length of the empty list is 0, and the length of a list whose first element is X and remainder is XS is 1 plus the length of XS.” In Haskell, function invocation is guided by pattern matching. For example, the left-hand sides of the equations contain patterns such as [] and X:XS. In a function application these patterns are matched against actual parameters ([] only matches the empty list, and X:XS will successfully match any list with at least one element, binding X to the first element and XS to the rest of the list). If the match succeeds, the right-hand side is evaluated and returned as the result of the application. If it fails, the next equation is tried. If all equations fail, an error results. This ends our short “tour de Lisp.” We were only able to discuss the most important aspects of Lisp. Readers interested in more specific details should consult at least one of the books mentioned in the Bibliography. The rest of this article will now be used to introduce another programming paradigm widely used in AI programming, namely, Prolog.
IV. LOGICAL PROGRAMMING IN PROLOG
In the 1970s an alternative paradigm for symbolic computation and AI programming arose from the success in the area of automatic theorem proving. Notably, the resolution proof procedure developed by Robinson (1965) showed that formal logic, particularly predicate calculus, could be used as a notation for defining algorithms and, therefore, for performing symbolic computations. In the early 1970s, Prolog (an acronym for Programming in Logic), the first logic-based programming language, appeared. It was developed by Alain Colmerauer, Robert Kowalski, and Philippe Roussel. Basically, Prolog consists of a method for specifying predicate calculus propositions
and a restricted form of resolution. Programming in Prolog consists of the specification of facts about objects and their relationships and rules specifying their logical relationships. Prolog programs are declarative collections of statements about a problem because they do not specify how a result is to be computed, but rather define what the logical structure of a result should be. This is quite different from imperative and even functional programming, where the focus is on defining how a result is to be computed. Using Prolog, programming can be done at a very abstract level quite close to the formal specification of a problem. Prolog is still the most important logical programming language. There are a number of commercial programming systems on the market which include modern programming modules, i.e., compiler, debugger, and visualization tools. Prolog has been used successfully in a number of AI areas such as expert systems and natural language processing, but also in such areas as relational database management systems or education.
A. A Simple Prolog Program
Here is a very simple Prolog program consisting of two facts and one rule:
scientist(gödel).
scientist(einstein).
logician(X) :- scientist(X).
The first two statements can be paraphrased as “Gödel is a scientist” and “Einstein is a scientist.” The rule statement says “X is a logician if X is a scientist.” In order to test this program, we have to specify query expressions (or theorems) which Prolog tries to answer (or to prove) using the specified program. One possible query is
?- scientist(gödel).
which can be verbalized as “Is Gödel a scientist?” Prolog, by applying its built-in proof procedure, will respond with “yes” because a fact may be found which exactly matches the query. Another possible query, verbalizing the question “Who is a scientist?”, is expressed in Prolog as
?- scientist(X).
and will yield the Prolog answer “X = gödel, X = einstein.” In this case Prolog not only answers yes, but returns all bindings of the variable X which it finds during the successful proof of the query. As a further example, we might also query “Who is a logician?” us-
ing the following Prolog query:
?- logician(X).
Proving this query will yield the same set of facts because of the specified rule. Finally, we might also specify the following query:
?- logician(mickey-mouse).
In this case Prolog will respond with “no.” Although the rule says that someone is a logician if he or she is also a scientist, Prolog does not find a fact saying that Mickey Mouse is a scientist. Note, however, that Prolog can only answer relative to the given program, which actually means “no, I couldn’t deduce the fact.” This property is also known as the closed world assumption or negation as failure. It means that Prolog assumes that all knowledge that is necessary to solve a problem is present in its database.
B. Prolog Statements Prolog programs consist of a collection of statements, also called clauses, which are used to represent both data and programs. The dot symbol is used to terminate a clause. Clauses are constructed from terms. A term can be a constant (symbolic names that have to begin with a lowercase letter, such as gödel or eInStein), a variable (symbols that begin with an uppercase letter, such as X or Scientist), or a structure. Structures represent atomic propositions of predicate calculus and consist of a functor name and a parameter list. Each parameter can be a term, which means that terms are recursive objects. Prolog distinguishes three types of clauses: facts, rules, and queries. A fact is represented by a single structure, which is logically interpreted as a simple true proposition. In the simple example program above we already introduced two simple facts. Here are some more examples: male(john). male(bill). female(mary). female(sue). father(john, mary). father(bill,john). mother(sue,mary). Note that these facts have no intrinsic semantics, i.e., the meaning of the functor name father is not defined. For example, applying common sense, we may interpret it as “John is the father of Mary.” However, for Prolog, this meaning does not exist, it is just a symbol.
Rules belong to the next type of clauses. A rule clause consists of two parts: the head, which is a single term, and the body, which is either a single term or a conjunction. A conjunction is a set of terms separated by the comma symbol. Logically, a rule clause is interpreted as an implication such that if the elements of the body are all true, then the head element is also true. Therefore, the body of a clause is also denoted as the if part and the head as the then part of a rule. Here is an example of a set of rule clauses:
parent(X,Y) :- mother(X, Y).
parent(X,Y) :- father(X, Y).
grandparent(X,Z) :- parent(X,Y), parent(Y,Z).
where the last rule can be read as “X is a grandparent of Z, if X is a parent of Y and Y is a parent of Z.” The first two rules say “someone is a parent if he or she is the father or mother of someone else.” The reason we treat the first two rules as a disjunction will become clear when we introduce Prolog’s proof procedure. Before doing this, we shall introduce the last type of clause, the query clause (also called the goal clause). A query is used to activate Prolog’s proof procedure. Logically, a query corresponds to an unknown theorem. It has the same form as a fact. In order to tell Prolog that a query has to be proven, the special query operator ?- is usually written in front of the query. In the simple Prolog program introduced above, we have already seen an informal description of how a query is used by Prolog.
C. Prolog's Inference Process
Prolog's inference process consists of two basic components: a search strategy and a unifier. The search strategy is used to search through the fact and rule database, while unification is used for pattern matching and returns the bindings that make an expression true. The unifier is applied to two terms and tries to combine them both to form a new term. If this is not possible, then unification is said to have failed. If the two terms contain no variables, then unification actually reduces to checking whether the terms are equal. For example, unification of the two terms father(john,mary) and father(john,mary) succeeds, whereas unification of the following term pairs will fail:
father(X,mary) and father(john,sue)
sequence(a,b,c) and sequence(a,b)
If a term contains a variable (or more), then the unifier checks whether the variable can be bound with some information from the second term, however, only if the remaining parts of the terms unify. For example, for the following two terms: father(X,mary) and father(john,mary) the unifier will bind X to john because the remaining terms are equal. However, for the following pair: father(X,mary) and father(john,sue) the binding would not make sense, since mary and sue do not match.
The search strategy is used to traverse the search space spanned by the facts and rules of a Prolog program. Prolog uses a top-down, depth-first search strategy. What does this mean? The whole process is quite similar to the function evaluation strategy used in Lisp. If a query Q is specified, then it may either match a fact or a rule. In the case of a rule R, Prolog first tries to match the head of R, and if it succeeds, it then tries to match all elements from the body of R, which are also called subqueries. If the head of R contains variables, then the bindings will be used during the proof of the subqueries. Since the bindings are only valid for the subqueries, it is also said that they are local to a rule. A subquery can either be a fact or a rule. If it is a rule, then Prolog's inference process is applied recursively to the body of such a subquery. This makes up the top-down part of the search strategy. The elements of a rule body are applied from left to right, and only if the current element can be proven successfully is the next element tried. This makes up the depth-first strategy. It is possible that for the proof of a subquery two or more alternative facts or rules are defined. In that case Prolog selects one alternative A and tries to prove it, if necessary by processing subqueries of A. If A fails, Prolog goes back to the point where it started the proof of A (by removing all bindings that have been assigned during A's test) and tries to prove the next alternative. This process is also called backtracking.
In order to clarify the whole strategy, we can consider the following example query (using the example clauses introduced in the previous paragraph as Prolog's database): ?- grandparent(bill,mary). The only clause that can match this query is the following rule: grandparent(X,Z) :- parent(X,Y), parent(Y,Z). and unification of the query with the rule's head will return the following bindings: X = bill, Z =
mary. In order to prove the rule, the two elements of the rule body have to be proven from left to right. Note that both body elements share variables with the rule head, and, therefore, the bindings computed during the match of the head with the query are also available for the respective subqueries. Thus, the first subquery is actually instantiated as parent(bill,Y) and the second subquery is instantiated as parent(Y,mary). Now, to prove the first clause, Prolog finds two alternative parent rules. Let us assume that Prolog chooses the first alternative (in order to remember that more than one alternative is possible, Prolog sets a choice point), parent(X,Y) :- mother(X,Y). Unification of the subquery with the rule head is easily possible and will bind the X variable to the term bill. This partially instantiates the single body element as mother(bill,Y). Unfortunately, there are no facts in the database which validate this subquery. Because the unification of mother(bill,Y) fails, so does the whole rule. Then, Prolog backtracks to the choice point where it selected the first possible parent rule and chooses the second alternative, parent(X,Y) :- father(X,Y). Unification of the (still active) subquery parent(bill,Y) will instantiate father(bill,Y). This time unification is possible, returning the binding Y = john. Now the first parent subquery of the grandparent rule has been proven and the actual variables are X = bill, Y = john, Z = mary. This instantiates the second element of the grandparent rule body to parent(john,mary) (note that the Z value had already been bound after the grandparent rule was selected). The same strategy is then applied for this subquery, and Prolog will find enough facts to prove it successfully. Since both body elements of the grandparent rule have been proven to be valid, Prolog concludes that the initial query is also true.
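To make the unification step concrete, the following short Python sketch (our illustration, not part of the original article) mimics the unifier on simple terms. Structures are represented as tuples whose first element is the functor name and, following Prolog's convention, strings beginning with an uppercase letter are treated as variables.

def is_var(t):
    # Prolog convention: variable names begin with an uppercase letter.
    return isinstance(t, str) and t[:1].isupper()

def unify(t1, t2, bindings=None):
    """Return a dict of variable bindings making t1 and t2 equal, or None on failure."""
    if bindings is None:
        bindings = {}
    # Dereference variables that are already bound.
    while is_var(t1) and t1 in bindings:
        t1 = bindings[t1]
    while is_var(t2) and t2 in bindings:
        t2 = bindings[t2]
    if t1 == t2:
        return bindings
    if is_var(t1):
        return {**bindings, t1: t2}
    if is_var(t2):
        return {**bindings, t2: t1}
    # Structures unify if they have the same functor and arity and their arguments unify.
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        for a, b in zip(t1, t2):
            bindings = unify(a, b, bindings)
            if bindings is None:
                return None
        return bindings
    return None  # different constants or differently shaped structures

# father(X,mary) and father(john,mary) unify, binding X to john ...
print(unify(('father', 'X', 'mary'), ('father', 'john', 'mary')))   # {'X': 'john'}
# ... but father(X,mary) and father(john,sue) do not, since mary and sue differ.
print(unify(('father', 'X', 'mary'), ('father', 'john', 'sue')))    # None

A full Prolog engine combines such a unifier with the depth-first, backtracking search over clauses described above.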
D. Prolog Extensions
To make Prolog usable for practical programming, it comes with a number of extensions, e.g., list data structures; operators for explicitly controlling the traversal of the search space by a Prolog program (namely, the cut operator); and routines for IO interfaces, tracing, and debugging. We cannot describe all these extensions in the context of this short article. We will only briefly show how lists can be used in Prolog.
Prolog supports lists as a basic data structure using conventional syntax. The list elements are separated by commas. The whole list is delimited by square brackets. A list element can be an arbitrary term or a list itself. Thus, it is quite similar to the list structures in Lisp. Here is an example of a Prolog list: [john, mary, bill] The empty list is represented as []. In order to be able to create or traverse lists, Prolog provides a special construction for explicitly denoting the head and tail of a list. [X | Y] is a list consisting of a head X and a tail Y. For example, the above list could also be specified as [john | [mary, bill]]. We will use the member predicate as an example of how lists are treated in Prolog. This predicate will determine whether a given element occurs in a given list. Using the above notation, an element is in a list if it is the head of that list or if it occurs somewhere in the tail of the list. Using this informal definition of the member predicate, we can formulate the following Prolog program (the symbol _ denotes an anonymous variable, used to tell Prolog that it does not matter which value the unifier binds to it):
member(Element, [Element | _]).
member(Element, [_ | List]) :- member(Element, List).
Assuming the following query: ?- member(a, [b,c,a,d]). Prolog will first check whether the head of [b | [c,a,d]] is equal to a. This causes the first clause to fail, so the second clause is tried. This will instantiate the subquery member(a, [c,a,d]), which means that the first list element is simply skipped. Recursively applying member, Prolog tries to prove whether the head of [c | [a,d]] is equal to a, which also fails, leading to a new subquery member(a, [a,d]) through instantiation of the second clause. The next recursive step will check the list [a | [d]]. This time, a is indeed equal to the head element of this list, so Prolog will terminate with "yes."
E. Constraint Logic Programming Constraint logic programming (CLP) is a generalization of the (simple) Prolog programming style. In CLP, term unification is generalized to constraint solving. In constraint logic programs, basic components of a problem are stated as constraints (i.e., the structure of the objects in question) and the problem as a
whole is represented by putting the various constraints together by means of rules (basically by means of definite clauses). For example, the following definite clause—representing a tiny fraction of a Natural Language (NL) grammar like English:
sign(X0) ← sign(X1), sign(X2),
    X0 syn cat = s,
    X1 syn cat = np,
    X2 syn cat = vp,
    X1 syn agr = X2 syn agr
expresses that for a linguistic object to be classified as an S(entence) phrase it must be composed of an object classified as an NP (nominal phrase) and by an object classified as a VP (verbal phrase), and the agreement information (e.g., person, case) between NP and VP must be the same. All objects that fulfill at least these constraints are members of the class of S objects. Note that there is no ordering presupposed for NP and VP as is the case for NL grammar-based formalisms that rely on a context-free backbone. If such a restriction is required, additional constraints have to be added to the rule, for instance, that substrings have to be combined by concatenation. Since the constraints in the example above only specify necessary conditions for an object of class S, they express partial information. This is very important for knowledge-based reasoning, because in general we have only partial information about the world we want to reason with. Processing of such specifications is then based upon constraint solving and the logic programming paradigm. Because unification is but a special case of constraint solving, constraint logic programs have superior expressive power. A number of constraint-based logic programming languages (together with high-level user interface and development tools) have been realized, e.g., CHIP or the Oz language, which supports declarative programming, object-oriented programming, constraint programming, and concurrency as part of a coherent whole. Oz is a powerful constraint language with logic variables, finite domains, finite sets, rational trees, and record constraints. It goes beyond Horn clauses to provide a unique and flexible approach to logic programming. Oz distinguishes between directed and undirected styles of declarative logic programming.
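As a rough illustration of how such constraints restrict the objects being combined, the following Python fragment (ours; it is not CLP and performs no real constraint solving, only simple equality checks over hypothetical feature structures) mirrors the category and agreement constraints of the grammar rule above.

# Hypothetical feature structures; each linguistic sign is a dictionary of features.
np = {"cat": "np", "agr": {"person": 3, "number": "sg"}}
vp = {"cat": "vp", "agr": {"person": 3, "number": "sg"}}

def combine_s(x1, x2):
    """Build an S sign from an NP and a VP, enforcing the rule's constraints."""
    if x1["cat"] != "np" or x2["cat"] != "vp":
        return None                          # category constraints violated
    if x1["agr"] != x2["agr"]:
        return None                          # agreement constraint violated
    return {"cat": "s", "agr": x1["agr"]}    # only a partial description of the S object

print(combine_s(np, vp))   # {'cat': 's', 'agr': {'person': 3, 'number': 'sg'}}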
V. OTHER PROGRAMMING APPROACHES In this article, we have compared AI languages with imperative programming approaches. Object-oriented
languages belong to another well-known programming paradigm. In such languages the primary means for specifying problems is to specify abstract data structures, also called objects or classes. A class consists of a data structure together with its main operations, often called methods. An important characteristic is that it is possible to arrange classes in a hierarchy consisting of classes and subclasses. A subclass can inherit properties of its superclasses, which supports modularity. Popular object-oriented languages are Eiffel, C++, and Java. The Common Lisp Object System (CLOS) is an extension of Common Lisp. It supports full integration of functional and object-oriented programming. Recently, Java has become quite popular in some areas of AI, especially for intelligent agent technology, Internet search engines, or data mining. Java is based on C++ and is the main language for the programming of Internet applications. Language features that make Java interesting from an AI perspective are its built-in automatic garbage collection and multithreading mechanisms. With the increase of research in the area of Web intelligence, a new programming paradigm is emerging, viz. agent-oriented programming. Agent-oriented programming (AOP) is a fairly new programming paradigm that supports a societal view of computation. In AOP, objects known as agents interact to achieve individual goals. Agents can exist in a structure as complex as a global Internet or as simple as a module of a common program. Agents can be autonomous entities, deciding their next step without the interference of a user, or they can be controllable, serving as an intermediary between the user and another agent. Since agents are viewed as living, evolving software entities, there also seems to be a shift from a programming-language point of view toward a software-platform development point of view. Here the emphasis is on system design, development platforms, and connectivity. Critical questions are then how the rich set of existing AI resources developed in different languages and platforms can be integrated with other resources making use of modern system development tools such as CORBA (Common Object Request Broker Architecture), generic abstract data type and annotation languages such as XML, and a standardized agent-oriented communication language such as KQML (Knowledge Query and Manipulation Language). So the future of AI programming might be less concerned with questions such as "which programming paradigm is best suited?" and will instead have to find answers for questions such as "how can I integrate different programming paradigms under one umbrella?" and "what are the best communication languages for intelligent autonomous software modules?"
SEE ALSO THE FOLLOWING ARTICLES Engineering, Artificial Intelligence in • Evolutionary Algorithms • Expert Systems Construction • Industry, Artificial Intelligence in • Medicine, Artificial Intelligence in • Object-Oriented Programming • Programming Languages Classification
BIBLIOGRAPHY
Charniak, E., Riesbeck, C. K., McDermott, D. V., and Meehan, J. R. (1980). Artificial Intelligence Programming. Hillsdale, NJ: Lawrence Erlbaum Associates.
Clocksin, W. F., and Mellish, C. S. (1987). Programming in Prolog. Berlin: Springer-Verlag.
Keene, S. E. (1988). Object-Oriented Programming in Common Lisp. Reading, MA: Addison-Wesley.
Luger, G. F., and Stubblefield, W. A. (1993). Artificial Intelligence: Structures and Strategies for Complex Problem Solving, 2nd ed. Redwood City, CA: Benjamin/Cummings.
Norvig, P. (1992). Paradigms of Artificial Intelligence Programming. San Mateo, CA: Morgan Kaufmann.
Pereira, F. C. N., and Shieber, S. M. (1987). Prolog and Natural Language Analysis, CSLI Lecture Notes, Number 10. Stanford, CA: Stanford Univ. Press.
Sebesta, R. W. (1999). Concepts of Programming Languages, 4th ed. Reading, MA: Addison-Wesley.
Ullman, J. D. (1997). Elements of ML Programming, 2nd ed. Englewood Cliffs, NJ: Prentice-Hall.
Watson, M. (1997). Intelligent Java Applications for the Internet and Intranets. San Mateo, CA: Morgan Kaufmann.
Automata Theory Sergio de Agostino and Raymond Greenlaw Armstrong Atlantic State University
I. INTRODUCTION
II. BASIC CONCEPTS
III. DETERMINISTIC FINITE AUTOMATA (DFAs)
IV. NONDETERMINISTIC FINITE AUTOMATA (NFAs)
V. EQUIVALENCE OF DFAs AND NFAs
VI. EQUIVALENCE OF DFAs WITH OTHER MODELS
VII. PUSHDOWN AUTOMATA AND BEYOND
VIII. SUMMARY
GLOSSARY
deterministic Having an unambiguous state. The machine either has a unique next state or none at all.
deterministic finite automaton (DFA) A simple, theoretical machine with one read-only input tape that reads only left-to-right.
deterministic pushdown automaton (DPDA) A DFA that is extended by adding a stack data structure.
deterministic Turing machine (DTM) A DFA that is extended by adding a read/write work tape.
ε-transition The ability of a machine to change state without reading from an input tape or advancing the input head.
language accepted (by a DFA) The set of all strings recognized by a DFA.
nondeterministic The machine may have a nonunique next state.
nondeterministic finite automaton (NFA) A simple, theoretical, nondeterministic machine with one read-only input tape that reads only left-to-right.
nondeterministic Turing machine A DFA that is extended by adding a read/write work tape and nondeterminism.
nondeterministic pushdown automaton (NPDA) A DPDA extended with nondeterminism.
regular expression A formalism that can be used to model languages recognized by DFAs.
tape configuration The contents of a tape and the position of the tape head on the tape.
work tape A tape that can be written-to and read-from, allowing storage of intermediate data.
I. INTRODUCTION
A large part of theoretical computer science is dedicated to studying computational models—also known as idealized computers. Such machines consist of a finite control (processor) and a number of tapes (memory). These machines differ in terms of their number of tapes and the functionality of their tapes. Figure 1 depicts such an idealized model. The purpose of a computational model is to capture those aspects of computation that are relevant to the particular problem you wish to solve, while hiding the other aspects that are unimportant. One can think of a computational model as a custom machine designed for your particular needs. Several of the most important models are the deterministic finite automaton, the nondeterministic finite automaton, the deterministic pushdown automaton, the nondeterministic pushdown automaton, the deterministic Turing machine, and the nondeterministic Turing machine. Each of these models is significant in the theory of computation. Before we can attempt to solve a specific problem by a machine, we must communicate it to the machine. We do this with a language. In very general terms, a language is a system of signs used to communicate information between one or more parties. Thus, the first step of understanding a problem is to design a language for communicating that problem to a machine. Since a problem requires an answer, the language has to handle both input and output communication. What is interesting is that from a theory-of-computation perspective, almost all prob-
lems can be expressed just in terms of language recognition. That is, the language simply describes all the legitimate inputs and there is no output to communicate other than the binary value of YES or NO, depending on whether the input was in the language or not. A second common and powerful mechanism to describe languages is provided by the notion of grammar. We will also discuss how grammars relate to languages and automata. We begin by presenting some background and then by defining the most basic computational models—the deterministic finite automaton and its nondeterministic variant. The other models can easily be described as enhanced versions of either the deterministic or nondeterministic finite automaton.
Figure 1 Schematic of a finite state machine.
II. BASIC CONCEPTS
An alphabet Σ is a nonempty, finite set of symbols. A string is a sequence of symbols over an alphabet. The empty string consists of a string of 0 symbols and is denoted ε. A language over an alphabet Σ is any finite or infinite set of strings over Σ. Since languages are sets, the union, intersection, and difference of two languages are immediately defined. The concatenation of two languages L1 and L2 is the set of all strings obtained by concatenating any element of L1 with any element of L2; specifically,
L1L2 = {xy : x ∈ L1, y ∈ L2}. We define L^n as L concatenated with itself n times, with the special case L^0 = {ε} for every language L. Finally, we define the star closure of L as L* = L^0 ∪ L^1 ∪ L^2 ∪ ···, and the positive closure as L^+ = L^1 ∪ L^2 ∪ ···.
Consider the language L = {ε, 00, 0000, 000000, . . .} over the alphabet {0,1}. What would a program look like that accepts this language? First of all, what exactly do we mean by this question? Suppose a user inputs a given string x over the alphabet {0,1} to a program designed to accept this language. Then the program should respond YES if x ∈ L, and NO otherwise. This is really just answering a membership question. Notice, the language L consists of all strings that contain an even number of 0's and no 1's. You could imagine a C program that keeps track of whether an even number of 0's has been encountered in the input and also whether or not a 1 has been seen. When the end of the input is reached, if an even number of 0's have been read and no 1 has been detected, then the program should respond YES and otherwise it should respond NO.
There is a useful way of diagraming such a program and this method is shown in Fig. 2; the picture is called a (state) transition diagram. The transition diagram is a directed graph enhanced with a start marker and some labeling on the nodes and edges. The start marker indicates the state that the machine starts from, called the initial state. This transition diagram consists of three nodes called states labeled q0, q1, and q2; and six edges called transitions. Notice the edges are labeled by 0 or 1, symbols from the alphabet in consideration. The node with two circles, q0, represents an accepting state, which coincidentally in this case is also the initial state. Consider an "input" to the transition diagram such as 0000. If we trace through the figure on this input starting from state q0, we go to state q1 using up one 0, back to q0 using up another 0, back to q1 using the third 0, and finally back to q0 at which point we have no more input remaining. The state we ended up in, q0, is an accepting state so the input 0000 is accepted. That is, 0000 ∈ L. In tracing the machine on input 10, from state q0 we go to state q2 using up the 1 and then stay in state q2 using up the last symbol 0. Because q2 does not have a double circle, it is not an accepting state; and therefore, the machine does not stop in an accepting state. The input 10 is rejected as it should be because 10 ∉ L. Automata theory formalizes these ideas.
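The bookkeeping such a program performs is minimal. Here is a short Python sketch of the same idea (ours, not part of the original article): it tracks only whether an even number of 0's has been seen and whether a 1 has occurred.

def accepts(s: str) -> str:
    even_zeros = True    # zero 0's read so far, and zero is even
    saw_one = False
    for ch in s:
        if ch == '0':
            even_zeros = not even_zeros
        elif ch == '1':
            saw_one = True
    return "YES" if even_zeros and not saw_one else "NO"

print(accepts("0000"))   # YES: 0000 is in L
print(accepts("10"))     # NO: 10 contains a 1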
Figure 2 A sample state transition diagram useful for representing a machine that accepts the language {ε, 00, 0000, 000000, . . .}.
A. Tapes
A tape is bounded by the two special symbols < and >, the left and right end marker symbols, respectively. A tape begins with the left end marker, and ends with the right end marker. Between these two markers, all symbols present on the tape must come from the data alphabet. The end markers are never part of the data alphabet. The data alphabet plus the end markers constitutes the tape alphabet. If the data alphabet is Σ then we denote the tape alphabet (that is, Σ ∪ {<, >}) by Σ_T. The contents of a tape is the string over the data alphabet that appears between the end markers. Thus a tape can contain the empty string if all it has is the left end marker followed immediately by the right end marker. The input to a machine is a string, usually denoted x, over the data alphabet. The notation x(i) is used to refer to the ith symbol of x, where i ranges from 1 to the length of x. The input is placed on an input tape. The individual characters comprising x appear in adjacent cells of the tape; the entire input is between end markers. The input tape is special in that it is read-only. Since the machine cannot write to the tape, nothing can ever be stored on the tape except the original input. Figure 3 provides an illustration of a sample input tape. In this case the input x is the string 110101. The length of x is six, and, for example, x(1) equals 1 and x(5) equals 0. Machines access a tape via a tape head. Initially when the machine starts up, the tape head is always positioned to the first square immediately to the right of the left end marker. Thus if the tape with contents x is nonempty, the tape head is over x(1) so that the first symbol of x can be read. If x is the empty string, ε, then the tape head will be positioned over the right end marker. Figure 4 illustrates the tape head for the input tape and its initial positioning. Further constraints can apply to the motion of the tape head.
Figure 3 An illustration of an input tape with contents 110101. < and > are left and right end markers, respectively.
Figure 4 An illustration of the initial position of the input tape head. At this point the machine would be reading a 1 off the input tape.
For example, some machines can only read the input once while others can go over the input as many times as desired; we will specify each input access method in turn. When we describe an ongoing computation of a machine, we must be able to describe the contents of a tape and the current position of the tape head. These two pieces of information constitute the tape configuration.
DEFINITION 1
A tape configuration is a pair [p, x]. The head position p, with 0 ≤ p ≤ |x| + 1, is a number indicating the position of the tape head on a tape with contents x.
• When p = 0 the tape head is over the left end marker,
• when p = |x| + 1 the tape head is over the right end marker, and
• otherwise the tape head is over the symbol x(p).
The current tape symbol in a tape configuration [p, x] is the symbol under the tape head. The remaining tape in a tape configuration [p, x] is the contents of the tape from under the tape head up to but not including the right end marker; it is the string
• ε if p = |x| + 1, and
• otherwise it is x(p)x(p + 1) ··· x(|x|).
The initial tape configuration is [1, x], also denoted by I(x). The final tape configuration is [|x| + 1, x], also denoted by F(x).
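Definition 1 translates almost directly into code. The following Python sketch (ours, purely illustrative) models a read-only input tape configuration [p, x], with the end markers < and > handled implicitly.

class TapeConfig:
    """A tape configuration [p, x]: head position p over contents x, with 0 <= p <= |x| + 1."""
    def __init__(self, p: int, x: str):
        assert 0 <= p <= len(x) + 1
        self.p, self.x = p, x

    def current_symbol(self) -> str:
        if self.p == 0:
            return '<'                       # head over the left end marker
        if self.p == len(self.x) + 1:
            return '>'                       # head over the right end marker
        return self.x[self.p - 1]            # the symbol x(p); positions are 1-indexed

    def remaining_tape(self) -> str:
        # Contents from under the head up to, but not including, the right end marker.
        if self.p == len(self.x) + 1:
            return ""                        # the empty string
        return self.x[max(self.p - 1, 0):]   # x(p) x(p+1) ... x(|x|)

def I(x: str) -> TapeConfig:                 # initial tape configuration [1, x]
    return TapeConfig(1, x)

def F(x: str) -> TapeConfig:                 # final tape configuration [|x| + 1, x]
    return TapeConfig(len(x) + 1, x)

print(I("110101").current_symbol())          # '1', that is, x(1)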
In general, machines can also have work tapes. These are tapes that can be written to and read from, and so are used to store the intermediate results of a computation. All work tapes are initially blank. When the machine writes to a work tape, it writes a new symbol onto the tape square under the tape head, replacing the old contents of the square. If the tape square under the tape head is an end marker, the writing of a symbol also extends the tape by one cell, effectively moving the tape marker to the left or right depending on what end of the tape the head is at. If the tape head is somewhere between the end markers when the write of an end marker occurs, the result is that the tape is truncated so that the new contents of the tape are the squares between the new end marker and its match. Overwriting > with < or vice versa is not permitted.
B. Finite Controls
Tapes provide us with a model of how we present a machine with input, store intermediate data, and generate output. Now we have to consider the machine itself. How do we program it? What are the instructions of the machine? How do we know what instruction to execute next? What is the syntax of an instruction? What are the semantics of an instruction, that is, what happens when you execute an instruction? The finite control or state transition function is the key mechanism for defining all the machine models we study. It is the machine's program and defines how the machine computes. One of the most important features of the finite control is the fact that it must be finite. This limits the machine to a finite set of instructions; this fixed number of instructions is independent of the input. Just like the program counter on a real machine, the current state of a finite control helps determine the next instruction to be executed by the machine. When an instruction is executed, it does any one or more of the following, depending on the type of machine: reads a symbol under a tape head, writes a symbol to the tape cell under a tape head, and moves a tape head. The instruction then determines the next state or states of the program and changes the current state. If the next state of a particular machine is always unambiguous (there is either a unique next state, or none at all) then the machine is said to be deterministic. If there are circumstances during the execution of the machine in which the next instruction is ambiguous (there are two or more possible next states) then the machine is said to be nondeterministic. The finite control of a deterministic machine is usually denoted δ. The finite control of a nondeterministic machine is usually denoted by Δ. The precise details of the state transition function for a machine depend on other specifications of the machine such as how many tapes it has, and their input/output capabilities. In general, finite-state control, tape-memory machines can be classified according to the answers to the following questions:
1. Is the machine deterministic or nondeterministic?
2. Are state transitions labeled with a single symbol from Σ_T, or a string from Σ_T*, and is ε allowed as a label? Must there be a transition for every possible symbol for every possible state?
3. How many tapes are there, how do the heads move on the tapes, and which ones are read-only?
III. DETERMINISTIC FINITE AUTOMATA (DFAs)
The deterministic finite automaton or DFA is a very simple machine. It has one read-only input tape, with the restriction that the tape head can only move from left to right and can never change direction. The DFA has no other tapes. The finite control allows a DFA to read one input symbol from the input tape and then based on the machine's current state, it may change state. As part of each computational step of a DFA, the input tape head is automatically repositioned one square further to the right and is ready for reading the next input symbol. For example, in Fig. 2 the states are represented by the circles. One part of the finite control corresponding to Fig. 2 is the transition δ(q0, 0) = q1. That is, when in state q0 and reading a 0 the machine transfers to state q1. The input head is then automatically moved one square to the right (moving to the right of the right end marker causes the machine to fail). The transition δ(q0, 1) = q2 specifies that while in state q0 and reading a 1 transfer to state q2. Again the input head is automatically moved one square to the right. The remainder of the finite control for the transition diagram shown in Fig. 2 is specified similarly. The finite control is shown in its entirety in tabular form in Table I. Each row in such a table represents one possible transition of M. This table is called the transition table of M. Note that there are no entries in the table to indicate what the next state should be when the input head is over an end marker. In such a case where there is no next state the machine simply stops. We are now ready to present the formal definition of a DFA. The description of the DFA is presented as a five-tuple so that the order of the components is fixed.
DEFINITION 2
A deterministic finite automaton (DFA) is a five-tuple M = (Q, Σ, δ, q0, F) with the components specified as follows:
1. Q: A finite, nonempty set of states.
2. Σ: The data alphabet and its induced tape alphabet Σ_T = Σ ∪ {<, >}.
3. δ: The transition function or finite control is a function δ: Q × Σ_T → Q.
4. q0: The initial state or start state, q0 ∈ Q.
5. F: The set of accepting states, F ⊆ Q.

Table I A Convenient Method for Representing the Transition Function of a DFA
Transition number   State   Input symbol   New state
1                   q0      0              q1
2                   q0      1              q2
3                   q1      0              q0
4                   q1      1              q2
5                   q2      0              q2
6                   q2      1              q2
Note: The transition table for the DFA presented in Fig. 2 is shown here. The transitions are numbered for convenience but this numbering is not part of the finite control.

The set of states is denoted Q. Note that Q is finite and nonempty. The data alphabet is denoted Σ. These are the symbols that can occur on the input tape between < and >. End markers are not allowed as data symbols. The tape alphabet Σ_T is the set of all possible symbols that appear on the tape, and so it is Σ union the set {<, >}. We defer the description of δ for the moment. The initial state is denoted q0. This is a special state in Q and is the state from which M begins executing. Note the initial state is not expressed as a set like the other components in the definition. F is the nonempty set of accepting states. These special states are used by a DFA to signal when it accepts its input, if in fact it does. When the machine stops in a nonaccepting state this signifies the input is rejected. The notion of acceptance is described formally in Definition 6. Where does the input tape appear in the definition? The tape is utilized in the transition function δ. The domain of δ is Q × Σ_T so elements in the domain of δ are ordered pairs. That is, δ takes a state and a symbol from the input tape (possibly an end marker). Note, the more complex models we present later make useful transitions on the end markers. The restrictions placed on the DFA do not allow it to take advantage of the end markers. Therefore, we only show
δ being defined on Q × Σ in our examples. A typical argument to δ would be (q0, 1). Using standard function notation we would write δ((q0, 1)) to signify δ being applied to its arguments. To simplify notation, we drop the "extra" set of parentheses keeping in mind that the arguments to δ are really ordered pairs. So, for example, we write δ(q0, 1). The range of δ is Q. Suppose q ∈ Q, a ∈ Σ_T, and δ(q, a) = q′, where q′ ∈ Q. This is called a transition of M. This transition moves M from state q into state q′ on reading an a, and the input head is then moved one square to the right. In Fig. 2 transitions were represented by edges between states and labeled with input tape symbols. Since δ is a function, DFAs behave deterministically. Another way of saying this is that the machine has only one "choice" for its next transition, just like a typical C program must execute a unique next instruction. The complete specification for the DFA shown in Fig. 2 is given below.
EXAMPLE 1
Formal specification of a DFA. The five-tuple for the DFA M shown in Fig. 2 is as follows: M = ({q0, q1, q2}, {0,1}, δ, q0, {q0}), where δ is defined as in the transition table shown in Table I or equivalently expressed as δ = {(q0, 0, q1), (q0, 1, q2), (q1, 0, q0), (q1, 1, q2), (q2, 0, q2), (q2, 1, q2)}. Here we have written the function δ: Q × Σ_T → Q as triples in Q × Σ_T × Q.
In order to describe a computation of a DFA we need to be able to specify snapshots of the machine detailing where the machine is in its computation. What are the important ingredients in these snapshots? They are the configuration of the input tape and the current state of M. Such a snapshot is called a configuration of the machine.
DEFINITION 3
A configuration of a DFA M = (Q, Σ, δ, q0, F) on input x is a two-tuple (q, [p, x]), where q ∈ Q and [p, x] is a configuration of the input tape. The initial configuration of M on input x is the configuration (q0, [1, x]), or equivalently (q0, I(x)). We use C0 to denote the initial configuration when M and x are understood. For machine M, the set of all possible configurations for all possible inputs x is denoted by C(M).
How can we utilize the notion of configuration to discuss the computation of a DFA? They help us define the next move relation, denoted ⊢_M, as shown in the following.
DEFINITION 4
Let M = (Q, Σ, δ, q0, F) be a DFA. Let C(M) be the set of all configurations of M. Let C1 = (q1, [p1, x]) and C2 = (q2, [p2, x]) be two elements of C(M). C1 ⊢_M C2 if and only if p2 = p1 + 1 and there is a transition δ(q1, a) = q2, where a is the current tape symbol of [p1, x]. The relation ⊢_M is called the next move, step, or yields relation.
Notice ⊢_M is a relation defined on configurations. This means ⊢_M ⊆ C(M) × C(M). Since δ is a function, ⊢_M is also a function. Definition 4 is saying that configuration C1 yields configuration C2 if there is a transition from C1 that when executed brings M to the new configuration C2. As an example consider the DFA, call it M, whose transition function was depicted in Table I. The initial configuration of M on input x = 0011 is (q0, [1, 0011]). Applying transition 1 from Table I, we see (q0, [1, 0011]) ⊢_M (q1, [2, 0011]). The machine read a 0 and moved to state q1. Continuing this trace (formally defined in Definition 5), we obtain the following series of configurations:
(q1, [2, 0011]) ⊢_M (q0, [3, 0011]) (by transition 3) ⊢_M (q2, [4, 0011]) (by transition 2) ⊢_M (q2, [5, 0011]) (by transition 6)
We say the DFA halts when there is no next state or when the machine moves off the end of the tape. This can occur whenever the state transition function is undefined. A halting configuration of a DFA is a configuration Ch = (q, [p, x]) ∈ C(M) with the property that δ is undefined on q and the current tape symbol of [p, x]. If the DFA halts when there is no more input left to process, that is, it is in a configuration C = (q, F(x)), then we say that the DFA is in a final configuration. That is, the DFA is in a configuration Ch = (q, [p, x]) ∈ C(M) with the property that p = |x| + 1. The relation ⊢_M was defined to assist with the description of computations. But ⊢_M stands for only one step. We would like to discuss computations of varying lengths including length zero.
DEFINITION 5
Let M be a DFA with next move relation ⊢_M. Let Ci ∈ C(M), for 0 ≤ i ≤ n. Define ⊢*_M to be the reflexive, transitive closure of the relation ⊢_M. C0 yields or
leads to Cn if C0 ⊢*_M Cn. A computation or trace of M is a sequence of configurations related by ⊢_M as follows: C0 ⊢_M C1 ⊢_M ··· ⊢_M Cn. This computation has length n, or we say it has n steps. Sometimes, we write C0 ⊢^n_M Cn to indicate a computation from C0 to Cn of length n. Notice that on an input x of length n, a DFA will run for at most n + 1 steps. If the state transition function is defined on every state and data symbol then the DFA will process its entire input. For the four-step computation traced above, we can write (q0, [1, 0011]) ⊢*_M (q2, [5, 0011]) or (q0, [1, 0011]) ⊢^4_M (q2, [5, 0011]) with (q2, [5, 0011]) the final configuration. We would like to describe the computational capabilities of DFAs in terms of the languages they accept. First, we need to define what it means for a DFA to accept its input. The idea is simply that the machine reads all of its input and ends up in an accepting state.
DEFINITION 6
Let M = (Q, Σ, δ, q0, F) be a DFA. M accepts input x ∈ Σ* if (q0, I(x)) ⊢*_M (f, F(x)), where f ∈ F. This computation is called an accepting computation. A halting configuration (q, F(x)) is called an accepting configuration of M if q ∈ F. If M does not accept its input x, then M is said to reject x. The computation of M on input x in this case is called a rejecting computation, and M was left in a rejecting configuration.
M begins computing in its initial state, with the input tape head scanning the first symbol of x, and x written on the input tape. If M reads all of x and ends in an accepting state, it accepts. It is important to note that M reads its input only once and in an on-line fashion. This means M reads the input once from left to right and then must decide what to do with it. M cannot go back and look at the input again. In addition, even though M can sense the end of the input by detecting the > marker, this is only useful if M can reverse directions on the input tape. Thus M must be prepared to make a decision about accepting or rejecting assuming that the input might be exhausted after the symbol just read. As an example, the DFA with the transition function as shown in Fig. 2 accepts the input x = 0000 since q0 ∈ F and (q0, [1, 0000]) ⊢*_M (q0, [5, 0000]).
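These definitions lead directly to a small simulator. The Python sketch below (ours, not part of the original article) encodes the DFA of Fig. 2 and Table I and runs the next-move relation until the input is exhausted or δ is undefined.

# The transition function of the DFA in Fig. 2 / Table I, written as a dictionary.
delta = {
    ('q0', '0'): 'q1', ('q0', '1'): 'q2',
    ('q1', '0'): 'q0', ('q1', '1'): 'q2',
    ('q2', '0'): 'q2', ('q2', '1'): 'q2',
}
start, accepting = 'q0', {'q0'}

def dfa_accepts(x: str) -> bool:
    """Trace the configurations (q, [p, x]) of the DFA on input x."""
    q, p = start, 1
    while p <= len(x):                    # the head only moves left to right
        if (q, x[p - 1]) not in delta:    # no applicable transition: halt and reject
            return False
        q = delta[(q, x[p - 1])]          # one step of the next-move relation
        p += 1                            # the head advances one square
    return q in accepting                 # accept iff the final configuration is accepting

print(dfa_accepts('0000'))   # True: the computation ends in q0, an accepting state
print(dfa_accepts('0011'))   # False: the computation ends in q2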
We can now define the language accepted by a DFA M. Informally, this is simply the set of all strings accepted by M.
DEFINITION 7
Let M = (Q, Σ, δ, q0, F) be a DFA. The language accepted by M, denoted L(M), is {x | M accepts x}. The set of all languages accepted by DFAs is denoted L_DFA. That is, L_DFA = {L | there is a DFA M with L = L(M)}.
The DFA shown in Fig. 2 accepts the language {ε, 00, 0000, 000000, . . .}. It follows that this language is in L_DFA. Let us look now at a typical application of DFAs.
EXAMPLE 2
Application of DFAs involving searching a text for a specified pattern. DFAs are useful for pattern matching. Here we consider the problem of searching for a given pattern x in a file of text. Assume our alphabet is {a, b, c}. This example can easily be generalized to larger alphabets. To further simplify the discussion let x be the string abac. The techniques used here can be applied to any other string x. Formally, we want to build a DFA that accepts the language {s | s ∈ {a, b, c}* and s contains the pattern abac}. The idea is to begin by hard coding the pattern x into the states of the machine. This is illustrated in Fig. 5A. Since the pattern abac has length four, four states are needed in addition to the initial state, q0, to remember the pattern. Think of each state as signifying that a certain amount of progress has been made so far in locating the pattern. So, for example, on reaching state q2 the machine remembers that ab has been read. We can only reach state q4 if we have read the pattern abac, so q4 is the only accepting state required. The next step is to fill in the remaining transitions on other characters in the alphabet. The complete DFA is shown in Fig. 5B. Notice how in the figure there are some edges with more than one label. This simply means that the corresponding transition can be applied when reading any one of the symbols labeling the transition. We now explain how the extra transitions were added by examining state q3. The following methodology can be applied in a similar fashion to the other states. From state q3 on reading a "c," we enter the accepting state specifying that the pattern was indeed found; this is why state q4 is an accepting state. From state q3 on reading an "a," we transition back to state q1. This is because the "a" could be the start of the pattern abac. That is, we can make use of this "a."
Figure 5 Steps in constructing a DFA to recognize a pattern x in a file of text. In this case, corresponding to Example 2, x equals abac. Part (A) shows how to begin by hard coding the pattern into the machine's states. Part (B) shows the complete DFA.
If we read a "b" from the state q3, then we need to transition all the way back to state q0. The "b" nullifies all of the progress we had made and we must now start over from the beginning. The complete description of the DFA for recognizing strings containing the pattern x equals abac over the alphabet {a, b, c} is ({q0, q1, q2, q3, q4}, {a, b, c}, δ, q0, {q4}), where δ is as shown in Table II. One point worth noting is that once a pattern is found (that is, the first time an accepting state is entered), the text editor can notify the user of the pattern's location rather than continuing to process the remainder of the file. This is usually what text editors do.
Table II The Transition Table for the DFA Described in Example 2 and Shown in Fig. 5
State   Input symbol   New state
q0      a              q1
q0      b              q0
q0      c              q0
q1      a              q1
q1      b              q2
q1      c              q0
q2      a              q3
q2      b              q0
q2      c              q0
q3      a              q1
q3      b              q0
q3      c              q4
q4      a              q4
q4      b              q4
q4      c              q4
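As a quick check of Example 2, the transition table above can be exercised directly; the following Python sketch (ours) encodes Table II and, like a text editor, reports success as soon as the accepting state q4 is entered.

# The transition table of Table II for the pattern abac over {a, b, c}.
table = {
    ('q0', 'a'): 'q1', ('q0', 'b'): 'q0', ('q0', 'c'): 'q0',
    ('q1', 'a'): 'q1', ('q1', 'b'): 'q2', ('q1', 'c'): 'q0',
    ('q2', 'a'): 'q3', ('q2', 'b'): 'q0', ('q2', 'c'): 'q0',
    ('q3', 'a'): 'q1', ('q3', 'b'): 'q0', ('q3', 'c'): 'q4',
    ('q4', 'a'): 'q4', ('q4', 'b'): 'q4', ('q4', 'c'): 'q4',
}

def contains_abac(s: str) -> bool:
    q = 'q0'
    for ch in s:
        q = table[(q, ch)]   # the table is total on {a, b, c}, so the DFA never halts early
        if q == 'q4':
            return True      # pattern found; report it without reading the rest of the input
    return q == 'q4'

print(contains_abac('cabacb'))   # True: the text contains abac
print(contains_abac('ababab'))   # False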
IV. NONDETERMINISTIC FINITE AUTOMATA (NFAs)
In this section we define the nondeterministic finite automata (NFAs). A DFA being deterministic has only one computational thread. However, an NFA, because any given configuration may have many possible next configurations, cannot be described by a single computational thread. An NFA computation should be visualized as many superimposed simultaneous threads. But an NFA is not a parallel computer—it does not have any ability to run simultaneous computations. Instead, one can imagine the NFA behaving as follows: if the problem the NFA is solving has a solution, then the simultaneous threads will collapse into a single unique thread of computation that expresses the solution. If the NFA cannot solve the problem, the threads collapse into failure. It is obvious that an NFA is not a machine that one can build directly. So why is it worth considering? Here are three reasons. The first is simply that this model has more expressive power than the DFA in the sense that it is easier to design NFAs than DFAs for some languages, and such NFAs usually have fewer states than the corresponding DFA. A second reason is that the abstract concept of nondeterminism has proved very important in theoretical computer science. Third, although NFAs are more expressive when it comes to programming them, it turns out that any language that can be accepted by an NFA can also be accepted by a DFA. We prove this result via simulation in Section V. Nearly all of the basic definitions about DFAs carry over to NFAs. Let us mention the enhancements to a DFA that yield the NFA model and then look at some examples. The new features in order of descending importance are the use of nondeterminism, the use of ε-transitions, and the use of transitions on arbitrary strings. Nondeterminism means that the machine could potentially have two or more different computations on the same input. For example, in Fig. 6 we show a portion of an NFA. In this NFA from state q0 on reading an a, the machine could go to either state q1 or state q2. This behavior is nondeterministic and was not allowed in the DFA. In our examples we will see that this feature is very useful for designing NFAs. An ε-transition allows the machine to change state without reading from the input tape or advancing the input head. It is useful to think of such a transition as a jump or goto. Why would such a jump be useful? As an example suppose we want to recognize the language {a}* ∪ {b}* over the alphabet {a, b}. The NFA shown in Fig. 7 accepts this language.
Figure 6 A partial NFA. Notice from state q0 there is a choice of either state q1 or state q2 on input a.
Since NFAs, like DFAs, only get to read their input once, the two ε-transitions start two threads of computation. One thread looks for an input that is all a's, the other looks for an input that is all b's. If either thread accepts its input, then the NFA stops and accepts. Thus we can accept (formally defined in this section) {a}* ∪ {b}* very easily; the design is also conceptually appealing. Notice that without using ε-transitions the machine needs three accepting states. A DFA for accepting the language {a}* ∪ {b}* is shown in Fig. 7B. This machine has more states, transitions, and accepting states and it is more complex. It turns out that at least four states are needed for any DFA that accepts the language {a}* ∪ {b}*. Now let us look at the third enhancement to DFAs. By use of arbitrary transitions on strings we mean that a transition can be labeled with any string in Σ*. Essentially, this means an NFA is allowed to read more than one input symbol at a time (or none). How might this feature prove useful? Coupled with nondeterminism this enhancement allows us to design simpler machines.
Figure 7 Part (A) shows an NFA for accepting the language {a}* ∪ {b}*. Part (B) depicts the smallest DFA for accepting the same language.
As an example, recall the DFA presented in Fig. 5 that accepted the language {x | x ∈ {a, b, c}* and x contains the pattern abac}. An NFA for accepting this same language is shown in Fig. 8. Until we formally define computations and acceptance for NFAs, think of this machine as gobbling up symbols unless it encounters the pattern abac, in which case it jumps to an accepting state and then continues to gobble up symbols. We have reduced the five-state DFA from Fig. 5 to a two-state NFA using this new feature. Rather than go through all of the definitions presented for DFAs again for NFAs, we highlight the changes in defining NFAs.
DEFINITION 8
A nondeterministic finite automaton (NFA) is a five-tuple M = (Q, Σ, Δ, q0, F) that is defined similarly to a DFA except for the specification of the transitions. The transition relation Δ is a finite subset of Q × Σ_T* × Q.
Notice Q × Σ_T* × Q is an infinite set of triples but we require Δ to be finite. The new specification of transitions handles all of the enhancements that were discussed above. Since we now have a relation instead of a function, the machine can be nondeterministic. That is, for a given state and symbol pair it is possible for the machine to have a choice of next move. In Fig. 6, for example, the two transitions (q0, a, q1) and (q0, a, q2) are shown. Of course, in a DFA this pair of transitions would not be allowed. Since Δ ⊆ Q × Σ_T* × Q, this model incorporates ε-transitions and arbitrary string transitions. For example, in the NFA shown in Fig. 7A the two ε-transitions (q0, ε, q1) and (q0, ε, q2) are shown and in Fig. 8 the transition from state q0 to q1 is (q0, abac, q1). Finally, since Δ is a relation that is not total, there can be state-symbol pairs for which Δ is not defined. In Fig. 7A the machine does not have a transition out of state q0 on either a or b. So, in the full representation of Δ there simply are no transitions (q0, a, q) nor (q0, b, q) for any q ∈ Q. If a thread ever entered such a state, the thread would terminate.
Nearly all of the other definitions for DFAs carry over with very little modification.
Figure 8 An NFA for accepting the language {x | x ∈ {a, b, c}* and x contains the pattern abac}.
For example, ⊢_M still relates configurations but now we might have C1 ⊢_M C2 and C1 ⊢_M C3, where C2 ≠ C3; a situation that was not possible in a DFA. One definition we need to rethink is that for acceptance. Since NFAs are nondeterministic, there may be several possible computation threads on the same input. We say an input is accepted if at least one thread leads to acceptance.
DEFINITION 9
Let M = (Q, Σ, Δ, q0, F) be an NFA. M accepts input x ∈ Σ* if (q0, I(x)) ⊢*_M (f, F(x)), for some accepting state f ∈ F. Such a computation is called an accepting computation. Computations that are not accepting are called rejecting computations. The language accepted by M, denoted L(M), is {x | M accepts x}. The set of all languages accepted by NFAs is denoted L_NFA. That is, L_NFA = {L | there is an NFA M with L = L(M)}.
On a given input an NFA may have both accepting and rejecting computations. If it has at least one accepting computation, then the input is accepted. That is, the input is accepted if at least one thread leads to an accepting state. The language accepted by an NFA consists of all strings that the NFA accepts. Let us consider an example of two possible computations of the NFA M shown in Fig. 8 on input abaca. The first is (q0, [1, abaca]) ⊢_M (q0, [2, abaca]) ⊢_M (q0, [3, abaca]) ⊢_M (q0, [4, abaca]) ⊢_M (q0, [5, abaca]) ⊢_M (q0, [6, abaca]) and the second is (q0, [1, abaca]) ⊢_M (q1, [5, abaca]) ⊢_M (q1, [6, abaca]). Clearly, the two computations are very different. In the first one we use up all of the input in five steps but do not end in an accepting state. Thus, this is an example of a rejecting computation. In the second case we use up all of the input in two steps and do end in an accepting state. Thus, the latter computation is accepting. Since there was an accepting computation, the input abaca is accepted by M and abaca ∈ L(M). To prove that an input is accepted by an NFA, one only needs to demonstrate that a single accepting computation exists. However, to argue that an NFA does not accept a given string, one must show that all possible computations are rejecting. This is usually more difficult.
V. EQUIVALENCE OF DFAs AND NFAs
NFAs possess many features DFAs do not. These enhancements simplify the design of NFAs for certain languages. Do these new features actually make the NFA a more powerful model in the sense that it can accept languages that no DFA can? That is, is there some language L that an NFA can accept that no DFA can? Surprisingly, DFAs and NFAs accept exactly the same class of languages. We prove this theorem below. Our treatment of this result is more detailed than that of other similar equivalences described in this article. The intention is to present one detailed argument to the reader. The specifics for other equivalences can be found in the references.
THEOREM 1
The language classes L_DFA and L_NFA are equal.
Proof. (L_DFA ⊆ L_NFA) Suppose L ∈ L_DFA. Then there exists a DFA M = (Q, Σ, δ, q0, F) such that L(M) = L. The idea is simply to view M as an NFA. Define an NFA M′ = (Q, Σ, Δ, q0, F), where (q, a, q′) ∈ Δ if δ(q, a) = q′. We claim that L(M′) = L(M). It is easy to see that (q0, I(x)) ⊢*_M (f, F(x)), where f ∈ F, if and only if (q0, I(x)) ⊢*_M′ (f, F(x)). This simply says the machines have the same transitions, which is of course how M′ was defined. Since L(M) = L, this shows L ∈ L_NFA and we can conclude that L_DFA ⊆ L_NFA.
(L_NFA ⊆ L_DFA) Suppose L ∈ L_NFA. Then there exists an NFA M = (Q, Σ, Δ, q0, F) such that L(M) = L. We will construct a DFA M3 such that L(M3) = L. The simulation of M will take place in three stages. In each stage a new machine will be constructed that is equivalent to M but is more like a DFA than in the previous phase. In the third stage the result is in fact a DFA. The first stage involves eliminating transitions of the form (q, y, q′), where |y| > 1. Stage two involves eliminating ε-transitions. In the third stage nondeterminism is removed. From M we construct M1, from M1 we define M2, and from M2 we build the desired DFA M3. Figure 9 illustrates the process. Since we will show L(M3) = L(M), this is enough to complete the proof.
Constructing M1: The idea in building M1 is to add new states to M so that strings y labeling transitions,
with |y| > 1, can be split up into single symbol transitions as required in a DFA. The reader can think of splicing in new states in the transition diagram for any edge labeled by a string of length more than one. The five-tuple for M1 is given by M1 = (Q1, Σ, Δ1, q0, F). The new state set Q1 consists of the states from Q and the additional states that we need to splice in. The relation Δ1 is defined in terms of Δ except that transitions on strings of length more than one need to be replaced by a new set of equivalent transitions. The algorithm shown in Fig. 10 describes exactly how Q1 and Δ1 are constructed. The following three facts imply that L(M) = L(M1): the set of accepting states in M1 is the same as in M, all transitions in Δ are in Δ1 except for those on strings of length greater than one, and transitions on strings of length greater than one in Δ were replaced by transitions in Δ1 that carried out the same function.
Constructing M2: The second stage in the construction requires that we eliminate ε-transitions from M1. In the process we will define a new NFA M2 = (Q2, Σ,
Δ2, q0, F2). In this case Q2 equals Q1 and F2 = F ∪ {q | (q, [1, ε]) ⊢*_M1 (f, [1, ε]) for some f ∈ F}. This says that any state in M1 from which we can reach an accepting state without using input becomes an accepting state in M2. Since we are not consuming input, the empty tape configuration [1, ε] is sufficient. The algorithm shown in Fig. 11 shows precisely how
Δ2 is constructed. The idea is to eliminate all ε-transitions from Δ1. We replace any combination of ε-transitions and a single transition on one symbol in
Δ1 by a transition involving a single symbol in Δ2. We now argue that L(M1) = L(M2). Suppose x ∈ L(M1). Then (q0, I(x)) ⊢*_M1 (f, F(x)) for some f ∈ F. This computation may involve ε-transitions. Any series of ε-transitions that are followed by a transition in which an individual symbol is read can be replaced by a single transition of M2, resulting in M2 having the same configuration as M1. Any combination of ε-transitions that occur after x has been completely read leads from some state q~ to f. Because of the way in which F2 was defined, we see q~ ∈ F2 and so x ∈ L(M2). This shows L(M1) ⊆ L(M2).
Figure 9 Illustration of the construction carried out in Theorem 1.
Figure 10 Constructing M1.
A related argument can be used to show that L(M2) ⊆ L(M1), essentially by reversing the steps in the argument just presented. All this says is that we did not make M2 accept more strings than M1. Notice if there was some state that involved only ε-transitions in M1, it is possible that after applying stage two of the construction this state becomes disconnected or unreachable. An unreachable state is one that no longer plays any role in the strings accepted by the machine since it can no longer be used in a computation. This is because there is no way to enter an unreachable state by a computation starting from the initial state. Before proceeding to stage three, we discard unreachable states from M2.
Constructing M3: The third stage in the construction is to eliminate nondeterminism from M2 in forming M3. The idea is to consider all the possible threads of computation that M2 could have active. Each thread is in a single well-defined state, which is based on the nondeterminism and the transitions that were chosen in the past by the thread. The current states over all possible threads will be compressed into a single state in M3. In other words the states in M3 will be the power set of the states in M2 (after having removed unreachable states). Rather than complicate notation further let us continue to refer to Q2 and F2 (possibly different since unreachable states may have been removed) using the same notation. We define the new machine as follows: M3 = (Q3, Σ, δ3, {q0}, F3), where Q3 = 2^Q2 and F3 = {Q′ | Q′ ∈ Q3 and Q′ ∩ F2 ≠ ∅}, and δ3 is formed as described in the algorithm depicted in Fig. 12.
Figure 12 Constructing M3.
It is important to observe that the states of M3 are sets of states of M2. That is, M3's states consist of all possible subsets of states from M2. The idea behind the algorithm shown in Fig. 12 is best explained via a simple example. Suppose, for example, that from state q, M2 could on an a go to either state q1 or state q2. That is, M2 from configuration (q, [1, as]) for any string s can make a nondeterministic move—either change to configuration (q1, [2, as]) or to configuration (q2, [2, as]). In M3 then we want to end up in configuration ({q1, q2}, [2, as]) since M2 could be in either one of these states after reading the a. This is why we need sets to represent the states of M3. It is easy to see that M3 is a DFA. The last statement in the code shown in Fig. 12 dictates that exactly one transition is added to the relation δ3 for each symbol and each state. Thus δ3 is a function. We must argue that L(M3) = L(M2). Using the transitivity of equality, this will imply that L(M3) = L(M). So, this will complete the entire proof of the theorem.
First, we prove that L(M2) ⊆ L(M3). Suppose x ∈ L(M2), with n = |x|. Then there exists a computation (q0, [1, x]) ⊢*_M2 (f, [n + 1, x]) for some f ∈ F2. Since M2 has no ε-transitions and no transitions on strings of length more than one, this computation passes through exactly n + 1 states. The if statement of the algorithm shown in Fig. 12 adds the appropriate state to the set R, and then in the last step of the algorithm the appropriate transition is added to δ3, keeping track of all the possible states that M2 could be in. Thus for each step of the computation in M2 involving a transition (q, a, q2), there are corresponding sets of states Q′ with q ∈ Q′ and Q′2 with q2 ∈ Q′2 in M3, and a transition (Q′, a, Q′2) in δ3. Therefore, ({q0}, [1, x]) ⊢^n_M3 (F′, [n + 1, x]),
Figure 11 Constructing M2.
where f ∈ F′. This shows that x ∈ L(M3), so L(M2) ⊆ L(M3). By a related argument that essentially reverses the steps in this one, we can prove that L(M3) ⊆ L(M2).
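This stage can be made concrete with a short program. The sketch below is a plausible rendering of the idea rather than the exact algorithm of Fig. 12: it assumes the λ-free NFA M2 is supplied as a set of (state, symbol, state) triples, the function name and the frozenset encoding of M3's states are choices made for the illustration, and the loop generates only the subsets reachable from {q0} rather than all of 2^Q2 (which does not change the accepted language).

```python
def subset_construction(sigma, delta2, q0, F2):
    """Build the DFA M3 whose states are sets of states of the lambda-free NFA M2."""
    start = frozenset({q0})
    Q3, delta3, worklist = {start}, {}, [start]
    while worklist:
        Q = worklist.pop()
        for a in sigma:
            # R gathers every state some thread of M2 could occupy after reading an a from Q
            R = frozenset(q2 for (q, sym, q2) in delta2 if q in Q and sym == a)
            delta3[(Q, a)] = R
            if R not in Q3:
                Q3.add(R)
                worklist.append(R)
    F3 = {Q for Q in Q3 if Q & set(F2)}  # a subset accepts iff it contains a final state of M2
    return Q3, delta3, start, F3

# A small example over {0, 1}: the NFA guesses when the next-to-last symbol is 0
delta2 = {("p", "0", "p"), ("p", "1", "p"), ("p", "0", "q"), ("q", "0", "r"), ("q", "1", "r")}
Q3, delta3, start, F3 = subset_construction({"0", "1"}, delta2, "p", {"r"})
print(len(Q3), len(F3))   # 4 reachable subset-states, 2 of them accepting
```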
Figure 13 A sample NFA with five states, q0 through q4, over the alphabet {0, 1}.
It is instructive to trace an NFA on an input to examine how the set of states that the NFA can be in evolves. In Fig. 13 we show a five-state NFA. Consider this NFA on the input 010. On reading the first 0, the machine can occupy states q0 or q1. On reading the 1, the machine can occupy states q0 or q2. Finally, on reading the last 0, the machine can occupy states q0, q1, or q3. Since none of these states are accepting, the machine rejects input 010.
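This step-by-step evolution of the occupied state set is easy to compute. The transition relation below is a hypothetical one, chosen only to be consistent with the trace just described, and the assumption that q4 is the sole accepting state is likewise made only for this sketch.

```python
# Hypothetical transitions consistent with the trace of 010 described above.
delta = {
    ("q0", "0"): {"q0", "q1"},
    ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
    ("q2", "0"): {"q3"},
}

def step(states, symbol):
    """All states some thread could occupy after reading symbol from any state in states."""
    return set().union(*[delta.get((q, symbol), set()) for q in states])

current = {"q0"}
for symbol in "010":
    current = step(current, symbol)
    print(symbol, sorted(current))
# 0 ['q0', 'q1']   1 ['q0', 'q2']   0 ['q0', 'q1', 'q3']

accepting = {"q4"}                              # assumed for this sketch
print("accepted:", bool(current & accepting))   # False, so 010 is rejected
```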
VI. EQUIVALENCE OF DFAs WITH OTHER MODELS We call a language regular if it belongs to LDFA. Therefore, every regular language can be described by some DFA or some NFA. In this section we look at other ways of representing regular languages.
A. Regular Expressions One way of describing regular languages is via the notion of regular expressions. The notation for regular expressions involves a combination of strings of symbols from some alphabet Σ, parentheses, and the operators +, ·, and *. DEFINITION 10 We construct regular expressions by applying the following rules: 1. ∅, λ, and a ∈ Σ are all (primitive) regular expressions. 2. If r1 and r2 are regular expressions, so are r1 + r2, r1 · r2, r1*, and (r1). 3. A string is a regular expression if and only if it can be derived from the primitive regular expressions by a finite number of applications of the rules in step 2. The next definition describes the languages that can be represented by regular expressions.
DEFINITION 11 The language L(r) denoted by any regular expression r is defined by the following rules: ∅ is a regular expression denoting the empty set; λ is a regular expression denoting {λ}; for every a ∈ Σ, a is a regular expression denoting {a}; and if r1 and r2 are regular expressions, then L(r1 + r2) = L(r1) ∪ L(r2), L(r1 · r2) = L(r1)L(r2), L((r1)) = L(r1), and L(r1*) = (L(r1))*. It is intuitively reasonable that for every regular language L, there exists a regular expression r such that L equals L(r). In fact, every regular language has an associated NFA, and it can be seen that the inputs of all the accepting threads from the initial state to any final state are generated by a regular expression. On the other hand, if r is a regular expression then L(r) is regular. This follows from the fact that LDFA is closed under union, concatenation, and star closure. By closed we mean that you can take any languages in the class, apply these operations to them, and the resulting language will still be in the class. It is relatively easy to build the corresponding finite automata by working with NFAs.
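Definition 11 can also be explored programmatically. The sketch below represents regular expressions as a small recursive data type and decides membership in L(r) using Brzozowski derivatives, a standard technique that this article does not itself discuss; the class names and the sample expression are purely illustrative.

```python
from dataclasses import dataclass

class Regex: pass

@dataclass(frozen=True)
class Empty(Regex): pass                 # denotes the empty set
@dataclass(frozen=True)
class Lam(Regex): pass                   # denotes {lambda}
@dataclass(frozen=True)
class Sym(Regex): a: str                 # denotes {a}
@dataclass(frozen=True)
class Union(Regex): r1: Regex; r2: Regex
@dataclass(frozen=True)
class Concat(Regex): r1: Regex; r2: Regex
@dataclass(frozen=True)
class Star(Regex): r: Regex

def nullable(r):
    """True exactly when the empty string belongs to L(r)."""
    if isinstance(r, (Lam, Star)): return True
    if isinstance(r, (Empty, Sym)): return False
    if isinstance(r, Union): return nullable(r.r1) or nullable(r.r2)
    return nullable(r.r1) and nullable(r.r2)          # Concat

def deriv(r, a):
    """A regular expression for { w | a followed by w is in L(r) }."""
    if isinstance(r, (Empty, Lam)): return Empty()
    if isinstance(r, Sym): return Lam() if r.a == a else Empty()
    if isinstance(r, Union): return Union(deriv(r.r1, a), deriv(r.r2, a))
    if isinstance(r, Star): return Concat(deriv(r.r, a), r)
    head = Concat(deriv(r.r1, a), r.r2)               # Concat
    return Union(head, deriv(r.r2, a)) if nullable(r.r1) else head

def matches(r, w):
    for a in w:
        r = deriv(r, a)
    return nullable(r)

# (0 + 1)* . 1 : the strings over {0, 1} that end in 1
r = Concat(Star(Union(Sym("0"), Sym("1"))), Sym("1"))
print(matches(r, "0101"), matches(r, "010"))          # True False
```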
B. Grammars We present the definition of another model of computation, namely a grammar, in this section. DEFINITION 12 A grammar G is defined as a quadruple G = (N, Σ, S, P), where
• N is an alphabet called the set of nonterminals.
• Σ is an alphabet called the set of terminals, with N ∩ Σ = ∅.
• S ∈ N is the start variable.
• P is a finite set of productions of the form x → y, where x ∈ (N ∪ Σ)+ and y ∈ (N ∪ Σ)*.
Given three strings w, u, v ∈ (N ∪ Σ)* such that w = uxv, we say that the production x → y is applicable to w. By applying the production to w we obtain a new string z = uyv. We say that w derives z and denote this with w ⇒ z. If w1 ⇒ w2 ⇒ ⋯ ⇒ wn, we say w1 derives wn and denote this with w1 ⇒* wn. The set L(G) = {w ∈ Σ* : S ⇒* w} is the language generated by G. A third way of representing regular languages is by means of certain simple grammars, called regular grammars.
DEFINITION 13 A regular grammar is a four-tuple G = (N, Σ, P, S), where N is a nonempty set of nonterminals, Σ is an alphabet of terminals, S ∈ N is the start symbol, and P is a set of productions of the form x → y, where x ∈ N and y ∈ Σ*N ∪ Σ*. Notice that productions in a regular grammar have at most one nonterminal on the right-hand side and that this nonterminal always occurs at the end of the production. The grammar generates a string of terminals by starting from S and then repeatedly applying productions in the obvious way until no nonterminals remain. Let M = (Q, Σ, Δ, q0, F) be an NFA accepting a language L. It is easy to see that a regular grammar G = (N, Σ, S, P) generating L can be defined by setting N = Q, S = q0, and putting in P the production qi → aj qk if (qi, aj, qk) ∈ Δ and the production qk → λ if qk ∈ F. It is also easy to see that every regular grammar G = (N, Σ, P, S) has a corresponding NFA M = (N ∪ {f}, Σ, Δ, S, {f}), where Δ is formed as follows:
1. For every production of the form A → xB, where A, B ∈ N and x ∈ Σ*, the transition relation Δ contains (A, x, B).
2. For every production of the form A → x, where A ∈ N and x ∈ Σ*, the transition relation Δ contains (A, x, f).
Whenever we define a language family through a grammar, we are interested in knowing what kind of automaton we can associate with the family. This will give us an indication of how efficiently the language can be recognized. Programming language syntax is usually specified using a grammar. The parser for the language is an implementation of the corresponding automaton.
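The grammar-to-NFA direction of this correspondence is short enough to spell out in code. In the sketch below each production is assumed to be given as a pair (A, y), with y a string of terminals optionally ending in a single one-character nonterminal, and the added final state is named "f"; these encodings are illustration choices, not the article's notation.

```python
def regular_grammar_to_nfa(nonterminals, terminals, productions, start):
    """Build the NFA (N + {f}, Sigma, Delta, S, {f}) from a regular grammar, as described above."""
    states = set(nonterminals) | {"f"}
    delta = set()                            # transition relation of (state, string, state) triples
    for A, y in productions:
        if y and y[-1] in nonterminals:      # production A -> xB yields the transition (A, x, B)
            delta.add((A, y[:-1], y[-1]))
        else:                                # production A -> x yields the transition (A, x, f)
            delta.add((A, y, "f"))
    return states, delta, start, {"f"}

# Grammar for the strings of a's and b's that end in b:  S -> aS | bS | b
N, T, P = {"S"}, {"a", "b"}, [("S", "aS"), ("S", "bS"), ("S", "b")]
print(regular_grammar_to_nfa(N, T, P, "S"))
```

Note that a transition produced here may be labeled by a whole string of terminals; this matches the NFA model used earlier in the article, where string-labeled transitions are later reduced to single-symbol ones.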
VII. PUSHDOWN AUTOMATA AND BEYOND A. Introduction Regular languages have broad application in computer science, but many useful languages are not regular and so cannot be accepted by DFAs and NFAs. For example, no DFA can accept the language consisting of nested, balanced parentheses, i.e., {(ⁱ)ⁱ | i ≥ 0} = {λ, (), (()), ((())), . . .}. Obviously, a language such as this is an important one from the perspective of programming languages. The reason DFAs cannot accept a language like the nested, balanced paren-
theses language is because they do not have any way of storing information other than in a finite set of states. The deterministic pushdown automaton (DPDA) and the nondeterministic pushdown automaton (PDA) considered in this section are extensions of the DFA and NFA, respectively. The models are extended by adding a pushdown (or stack) data structure. Stacks provide the machines with the ability to write and store information for later retrieval. The pushdown automata allow us to accept a richer class of languages than the finite automata and are useful for parsing program code. In Section VII.D we briefly explore an extension of these models called the Turing machine. The deterministic (nondeterministic) pushdown automata can be viewed as a DFA (respectively, NFA) that has a stack added to it. Its finite control can read symbols from the stack as well as having the current state and input tape symbol to base its next move on. The next move will result in a possible state change and some manipulation of the stack. We allow the DPDA to access more than one symbol from the stack at a time analogously to NFAs. Before formally defining the pushdown automaton let us ask the following question: how could a DFA augmented with a stack be used to recognize the language {(ⁱ)ⁱ | i ≥ 0}? Intuitively, we could use the stack to store left parentheses. For each right parenthesis we encounter, a left parenthesis could be popped off the stack. If we run out of right parentheses exactly when the stack is empty, then we know that the parentheses are balanced. Naturally, we have to make sure that the parentheses are in the correct order too; we have to be careful not to accept strings like (())(). Let us now describe how a stack behaves, define the DPDA formally, and then return to a complete description for a DPDA that accepts the language of balanced parentheses. We will implement the stack by adding an extra tape to our finite automaton model. This extra work tape, called the stack, will be writable (unlike the input tape). However, there will be restrictions on how this tape can be manipulated. Its initial configuration is [1, λ], the empty stack. The right end tape mark is the bottom of stack marker, and the stack tape head will be the top of stack pointer. The stack can be written to in only two circumstances. 1. The head can move left one square onto the < mark and then write any symbol from the data alphabet Γ. This symbol is thus pushed onto the stack and becomes the new topmost symbol. This operation is called a basic push.
2. If the head is not over the > mark, then a < mark can be written and the head advanced one square to the right. This results in the top symbol being popped off the stack. This operation is called a basic pop. These two basic stack operations describe how to push a single symbol and pop an individual symbol off of the stack. To implement a push x operation, |x| basic push operations are required. To perform no operation on the stack, a basic push operation of < is executed. To implement the instruction pop y, |y| basic pop operations are executed, each one removing the next symbol from y off the stack. Implementing the pop y, push x combination requires |y| basic pops followed by |x| basic pushes. Armed with these preliminaries we are now ready to define our stack machines.
B. Deterministic PDAs The formal definition of a deterministic pushdown automaton is given below. DEFINITION 14 A deterministic pushdown automaton (DPDA) is a five-tuple M = (Q, Σ, δ, q0, F) that is a DFA augmented with a stack. All components are defined similarly to a DFA, see Definition 1, except for the transition function δ that must incorporate the stack. The transition function or finite control is a (partial) function δ : Q × Σ × Γ* → Q × Γ* (where Γ denotes the stack alphabet). The important points to notice about δ are that it is finite and a partial function. A partial function is a function that may be undefined for some elements of its domain. Although the domain of δ is infinite, δ being finite implies it is only defined for a fixed number of triples. The restrictions on how reads and writes to the stack are performed (described in the last section) disallow strings in Γ* like a then empty the stack while leaving
implemented via |y| basic pop operations. More generally, the transition (q, a, y) → (q′, x) means read an a from the input tape, change state from q to q′, and replace the string y on the top of the stack by the string x; think of a pop y followed by a push x being executed in one step. The actual implementation of this operation requires |y| basic pops followed by |x| basic pushes. From now on we will focus on the high-level pushes and pops, which in the DPDA and PDA require only one step. All of our definitions about finite automata carry over in a natural manner to pushdown automata. The one item that needs further clarification is the acceptance of a string. For acceptance we will require that the DPDA end up in an accepting state after reading all of its input, as we did for DFAs, but also require that the stack be empty. As promised we construct a DPDA to accept the language of nested, balanced parentheses.
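As a plain sketch of the stack idea described in the introduction (and not the formal DPDA construction itself), the function below pushes a symbol for each left parenthesis, pops one for each right parenthesis, allows no further left parentheses once popping has begun, and accepts exactly when the input ends with an empty stack; the function name and the Python-list stack are illustration choices.

```python
def nested(s):
    """Recognize the language {(^i)^i | i >= 0} of nested, balanced parentheses."""
    stack = []
    seen_right = False
    for c in s:
        if c == "(" and not seen_right:
            stack.append(c)              # push phase: record each left parenthesis
        elif c == ")":
            seen_right = True            # once we start popping, no further '(' is allowed
            if not stack:
                return False             # more right parentheses than left
            stack.pop()                  # match one '(' per ')'
        else:
            return False                 # a '(' after some ')', or a symbol outside {(, )}
    return not stack                     # accept iff every '(' was matched

for s in ["", "()", "(())", "((()))", "(()", "())", "(())()"]:
    print(repr(s), nested(s))
# only "", "()", "(())", and "((()))" are accepted; "(())()" is rejected, as required above
```

A DPDA performs the same bookkeeping with its stack and, as stated above, accepts in an accepting state with an empty stack after reading all of its input.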
E. Turing Machines Interestingly, simply adding a read/write work tape to a DFA is sufficient to produce a machine, called a Turing machine, which is as powerful as any other computing device. We present its definition below. DEFINITION 17 A deterministic Turing machine (DTM) is a five-tuple M = (Q, Σ, δ, q0, F) that is defined similarly to a DFA, except we allow two-way read-only access to the input tape and incorporate a work tape. The transition function or finite control is a (partial) function δ : Q × Σ × Γ → Q × {−1, 0, +1} × Γ × {−1, 0, +1}, where Γ denotes the work tape alphabet. The current state of the DTM, and the current configurations of its input and work tapes, are used by δ to compute the next state and to manipulate the tapes. δ takes a state, a symbol from the input tape alphabet, and a symbol from the work tape alphabet as arguments. It generates a four-tuple that indicates the next state of the finite control, a change in position of the input tape head, a symbol to be written to the work tape, and a change in position of the work tape head. A −1 moves left one cell, a +1 moves right one cell, and a 0 means do not move. As with any of the automata, a Turing machine starts in its initial state with an input written on the input tape. It then goes through a sequence of steps controlled by the transition function δ. During this process, the contents of any cell on the work tape may be examined and changed many times. Eventually, the whole process may terminate, either by entering a halt state or reaching a configuration for which δ is not defined. Languages accepted by a Turing machine are called recursively enumerable. A language is recursively enumerable if and only if it is generated by a (general) grammar. Those recursively enumerable languages accepted by Turing machines that halt for any given input are called recursive and form a proper subset of the set of recursively enumerable languages. As with each of the other automata, we can define the nondeterministic version of a Turing machine.
DEFINITION 18 A nondeterministic Turing machine (NTM) is a five-tuple M = (Q, Σ, Δ, q0, F) that is defined similarly to a DTM, except for the specification of the transitions. The transition relation Δ is a finite subset of Q × Σ × Γ × Q × {−1, 0, +1} × Γ × {−1, 0, +1}. It can be shown that any nondeterministic Turing machine can be simulated by a deterministic Turing machine.
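A small simulator makes Definition 17 concrete. In the sketch below δ is encoded as a Python dictionary mapping (state, input symbol, work symbol) to (next state, input head move, work symbol to write, work head move); the end markers '<' and '>', the blank '_', the state names, and the example machine for {a^n b^n | n ≥ 0} are all assumptions made for this illustration rather than notation fixed by the article.

```python
BLANK = "_"

def run_dtm(delta, q0, accept, x, max_steps=10_000):
    """Simulate a DTM with a read-only input tape and a read/write work tape."""
    itape = "<" + x + ">"              # input tape delimited by assumed end markers
    wtape = {}                         # work tape: position -> symbol, blank by default
    q, i, w = q0, 0, 0                 # current state, input head, work head
    for _ in range(max_steps):
        if q in accept:
            return True
        key = (q, itape[i], wtape.get(w, BLANK))
        if key not in delta:           # delta is partial: an undefined move halts and rejects
            return False
        q, di, write, dw = delta[key]
        wtape[w] = write
        i, w = i + di, w + dw
    return False

# A DTM for {a^n b^n | n >= 0}: write an 'A' on the work tape for every 'a',
# then cancel one 'A' for every 'b', accepting only if both run out together.
delta = {
    ("q0", "<", BLANK): ("q0", +1, BLANK, 0),
    ("q0", "a", BLANK): ("q0", +1, "A", +1),    # record each 'a'
    ("q0", "b", BLANK): ("q1", 0, BLANK, -1),   # first 'b': start matching
    ("q0", ">", BLANK): ("qacc", 0, BLANK, 0),  # the empty string (n = 0)
    ("q1", "b", "A"):   ("q1", +1, BLANK, -1),  # cancel one 'A' per 'b'
    ("q1", ">", BLANK): ("qacc", 0, BLANK, 0),  # all a's matched exactly when input ends
}
for s in ["", "ab", "aabb", "aab", "abb", "ba"]:
    print(repr(s), run_dtm(delta, "q0", {"qacc"}, s))
```

Because {a^n b^n | n ≥ 0} is not regular, the example also previews why a writable tape, like the stack of the previous section, adds power beyond that of a DFA.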
VIII. SUMMARY In this article we have covered the basic ideas in automata theory. The references provide additional details and many other interesting applications and results of the theory.
ACKNOWLEDGMENTS The material in this article has been recast from Greenlaw and Hoover (1998). We are grateful to Jim Hoover for allowing us to include the material here. Thanks to the referees for their valuable suggestions; they helped to improve this article.
SEE ALSO THE FOLLOWING ARTICLES
Decision Theory • Future of Information Systems • Game Theory • Information Theory • Machine Learning • Systems Science • Uncertainty
BIBLIOGRAPHY
Greenlaw, R., and Hoover, H. J. (1998). Fundamentals of the theory of computation. San Francisco, CA: Morgan Kaufmann Publishers.
Hopcroft, J. E., Motwani, R., and Ullman, J. D. (2001). Introduction to automata theory, languages, and computation, 2nd edition. Addison-Wesley.
Lewis, H. R., and Papadimitriou, C. H. (1998). Elements of the theory of computation, 2nd edition. Englewood Cliffs, NJ: Prentice Hall.
Sipser, M. (1997). Introduction to the theory of computation. Boston, MA: PWS Publishing Company.
Sudkamp, T. A. (1991). Languages and machines, 2nd edition. Reading, MA: Addison-Wesley.
Wood, D. (1987). Theory of computation. New York: John Wiley & Sons.
B
Benchmarking Bengt Karlöf Karlöf Consulting
I. INTRODUCTION
II. THEORY AND REVIEW OF BENCHMARKING
III. METHODOLOGY OF BENCHMARKING
IV. BENCHMARKING—PITFALLS, SPRINGBOARDS, AND OBSERVATIONS
V. SUMMARY
GLOSSARY
benchlearning Combines the efficiency aspects of benchmarking with the learning organization. It thereby combines hard and soft issues in making people learn about what is important for the success of the operation.
benchmarking A management method deriving from the land surveying term "benchmark," which is a point fixed in three dimensions in the bedrock. Benchmarking means calibrating your efficiency against other organizations, getting inspiration from and building on other people's experiences.
causality Illustrates the cause-and-effect logic that is important to understand in benchmarking. Why is someone performing better, and how do they do that?
cross-industry benchmarking Means that you take a function or a process in a company and compare it to the corresponding organizational units in a company in another industry. Take, for instance, billing in the telecom industry, which can be benchmarked with billing in credit card operations or energy companies.
efficiency The function of value and productivity. Value is in turn utility (or quality) in relation to price. Productivity means the cost of producing and delivering a unit of something. Efficiency thereby includes effectiveness and productivity.
internal benchmarking Surprisingly often, learning does not take place across a decentralized structure with many similar units. Internal benchmarking has the purpose of creating cross-learning.
strategic benchmarking A looser and more inspirational version of benchmarking to enhance creativity in strategy processes.
strategy A pattern of decisions and actions in the present to secure future success and exploit opportunities.
I. INTRODUCTION Benchmarking is a widely used concept, but one that is often misinterpreted and insufficiently used. Although the simple definition of benchmarking is to make a comparison between parts of or the entire operation, there is much more to it. This article sorts out the correct meaning and application of benchmarking. Based on several years of experience working with the method, this article also shares the most important lessons learned from the practical use of benchmarking. This article views benchmarking from a management point of view and does not have a specific information systems approach. Benchmarking derives its force from the logical trap that it springs on those who oppose change. The psychology that makes benchmarking so effective compared to most other methods can be summed up as a reversal of the burden of proof. It works like this: In normal circumstances, those who want to change something have to show proof of why it should be changed. Application of advanced benchmarking shifts the burden of proof to the conservatives, who have to show proof of why the change should not be made.
II. THEORY AND REVIEW OF BENCHMARKING A. What Is Benchmarking? Benchmark is a term used in surveying. It means a fixed point, often marked by drilling a hole in bedrock and painting a ring around it, with a defined location in three dimensions (altitude, latitude, and longitude, not time, mass, and energy) from which the locations of other points can be determined. Etymologically the word can be traced back to the British weaving industry in the 18th century. The original meaning is uncertain, but is in any case irrelevant to the use of the word in a management context. In management, the terms benchmark and benchmarking are used as a metaphor for criteria of efficiency in the form of correct key indicators, but above all they refer to a process in an organization that leads to more efficient operation in some respect, better quality, and/or higher productivity. The word benchmark has been used in functional parts of companies to denote calibrated key indicators or other comparable quantities. In the former sense, the word has been used since the 1960s for comparison of production costs in computer operation. In the advertising business it has been used in comparing quality/price relationships between comparable products. Nothing is new under the sun. That also applies to benchmarking. The practice of acquiring knowledge from colleagues has been customary in the medical profession ever since the days of Ancient Egypt, and a similar phenomenon has been observable in Japan since the Imperial Meiji Restoration of the 1860s. The origin of benchmarking in its modern, deliberately applied form is, however, generally ascribed to Rank Xerox, which in 1978 was severely shaken by price competition from Japanese manufacturers of small and medium-sized office copiers. This competition was initially dismissed as dumping or faulty arithmetic, but then a fact-finding expedition was sent to Japan and found, to its dismay, that the Japanese had managed to reduce their production costs far below what Rank Xerox then considered feasible; their own targets for productivity improvement turned out to be hopelessly inadequate. IBM likewise adopted benchmarking a decade later, but on a scale that was far too limited; their benchmarking was between their own production units and with other mainframe computer manufacturers. The most important aspects of current benchmarking can be summarized as follows:
1. A complete and correct description of the processes and activities that create value-adding performance
2. Correct and accepted comparison with another party, a good example
3. In-depth understanding of causality, i.e., of the differences in work organization, skills, and so on, that explain differences in performance. (In short: why, why, why?)
4. Reengineering of work organization and routines and development of new skills to make operations more efficient; inspiration from, not imitation of, the partner
5. A goal-related, result-rewarded process of change that uses benchmarking as the starting point for an institutionalized search for new examples for continuity
Benchmarking is thus essentially a dynamic, improvement-oriented process of change, not a static comparison of key indicators which at worst are not calibrated and therefore not comparable. This last must be particularly emphasized, because benchmarking has often been wrongly applied using uncalibrated indicators in which nobody believes, therefore making them entirely worthless. New methods, usually with catchy names, are constantly being launched in the field of management. The danger of a label like benchmarking is that it may be filed away in the same pigeonhole with many other concepts whose effectiveness is only a fraction of that which benchmarking possesses. So it is important for the reader to make this distinction in order to appreciate the usefulness of the method. Its potential lies perhaps not so much in the quick and easy boost to efficiency that it offers as in the institutionalized learning process, which is hereinafter termed Benchlearning. That name is a registered trademark and stands for organizational skill development. Benchmarking can be advantageously defined in terms of what it is not, namely:
1. Key indicator analysis
2. Competition analysis
3. Imitation
It may, however, be useful to emphasize a few side effects that often tend to become the principal effects of benchmarking:
1. Definition of the content of one's own work
2. Determination of measurement parameters
3. Sound learning, which generates a demand for even more knowledge
Benchmarking is almost always initiated by the management of a company or organization or unit thereof. The aim is almost always to improve efficiency, leading to higher quality or productivity. The mistake is often made of taking certain things for granted, for example, that processes and activities are defined and that parameters exist for measuring them. Such is in fact seldom the case, and this necessitates a thorough review of the situation, which performs the highly useful function of creating awareness of what the tasks are and how they relate to the company's corporate mission. Institutionalized learning is seldom if ever the focus of a benchmarking process. Its psychosocial consequences are thus not the original reason for starting the project. In the best case they come as a pleasant surprise, provided, of course, that all concerned have been given time to reflect, make their own discoveries, and solve their problems. The one thing that is more crucial than anything else to success in any organized enterprise is efficiency. This concept is therefore considered in detail next.
B. Efficiency: Technocracy, Software, and Reality Half in jest, I often define a technocrat as a person who approaches technical or economic problems from a strictly rational standpoint, unmoved by human, cultural, or environmental considerations. Semantically, the term efficiency is an exceptionally vague one. Most people associate it with productivity in the sense of “working harder and faster.” In the
line of reasoning we shall follow here, productivity is just one axis of the efficiency graph. Productivity means doing things right, i.e. making or delivering products and services of a given quality at the lowest possible unit cost. Let us begin by looking at an efficiency graph (Fig. 1), on which the following discussion is based. As the graph shows, efficiency is a function of value and productivity. This means that an organization can achieve high efficiency either by working on its productivity to lower its production costs, thereby enabling it to offer lower prices and thus greater value, or by concentrating on quality, which increases the numerator of the value fraction and offers the options of either greater volume or higher prices. Conversely, inefficiency can take two forms. One is when the value (price/quality ratio) offered to the market is acceptable, but low productivity leads to losses when the product is sold at the price the market is willing to pay. That, until recently, was the situation of several automakers like Saab, Volvo, and Mercedes-Benz. The market was willing to pay a given price for a car of a given design, but unfortunately the cost of making the car—materials, components, and labor—was far too high. In the automotive industry they use assembly time in hours as an approximation of productivity. An unprofitable factory with a sellable model can thus define its short- and medium-term problem as one of productivity and try to reduce the cost (or assembly time) per unit, assuming that the situation has been correctly analyzed. This brings us to one of the key questions regarding the graph, along with two others: 1. Where are we on the graph compared to our competitor or whomever we are comparing ourselves with?
Figure 1 The efficiency graph.
2. Given our position, in which direction should we move on the graph?
3. How quickly do we need to do it, i.e., how much time do we have?
Sometimes you can get halfway to solving the problem simply by asking the right question. If Saab, for example, has produced a model that the market is unwilling to buy in sufficient numbers at a price that is impossible to achieve, there is no point in messing about with assembly time and productivity. There are two analytical parameters, value and productivity. The foregoing example illustrates a situation in which value is judged to be satisfactory and productivity is the problem. The reverse situation, alas, is much more common and more treacherous, i.e., one in which productivity is high but the value is too low or, even worse, has never been tested on a market. The most obvious examples of the latter situation are to be found in production of tax-financed public services. Productivity in the public sector in, for instance, my own country, Sweden, is often remarkably high. Measured in terms of processing times for inspection of restaurants or issuing of building licenses or passports, Swedish civil servants are outstandingly productive compared to their counterparts in other countries of which I have experience, such as the United States, the United Kingdom, France, or Germany. The problem, just as in companies, is that their products and services have never been put to the market test. This frequently results in production of things for which there would be no demand if they were offered on a free market where the customers could make their own choices. The situation is obscure but probably even worse in countries that also suffer from low productivity in the public sector. Many of us have noticed this since joining the European Union (EU), which has meant that yet another publicly financed system has been superimposed on those we already had. The efficiency graph raises some central issues in a benchmarking context:
1. How do we define what we produce?
2. What is the cost per unit produced?
3. Who uses what we produce?
4. By what criteria do users judge what we produce and deliver?
Productivity in administration and production of services is a subject that has received scant attention in the literature. The question of manufacturing productivity, on the other hand, has been discussed in detail ever since the heyday of F. W. Taylor in the beginning of the 20th century. The reason, of course, is that direct material and labor costs have historically been the dominant components in the selling price of a product. John Andersson has put it this way:
There used to be 14 blacksmiths for every clerk. Now there are 14 clerks for every blacksmith.
The sharply rising proportion of distributed costs has prompted growing interest in distributed costs in general and administrative overhead in particular. This is one reason why ABC (activity-based costing) analysis and other aids to calculation have been developed. In an era of mass production and mass consumption, productivity in manufacturing was a prime consideration. It was more important to have a car, any kind of car, than no car at all. The same applied to refrigerators, shoes, etc. Most of what the economist Thorstein Veblen has called conspicuous consumption takes place in modern Western societies. That term refers to values which in Abraham Maslow's hierarchy of needs are classified as self-realizing. These developments have led to a growth of interest in value theory. There is a shortage of literature, and indeed of general knowledge, about the value axis of the efficiency graph. Value is a subjective phenomenon. Perhaps that is why analysts, regarding it as vague and lacking in structure, have paid so little attention to it. Adam Smith, the father of economic science, tried to explain value in terms of the amount of work that went into a product; but he never succeeded in explaining why a glass of water, which is so useful, is worth so little while a bag of diamonds, which is of no practical use, is worth so much. It was the German economist Hermann Heinrich Gossen (1810–1858) who came to consider how precious water becomes to a thirsty traveler in a desert. That thought led him to formulate the theory of marginal utility, as follows: The marginal utility of an article is the increment to total utility provided by the most recently acquired unit of that article.
Gossen’s simple observation was that the parched traveler in the desert would be willing to trade a sizable quantity of diamonds for a liter of water, because the utility of the water far exceeded that of the diamonds. Value theory and its corollaries regarding quality will command increasing attention from leaders of enterprises. This is especially true of leaders of units that have a planned-economy relationship to their “customers.” In most cases it is desirable to specify the value axis by constructing a value graph with quality as the abscissa and price as the ordinate, as shown in (Fig. 2).
Figure 2 The value graph (experienced customer value plotted against quality and price).
Quality stands for all the attributes of products and services that together configure the offering. Price is the sacrifice the customer must make in order to take advantage of the offering. What Veblen calls conspicuous consumption is a deviation, such as when somebody buys a status-symbol car like a Jaguar at a price far in excess of reasonable payment for performance. In this case, the price functions as a signal of quality to the rest of the world. A consequence of this is that the volume of sales does not increase if the price is reduced. Conspicuous consumption situations are fairly common nowadays, occurring in connection with all prestige brands. Fake Rolex watches made in Thailand do not encroach on the market for real Rolex watches, because they are sold to people who would never dream of paying what a genuine Rolex costs. In the case of internal deliveries between departments in the planned-economy environment that prevails within a company, the price consists of a production cost that is seldom paid directly by the receiving unit. In this case, the price does not represent a sacrifice on the buyer’s part. The quality of the delivery, however, can be judged. These are very important considerations with regard to productivity measurements in planned-economy systems, as well as in total quality management (TQM) and similar schemes aimed at monitoring efficiency in units of companies and organizations. In the application of benchmarking it is of course extremely important to get into the habit of considering both axes of the efficiency graph, i.e., value (quality in relation to price) and productivity (cost per unit). The highest price that a product or service can command is determined by the value that customers put on it. The lowest price that the supplier can charge without going broke is determined by productivity, because no company can stay in business for long if its production costs are higher than its rev-
enues. Productivity, therefore, influences the customer’s perception of value in that higher productivity makes it possible to offer a lower price and thus higher value. Quality, on the other hand, usually costs money, when we are talking about customer-perceived quality. More space between the seats in an aircraft means fewer passengers and therefore a higher cost per passenger-kilometer. In medical services, increased consultation time per patient is likewise a quality parameter that adds to the cost of providing the service. Improving quality in the other sense, i.e., reducing the frequency of rejects in production, does not normally increase costs but reduces them instead. This is done by tightening the production flow and eliminating the cost of rework by taking proactive measures (Fig. 3). The concept of efficiency is central not only to benchmarking, but also to all forms of enterprise and organization, so a full understanding of the meaning of efficiency is essential.
C. Categories of Benchmarking According to the individual situation, benchmarking can be divided into a number of categories, and extremes can be contrasted with each other. Some of the more important ones are as follows:
• Strategic and operative benchmarking
• Internal and external benchmarking
• Qualitative and quantitative benchmarking
• Same industry and cross-industry benchmarking
• Benchmarking in a leading or supporting role
• Benchmarking for better performance or for world-class performance
The purpose is instructional, i.e., to make the reader aware of angles that enable benchmarking to be done more effectively.
1. Strategic and Operative Benchmarking One of the great advantages of benchmarking is that the method can be applied to both strategic and operative issues. The dividing line is by no means sharp. Strategy is defined here as “action taken at the present time to ensure success in the future.” Strategy thus aims at achieving good results not only now but next year and the year after that too, though the term is sometimes loosely used of any issue that is of major importance, regardless of the time frame.
Figure 3 Quality.
Operative management, on the other hand, has to do with all the day-to-day problems that must be solved right now to ensure that deliveries can be made to customers without a hitch. Strategy is always important, but seldom acute. Operative problems, on the other hand, may be both important and acute. (You can read more about this in Conflicts of Leadership, Bengt Karlöf, Wiley, 1996, and Strategy in Reality, Bengt Karlöf, Wiley, 1997.) A rough distinction between strategic and operative issues is made in Fig. 4. As a strategic instrument,
benchmarking can perform the function of identifying new business opportunities or seeking areas where dramatic improvements can be made. Identification of the latter is made harder in many organizations by the absence of competition, and in such cases benchmarking performs a strategic function. Probably the most widespread application of benchmarking is as an operative instrument to identify ways to improve operations. The aim of functional or process benchmarking is to seek and find good models for operative improvements. The great challenge
Figure 4 Strategic and operative efficiency.
of the future will be to apply benchmarking in all parts of the business that operate under planned-economy rules. The conclusion from this is that the principal application of benchmarking will be operative. That statement is not intended in any way to belittle the importance of the strategic angle, but because competition is endemic to business, and because competition can be regarded as an ongoing form of benchmarking, one can safely conclude that all the planned-economy parts of an organization will have a strong motive to find indicators of their efficiency through benchmarking.
2. Internal and External Benchmarking Benchmarking will be imperative in organizations that contain a number of similar production centers within themselves. Internal benchmarking means making comparisons between similar production units of the same organization or company. It may sometimes seem strange that organizations exposed to the full force of competition do not use benchmarking to the extent that one might expect. This applies, for example, to banks and airlines. Learning how to improve efficiency within one’s own company ought to come first, indeed it should be self-evident. External benchmarking raises numerous questions. Many people seem to assume that benchmarking must involve comparisons with competitors, but this is not necessarily so. Of all the projects I have worked on, only a vanishingly small minority have involved competitors. Where this has happened, the subjects of study have been processes at an early stage in the valueadded chain, like production and project engineering, which are not regarded as competitively critical. Almost all industries offer opportunities for benchmarking in which the competitive situation is not a sensitive issue. A construction firm in England, for example, may pick one in North America with which it is not in competition. A European airline may benchmark its medium-haul operations against an airline in Southeast Asia, where the degree of competition is negligible. A step-by-step procedure often follows the sequence of • Internal comparisons • Benchmarking within the same industry • Good examples outside the industry I would like to emphasize that companies should start by picking partners whose operations are as closely comparable as possible to avoid straining people’s abil-
ity to recognize similarities. Later you can gradually move further afield to study situations that may differ more from your own, but where even greater opportunities for improvement may be found. In many cases there is a tendency to regard internal benchmarking as simpler. This is true in some respects, but not in others. The instinct to defend one's territory and not give anything away can be an obstacle to contacts with colleagues in other parts of the organization. A combination of internal and external benchmarking has frequently proved fruitful. This means seeking both internal and external reference points simultaneously and thus opening a wider field to the participants, giving them insights into situations other than those with which they are already familiar. Variety is valuable in itself in enhancing the instructiveness of benchmarking; the participants are stimulated and inspired by glimpses of working environments that differ somewhat from their own.
3. Qualitative and Quantitative Benchmarking The object of any organized activity is to create a value that is greater than the cost of producing it. This applies to staff work just as much as to selling and production, and it applies to all kinds of organizations regardless of mission or ownership. In some cases quality and productivity may be exceedingly difficult to measure. The personnel department of a company, regardless of what that company does, is likewise expected to produce a value greater than the cost of producing it, but in such a department it may be far from easy to answer the questions that touch on efficiency:
1. What do we produce and deliver?
2. What does it cost per unit?
3. Who evaluates what we deliver?
4. On what criteria is the evaluation based?
A personnel department normally exists in a planned-economy environment, operating on money allocated by top management and lacking customers who have a free choice of supplier and pay for what they get out of their own pockets. This makes the questions hard to answer, but does not make them any the less relevant. The processes of personnel or human resource management are fairly easy to define and structure; the hard part lies in defining who the buyers of the department's services are and what set of criteria should be used to dimension the resources allocated to the department. Qualitative benchmarking may be preferable in such cases, i.e., a description of processes and activities
followed by a comparison with the way similar things are done in another organization that serves as a good example. This is usually called descriptive or qualitative benchmarking, and in many cases it is quite as effective as the quantitatively oriented kind. Qualitative benchmarking can often be supplemented by measurement of attitudes, frequency studies of typical cases, and isolation of customized production. The latter includes the making of studies and other work of a nonrepetitive nature.
4. Same Industry or Same Operation? Some of the most important questions here have already been dealt with under the heading of internal and external benchmarking. One client in the telecom business said: “The telecom industry is tarred with the monopoly brush all over the world, so we must look for examples outside the industry.” The same is true of many European companies and other structures that are largely shielded from competition. In such cases it is advisable to seek examples from other industries. Experience shows that it is this type of exercise that reveals the widest performance gaps, and thus the greatest opportunities for improvement. You can however look around in your own industry if good examples are available at a sufficient geographical remove that the awkward question of competition does not arise. European airlines can go to the United States, and firms in the building trade can undoubtedly find useful objects of comparison all over the world. Some examples of cross-industry benchmarking are shown in Table I. They include some well-known and illustrative examples of how inspiration can come from quite unexpected places. The recognition factor—how easy it is to relate what you find to your own experience—determines how far you can safely move outside your own familiar field of business.
Table I Examples of Cross-Industry Benchmarking (Global best practices: Take a look outside your own industry). Key process, followed by good examples:
Defining customer needs and customer satisfaction: Toyota (Lexus), British Airways, American Express
Dealing with customers' orders: DHL, Microsoft
Delivery service: UPS (United Parcel Service), Electrolux, Atlas Copco
Invoicing and debt collection: American Express, Singapore Telecom
Customer support: Microsoft, Word Perfect
5. Benchmarking in a Leading or Supporting Role Benchmarking can be run as an adjunct to other methods and processes of change or the methods and processes can be run as adjuncts to benchmarking. In a number of cases, business process reengineering (BPR) has been clarified and made more manageable by the addition of benchmarking. In other cases process definitions and measurement criteria may be the object of benchmarking. The same applies to Total Quality Management (TQM), just-in-time (JIT), kaizen (continual improvement), lean production, and other approaches. The educational aspect of benchmarking helps to identify both the need for change and opportunities for improvement. The ranking order should be determined by the kind of change desired. If uncertainty prevails on that point, running an exploratory benchmarking exercise can be useful. That is usually an excellent way to find out what needs to be changed. If you go on from there to consider how and why something needs to be changed, benchmarking can provide a source of inspiration as part of a broader program of change. It may also be used as the principal instrument for improvement, preferably in conjunction with the learning that takes place when benchmarking is further developed into Benchlearning. That method takes full advantage of the institutional learning process that benchmarking can generate. This approach is particularly suitable where the object is to combine business development with skill development in parts of companies. It is, of course, impossible to predict all the situations of change that may arise, but it is always advisable to try benchmarking as a powerful educational accessory to other schemes in cases where it is not the principal instrument of change.
6. Benchmarking for Better Performance or for World-Class Performance
Sometimes an organization must strive for world-class performance to secure its own existence and survival, as for example in the case of nuclear power station builders, international accountancy chains, or credit card operators. The rule of thumb is that the more concentrated an industry is, the more important it is to achieve world-class performance in the organization as a whole and all of its parts. There is, however, an unfortunate tendency to strive for peak performance without regard to the company's frame of reference or the competition it faces. The aim should be to seek an optimum rather than a maximum solution, i.e., to gradually realize the potential for improvement that is achievable through effort and determination. To some extent it may be useful to discuss the world-class aspect, especially in Europe where pressure of competition is low. Many organizations suffer from an odd combination of overconfidence and an institutional inferiority complex. One often encounters the attitude that our public service, despite lack of competition, is of course the best of its kind in the world and that there is nothing else that can be compared to it. The primary weakness of this line of reasoning is that it is probably not true, and a secondary one is that it assumes there is nothing to be learned anywhere, which is never true. The discussion, if we can manage to conduct it unencumbered by considerations of prestige, often leads to the opposite conclusion by revealing a number of imperfections that not only indicate potential areas of improvement but also betray a sense of inferiority in important respects. Low pressure of competition encourages an overconfident attitude that is often only a thin veneer in the performance-oriented world of today. Willingness to learn is obviously a good thing. Such an attitude is characteristic of successful organizations; the trouble is that the very fact of long-term success makes it very difficult to maintain. The literature on benchmarking generally recommends world-class performance as the aim. That is all very fine if you can get there, but the road may be a long one. If the shining goal of the good example is too far off, the excuse-making reflex sets in and presents a serious obstacle to the process of change. The ideal is to be able to find a few partners at different removes from your own organization; this maximizes the force for change, enabling good practice to be studied in various organizations and to serve as a source of inspiration for improving your own.
Figure 5 Inspiration from a benchmarking partner: one's own conception of what is possible compared with the benchmarking partner's performance. [From Karlöf, B., and Östblom, S. (1993). Benchmarking. Chichester: Wiley.]
Figure 5 illustrates
how the creativity of one’s own organization can translate inspiration from a partner into performance that surpasses the latter’s. In the ideal situation, this leads to a self-sustaining process of improvement, Benchlearning. When you set out to organize and mount a benchmarking project, you should consider a number of questions that may give rise to problems at the implementation stage, e.g. between consultant and client: 1. Who does the necessary research? 2. Who is the customer? 3. How many people are involved and in what capacities? 4. Who should be included in the project group? 5. What can be classed as a “good example”? 6. What experts need to be consulted in what areas (e.g., ABC analysis)? 7. What special analyses need to be made (e.g., frequency studies and time measurements)? 8. How much money is available for the project? 9. What kind of skills and experience are needed on the project team? 10. Organize a seminar on success factors and pitfalls! If you pay close attention to these questions and answer them properly, you will minimize the risk of running into difficulties along the road.
III. METHODOLOGY OF BENCHMARKING After some 10 years of learning lessons from benchmarking, I have now adopted a 10-step methodology.
How the process is divided up is actually just a teaching device. An experienced benchmarker can make do with fewer steps because he or she is familiar with what is included in each of them.
1. Explanation of Method, Leadership, and Participation People with extensive experience in benchmarking tend to seriously underestimate how difficult and time consuming it is to explain the method and get people to understand it. The natural way to structure the explanation is to use the lessons related here and the structure of the method shown below: 1. A detailed explanation of the benchmarking method with time for questions and discussion 2. The concept of efficiency as a basis for diagnosis, measurement, and comparison 3. Definitions of concepts, terminology, and methods that are recognized by all members of the group 4. A survey of any special circumstances that may motivate a departure from the 10 steps presented here
2. Choosing What to Benchmark The natural place to start is with units whose efficiency is known to be below par. The units concerned may have administrative functions like finance or information technology, but they can just as easily be line functions like aircraft maintenance or luggage handling. In the telecommunications field it may be invoicing or productivity in access networks. In insurance companies it may be claims adjustment, in banks creditworthiness assessment, and so on. In every industry there are production units of a generic nature like IT as well as industry-specific processes like claims adjustment (insurance), clinical testing (pharmaceuticals), etc. Generic support processes can often be compared across industrial demarcation lines, whereas more specific processes call for a closer degree of kinship. To sum up, we are looking at a situation in which efficiency is clearly in need of improvement.
3. Business Definition and Diagnosis A considerable proportion of the time devoted to benchmarking is spent in defining the work content of the business, or in what I have called diagnosis. Put bru-
tally, the statement may sound like an extremely harsh judgment. What we are actually saying is that people have failed to define what they are doing and that they therefore do not know their jobs. Sadly, that brutal statement is corroborated by repeated and unanimous experience. Just as learning is such an important side effect of benchmarking that it contends for the title of principal effect, business definition or diagnosis is so important that it deserves to be highlighted as the special value that benchmarking often creates, even though that was not the original intention.
4. Choosing Partners Choosing a partner is naturally an essential feature of benchmarking. In a structure with multiple branches, the choice might seem to be obvious—the partners are right there in your own organization. Experience shows, unfortunately, that even if the identity of the partners is obvious, the difficulties involved in making contact and collaborating are actually greater than with an external partner. As a result of decentralization, often carried to extremes, there are a lot of people in an organization who regard definition as their own prerogative and therefore see no reason to harmonize process descriptions, measurement criteria, etc. So the partnership contact that looks so easy may turn out to be fraught with great difficulty.
5. Information Gathering and Analysis We are now ready to proceed to one of the core areas of benchmarking: the collection of information and the analysis of descriptive and numerical quantities. The descriptive quantities are things like work organization, skill development, and other areas relevant to the comparison that cannot be expressed in figures.
6. Inspiration from Cause and Effect Benchmarking indicates not only what can be improved, but also how and why. This is what we call a cause-and-effect relationship, causality or why, why, why? Even in studies that use calibrated key indicators, the important explanations of differences in performance are often overlooked. You do not have to go down to the level of detail of describing every manual operation, but you should realize that if the drivers in your partner company take care of routine checks that have to be made by skilled technicians in your own company, that goes a long way toward explaining differences in performance.
7. Follow-Up Visit
The follow-up visit qualifies as a separate step because it needs to be taken into account at the planning stage and in initial discussions with partners. If you fail to schedule at least one follow-up visit as part of the program, your partner may feel neglected. One or more follow-up visits are a perfectly normal feature of benchmarking. The main items on the agenda of the follow-up visit are
1. Checking figures and measurements
2. Testing hypotheses to explain gaps
3. Identifying new explanations of why differences in performance exist
4. Discussing obstacles to improvement
8. Reengineering
The word reengineering is used here to denote changes in processes or flows, activities, work organization, system structures, and division of responsibilities. Benchmarking may be supplemented by other points of reference. Two such points of reference may be well worth considering in some cases because of their great instructive value:
1. Own history (experience curve)
2. Zero-base analysis
9. Plans of Action and Presentation
The reason why these two apparently disparate items have been lumped together under one head is that they are interdependent. The substance of the plan of action influences the form in which it is presented and vice versa. Planning the changes prompted by a benchmarking study is no different from any other kind of project work. The benchmarking element, however, simplifies the job of planning as well as the presentation, and the changes will be easier to implement and better supported by facts than in projects of other kinds. The plan of action covers the following principal parameters:
1. Strategies and goals
2. Studies
3. Activities
4. Responsibility
5. Time
6. Resources
7. Results
10. Change and Learning
Change management is much more difficult and demands much more energy than is commonly supposed at the outset. Even when the message has been received, understood, and acknowledged, a tremendous amount of work still remains to be done.
IV. BENCHMARKING—PITFALLS, SPRINGBOARDS, AND OBSERVATIONS Leaders of business know by experience that good examples, well chosen and presented, have great educational value. They have also learned to discount the value of key indicators that nobody believes in and that only provoke the excuse-making reflex in the organizations and individuals concerned. In addition, they understand the logic of shifting the burden of proof: Nobody wants to make a fool of himself by rejecting changes when there is hard evidence to show that others have already done so and that the results have been successful. Anybody who did reject them would be insulting her own intelligence. There are, however, innumerable kinds of less blatant behavior that can detract from the effect, but there are also ways of finding shortcuts that lead much more quickly to the goal of improvement. What follows is a list—not in any particular order, but based on experience—of some of the pitfalls, springboards, and observations.
A. Pitfalls of Benchmarking 1. The Effect of Benchmarking Is Binary Thoroughness in calibrating numerical data and consistency in the descriptive parts of the analysis are essential. The requirements here are full comparability, elimination of noncomparable factors, and acknowledgment by the people concerned that the highest possible degree of comparability has in fact been achieved. If those people go through the same intellectual process that you have gone through and accept the comparability of the findings, that effectively disarms the excuse-making reflex. If, in addition, you have been really thorough about gathering data and can show that you have considered and measured all of the relevant factors, you will avoid the yawning pitfall you can otherwise fall into—the pitfall of lack of calibration, comparability, and acceptance.
2. Beware of Distributed Costs
There used to be 14 smiths for every clerk. Now there are 14 clerks for every smith. With IT as an enabler, with a shrinking proportion of direct costs and a growing proportion of overheads, it has become increasingly hard to assign cost elements to a specific operation. This difficulty can be overcome by using either activity-based costing (ABC) analysis or indicators that exclude overheads to secure acceptance of cost distribution and comparability. I have had personal experience of situations where certain cost elements were charged to a higher level of management in one unit than in another. Much to our embarrassment, this was pointed out to us when we made our presentation, and we had to go back and revise some of the material.
3. Do Not Save Everything for the Final Report One way to avoid the embarrassment of being caught out in errors of comparability is to present conclusions in the form of a preliminary report that is expected to be adjusted prior to the final presentation. This defuses the emotional charge in the analysis and enables a kinder view to be taken of any errors.
4. Do Not Be a Copycat A lot of people with no firsthand experience of benchmarking think it means copying the behavior of others. That is, of course, true to the extent that good practice is well worth following. But the real aim of benchmarking is inspiration for your own creativity, not imitation of somebody else. Benchmarking is intended to promote creativity, which can be defined as the ability to integrate existing elements of knowledge in new and innovative ways. What benchmarking does is to supply the necessary elements of knowledge.
5. Do Not Confuse Benchmarking with Key Indicators
Let me recapitulate the levels of benchmarking:
1. Uncalibrated key indicators that nobody believes in and that therefore have no power to change anything
2. Calibrated key indicators unsupported by understanding of the differences in practice and motivation that explain differences in performance (why and how)
3. Calibrated key indicators supported by understanding of why and how (this is real benchmarking, and it leads to spectacular improvements)
4. Learning added to benchmarking, which then evolves into Benchlearning
The greatest danger to the future destiny and adventure of benchmarking is that it may degenerate into bean-counting with little or no calibration. So let us be quite clear about what benchmarking really means.
6. Complacency Is a Dangerous Enemy The attitude that “we know it all, know best and can do it ourselves” is a cultural foundation stone of many organizations that reject benchmarking as an instrument for their own improvement. Such organizations react vehemently with the excuse-making reflex when benchmarking is applied to them. The danger of complacency is greatest at the outset. When benchmarking is introduced in an organization with a culture of complacency, it is often dismissed with scorn. Many successful organizations have applied benchmarking too narrowly and consequently have run into structural crises. Rank Xerox, where the benchmarking method originated, had set productivity improvement targets that proved hopelessly inadequate. IBM benchmarked between its own units, but not to any significant degree outside its own organization, being overconfident in its own superiority. Uninterrupted success is an obstacle to the learning and improvement of efficiency that benchmarking can offer.
7. Benchmarking Is Not the Same Thing as Competition Analysis
I have concerned myself very little with what is sometimes called competitive benchmarking. This is because benchmarking is not at all the same thing as competition analysis, which traditionally consists of charting
1. Market shares
2. Financial strength
3. Relative cost position
4. Customer-perceived quality
These are highly interesting items of information, but they do not constitute benchmarking. If benchmarking is done with a competitor as the partner, it must be done in areas where the competitive interface is small. Some industries, the automotive industry for example, practice what they call co-opetition—a mixture of cooperation and competition. The farther back you go along the integration chain, the easier it is to collaborate. Conversely, it is very hard to collaborate at the front end where you are directly competing for customers.
8. Do Not Let Your Own Administrators Take Charge of New Methods
In the 1980s companies set up quality departments. These have gradually expanded their domains to include productivity aspects like BPR and, of course, benchmarking. Having projects run by your own staffers may look like a cheap way of doing it, but it is seldom effective. In fact, I have never seen an in-house benchmarking project that was really successful. The resources allocated are inadequate, the project is a secondary assignment for the people in charge of it, and the method is not fully understood. It is the line management that must be motivated and act as the driving force, and it is desirable to retain outside consultants for three reasons:
1. The project must be somebody's primary responsibility, which is never the case when the project manager is recruited internally
2. Benchmarking calls for specialized knowledge that few employees possess
3. People from inside the company are suspected of having their own corporate political agenda, which reduces their credibility
So make sure that somebody has the primary responsibility. That probably means bringing in an outside consultant.
9. Benchmarking Risks Being Viewed as a Management Fad
The semantic mess that pervades the whole field of management is aggravated by the confusion caused by all the new methods that burst on the scene from time to time, only to fade away again. We once ran a poll asking a large number of respondents to draw a life-cycle curve for benchmarking as a method. This was done in relation to other approaches like BPR, TQM, lean production, reinvention of the corporation, JIT, conjoint analysis, and so on. Though the interviewees had no special preference for benchmarking, their verdict was that it would stand the test of time. Figure 6 shows the predicted life-cycle curve for benchmarking as a management technique compared
to those of other unnamed methods. Such a claim is, of course, what one would expect in an article about benchmarking, but there is much independent evidence that testifies to the staying power of the method.
Figure 6 Life cycle of benchmarking (importance of benchmarking plotted against time).
10. Our Banana Is the Best in the Bunch All organizations prefer to make their own inventions rather than give the credit to somebody else. A European telecom company spent about $70 million on developing its own invoicing system instead of adopting an existing system from Australia that met the same specifications. The tendency to reject good examples is thus not entirely a matter of lack of comparability in the material, but may also be attributable to the “not invented here” syndrome.
B. Springboards for Benchmarking Springboards (success factors) for benchmarking are necessarily a mirror image of the pitfalls. Let us combine the success factors with some pertinent observations that are crucial to the success of a benchmarking project.
1. Get Staff Really Involved To ensure success, the organizational units and individuals affected by the project must be given the opportunity to go through the same intellectual process as the project management. That will predispose them to accept the findings of the benchmarking study, which in turn will make it easier to take the step from thought to deed. Improvements in understanding, behavior, and results will all be accelerated if people are allowed to take an active part. So allocate a share of the resources to getting them really involved.
2. Make Special Measurements to Acquire New Information
There is never enough information in existing reporting systems. Efforts to improve efficiency nowadays are directed mainly at administrative work, and this calls for time management studies, floor traffic studies, descriptions of sales and other processes, and so on. Special measurements have double value in that they both get people involved and provide new information that is often interesting and helpful for purposes of change management. You may sometimes experience problems in persuading your partner to make the necessary measurements, but this is seldom an insurmountable obstacle. If you start with the two parameters of efficiency—quality and productivity—you can usually find innovative approaches, methods, and measurements that make a significant contribution to the project.
3. Benchmarking Contributes to Both Gradual and Quantum-Jump Change
The magnitude of the improvements that can ultimately be achieved is impossible to judge in advance. Experience shows, however, that although you can sometimes find hitherto unsuspected possibilities for making major breakthroughs, you will usually have to be patient and take many small steps that will ultimately lead to great gains in efficiency. Radical change management is often prompted by a crisis situation in which the very existence of the business is threatened. Benchmarking, on the other hand, involves a constant search for good examples to serve as models for the modest improvements that will enable you to avoid crises, i.e., to be proactive instead of reactive.
4. The Whole Conceals Inefficiencies in the Parts
A commercial company is evaluated by its bottom line. Other types of organizations have their own criteria for success. But if you look at the component parts of the organization—departments like accounting, personnel, and IT or processes like sales and aftermarket—you may find a grave lack of efficiency. There may thus be a substantial potential for improvement even where the overall result appears to be good. In the future, every head of department and process owner must be prepared to answer the question: "How do you know that your operation is efficient?" Benchmarking is an unsurpassed instrument for detecting inefficiencies concealed in parts of the whole.
5. Benchmarking Is a Surrogate for Market Economy
A user of goods or services inside a company or organization lacks the freedom of choice between suppliers that exists in a market economy. The trouble is not only that the value of an internal delivery is not measured against a price, but also that what is delivered may not in fact be necessary at all. Units of organizations often justify their existence and growth by being productive, but the value of what they produce is not assessed. Benchmarking helps you make that assessment.
6. Benchmarking Encourages Performance-Oriented Behavior Relations between individuals and units in large organizations are often, alas, governed by power rather than performance. Because the actual performance of a unit is so hard to measure, managers tend to devote more attention to matters of form and irrelevant factors than to things that really contribute to the success of the company as a whole. A person may be judged by his or her style, connections, appearance, or career status for lack of ways to measure actual performance and contribution to achieving the company’s aims. It has become increasingly evident that benchmarking appeals to strongly performance-oriented people, by which I mean people who reckon success in terms of getting results. In this they differ from power-oriented and relation-oriented people, who are motivated by other considerations.
7. Proceed Step by Step What we call the cascade approach means starting with a broad, shallow comparison (low level of resolution) and then taking a closer look at those areas that appear to offer the greatest opportunities for improvement. The first step in the cascade approach is exploratory benchmarking, a study that aims at identifying the areas with the greatest potential for improvement. Such a study seldom explains the reasons for differences in performance, but it does suggest where you should start digging deeper to look for reasons. This step-by-step approach is particularly recommended in cases where there are no obvious candidates for improvement.
8. Benchmarking Encourages Learning in Decentralized Systems Organizations and companies with a multiple-branch structure and delegated profit-center responsibility
often lack a mechanism for system-wide learning. One petrol station does not learn from another, nor one bank branch or retail outlet from another. One of the obvious applications of benchmarking in the future will be to enable units of decentralized systems to learn from each other. This will take advantage of the organization's accumulated knowledge—an advantage that ought to be self-evident but is often overlooked. Failure to make use of knowledge from individual units of the organization ("advantages of skull") is astonishingly widespread.
9. Benchmarking Unites Theory and Practice When the individuals in a working group or process are made to lift their gaze from their day-to-day tasks and consider their performance objectively, they form a theory about their own work. This has proved to be of great help in motivating organizations. People like to unite theory with practice because it makes them feel motivated. That in turn leads them to put more energy into their work, which results in a win–win situation where the employer benefits from increased energy and efficiency, while the employees are happier in their work. The function of theory is to lay a foundation for good practice. Benchmarking combines the theory of what the job in hand is really about with concrete inspiration from good examples to optimize the prospects for successful change management.
10. Benchmarking Benefits the Business and the Individual Traditional training, especially in Scandinavia, focuses on the individual. People are sent off on courses to improve their qualifications. This is not always compatible with the employer’s desire for better performance and efficiency. In the worst case the individual concerned may quit and use his qualifications to get a job with a competitor. Benchmarking ensures that the business as a whole benefits from learning; knowledge becomes an asset of the organization rather than the individual. That way the skills stay in the company and the working unit even if individuals depart. This reconciles the requirements of a benevolent personnel policy with management’s need for an efficiently run business. The intellectual simplicity of the benchmarking method may be a pitfall in itself. There is a great risk of underestimating the problems. So if you are contemplating a project, use the foregoing points as a checklist to help you avoid the mistakes that others
have made before you and to take advantage of the lessons they have learned.
V. SUMMARY
Benchmarking is widely known and used, and has been for a long time. The most important elements in benchmarking as of today are as follows:
• A complete and correct description of the processes and activities that create value-adding performance
• Correct and accepted comparison with another party—a good example
• In-depth understanding of causality, that is, of the differences in work organization, skills, and so on, that explain differences in performance (in short: why, why, why?)
• Reengineering of work organization and routines and development of new skills to make operations more efficient; inspiration from, not imitation of, the partner
• A goal-related, result-rewarded process of change that uses benchmarking as the starting point for an institutionalized, continuing search for new good examples
One key factor in benchmarking is efficiency, which, properly understood, is a function of value and productivity. Related to this are issues of how to define what is produced, what the cost per unit is, who the users of the product are, and by what criteria those users evaluate the product.
Benchmarking can be divided into a number of categories, among them strategic or operative benchmarking, internal or external benchmarking, and qualitative or quantitative benchmarking. Furthermore, benchmarking can be conducted within the same industry or cross-industry. Benchmarking can have a leading or supporting role and can have the goal of better performance or world-class performance.
From long experience of working with benchmarking, several important and not instantly obvious issues have emerged. The method is simple and appealing, but its very simplicity has proved deceptive. Some advice to potential benchmarkers is to beware of distributed costs, to seek inspiration instead of copying, and not to confuse benchmarking with comparison of uncalibrated key indicators or with competition analysis. Benchmarking can be further developed into the method of Benchlearning, which also incorporates a continuous learning process in the organization, with numerous positive side effects.
SEE ALSO THE FOLLOWING ARTICLES Cost/Benefit Analysis • Executive Information Systems • Goal Programming • Operations Management • Project Management Techniques • Quality Information Systems • Reengineering • Resistance to Change, Managing • Total Quality Management and Quality Control
BIBLIOGRAPHY
Briner, W., Geddes, M., and Hastings, C. (1990). Project leadership. New York: Van Nostrand Reinhold.
Camp, R. C. (1989). Benchmarking: The search for industry best practices that lead to superior performance. Milwaukee, WI: ASQC Quality Press.
Edenfeldt Froment, M., Karlöf, B., and Lundgren, K. (2001). Benchlearning. Chichester: Wiley.
Gordon, G., and Pressman, I. (1990). Quantitative decision-making for business, 2nd ed. Upper Saddle River, NJ: Prentice-Hall International.
Karlöf, B. (1996). Conflicts of leadership. Chichester: Wiley.
Karlöf, B. (1997). Strategy in reality. Chichester: Wiley.
Karlöf, B., and Östblom, S. (1993). Benchmarking—A signpost to excellence in quality and productivity. Chichester: Wiley.
Peters, T. J., and Waterman, R. H., Jr. (1982). In search of excellence: Lessons from America's best run companies. New York: Harper Collins.
Spendolini, M. J. (1992). The benchmarking book. New York: AMACOM.
Thompson, A. A., Jr., and Strickland, A. J., III (1990). Strategic management—Concepts and cases, 5th ed. Homewood, IL: Irwin.
Business-to-Business Electronic Commerce Jae Kyu Lee Korea Advanced Institute of Science and Technology
I. INTRODUCTION
II. TAXONOMY OF B2B E-MARKETPLACES
III. ON-LINE SERVICES TO BUSINESSES
IV. E-PROCUREMENT MANAGEMENT
V. INTERNET-BASED EDI
VI. SCM AND COLLABORATION AMONG ALIGNED PARTNERS
VII. AGENT-BASED COMMERCE
VIII. XML IN B2B EC
IX. KNOWLEDGE MANAGEMENT IN THE B2B EC PLATFORM
X. SUMMARY AND CONCLUSION
GLOSSARY
agent-based commerce Electronic commerce that is assisted by buyer agents and seller agents according to a mutually agreed language, message format, and protocol.
B2B EC Electronic commerce between businesses.
B2C EC Electronic commerce between business sellers and individual consumers.
data warehouse Data repository that stores historical transactions so that meaningful patterns useful for decision making can be mined.
e-hub The electronic hub that plays at least one of the roles of e-marketplace, supply chain information coordinator, or application service provider to enable B2B EC.
e-marketplace Electronic marketplace that displays the electronic catalogs of suppliers, takes orders, supports order fulfillment, and perhaps supports electronic payments.
e-procurement Electronic procurement system for the buyer company, which needs to integrate with external e-marketplaces as well as with the buyer's corporate information system such as ERP.
electronic supply chain management Management of the supply chain on-line to reduce inventory to a minimum by sharing information between partnered buyers and sellers.
exchange A type of e-marketplace where many buyers trade with many sellers.
internet-based EDI Electronic data interchange that is implemented on the Internet in a secure and economical manner.
virtual corporation An organization composed of several business partners, sharing costs and resources to produce goods and services more effectively and efficiently.
XML eXtensible Markup Language, which can be browsed by humans and is also comprehensible by software agents.
BUSINESS-TO-BUSINESS ELECTRONIC COMMERCE
(B2B EC) has emerged as the largest pie in EC. However, its definition, as well as perspectives on its current status and future prospects, is not unequivocal. Therefore this article describes the necessary factors that can enhance the effectiveness and efficiency of electronic commerce between companies. Then the current key activities in the B2B EC area are reviewed one by one, with a discussion of how each relates to the others. Key activities included are B2B e-marketplaces (seller-, intermediary-, or buyer-centric ones); on-line service applications for business; the electronic procurement management system and its integration with external e-marketplaces; evolution of the EDI platform onto the Internet; collaboration by supply chain management; the shared data warehouse and virtual corporation; agent-based commerce between buyer agents and seller agents; XML (eXtensible Markup Language) standards and their integration with electronic data interchange (EDI) and
agents. This article provides a good understanding of activities in the B2B EC area, and explicates the converging phenomena among these separate activities toward enhanced managerial performance.
I. INTRODUCTION
Electronic commerce (EC) was launched with web-based e-marketplaces for individual consumers. As a large portion of the buyers turned out to be businesses, it became necessary to contrast business-to-business (B2B) EC with business-to-consumer (B2C) EC. According to the forecasts, B2B EC is expected to grow to $1330.9 billion by the year 2003, and it continues to be the major share of the EC market. Since B2B EC has evolved from B2C EC, there are common points between them, but the distinction has grown as the technologies for B2B EC develop. Theoretically, B2B EC pursues maximized value in terms of quality and delivery time, efficient collaboration, and reduced search and transaction costs. According to Handfield and Nichols (1999), full-blown B2B applications will be able to offer enterprises access to the following sorts of information about associated companies:
• Product: Specifications and prices in e-catalogs, and sales history in the data warehouse
• Customer, sales, and marketing: Sales history, forecasts, and promotion for one-to-one customer relationship management (CRM)
• Suppliers and supply chain: Key contractors, product line and lead times, quality, performance, inventory, and scheduling for effective supply chain management
• Production process: Capacities, commitments, and product plans for virtual corporations
• Transportation: Carriers, lead times, and costs for just-in-time (JIT) delivery management
• Competitors: Benchmarking, competitive product offerings, and market share
In practice, key activities in the B2B EC community are
1. The B2B e-marketplaces emerged with the same architecture used for B2C, although they handle the different contents that business requires. Dell and Cisco are good examples of this category of direct marketing.
2. A large number of vertical e-marketplaces (usually called exchanges because in most cases there are many buyers and sellers involved) are widely established. In the B2B e-marketplaces, the buyers play the key role
for market making, while in the B2C e-marketplaces sellers control the e-marketplaces.
3. Large buyers build their own e-marketplaces to handle their tenders on their own servers, and to integrate the procurement process with these virtually internalized e-marketplaces.
4. E-procurement systems, which support internal procurement management and its integration with external e-marketplaces, are beginning to emerge.
5. The platform for EDI is moving onto the Internet to reduce its implementation cost, so the secure extranet has become popular among associated companies.
6. To reduce inventory and buffer capacity, information is shared among aligned partners. To support such collaboration, third-party electronic supply chain management (SCM) hub services have appeared. Big retailers have opened their own data warehouses to their suppliers as private e-hubs on the Internet.
7. Third-party JIT delivery services are becoming tightly coupled with suppliers.
8. To reduce the transactional burden between buyers and sellers, software agents are actively under development. These agents will open the next stage of agent-based commerce.
9. Sellers can trace the behaviors of on-line customers precisely, and store this information in the data warehouse. So web mining through the data warehouse for customer relationship management has become popular for identifying relevant customers.
To grasp the activities of, and to design, B2B EC, the entities that we have to consider are selling companies, buying companies, intermediaries, e-marketplaces, e-SCM hubs, deliverers, a secure network platform on the Internet, the protocol of B2B EC, and a back-end information system that should be integrated with the external e-marketplaces. B2B EC is still in its infancy, so many aspects have emerged from different backgrounds, as listed above. However, in the future such efforts should converge under a unified framework. This implies that there are ample research opportunities to open the future of B2B EC. The remaining sections describe each of the activities and the interactions among them, and foresee future prospects.
II. TAXONOMY OF B2B E-MARKETPLACES
B2B EC can best be understood by categorizing who controls the e-marketplaces: the seller, intermediary, or buyer. Some e-marketplaces are called exchanges. The most commonly agreed definition of an exchange is an e-marketplace that assists the matching of multiple buyers and multiple sellers, but the term may be used with different implications depending upon the author's purpose. Let us study the characteristics of each type of e-marketplace, contrasting them with the B2C e-marketplace. The best B2B web sites for 15 industries can be found in The NetMarketing 200 (www.btobonline.com/netMarketing).
A. Seller-Centric E-Marketplaces
The seller-centric e-marketplaces are the first type of e-marketplace; they can be used by both consumers and business buyers, as depicted in Fig. 1. In this architecture, the features of e-catalog, e-cart, on-line ordering, and on-line payment schemes can be used for both B2B and B2C e-marketplaces. The only difference between them is their contents. While B2C handles consumer items, B2B handles industrial items. However, some items like computers and printers are necessary in both sectors. So the seller-centric e-marketplace and direct marketing will never disappear even as B2B EC evolves. The seller-centric e-marketplace is the easiest way to build B2B e-marketplaces. Successful cases of this kind are manufacturers like Dell and Cisco, and retailers like Amazon and Ingram Micro. All of them support e-catalogs, and some support auction services. For instance, Ingram Micro (www.ingram.com) opened its auction site only to existing customers to sell off the seller's surplus goods at a deep discount.
The e-marketplace may consist of pure on-line players or a combination of on- and off-line (click-and-mortar) businesses. In the early stage of seller-centric e-marketplaces, there is no distinction between the architectures of B2B and B2C EC. It is reported that Dell sold more than 50% of its computer products on-line, and 90% of those to business buyers. Cisco sold more than $1 billion worth of routers, switches, and other network interconnection devices to business customers through the Internet. The Internet covers 90% of sales and 80% of customer support, and the operating cost is reduced by 17.5%. In this way, Cisco recorded a gross margin of 64.4% in 2000. The seller-centric architecture will continue to exist as long as the supplier has a superb reputation in the market and a group of loyal on-line customers. The benefits to on-line sellers are
• A global single contact point
• Easy, on-line, and consistent updating of the e-catalog
• Reduced operating cost
• Enhanced technical support and customer service
• Reduced technical support staff cost
• Reduced catalog and service software distribution cost
However, the problem with the seller-centric e-marketplaces is that the buyers' order information is stored in each seller's server, and it is not easy to integrate that information with the individual buyer's corporate information system. To handle this problem, the e-procurement system emerged, which tries to integrate the internal procurement management process with the external e-marketplaces, as explained in Section IV.
Figure 1 Seller-centric B2B e-marketplace architecture.
When the internal information system is built with an ERP (enterprise resource planning) system, the ERP system should be integrated with the e-marketplaces as well. For these reasons, the need for the buyer-centric e-marketplace has grown, as explained in Section II.C.
B. Intermediary-Centric E-Marketplaces
The second type of e-marketplace is the intermediary-centric e-marketplace. The distinction between the e-tailer (electronic retailer) and the intermediary is blurred. A reasonable criterion seems to be whether the on-line site operator takes the orders and fulfills them itself. The possession of inventory cannot distinguish them precisely, because many e-tailers do not keep inventory although they guarantee order fulfillment. The intermediary-centric architecture can also be used for both B2B and B2C EC. The only difference here again is the items handled. Examples of this kind are B2B portal directory services, comparison aids, auctions to business, and exchanges. A distinctive feature of the B2B intermediary e-marketplace is that in many cases buyers are involved in market making, usually to make sure that the pertinent items are appropriately included and to lead the standard of integration with their e-procurement systems. Therefore most B2B intermediaries are vertical e-marketplaces. They support e-catalogs, search, auctions, and exchanges.
1. Vertical E-Marketplaces
At the moment, the active industries in vertical e-marketplaces are computers, electronics, chemicals, energy, automotive, construction, financial services, food and agriculture, health care, life science, metals, and telecom. Typical sites in these industries are as follows:
1. Computers and electronics: FastParts, PartMiner.com, PcOrder, and TechnologyNet
2. Chemicals and energy: AetraEnergy, Bloombey, CheMatch, ChemConnect, e-Chemicals, Enermetrix.com, Energy.com, HoustonStreet, and Industria
3. Automotive: Covisint, Cole Hersee Co.
4. Construction: Deere & Co., Bidcom, BuildNet, and Cephren
5. Financial Services: IMX Exchange, Muniauction, Pedestal, and Ultraprise
6. Food and Agriculture: Monsato Co., efdex, Floraplex, FoodUSA, and Gofish.com
7. Health Care and Life Sciences: Baxter Healthcare Corp., BioSuppliers.com, Chemdex, Neoforma, and SciQuest
8. Metals: e-Steel, iSteelAsia, MaterialNet, MetalShopper, and MetalSite
9. Telecom: Arbinet, Band-X, RateXchange, Simplexity.com, Telezoo, and The GTX.com
Since each industry has a different level of product fitness and readiness for digital marketing, the potential benefit of EC will not be the same. According to Forrester Research, the opportunity of B2B e-marketplaces can be categorized as in Fig. 2. According to this analysis, computers, electronics, shipping, and warehousing have the highest (above 70%) potential of e-marketplace saturation, while the heavy industry and aerospace industries have the lowest potential. So the selection of the right industry and items is very important for the successful implementation of vertical e-marketplaces. In addition, the business partnership is critical for the success of this business.
2. Horizontal E-Marketplaces
The horizontal e-marketplaces support the aggregation of vertical e-marketplaces and the acquisition of common items and services that are necessary in all industries, like MRO (maintenance, repair, and operations) items. The types of horizontal e-marketplaces are as follows:
1. Vertical e-marketplace aggregators: Typical aggregators are VerticalNet, Ventro, and Internos
2. Procurement solution provider initiative: A group of horizontal e-marketplaces are built under the initiative of e-procurement solution providers like Ariba, Commerce One, Clarus, Works.com, Suppplyaccess, PurchasePro, Peregrine/Harnbinger, and Unibex
3. MRO suppliers and services: The sites that handle the MRO items are MRO.com and W.W.Grainger
4. Industrial products: supplyFORCE
5. Surplus goods and equipment management: TradeOut, FreeMarkets, and AssetTrade
6. Information technology: Interwar
7. Sales and marketing: Art Technology Group, Broadvision, Calico, Firepond, and SpaceWorks
Some B2B auction sites like FreeMarkets.com, Earmarked (www.fairmarket.com), and A-Z Used Computers (www.azuc.com) belong to the intermediary-centric B2B e-marketplaces.
Figure 2 The e-marketplace opportunity index. [From The eMarketplace Opportunity, Forrester Research. http:// europa.eu.int/comm/enterprise/ict/e-marketplaces/presentation_homs.pdf. With permission.]
C. Buyer-Centric E-Marketplaces
To buying businesses, the seller-centric e-marketplaces are convenient places to search one by one; however, they are inconvenient in the sense that they are segmented. The order information is stored in each seller's server, so the buying company has to manually retype the order information into its procurement system. Thus, big buyers are compelled to build the buyer's own e-marketplace to manage the purchasing process efficiently. Types of buyer-centric e-marketplaces are buyer-initiated bidding sites, internalized e-marketplaces, and buyers-coalition procurement sites. Buyer's bidding sites may evolve by opening the site to other potential buyers. Internalized e-marketplaces need to link and maintain consistency with the external e-marketplaces, and may open to outside buyers. This service can evolve into e-procurement service providers.
1. Buyer's Bidding Site
In this model, a buyer opens an e-market on its own server and invites potential suppliers to bid on the announced requests for quotations. A successful example is the GE TPN case (tpn.geis.com). This model can offer a great sales opportunity to committed suppliers. However, as the number of such sites increases, suppliers will not be able to trace all such tender sites. At this stage, only very prominent buyers can take full advantage of this approach. Such government-driven sites are CBDNet (cbdnet.access.gpo.gov), GPO (www.access.gm.gov), COS (cos.gdb.org), EPIN (epin1.epin.ie), and GSD (www.info.gov.hk.gsd).
By building the buyer's bidding site, buyers can obtain the following benefits:
• Identifying and building partnerships with new suppliers worldwide
• Strengthening relationships and streamlining sourcing processes with current business partners
• Rapidly distributing information and specifications to business partners
• Transmitting electronic drawings to multiple suppliers simultaneously
• Cutting sourcing cycle times and reducing costs for sourced goods
• Quickly receiving and comparing bids from a large number of suppliers to negotiate better prices
However, small companies cannot justify the cost of building their own bidding site for purchasing, so third-party services have emerged. Examples of such sites are BIDCAST (www.bidcast.com), BIDLINE (www.bidline.com), BIDNET (www.bidnet.com), Federal Marketplace (www.fedmarket.cim), and GOVCON (www.govcon.com). Each site announces thousands of requests for bids. By exploring the opportunities, suppliers in the buyer's bidding sites can take advantage of boosted sales, expanded market reach, lowered cost for sales and marketing activities, a shortened selling cycle, improved sales productivity, and a streamlined bidding process. As the number of buyers increases, the burden of facilitating contacts between many buyers and many sellers becomes more serious. Manual handling of a high volume of transactions will not be possible, let alone economical. So the software agents that work
for buyers and sellers become essential, as described in Section VII.
2. Internalized E-Marketplaces
Another type of buyer-centric e-marketplace is the internalized e-marketplace. In this architecture, the internal e-catalog is accessible by the employees, and the final requisitioners can order directly on-line. The order will then be processed on the e-marketplace seamlessly, and the procurement decision and ordering process can be tightly coupled with the internal workflow management systems, enhancing the efficiency of the procurement process. In this architecture, the procurement department defines the scope of products and invites the pre-offered prices. The posted prices will be stored in the internal database. MasterCard International developed a procurement card system, which allows the requisitioner to select goods and services from its own e-catalog containing more than 10,000 items. An obstacle to this approach is maintaining the internal e-catalog so that it remains consistent with the external e-marketplaces. For this purpose, the buyer's directory should be coordinated in accordance with changes in the e-marketplaces. Further research on this challenging issue can be found in Joh and Lee (2001).
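One way to picture this maintenance task is as a periodic reconciliation between the internal e-catalog and the offers published on external e-marketplaces. The following is only a minimal sketch of that idea, not the mechanism studied by Joh and Lee (2001); the item structure, field names, and price-tolerance rule are assumptions made for illustration.

```python
# Minimal sketch: keeping an internal e-catalog consistent with external
# e-marketplace offers. Field names and values are illustrative only.

internal_catalog = {
    "PRT-1001": {"description": "Laser printer toner", "price": 54.00},
    "PPR-0200": {"description": "A4 copy paper, 500 sheets", "price": 4.10},
}

external_offers = {  # as pulled from an external e-marketplace feed (hypothetical)
    "PRT-1001": {"price": 49.50, "supplier": "SupplierA"},
    "PPR-0200": {"price": 4.10, "supplier": "SupplierB"},
    "STP-0031": {"price": 1.95, "supplier": "SupplierB"},  # not yet listed internally
}

def reconcile(internal, external):
    """Report price drift and items missing from the internal catalog."""
    changed, missing = [], []
    for sku, offer in external.items():
        if sku not in internal:
            missing.append(sku)
        elif abs(internal[sku]["price"] - offer["price"]) > 0.01:
            changed.append((sku, internal[sku]["price"], offer["price"]))
    return changed, missing

if __name__ == "__main__":
    changed, missing = reconcile(internal_catalog, external_offers)
    for sku, old, new in changed:
        print(f"{sku}: internal price {old} differs from marketplace price {new}")
    for sku in missing:
        print(f"{sku}: offered externally but absent from the internal e-catalog")
```

In practice the external offers would arrive through whatever feed the e-marketplace exposes, and the discrepancies would feed an approval workflow rather than update the internal catalog automatically.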
3. Buyers-Coalition Procurement Sites
If every buyer builds its own internal e-marketplace, suppliers will be daunted by the complexity that they have to cope with. So common buyers need to coalesce to reduce the complexity and incompatibility among buyers. The best example of this kind can be found in the automobile industry. GM, Ford, and DaimlerChrysler performed the joint project Automotive Network Exchange (ANX) to provide a common extranet platform to 30,000 suppliers. Each of them then started to build an independent procurement company, but instead they merged and built a common procurement company, Covisint. This case is a typical example of a coalition among competitors and of moving procurement activities to an outsourced organization.
III. ON-LINE SERVICES TO BUSINESSES In the e-marketplaces, the majority of items are hard goods. However, in cyberspace, service is more effective because the service can be completed on-line without any physical delivery. So let us review the status of online services to business.
Business-to-Business Electronic Commerce • Travel and tourism services: Many large corporations have special discounts arranged with travel agents. To further reduce costs, companies can make special arrangements, which enable employees to plan and book their own trips on-line. For instance, Carlson Travel Network of Minneapolis provides an agentless service to corporate clients like General Electric. The GE employees can fill out the application at their intranet system. • Internal job market on the intranet: Many companies conduct an internal electronic job market site on the intranet. Openings are posted for employees to look at, and search engines enable managers to identify talented people even if they were not looking actively for a job change. This is an example of intraorganizational e-commerce. • Real estate: Since business real estate investment can be very critical, web sites cannot replace the existing agents. Instead, the web sites help in finding the right agents. However, some auctions on foreclosed real estate sold by the government may be opened on-line only to business. • Electronic payments: Firm-banking on the Internet is an economical way of making business payments. The electronic fund transfers (EFT) using the financial EDI on the Internet is the most popular way that businesses use. The payment fee on the Internet is cheaper than other alternatives. • On-line stock trading: Corporations are important stock investors. Since the fees for on-line trading are very low (as low as $14.95) and flat regardless the trading amount, the on-line trading brokerage service is a very attractive option for business investors. • Electronic auction to business bidders: Some electronic auctions are open only to dealers. For instance, used cars and foreclosed real estate sold by the government are open only to dealers. The comprehensive list of auction sites is available in www.usaweb.com/auction.html. • On-line publishing and education: On-line publishing is not the monopolistic asset of business. However, the subscribers of certain professional magazines are only for businesses. The on-demand electronic education programs can provide a useful training opportunity to busy employees. • On-line loan and capital makers: Business loans can be syndicated on-line from the lending companies. IntraLink provides a solution for the service, and BancAmerica offers IntraLoan’s matching service to business loan applicants and potential lending corporations. Some sites like www.garage.com provide information about venture capital.
• Other on-line services: Businesses are the major users of on-line consulting, legal advice, health care, delivery request, electronic stamping, escrowing, etc.
IV. E-PROCUREMENT MANAGEMENT
As we have observed with the buyer-centric B2B e-marketplaces, one of the most important goals of B2B EC is effective procurement. So let us study B2B EC from the evolutionary view of e-procurement management.
A. Requisite for Effective and Efficient Procurement Management
All around the world, purchase and supply management (P&SM) professionals now advocate innovative purchasing as a strategic function to increase profit margins. Some of the tactics used in this transformation process are volume purchases, buying from approved suppliers, selecting the right suppliers, group purchasing, awarding business based on performance, improving quality of existing suppliers, doing contract negotiation, forming partnerships with suppliers, and reducing paperwork and administrative cost. What many organizations fail to understand is that a fundamental change in businesses' internal processes must be implemented to maximize the full benefits of procurement reengineering. The two critical success factors that most organizations overlook are cutting down the number of routine tasks and reducing the overall procurement cycle through the use of appropriate technologies such as workflow, groupware, and ERP packages, as well as the B2B e-marketplaces. By automating and streamlining the laborious routine of the purchasing function, purchasing professionals can focus on more strategic purchases, achieving the following goals:
• Reducing purchasing cycle time and cost
• Enhancing budgetary control
• Eliminating administrative errors
• Increasing buyers' productivity
• Lowering prices through product standardization and consolidation of buys
• Improving information management on suppliers and pricing
• Improving the payment process
To implement an effective and efficient e-procurement system, B2B EC needs to support either the integration of e-procurement systems with external e-marketplaces, or the integration of e-marketplaces with ERP packages.
B. Integration of E-Procurement Systems with External E-Marketplaces
A group of e-procurement solution providers has developed architectures that can support the integration of buyer sites with external e-marketplaces and any independent suppliers. The architectures offered by Ariba and Commerce One are examples of this kind, as depicted in Fig. 3. In this architecture, the solution has a pair of servers on the buyer and seller sites, namely BuySite and MarketSite.
Figure 3 An architecture of buyer-centric B2B e-marketplace. [Copyright © Commerce One Operations, Inc. 1999. All rights reserved.]
Suppliers can input their goods into the MarketSite server via the extranet. On the buyer site, the web server BuySite supports corporate users in the web-based search of the e-catalog and ordering. The order will then be executed in the MarketSite. For customers who have not joined the MarketSite, orders can also be transmitted via fax, e-mail, and EDI messages. To expand the capability of this architecture, many MarketSites should be able to support the BuySite. Therefore solution vendors try to make many vertical MarketSites and aggregate them.
C. Integration of E-Marketplaces with ERP
Buyers tend to develop their back-end information systems as a combination of intranet, database, ERP, and legacy systems. ERP is enterprise-wide application software, which can provide a centralized repository of information for the massive amount of transactional detail generated daily. ERP integrates core business processes from planning to production, distribution, and sales. SAP's R/3 is one such software package. Early versions of the ERP solutions did not consider integration with e-marketplaces; however, integration has become a critical issue in the B2B EC environment. Integration can be realized by adopting one of the following approaches: the inside-out, outside-in, and buyer's cart approaches.
1. The Inside-Out Approach: Extend ERP Outward
The leading ERP vendors offer a way to extend their solutions so that they are usable with the external e-marketplaces. This approach is called the inside-out approach, as depicted in Fig. 4a. One scheme of this approach is that an ERP solution maker also provides an e-marketplace solution that is compatible with the ERP package. For instance, the solution called MySAP (www.mysap.com) was developed by SAP for this purpose. The other scheme is to build a strategic alignment with e-marketplace solution providers to establish a mutually agreed upon interface. For instance, SAP has a strategic partnership with Commerce One. When the e-marketplace solution requires a simple mapping of ERP functionality with a web interface, the inside-out architecture can be highly effective. It lets companies distribute ERP transaction capabilities to a wide audience of web users, without requiring that they load any specific client software on their PCs. However, from the e-marketplace's point of view, this approach is applicable only when the ERP system is installed. Many companies still use legacy systems.
2. The Outside-In Approach
In this approach, instead of extending the reach of ERP-based business processes through a web server, software known as an application server integrates multiple back-end systems with an e-marketplace solution, as depicted in Fig. 4b. The outside-in architecture is better suited for complex e-business with multiple back- and front-end applications. In the outside-in approach, the e-business application resides within the application server, rather than within the individual back-end systems. Typical application servers are Application Server (Netscape), Enterprise Server (Microsoft), Domino (Lotus), Websphere (IBM), and Enterprise Server (Sun). However, the outside-in approach is limited by the capabilities of the application server platforms upon which it is built.
Figure 4 Architectures of integrating EC with ERP. [From Sullivan, 1999a and b.]
3. Buyer's Cart Approach
In this approach, the buyer keeps a shopping cart on the buyer's PC or server instead of on the seller's server. Items from multiple e-marketplaces can be tentatively selected and stored in the buyer's electronic cart (called a b-cart). The order can also be placed from the b-cart, and the result can be stored in the b-cart as well. With a standard file format for the b-cart, the ERP or any other legacy system can be interfaced compatibly. This architecture is simple and economical, and it is well suited to the B2B EC environment.
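The article does not prescribe a particular file format for the b-cart, but the idea of a standard, buyer-side record that an ERP or legacy system could import can be sketched as follows. The XML element names are hypothetical, chosen only for illustration.

```python
# Illustrative sketch of a buyer-side shopping cart (b-cart) saved in a
# standard file that a back-end ERP or legacy system could import.
# Element names are hypothetical, not a published standard.
import xml.etree.ElementTree as ET

cart = ET.Element("bcart", buyer="ACME Manufacturing")

for marketplace, sku, qty, price in [
    ("marketsite-A", "PRT-1001", 10, 49.50),
    ("marketsite-B", "PPR-0200", 200, 4.10),
]:
    item = ET.SubElement(cart, "item", marketplace=marketplace, sku=sku)
    ET.SubElement(item, "quantity").text = str(qty)
    ET.SubElement(item, "unitPrice").text = str(price)

# Because the file lives on the buyer's side, the same record can later hold
# order confirmations from each e-marketplace before being loaded into ERP.
ET.ElementTree(cart).write("bcart.xml", encoding="utf-8", xml_declaration=True)
print(ET.tostring(cart, encoding="unicode"))
```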
V. INTERNET-BASED EDI The most basic instrument for B2B EC is efficient exchange of messages between companies. Thus Electronic Data Interchange (EDI) has been around for almost 30 years in the non-Internet environment. EDI is a system that standardizes the process of trading and tracking routine business documents, such as purchase orders, invoices, payments, shipping manifests, and delivery schedules. EDI translates these documents into a globally understood business language and transmits them between trading partners using secure telecommunications links. The most popular standard is the United Nations EDI for Administration, Commerce, and Trade (EDIFACT). In the United States the most popular standard is ANSI X.12. Traditional EDI users (most Fortune 1000 or global 2000 companies) use leased or dedicated telephone lines or a value-added network, such as those run by IBM and AT&T, to carry these data exchanges. Now the platform is moving to the Internet.
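To make the notion of a "globally understood business language" concrete, the snippet below assembles a heavily simplified, EDIFACT-style purchase order (ORDERS) interchange as a plain string. Real interchanges follow a specific directory version and trading-partner implementation guideline, so the segments and values shown here are illustrative assumptions only.

```python
# A heavily simplified, EDIFACT-style purchase order (ORDERS) interchange.
# Segment contents vary by directory version and trading-partner guideline;
# the identifiers and values below are illustrative only.
segments = [
    "UNB+UNOA:2+BUYERID+SELLERID+030115:1023+000000101'",  # interchange header
    "UNH+1+ORDERS:D:96A:UN'",                              # message header
    "BGM+220+PO-4711+9'",                                  # purchase order number
    "DTM+137:20030115:102'",                               # document date
    "LIN+1++PRT-1001:BP'",                                 # line item, buyer part number
    "QTY+21:10'",                                          # ordered quantity
    "UNS+S'",                                              # section separator
    "UNT+7+1'",                                            # message trailer (segment count)
    "UNZ+1+000000101'",                                    # interchange trailer
]
print("".join(segments))
```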
A. Traditional EDI
Traditional EDI has changed the landscape of business, triggering new definitions of entire industries. Well-known retailers, such as Home Depot, Toys R Us, and Wal-Mart, would operate very differently today without EDI, since it is an integral and essential element of their business strategy. Thousands of global manufacturers, including Procter & Gamble, Levi Strauss, Toyota, and Unilever, have used EDI to redefine relationships with their customers through such practices as quick response retailing and JIT manufacturing. These highly visible, high-impact applications of EDI by large companies have been extremely successful.
However, despite the tremendous impact of traditional EDI among industry leaders, the current set of adopters represents only a small fraction of potential EDI users. In the United States, where several million businesses participate in commerce every day, fewer than 100,000 companies had adopted EDI (as of 1998). Furthermore, most of these companies could maintain contact with only a small number of business partners on the EDI, mainly due to its high cost. Therefore, in reality, most businesses have not benefited from EDI. The major factors that limit businesses from benefiting from the traditional EDI are
• Significant initial investment is necessary
• Restructuring business processes is necessary to fit the EDI requirements
• Long start-up time is needed
• Use of an expensive private value-added network (VAN) is necessary
• High EDI operating cost is needed
• There are multiple EDI standards
• The system is complex to use
• There is a need to use a converter to translate business transactions to EDI standards
These factors suggest that the traditional EDI—relying on formal transaction sets, translation software, and value-added networks—is not suitable as a long-term solution for most corporations, because it does not meet the following requirements:
• Enabling more firms to use EDI
• Encouraging full integration of EDI into trading partners' business processes
• Simplifying EDI implementation
• Expanding the capabilities of on-line information exchange
Therefore, a better infrastructure is needed; such infrastructure is the Internet-based EDI.
B. Internet-based EDI When considered as a channel for EDI, the Internet appears to be the most feasible alternative for putting on-line B2B trading within the reach of virtually any organization, large or small. There are several reasons for firms to create the EDI ability using the Internet: • The Internet is a publicly accessible network with few geographical constraints. Its largest attribute, large-scale connectivity (without requiring any special company networking architecture), is a
90 seedbed for growth of a vast range of business applications. • The Internet global internetwork connections offer the potential to reach the widest possible number of trading partners of any viable alternative currently available. • Using the Internet can cut communication cost by over 50%. • Using the Internet to exchange EDI transactions is consistent with the growing interest of businesses in delivering an ever-increasing variety of products and services electronically, particularly through the web. • Internet-based EDI can compliment or replace current EDI applications. • Internet tools such as browsers and search engines are very user friendly and most people today know how to use them.
1. Types of the Internet EDI The Internet can support the EDI in a variety of ways: • Internet e-mail can be used as the EDI message transport in place of VAN. For this end, the Internet Engineering Task Force (IETF) considers standards for encapsulating the messages within the Secure Internet Mail Extension (S/MIME). • A company can create an extranet that enables trading partners to enter information in web form whose fields correspond to the fields of an EDI message or document. • Companies can utilize the services of a web-based EDI hosting service in much the same way that companies rely on third parties to host their commerce sites. Netscape Enterprise is illustrative of the type of web-based EDI software that enables a company to provide its own EDI services over the Internet, while Harbinger Express is illustrative of those companies that provide thirdparty hosting services.
2. Prospect of Internet EDI Companies who currently possess traditional EDI respond positively to Internet EDI. A recent survey by Forester Research on 50 Fortune 1000 companies showed that nearly half of them plan to use EDI over the Internet. Frequently, companies combine the traditional EDI with the Internet by having their Internet-based orders transmitted to a VAN or a service provider that translates the data into an EDI format and sends it to their host computers. The Internet sim-
Business-to-Business Electronic Commerce ply serves as an alternative transport mechanism to a more expensive lease line. The combination of the Web, XML (eXtensible Markup Language), and Java makes EDI worthwhile even for small, infrequent transactions. Whereas EDI is not interactive, the Web and Java were designed specifically for interactivity as well as ease of use.
VI. SCM AND COLLABORATION AMONG ALIGNED PARTNERS The major roles of e-marketplaces are the reducing search cost and competitive purchasing. However, when strategic partnership is essential, the advantage of the e-marketplace diminishes. In this setting, a more critical aspect is the elimination of uncertainty between companies along the supply chain, reducing the burden of inventory and buffer capacity. However, a lean supply chain is inherently vulnerable to the system collapse if one company in the chain cannot fulfill its mission. In this section, we describe three types of B2B collaboration: the electronic SCM system, the shared data warehouse and data mining, and virtual corporations.
A. Electronic Supply Chain Management (SCM) System Supply chain is a business process that links material suppliers, manufacturers, retailers, and customers in the form of a chain to develop and deliver products as one virtual organization of pooled skills and resources. SCM is often considered an outgrowth of JIT manufacturing where companies operate with little or no inventory, relying instead on a network of suppliers and transportation partners to deliver the necessary parts and materials to the assembly line just as they are needed for the production. Key functions in SCM are • Managing information on demand to better understand the market and customer needs • Managing the flow of physical goods from suppliers • Managing the manufacturing process • Managing the financial flows with suppliers and customers In the early stage, SCM is attempted within a single factory, and then within an entire enterprise that may possess geographically dispersed workplaces. To link
Business-to-Business Electronic Commerce
91
the supply chain between companies, the corresponding companies may be connected point-to-point. However, as the number of participating companies increases, the point-to-point connection becomes too expensive and technically too difficult for small companies. So electronic hub-based collaboration becomes more feasible, as depicted in Fig. 5, and third-party SCM service providers like i2 have opened their e-hub services on the Internet.
Convergence of e-hub—The initial purpose of SCM was sharing information among aligned partners. But since the technical architecture of the e-hub is basically the same as that of the e-marketplace, the SCM service companies try to integrate the e-marketplace function with the hub. In the near future, we will observe the integration of the SCM hub and e-marketplaces. However, for the actual implementation, EC managers should judge which is more important: the transaction cost reduction among the aligned partners, or the flexible, competitive selection of products and suppliers. Eventually, by providing the combined service, participating companies will be able to enjoy the benefits of both transaction cost reduction and competitive selection.
JIT delivery—Even though SCM reduces inventory by sharing information, if there is no physical JIT delivery, the implementation of SCM is nothing but a mirage. In the EC environment, orders can arrive from any geographical location, so in-house delivery is not feasible in most cases. Therefore
partnership with third-party delivery companies like FedEx, United Parcel Service, and the United States Postal Service becomes very important. Key delivery companies provide on-line tracking services on their web sites. Moreover, the tracking information can be embedded in the manufacturer’s information system so that its employees and customers can utilize the information as if the manufacturer handled the delivery itself. The delivery companies also provide warehouse rental services for quick delivery and reduced carrying cost, and some value-added services like simple installation and repair. For instance, National Semiconductor (NatSemi) dealt with a variety of different companies to get products from Asian factories to customers across the world, including freight forwarders, customs agents, handling companies, delivery companies, and airlines. The company decided to outsource this entire process to FedEx. Today, virtually all of NatSemi’s products, manufactured in Asia by three company factories and three subcontractors, are shipped directly to a FedEx distribution warehouse in Singapore. Each day, NatSemi sends its order electronically to FedEx. FedEx makes sure the order gets matched to a product and the product is delivered directly to the customer at the promised time. By going with FedEx as a one-stop shop for its logistics needs, NatSemi has seen a reduction of the average customer delivery cycle from four weeks to one week and its distribution costs drop from 2.9% of sales to 1.2%.
Figure 5 Point-to-point and e-hub collaborated SCM.
B. Shared Data Warehouse and Data Mining Large retailers like Wal-Mart open their data warehouses to suppliers to share sales history, inventory, and demand forecasts. This framework is illustrated in Fig. 6. This is a case in which the buyer owns its own private SCM e-hub. For instance, Wal-Mart’s 3570 stores share their data warehouse, RetailLink, with 7000 suppliers like Warner-Lambert. RetailLink, the world’s largest such warehouse, stores 2 years of sales history on 101 terabytes of disk storage. There are about 120,000 data mining inquiries per week to RetailLink. Traditionally, the retailers and suppliers forecasted separately, which resulted in excessive inventory, running out of stock, and lost opportunities for suppliers. However, with RetailLink, collaborative forecasting and replenishment become possible. Suppliers can use accurate sales and inventory data and take care of inventory management. By sharing RetailLink, Wal-Mart could display the right product in the right store at the right price. The retail industry is projected to save $150–250 billion per year.
C. Virtual Corporation One of the most interesting reengineered organization structures is the virtual corporation (VC). A virtual corporation is an organization composed of several business partners sharing costs and resources for the
purpose of producing a product or service. According to Goldman et al. (1995), permanent virtual corporations are designed to create or assemble productive resources rapidly, frequently, concurrently, or to create or assemble a broad range of productive resources. The creation, operation, and management of a VC are heavily dependent on the B2B EC platform. Sometimes a VC can be constructed with the partners in the supply chain. In this case, SCM can be a vehicle for implementing the VC. However, VCs are not necessarily organized along the supply chain. For example, a business partnership may include several partners, each creating a portion of the product or service in an area in which it has a special advantage such as expertise or low cost. So the modern virtual corporation can be viewed as a network of creative people, resources, and ideas connected via on-line services and/or the Internet. The major goals that virtual corporations pursue are • Excellence: Each partner brings its core competence, so an all-star winning team is created. • Utilization: Resources of the business partners are frequently underutilized. A VC can utilize them more profitably. • Opportunism: A VC can find and meet market opportunities better than an individual company. B2B EC platforms like the Internet and extranets make the VC possible, because communication and collaboration among the dispersed business
Figure 6 Collaboration by shared data warehouse and data mining.
partners are the most critical element in making it happen. An extranet is a network that links the intranets of business partners using a virtual private network over the Internet. On this platform, the business partners can use e-mail, desktop videoconferencing, knowledge sharing, groupware, EDI, and electronic funds transfer. For instance, IBM Ambra formed a VC to take advantage of an opportunity to produce and market a PC clone. Each of five business partners played one of the following roles: engineering design and subsystem development, assembly on a build-to-order basis, telemarketing, order fulfillment and delivery, and field service and customer support. As the B2B EC platform propagates, more companies will be able to form VCs. More example cases, including Steelcase Inc. and The Agile Web, Inc., can be found in Turban et al. (1999), and the AeroTech case can be found in Upton and McAfee (1996).
VII. AGENT-BASED COMMERCE The need for agent-based commerce emerges because B2B EC has to utilize not only buyer agents but also seller agents, and because these agents should work together. Role of buyer agents—The purchase process consists of six steps: need identification, product brokering, merchant brokering, negotiation, payment and delivery, and service and evaluation. Among these steps, most agents are developed to assist the product brokering, merchant brokering, and negotiation processes. For the buyers, the major roles of software agents are the
collection of data from multiple e-marketplaces, filtering relevant items, scoring the products according to the customer’s preferences, and tabular comparison for side-by-side examination. Pricing Central.com lists the comparison search engines for each category of items. Buyer agents are mainly necessary in the seller-centric e-marketplaces. Buyer agents need to send out the requirement to the relevant seller agents and interpret the received proposals. Role of seller agents—On the other hand, in the buyer-centric e-marketplaces, sellers have to discover the relevant calls for bids from multiple bidding sites. Proposal preparation for numerous buyer agents is another laborious and time-consuming task. So the seller agent must substitute for the transactional role of the salesman as much as possible. To customize the proposal, seller agents should be able to understand the request for bid announced by buyer agents, identify its relevance to the items they handle, and generate appropriate proposals. In this manner, the interaction between buyer and seller agents becomes essential. A prototypical scenario of agent-based commerce is depicted in Fig. 7, and it works as follows: 1. A human buyer gives the requirement to his/her buyer agent. 2. The buyer agent generates a message (request for bids) to send to the relevant seller agents. 3. The seller agent interprets the message and identifies its relevance to the items it handles. 4. The seller agent generates an appropriate proposal and sends it back to the buyer agent.
Figure 7 A prototypical scenario of agent-based commerce. [From Lee, J. K., and Lee, W. (1997). Proceedings of the 13th Hawaii International Conference on System Sciences, pp. 230–241. With permission.]
5. The buyer agent compares the received proposals and reports the results to the human buyer for final decision-making. This step is the same as the comparison shopping aid in current e-marketplaces. 6. The buyer agent receives the decision made by the human buyer and reports the selection to the bidders. The procedure may vary depending upon the contract protocol adopted. So the contract protocol, message format, representation of requirements, and specification of products are crucial elements of meaningful agent-based commerce. There is much research going on in various settings. To exchange messages in a compatible format, we need a common standard called an agent communication language (ACL), such as KQML (Knowledge Query and Manipulation Language). An illustrative message that requests a proposal from seller agents is demonstrated in Fig. 8. An ACL consists of performatives and parameters. For instance, evaluate is a performative, and sender and receiver are parameters. To develop a dedicated ACL for EC, we need to augment the performatives and parameters to incorporate the generic terms necessary for EC, product specification, and buyers’ requirement representation. For instance, in Fig. 8, parameters such as title, contract_ID, contract_type, bid_time, payment_method,
delivery_method, delivery_date, item_name, and quantity are the generic terms that are necessary in any agent-based commerce. So let us distinguish these terms as the electronic commerce layer. At the bottom of the message, the buyer’s requirements are represented by the product specification. These are not generic, but depend upon the products to buy and sell. So this layer is distinguished as the product specification layer. In many cases, buyers may not be comfortable expressing their requirements at the product specification level. It may be too technical and incomprehensible. Buyers are more familiar with the terms used in the buyer’s environment. So a mapping between the buyer’s requirement expressions and the seller’s product specification is necessary. For this purpose, a salesman expert system can be used. For instance, in the process of purchasing a computer at the Personalogic site, customers may express their level of word processing, network, and graphics use. Then the system suggests the recommended memory requirement.
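The original Fig. 8 uses a KQML-style notation; purely as a hedged illustration of the layering described above (all element names beyond the parameters listed in the text, and all values, are hypothetical), a tagged request-for-bid message separating the electronic commerce layer from the product specification layer might look like this:

```xml
<!-- Hypothetical request-for-bid message; illustrative only -->
<request_for_bid>
  <!-- Electronic commerce layer: generic terms needed in any deal -->
  <title>Request for proposal: desktop PCs</title>
  <contract_ID>RFB-001</contract_ID>
  <contract_type>competitive_bid</contract_type>
  <bid_time>2001-06-30</bid_time>
  <payment_method>letter_of_credit</payment_method>
  <delivery_method>third_party_logistics</delivery_method>
  <delivery_date>2001-08-15</delivery_date>
  <item_name>desktop_pc</item_name>
  <quantity>500</quantity>
  <!-- Product specification layer: depends on the product being traded -->
  <product_specification>
    <cpu_speed unit="MHz">800</cpu_speed>
    <memory unit="MB">256</memory>
    <monitor unit="inch">17</monitor>
  </product_specification>
</request_for_bid>
```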
VIII. XML IN B2B EC Hypertext Markup Language (HTML) was developed to display text for human comprehension. For agents to comprehend HTML files on the global Web, the agents would have to be equipped with natural language processing capability. However, natural language processing
Figure 8 Three layers of message representation.
capability is limited to cases with a small vocabulary and structured grammar. Since web sites cover all sorts of information, it is impossible to effectively extract the relevant data from HTML files. So eXtensible Markup Language (XML) was developed to define semantic data items in tags. Software agents are able to read out the values in the tags. XML files are also transformable to HTML files for display in the browser. To complement the XML files, the document type definition (DTD) defines the structure between data elements, and the eXtensible Stylesheet Language (XSL) defines the transformation of XML and DTD to HTML files for human comprehension. XML has become popular as the second generation of web technology. For B2B EC, XML can be observed from the viewpoints of EDI and agents. From the EDI point of view, XML is a tool to implement EDI on the web. Users can just download the XML-based web page, browse the transformed HTML files, and retrieve the values of embedded data items. If the purpose of XML is just message exchange, it can be regarded as XML/EDI. However, the EDI community does not limit the function of XML to message exchange, as traditional EDI did. When XML is used in the context of agent-based commerce, our concern is not only the message format but also the entire protocol. Therefore the EDI community and the agent community are destined to converge to set up a commonly agreed upon B2B protocol implemented on XML. So far, the major research focus in the agent community has not been the implementation platform, such as XML, because the platform is regarded as a practical rather than an academic issue. On the other hand, the EDI task forces, mainly led by industrial experts, try to set up simple and useful business protocols implemented on XML. The main issue here is the establishment of standard protocols covering business entities, procedures, message formats, and product specifications. For instance, the protocol OBI (open buying on the Internet) adopts the requisitioner, buying organization, supplier, and payment authority as the entities of the B2B protocol. It is very difficult for a single protocol to meet the needs of all circumstances. So standards efforts are booming as organizations try to capture a strategically beneficial position in B2B EC. Many task forces are attempting to establish quite a number of XML-based business protocols. Some of them are
• OSD (open software description). Proposed by Marimba and Microsoft for describing software package components to be used in an automated software distribution environment • OBI (open buying on the Internet). Proposed by American Express and major buying and selling organizations (www.openbuy.com) • CBL (common business language). Supports the messages for the basic business forms used in ANSI X12 EDI (www.xmledi.com) transactions as well as those in OTP and OBI • RosettaNet. Developed for the PC industry (www.rosettanet.org) Software vendors attempt to establish a standard and develop solutions that can implement the standard protocol. BizTalk, developed by Microsoft, is one of these. A concern at this point is that there are too many standards, so companies are confused about which standard will really become the de facto standard in the market. To overcome this chaos, UN/CEFACT and OASIS have developed the ebXML standard. Many standards bodies, industry groups, vendors, and users from around the world are integrating ebXML into their implementations. So ebXML may become the de facto standard in the industry. If the industry moves with the XML standard, agent research should take this movement into account to develop practical agents that meet the standard. That is why XML, EDI, and agent-based commerce will merge for effective B2B EC.
IX. KNOWLEDGE MANAGEMENT IN THE B2B EC PLATFORM Another important aspect of B2B EC is the management of knowledge in the web sites that are used by employees and partners. For instance, the e-catalog is an important source of knowledge about products; the Q&A for technical support is a source of technical knowledge; the regulations for budgetary control are important knowledge for budgeting. The first users of visually displayed knowledge are humans, so browsing HTML files is one way to gain it. Most current knowledge management systems belong to this category; they provide a search engine for retrieving statements that contain the requested keywords. The second users are software agents, as mentioned in the previous section. For this purpose, we need to represent the data in XML files explicitly. However, XML files cannot capture the rules implicitly embedded in the texts. So we need a third-generation web technology, which can explicitly codify
Figure 9 Knowledge management with XRML.
the rules for inference engines to process. An issue here is the maintenance of consistency between the structured rules and the natural language texts. To solve this issue, the eXtensible Rule Markup Language (XRML) is under development, as depicted in Fig. 9. To realize rule processing with XRML files, we need three components: 1. Rule structure markup language: This language represents the rules in markup syntax so that the variables and values in the rules can be associated with the natural language texts. 2. Rule identification markup language: This language identifies the natural language sentences and their relationships with the rules, variables, and values used in the rules. 3. Rule trigger markup language: This meta-language defines the conditions under which the rules will be triggered, and is codified as an embedded language in agents and in forms in workflow management systems. The knowledge editor should be able to support the consistent maintenance of natural language text, rules, and any other programs codified for machine processing. By developing XRML or a similar environment, B2B EC can become more intelligent, allowing visual display, a machine’s data processing, and a machine’s rule processing. Agents and workflow software will be more intelligent in this environment. More information about Rule Markup Language research initiatives can be found at cec.net.
X. SUMMARY AND CONCLUSION
Key activities in B2B EC have been discussed in this article. B2B EC started with seller-centric e-marketplaces and evolved to buyer-centric e-marketplaces. A large number of vertical marketplaces are being developed to help the exchange and supply chain of each industry. To integrate the e-procurement systems with the external e-marketplaces, the inside-out and outside-in approaches are competitively being attempted. The EDI platform for B2B is moving onto the Internet. The electronic SCM system, the shared data warehouse, and virtual corporations are being provided on the e-hub, exploring collaboration among the aligned partners. Agents that assist the buyers and sellers are establishing the agent-based commerce environment, and they will be implemented using XML. To build a standard B2B EC protocol, a large number of XML consortiums are conducting research. Each of the above activities has started from a different angle on B2B interactions. However, eventually the B2B platform will converge to e-hubs with a unified architecture, and will continuously seek effective integration with internal information systems. The initial development of B2B EC was pushed by information technology, but it will be concluded by the pull of meeting managerial goals.

ACKNOWLEDGMENT
Material from the author’s book (Turban et al., 2000) is used with permission of the publisher.

SEE ALSO THE FOLLOWING ARTICLES
Advertising and Marketing in Electronic Commerce • Electronic Commerce • Electronic Commerce, Infrastructure for • Enterprise Computing • Marketing • Sales • Service Industries, Electronic Commerce for
BIBLIOGRAPHY Blankenhorn, D. (May 1997). GE’s e-commerce network opens up to other marketers. NetMarketing. www.netb2b.com. Cunningham, M. J. (2000). B2B: How to build a profitable E-commerce strategy. Cambridge, MA: Perseus Publishing. Davis, C. (September 1997). Most B-to-B sites don’t meet customer needs: Gartner. NetMarketing. www.netb2b.com. Finin, T., Wiederhold, J. W. G., Genesereth, M., Fritzson, R., McGuire, J., Shapiro, S., and Beck, C. (1993). Specification of the KQML Agent Communication Language, DARPA Knowledge Sharing Initiative, External Interface Working Group. Freeman, L. ( January 1998). Net drives B-to-B to new highs worldwide. NetMarketing. www.netb2b.com. Frook, J. E. ( July 1998). Web links with back-end systems pay off. Internet Week. www.internetwk.com. Goldman et al. (1995). Competitors and virtual organization. New York: Van Nostrand Reinhold. Handfield, R., and Nicols, E. (1999). Supply chain management. Upper Saddle River, NJ: Prentice Hall. Joh, Y. H., and Lee, J. K. (2001). A framework of buyer’s e-catalog directory management system. Decision Support Systems. Kalakota, R., and Whinston, A. B. (1997). Electronic commerce: A manager’s guide. Reading MA: Addison-Wesley. Lawrence, E. et al. (1998). Internet commerce: Digital models for business. New York: John Wiley and Sons. Lee, J. K. (1998). Next generation electronic marketing environment: ICEC perspective. Proceedings of International Conference on Electronic Commerce ‘98, p. 6. icec.net. Lee, J. K. and Sohn, M. (2002). eXtensible Rule Markup Language—Toward the Intelligent Web Platform. Communications of the ACM. Forthcoming. Lee, J. K., and Lee, W. (1997). An intelligent agent based contract process in electronic commerce: UNIK-AGENT approach. Proceedings of 13th Hawaii International Conference on System Sciences, pp. 230–241.
Lee, S. K., Lee, J. K., and Lee, K. J. (1997). Customized purchase supporting expert system: UNIK-SES. Expert Systems with Applications, Vol. 11, No. 4, pp. 431–441. Lim, G., and Lee, J. K. (2002). Buyer-carts for B2B EC: The b-cart approach. Organizational Computing and Electronic Commerce, Forthcoming. Maddox, K. (1998). Cisco wins big with net ordering. NetMarketing. www.netb2b.com/cgi-bin/cgi_article/monthly/97/05/01/article.html. Maes, P., Guttman, R. H., and Moukas, A. G. (March 1999). Agents that buy and sell. Communications of ACM, Vol. 42, No. 3, 81–91. Nelson, M. (July 1998). SAP adds module for I-commerce. InfoWorld, Vol. 21, No. 27. Retter, T., and Calyniuk, M. (July 1998). Technology Forecast: 1998. Price Waterhouse. Sculley, A. B., and Woods, W. W. (1999). B2B Exchanges. Philadelphia, PA: ISI Publications. Silverstein, B. (1999). Business-to-business internet marketing. Gulf Breeze, FL: Maximum Press. Silwa, C. (1998). Software improved net purchasing process. ComputerWorld. www.computerworld.com. Sullivan, D. (January 1999a). Extending E-business to ERP. e-business Advisor, pp. 18–23. Sullivan, D. (January 1999b). Take ERP on the road. e-business Advisor, pp. 24–27. Trading Process Network. (1999). Extending the enterprise: TPN post case study—GE Lighting. tpn.geis.com/tpn/resouce_center/casestud.html. Turban, E., Lee, J. K., King, D., and Chung, M. (2000). Electronic Commerce: a managerial perspective. Englewood Cliffs, NJ: Prentice Hall. Turban, E., McLean, E., and Wetherbe, J. (1999). Information technology for management, 2nd Ed. New York: John Wiley & Sons. Upton, D. M., and McAfee, A. (July 1996). The real virtual factory. Harvard Business Review, pp. 123–133. Weston, W. (June 1998). Commerce one debuts SAP-oriented Tools. News.com. www.news.com/News/Item/0,4,23566,00.html.
C
C and C++ Jiang Guo California State University, Bakersfield
I. HISTORIES OF C & C++ II. C'S CHARACTERISTICS III. COMPARISON OF C AND C++
IV. ADVANTAGES OF C++ V. ADVANCED TOPICS OF C++ VI. CONCLUSION
GLOSSARY
data encapsulation The process of combining elements to create a new entity. For example, a complex data type, such as a class, is a type of data encapsulation because it combines built-in types or user-defined types and functions. Object-oriented programming languages rely heavily on data encapsulation to create high-level objects.
dynamic binding A method of attaching processor addresses to instructions and data during program execution. C++ implements dynamic binding through the use of virtual functions, which allow derived classes to override the base class functionality. Which function is invoked depends on the context in which the function is invoked at run time.
inheritance The concept that when a class is defined, any subclass can inherit the definitions of one or more general classes. This means for the programmer that a subclass need not carry its own definition of data and methods that are generic to the class (or classes) of which it is a part. This not only speeds up program development but also ensures an inherent validity to the defined subclass object.
memory leak A bug in a program that prevents it from freeing up memory that it no longer needs. As a result, the program grabs more and more memory until it finally crashes because there is no more memory left.
object-oriented programming A type of programming in which programmers define not only the data type of a data structure, but also the types of operations (functions) that can be applied to the data structure. In this way, the data structure becomes an object that includes both data and functions. In addition, programmers can create relationships between one object and another. For example, objects can inherit characteristics from other objects. To perform object-oriented programming, one needs an object-oriented programming language (OOPL). Java, C++, and Smalltalk are three popular OOPLs, and there is also an object-oriented version of Pascal.
programming language A vocabulary and set of grammatical rules for instructing a computer to perform specific tasks. The term programming language usually refers to high-level languages, such as BASIC, C, C++, COBOL, FORTRAN, Ada, and Pascal. Each language has a unique set of keywords (words that it understands) and a special syntax for organizing program instructions.
type checking Ensures that all declarations and uses referring to the same object are consistent. It is also the key to determining when an undefined or unexpected value has been produced due to the type conversions that arise from certain operations in a language.
C AND C++ are widely used for teaching, research,
and developing large systems in industry. They are among the most important computer languages. With a few modest exceptions, C++ can be considered a superset of the C programming language. While C++ is similar to C in syntax and structure, it is important to realize that the two languages are radically different. C++ and its support for object-oriented programming
provide a new methodology for designing, implementing, and maintaining software projects that C, a structured programming language, is unable to support. This article describes the histories of C and C++. It compares the advantages and disadvantages of the two languages and discusses the object-oriented features of C++. It also addresses some advanced topics of C++, such as storage and memory leaks, type checking, templates, and exceptions.
I. HISTORIES OF C AND C++ The C programming language came into being in the years 1969–1973. It was developed at AT&T for the purpose of writing an operating system for PDP-11 computers. This operating system evolved into Unix. In 1978 Brian Kernighan and Dennis Ritchie published The C Programming Language. The C programming language was finally and officially standardized by the ANSI X3J11 committee in mid-1989. During the 1980s the use of the C language spread widely, and compilers became available on nearly every machine architecture and operating system; in particular it became popular as a programming tool for personal computers, both for manufacturers of commercial software for these machines, and for end users interested in programming. Today it is among the languages most commonly used throughout the computer industry. C evolved from the B and BCPL programming languages. BCPL, B, and C all fit firmly into the traditional procedural family typified by FORTRAN and Algol 60. They are particularly oriented toward system programming, are small and compactly described, and are amenable to translation by simple compilers. They are low-level programming languages. The abstractions that they introduce are readily grounded in the concrete data types and operations supplied by conventional computers, and they also rely on library routines for input-output and other interactions with an operating system. Programmers can use library procedures to specify interesting control constructs such as coroutines and procedure closures. At the same time, their abstractions lie at a sufficiently high level that, with care, portability between machines can be achieved. BCPL, B, and C are syntactically different in many details. But broadly speaking, they are similar. Programs consist of a sequence of global declarations and function (procedure) declarations. Procedures can be nested in BCPL, but may not refer to nonstatic objects defined in containing procedures. B and C
avoid this restriction by imposing a more severe one: no nested procedures at all. Each of the languages recognizes separate compilation, and provides a means for including text from named files, which are called header files. BCPL, B, and C do not strongly support character data in the language; each treats strings much like vectors of integers and supplements general rules with a few syntactic conventions. In the C language, a string literal denotes the address of a static area initialized with the characters of the string, packed into cells. The strings are terminated by a special character “\0”. It also should be pointed out that BCPL and B are typeless languages, whereas C is a typed language (every variable and expression has a data type that is known at compile time). The programming language C has several direct descendants, though they do not rival Pascal in generating progeny, such as Concurrent C, Objective C, C*, and especially C++. The C language is also widely used as an intermediate representation (essentially, as a portable assembly language) for a wide variety of compilers, both for direct descendants like C++, and independent languages like Modula-3 and Eiffel. C++ is a programming language developed at AT&T Bell Laboratories by Bjarne Stroustrup in the early 1980s. In the first phase, Stroustrup merely viewed his work as an extension of C, calling it C with Classes. This was released around 1982. From 1983–1984 C with Classes was revised and renamed C++. Subsequent comments led to a revision and new release in 1985. The programming language C++ was designed with the intent of merging the efficiency and conciseness of C with the object-oriented programming (OOP) features of SIMULA-67. The language has evolved rapidly and several new features have been added since its initial release in 1985. The language also provides support for several other useful mechanisms such as parameterized types, templates, and exception handling. A formal ISO/ANSI C++ committee (X3J16) was established to help develop an accurate and reliable standard for the language in order to eliminate the ambiguities in the C++ compilers and translators of the time. The ISO/ANSI C++ language standard was officially approved in 1998 and adopted most of the rules present in the ANSI base document The Annotated C++ Reference Manual as written by Ellis and Stroustrup. With a few modest exceptions, C++ can be considered a superset of the C programming language. While C++ is similar to C in syntax and structure, it is important to realize that the two languages are radically different. C++ and its support for OOP provide a new methodology for designing, implementing, and
ease of maintaining software projects which C, a structured programming language, is unable to support. Extensive libraries are available for the C programming language; consequently, a deliberate effort was made by the developers of C++ to maintain backward compatibility with C. Any major deviation from the C programming language would have meant that all the libraries available for C would have to be tediously rewritten for C++. This would have severely limited the usefulness of C++ in an environment where C libraries were used extensively. C++ is largely an amalgamation of several other programming languages. Obviously, C++ inherits most of its characteristics, such as its basic syntax, looping mechanisms, and the like, from C. Apart from C, C++ borrows most heavily from the aforementioned SIMULA-67 programming language. Nearly all the support that C++ provides for OOP comes from this language. The concept of a class and the so-called virtual function mechanism are a few of the features present in SIMULA-67 which have been integrated into C++. To a limited extent, C++ also borrows some programming mechanisms from Algol 68. These include support for operator overloading and the declaration of variables almost anywhere in the code. As mentioned, the newer C++ compilers provide support for parameterized types and exception handling, concepts borrowed from Ada and Clu.
II. C’S CHARACTERISTICS C is a relatively small language, but one which wears well. C’s small, unambitious feature set is a real advantage: there’s less to learn and there isn’t excess baggage in the way when programmers do not need it. It can also be a disadvantage: since it doesn’t do everything for programmers, there’s a lot they have to do themselves. (Actually, this is viewed by many as an additional advantage: anything the language doesn’t do for programmers, it doesn’t dictate to them either, so they are free to do that something however they want.)
A. The Advantages of C Despite some aspects mysterious to the beginner and occasionally even to the adept, C remains a simple and small language, translatable with simple and small compilers. The good news about C is that programmers can write code that runs quickly, and their
program is very “close to the hardware.” That means that they can access low-level facilities in computers quite easily without the compiler or run time system stopping them from doing something potentially dangerous. Its types and operations are well-grounded in those provided by real machines, and for people used to how computers work, learning the idioms for generating time- and space-efficient programs is not difficult. At the same time, the language is sufficiently abstracted from machine details that program portability can be achieved. C is sometimes referred to as a “high-level assembly language.” Some people think that is an insult, but it is actually a deliberate and significant aspect of the language. If a programmer has programmed in assembly language, he/she will probably find C very natural and comfortable (although if he/she continues to focus too heavily on machine-level details, he/she will probably end up with unnecessarily nonportable programs). If he/she has not programmed in assembly language, he/she may be frustrated by C’s lack of certain higher level features. In either case, he/she should understand why C was designed this way: so that seemingly simple constructions expressed in C would not expand to arbitrarily expensive (in time or space) machine language constructions when compiled. If a programmer writes a C program simply and concisely, it is likely to result in a succinct, efficient machine language executable. If he/she finds that the executable program resulting from a C program is not efficient, it is probably because of something silly he/she did, not because of something the compiler did behind his/her back over which he/she has no control. In any case, there is no point in complaining about C’s low-level flavor: C is what it is. C imposes relatively few built-in ways of doing things on the programmer. Some common tasks, such as manipulating strings, allocating memory, and doing input/output (I/O), are performed by calling on library functions. Other tasks which a programmer might want to do, such as creating or listing directories, interacting with a mouse, displaying windows or other user-interface elements, or doing color graphics, are not defined by the C language at all. A programmer can do these things from a C program, of course, but he/she will be calling on services which are peculiar to his/her programming environment (compiler, processor, and operating system) and which are not defined by the C standard. The use of compiler directives to the preprocessor makes it possible to produce a single version of a program which can be compiled on several different types of computers. In this sense C is said to be very portable.
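As a hedged sketch of the portability point above (not taken from the original text), the following fragment uses preprocessor directives so that one source file can be compiled on different systems; the platform macros shown are the ones commonly predefined by compilers, and the message itself is invented for illustration. It is written in the common C/C++ subset.

```c
#include <stdio.h>

/* The preprocessor selects one definition before compilation,
   so the same source file builds everywhere. */
#if defined(_WIN32)
#define PLATFORM_NAME "Windows"
#elif defined(__unix__) || defined(__APPLE__)
#define PLATFORM_NAME "a UNIX-like system"
#else
#define PLATFORM_NAME "an unknown platform"
#endif

int main(void)
{
    printf("This program was built for %s.\n", PLATFORM_NAME);
    return 0;
}
```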
The function libraries are standard for all versions of C, so they can be used on all systems. C’s central library support always remains in touch with a real environment. It was not designed in isolation to prove a point or to serve as an example, but as a tool to write programs that did useful things; it was always meant to interact with a larger operating system, and was regarded as a tool to build larger tools. A parsimonious, pragmatic approach influences the things that go into C: it covers the essential needs of many programmers, but does not try to supply too much. C is quirky, flawed, and an enormous success. While accidents of history surely helped, it evidently satisfied a need for a system implementation language efficient enough to displace assembly language, yet sufficiently abstract and fluent to describe algorithms and interactions in a wide variety of environments.
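As a small illustration of relying on those standard function libraries (a sketch, not drawn from the article), the following fragment uses the string routines declared in <string.h>; it compiles unchanged as C or C++.

```c
#include <stdio.h>
#include <string.h>

int main(void)
{
    char greeting[32];

    /* Standard library calls behave the same on every conforming
       implementation, which is what makes this code portable. */
    strcpy(greeting, "Hello, ");
    strcat(greeting, "world");

    printf("%s (%lu characters)\n", greeting,
           (unsigned long)strlen(greeting));
    return 0;
}
```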
B. The Disadvantages of C
The disadvantages of C fall neatly from the advantages. The biggest one is that a programmer can write C programs that can fail in very catastrophic ways. These programs will appear totally valid as far as the compiler is concerned, but will not work and may even cause computers to stop. A more picky language would probably notice that programmers were doing something stupid in their program and allow them to find the error before it crashed their computers. However, a more picky language would probably not allow them to write the program in the first place. It is worth mentioning that C is a bit dangerous. C does not, in general, try hard to protect a programmer from mistakes. If a programmer writes a piece of code which will do something wildly different from what he intended it to do, up to and including deleting his data or trashing his disk, and if it is possible for the compiler to compile it, it generally will. C is often compared to a sharp knife: it can do a surgically precise job on some exacting task a programmer has in mind, but it can also do a surgically precise job of cutting off his finger. It is up to a programmer to use it carefully. This aspect of C is very widely criticized; it is also used to argue that C is not a good teaching language. C aficionados love this aspect of C because it means that C does not try to protect them from themselves: when they know what they're doing, even if it's risky or obscure, they can do it. Students of C hate this aspect of C because it often seems as if the language is some kind of a conspiracy specifically designed to lead them into booby traps. This is another aspect of the language that is fairly pointless to complain about. If a programmer takes care and pays attention, he/she can avoid many of the pitfalls. Another disadvantage of C is that it allows programmers to write very terse code. They can express exactly what they want to do in very few statements. They might think that this is nice, because it makes their programs even more efficient, but it has the side effect of making them much harder to understand. At the time a programmer writes the code he/she knows exactly what each part is supposed to do. If he/she comes back to the program in several months, he/she will need time to "get back inside it." If the code is written very tightly he/she will take much longer to do this, and other people may not be able to understand it at all. In contrast, many programmers strive to write code that is not necessarily the most efficient possible, but is easy to understand. Such programmers sacrifice a bit of program performance for ease of maintenance.

C. A Critique of C
Two ideas are most characteristic of C among languages of its class: the relationship between arrays and pointers, and the way in which declaration syntax mimics expression syntax. They are also among its most frequently criticized features, and often serve as stumbling blocks to the beginner. C treats strings as arrays of characters conventionally terminated by a marker (the character \0). Aside from one special rule about initialization by string literals, the semantics of strings are fully subsumed by more general rules governing all arrays, and as a result the language is simpler to describe and to translate than one incorporating the string as a unique data type. Some costs accrue from its approach: certain string operations are more expensive than in other designs because application code or a library routine must occasionally search for the end of a string, because few built-in operations are available, and because the burden of storage management for strings falls more heavily on the user. Nevertheless, C’s approach to strings works well. On the other hand, C’s treatment of arrays in general (not just strings) has unfortunate implications both for optimization and for future extensions. The prevalence of pointers in C programs, whether those declared explicitly or arising from arrays, means that optimizers must be cautious, and must use careful dataflow techniques to achieve good results. Sophisticated compilers can understand what most pointers can possibly change, but some important usages remain difficult to analyze. For example, functions with
pointer arguments derived from arrays are hard to compile into efficient code on vector machines because it is seldom possible to determine that one argument pointer does not overlap data also referred to by another argument, or accessible externally. More fundamentally, the definition of C so specifically describes the semantics of arrays that changes or extensions treating arrays as more primitive objects, and permitting operations on them as wholes, become hard to fit into the existing language. Even extensions to permit the declaration and use of multidimensional arrays whose size is determined dynamically are not entirely straightforward, although they would make it much easier to write numerical libraries in C. Thus, C covers the most important uses of strings and arrays arising in practice by a uniform and simple mechanism, but leaves problems for highly efficient implementations and extensions. Many smaller infelicities exist in the language and its description besides those discussed above. There are also general criticisms to be lodged that transcend detailed points. Chief among these is that the language and its generally expected environment provide little help for writing very large systems. The naming structure provides only two main levels, “external” (visible everywhere) and “internal” (within a single procedure). An intermediate level of visibility (within a single file of data and procedures) is weakly tied to the language definition. Thus, there is little direct support for modularization, and project designers are forced to create their own conventions. Similarly, C itself provides two durations of storage: “automatic” objects that exist while control resides in or below a procedure, and “static,” existing throughout execution of a program. Off-stack, dynamically allocated storage is provided only by a library routine and the burden of managing it is placed on the programmer: C is hostile to automatic garbage collection.
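To ground the array–pointer relationship and the '\0'-terminated string convention discussed in this critique, here is a brief sketch written in the common C/C++ subset; the function my_length is hypothetical, not part of any library.

```c
#include <stdio.h>

/* Counts characters up to the terminating '\0', illustrating that a
   "string" is just an array of char and that the parameter s behaves
   as a pointer to its first element. */
static int my_length(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (int)(p - s);   /* pointer subtraction gives the element count */
}

int main(void)
{
    char word[] = "portable";   /* array of 9 chars, including the '\0' */

    /* The array name "decays" to a pointer to its first element here. */
    printf("%s has %d characters\n", word, my_length(word));
    printf("third character: %c\n", *(word + 2));  /* same as word[2] */
    return 0;
}
```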
III. COMPARISON OF C AND C++ C++ is an extension of C developed at AT&T with the purpose of adding object-oriented features to C while preserving the efficiencies of C. For all practical purposes, the C language is a subset of C++ even though it is possible to write C programs that are not valid in C++. The main similarity in C and C++ lies in the syntax of the two languages. C and C++ both share many of the same fundamental programming constructs. This is the reason why it is easy for a proficient C programmer to learn C++ provided he/she understands the object-oriented paradigm. C++
supports every programming technique supported by C. Every C program can be written in essentially the same way in C++ with the same run time and space efficiency. It is not uncommon to be able to convert tens of thousands of lines of ANSI C to C-style C++ in a few hours. Thus, C++ is as much a superset of ANSI C as ANSI C is a superset of the original C and as much as ISO/ANSI C++ is a superset of C++ as it existed in 1985. C++ maintains its C roots at various levels: • Source code level. Most ANSI C programs are valid C++ programs. • Object code level. C++ structures are “binary-compatible” with equivalent C structures. • Environment/tool level. C++ works with standard tools like the make facility. C++ can be viewed as a programming language derived from C with improved procedural syntax as compared to C and object-oriented features not present in C. Note that though C++ supports OOP, it does not enforce it. C++ is therefore a multiparadigm language. If what a programmer is looking for is something that forces him/her to do things in exactly one way, C++ isn’t it. There is no one right way to write every program—and even if there were, there would be no way of forcing programmers to use it. Of course, writing C-style programs in C++ is not an optimal use of C++ for most applications. To be a truly effective C++ programmer, he/she must use the abstraction mechanisms and the type system in a way that fits reasonably with their intent. Trying to ignore or defeat the C++ type system is a most frustrating experience. C is a procedural language. It is not designed to support OOP. In a procedural program, the problem is broken down into modules and submodules which implement the solution. C++ on the other hand can provide all the advantages inherent in the object-oriented paradigm. In an object-oriented program the problem space is represented by objects that interact with each other, and these objects are implemented along with messaging mechanisms to provide a solution. C++ supports data abstraction, OOP, and generic programming. OO programming languages have a number of inherent advantages. They lend themselves to better design because they allow problem spaces to be modeled like real-world objects. In an object-oriented language, an object is called an instance of a class. A class packages all the attributes and methods of an object. Attributes are the data associated with the object and methods are the functions that operate on the data and express the behavior of the
object. As an object-oriented language C++ supports inheritance, encapsulation, and polymorphism, which if properly used lead to better programs. These features are discussed in detail in the next section. C++ provides stronger type checking than C. When a user defines a new type in C++, support is provided in the language to permit that type to behave in a manner similar to types already built into the language. The user may define how the standard operators act upon these user-defined types (operator overloading) and how these types can be converted to another type (user-defined conversions). The user may also specify how memory is allocated or deallocated when an instance of that type is created or destroyed. This is done through the use of constructors and destructors, which are called implicitly at run time when an instance of that type is brought into and taken out of scope respectively. C++ provides support for function prototypes (forward declarations of function signatures), hence enabling strong type checking of function parameters to take place during compilation. In addition, C++ provides support for the pass-by-reference mechanism and also supports default arguments to functions. The latter means that if a function requires an argument that often has the same specific value, the user can default the argument to that value and not pass that parameter when the function is called. In the few cases where the function has to be called with a different value for the default argument, the user simply passes that argument into the function and the new value overrides the default value. There is another aspect worth mentioning. Some people feel that C++ is a little overrated; in general this holds true for the entire OOP approach. Some of the claimed advantages of C++ are • New programs can be developed in less time because existing C code can be reused. • Creating and using new data types is easier than in C. • The memory management under C++ is easier and more transparent. • Programs are less bug-prone, as C++ uses a stricter syntax and type checking. • “Data hiding,” the usage of data by one program part while other program parts cannot access the data, is easier to implement with C++. Which of these allegations are true? Obviously, extreme promises about any programming language are
overdone; in the end, a problem can be coded in any programming language (even BASIC or assembly language). The advantages or disadvantages of a given programming language aren’t in “what a programmer can do with them,” but rather in “which tools the language offers to make the job easier.” In fact, the development of new programs by reusing existing code can also be realized in C by, e.g., using function libraries: handy functions can be collected in a library and need not be reinvented with each new program. Still, C++ offers its specific syntax possibilities for code reuse in addition to function libraries. Memory management is in principle in C++ as easy or as difficult as in C, especially when dedicated C functions such as xmalloc() and xrealloc() are used (these functions, often present in our C programs, allocate memory or abort the program when the memory pool is exhausted). In short, memory management in C or in C++ can be coded “elegantly,” “ugly,” or anything in between—this depends on the developer rather than on the language. Concerning “bug proneness,” C++ indeed uses stricter type checking than C. However, most modern C compilers implement “warning levels”; it is then the programmer’s choice to disregard or heed a generated warning. In C++ many such warnings become fatal errors (the compilation stops). As far as data hiding is concerned, C does offer some tools; e.g., where possible, local or static variables can be used and special data types such as structs can be manipulated by dedicated functions. Using such techniques, data hiding can be realized even in C; though it needs to be said that C++ offers special syntactical constructions. In contrast, programmers who prefer to use a global variable int i for each counter variable will quite likely not benefit from the concept of data hiding. Concluding, C++ in particular and OOP in general are not solutions to all programming problems. C++, however, does offer some elegant syntactical possibilities, which are worth investigating. At the same time, the level of grammatical complexity of C++ has increased significantly compared to C. In time a programmer gets used to this increased level of complexity, but the transition doesn’t take place quickly or painlessly. In the strict mathematical sense, C isn’t a subset of C++. There are programs that are valid C but not valid C++ and even a few ways of writing code that has a different meaning in C and C++. However, well-written C tends to be legal C++ also.
Here are some examples of C/C++ compatibility problems: • Calling an undeclared function is poor style in C and illegal in C++, and so is passing arguments to a function using a declaration that doesn’t list argument types. • In C, a pointer of type void* can be implicitly converted to any pointer type, and free-store allocation is typically done using malloc(), which has no way of checking if “enough” memory is requested. • C++ has more keywords than C.
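As a hedged illustration of the second point, the fragment below is valid C (a traditional C compiler accepts it, at most with a warning) but is rejected by a C++ compiler because of the implicit void* conversion; it is a sketch written for illustration, not an example from the original text.

```c
#include <stdlib.h>

int main(void)
{
    /* Valid C: the void* returned by malloc() converts implicitly to
       int*.  A C++ compiler rejects this line unless an explicit cast
       such as (int *)malloc(...) is added. */
    int *numbers = malloc(10 * sizeof(int));

    if (numbers != NULL) {
        numbers[0] = 42;   /* use the storage */
        free(numbers);     /* return it to the free store */
    }
    return 0;
}
```

Similarly, calling a function with no declaration in scope is accepted by older C compilers but is a compile-time error in C++.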
IV. ADVANTAGES OF C++ Most people think object-oriented programs are easier to understand, correct, and modify. Besides C++, many other object-oriented languages have been developed, including, most notably, Smalltalk. The best feature (some people feel this is the worst feature) of C++ is that C++ is a hybrid language—it is possible to program in either a C-like style, an object-oriented style, or both. Writing Smalltalk-style in C++ can be equally frustrating and as suboptimal as writing C-style code in C++. There is no formal definition of OOP. Hence there is some confusion surrounding what features a programming language must support in order to claim that it is object-oriented. Despite this, however, most agree that in order for a language to claim that it is object-oriented, it must provide support for three major concepts. • Data encapsulation or data abstraction • Inheritance or derivation • Dynamic or run-time binding The following subsections will explain these features and show how C++ provides support for them through its concept of a class, the underlying mechanism upon which all good C++ programs are based.
A. Data Hiding and Encapsulation in C++ This is an important aspect of OOP languages, and C++ offers several mechanisms to achieve data hiding. Encapsulation is the abstraction of information and is also called data hiding. It prevents users from seeing the internal workings of an object so as to
protect data that should not be manipulated by the user. Encapsulation is supported in C++ through the class mechanism, though the view of encapsulation differs from that in Eiffel. Data hiding and encapsulation form a crucial element in the protection mechanism within OOP languages. Encapsulation improves the modularity of the code and leads to easier maintenance of programs by hiding the actual implementation details within an object and simply allowing access to an object’s interface. There are several ways in which data hiding can be achieved. It is true that C++ has the concept of an interface in which the “services” a class offers can be made accessible to other objects; however, the manner in which that interface can be described varies. Whatever the mechanism used, an object of any class A that wishes to send a message to an object of another class B needs to “include” the interface of class B in order to allow the compiler to perform its normal checks (method name, number of parameters, types of parameters, etc.). C++ is backward compatible with C and the include mechanism derives from this. In effect, what happens is that a pre-compilation process occurs where any include directives are replaced with the contents of the file to be included. As this is a process that can be carried out separately, i.e., a programmer can carry out the precompilation without carrying on to the compilation stage, it means that he/she can produce a text file that comprises his/her code together with the code from any include file. This does not seem to be a very good idea given the aims and objectives of data hiding and encapsulation. The user can only perform a restricted set of operations on the hidden members of the class by executing special functions commonly called methods. The actions performed by the methods are determined by the designer of the class, who must be careful not to make the methods either overly flexible or too restrictive. This idea of hiding the details away from the user and providing a restricted and clearly defined interface is the underlying theme behind the concept of an abstract data type. One advantage of using data encapsulation comes when the implementation of the class changes but the interface remains the same. For example, to create a stack class, which can contain integers, the designer may choose to implement it with an array, which is hidden from the user of the class. The designer then writes the push() and pop() methods which put integers into the array and remove them from the array, respectively. These methods are made accessible to
the user. Should an attempt be made by the user to access the array directly, a compile-time error will result. Now, should the designer decide to change the stack’s implementation to a linked list, the array can simply be replaced with a linked list and the push() and pop() methods rewritten so that they manipulate the linked list instead of the array. The code that the user has written to manipulate the stack is still valid because it was not given direct access to the array to begin with. The concept of data encapsulation is supported in C++ through the use of the public, protected, and private keywords, which are placed in the declaration of the class. Anything in the class placed after the public keyword is accessible to all the users of the class; elements placed after the protected keyword are accessible only to the methods of the class or classes derived from that class; elements placed after the private keyword are accessible only to the methods of the class. As a convention, calling a method of an object instantiated from a class is commonly referred to as sending a message to that object.
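The stack just described can be sketched in C++ as follows; this is only an illustration (the class name IntStack and its members are invented here), and it also shows a constructor and the public/private keywords discussed above.

```cpp
#include <iostream>

class IntStack {
public:
    IntStack() : top_(0) {}            // constructor: start with an empty stack

    bool push(int value) {             // part of the public interface
        if (top_ == kCapacity) return false;   // stack full
        data_[top_++] = value;
        return true;
    }

    bool pop(int &value) {             // pass by reference to return the value
        if (top_ == 0) return false;           // stack empty
        value = data_[--top_];
        return true;
    }

private:
    static const int kCapacity = 100;  // hidden implementation details:
    int data_[kCapacity];              // the array could later be replaced by a
    int top_;                          // linked list without changing the interface
};

int main() {
    IntStack s;
    s.push(10);
    s.push(20);

    int v;
    while (s.pop(v))
        std::cout << v << '\n';        // prints 20 then 10

    // s.data_[0] = 5;  // error if uncommented: 'data_' is private
    return 0;
}
```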
B. Inheritance in C++ There is a very important feature of C++, and object-oriented languages in general, called inheritance. Inheritance allows programmers to create a derived class from a previously defined base class. The derived class contains all the attributes and methods of the base class plus any additional specifications of the derived class. Any changes made to base classes are propagated to all derived classes unless explicitly overridden. Inheritance facilitates code reuse, thereby cutting development costs. In the inheritance mechanism, the original class is called the “base” or “ancestor” class (also the “superclass”), and the derived class is the “derived” or “descendant” class (also the “subclass”). Inheritance is the mechanism whereby specific classes are made from more general ones. The child or derived class inherits all the features of its parent or base class, and is free to add features of its own. In addition, this derived class may be used as the base class of an even more specialized class. Inheritance, or derivation, provides a clean mechanism whereby common classes can share their common features, rather than having to rewrite them. For example, consider a graph class, which is represented by edges and vertices and some (abstract) method of traversal. Next, consider a tree class, which is a special form of a graph. We can simply derive tree from graph and the
tree class automatically inherits the concepts of edges, vertices, and traversal from the graph class. We can then restrict how edges and vertices are connected within the tree class so that it represents the true nature of a tree. Inheritance is supported in C++ by placing the name of the base class after the name of the derived class when the derived class is declared. It should be noted that a standard conversion occurs in C++ when a pointer or reference to a base class is assigned a pointer or reference to a derived class.
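The following is a minimal sketch of the derivation syntax just described, under the assumption that a graph stores its edges in a simple container; the member names (add_edge, add_branch, edges) are invented for illustration.

#include <utility>
#include <vector>

class graph {
public:
    virtual ~graph() {}
    void add_edge(int from, int to) {            // behavior shared by every kind of graph
        edges.push_back(std::make_pair(from, to));
    }
protected:
    std::vector<std::pair<int, int> > edges;     // representation inherited by derived classes
};

// The base class name follows the derived class name in the declaration.
class tree : public graph {
public:
    void add_branch(int parent, int child) {     // a tree may restrict how vertices connect
        add_edge(parent, child);
    }
};

int main() {
    tree t;
    t.add_branch(1, 2);
    graph* g = &t;   // the standard derived-to-base conversion mentioned in the text
    (void)g;
    return 0;
}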
C. Dynamic Binding of Function Calls in C++

Quite often when using inheritance, one discovers that a series of classes shares a common behavior, but that the behavior is implemented differently from class to class. Such a situation is a prime candidate for the use of dynamic, or run-time, binding, which is also referred to as polymorphism. Polymorphism allows different objects to respond differently to the same message. There are two types of polymorphism: (1) early binding, which allows overloading of functions; overloading means that different functions can have the same name but are distinguished by their signature (the number, type, and order of parameters); and (2) late binding, which allows derived classes to override the base class functionality. Which function is invoked depends on the context in which the function is invoked. Polymorphism improves the flexibility of programming by giving developers better design options.
Going back to our previous example, we may decide to derive two tree classes from our graph class. The first class, in_order_tree, would be traversed in an "in order" fashion when it received a traverse() message, whereas post_order_tree would be traversed in a "post order" manner. The different traversal algorithms can be incorporated into a dynamically bound traverse() method. Now, when one of these trees is passed to a function that accepts a reference to the graph class, the invocation of the traverse() method via the graph parameter calls the correct traversal algorithm at run time, depending upon which tree was passed to the function. This reduces the burden on the programmer, since a tag does not have to be associated with each class derived from graph to distinguish it from other graphs. In addition, the programmer does not have to maintain an unwieldy switch statement to determine which traversal algorithm to invoke, since this is all handled automatically by the compiler.
C++ implements dynamic binding through the use of virtual functions. While function calls resolved at run time are somewhat less efficient than function calls resolved statically, Stroustrup notes that a typical virtual function invocation requires just five more memory accesses than a static function invocation. This is a very small penalty to pay for a mechanism that provides significant flexibility for the programmer. It is from inheritance and run-time binding of function calls that OOP languages derive most of their power. Some problems lend themselves very well to these two concepts, while others do not. As Stroustrup notes: "How much types have in common so that the commonality can be exploited using inheritance and virtual functions is the litmus test of the applicability of object-oriented programming."
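A short sketch of the dynamically bound traverse() call discussed above; the class names follow the article's graph/tree example, while the function bodies and the visit() helper are illustrative assumptions.

#include <iostream>

class graph {
public:
    virtual ~graph() {}
    virtual void traverse() const { std::cout << "generic graph traversal\n"; }
};

class in_order_tree : public graph {
public:
    virtual void traverse() const { std::cout << "in-order traversal\n"; }
};

class post_order_tree : public graph {
public:
    virtual void traverse() const { std::cout << "post-order traversal\n"; }
};

// The parameter is a reference to graph; the call is resolved at run time.
void visit(const graph& g) { g.traverse(); }

int main() {
    in_order_tree a;
    post_order_tree b;
    visit(a);   // prints "in-order traversal"
    visit(b);   // prints "post-order traversal"
    return 0;
}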
V. ADVANCED TOPICS OF C++

A. Storage and Memory Leaks

In C and C++, there are three fundamental ways of using memory:

1. Static memory, in which an object is allocated by the linker for the duration of the program. Global variables, static class members, and static variables in functions are allocated in static memory. An object allocated in static memory is constructed once and persists to the end of the program. Its address does not change while the program is running. Static objects can be a problem in programs using threads (shared-address-space concurrency) because they are shared and require locking for proper access.

2. Automatic memory, in which function arguments and local variables are allocated. Each entry into a function or a block gets its own copy. This kind of memory is automatically created and destroyed; hence the name automatic memory. Automatic memory is also said "to be on the stack."

3. Free store, from which memory for objects is explicitly requested by the program and where a program can free memory again once it is done with it (using the new and delete operators). When a program needs more free store, new requests it from the operating system. Typically, the free store (also called dynamic memory or the heap) grows throughout the lifetime of a program.

As far as the programmer is concerned, automatic and static storage are used in simple, obvious, and implicit ways. The interesting question is how to manage
the free store. Allocation (using new) is simple, but unless we have a consistent policy for giving memory back to the free store manager, memory will fill up—especially in long-running programs. The simplest strategy is to use automatic objects to manage corresponding objects in free store. Consequently, many containers are implemented as handles to elements stored in the free store. When this simple, regular, and efficient approach isn't sufficient, the programmer might use a memory manager that finds unreferenced objects and reclaims their memory in which to store new objects. This is usually called automatic garbage collection, or simply garbage collection. Naturally, such a memory manager is called a garbage collector. Good commercial and free garbage collectors are available for C++, but a garbage collector is not a standard part of a typical C++ implementation.
Many problems encountered during C++ program development are caused by incorrect memory allocation or deallocation: memory is not allocated, not freed, not initialized, boundaries are overwritten, etc. C++ does not "magically" solve these problems, but it does provide a number of handy tools. Memory leaks are perhaps the most famous dynamic memory problem, though they are by no means the only one. A memory leak occurs when a program (either in application code or in library use) allocates dynamic memory but does not free it when the memory is no longer useful. Memory leaks are especially problematic with X Window programs, since these are often run for long periods of time (days or longer) and can execute the same event-handling functions over and over. If there is a memory leak in X event-handling code, an X application will continually grow in size, leading to decreased performance and possibly a system crash. A related dynamic memory error involves strings: every string must be terminated with a null character (ASCII 0) before it is used, because string operations (in a cout or cin or other string routine) keep processing characters until they hit a character with value 0, the null character. If this null character does not exist, the computer will read past the end of the string and through adjacent memory until it happens to hit a null character by chance or runs out of memory, which will probably crash the program. A minimal sketch of a new/delete leak appears after the list of tools below.
Here are some available tools that help debug memory leak problems:

• Boehm-Weiser Conservative Garbage Collector: http://www.hpl.hp.com/personal/Hans_Boehm/gc/
• Centerline TestCenter: http://www.centerline.com/productline/test_center/test_center.html
• Debug Malloc Library, by Gray Watson: http://www.dmalloc.com
• MemCheck, by StratosWare: http://www.stratosware.com/products/MemCheck32/index.htm
• Memdebug, by Rene Schmit: http://www.bss.lu/Memdebug/Memdebug.html
• ParaSoft Insure++: http://www.parasoft.com/products/insure/index.htm
• Purify, by Rational Software, Inc.: http://www.rational.com/products/purify_nt/index.jsp
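As noted before the list, the following is a minimal sketch of the kind of defect these tools report: memory obtained with new but never released with delete. The function names and buffer size are invented for illustration.

void leaky_handler() {
    int* buffer = new int[1024];   // allocated on the free store
    // ... use buffer ...
    // missing: delete [] buffer;  -- every call leaks the whole block
}

void fixed_handler() {
    int* buffer = new int[1024];
    // ... use buffer ...
    delete [] buffer;              // memory returned to the free store
}

int main() {
    fixed_handler();               // no leak
    return 0;
}

If leaky_handler() were an event handler invoked repeatedly, the process would grow steadily, which is exactly the behavior described above for long-running X applications.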
B. Type Checking

Traditionally, a C or C++ program consists of a number of source files that are individually compiled into object files. These object files are then linked together to produce the executable form of the program. Each separately compiled program fragment must contain enough information to allow it to be linked together with other program fragments. Most language rules are checked by the compiler as it compiles an individual source file (translation unit). The linker checks to ensure that names are used consistently in different compilation units and that every name used actually refers to something that has been properly defined. The typical C++ run-time environment performs few checks on the executing code; a programmer who wants run-time checking must provide the tests as part of the source code. C++ interpreters and dynamic linkers modify this picture only slightly by postponing some checks until the first use of a code fragment.

• Compile-time type checking. As with most other bug areas, the best debugging techniques are those that catch bugs at compile time rather than at run time. The compiler touches all of the code, so it can find errors that may only rarely occur at run time. At least occasionally, a programmer should set the compiler's warning output to the most verbose level and then track down and fix all the problems it reports. Even if a report is not a critical problem, it may be worth fixing for portability reasons or to make real problems easier to find in the output. Compile-time error messages that are especially important with respect to pointer problems are those generated by function prototypes. Using incorrect pointer types in function calls is a common and serious application programming problem. A programmer should enable this compiler feature
all the time and immediately correct any problems it detects. Some of these problems almost always lead to program bugs, especially when the pointers are to data types of different sizes. Sometimes the problems are not immediately apparent; for example, the data types may happen to be the same size on a particular machine, and the problems show up only when the program is ported to machines with other data type sizes.

• Run-time type checking. If a programmer cannot catch a bug at compile time, the next best thing is for the system to automatically halt the program with a core dump when the bug occurs. While a programmer never wants end users to experience core dumps, they identify the program state at the time of the crash and can help identify and debug many types of bugs. The assert() macro, available with most C and C++ implementations, is a simple way to force a program to exit with an error message when unexpected results occur. Using assert() is an easy but powerful way to find pointer and memory problems. A good programming technique is to initialize pointers to NULL and to reset them to NULL whenever the objects they point to are freed. If programmers do this, they can easily check that a pointer is valid before using it.
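A brief sketch of the assert() and null-pointer conventions just described; the Record type and the process() function are invented for illustration.

#include <cassert>
#include <cstddef>

struct Record { int id; };

void process(Record* r) {
    assert(r != NULL);         // halt with a diagnostic if the caller passed a bad pointer
    r->id = 42;
}

int main() {
    Record* rec = new Record;  // pointers are given a definite value before use
    process(rec);
    delete rec;
    rec = NULL;                // reset after freeing, so stale use is easy to detect
    if (rec != NULL)           // guard before any later reuse
        process(rec);
    return 0;
}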
C. Operator Overloading

Operator overloading allows C++ operators to have user-defined meanings for user-defined types (classes). Overloaded operators are syntactic sugar for function calls. Overloading the standard operators for a class exploits the intuition of the users of that class and lets them program in the language of the problem domain rather than in the language of the machine. The ultimate goal is to reduce both the learning curve and the defect (bug) rate. Some people don't like the keyword operator or the somewhat funny syntax that goes with it in the body of the class itself. But the operator overloading syntax isn't supposed to make life easier for the developer of a class; it's supposed to make life easier for the users of the class. Remember that in a reuse-oriented world there will usually be many people who use a class but only one person who builds it (the class author); therefore the author should do things that favor the many rather than the few. Most operators can be overloaded. The only C operators that cannot be overloaded are . and ?: (and sizeof, which is technically an operator). C++ adds a few operators of its own, most of which can be overloaded; the exceptions are :: and .*.
When compiling an expression of the form x @ y, where @ stands for some operator, the compiler does the following:

1. If x is of a built-in type (int, char*, etc.), the standard (built-in) operator is used.
2. If x is of a user-defined class type, the compiler checks whether a suitable user-defined operator function is defined (that is, one whose parameter is of the same type as y, or of a type convertible to the type of y). If so, that function is used.
3. Otherwise, a compiler error is flagged.

Note that there are special problems if more than one "suitable" operator function is available. Such problems are resolved using the normal function overloading resolution rules.
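A minimal sketch of a user-defined operator that follows the resolution rules above; the Money class is an invented illustration, not a standard type.

#include <iostream>

class Money {
public:
    explicit Money(long c) : cents(c) {}
    long value() const { return cents; }
    // user-defined operator+ chosen when the left operand is a Money
    Money operator+(const Money& other) const { return Money(cents + other.cents); }
private:
    long cents;
};

int main() {
    Money a(150), b(275);
    Money total = a + b;                  // resolved to Money::operator+
    std::cout << total.value() << "\n";   // prints 425
    int x = 1 + 2;                        // built-in operator+ used for built-in types
    (void)x;
    return 0;
}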
D. Templates

The C++ language supports a mechanism that allows programmers to define completely general functions or classes, based on hypothetical arguments or other entities. These general functions or classes become concrete code once their definitions are applied to real entities. The general definitions of functions or classes are called templates; the concrete implementations, instantiations. Templates automate the implementation of abstract algorithms that can operate on multiple data types. Taking Stack as an example, a different data class can be supplied for each instantiation of the Stack class without manually changing the Stack class definition. Instead, the compiler generates code specific to each data type listed as a specific class. For example, if there is a method that must be defined differently to print each different data type stored in the Stack class, this is done automatically by the template facility. Templates are valuable in promoting code reuse, since different stack code can be generated for different data types from a single copy of the algorithm.
Templates are satisfactory for constructing distinctly different stacks. But suppose a single stack that held data of different data types were required. Because C++ templates generate code specific to each data type statically (at compile time), the data type of the items stored in the stack must remain fixed during program execution. Different code is generated for different stacks, each able to hold only one data type.
The Standard Template Library (STL) represents a breakthrough in C++ programming methodology. Comprising a set of C++ generic data structures and algorithms, the STL provides reusable, interchangeable
components adaptable to many different uses without sacrificing efficiency. Adopted by the ANSI/ISO C++ Standards Committee, the STL is an important addition to every C++ programmer's portfolio of skills. The STL is a general-purpose library consisting of containers, generic algorithms, iterators, function objects, allocators, and adaptors. The data structures used in the algorithms are abstract in the sense that the algorithms can be used on (practically) any data type; the algorithms can work on these abstract data types because they are template-based algorithms.
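A brief sketch of a class template in the spirit of the Stack discussion, together with one STL container/algorithm pair; the Stack template shown is an illustrative toy, not the STL's own implementation.

#include <algorithm>
#include <iostream>
#include <vector>

// The compiler generates separate code for Stack<int>, Stack<double>, etc.
template <class T>
class Stack {
public:
    void push(const T& v) { items.push_back(v); }
    T pop() { T v = items.back(); items.pop_back(); return v; }
    bool empty() const { return items.empty(); }
private:
    std::vector<T> items;
};

int main() {
    Stack<int> si;
    si.push(3);
    si.push(7);
    std::cout << si.pop() << "\n";      // prints 7

    // STL: a generic algorithm applied to a container through iterators
    std::vector<int> v;
    v.push_back(4); v.push_back(1); v.push_back(9);
    std::sort(v.begin(), v.end());
    std::cout << v.front() << "\n";     // prints 1
    return 0;
}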
E. Exceptions

Improved error recovery is one of the most powerful ways a programmer can increase the robustness of code. If several function calls can be covered by a single catch, the amount of error-handling code that must be written is greatly reduced. Unfortunately, it is almost accepted practice to ignore error conditions, as if programmers were in a state of denial about errors. Some of the reason is, no doubt, the tediousness and code bloat of checking for many errors. For example, printf() returns the number of characters that were successfully printed, but virtually no one checks this value. The proliferation of checking code alone would be disgusting, not to mention the difficulty it would add in reading the code. The problem with C's approach to error handling can be thought of as one of coupling—the user of a function must tie the error-handling code so closely to that function that it becomes too ungainly and awkward to use. One of the major features of C++ is exception handling, which is a better way of thinking about and handling errors. With exception handling:

1. Error-handling code is not nearly so tedious to write, and it does not become mixed up with the "normal" code. The programmer writes the code he wants to happen; later, in a separate section, he writes the code to cope with the problems. If he makes multiple calls to a function, he handles the errors from that function once, in one place.

2. Errors cannot be ignored. If a function needs to send an error message to its caller, it "throws" an object representing that error out of the function. If the caller doesn't "catch" the error and handle it, it goes to the next enclosing scope, and so on until some code block catches the error.
There are two basic models in exception-handling theory. In termination (which is what C++ supports), the error is assumed to be so critical that there is no way to return to the point where the exception occurred. Whoever threw the exception decided there was no way to salvage the situation and does not want to come back. The alternative is called resumption. It means the exception handler is expected to do something to rectify the situation, after which the faulting code is retried, presuming success the second time. If resumption is wanted, that is, if the programmer hopes to continue execution after the exception is handled, the "exception" is really more like a function call, and that is how such situations should be set up in C++ (don't throw an exception; call a function that fixes the problem). Alternatively, the programmer can place the try block inside a while loop that keeps re-entering the try block until the result is satisfactory. Historically, programmers using operating systems that supported resumptive exception handling eventually ended up using termination-like code and skipping resumption. So although resumption sounds attractive at first, it is not so useful in practice. One reason may be the distance that can occur between the exception and its handler; it is one thing to terminate to a handler that is far away, but to jump to that handler and then back again may be too conceptually difficult for large systems where exceptions can be generated from many points.
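A short sketch of the termination model and of the while-loop retry idiom mentioned above; the parse_positive() function and the choice of std::runtime_error are illustrative assumptions.

#include <iostream>
#include <stdexcept>

int parse_positive(int raw) {
    if (raw <= 0)
        throw std::runtime_error("value must be positive");  // thrown object leaves the function
    return raw;
}

int main() {
    // one try block covers several calls; the errors are handled once, in one place
    try {
        int a = parse_positive(5);
        int b = parse_positive(-2);   // throws; control jumps to the catch clause
        std::cout << a + b << "\n";
    } catch (const std::runtime_error& e) {
        std::cout << "error: " << e.what() << "\n";
    }

    // resumption-like behavior: re-enter the try block until the result is satisfactory
    int value = -3;
    bool ok = false;
    while (!ok) {
        try {
            parse_positive(value);
            ok = true;
        } catch (const std::runtime_error&) {
            ++value;                  // "fix the problem", then retry
        }
    }
    return 0;
}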
F. Use of C++

C++ is used by hundreds of thousands of programmers in essentially every application domain. This use is supported by about a dozen independent implementations, hundreds of libraries, hundreds of textbooks, several technical journals, many conferences, and innumerable consultants. Training and education at a variety of levels are widely available. Early applications tended to have a strong systems programming flavor. For example, several major operating systems have been written in C++, and many more have key parts done in C++. C++ was designed so that every language feature is usable in code under severe time and space constraints. This allows C++ to be used for device drivers and other software that relies on direct manipulation of hardware under real-time constraints. In such code, predictability of performance is at least as important as raw speed; often, so is compactness of the resulting system. Most applications have sections of code that are critical for acceptable performance.
However, the largest amount of code is not in such sections. For most code, maintainability, ease of extension, and ease of testing are key. C++'s support for these concerns has led to its widespread use where reliability is a must and in areas where requirements change significantly over time. Examples are banking, trading, insurance, telecommunications, and military applications. For years, the central control of the United States long-distance telephone system has relied on C++, and every 800 call (that is, a call paid for by the called party) has been routed by a C++ program. Many such applications are large and long-lived. As a result, stability, compatibility, and scalability have been constant concerns in the development of C++. Million-line C++ programs are not uncommon.
Like C, C++ was not specifically designed with numerical computation in mind. However, much numerical, scientific, and engineering computation is done in C++. A major reason for this is that traditional numerical work must often be combined with graphics and with computations relying on data structures that don't fit into the traditional FORTRAN mold. Graphics and user interfaces are areas in which C++ is heavily used. All of this points to what may be C++'s greatest strength: its ability to be used effectively for applications that require work in a variety of application areas. It is quite common to find an application that involves local and wide-area networking, numerics, graphics, user interaction, and database access. Traditionally, such application areas have been considered distinct, and they have most often been served by distinct technical communities using a variety of programming languages. However, C++ has been widely used in all of those areas. Furthermore, it is able to coexist with code fragments and programs written in other languages.
C++ is widely used for teaching and research. This has surprised some who, correctly, point out that C++ isn't the smallest or cleanest language ever designed. However, C++ is

• Clean enough for successful teaching of basic concepts
• Realistic, efficient, and flexible enough for demanding projects
• Available enough for organizations and collaborations relying on diverse development and execution environments
• Comprehensive enough to be a vehicle for teaching advanced concepts and techniques
• Commercial enough to be a vehicle for putting what is learned into nonacademic use
There are many C++ compilers. The following is a list of popular C++ compilers:

• GNU C++ compiler
• Microsoft Visual C++
• Borland C++
• PARCompiler C and C++
• The CC++ Programming Language—a parallel programming language based on C++
• Watcom C/C++ Compiler
• pC++/Sage—a portable parallel C++ for high-performance computers
VI. CONCLUSION

The designers of C++ wanted to add object-oriented mechanisms without compromising the efficiency and simplicity that made C so popular. One of the driving principles for the language designers was to hide complexity from the programmer, allowing the programmer to concentrate on the problem at hand. Because C++ retains C as a subset, it gains many of the attractive features of the C language, such as efficiency, closeness to the machine, and a variety of built-in types. A number of new features were added to C++ to make the language even more robust, many of which are not used by novice programmers. Most of these features can be summarized by two important design goals: strong compiler type checking and a user-extensible language. By enforcing stricter type checking, the C++ compiler makes programmers acutely aware of the data types in their expressions. Stronger type checking is provided through several mechanisms, including function argument type checking, conversions, and a few other features. C++ also enables programmers to incorporate new types into the language through the use of classes. A class is a user-defined type, and the compiler can treat new types as if they were built-in types. This is a very powerful feature. In addition, the class provides the mechanism for data abstraction and encapsulation, which are a key to OOP.
C++ is a very useful and popular programming language. However, there are still some criticisms. For example, the struct type constructor in C++ is a redundant feature once the class concept is introduced, and, worse, C++ sets up different accessibility rules for structures and classes. Struct exists in C++ only as a compatibility mechanism with C.
A struct is the same as a class whose components are public by default. The struct keyword could not be eliminated without losing compatibility, but the class keyword is unnecessary. Because C is a subset of C++, C programmers can immediately use C++ to write and compile C programs; however, this does not take advantage of OOP. Many see this as a strength, but it is often stated that the C base is C++'s greatest weakness. C++ also adds its own layers of complexity, such as its handling of multiple inheritance and overloading. Java has shown that, by removing C constructs that do not fit with object-oriented concepts, C can provide an acceptable, albeit not perfect, base. One of the stated advantages of C++ is that programmers get free and easy access to machine-level details. This comes with a downside: if programmers make a great deal of use of low-level coding, their programs will not be economically portable. Java has removed all of this from C++, and one of Java's great strengths is its portability between systems, even without recompilation.
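A two-line sketch of the struct/class equivalence noted above; the type names are illustrative.

struct PointS { int x; int y; };   // members are public by default
class  PointC { int x; int y; };   // members are private by default

// Writing "class PointC { public: int x; int y; };" would make PointC
// equivalent to PointS; the struct keyword survives mainly for C compatibility.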
SEE ALSO THE FOLLOWING ARTICLES

COBOL • Fortran • Java • Pascal • Perl • Programming Languages Classification • Simulation Languages • Visual Basic • XML
BIBLIOGRAPHY

Barton, J., and Nackman, L. (1994). Scientific and Engineering C++. Reading, MA: Addison-Wesley.
Brokken, F. (1994). C++ Annotations. University of Groningen.
Deitel & Deitel. (2001). C How to Program. Englewood Cliffs, NJ: Prentice Hall.
Deitel & Deitel. (2001). C++ How to Program. Englewood Cliffs, NJ: Prentice Hall.
Ellis, M., and Stroustrup, B. (1990). The Annotated C++ Reference Manual. Reading, MA: Addison-Wesley.
Joyner, I. (1999). Objects Unencapsulated: Eiffel, Java and C++? Englewood Cliffs, NJ: Prentice Hall.
Kernighan, B., and Ritchie, D. (1978). The C Programming Language. Englewood Cliffs, NJ: Prentice Hall.
Ritchie, D. (1993). The Development of the C Language. Second History of Programming Languages Conference, Cambridge, MA.
Stroustrup, B. (1994). The Design and Evolution of C++. Reading, MA: Addison-Wesley.
Stroustrup, B. (1997). The C++ Programming Language. Reading, MA: Addison-Wesley.
COBOL

Mohammed B. Khan
California State University, Long Beach
I. THE DEVELOPMENT OF COBOL
II. COBOL'S DISTINGUISHING FEATURES
III. STRUCTURAL ORGANIZATION OF COBOL
IV. DATA ORGANIZATION FOR COBOL PROGRAMS
V. EDITING FEATURES OF COBOL
VI. COMPUTING IN COBOL
VII. COBOL'S INTERFACE WITH SQL
VIII. FUTURE OF COBOL
GLOSSARY

alphabetic data field A data field used to store alphabetic characters.
CODASYL The name of the organization that developed the first version of COBOL in 1960.
data field A single item contained within a record.
documentation Written statements that explain a program to humans.
elementary data field A data field that is not divided into subordinate components.
file An organized accumulation of related data.
group data field A data field that is divided into subordinate components.
machine independent language A computer programming language that can be executed on many different computers with little or no modifications.
numeric data field A data field used to store numbers.
program maintenance The activity of modifying preexisting programs.
record A group of data items pertaining to a single entity.

COBOL (COMMON BUSINESS-ORIENTED LANGUAGE)
was once the dominant programming language for business applications. Many distinguishing features characterize the language, among them structured modularity, efficient input-output capability, English-like sentences, and a self-documenting style. Though the language has undergone several revisions, its popularity has declined in recent years. Many legacy systems written in COBOL are still in use. These systems are primarily run on mainframe
computers. However, COBOL compilers have been written for smaller computers, including personal computers. COBOL programs interface gracefully with Structured Query Language (SQL) statements. Graphical user interfaces (GUI) have been implemented in COBOL, making the language more acceptable in today’s visual environment.
I. THE DEVELOPMENT OF COBOL

COBOL is one of the oldest high-level programming languages. Its development began in 1959, when a group of computer professionals from various organizations—government, education, business, and others—agreed to hold a series of meetings to design a uniform, machine-independent computer programming language for business use. This group met under the name Conference on Data Systems Languages, or CODASYL. CODASYL developed the first version of COBOL, which was called COBOL-60 because it was released in 1960. Three revisions followed: COBOL-68, COBOL-74, and, most recently, COBOL-85. Another revision of COBOL (COBOL-9X) is under development at this time. Over the years, the American National Standards Institute (ANSI), an organization that adopts standards in a wide variety of fields, has adopted CODASYL standards for COBOL. Therefore, COBOL-68, COBOL-74, and COBOL-85 are often referred to as ANSI-68, ANSI-74, and ANSI-85, respectively. Each of these standards added new features that made it better and easier to use than the previous version.
Although more and more organizations are adopting COBOL-85, many organizations still use COBOL-74. Many existing COBOL programs that businesses use are COBOL-74 programs. In fact, there are more COBOL-74 programs in the business world than there are COBOL-85 programs.
II. COBOL’S DISTINGUISHING FEATURES The original intent for the development of COBOL was to introduce a computer programming language that would have three major characteristics: • It would be machine independent, meaning that COBOL programs could be executed on many different types of computers with little or no modification. For example, a COBOL program that was written to run on a VAX computer can be executed on an IBM computer with only minor modifications. • It would be easy to maintain. Program maintenance is the process of making modifications to existing programs. In industry, the largest part of a programmer’s time is spent modifying existing programs. Therefore, a program that is easy to maintain can save a company considerable time and money. • It would be self-documenting. Documentation consists of explanations telling users how a computer program works or what the program is doing at a particular point. When a program is self-documenting, the programming language instructions, or code, contain English-like words and phrases. COBOL has additional features that distinguish it from most other computer programming languages. Some of them are
COBOL outputs it in the desired form. Because there is likely to be a large number of customers, the program must input and output the data efficiently. In addition, it must calculate each bill using only addition and subtraction. COBOL easily and efficiently performs these operations. • The data in COBOL programs are organized into records. Each record contains a collection of data. For example, each record used for the department store charge accounts program might contain one customer’s account number, name, address, balance owed, and purchases made during the current month. • COBOL programs can process several types of files—sequential, indexed sequential, and relative. A file consists of a group of records. • A COBOL program can read and write massive amounts of data using only a few statements. These are referred to as input/output (I/0) operations. The COBOL language performs input/output operations efficiently.
III. STRUCTURAL ORGANIZATION OF COBOL The structure of a COBOL program is unique in that it consists of divisions and sections. Within a division and a section, a program contains statements that perform certain specific functions. COBOL programs are typed in an 80-column line format. The leftmost column is column 1, the rightmost column is column 80. Of the 80 columns, only 66 columns (columns 7–72) are actually used for the program itself. A unique feature of COBOL programs is that certain parts of the program must start in certain columns of the 80-column line. A sample COBOL program is presented in Table I.
A. Column 1 through 6: Sequence Area • COBOL programs are uniquely organized. Each program is divided into four standard parts called divisions. These divisions will be discussed in detail later. • COBOL is well suited for commercial data processing. Most of the data processing performed in business requires two major types of tasks: performing simple calculations, and inputting and outputting large quantities of data. For example, consider the tasks that a program performs when billing department store chargeaccount customers. The program reads the needed data, calculates the bill, and then
Columns 1 through 6 are known as the sequence area. These columns are used to provide program statement numbers. A computer program consists of statements that tell the computer the steps necessary to solve a specific problem. These statements must be arranged in a particular sequence. Normally, the computer follows these statements in the order they appear, one after another, unless specifically directed otherwise. For this reason, each statement in a program is given a sequence number. The first statement has the sequence number 1, the second statement has the sequence number 2, etc. Having six columns
Table I    Sample COBOL Program
IDENTIFICATION DIVISION.
* This division identifies the program and provides general information.
PROGRAM-ID. SAMPLE.
AUTHOR. M. B. KHAN.
DATE-WRITTEN. JANUARY 5, 1996.
DATE-COMPILED. JANUARY 5, 1996.
*REMARKS. THIS PROGRAM CALCULATES GROSS PAY OF EMPLOYEES
*    BASED ON HOURLY RATES AND NUMBER OF HOURS WORKED.
*    OVERTIME PAY IS CALCULATED AT THE RATE OF TIME AND A
*    HALF FOR HOURS WORKED OVER 40. THE PROGRAM PRINTS
*    EMPLOYEE NAME, HOURLY RATE, HOURS WORKED, AND GROSS
*    WEEKLY PAY. SPECIFICALLY, THE PROGRAM
*    (1) READS NAME, HOURLY RATE, AND HOURS WORKED FOR AN
*        EMPLOYEE
*    (2) CALCULATES REGULAR OR OVERTIME PAY, AS APPROPRIATE
*    (3) PRINTS NAME, HOURLY RATE, HOURS WORKED, AND GROSS
*        PAY FOR THE EMPLOYEE

ENVIRONMENT DIVISION.
* This division describes the computing environment (the specific
* computers that will be used for the program) and the interface
* with hardware devices.
CONFIGURATION SECTION.
SOURCE-COMPUTER. IBM3090.
OBJECT-COMPUTER. IBM3090.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT EMPLOYEE-FILE-IN ASSIGN TO "PROG31I.DAT".
    SELECT EMPLOYEE-FILE-OUT ASSIGN TO "PROG310.DAT".

DATA DIVISION.
* This division describes all data needed by the program.
FILE SECTION.
* THE FOLLOWING STATEMENTS DEFINE THE INPUT FILE
FD  EMPLOYEE-FILE-IN
    LABEL RECORDS STANDARD.
01  EMPLOYEE-RECORD-IN.
    05  EMPLOYEE-NAME-IN   PIC X(20).
    05  EMPLOYEE-RATE-IN   PIC 9(3)V99.
    05  EMPLOYEE-HOURS-IN  PIC 9(3)V99.
* THE FOLLOWING STATEMENTS DEFINE THE OUTPUT FILE
FD  EMPLOYEE-FILE-OUT
    LABEL RECORDS OMITTED.
01  EMPLOYEE-RECORD-OUT.
    05  EMPLOYEE-NAME-OUT   PIC X(20).
    05                      PIC X(5).
    05  EMPLOYEE-RATE-OUT   PIC 9(3).99.
    05                      PIC X(5).
    05  EMPLOYEE-HOURS-OUT  PIC 9(3).99.
    05                      PIC X(5).
    05  EMPLOYEE-PAY-OUT    PIC 9(5).99.
WORKING-STORAGE SECTION.
01  W1-CONTROL-FIELDS.
    05  W1-END-OF-FILE      PIC X        VALUE "N".
01  W2-WORKING-FIELDS.
    05  W2-REGULAR-HOURS    PIC 9(2)     VALUE 40.
    05  W2-REGULAR-PAY      PIC 9(5)V99  VALUE 0.
    05  W2-OVERTIME-FACTOR  PIC 9V9      VALUE 1.5.
    05  W2-OVERTIME-HOURS   PIC 9(3)V99  VALUE 0.
    05  W2-OVERTIME-RATE    PIC 9(3)V99  VALUE 0.
    05  W2-OVERTIME-PAY     PIC 9(5)V99  VALUE 0.
    05  W2-TOTAL-PAY        PIC 9(5)V99  VALUE 0.

PROCEDURE DIVISION.
* This division describes the step-by-step instructions for solving the problem.
* THE FOLLOWING IS THE MAIN PARAGRAPH. FROM THIS PART, THE
* PROGRAM TRANSFERS TO OTHER PARTS FOR SPECIFIC FUNCTIONS
A000-CONTROL-LOGIC.
    PERFORM B100-INITIALIZE-PROCESSING.
    PERFORM B200-PROCESS-DATA
        UNTIL W1-END-OF-FILE EQUAL TO "Y".
    PERFORM B300-TERMINATE-PROCESSING.
    STOP RUN.
* THE FOLLOWING PARAGRAPH OPENS THE INPUT AND THE
* OUTPUT FILE AND READS THE FIRST INPUT RECORD
B100-INITIALIZE-PROCESSING.
    OPEN INPUT  EMPLOYEE-FILE-IN
         OUTPUT EMPLOYEE-FILE-OUT.
    PERFORM X100-READ-DATA.
B100-EXIT.
    EXIT.
* THE MAIN PROCESSING IS CONTROLLED IN THE NEXT PARAGRAPH.
* THE PROGRAM IS TRANSFERRED FROM HERE TO OTHER PARTS FOR
* APPROPRIATE PROCESSING
B200-PROCESS-DATA.
    PERFORM C100-CALC-REGULAR-PAY.
    IF EMPLOYEE-HOURS-IN IS GREATER THAN W2-REGULAR-HOURS
        PERFORM C200-CALC-OVERTIME-PAY
    ELSE
        MOVE W2-REGULAR-PAY TO W2-TOTAL-PAY
    END-IF.
    PERFORM X200-WRITE-DATA.
    PERFORM X100-READ-DATA.
B200-EXIT.
    EXIT.
* INPUT AND OUTPUT FILES ARE CLOSED IN THE NEXT PARAGRAPH
B300-TERMINATE-PROCESSING.
    CLOSE EMPLOYEE-FILE-IN
          EMPLOYEE-FILE-OUT.
B300-EXIT.
    EXIT.
* THE FOLLOWING PARAGRAPH CALCULATES REGULAR PAY
C100-CALC-REGULAR-PAY.
    IF EMPLOYEE-HOURS-IN IS GREATER THAN W2-REGULAR-HOURS
        MULTIPLY EMPLOYEE-RATE-IN BY W2-REGULAR-HOURS
            GIVING W2-REGULAR-PAY
    ELSE
        MULTIPLY EMPLOYEE-RATE-IN BY EMPLOYEE-HOURS-IN
            GIVING W2-REGULAR-PAY
    END-IF.
C100-EXIT.
    EXIT.
* THE FOLLOWING PARAGRAPH CALCULATES OVERTIME PAY
C200-CALC-OVERTIME-PAY.
    SUBTRACT W2-REGULAR-HOURS FROM EMPLOYEE-HOURS-IN
        GIVING W2-OVERTIME-HOURS.
    MULTIPLY W2-OVERTIME-FACTOR BY EMPLOYEE-RATE-IN
        GIVING W2-OVERTIME-RATE.
    MULTIPLY W2-OVERTIME-HOURS BY W2-OVERTIME-RATE
        GIVING W2-OVERTIME-PAY.
    ADD W2-OVERTIME-PAY W2-REGULAR-PAY GIVING W2-TOTAL-PAY.
C200-EXIT.
    EXIT.
* THE INPUT FILE IS READ IN THE FOLLOWING PARAGRAPH
X100-READ-DATA.
    READ EMPLOYEE-FILE-IN
        AT END MOVE "Y" TO W1-END-OF-FILE
    END-READ.
X100-EXIT.
    EXIT.
* THE NEXT PARAGRAPH MOVES DATA VALUES TO OUTPUT DATA
* FIELDS AND WRITES THE OUTPUT DATA
X200-WRITE-DATA.
    MOVE SPACES TO EMPLOYEE-RECORD-OUT.
    MOVE EMPLOYEE-NAME-IN TO EMPLOYEE-NAME-OUT.
    MOVE EMPLOYEE-RATE-IN TO EMPLOYEE-RATE-OUT.
    MOVE EMPLOYEE-HOURS-IN TO EMPLOYEE-HOURS-OUT.
    MOVE W2-TOTAL-PAY TO EMPLOYEE-PAY-OUT.
    WRITE EMPLOYEE-RECORD-OUT.
X200-EXIT.
    EXIT.
means that one could have hundreds of thousands of statements in a program if necessary. The sequence numbers in columns 1 through 6 are a holdover from the days when programs were punched on cards. If the cards were dropped, they could easily be reordered by using these numbers. Today, programmers type their programs directly into the computer and sequence numbers are generated automatically by compilers. Therefore, sequence numbers have little use, and the programmer generally leaves these columns blank. A few sequence numbers have been provided in Table I to show where they would appear.
B. Column 7: Indicator Area

Column 7 is called the indicator area. This column is used for special purposes: a specific entry in this column has a special meaning. For example, an asterisk (*) in this column indicates a comment statement. A slash (/) makes the computer go to the top of a new page before printing that line. A hyphen (-) indicates that the line is a continuation of the preceding line.

C. Columns 8 through 72: Area A and Area B

Columns 8 through 72 are known as the program areas. Columns 8 through 11 are known as Area A, and columns 12 through 72 are known as Area B. These two areas represent the columns in which the COBOL program itself appears. The first column of Area A (column 8) is called the "A margin," and the first column of Area B (column 12) is called the "B margin." Some parts of a COBOL program must start in Area A, while others must start in Area B.

D. Columns 73 through 80: Identification Area

Columns 73 through 80 are designated as the identification area. Program identification codes were useful when programs were punched on cards, in preventing the cards of different programs from getting mixed together. The use of this area is optional, and the COBOL compiler ignores whatever is typed in it. For this reason, care should be taken not to type any significant part of a program statement in this area. These columns have not been filled in the sample program. It is also important to note that sentences in the sample program end with a period; the period is an essential element of COBOL syntax.
E. Hierarchical Organization and Divisions in COBOL The structural organization of a COBOL program is perhaps its most striking feature. The program follows a hierarchical organization. The basic structural unit of a program is the division. As previously mentioned, every COBOL program is divided into four divisions: the IDENTIFICATION DIVISION, the ENVIRONMENT DIVISION, the DATA DIVISION, and the PROCEDURE DIVISION. These divisions must be present in a specific sequence. The IDENTIFICATION DIVISION is the first division in a COBOL program. Its purpose is to identify the program and its author and to provide general information about the program, such as the dates the program is written and compiled, any program security, etc. A narrative description of the program is usually included in this division. The ENVIRONMENT DIVISION is the next division that appears in a COBOL program. The ENVIRONMENT DIVISION describes the “computing environ-
ment" of the program. The "computing environment" refers to the type of computer hardware on which the program is written and run. This division also briefly describes the data files required by the program.
The DATA DIVISION is the third division of a program. All data that are part of input and output files, as well as other data, are described in this division. Consequently, the DATA DIVISION is one of the most complicated and lengthy parts of a COBOL program. This division bears some relationship to the ENVIRONMENT DIVISION in that the files specified earlier are further described in the DATA DIVISION. Data are defined in COBOL in a specific way.
The PROCEDURE DIVISION contains the step-by-step instructions that are necessary to convert the input data to the desired output data. The instructions that perform a specific function are grouped together under a paragraph.
The divisions are divided into sections, which are again divided into paragraphs. Each paragraph is composed of several entries/clauses or sentences/statements. The IDENTIFICATION DIVISION is divided into several paragraphs. There are four paragraphs in this division in Table I. However, only the PROGRAM-ID paragraph is required. The ENVIRONMENT DIVISION is divided into sections. There are two sections in the sample program of Table I. These sections are the CONFIGURATION SECTION and the INPUT-OUTPUT SECTION, which are again divided into paragraphs. Paragraphs have either entries or clauses associated with them. Both entries and clauses perform the same function; they describe the paragraphs with which they are associated. Neither the division name nor the name of the first section is required. The DATA DIVISION is divided into two sections. These sections have several entries that are used to describe the data used by the program. The first section is the FILE SECTION; the other is the WORKING-STORAGE SECTION. The PROCEDURE DIVISION can be divided into sections or paragraphs. Each section or paragraph consists of one or more sentences or statements. A sentence is a collection of one or more statements. A statement is a valid combination of words and characters according to the rules of COBOL syntax. Most COBOL statements start with regular English verbs. There are eight paragraphs in the PROCEDURE DIVISION in Table I.
IV. DATA ORGANIZATION FOR COBOL PROGRAMS The input and the output data for a COBOL program are organized in the form of a file. A file is an organized accumulation of related data. Input data are contained in an input file; output data are contained in an output file. Any number of input and output files can be used in a COBOL program. A file is divided into records, and records are, in turn, subdivided into data fields.
A. Files, Records, and Data Fields A computer does not read all the data from a file at one time. Data are read in segments. A file is divided into many segments of data; each segment is called a record. Each record pertains to a single entity such as an individual person or item. For example, a school may have a computer file that has the data pertaining to all students. In this file, each record contains all the data of a single student. If there are 5000 students in the school, this file will have 5000 records. When instructed to read data from an input file, the computer reads one record at a time. Similarly, when instructed to write data to a file, the computer writes one record at a time. The record of an input file is called an input record; the record of an output file is called an output record. Just as a file is divided into records, a record is divided into smaller segments, called data fields, or simply fields. An input or output record may be divided into any number of data fields; the number of data fields depends on the specific problem being solved by the computer. In COBOL, data fields can be of three different types: numeric, alphabetic, and alphanumeric. The data fields that represent only the digits 0 through 9 are called numeric. Examples of data that might appear in numeric data fields are age, social security number (not including the hyphens), price, sales, quantity, and number of employees. The data fields that represent alphabetic characters and spaces only are called alphabetic. Examples are names of individuals, and names of cities, states, or countries. Alphanumeric data fields can represent both numeric and nonnumeric data. Alphanumeric data include letters of the alphabet and special characters such as hyphens, parentheses, blanks, etc. Examples are telephone numbers (including the parentheses and hyphens), social security numbers (including the
hyphens), dates of birth (including the hyphens), and addresses. Although most names contain letters of the alphabet only, there are names that consist of letters and special characters, such as the apostrophe in the name LINDA O'DONNELL. For this reason, most COBOL programs store names as alphanumeric data.
A data field that is subdivided into subordinate components is called a group data field; a data field not further divided into components is called an elementary data field. For example, the data field "date of birth" can be subdivided into three subordinate components: "day of birth," "month of birth," and "year of birth." In this case, date of birth is a group data field; its components are elementary data fields. In a similar manner, the data field "mailing address" is a group data field. Its components, "residence number," "street name," "city name," "state name," and "zip code," are elementary data fields. A group data field may consist of a combination of numeric and nonnumeric elementary data fields, as in the case of the "mailing address" data field. It is important to be able to distinguish between a data field name and the data value it contains. A data field may be thought of as the name of a container, and the data value as the actual contents of the container.
COBOL offers considerable latitude in the naming of files, records, and data fields. However, certain rules must be observed.

Rules for Choosing Names of Files, Records, and Data Fields
1. A name must be constructed from the letters of the alphabet, the digits 0 through 9, and the hyphen. No other characters are allowed.
2. A name must not be longer than 30 characters.
3. A name must not begin or end with a hyphen.
4. A name must contain at least one letter of the alphabet. Digits only and digits in combination with hyphens are not allowed.
5. A name must not be a reserved COBOL word. Reserved COBOL words have preassigned meanings.
The names of files, records, and data fields should be selected carefully so that these names are self-explanatory. Usually, programmers select names that consist of English words joined by hyphens. Thus, a file for students may be named STUDENT-FILE; a file for the master inventory of a company may be named INVENTORY-MASTER-FILE. A record in the STUDENT-FILE may be named STUDENT-RECORD; the data fields of this record may be named STUDENT-NAME, STUDENT-ADDRESS, STUDENT-
PHONE-NUMBER, etc. Based on the five rules for naming files, records, and data fields, the following are all valid names:

INPUT-EMPLOYEE-FILE
EMPLOYEE-RECORD
EMPLOYEE-ADDRESS-1
CUSTOMER-FILE-IN
CUSTOMER-RECORD-I
CUSTOMER-BALANCE
FILE-123

However, the following names are invalid:

EMPLOYEE FILE
ACCOUNTS-RECEIVABLES-MASTER-FILE
INVENTORY:RECORD
123-FILE
B. Structural Description of a Record In COBOL, records and data fields are described in a special way. A clear understanding of the structural description of a record is essential for writing COBOL programs.
1. Levels

Records and data fields are described in COBOL using levels. Levels are designated by the numbers 01 through 49. All records must have level number 01, and their data fields must have level numbers 02 through 49. A common practice is to use level numbers in increments of five—5, 10, 15, and so on—instead of increments of one. Though both practices are allowed in COBOL, levels are usually numbered 05, 10, 15, and so on. The 05 level number indicates the highest level data field. This data field may be subdivided into smaller data fields whose level numbers are 10. These data fields with level number 10 may be further divided into smaller data fields whose levels are 15, and so on. In this way, level numbers 01 through 49 can be used in a COBOL program for describing records and data fields.
An example will make this clearer. Suppose STUDENT-RECORD is the name of a record. This record consists of five data fields: STUDENT-NAME, STUDENT-ADDRESS, STUDENT-DOB, STUDENT-CLASS (for student classification), and STUDENT-MAJOR. This record and its five data fields can be described using the structure given below. The level number 01 for records may start anywhere in Area A but usually starts in column 8.
Similarly, the level numbers for data fields may start anywhere in Area B, though the 05 level usually starts in column 12 and each deeper level starts four columns further to the right. The record name starts in column 12 (the B margin); the data field name starts in column 16. It is customary to leave two blank spaces between the level number and the name of the record or data field.

01  STUDENT-RECORD
    05  STUDENT-NAME
    05  STUDENT-ADDRESS
    05  STUDENT-DOB
    05  STUDENT-CLASS
    05  STUDENT-MAJOR

The data field STUDENT-DOB can be divided into three smaller data fields: STUDENT-BIRTH-DAY, STUDENT-BIRTH-MONTH, and STUDENT-BIRTH-YEAR. If this is done, the structure of the record and the data fields will appear as shown below.
01  STUDENT-RECORD
    05  STUDENT-NAME
    05  STUDENT-ADDRESS
    05  STUDENT-DOB
        10  STUDENT-BIRTH-DAY
        10  STUDENT-BIRTH-MONTH
        10  STUDENT-BIRTH-YEAR
    05  STUDENT-CLASS
    05  STUDENT-MAJOR

The only restriction is that each subordinate data field must have a higher level number than the level number of the data field to which it is subordinate. As each data field is subdivided, each new level of data fields is given a higher number. Level 77 and level 88 are two other levels of data fields used in COBOL programs, though level 88 is much more common than level 77.

2. PICTURE Clause

The description of a record and its data fields is not complete without understanding and using the PIC (short for PICTURE) clause. Each elementary data field must be defined by a PIC clause that provides information as to its type (numeric, alphabetic, or alphanumeric) and its size. A group data field's length is defined by the PIC clauses of its subordinate data fields, but a group data field is always considered to be alphanumeric. A specific picture character designates a specific type of data. The picture character 9 designates numeric data; the picture character A designates alphabetic data; and the picture character X designates alphanumeric or nonnumeric data. The picture character A is rarely used in COBOL programs; in practice, the picture character 9 is used for numeric data and the picture character X is used for nonnumeric data. The PIC clause is written at the end of the data field (leaving at least one blank space after the data field name) with the word PIC followed by at least one blank space and the appropriate picture character (X or 9 or A). If a data field requires more than one picture character, the number of characters to which the picture character applies is enclosed in parentheses following the picture character. The PIC clauses of a student record can be written as follows:

01  STUDENT-RECORD.
    05  STUDENT-NAME            PIC X(20).
    05  STUDENT-ADDRESS         PIC X(30).
    05  STUDENT-DOB.
        10  STUDENT-BIRTH-DAY   PIC 99.
        10  STUDENT-BIRTH-MONTH PIC 99.
        10  STUDENT-BIRTH-YEAR  PIC 9999.
    05  STUDENT-CLASS           PIC X(10).
    05  STUDENT-MAJOR           PIC X(15).

According to these PIC clauses, the data field STUDENT-NAME can contain up to 20 characters of nonnumeric data, indicated by the number 20 in parentheses following the X. This field could also be written using 20 Xs:

    05  STUDENT-NAME  PIC XXXXXXXXXXXXXXXXXXXX.

However, the first method requires fewer keystrokes and offers fewer opportunities for error. Similarly, STUDENT-ADDRESS may contain up to 30 characters of nonnumeric data (each blank in the address counts as one of the characters). The other data fields contain data of the types and sizes shown in Table II. Each PIC character occupies one byte of computer memory. For example, STUDENT-NAME occupies 20 bytes of computer memory, and STUDENT-ADDRESS occupies 30 bytes. The entire STUDENT-RECORD occupies 83 bytes of computer memory.
A record name can have a PIC clause provided the record is not divided into component data fields. If the record is subdivided, then each elementary component data field must carry a PIC clause instead; PIC clauses can be used only with elementary data fields. The following illustrations make this clear:

01  STUDENT-RECORD  PIC X(83).

01  STUDENT-RECORD  PIC X(83).
    05  STUDENT-NAME            PIC X(20).
    05  STUDENT-ADDRESS         PIC X(30).
    05  STUDENT-DOB.
        10  STUDENT-BIRTH-DAY   PIC 99.
        10  STUDENT-BIRTH-MONTH PIC 99.
        10  STUDENT-BIRTH-YEAR  PIC 9(4).
    05  STUDENT-CLASS           PIC X(10).
    05  STUDENT-MAJOR           PIC X(15).
Table II    Data Fields, Types, and Sizes

Data field              Type             Maximum characters/digits
STUDENT-BIRTH-DAY       Numeric (a)      2
STUDENT-BIRTH-MONTH     Numeric (a)      2
STUDENT-BIRTH-YEAR      Numeric (a)      4
STUDENT-CLASS           Nonnumeric (b)   10
STUDENT-MAJOR           Nonnumeric (b)   15

(a) Assuming these data are to be described as numbers, such as 16 (for STUDENT-BIRTH-DAY), 02 (for STUDENT-BIRTH-MONTH), and 1976 (for STUDENT-BIRTH-YEAR).
(b) Assuming these data are to be described with words, such as "freshman" (for STUDENT-CLASS) and "management" (for STUDENT-MAJOR).
      10 STUDENT-BIRTH-DAY   PIC 99.
      10 STUDENT-BIRTH-MONTH PIC 99.
      10 STUDENT-BIRTH-YEAR  PIC 9(4).
   05 STUDENT-CLASS          PIC X(10).
   05 STUDENT-MAJOR          PIC X(15).

In the first example STUDENT-RECORD has a PIC clause because it is not divided into subordinate data fields. This record is defined correctly. However, in the second example, the same record is divided into subordinate data fields, and thus it should not have a PIC clause. A PIC clause must be provided for all input and output data fields as well as for all other data fields that may be required in a program. Conversely, data fields that are divided into subordinate data fields must not have a PIC clause. The following examples illustrate this point:

01 STUDENT-RECORD.
   05 STUDENT-NAME              PIC X(20).
   05 STUDENT-ADDRESS.
      10 STREET-ADDRESS.
         15 RESIDENCE-NUMBER    PIC X(5).
         15 STREET-NAME         PIC X(10).
      10 CITY-NAME              PIC X(8).
      10 STATE-NAME             PIC X(2).
      10 ZIP-CODE               PIC 9(5).

These data fields (STUDENT-RECORD, STUDENT-ADDRESS, and STREET-ADDRESS) do not have PIC clauses because they have subordinate data fields.

01 INVENTORY-RECORD.
   05 PART-NO                   PIC X(5).
   05 PART-MANUFACTURER.
      10 COUNTRY-CODE           PIC X(3).
      10 MANUFACTURER-CODE      PIC X(5).
   05 UNIT-PRICE                PIC 9(5)V99.
   05 QUANTITY-IN-STOCK         PIC 9(5).
Here the group data field PART-MANUFACTURER does not have a PIC clause because it has subordinate data fields.

It is important to know the characters that are allowed in each type of PIC clause. Spaces are valid characters in COBOL; they occupy computer memory just like other characters. In this respect, spaces are treated by COBOL differently than we would view them in the English language. The valid characters for each type of PICTURE clause are shown in Table III.

Table III   Valid Characters for Each Type of PICTURE Clause

Data field type   Valid characters
Alphabetic        A through Z, a through z, spaces
Numeric           0 through 9
Alphanumeric      A through Z, a through z, 0 through 9, all special characters, spaces

A numeric data field normally contains only digits 0 through 9. If the numeric field is input data, it cannot contain anything else, not even a decimal point. (In some systems, leading spaces are allowed.) Thus, a data value of 10.50 cannot be directly stored in an input numeric data field. How does one represent data values with decimal points in input data fields? The letter V is used to represent an implied decimal point in the PIC clause. For example, the PIC clause PIC 99V99 represents a four-digit numeric data value with two digits to the left of the decimal point and two digits to the right of the decimal point. If the data value 1050 is stored in a data field having this PIC clause, the contents of the data field would represent the value 10.50. If the same data value is stored in a data field with the PIC clause PIC 999V9, it would represent 105.0. The examples in Table IV illustrate how the letter V works in PIC clauses. Though numeric data values with decimal points are stored in data fields without their decimal points, the computer keeps track of the decimal points during arithmetic calculations. The presence of implied decimal points does not require any computer memory space. Thus, a data field defined with PIC 9(3)V99 requires five, not six, bytes of computer memory. Review the following data fields; the computer assumes a decimal point in the position marked by V:

PIC 99V9     105
PIC 999V99   23450
PIC 9V99     105
PIC V99      10

Table IV   The Letter V in PIC Clauses

Data value stored   Data field's PIC clause   Value contained
10005               PIC 999V99                100.05
9725                PIC 9V999                 9.725
124                 PIC 99V9                  12.4
567                 PIC V999                  .567

Rules for choosing PIC clauses:
1. A group data field must not have a PIC clause.
2. The word PIC can start in any column after the data field name. However, for easy readability, the word PIC for all data fields should start in the same column.
3. There must be at least one blank space between the word PIC and the picture characters.
4. If a data field requires more than one picture character, the number of times the picture character appears can be enclosed in parentheses. Thus, PIC 9999 is the same as PIC 9(4).
5. Each group data field must end with a period. Each elementary data field must also end with a period after the picture characters.
6. It is important to understand the distinction between a numeric data value contained in a data field with 9s for picture characters and one contained in a data field with Xs as picture characters. Only a data field that specifies 9 for its picture characters can contain numeric data that is to be used in a calculation.

V. EDITING FEATURES OF COBOL

COBOL provides several features by which numeric output data are edited to make them more understandable and readable to humans. Among these features are leading zero suppression, comma insertion, dollar sign insertion, sign (+ or -) insertion, character insertion, and check protection.

A. Leading Zero Suppression

Through the use of a special PIC clause, all leading zeros in output data can be suppressed. This PIC clause is represented by the letter Z (instead of 9). Table V explains the difference between PIC 9 and PIC Z clauses.

B. Comma/Dollar Sign Insertion

In output data, a comma and a dollar sign ($) can be inserted so that 1000 looks like $1,000. This is accomplished by placing $ and comma characters in the PIC clause of an output data field, as illustrated in Table VI.

C. Check Protection

COBOL supports the appearance of asterisks (*) in numeric output data—a feature that is common in checks. This feature prevents tampering with dollar amounts. A special PIC clause (PIC *) accomplishes this feature. This PIC clause can be used in combination with the dollar sign and comma. Review Table VII.

VI. COMPUTING IN COBOL

Calculations are performed in COBOL using several statements. Among them are the ADD, SUBTRACT, MULTIPLY, DIVIDE, and COMPUTE statements. While the ADD, SUBTRACT, MULTIPLY, and DIVIDE statements perform primarily one arithmetic operation in one statement, a single COMPUTE statement can perform several arithmetic operations. Examples of these statements appear in Table VIII.

VII. COBOL'S INTERFACE WITH SQL
COBOL can execute SQL statements to access relational databases. A COBOL compiler does not recognize SQL
statements. Therefore, a program that contains SQL statements must first pass through a precompiler, which translates those statements into standard COBOL code that a COBOL compiler can then process. The precompiler also verifies the validity and syntax of the SQL statements. An assortment of coding requirements and options govern coding techniques for SQL in a COBOL program:
• Delimiting SQL statements
• Declaration of the communications area for SQL, known as the SQLCA
• Declaration of tables (optional)
• Declaration and usage of host variables and structures to pass data between the database management system (DBMS) and COBOL
• Retrieving multiple rows of data using a cursor

SQL statements must be contained within EXEC SQL and END-EXEC delimiters, coded as follows:

EXEC SQL
    SQL statement(s)
END-EXEC.

These keywords, which must be coded between columns 12 and 72 (both inclusive), alert the precompiler to the presence of one or more SQL statements. A comment may be coded within the statements by placing an asterisk in column 7. The period on the END-EXEC clause is optional, but a COBOL warning may be issued if it is missing. However, the END-EXEC clause should carry no period when SQL statements are embedded within an IF . . . THEN . . . ELSE set of statements, to avoid ending the IF statement inadvertently.

All programs with embedded SQL statements need to communicate with the DBMS through the SQL communications area (SQLCA). The results of a DBMS operation are placed in the SQLCA variables. The most important of these, the return code, is placed in the SQLCODE field. It should be tested to examine the success, failure, or exception condition of the operation, as shown in the following code:

EXEC SQL
    SELECT LSTMNS, DEPT
    FROM TEMPL
    WHERE EMPID = '123456'
END-EXEC
IF SQLCODE NOT EQUAL TO 0
    process error

An abbreviated list of common SQL return codes appears in Table IX. The SQLCA is a data structure that should be copied into the program's WORKING-STORAGE SECTION through the SQL INCLUDE statement, as shown below:

EXEC SQL
    INCLUDE SQLCA
END-EXEC

Table V   Differences in Output Data Using PIC 9 and PIC Z Clauses (b represents a blank space)

Data field          PIC clause    Value in data field   Output PIC clause   Output value
QUANTITY-IN-STOCK   PIC 9(4)      0900                  PIC Z(4)            b900
DEDUCT-AMOUNT       PIC 9(2)V99   5075                  PIC Z(3).99         b50.75
ACCT-BALANCE        PIC 9(3)V99   10000                 PIC ZZ9.99          100.00
Table VI   Dollar Sign and Comma Insertion in Output Data Fields (b represents a blank space)

Data field      PIC clause    Value in data field   Output PIC clause   Output value
NET-SALARY      PIC 9(4)V99   123456                PIC $Z,ZZZ.99       $1,234.56
UNIT-PRICE      PIC 9V99      345                   PIC $Z,ZZZ.99       $bbbb3.45
INSURANCE-FEE   PIC 9(2)V99   1234                  PIC $Z,ZZZ.99       $bbb12.34
Table VII   Check Protection Feature of COBOL

Sending data field PIC clause   Data value   Output data field PIC clause   Output data
9(4)V99                         123456       $*,***.99                      $1,234.56
9(4)V99                         12345        $*,***.99                      $**123.45
9(4)V99                         1234         $*,***.99                      $***12.34
9(6)V99                         123456       $**,***.99                     $*1,234.56
The structure of the SQLCA is as follows:

01 SQLCA
   05 SQLCAID      PIC X(08)
   05 SQLCABC      PIC S9(09) USAGE COMP
   05 SQLCODE      PIC S9(09) USAGE COMP
   05 SQLERRM
      10 SQLERRML  PIC S9(04) USAGE COMP
      10 SQLERRMC  PIC X(70)
   05 SQLERRP      PIC X(08)
   05 SQLERRD      OCCURS 6 TIMES PIC S9(09) USAGE COMP
   10 SQLWARN0     PIC X(01)
   10 SQLWARN1     PIC X(01)
   10 SQLWARN2     PIC X(01)
   10 SQLWARN3     PIC X(01)
   10 SQLWARN4     PIC X(01)
   10 SQLWARN5     PIC X(01)
   10 SQLWARN6     PIC X(01)
   10 SQLWARN7     PIC X(01)
   05 SQLEXT       PIC X(08)
This statement resembles the COPY statement in that the SQL INCLUDE copies portions of code stored in a disk library into the program. The INCLUDE statement, however, copies at precompile time rather than at compile time.
VIII. FUTURE OF COBOL

At one time, COBOL was the most widely used programming language for business applications. Although new business applications are still being written in COBOL, the introduction of newer languages such as Visual Basic and Visual C++ has reduced COBOL's use. It remains to be seen how the object-oriented features introduced in COBOL will affect its future.
Table VIII   COBOL Calculation Statements

ADD data-field-a TO data-field-b
    Adds the contents of data-field-a and data-field-b and stores the result in data-field-b.

ADD data-field-a, data-field-b GIVING data-field-c
    Adds the contents of data-field-a and data-field-b and stores the result in data-field-c.

SUBTRACT data-field-a FROM data-field-b
    Subtracts the contents of data-field-a from data-field-b and stores the result in data-field-b.

SUBTRACT data-field-a FROM data-field-b GIVING data-field-c
    Subtracts the contents of data-field-a from data-field-b and stores the result in data-field-c.

MULTIPLY data-field-a BY data-field-b
    Multiplies the contents of data-field-a and data-field-b and stores the result in data-field-b.

DIVIDE data-field-a BY data-field-b
    Divides the contents of data-field-a by the contents of data-field-b and stores the result in data-field-a.

DIVIDE data-field-a INTO data-field-b
    Divides the contents of data-field-b by the contents of data-field-a and stores the result in data-field-b.

COMPUTE data-field-a = data-field-b + data-field-c * data-field-d / data-field-e
    In a COMPUTE statement, arithmetic operations are performed from left to right with the following priority order: 1. exponentiation, 2. multiplication/division, 3. addition/subtraction.
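As a worked illustration of the priority order in the COMPUTE entry above (the values are hypothetical): with data-field-b = 2, data-field-c = 6, data-field-d = 4, and data-field-e = 3, the multiplication and division are performed first, 6 * 4 / 3 = 8, and the addition last, so data-field-a receives 2 + 8 = 10.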
Table IX   Common SQL Return Codes

SQL code   Explanation
0          Successful execution
>0         Warning
<0         Error
100        Row not found for FETCH, UPDATE, or DELETE, or the result of a query is an empty table
-802       Data exception
-007       Statement contains illegal character
-101       Statement too long
-103       Invalid numeric literal
-105       Invalid string
-117       Number of insert values not the same as number of object columns
-803       Inserted or updated value invalid due to duplicate key

SEE ALSO THE FOLLOWING ARTICLES

C and C++ • Fortran • Java • Pascal • Perl • Programming Languages Classification • Simulation Languages • Visual Basic • XML

BIBLIOGRAPHY
Arnett, K. P., and Jones, M. C. (Summer 1993). Programming languages today and tomorrow. The Journal of Computer Information Systems, 77–81.
Bauman, B. M., Pierson, J. K., and Forchit, K. A. (Fall 1991). Business programming language preferences in the 1990's. The Journal of Computer Information Systems, 13–16.
Borck, J. R. (August 2000). Cobol programmers face it: Your favorite code is outdated and losing support. Infoworld, Vol. 23, Issue 2, 56.
Borck, J. R. (August 2000). Application transformation. Infoworld, Vol. 22, Issue 33, 53.
Borck, J. R. (August 2000). PERCobol whips Cobol into shape for budget-minded enterprises. Infoworld, Vol. 22, Issue 33, 54.
Buckler, G. (1990). Languages, methods all part of a COBOL evolution. Computing Canada, Vol. 16, Issue 13, 31–34.
Coffee, P. (August 2000). Cobol comes to the party. eWeek, Vol. 17, Issue 33, 53.
Garfunkel, J. (March 1992). Trends in COBOL. Enterprise Systems Journal, 120–122.
Garfunkel, J. (July 1990). COBOL—The next stage. Computerworld, Vol. 24, Issue 30, 31–34.
Khan, M. B. (1996). Structured COBOL: First course. Danvers, MA: Boyd & Fraser Publishing Company.
Lauer, J., and Graf, D. (Spring 1994). COBOL: Icon of the past or the symbol of the future? The Journal of Computer Information Systems, 67–70.
McMullen, J. (May 1991). Why PC COBOL is gaining ground. Datamation, Vol. 37, Issue 10, 70–72.
Cohesion, Coupling, and Abstraction Dale Shaffer Lander University
I. HISTORICAL PERSPECTIVE II. COHESION III. COUPLING
IV. COHESION AND COUPLING IN PERSPECTIVE V. ABSTRACTION VI. CONCLUSION
GLOSSARY
abstraction A technique that allows one to focus on a portion of an information system while ignoring other features of the system.
abstraction barrier A barrier around the implementation of a portion of an information system that is designed so that all access to the implementation is through well-defined avenues of access known as the interface.
class A set of objects with similar characteristics and behavior. It identifies the structure of the characteristics and behaviors that are exhibited by all of its instances.
encapsulation Binding a collection of data elements and modules that operate on those data elements into one package which can be invoked using one name.
functional independence The result of making modules that are of a single purpose and that avoid interaction with other modules.
implementation The portion of an information system located below the abstraction barrier that contains the programming code that implements specified behaviors.
information hiding The process of hiding detail from the user of a module or object; the information that is hidden is located below the abstraction barrier.
interface The bridge across an abstraction barrier that carefully controls access to the implementation portion.
module Any collection of programming statements that are combined together to perform a specified task. A module can also contain data, but the data is generally not maintained when program control passes from the module. Examples include a function, procedure, subroutine, subprogram, and method.
object An encapsulation of data and the methods that operate on the data. Access to the data and methods is strictly controlled through message passing and inheritance. An object is an instance of a class.
structured design A set of techniques and strategies that are used to design modules that will best solve a problem.
COMPUTER PROGRAMS are usually constructed from building blocks, most of which can be categorized as modules and objects. Each module should perform a specific process in a program, and each object should perform one or more specific processes on a specific set of data. Cohesion is a measure of the functional strength of a module or object, and coupling is a measure of its interdependence with other modules or objects. A computer program with modules and objects that exhibit a high degree of cohesion and a low degree of coupling is considered to be well designed. While cohesion and coupling are metrics used in software design, abstraction is a technique for building software systems. By employing abstraction, a software developer can focus on one portion of an information system at a time.
I. HISTORICAL PERSPECTIVE

Early computer programs followed computer architecture, with data in one block of memory and program statements in another. With larger systems and recognition of the major role of maintenance, the block of program statements was further broken down into modules, which could be developed independently. Research in cohesion and coupling has its roots in the early 1970s as part of the development of modules. Structured design formalized the process of creating modules, recognizing that better written modules were self-contained and independent of each other. This functional independence was achieved by making modules that were of a single purpose, avoided interaction with other modules, and hid implementation details. Consider the modules in Fig. 1. The combinations module calculates the number of combinations of n things taken r at a time. For example, consider a visit to a restaurant that had a salad bar where you were allowed to choose any three of the six meat and vegetable selections to include on your lettuce salad. The combinations module would determine that there were twenty different ways you could select three of the six items. Functional independence is shown by the factorial module. The factorial module does only one task; it returns the factorial of the value given. It also has minimal interaction with other modules. The calling module, combinations, sends the minimum information that factorial needs—one number. Functional independence is measured using two criteria, cohesion and coupling.
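A minimal sketch, in C-style C++, of the two modules just described; the bodies and the main routine are illustrative assumptions rather than a reproduction of the article's Fig. 1:

#include <stdio.h>

/* factorial does one task; combinations depends on it only through
   one argument and one return value, illustrating functional independence. */
int factorial(int num) {
    int result = 1;
    for (int counter = 1; counter <= num; counter++)
        result = result * counter;
    return result;
}

/* Number of combinations of n things taken r at a time: n! / (r!(n-r)!) */
int combinations(int n, int r) {
    return factorial(n) / (factorial(r) * factorial(n - r));
}

int main() {
    /* The salad-bar example from the text: choose 3 of the 6 items. */
    printf("%d\n", combinations(6, 3));   /* prints 20 */
    return 0;
}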
II. COHESION First consider cohesion from outside of information systems. Two teams, each aligned along opposite ends
of a rope, are pulling against each other. Each member of a team shares a common goal—to pull the knot at the midpoint of the rope across a specified finish line. Some members of the team have dug in their heels for better traction, and the one on the end has the rope tied around himself. The team has a high degree of cohesion. However, if one team member quits, or even begins to pull in the opposite direction, the high degree of cohesion is lost. Much like the cohesive nature of the team, cohesion in information systems is the cement that holds a module together. Cohesion of a module is a measure of the relationship between its parts. If the cohesion is high, then the module is more likely to be designed well. Cohesion in a module is a measure of the closeness of the data and programming statements, or more specifically the tasks performed, within the module. The higher the degree of cohesion that exists within a module, the less likely it is that the programmer using the module will have to become knowledgeable about what the module does. The programmer using a module with high cohesion can generally focus on what a module does rather than how it does it. The encapsulation that cohesion promotes also aids in maintenance of the entire software system. For example, if a program contains modules strictly related to accessing and updating an inventory, and one module within it performs a particular search function, it would be quite easy to implement a new and improved search algorithm for the module. After thoroughly testing the new module, it would simply replace the old module in the software system. If done correctly, the replacement would be virtually unnoticeable by the user (except, for example, improved search time). A module with high cohesion is generally better designed and represents a more general approach than one with low cohesion.
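The inventory search example above can be made concrete with a small sketch; the names and the linear scan are hypothetical, not from the article. Callers depend only on the function's interface, so the body can later be replaced by a better search algorithm with no visible change to the rest of the system:

#include <cstddef>
#include <vector>

// Hypothetical inventory record used only for this illustration.
struct Item {
    int id;
};

// Returns the position of the item with the given id, or -1 if absent.
// The linear scan below could be swapped for a binary search or hash
// lookup; as long as the interface stays the same, callers are unaffected.
int find_item(const std::vector<Item>& inventory, int wanted_id) {
    for (std::size_t i = 0; i < inventory.size(); ++i)
        if (inventory[i].id == wanted_id)
            return static_cast<int>(i);
    return -1;
}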
Figure 1   The factorial module.

int factorial(int num) {
    int result = 1, counter;
    for (counter = 1; counter <= num; counter++)
        result = result * counter;
    return result;
}

Number (or No. or num) or as some other obvious designator of an entity. Consider the invoice document illustrated in Fig. 4. A scan of that document reveals 22 headings: Invoice (Order) Number, Date, Customer Number, Salesperson, Bill-to-Address, Customer PO, Terms, FOB Point, Line Number, Product Number, Product Description, Unit of Sale, Quantity Ordered, Quantity Shipped, Quantity Backordered, Unit Price, Discount, Extension, Order Gross, Tax, Freight, and Order Net. The fact that these are on the same document indicates that there is some association among these various items. They represent descriptors of related entities. To develop a data model from this document, the headings are categorized into one of three classes: entity identifier, attribute, or calculated value. This is somewhat subjective, but recalling the definition of an entity as any thing about which information must be maintained, the following are initially classified as entity identifiers: Invoice Number (equivalent to Order Number), Customer Number, Salesperson (with an implicit Salesperson number), and Product Number.
Tax, Order Gross, and Order Net are classified as calculated values. All others are classified as attributes. For each entity identifier, the entity is named: Invoice Number identifies Invoice, Customer Number identifies Customer, and Product Number identifies Product. Next, relationships are established. It is obvious that Customer relates to Invoice. Given the existence of a relationship, the next questions that must be asked are: "How many Customers are responsible for a single Invoice?" (answer: exactly one) and "How many Invoices can be the responsibility of one Customer?" (answer: zero or more). In this way it is determined that there is a one-to-many relationship between Customer and Invoice (see Fig. 5). Similarly, a many-to-many relationship is established between Invoice and Product—one Invoice contains many Products (this is obvious), and the same Product can be on many Invoices. How can the same product be on more than one Invoice? This assumes that what is meant by the same Product is not the same physical instance of the product, but different instances of the same type of product, having a single Product Number, the instances of which are completely interchangeable. That is, the customer is being invoiced for some quantity of the same product. In Fig. 4, for example, "Cheerios" is Product Number 2157, sold in units of cartons. Local Grocery Store is being invoiced for 40 cartons of this same product.
Figure 4 An example invoice document.
Figure 5 A data model for an order processing application.
Presumably 40 (or 50 or 100) more cartons of the same product could be ordered, shipped, and invoiced to a different Customer. Hence the relationship is many-to-many. As many-to-many relationships are not permitted in this formalism, an intersection entity must be created between Invoice and Product. The intersection entity is related many-to-one to each of the entities in the many-to-many relationship (chicken feet on the intersection entity). An obvious name for the intersection entity is Line Item, since, in common business terminology, a line item on an invoice represents a single product on the invoice. Each Line Item is identified by the combination of the Invoice to which it relates and the Line Number (line), as illustrated in Fig. 5. Finally, attributes are identified from the values on the document and associated with the appropriate entity. Frequently attributes are named for the heading under which the value appears, e.g., Customer PO, Terms, FOB Point, Product Description. In other cases the value is the result of a calculation, e.g., Extension, Order Gross, and Tax at 6%. When that is the case, the calculated value itself does not correspond to an attribute; however, all data needed to perform the calculation must be represented within the attributes. For example, Extension is defined by the calculation Order Quantity * Unit Price * (1 - Discount/100). Order Quantity, Unit Price, and Discount are each represented as attributes. Extension is not.
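As a worked illustration of this calculation (the numbers are hypothetical and not taken from the invoice of Fig. 4): a line item with Order Quantity 40, Unit Price $3.00, and a Discount of 10 would have Extension = 40 * 3.00 * (1 - 10/100) = $108.00.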
Similarly, Order Gross is the sum of all Extension calculations for each Line Item on the Invoice. Tax at 6% is also a calculated value; however, it requires an additional attribute, Tax Rate. The appropriate entity for this attribute must be determined. There may be a single Tax Rate that applies to each Customer. Alternately, there may be a different Tax Rate for each State in which Customers reside. The domain for which the data model in Fig. 5 was developed has a potentially different tax rate for each customer. Hence, tax_percent is an attribute of Customer. If there is not an appropriate entity for an attribute, i.e., one that is uniquely described by the attribute, then there is a missing entity. Further analysis is required to identify it. Finally, the model is reviewed to identify subtypes. If a relationship applies to some, but not all, instances of an entity, that is, the minimum degree of one of the relationship descriptors is zero, then subtypes exist in at least one of the entities, i.e., the subtype participates in the relationship while the supertype does not. Similarly, if an attribute applies to some, but not all, instances of an entity, then subtypes exist for the entity, i.e., the subtype has the additional attributes. The value of explicitly recognizing subtypes depends on the degree of heterogeneity among the subtypes. The purpose of the data model is to communicate the meaning of the data. If the introduction of subtypes confuses rather than clarifies, they should not be introduced. Referring to the data model of Fig. 5, some Customers may not have any outstanding Invoices (minimum cardinality of zero). Thus there are two subtypes of Customer—those with Invoices and those without
Invoices. If this is the only distinction for subtyping, it is probably not worthwhile explicitly recognizing the subtypes. If, on the other hand, Customers with Invoices have additional attributes or are viewed differently from Customers without Invoices, then it may be beneficial to represent this subtype explicitly in the data model.
C. Event Analysis

Event analysis defines an entity for each event and identifies the associated actors and resources required for that event to occur. The sentences in Section V.A not only identify things, they also identify events. At least three events can be distinguished: Place Order, Ship Order, and Invoice Order. If additional sentences were obtained, it is likely that a fourth event would be identified, Pay Invoice. Event analysis can often help clarify the nature of attributes and relationships. Invoice, for example, has an attribute named date. The identification of four events related to this entity forces the question: "To which event does this date refer?" Most likely four different date attributes must be maintained, one for each event. Furthermore, the attribute freight does not have a meaningful value for an Order that has not been shipped. Similarly, for the Line Item entity, the attribute quantity_shipped is not meaningful until the Ship Order event has occurred. Event methodologies recommend creating an entity for each event. Hence, Place Order, Ship Order, Invoice Order, and Pay Invoice would be separate entities in the Order processing data model. Such a representation highlights the possibility that a single Order may have many Ship events or that a single Ship event may include multiple Orders. It effectively forces an analyst to investigate such possibilities.

D. Evaluating the Resultant Data Model

The quality of the data model is assured by the manner in which it was constructed. The construction described above is based on the principle of normal forms, originally proposed for the relational data model and later applied to semantic data models. Each attribute is associated with an entity only if that attribute directly describes that entity with a single value. In the "normal form" terminology, the assignment of attributes to entities in this way assures that each attribute is fully functionally dependent upon the identifier of the entity, and not dependent upon any other attribute in the model. Disallowing many-to-many relationships assures that each relationship corresponds to a functional dependency between entity identifiers. The identifier of the entity on the "one" side is fully functionally dependent upon the identifier of the entity on the "many" side. Thus the resultant data model is "well formed" in the sense that it can be directly transformed into a "third normal form" relational schema. The structural rules for evaluating a data model are:

1. Each entity must be uniquely named and identified.
2. Attributes are associated with entities (not relationships), and each entity must have one and only one value for each of its attributes (otherwise an additional entity must be created).
3. Relationships associate a pair of entities or associate an entity with itself (only binary relationships are allowed, but relationships can be recursive).
4. Many-to-many relationships are not allowed (an intersection entity with two one-to-many relationships must be created).
5. Subtypes are identified when the minimum degree of a relationship descriptor is zero or when an attribute does not apply to all instances of an entity; these subtypes are explicitly recognized when there is a "significant" difference among the subtypes (e.g., they have multiple attributes or relationships).

VI. TRANSFORMING DATA MODELS INTO DATABASE DESIGNS

After validation, a data model must be transformed into the schema definition of a DBMS for implementation. This section describes how data models can be transformed into an RDBMS schema and presents the basic efficiency issues that must be considered in that transformation.
A. Relational DBMS The basic elements of a relational database schema define tables, columns in those tables, and constraints on those columns. Constraints include primary keys (identifiers) and foreign keys (relationships). While there are numerous efficiency issues related to transforming a data model into such a database schema, a “quick and dirty” approach simply maps entities into tables, attributes into columns, identifiers into primary keys, and relationships into columns designated as foreign keys.
Fig. 6 shows such a relational schema in tabular form for the data model of Fig. 5, and Fig. 7 shows its definition in the standard relational language SQL. The data model has five entities: Customer, Salesperson, Order, Line Item, and Product. Five corresponding tables are defined in the relational schema. Similarly, columns are defined for each attribute in each entity. In a relational schema all interconnections among tables are represented by data values. Hence, columns must be created to represent relationships. For a one-to-many relationship, a column is created in the table representing the entity on the "many" side for each attribute (column) in the identifier (primary key) of the table representing the entity on the "one" side. These columns are termed "foreign keys." As illustrated in Figs. 6 and 7, the Customer table has an "extra" column, spno (salesperson number), to represent the one-to-many relationship between Salesperson and Customer. It is designated in a Foreign Key constraint to reference the Primary Key column, spno, in the Salesperson table. The spno column in the Customer table is constrained to be NOT NULL. This constraint specifies that each row of the Customer table must have a value for that column. The Foreign Key constraint specifies that the value of that column must appear in the spno column of some row in the Salesperson table. Together these implement the specified relationship between Customer and Salesperson in the data model, having a minimum 1 and maximum 1 on the Customer side. Removing the NOT NULL constraint would implement a minimum 0 and maximum 1 relationship on the Customer side. The minimum 0 and maximum many (unlimited) on the Salesperson side is implicit in the Foreign Key representation. Enforcing a minimum other than 0 or a maximum other than unlimited on the Salesperson (many) side requires a procedurally implemented constraint. The Line Item table similarly has two "extra" columns: invoice_no (invoice number), representing its many-to-one relationship with Invoice, and pno (product number), representing its many-to-one relationship with Product. Again each is designated in a Foreign Key constraint; however, while invoice_no is
designated to be NOT NULL, pno is not. This implementation requires each Line Item to have a related Invoice, but does not require it to have a related Product. Hence this schema implements the specified relationships in the data model of Fig. 5. A one-to-one relationship may be represented in the table corresponding to either entity in the relationship. When one role in a one-to-one relationship has a minimum of 0 and the other has a minimum of 1, the relationship is most commonly represented by a foreign key in the table corresponding to the entity on the minimum 0 side. Consider, for example, the data model in Fig. 3. The Managing relationship could be represented by a foreign key in either the Employee table or the Department table. Since it has a minimum of 0 on the Department side, it would most likely be represented by a foreign key column constrained to be NOT NULL in the table corresponding to that entity. If this relationship were represented in the Employee table, the foreign key column would not be constrained to be NOT NULL and, in fact, would contain a NULL value in each row corresponding to an employee who did not manage a department, likely most of them.
B. Efficiency Issues in Relational Database Design This type of direct transformation from a data model into a database schema may be inefficient for database processing. Decisions related to efficiency of implementation are part of physical database design. There are numerous physical database design possibilities. The following are illustrative.
1. Vertical and Horizontal Fragmentation For efficiency reasons, an entity may be split vertically or horizontally. This is termed fragmentation. Vertical fragmentation assigns subsets of attributes to different tables. Horizontal fragmentation assigns subsets of instances to different tables. Of course, these approaches can be combined.
Figure 6 A relational database schema in tabular form (primary keys are underlined, foreign keys are in italic).
Figure 7 A relational database schema definition in SQL.
Vertical fragmentation can increase efficiency if clusters of “frequently” used and “rarely” used attributes can be identified. Frequently used attributes can be stored in the table corresponding to the entity and rarely used attributes can be stored in a separate table. This reduction in the size of the table corresponding to the entity can significantly reduce processing requirements. Consider, for example, an Employee entity used in a payroll application. The Employee entity may contain attributes such as emergency contact, emergency contact address, and emergency contact telephone number that are rarely, if ever, used in payroll processing. Segmenting them to a separate table, related to the Employee table by a foreign key reduces the size of the Employee table, which is likely scanned for payroll processing. Horizontal fragmentation can similarly increase efficiency by reducing the size of the table corresponding
to an entity by identifying “rarely” used instances. This technique is similar to “archiving” rarely used data. Again considering an Employee entity used in a payroll application, terminated employees must be retained for end of year processing, but are no longer used in normal payroll processing. Segmenting them to a separate table can reduce normal payroll processing time.
2. Attribute Replication

Efficiency can be increased by redundantly storing specific attributes with related entities. Consider, for example, the processing required to produce invoices from the set of tables shown in Fig. 5. Assuming that invoices require Customer, Order, Salesperson, Line Item, and Product data, the production of this report requires all five tables to be joined together.
Redundantly storing the required Salesperson attributes in the Order table and the required Product attributes in the Line Item table would eliminate two joins, significantly reducing the computing effort to produce it. Of course this replication increases the size of those tables and can result in update anomalies since the third normal form is violated. This approach is most effective when the replicated data is small and is not subject to frequent modification.
3. Entity Merging

Merging related entities that are frequently retrieved together into a single table may also increase retrieval efficiency. Merging the entity on the one side of a one-to-many relationship into the entity on the many side is similar to attribute replication, but replicates the entire entity. This approach can be effective when combined with judicious vertical fragmentation. Merging the entity on the many side of a one-to-many relationship into the entity on the one side violates first normal form. It is not directly supported by RDBMSs, although it is directly supported by object DBMSs. To implement this strategy in an RDBMS, a set of columns must be defined for the maximum possible number of related instances. It is most effective when the actual number of related instances is fixed and relatively "small." Consider, for example, an inventory system that tracks the ending inventory level for each product for each of the past 12 months. The data model for such a domain would have an entity for Product related one to many to an entity for Ending Inventory. The Ending Inventory entity would need two attributes, Month and Ending Quantity. The minimum and maximum cardinality on the Ending Inventory side is 12. Merging that entity into the Product entity would require 12 columns, one for the ending inventory in each of the past 12 months. The month would be built into the column name.
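The tradeoff can be sketched with two hypothetical record layouts, written here as C++ structures rather than SQL tables since the design choice is the same; none of the names come from the article:

#include <array>
#include <string>
#include <vector>

// Design 1: a separate Ending Inventory entity, one record per month.
struct EndingInventory {
    int month;             // 1-12
    int ending_quantity;
};
struct ProductSeparate {
    std::string product_no;
    std::vector<EndingInventory> history;   // the "many" side kept separate
};

// Design 2: the Ending Inventory entity merged into Product, one slot per
// month; workable only because the cardinality is fixed at 12.
struct ProductMerged {
    std::string product_no;
    std::array<int, 12> ending_quantity_by_month;   // index 0 = January, etc.
};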
VII. SUMMARY AND DIRECTIONS FOR FURTHER RESEARCH Data modeling is a process by which the logical or “natural” data structure of a domain of interest is represented using a predefined set of constructs. The set of constructs is termed a data modeling formalism. The product of data modeling is a logical data model.
This article has discussed the major constructs of data modeling formalisms. It used a simple graphical notation to illustrate these constructs. Approaches to developing a data model were presented and illustrated. Finally, a simple way to transform a logical data model into a database schema was presented and efficiency issues discussed. Current research in data modeling is progressing in several directions. These include the development of modeling and design tools and techniques, user interfaces based on data modeling constructs, usability, semantics and constraints, quality and reliability metrics, and the interface with object technologies and languages. A number of these are addressed in the articles listed in the bibliography.
SEE ALSO THE FOLLOWING ARTICLES Database Administration • Database Development Process • Database Systems • Data Modeling: Object-Oriented Model • Relational Database Systems
BIBLIOGRAPHY

Carlis, J. V., and Maguire, J. (2000). Mastering data modeling: A user-driven approach. Reading, MA: Addison-Wesley.
Chen, P. P.-S. (1976). The entity-relationship model—Toward a unified view of data. ACM Transactions on Database Systems, Vol. 1, No. 1, 9–36.
Denna, E., Cherrington, J. O., Andros, D., and Hollander, A. (1993). Event-driven database solutions. Homewood, IL: Business One Irwin.
Fowler, M., Kendall, S., and Booch, G. (1999). UML distilled, Second Edition. Reading, MA: Addison-Wesley.
Hull, R., and King, R. (1987). Semantic database modelling: Survey, applications, and research issues. ACM Computing Surveys, Vol. 19, No. 3, 201–260.
Ling, T. W., and Ram, S. (1998). Conceptual modeling—ER '98. Berlin: Springer.
McFadden, F. R., Hoffer, J., and Prescott, M. (1999). Modern database management, Oracle 7.3.4 Edition. Reading, MA: Addison-Wesley.
Peckham, J., and Maryanski, F. (1988). Semantic data models. ACM Computing Surveys, Vol. 20, No. 3, 153–189.
Snodgrass, R. T. (2000). Developing time-oriented database applications in SQL. San Francisco, CA: Morgan Kaufmann.
Teorey, T. (1994). Database modeling and design: The fundamental principles, Second Edition. San Francisco, CA: Morgan Kaufmann.
Data Modeling: Object-Oriented Data Model Michalis Vazirgiannis Athens University of Economics and Business
I. INTRODUCTION II. MOTIVATION AND THEORY III. INDUSTRIAL SYSTEMS—STANDARDS
IV. CONCLUSION—RESEARCH ISSUES AND PERSPECTIVES V. CASE STUDY: OBJECT RELATIONAL SOLUTIONS FOR MULTIMEDIA DATABASES
GLOSSARY
data model In data modeling we try to organize data so that they represent as closely as possible a real world situation, yet representation by computers is still feasible. A data model encapsulates three elements: the objects' structure, behavior, and integrity constraints.
encapsulation A property of the object-oriented model promoting data and operation independence. This is achieved by hiding the internal structure and implementation details from the external world, thus simplifying the maintenance and usage of a multitude of object classes in an application.
inheritance The ability of one class to inherit the structure and behavior of its ancestor. Inheritance allows an object to inherit a certain set of attributes from another object while allowing the addition of specific features.
object-oriented modeling An abstraction of the real world that represents objects' structural content and behavior in terms of class hierarchies. The structural content is defined as a set of attributes and attached values and the behavior as a set of methods (functions that implement an object's behavior).
polymorphism The ability of different objects in a class hierarchy to have different behaviors in response to the same message. Polymorphism derives its meaning from the Greek for "many forms." A single behavior can generate entirely different responses from objects in the same group. Within the framework of the program, an internal mechanism determines which implementation is executed; the use of a single name for different purposes is known as function overloading.
relational model A data modeling approach that has found very successful industrial implementations in DBMSs. The fundamental modeling constructs are relations consisting of tuples of values, each value taking its semantics from an appropriate attribute. The relations represent entities of the real world and relationships among entities.
I. INTRODUCTION A. Need for Data Modeling The word “datum” comes from Latin and, literally interpreted, means a fact. However, data do not always correspond to concrete or actual facts. They may be imprecise or may describe things that have never happened (e.g., an idea). Data will be of interest to us if they are worth not only thinking about, but also worth recording in a precise manner. Many different ways of organizing data exist. For data to be useful in providing information, they need to be organized so that they can be processed effectively. In data modeling we try to organize data so that they represent as closely as possible a real world situation, yet representation by computers is still feasible. These two requirements are frequently conflicting. The optimal way to organize data for a given application can be determined by understanding the characteristics of data that are important for capturing their meaning. These characteristics allow us to make general statements about how data are organized and processed.
It is evident that an interpretation of the world is needed, sufficiently abstract to allow minor perturbations, yet sufficiently powerful to give some understanding concerning how data about the world are related. An intellectual tool that provides such an interpretation will be referred to as a data model. It is a model about data by which a reasonable interpretation of the data can be obtained. A data model is an abstraction device that allows us to focus on the information content of the data as opposed to individual values of data.
B. Historical Overview: First and Second Database Model Generations

Information systems demand more and more services from information stored in computing systems. Gradually, the focus of computing shifted from process-oriented to data-oriented systems, where data play an important role for software engineers. Today, many design problems center around data modeling and structuring. After the initial file systems in the 1960s and early 1970s, the first generation of database products was born. Database systems can be considered intermediaries between the physical devices where data are stored and the users (humans) of the data. Database management systems (DBMS) are the software tools that enable the management (definition, creation, maintenance, and use) of large amounts of interrelated data stored in computer-accessible media. The early DBMSs, which were based on hierarchical and network (Codasyl) models, provided logical organization of data in trees and graphs. IBM's IMS, General Electric's IDS, Univac's DMS 1100, Cincom's Total, MRI's System 2000, and Cullinet's (now Computer Associates) IDMS are some of the well-known representatives of this generation. Although efficient, these systems used procedural languages and did not offer physical or logical independence, thus limiting their flexibility. In spite of that, DBMSs were an important advance compared to the file systems. IBM's addition of data communication facilities to its IMS software gave rise to the first large-scale database/data communication (DB/DC) system, in which many users access the DB through a communication network. Since then, access to DBs through communication networks has been offered by commercially available DBMSs. C. W. Bachman played a pioneering role in the development of network DB systems (the IDS product and the Codasyl DataBase Task Group, or DBTG, proposals). The DBTG model is based on data structure diagrams, which are also known as Bachman's diagrams.
In the model, the links between record types, called Codasyl sets, always relate one occurrence of one record type to many occurrences of another record type; that is, they are functional links. In its 1978 specifications, Codasyl also proposed a data definition language (DDL) at three levels (schema DDL, subschema DDL, and internal DDL) and a procedural (prescriptive) data manipulation language (DML). In 1969–1970, Dr. E. F. Codd proposed the relational model, which was considered an "elegant mathematical theory" without many possibilities of efficient implementation in commercial products. In 1970, few people imagined that, in the 1980s, the relational model would become mandatory (a "decoy") for the promotion of DBMSs. Relational products like Oracle, DB2, Ingres, Informix, Sybase, etc., are considered the second generation of DBs. These products have more physical and logical independence, greater flexibility, and declarative query languages (users indicate what they want without describing how to get it) that deal with sets of records and can be automatically optimized, although their DML and host language are not integrated. With relational DBMSs (RDBMSs), organizations have more facilities for data distribution. RDBMSs provide not only better usability but also a more solid theoretical foundation. Unlike network models, the relational model is value-oriented and does not support object identity. Needless to say, there is an important trade-off between object identity and declarative features. Because Codasyl DBTG and IMS support object identity, some authors include them in the object-oriented DB class. The initial relational systems suffered from performance problems. While nowadays these products have achieved wide acceptance, it must be recognized that they are not exempt from difficulties. Perhaps one of the greatest demands on RDBMSs is the support of increasingly complex data types; in addition, null values, recursive queries, and scarce support for integrity rules and for domains (or abstract data types) are other weaknesses of relational systems. Some of those problems are solved in the current version of Structured Query Language (SQL), SQL:1999 (previously SQL3). In the 1970s, the great debate on the relative merits of the Codasyl and relational models served to compare both classes of models and to obtain a better understanding of their strengths and weaknesses. During the late 1970s and in the 1980s, research work (and, later, industrial applications) focused on query optimization, high-level languages, the normalization theory, physical structures for stored relations, buffer and memory management algorithms, indexing techniques (variations of B-trees), distributed systems,
data dictionaries, transaction management, and so on. That work allowed efficient and secure on-line transactional processing (OLTP) environments (in the first DB generation, DBMSs were oriented toward batch processing). In the 1980s, the SQL language was also standardized (SQL/ANS 86 was approved by the American National Standards Institute, ANSI, and the International Organization for Standardization, ISO, in 1986), and today every RDBMS offers SQL. Many of the DB technology advances at that time were founded on two elements: reference models and data models. ISO and ANSI proposals on reference models have positively influenced not only theoretical research but also practical applications, especially DB development methodologies. In most of those reference models, two main concepts can be found: the well-known three-level architecture (external, logical, and internal layers), also proposed by Codasyl in 1978, and the recursive data description. The separation between the logical description of data and the physical implementation devices (data-application independence) was always an important objective in DB evolution, and the three-level architecture, together with the relational data model, was a major step in that direction. In terms of data models, the relational model has influenced research agendas for many years and is supported by most of the current products. Recently, other DBMSs have appeared that implement other models, most of which are based on object-oriented principles. Three key factors can be identified in the evolution of DBs: theoretical basis (resulting from researchers' work), products (developed by vendors), and practical applications (requested by users). These three factors have been present throughout the history of DB, but the equilibrium among them has changed. Users' needs have always influenced the evolution of DB technology, but especially so in the last decade. Today, we are witnessing an extraordinary development of DB technology. Areas that were exclusive to research laboratories and centers are appearing in DBMSs' latest releases: World Wide Web, multimedia, active, object-oriented, secure, temporal, parallel, and multidimensional DBs. The need for exploiting the Object-Oriented Model for such complex systems is apparent.
II. MOTIVATION AND THEORY

A. Motivation

Although one might think that DB technology has reached its maturity, the new DB generation has demonstrated that we still lack solutions to some of the problems of the new millennium. In spite of the success of this technology, several "preoccupation signals" must be taken into account. We identify the following architectural issues that need to be solved in the light of new application domains:
• Current DBMSs are monolithic; they offer all kinds of services and functionalities in a single "package," regardless of the users' needs, at a very high cost, and with a loss of efficiency
• About half of the production data are in legacy systems
• Workflow management (WFM) systems are not based on DB technology; they simply access DBs through application programming interfaces (APIs)
• Replication services do not scale well over 10,000 nodes
• Integration of strictly structured data with loosely structured data (e.g., data from a relational DB with data from electronic mail)

On the other hand, there is a wealth of new application domains that produce huge amounts of data and therefore call for database support. Such domains include computer-aided design (CAD), computer-aided software engineering (CASE), office automation, multimedia databases, geographic information systems (GIS), scientific experiments, telecommunications, etc. These application domains present some important common characteristics that make their database support by traditional relational systems problematic. Such features include:
• Hierarchical data structures (complex objects)
• New data types for storing images or large textual items
• No general-purpose data structure available
• Nonstandard application-specific operations
• Dynamic changes
• Cooperative design process among designers
• Large number of types
• Small number of instances
• Longer duration transactions

Database technology has to respond to these challenges in a way that addresses the above requirements as design features. In the sequel we identify the shortcomings of current database technology in the context of the new applications:
• Poor representation of "real-world" entities; the need to decompose objects over relations
• Fixed built-in types; no set-valued attributes are supported, thus complex and highly nested objects cannot be represented efficiently
• Semantic overloading
• Poor support for integrity and enterprise constraints
• No data abstraction such as aggregation and generalization, thus inheritance and specialization cannot be addressed
• Limited operations
• Difficulty handling recursive queries
• No adequate version control is supported
B. Object-Oriented Model

1. Historical Preview of Object-Oriented Databases

Before we proceed with our discussion of data modeling, it is necessary to define, even if only approximately, the elementary objects that will be modeled (i.e., what a datum is). Suppose that we accept as a working definition of an atomic piece of data the tuple <object name, object property, property value, time>. After all, a phenomenon or idea usually refers to an object (object name) and to some aspect of the object (object property), which is captured by a value (property value) at a certain time (time). Of these four aspects of data, time is perhaps the most cumbersome aspect of data modeling. Therefore, many data models completely drop the notion of time and replace it either with other kinds of explicit properties or with orderings among objects. The issues of object-oriented (OO) models, namely object structure and object classes, will be treated as the basis of an object-oriented database management system (OODBMS). Furthermore, such a system must support inheritance (single and multiple) and handle object identity issues. Then OO languages providing persistency (persistence by class, creation, marking, or reference) are necessary so that users of an OODBMS are able to define and manipulate database objects.
2. Object-Oriented Modeling and Programming Concepts

Hereafter an overview of object-oriented concepts is presented. Object orientation has its origins in object-oriented programming languages (OOPLs). The "class" concept was introduced by SIMULA, whereas abstract data types, encapsulation, message passing, and inheritance were further introduced by the pioneering SMALLTALK. Another language of this family is C++, which integrates the strengths of C with object-oriented concepts. The newest OO language is Java, inherently object-oriented and providing a wide selection of classes for different tasks (e.g., visualization, network and task management, and persistence features). Its portability across platforms and operating systems has made it a very attractive development environment, widely used and with an important impact on programming large-scale applications.

An object has an inherent state and a behavior, which defines the way the object treats its state as well as the communication protocol between the object and the external world. We have to differentiate between the transient objects in OOPLs and the persistent objects in object-oriented databases (OODBs). In the first case the objects are eliminated from main memory as soon as they are not needed, whereas in OODBMSs objects are persistently stored and other mechanisms such as indexing, concurrency control, and recovery are available. OODBMSs usually offer interfaces with one or more OOPLs. The three features that differentiate an object from a relational tuple are:

1. Object identity, a unique identifier generated when the object is created that follows it throughout its life cycle
2. Encapsulation (promoting data and operation independence), since the internal structure and implementation details are not accessible from the external world, thus simplifying the maintenance and usage of a multitude of object classes in an application
3. Operator polymorphism and overloading, allowing different behaviors to be grouped under the same method and operator names; this facilitates the design and evolution of large sets of classes

Hereafter these features are further analyzed.

a. OBJECT IDENTITY, OBJECT STRUCTURE, AND TYPE CONSTRUCTORS
The object identity (OID) is generated by the system when a new object is created and is unique and immutable. Each OID is used only once and is invisible to the users. An OODBMS offers a set of type constructors that are used to define the data structures for an OO database schema. The basic constructors create atomic values (atom: integer, string, float, etc.), tuples, and sets of objects. Other constructors create lists, bags, and arrays of objects. It is also feasible to have attributes that refer to other objects (references, i.e., OIDs). An object has an internal structure defined by a triple (OID, type constructor, state). For instance,
An object can be represented as a graph structure that can be constructed by recursively applying the type constructors. Assume the following example:

Example 1 Object identities (OIDs).

In this example the types Employee, Date, and Department are defined. As the object structures are potentially complex, the issue of object equality becomes interesting. Object equality is a concept that can be viewed from two aspects: deep and shallow.
• Two objects are said to have identical states (deep equality) if the graphs representing their states are identical in every respect, including the OIDs at every level.
• Two objects have equal states (shallow equality) if the graph structures are the same and all the corresponding atomic values in the graphs are also the same. However, some corresponding internal nodes in the two graphs may have objects with different OIDs.

b. ENCAPSULATION
An important feature in the domain of OODBs is encapsulation, which offers an abstraction mechanism and contributes to information hiding, since an object encapsulates both data and functions into a self-contained package. Thus the external aspects of an object are separated from its internal details. The external users of the object are only made aware of the interface of the object type, i.e., the name and parameters of each operation, also called its signature. The functionality of an object is implemented by a set of methods and messages. A method consists of a name and a body that performs the behavior associated with the method name, whereas a message is simply a request from one object to another object asking the second object to execute one of its methods. In the context of database applications dealing with objects, we distinguish between visible attributes, which may be accessed directly by external operators or query language predicates, and hidden attributes, which are referenced only through predefined operations. Assume the example:

Define class Employee;
  type tuple ( ... )
  operations
    age: integer;
    create_emp: Employee;
    destroy_emp: boolean;
end Employee;
If d is a reference to an Employee object, we can invoke an operation such as age by writing d.age. Of course the main issue in OODBs is to have a mechanism that makes objects persistent, i.e., stored safely in secondary storage at appropriate times. The mechanisms for making an object persistent are called naming and reachability. The naming mechanism involves giving an object a unique persistent name through which it can be retrieved by this and other programs; with reachability, an object becomes persistent if it can be reached from another persistent object. The following example illustrates this concept:
define class DepartmentSet:
  type ...
  operations
    ...
end DepartmentSet;
...
persistent name AllDepartments: DepartmentSet;
...
d := create_dept;
c. TYPE HIERARCHIES AND INHERITANCE
A type can be defined by giving it a type name and then listing the names of its public functions. Then it
is possible to define subtypes emanating from the basic types. For example:
PERSON: Name, Address, Birthdate, Age, SSN
EMPLOYEE subtype-of PERSON: Salary, HireDate, Seniority
STUDENT subtype-of PERSON: Major, GPA
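As an illustration only, the following minimal C++ sketch shows how such subtypes inherit the structure of their ancestor; the attribute types are chosen arbitrarily here, since the example above does not specify them:

#include <iostream>
#include <string>

struct Person {            // base type: structure shared by all subtypes
    std::string name;
    std::string address;
    std::string birthdate;
    int age;
    std::string ssn;
};

struct Employee : Person { // inherits every Person attribute and adds its own
    double salary;
    std::string hire_date;
    int seniority;
};

struct Student : Person {
    std::string major;
    double gpa;
};

int main() {
    Employee e;
    e.name = "Alice";      // inherited from Person
    e.salary = 50000.0;    // specific to Employee
    std::cout << e.name << " " << e.salary << std::endl;
    return 0;
}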
Each subtype inherits the structure and behavior of its ancestor. It is also possible to redefine (override) an inherited property or method. In some cases it is necessary to create a class that inherits from more than one superclass. This is the case of multiple inheritance, which occurs when a certain subtype T is a subtype of two (or more) types and hence inherits the functions of both supertypes. This leads to the creation of a type lattice. In several cases, as the lattice grows, integrity and ambiguity problems may arise. In other cases it is desirable that a type inherit only part of the structure and/or behavior of a supertype. This is the case of selective inheritance. The EXCEPT clause may be used to list the functions in a supertype that are not to be inherited by the subtype.

d. POLYMORPHISM
The term polymorphism covers the various cases where, during inheritance, redefinition of structure or behavior is necessary. Polymorphism implies operator overloading, where the same operator name or symbol is bound to two or more different implementations of the operator, depending on the type of objects to which the operator is applied. For instance, a function Area would be overloaded by different implementations for the different subtypes of a Geometry_Object class. The process of selecting the appropriate method based on an object's type is called binding. In strongly typed systems this can be done at compile time; this is termed early (static) binding. If the determination of an object's type is deferred until runtime (rather than compile time), the selection is called dynamic (late) binding. An example of overloading follows:

template <class T> T max(T x, T y)
{
    if (x > y) return x;
    else return y;
}

The actual methods are instantiated as follows:

int max(int, int);
double max(double, double);
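To complement the overloading example above, here is a minimal, self-contained C++ sketch in which the implementation of the area operation is selected by dynamic binding at runtime. The class and member names are illustrative only, echoing the Geometry_Object example mentioned in the text:

#include <iostream>

// Common interface of the geometry hierarchy.
class GeometryObject {
public:
    virtual double area() const = 0;   // overridden by each subtype
    virtual ~GeometryObject() {}
};

class Rectangle : public GeometryObject {
    double width, height;
public:
    Rectangle(double w, double h) : width(w), height(h) {}
    double area() const override { return width * height; }
};

class Circle : public GeometryObject {
    double radius;
public:
    explicit Circle(double r) : radius(r) {}
    double area() const override { return 3.14159265358979 * radius * radius; }
};

int main() {
    // The static type of each element is GeometryObject*, but the
    // implementation of area() executed is chosen at runtime from the
    // actual object type (dynamic binding).
    GeometryObject* shapes[] = { new Rectangle(2.0, 3.0), new Circle(1.0) };
    for (GeometryObject* g : shapes) {
        std::cout << g->area() << std::endl;
        delete g;
    }
    return 0;
}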
e. COMPLEX OBJECTS
A complex object, an important concept in the object-oriented approach, is an item that is perceived as a single object in the real world but is composed of other objects through a set of a-part-of relationships. We distinguish two categories of complex objects: unstructured and structured. Unstructured complex objects contain data such as bitmap images and long text strings; they are known as binary large objects (BLOBs). OODBMSs provide the capability to directly process selection conditions and other operations based on the values of these objects. Such objects are defined by new abstract data types, and the user provides the pattern recognition program to map the object attributes to the raw data. The second category, structured complex objects, implies that the object's structure is defined by repeated application of the type constructors provided by the OODBMS. Here we distinguish two types of reference semantics: (1) ownership semantics (is-part-of, is-component-of), and (2) the is-associated-with relationship. The idea is that OODBMSs constitute a "marriage" between the concepts of object-oriented programming (OOP), such as inheritance, encapsulation, and polymorphism, and well-founded, industry-supported database capabilities (see Fig. 1). Conclusively, the ideas that were worked out in the OODBMS area have been concentrated and codified by the Object-Oriented Database System Manifesto. This is an effort toward standardization, promoting ideas that should be part of an object-oriented database system. The most important concepts from this manifesto follow:
• An OODBMS should support complex objects.
• Object identity is an important part of an object and must be supported.
• Encapsulation must be supported as an integral part of the system.
• Types or classes must be supported along with inheritance features, so that types or classes are able to inherit from their ancestors.
• Dynamic binding must be supported.
• The DML must be computationally complete and the set of data types must be extensible. Moreover, the DBMS must provide a simple way of querying data.
• Data persistence must be provided as a natural feature of a database system.
• The DBMS must be capable of managing very large databases.
• The DBMS must support concurrent users.
• The DBMS must be capable of recovery from hardware and software failures.

Figure 1 OODBMS is an amalgamation between OOP and traditional database concepts.
Though these features are quite interesting and promising, their impact on the database industry was not as important as hoped. Instead, the major database vendors tried to extend the well-established relational technology by integrating some of the object-oriented database concepts. This hybrid technology is known as object-relational. In the next section we review the main features of this technology.
C. The Object Relational Approach
Here we contrast and compare the relative strengths and weaknesses of the relational and object-oriented systems in order to motivate the need for object-oriented modeling. The discussion will include user-defined data types and set-based versus navigational access to data. We also examine some simple modeling examples to illustrate the discussion.
1. A Quick Look at Relational and Object-Oriented Databases
It is evident that the strengths of the relational paradigm have revolutionized information technology. Relational database technology was originally described by E. F. Codd. Not long afterward, companies like IBM and Oracle created very successful products. The relational DB standard is published by ANSI, with the current specification being X3H2 (SQL'92). The new specification dealing with object extensibility has been labeled X3H7. A relational DB stores data in one or more tables of rows and columns. The rows correspond to records (tuples); the columns correspond to attributes (fields in the record), with each column having a data type like date, character, or number. Commercial implementations currently support relatively few data types. For example, character, string, time,
date, numbers (fixed and floating point), and currency describe the various options. Any attribute (field) of a record can store only a single value. Relational DBs enforce data integrity via relational operations, and the data themselves are structured according to a simple model based on mathematical set theory. Relationships are not explicit but rather implied by values in specific fields, for example, foreign keys in one table that match those of records in a second table. Many-to-many relationships typically require an intermediate table that contains just the relationships. Relational DBs offer simplicity in modifying table structure. For example, adding data columns to existing tables or introducing entire tables remains an extremely simple operation. The beauty of relational DBs continues to be their simplicity. The process of normalization establishes a succinct clarity in the management and organization of data in the DB. Redundancies are eliminated, and information retrieval is governed by the associations created between primary and foreign keys. Why store the same piece of information in two or more places when a logical connection can be established to it in one place? Referential integrity (RI) has also made an important contribution because it enables business rules to be controlled through the use of constraints. The role of constraints is to prevent violations of data integrity and, thereby, of the normalized design. The origins of OODBs trace back to the emergence of OOP in the 1970s. Technically, there is no official standard for object DBs. The book The Object Database Standard: ODMG-V2.0, under the sponsorship of the Object Database Management Group (ODMG) (http://www.odmg.com), describes an industry-accepted de facto standard. Object DBMSs emphasize objects, their relationships, and the storage of those objects in the DB. Designers of complex systems realized the limitations of the relational paradigm when trying to model complex systems. Characteristics of object DBs include a data model that has object-oriented aspects like classes with attributes, methods, and integrity constraints; they also have OIDs for any persistent instantiation of classes; they support encapsulation (data and methods), multiple inheritance, and abstract data types. Object-oriented data types can be extended to support complex data such as multimedia by defining new object classes that have operations to support the new kinds of information. The object-oriented modeling paradigm also supports inheritance, which allows incremental development of solutions to complex problems by defining new objects in terms of previously defined objects.
Polymorphism allows developers to define operations for one object and then share the specification of the operation with other objects. Objects incorporating polymorphism also have the capability of extending behaviors or operations to include specialized actions or behaviors unique to a particular object. Dynamic binding is used to determine at runtime which implementation of an operation is actually executed. Object DBs extend the functionality of object programming languages like C++ or Java to provide full-featured DB programming capability. The result is a high level of congruence between the data model for the application and the data model of the DB, resulting in less code, more natural data structures, better maintainability, and greater reusability of code. All of those capabilities deliver significant productivity advantages to DB application developers that differ significantly from what is possible in the relational model.
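As a rough illustration of this congruence, consider the following minimal C++ sketch; the class names and attributes are hypothetical, not taken from any particular product. A relationship that a relational schema would express with a foreign key is represented directly as an object reference, so related data are reached by navigation rather than by a join:

#include <iostream>
#include <string>
#include <vector>

// In-memory object model; an ODBMS would store these objects
// (and the references between them) persistently.
struct Department {
    std::string name;
};

struct Employee {
    std::string name;
    Department* dept;   // direct reference, playing the role a foreign key plays relationally
};

int main() {
    Department engineering{"Engineering"};
    std::vector<Employee> employees = {
        {"Ada", &engineering},
        {"Grace", &engineering}
    };

    // Navigational access: follow the reference instead of joining tables.
    for (const Employee& e : employees) {
        std::cout << e.name << " works in " << e.dept->name << std::endl;
    }
    return 0;
}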
2. Contrasting the Major Features of Pure Relational and Object-Oriented Databases
In a relational DB, the query language is the means to create, access, and update objects. In an object DB, the primary interface for creating and modifying objects is directly via the object language (C++, Java, SMALLTALK) using the native language syntax, even though declarative queries are still possible. Additionally, every object in the system is automatically given an OID that is unique and immutable during the object's life. One object can contain an OID that logically references, or points to, another object. Those references prove valuable in the association of objects with real-world entities, such as products, customers, or business processes; they also form the basis of features such as bidirectional relationships, versioning, composite objects, and distribution. In most ODBMSs, the OIDs become physical (the logical identifier is converted to pointers to specific memory addresses) once the data are loaded into memory (cached) for use by the object-oriented application. No such construct exists in the relational DB. In fact, the addition of navigational access violates the very principles of normalization because OIDs do not rely on keys. To further explore the divergent nature of relational and object-oriented DBs, let us look more closely at the drawbacks of each. Our discussion eventually leads us to the justification behind the object-relational paradigm. In the following we explore the specifics of what it takes to define the object-relational paradigm. A first
issue is to enable object functionality in the relational world. Two important aspects must be considered in any definition of the object-relational paradigm. The first is the logical design aspect of the architecture. What data types will be supported? How will data be accessed? The other aspect is the mapping of the logical architecture to a physical implementation. From a technological standpoint, certain capabilities must be included in the list of logical capabilities, or the DB will not meet the minimal requirements for being object-relational. Such capabilities are behavior, collection types, encapsulation, inheritance, and polymorphism.

a. BEHAVIOR
A method, in the purely object-oriented paradigm, is the incorporation of a specific behavior assigned to an object or element. A method is a function of a particular class.

b. COLLECTION TYPES
An aggregate object is essentially a data-type definition that can be composed of many subtypes coupled with behavior. In Oracle 8, for example, there are two collection types: VARRAYs and nested tables. VARRAYs are suitable when the subset of information is static and small. A suitable use of a VARRAY might be in the same context where a reference entity might be used. The contents of reference entities remain relatively static and serve to validate entries in the referencing table. For example, a reference entity called MARKETS can be created to store the valid set of areas where a company does business. In the same way, a VARRAY might be substituted to perform the same reference and validation. VARRAY constructs are stored inline: the VARRAY structure and data are stored in the same data block as the rest of the row, as a RAW data type. Although they bear some similarity to PL/SQL tables, VARRAYs are of fixed size.

c. ENCAPSULATION
Encapsulation is the definition of a class with data members and functions. In other words, it is the mechanism that binds code and data together while hiding the implementation from the world outside the class. The actual implementation is hidden from the user, who only sees the interface. As an illustrative example, think of a plane engine. You can open it and see that it is there, and the pilot can start the ignition. The engine causes the plane to move. Although you can see the motor, its inner functions are hidden from your view. You can appreciate the function that the motor performs without ever knowing all the details of what occurs inside or even how.
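A minimal C++ sketch of this idea is shown below; the Engine class and its members are purely illustrative. Callers can only use the public interface, while the internal state stays hidden:

#include <iostream>

class Engine {
public:
    // Public interface: the only way to interact with the engine.
    void start() { ignition_on = true; rpm = 800; }
    bool running() const { return ignition_on; }

private:
    // Hidden implementation details; not accessible from outside the class.
    bool ignition_on = false;
    int rpm = 0;
};

int main() {
    Engine e;
    e.start();                 // allowed: part of the interface
    // e.rpm = 9000;           // would not compile: rpm is private
    std::cout << std::boolalpha << e.running() << std::endl;
    return 0;
}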
d. INHERITANCE
Inheritance is the ability of one class to inherit the structure and behavior of its ancestor. Inheritance allows an object to inherit a certain set of attributes from another object while allowing the addition of specific features.

e. POLYMORPHISM
Polymorphism is the ability of different objects in a class hierarchy to have different behaviors in response to the same message. Polymorphism derives its meaning from the Greek for "many forms." A single behavior can generate entirely different responses from objects in the same group. Within a program, binding the same function name to different implementations for different purposes is known as function overloading. If we consider the perspective of moving OODBs closer to the middle ground, we note the following points:
• Object-relational DBs require a generalized object-oriented programming language interface versus a specific, hard-coded one. Normally, object-oriented DBs are geared to a specific programming language.
• Object-oriented DB architectures have historically been known for their slow performance.
• Object-oriented DBs are, by design, limited in terms of scalability.
• Object-oriented DBs are not designed for high concurrency.
III. INDUSTRIAL SYSTEMS—STANDARDS

A. Standards—The Object Model of the ODMG
ODMG was created at the initiative of a set of object database vendors with the goal of defining and promoting a portability standard for OODBs. It had eight voting members (O2 Technology, Versant, Poet, Gemstone, Objectivity, Object Design, UniSQL, IBEX), reviewing members (among them CERN, Hewlett-Packard, Microsoft, Mitre, etc.), and academic members (D. DeWitt, D. Maier, S. Zdonik, M. Carey, E. Moss, and M. Solomon).
It provides the potential advantages of having standards: (1) portability, (2) interoperability, and (3) the ability to compare commercial products easily. It consists of the following modules:
1. Object model. It describes the specific object model supported by the ODMG system, an extension of the OMG (Object Management Group) model.
2. Object Definition Language (ODL). It is used to specify the schema of an object database; specific language bindings (e.g., a C++ binding) are defined for it.
3. Object Query Language (OQL). An SQL-like language that can be used as a stand-alone language for interactive queries or embedded in a programming language.
In the following we provide an overview of the ODMG model. ODMG addresses objects and literals. An object has both an OID and a state, whereas a literal has only a value but no OID. An object is described by four characteristics: identifier, name, lifetime, and structure. On the other hand, three types of literals are recognized: atomic, collection, and structured. The notation of ODMG uses the keyword "interface" in place of the keywords "type" and "class." Below is an example:

interface Object {
  ...
  boolean same_as(...);
  Object copy();
  void delete();
};

interface Collection : Object {
  ...
  boolean is_empty();
  ...
};

Operations are invoked on objects in the usual way, for example:

o.same_as(p);
q = o.copy();
1. Built-In Interfaces for Collection Objects
Any collection object inherits the basic Collection interface. Given a collection object o, the o.cardinality() operation returns the number of elements in the collection, while o.insert_element(e) and o.remove_element(e) insert or remove an element from the collection o. The ODMG object model uses exceptions for reporting errors or particular conditions. For example, the ElementNotFound exception in
the Collection interface would be raised by the o.remove_element(e) operation if e is not an element in the collection o. Collection objects are further specialized into Set, List, Array, and Dictionary.
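The following C++ sketch mimics the behavior just described; it is not the ODMG C++ binding itself, only an illustrative rendering of a collection interface with cardinality, insertion, removal, and an ElementNotFound-style exception:

#include <cstddef>
#include <iostream>
#include <set>
#include <stdexcept>

// Illustrative stand-in for the exception described in the text.
struct ElementNotFound : std::runtime_error {
    ElementNotFound() : std::runtime_error("element not found in collection") {}
};

// A small collection wrapper exposing the operations discussed above.
template <class T>
class Collection {
    std::set<T> elements;
public:
    bool is_empty() const { return elements.empty(); }
    std::size_t cardinality() const { return elements.size(); }   // number of elements
    void insert_element(const T& e) { elements.insert(e); }
    void remove_element(const T& e) {
        if (elements.erase(e) == 0) throw ElementNotFound();       // raised if e is not in the collection
    }
};

int main() {
    Collection<int> o;
    o.insert_element(7);
    std::cout << o.cardinality() << std::endl;   // prints 1
    try {
        o.remove_element(42);                    // not present: raises ElementNotFound
    } catch (const ElementNotFound& ex) {
        std::cout << ex.what() << std::endl;
    }
    return 0;
}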
2. Atomic (User-Defined) Objects
In the ODMG model, any user-defined object that is not a collection object is called an atomic object. Atomic objects are specified using the keyword class in ODL. The keyword Struct corresponds to the tuple constructor. A relationship is a property that specifies that two objects in the database are related to each other, for example, the work_for relationship of Employee and the has_emps relationship of Department. Regarding interfaces, classes, and inheritance: an interface is a specification of the abstract behavior of an object type, which specifies the operation signatures. An interface is noninstantiable—that is, one cannot create objects that correspond to an interface definition. A class is a specification of both the abstract behavior and abstract state of an object type, and is instantiable. Another object-oriented feature supported by ODMG is inheritance, which is implemented by the extends keyword. The database designer can declare an extent for any object type that is defined via a class declaration; the extent contains all persistent objects of that class. Extents are also used to automatically enforce the set/subset relationship between the extents of a supertype and its subtype. A key consists of one or more properties (attributes or relationships) whose values are constrained to be unique for each object in the extent.
3. The ODL
The ODL's main use is to create object specifications—that is, classes and interfaces. The user can specify a database schema in ODL independently of any programming language. Moreover, it is feasible to use the specific language bindings to specify how ODL constructs can be mapped to constructs in specific programming languages, such as C++, Java, and SMALLTALK. There may be several possible mappings from an object schema diagram (or extended entity relationship schema diagram) into ODL classes. The mapping method follows these steps: (1) the entity types are mapped into ODL classes, and (2) inheritance is done using extends. Here we have
to stress that there is no direct way to support multiple inheritance. Below is an example:

class Person
  ( extent persons )
{
  attribute struct Pname ...
  ...
};

class Faculty extends Person
  ( extent faculty )
{
  attribute string rank;
  ...
};

The class Faculty extends the class Person and therefore inherits all the attributes and operations of its ancestor class. In the following example we have two classes (Rectangle, Circle) emanating from the same interface GeometryObject:

interface GeometryObject {
  attribute ...
  ...
};

class Rectangle : GeometryObject {
  attribute ...
  ...
};

class Circle : GeometryObject {
  attribute ...
  ...
};
4. The OQL
The OQL is the query language proposed for the ODMG object model in order to query OODBs. It is designed to work closely with the programming languages for which an ODMG binding is defined, such as C++, SMALLTALK, and Java. It is similar to SQL (the standard query language for relational databases); the main difference is the handling of path expressions over the class framework used. For example:
SELECT d.name
FROM d in departments
WHERE d.college = 'Engineering'

This query returns the set of the names of the departments in the Engineering college. The type of the result is a bag of strings (i.e., duplicates are allowed). In many cases queries involve path expressions, i.e., chains of attribute and relationship names that navigate from an object to related objects. The following expressions illustrate path expressions in OQL:

departments;
csdepartment;
csdepartment.chair;
csdepartment.has_faculty;

An expression such as csdepartment.has_faculty.rank cannot be evaluated directly, because has_faculty denotes a collection; the ranks are retrieved instead with a query over that collection:

SELECT f.rank
FROM f in csdepartment.has_faculty;

Then we can write the following query to retrieve the grade point average of all senior students majoring in computer science, with the result ordered by gpa, and within that by last and first name:

SELECT struct (last_name: s.name.lname, first_name: s.name.fname, gpa: s.gpa)
FROM s in csdepartment.has_majors
WHERE s.class = 'senior'
ORDER BY gpa DESC, last_name ASC, first_name ASC;
Another facility provided is the specification of views as named queries. For instance, we can define the following view:

DEFINE has_minors(deptname) AS
  SELECT s
  FROM s in students
  WHERE s.minors_in.dname = deptname;

Then a query searching for the students who took Computer Science as a minor can be formed as follows:

has_minors('Computer Science');

OQL also provides aggregate functions and quantifiers. The following example query returns the number of students who took Computer Science as a minor:
count (s in has_minors('Computer Science'));

The next query computes the average gpa of senior students majoring in Computer Science:

avg (SELECT s.gpa
     FROM s in students
     WHERE s.major_in.dname = 'Computer Science'
       and s.class = 'senior');

The following query illustrates the use of the "for all" quantifier. Are all computer science graduate students advised by computer science faculty?

for all g in (SELECT s
              FROM s in grad_students
              WHERE s.majors_in.dname = 'Computer Science')
  : g.advisor in csdepartment.has_faculty;

The following query illustrates the use of the "exists" quantifier. It searches for any graduate computer science major having a 4.0 gpa:

exists g in (SELECT s
             FROM s in grad_students
             WHERE s.majors_in.dname = 'Computer Science')
  : g.gpa = 4;
5. Object Database Conceptual Design
An important issue that arises here is the conceptual design of an object-oriented schema. Can we benefit from the traditional design techniques applied in relational databases? There are important differences between the conceptual design of object databases and relational databases. As regards relationships, object databases (ODBs) exploit object identifiers, resulting in OID references, while a relational database (RDB) references tuples by values or by externally specified/generated foreign keys. Regarding inheritance, the ODB approach exploits inheritance constructs such as derived (:) and EXTENDS, and this is a fundamental feature of the approach, whereas RDBs do not offer any built-in support for inheritance. Of course, object-relational systems and extended RDB systems are adding inheritance constructs. An important issue for compatibility and reusability of data is the mapping of an extended entity relationship (EER) schema to an ODB schema. The following steps have to be followed:
Step 1: Create an ODL class for each EER entity type or subclass.
Step 2: Add relationship properties or reference attributes for each binary relationship into the ODL classes that participate in the relationship.
Step 3: Include appropriate operations for each class.
Step 4: An ODL class that corresponds to a subclass in the EER schema inherits the type and methods of its superclass in the ODL schema.
Step 5: Weak entity types can be mapped in the same way as regular types.
Step 6: Declare a class to represent the category and define 1:1 relationships between the category and each of its superclasses.
Step 7: An n-ary relationship with degree n > 2 can be mapped into a separate class, with appropriate references to each participating class.
B. OO Systems: O2, Object Store, etc.
In this section we refer briefly to existing object-oriented and object-relational industrial database systems.
1. Example of ODBMS—O2 System
This system has historical importance, as it was one of the few full-fledged OODBMSs and the most notable European one. Its architecture is based on a kernel, called O2Engine, which is responsible for much of the ODBMS functionality. The implementation of O2Engine at the system level is based on a client/server architecture. At the functional level, three modules (storage component, object manager, schema manager) implement the functionality of the DBMS. Data definition in O2 is carried out using programming languages such as C++ or Java. Data manipulation in O2 is carried out in several ways. O2 supports OQL both as an ad hoc interactive query language and embedded in a programming language. There are two alternative ways of using OQL queries. For example:

Q1: d_Bag engineering_depts;
    departments->query(engineering_depts, "this.college = \"Engineering\"");

Q2: d_Bag engineering_dept_names;
    d_oql_Query q0("select d.dname from d in departments where d.college = \"Engineering\"");
    d_oql_execute(q0, engineering_dept_names);

Both queries (Q1 and Q2) search for the names of the departments of the college of Engineering.
2. Object Relational Systems—Complex Types and Object Orientation
Object-relational DBs are the evolution of pure object-oriented and relational DBs. The convergence of those two disparate approaches came about as a realization that there were inherent shortcomings in the existing paradigms when considered individually. Observers of information technology can still see the debate that rages in the software community over the ultimate character of object-relational DBs or object-relational DBMSs (ORDBMSs). Industry experts are frequently critical of object-relational DBs because they usually demonstrate a limited or nonexistent ability to perform certain relational or object-oriented tasks in comparison with their pure counterparts. A case in point would be the limited support for inheritance in object-relational DBs, a feature fully supported in the object-oriented paradigm. Others argue that inheritance is of such limited consequence when employed in the storage of data and data-centric objects that its pursuit is a waste of effort. Each point of view is almost always driven by the particular background of the individual presenting the criticism. Because the object-relational paradigm is a compromise between two very different architectures, the most effective definition of its ultimate character will be devised by those who have an unbiased appreciation of relational and object-oriented systems alike. An issue to be tackled is whether ORDBMSs should begin with a relational foundation with added object orientation, or the reverse. In practice, relational DBs have been far more successful than their OODBMS counterparts. It should not be any surprise, then, that virtually all major vendors are approaching the object-relational arena by extending the functionality of existing relational DB engines. Cases in point are IBM, Oracle, and Informix, whose efforts to create a "universal server" began by extending their core relational engines. Anything that causes such controversy should be worth the effort, which raises two important questions:
First, what factors have led to the development of object-relational DBs? And second, what characterizes an accurate definition of an object-relational DB? The general answer to the first question is that developers need a more robust means of dealing with complex data elements without sacrificing the access speed for which relational DBs have become known. To answer the second question, there are several characteristics that, as a minimum, must be included to achieve a true object-relational structure. Those characteristics are the following:
• A retrieval mechanism, that is, a query language like SQL but adapted to the extended features of the ORDBMS; the retrieval mechanism must include not only relational (set-based) access but also object-oriented navigational support
• Support for relational features like keys, constraints, indexes, and so on
• Support for referential integrity as it is currently supported by the relational paradigm
• Support for the object metamodel (classes, types, methods, encapsulation, etc.)
• The ability to support user-defined data types
• Support for the SQL3 ANSI standard
A well-known extension requirement concerns spatial information. Oracle, Informix, and IBM DB2 offer capabilities for user-defined data types and functions that are applied to them. All of them propose the nested storage model, where a complex attribute is stored within the containing row (they call this in-line storage). They claim it is more efficient when queries regarding an object are posed, since a self-join is avoided; this is a good argument for choosing this approach. The DB2 Spatial Extender provides a comprehensive set of spatial predicates, including comparison functions (contains, cross, disjoint, equals, intersects, etc.), relationship functions (common point, embedded point, line cross, area intersect, interior intersect, etc.), combination functions (difference, symmetric difference, intersection, overlay, union, etc.), calculation functions (area, boundary, centroid, distance, end point, length, minimum distance, etc.), and transformation functions (buffer, locatealong, locate between, convexhull, etc.). The Informix Dynamic Server with Universal Data Option offers type extensibility. So-called DataBlade modules may be used with the system, thus offering new types and associated functions that may be used in columns of database tables. The Informix Geodetic DataBlade Module offers types for time instants and intervals as well as spatial types for points, line
segments, strings, rings, polygons, boxes, circles, ellipses, and coordinate pairs. Since 1996, the Oracle DBMS has offered a so-called spatial data option, also termed "spatial cartridge," that allows the user to better manage spatial data. Current support encompasses geometric forms such as points and point clusters, lines and line strings, and polygons and complex polygons with holes.
IV. CONCLUSION—RESEARCH ISSUES AND PERSPECTIVES
The presence of rich, voluminous, and complex data sets and related application domains raises requirements for object-oriented features in database support. Significant research has been devoted to integrating such object-oriented features into database technology. Important effort has been committed either to pure OODBMSs or to extensions of the relational approach (ORDBMSs). The object-oriented model offers advantageous features regarding:
• Flexibility to handle the requirements of new database applications
• Capabilities for specifying complex objects and operations
• Design of OODBs so that they can be directly, or seamlessly, integrated with software that is developed using OOPLs
Nevertheless, there are several interesting issues for further research in the context of object-oriented and object-relational database systems. Some of them are:
• View definition and management on the ODB/ORDB schema
• Querying the schema, in terms of class names, attributes, and behavior features
• Optimization at several levels, such as path expression queries, storage, and retrieval issues
• Access mechanisms, indexing, and hashing techniques specialized for object-oriented and object-relational databases
• Dynamic issues such as schema evolution (class removal or moving within the inheritance tree, and management of the related instances)
Last but not least, one should take into account the requirements and the tremendous potential of World Wide Web content viewed as a loosely structured
database of complex objects. There the applicability of the object-oriented approach, both for modeling and for storage/retrieval, is very promising.
V. CASE STUDY: OBJECT RELATIONAL SOLUTIONS FOR MULTIMEDIA DATABASES
Hereafter we present the solutions provided by different ORDBMSs for video storage and retrieval. IBM's DB2 system supports video retrieval via "video extenders." Video extenders allow for the import of video clips and the querying of these clips based on attributes such as the format, name/number, or description of the video, as well as the last modification time. Oracle (v.8) introduced integrated support for a variety of multimedia content (Oracle Integrated Multimedia Support). This set of services includes text, image, audio, video, and spatial information as native data types, together with a suite of data cartridges that provide functionality to store, manage, search, and efficiently retrieve multimedia content from the server. Oracle 8i has extended this support with significant innovations, including the ability to support cross-domain applications that combine searches over a number of kinds of multimedia forms, and native support for data in a variety of standard Internet formats, including JPEG, MPEG, GIF, and the like. Oracle has packaged its complex datatype support features together with management and access facilities into a product called Oracle 8i interMedia. This product enables Oracle 8i to manage complex data in an integrated fashion with other enterprise data, and permits transparent access to such data through standard SQL using appropriate operators. It also includes Internet support for popular web-authoring tools and web servers. It offers online Internet-based geocoding services for locator applications, and powerful text search features. Informix's multimedia asset management technology offers a range of solutions for media or publishing organizations. In fact, Informix's database technology is already running at the core of innovative multimedia solutions in use. Informix Dynamic Server with Universal Data Option enables effective, efficient management of all types of multimedia content, including images, sound, video, electronic documents, web pages, and more. The Universal Data Option enables querying, accessing, searching, and archiving digital assets based on the content itself. Informix's database technology provides cataloging, retrieval, and reuse of rich and complex media types like video, audio, images, time series, text, and more, enabling viewer
access to audio, video, and print news sources; high-performance connectivity between your database and Web servers, providing on-line users with access to up-to-the-minute information; tight integration between your database and web development environments for rapid application development and deployment; and extensibility for adding features like custom news and information profiles for viewers (Table I).

Table I Comparative Presentation of Multimedia Retrieval Capabilities of Commercial Object Relational Systems. The table compares Oracle Visual Information Retrieval, IBM DB2 (QBIC and the DB2 Extenders), and Informix (Excalibur Image and MEDIAstra Video DataBlades, Muscle Fish Audio) with respect to color, texture, shape, spatial relationships, scene detection, object detection, captions and annotations, sound, extension mechanisms, and other retrieval capabilities.
SEE ALSO THE FOLLOWING ARTICLES Cohesion, Coupling, and Abstraction • Database Administration • Data Modeling: Entity-Relationship Data Model • Multimedia • Object-Oriented Programming • Relational Database Systems
BIBLIOGRAPHY
Atkinson, M., Bancilhon, F., DeWitt, D., Dittrich, K., Maier, D., and Zdonik, S. (1992). The Object-Oriented Database System Manifesto. In Building an Object-Oriented Database System: The Story of O2, Bancilhon et al. (eds.). San Francisco: Morgan Kaufmann. Also in Proc. Int. Conf. on Deductive and Object-Oriented Databases, Kyoto, Japan, Dec. 1989.
Bancilhon, F., Briggs, T., Khoshafian, S., and Valduriez, P. (1987).
FAD, a simple and powerful database language. Proceedings of VLDB 1987.
Chou, H.-T., et al. (1985). Design and implementation of the Wisconsin Storage System. Software: Practice and Experience, 15(10), Oct. 1985.
Codd, E. F. (1970). A relational model of data for large shared data banks. Communications of the ACM, 13(6), 377–387.
Copeland, G., and Khoshafian, S. (1986). Identity and versions for complex objects. Proceedings of the International Workshop on Object-Oriented Database Systems, September, Pacific Grove, CA.
Dahl, O. J., and Nygaard, K. (1966). SIMULA—an ALGOL-based simulation language. Communications of the ACM, 9(9), 671–678.
Date, C. J., and Darwen, H. (1996). Foundation for Object/Relational Databases: The Third Manifesto. Reading, MA: Addison-Wesley.
Davis, J. R. (1998). IBM's DB2 Spatial Extender: Managing geo-spatial information within the DBMS. Technical report, IBM Corporation.
Deux, O., et al. (1990). The story of O2. IEEE Transactions on Knowledge and Data Engineering, 2(1), March 1990.
Dittrich, K. (1986). Object-oriented database systems: The notion and the issues. Proceedings of the International Workshop on Object-Oriented Database Systems, September, Pacific Grove, CA.
Dittrich, K., and Dayal, U. (eds.). (1986). Proceedings of the International Workshop on Object-Oriented Database Systems, September, Pacific Grove, CA.
Khoshafian, S. (1990). Insight into object-oriented databases. Information and Software Technology, 32(4).
Goldberg, A., and Robson, D. (1982). Smalltalk-80: The Language and Its Implementation. Reading, MA: Addison-Wesley.
http://www.software.ibm.com/data/dbs/extenders.
Maier, D. (1986). Why object-oriented databases can succeed where others have failed. Proceedings of the International Workshop on Object-Oriented Database Systems, September, Pacific Grove, CA.
Informix DataBlade Technology (1999). Transforming data into smart data.
Khoshafian, S. (1993). Object Oriented Databases. New York: John Wiley.
Kim, W. (1991). Introduction to Object-Oriented Databases. Cambridge, MA: The MIT Press.
Lyngbaek, P., and Kent, W. (1986). A data modeling methodology for the design and implementation of information systems. Proceedings of the International Workshop on Object-Oriented Database Systems, September, Pacific Grove, CA.
Maier, D., and Stein, J. (1986). Indexing in an object-oriented DBMS. Proceedings of the International Workshop on Object-Oriented Database Systems, September, Pacific Grove, CA.
Maier, D. (1990). Making database systems fast enough for CAD applications. In Object-Oriented Concepts, Databases, and Applications (Kim, W., and Lochovsky, F., eds.). Reading, MA: Addison-Wesley.
Manola, F. (1989). An evaluation of object-oriented DBMS developments. GTE Laboratories Incorporated, TR-0066-10-89-165.
Oracle 8 SQL Type Data Definition Language (1997). An Oracle technical white paper.
Piattini, M., and Diaz, O. (eds.). (2000). Advanced Database Technology and Design. Norwood, MA: Artech House.
Stonebraker, M., Rowe, L., Lindsay, B., Gray, J., and Carey, M. (1990). Third-generation data base system manifesto. Memorandum No. UCB/ERL M90/23, April. The Committee for Advanced DBMS Function, University of California, Berkeley, CA.
Tsichritzis, D., and Lochovsky, F. (1982). Data Models. New York: Prentice Hall.
Wilkinson, K., Lyngbaek, P., and Hasan, W. (1990). The Iris architecture and implementation. IEEE Transactions on Knowledge and Data Engineering, 2(1), March 1990.
Data Warehousing and Data Marts
Zhengxin Chen
University of Nebraska, Omaha

I. OVERVIEW
II. OPERATIONAL SYSTEMS AND WAREHOUSE DATA
III. ARCHITECTURE AND DESIGN OF DATA WAREHOUSES
IV. DATA WAREHOUSES AND MATERIALIZED VIEWS
V. DATA MARTS
VI. METADATA
VII. DATA WAREHOUSE AND THE WEB
VIII. DATA WAREHOUSE PERFORMANCE
IX. DATA WAREHOUSES, OLAP, AND DATA MINING
X. CONCLUSION
GLOSSARY
data mart A departmental subset of the warehouse data focusing on selected subjects. A data warehouse could be a union of all the constituent data marts.
data mining Also referred to as knowledge discovery in databases, it is the nontrivial extraction of implicit, previously unknown, interesting, and potentially useful information (usually in the form of knowledge patterns or models) from data.
data warehouse A subject-oriented, integrated, time-varying, nonvolatile collection of data that is used primarily in organizational decision making. It is a collection of materialized views derived from base relations that may not reside at the warehouse.
metadata Data about data, or what the warehouse data look like (rather than the warehouse data themselves). In a warehousing environment, metadata include semantic metadata, technical metadata, and core warehouse metadata.
on-line analytical processing (OLAP) Applications dominated by stylized queries that typically involve group-by and aggregation operators for analysis purposes.
Web The World Wide Web is a hypermedia-based system that provides a simple "point and click" method of browsing information on the Internet using hyperlinks.

I. OVERVIEW
The complexity involved in traditional distributed database systems (DDBS) has stimulated organizations to find alternative ways to achieve decision support. Data warehousing is an emerging approach for effective decision support. According to the popular definition given by the "godfather" of data warehousing, William Inmon (1996), a data warehouse is a "subject-oriented, integrated, time-varying, nonvolatile collection of data that is used primarily in organizational decision making." Although considered by some business people as a low-key answer to the "failed" DDBS, data warehousing does take advantage of various techniques related to distributed and parallel computing. Data warehousing provides an effective approach to deal with complex decision support queries over data from multiple sites. The key to this approach is to create a copy (or derivation) of all the data at one location, and to use the copy rather than going to the individual sources. Note that the original data may be on different software platforms or belong to different organizations. There are several reasons for the massive growth of volumes of data in the data warehouse environment:
1. Data warehouses collect historical data.
2. Data warehouses involve the collection of data to satisfy unknown requirements.
3. Data warehouses include data at the summary level, but may also include data at the very detailed atomic or granular level.
4. Data warehouses contain external data as well (e.g., demographics, psychographics, etc.). Significant amounts of external data are collected to support a variety of data mining activities for prediction or knowledge discovery. For example, data mining tools would use this external data to predict who is likely to be a good customer or how certain companies are likely to perform in the marketplace.
Data warehouses contain consolidated data from many sources (different business units), spanning long time periods and augmented with summary information. Warehouses are much larger than other kinds of databases; typical workloads involve ad hoc, fairly complex queries, and fast response times are important. Data warehousing encompasses frameworks, architectures, algorithms, tools, and techniques for bringing together selected data from multiple databases or other information sources into a single repository suitable for direct querying or analysis. Data warehousing is especially important in industry today because of the need for enterprises to gather all of their information into a single place for in-depth analysis, and the desire to decouple such analysis from their on-line transaction processing (OLTP) systems. Since decision support often is the goal of data warehousing, clearly warehouses may be tuned for decision support, and perhaps vice versa. In its simplest form, data warehousing can be considered as an example of asynchronous replication, in which copies are updated relatively infrequently. However, a more advanced implementation of data warehousing would store summary data or other kinds of information derived from the source data. In other words, a data warehouse stores materialized views (plus some local relations if needed). It is common in a data warehousing environment for source changes to be deferred and applied to the warehouse views in large batches for efficiency. Source changes received during the day are applied to the views in a nightly batch window (the warehouse is not available to the users during this period). Most current commercial warehousing systems focus on storing the data for efficient access, and on providing extensive querying facilities at the warehouse. Maintenance of warehouse data (to a large degree, maintenance of materialized views) is thus an important problem. The need for data warehousing techniques is justified largely by decision support queries, which are ad hoc user queries in various business applications. In
these applications, current and historical data are comprehensively analyzed and explored, identifying useful trends and creating summaries of the data, in order to support high-level decision making in the data warehousing environment. A class of stylized queries typically involves group-by and aggregation operators; applications dominated by such queries are referred to as on-line analytical processing (OLAP). Such queries are extremely important to organizations for analyzing important trends so that better decisions can be made in the future. In addition, most vendors of OLAP engines have focused on Internet-enabling their offerings. The true promise of the Internet is in making OLAP a mainstream technology, that is, moving OLAP from the domain of analysts to consumers. E-commerce has emerged as one of the largest applications of the Internet in decision support. We will revisit issues related to OLAP after we examine the basic features of data warehousing.
II. OPERATIONAL SYSTEMS AND WAREHOUSE DATA
In order to understand the nature of data warehousing, we should first pay attention to the relationship between the parallel universes of operational systems and warehouse data. Operational data stores (ODSs), also called operational systems, feature standard, repetitive transactions that use a small amount of data, as in traditional database systems, while warehouse systems feature ad hoc queries and reports that access a much larger amount of data. High availability and real-time updates are critical to the success of the operational system, but not to the data warehouse. Operational systems are typically developed first by working with users to determine and define business processes. Then application code is written and databases are designed. These systems are defined by how the business operates. The primary goal of an organization's operational systems is efficient performance. The design of both applications and databases is driven by OLTP performance. These systems need the capacity to handle thousands of transactions and return information in acceptable user-response time frames, often measured in fractions of a second. By contrast, data warehouses start with predefined data and vague or unknown usage requirements. Operational systems have different structures, needs, requirements, and objectives from data warehouses. These variations in data, access, and usage prevent organizations from simply using existing operational systems as data warehouse
resources. For example, operational data is short-lived and changes rapidly, while warehouse data has a long life and is static. There is a need for transformation of operational data to warehouse data. The architecture of a data warehouse differs considerably from the design and structure of an operational system. Designing a data warehouse can be more difficult than building an operational system because the requirements of the warehouse are often ambiguous. In designing a data warehouse, an organization must understand current information needs as well as likely future needs. This requires a flexible design that ensures the data warehouse can adapt to a changing environment. Both operational and warehouse systems play an important role in an organization's success and need to coexist as valuable assets. This can be done by carefully designing and architecting the data warehouse so that it takes advantage of the power of operational data while also meeting the unique needs of the organization's knowledge workers.
III. ARCHITECTURE AND DESIGN OF DATA WAREHOUSES
A. Data Warehouse Components
The data warehouse is an integrated environment, containing integrated data, detailed and summarized data, historical data, and metadata. An important advantage of performing data mining in such an environment is that the data miner can concentrate on mining data, rather than cleansing and integrating data. Data warehousing provides an effective approach to deal with complex decision support queries over data from multiple sites. According to a popular definition, a data warehouse is a subject-oriented, integrated, time-varying, nonvolatile collection of data that is used primarily in organizational decision making. The key to the data warehousing approach is to create a copy of all the data at one location, and to use the copy rather than going to the individual sources. Data warehouses contain consolidated data from many sources (different business units), spanning long time periods, and augmented with summary information. Warehouses are much larger than other kinds of databases, typical workloads involve ad hoc, fairly complex queries, and fast response times are important. Since decision support often is the goal of data warehousing, clearly warehouses may be tuned for decision support, and perhaps vice versa. A typical data warehousing architecture consists of the following components (as in Fig. 1):
• A relational database for data storage. As the data warehouse proper, it stores the corporate data. Here data volumes are very high, as multi-terabyte data warehouses are beginning to appear more frequently.
• Data marts. Departmental subsets of the warehouse data focusing on selected subjects. The data mart is where departmental data is stored, and often various external data items are added. The data volumes are usually 15–30% of warehouse sizes, and the envelope is being pushed toward the terabyte limit. These databases are also usually either based on star schemas or are in a normalized form. They mostly deal with the data space, but at times some multidimensional analysis is performed.
• Back end and front end. System components providing back-end functionality such as extracting, transforming, loading, and refreshing data, and front-end tools such as OLAP and data mining tools and utilities.
• Metadata. The system catalogs associated with a warehouse are very large, and are often stored and managed in a separate database called a metadata repository.
• Other components. These depend on the design methods and the specific needs of the organizations.
B. Data Warehouse Design
There are four different views regarding the design of a data warehouse. The top-down view allows the selection of the relevant information necessary for the data warehouse. The data source view exposes the information being captured, stored, and managed by operational systems. The data warehouse view includes fact tables and dimension tables. The business query view is the perspective of the data in the data warehouse from the viewpoint of the end user. The process of data warehouse design can take several approaches. The top-down approach starts with the overall design and planning, and is useful in cases when the business problems are clear and well understood. The bottom-up approach starts with experiments and prototypes, and is useful in the early stages of business modeling and technology development. In addition, a combination of both can be used. In general, the data warehouse design process consists of the following steps:
1. Choose a business process to model, such as sales, shipments, etc.
2. Choose the grain of the business process. The grain is the granularity (namely, the fundamental, atomic) level of the data used in the fact table.
Figure 1 Data warehouse architecture. (The figure shows operational data and other data sources feeding extract, transform, load, and refresh processes that serve the data warehouse and its data marts, which in turn support OLAP, data mining, and other front-end tools, together with an administration and repository management component.)
The data stored there are the primary data on which OLAP operations can be performed. 3. Choose the dimensions that will apply to records in the fact table(s). For example, time is a typical dimension. Dimensions are important because the various OLAP operations are performed along them. 4. Choose the measures that will populate each fact table record. Measures are numeric, additive quantities such as sales amount or profit. (A schematic example of these four steps is given below.) The architecture depicted in Fig. 1 is basically two-tier, namely, the warehouse and its front ends. A variation of the data warehouse architecture consists of three tiers: the bottom tier is a warehouse database server, which is typically a relational database system; the middle tier is an OLAP server; and the top tier is a client containing query and reporting tools.
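To make the four design steps concrete, the following sketch defines a minimal sales star schema using Python's standard sqlite3 module; all table and column names are invented for illustration and do not come from any particular product.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Step 1: business process to model = sales.
# Step 2: grain = one fact row per product, per store, per day.
# Step 3: dimensions = product, store, date.
cur.execute("CREATE TABLE dim_product (prod_key INTEGER PRIMARY KEY, prod_name TEXT, category TEXT)")
cur.execute("CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, store_name TEXT, city TEXT)")
cur.execute("CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER)")

# Step 4: measures = numeric, additive quantities (sales amount, profit).
cur.execute("""CREATE TABLE fact_sales (
                   prod_key  INTEGER REFERENCES dim_product(prod_key),
                   store_key INTEGER REFERENCES dim_store(store_key),
                   date_key  INTEGER REFERENCES dim_date(date_key),
                   sales_amount REAL,
                   profit       REAL)""")
conn.commit()

The grain chosen in step 2 fixes the meaning of one fact row; every measure added in step 4 should be additive at that grain.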
IV. DATA WAREHOUSES AND MATERIALIZED VIEWS
At the beginning of this article we provided a brief discussion of data warehouses from a business perspective. However, this discussion requires some further technical clarification. For example, we said that a data warehouse consists of a copy of data acquired from the source data. What does this copy look like? In fact, we may need to distinguish between a "true" copy (duplicate), a derived copy, an approximate duplicate, or something else. For this reason, we need to examine the concept of the data warehouse in more depth. A data warehouse can be characterized using materialized views and indexing. In the following, we examine these two issues.
A. Materialized Views
According to the fundamentals of database management systems (DBMS), relational views are among the most important assets of the relational model. Recall that we have the following basic concepts in relational databases:
• Relation (base table): a stored table.
• External view (virtual view, or simply view): a virtual table (a derived relation defined in terms of base relations).
• Materialized view: a view is materialized when it is stored in the database, rather than computed from the base relations in response to queries.
The general idea of the approach is to materialize certain expensive computations that are frequently requested, especially those involving aggregate functions such as count, sum, average, and max, and to store such materialized views in a multidimensional database (called a data cube) for decision support, knowledge discovery, and many other applications. Commercial relational database products typically discard views immediately after they are delivered to the user or to a subsequent execution phase. The cost of generating a view is thus incurred for one-time use only, instead of being amortized over multiple and/or shared results. Caching query results or intermediate results to speed up intra- and interquery processing has been studied widely. All these techniques share one basic idea: the reuse of views to save cost. The benefit of using materialized views is significant. Index structures can be built on the materialized view; consequently, database access to the materialized view can be much faster than recomputing the view. A materialized view is just like a cache: a copy of the data that can be accessed quickly. Materialized views are useful in applications such as data warehousing, replication servers, chronicle or data recording systems, data visualization, and mobile systems. Integrity constraint checking and query optimization can also benefit from materialized views, but these will not be emphasized in our current context. We now discuss the issue of what materialized views look like. Traditional relational database design has put the emphasis on normalization. However, data warehouse design cannot simply be reduced to relational database design. In fact, materialized views frequently involve join operations, and they are no longer in the high normal forms discussed in DBMS texts. Although normalization guarantees integrity constraints and avoids anomalies, in the business community it is not uncommon for people to feel that normalized designs are hard to comprehend; denormalized designs tend to be more self-explanatory, even though denormalized tables have longer records. Typical multi-attribute search-and-scan performance is better on denormalized data because fewer tables are involved than in normalized designs. Denormalized data provide an intuitive, productive environment for users who need to be trained or retrained. On the other hand, denormalization is the greatest cultural hurdle for most incremental data mart design teams, because they are used to dealing with OLTP. Redundancy of data is the result of denormalization; for example, two relations may coexist along with their joined result. Another remark concerns the impact of ER modeling on data warehouse design. There are two schools of thought in enterprise data warehouse design. The ER (normalized) school starts from fundamentally normalized tables and then spawns off subset data marts that are denormalized. In contrast, Ralph Kimball and his school endorse a consistent, denormalized star schema environment across the entire enterprise data warehouse.
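The essence of materialization can be illustrated with Python's sqlite3 module. SQLite has no built-in materialized views, so the sketch below (with invented table names and data) simply stores the result of an aggregate query as an ordinary table, indexes it, and rereads it; maintenance is shown in its crudest form, a full rebuild.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE fact_sales (prod_key INTEGER, date_key INTEGER, sales_amount REAL)")
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                [(1, 20240101, 10.0), (1, 20240102, 12.5), (2, 20240101, 7.0)])

# "Materialize" an expensive aggregate (total sales per product) as a stored table.
cur.execute("""CREATE TABLE mv_sales_by_product AS
               SELECT prod_key, SUM(sales_amount) AS total_sales
               FROM fact_sales GROUP BY prod_key""")

# An index on the materialized view makes lookups cheap compared with recomputation.
cur.execute("CREATE INDEX idx_mv_prod ON mv_sales_by_product(prod_key)")

# Queries now read the precomputed summary instead of scanning the fact table.
print(cur.execute("SELECT total_sales FROM mv_sales_by_product WHERE prod_key = 1").fetchall())

# When the source data change, the view must be maintained; the naive approach
# is to drop and rebuild it, which is exactly the cost that incremental
# view-maintenance methods try to avoid.
cur.execute("DROP TABLE mv_sales_by_product")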
B. Indexing Techniques
Due to the close relationship between materialized views and indexing, it is worthwhile to take a look at the issue of indexing. Traditional indexing techniques can be used, but there are also additional issues which are unique to a data warehousing environment. The mostly read environment of OLAP systems makes the CPU overhead of maintaining indices negligible, while the requirement of interactive response times for queries over very large datasets makes the availability of suitable indices very important.
• Bitmap index. The idea is to record the values of sparse columns as sequences of bits, one bit vector for each possible value. For example, the gender of a customer (male or female) can be represented using a bitmap index. This method supports efficient index operations such as union and intersection, and for such columns it is more efficient than hash or tree indexes (a toy illustration is given below).
• Join index. This method is used to speed up specific join queries. A join index maintains the relationships between a foreign key and its matching primary keys. The specialized nature of star schemas makes join indices especially attractive for decision support.
Indexing is important to materialized views for two reasons: indexes on a materialized view reduce the cost of computation needed to execute an operation (analogous to the use of an index on the key of a relation to decrease the time needed to locate a specified tuple), and indexing reduces the cost of maintaining the materialized views. One important problem in data warehousing is the maintenance of materialized views in the face of changes made to the source data. Maintenance of materialized views can be a very time-consuming process, and methods are needed to reduce this time (one approach is the use of supporting views and/or materialized indexes).
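A toy illustration of the bitmap idea, using plain Python integers as bit vectors (one bit per row; the column values are invented):

# Bitmap index sketch: one bit vector per possible value of a sparse column.
# Rows 0..5 of a customer table; invented example data.
genders = ["F", "M", "F", "F", "M", "F"]
regions = ["east", "west", "east", "east", "east", "west"]

def build_bitmap(values):
    """Return {value: integer whose i-th bit is set if row i has that value}."""
    bitmaps = {}
    for row, v in enumerate(values):
        bitmaps[v] = bitmaps.get(v, 0) | (1 << row)
    return bitmaps

gender_idx = build_bitmap(genders)   # {"F": 0b101101, "M": 0b010010}
region_idx = build_bitmap(regions)   # {"east": 0b011101, "west": 0b100010}

# Intersection (AND) answers "female customers in the east" without touching the rows.
hits = gender_idx["F"] & region_idx["east"]
matching_rows = [row for row in range(len(genders)) if hits & (1 << row)]
print(matching_rows)   # [0, 2, 3]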
V. DATA MARTS A. Why Data Marts As an important component of data warehousing architecture, a data mart is a departmental subset on
selected subjects. Therefore, a data mart is an application-focused data warehouse, built rapidly to support a single line-of-business application. Data marts still have all of the other characteristics of data warehouses, namely, subject-oriented data that is nonvolatile, time-variant, and integrated. However, rather than representing a picture of the enterprise data, a data mart contains a subset of that data which is specifically of interest to one department or division of the enterprise. The data that resides in the data warehouse is at a very granular level, while the data in the data mart is at a refined level. The different data marts contain different combinations and selections of the same detailed data found at the data warehouse. In some cases data warehouse detailed data is added differently across the different data marts, while in other cases a specific data mart may structure detailed data differently from other data marts. In each case the data warehouse provides the granular foundation for all of the data found in all of the data marts. Because of the singular data warehouse foundation that all data marts share, all of the data marts have a common heritage and can be reconciled at the most basic level. Several factors lead to the popularity of the data mart. As data warehouses quickly grow large, the motivation for data marts increases. More and more departmental decision-support processing is carried out inside the data warehouse; as a result, resource consumption becomes a real problem. Data becomes harder to customize. As long as the data in the data warehouse are small, users can afford to customize and summarize data every time a decision support analysis is done. But with the increase in magnitude, the user does not have the time or resources to summarize the data every time a decision support analysis is done. The cost of doing processing in the data warehouse increases as the volume of data increases. The software that is available for the access and analysis of large amounts of data is not nearly as elegant as the software that processes smaller amounts of data. As a result of these factors, data marts have become a natural extension of the data warehouse. There are organizational, technological, and economic reasons why the data mart is so beguiling and is such a natural outgrowth of the data warehouse. Data marts are attractive for various reasons:
• Customization. When a department has its own data mart, it can customize the data as the data flows into the data mart from the data warehouse. The department can sort, select, and structure its own departmental data without consideration of any other department.
• Relevance. The amount of historical data that is needed is a function of the department, not the corporation. In most cases, the department can select a much smaller amount of historical data than that which is found in the data warehouse.
• Self-determination. The department can do whatever decision-support processing it wants whenever it wants, with no resource utilization impact on other departments. The department can also select software for its data mart that is tailored to fit its needs.
• Efficiency. The unit cost of processing and storage on the size of machine that is appropriate to the data mart is significantly less than the unit cost of processing and storage for the facility that houses the data warehouse.
B. Types of Data Marts
There are several kinds of data mart strategies; the following are two important types:
1. Dependent data marts. The architectural principles evolved into the concept of dependent data marts, which are smaller subsets of the enterprise warehouse specifically designed to respond to departmental or line-of-business issues. In this strategy, data is loaded from operational systems into the enterprise warehouse and then subdivided into the smaller data marts. These marts rely on the central warehouse for their data and metadata rather than obtaining these from the operational systems. While these data marts can solve some of the performance issues and even some of the political issues, the financial problems and strategic issues are, if anything, exacerbated because the enterprise warehouse must be built before the data marts can be implemented.
2. Independent data marts. Independent data marts have been viewed as a viable alternative to the top-down approach of an enterprise warehouse. An organization can start small and move quickly, often realizing limited results in three to six months. Proponents of this approach argue that after starting with a small data mart, other marts can proliferate in departments or lines of business that have a need. In addition, by satisfying the various divisional needs, an organization can build its way to a full data warehouse by starting at the bottom and working up.
C. Multiple Data Marts
In today's environment, data marts have evolved to meet the unique needs of business lines more cost effectively. These marts can be built more quickly and contain only the data relevant to the specific business unit. For example, data marts may be divided along departmental lines or by product type. These marts challenge the systems group in terms of managing the ongoing changes along with ensuring data consistency across the different marts. The architecture must provide for simplifying and reducing the costs of this multiple data mart management process. A data mart can overlap another data mart. Kimball et al. (1998) advised considering 10 to 30 data marts for a large organization. For an organization, careful studies and plans are needed for developing multiple data marts; a matrix method for identifying all the possible data marts and dimensions has been introduced. Subsets of the data are usually extracted by subject area and stored in data marts specifically designed to provide departmental users with fast access to large amounts of topical data. These data marts often provide several levels of aggregation and employ physical dimensional design techniques for OLAP. If a corporation begins its information architecture with an enterprise data warehouse, it will quickly realize the need for subsets of the data at various levels of summary. A recommended approach is top-down development: marts can be spawned from the enterprise data warehouse by selecting subject areas and summaries. The other approach builds a data mart first to meet specific user group requirements, but according to a data plan and with roll-up as a goal. The need will arise for the marts to participate in a hierarchy in which detailed information from several subject-area data marts is summarized and consolidated into an enterprise data warehouse.
D. Networked Data Marts Increasingly, multiple data mart systems cooperate in a larger network creating a virtual data warehouse. This results in networked data marts. A large enterprise may have many subject-area marts as well as marts in different divisions and geographic locations. Users or workgroups may have local data marts to address local needs. Advanced applications, such as the Web, extend data mart networks across enterprise boundaries. In the network data mart world, users must be able to look at and work with multiple warehouses from a
single client workstation, requiring location transparency across the network. Similarly, data mart administrators must be able to manage and administer a network of data marts from a single location. Implementation and management of data mart networks not only imposes new requirements on the mart relational database management system (RDBMS), but more importantly requires tools to define, extract, move, and update batches of information as self-consistent units on demand. It also requires a whole new generation of data warehouse management software to support subset, catalog, schedule, and publish/subscribe functions in a distributed environment.
VI. METADATA
A. Basics of Metadata
Data warehousing must not only provide data to knowledge workers, but also deliver information about the data that defines content and context, providing real meaning and value. This information about data is called metadata. The coming of data warehouses and data mining has significantly extended the role of metadata in the classical DBMS environment. Metadata describe the data in the database; they include information on access methods, index strategies, and security and integrity constraints, as well as (optionally) policies and procedures. Metadata have become a major issue with some of the recent developments in data management, such as digital libraries. Metadata in distributed and heterogeneous databases guide the schema transformation and integration process in handling heterogeneity, and are used to transform legacy database systems to new systems. Metadata can be used for multimedia data management (the metadata themselves could be multimedia data such as video and audio). Metadata for the Web include information about various data sources, locations, and resources on the Web, as well as usage patterns, policies, and procedures. Metadata (such as metadata in a repository) can be mined to extract useful information in cases where the data themselves are not analyzable, for example, because the data are incomplete or unstructured. Every software product involved in loading, accessing,
or analyzing the data warehouse requires metadata. In each case, metadata provides the unifying link between the data warehouse or data mart and the application processing layer of the software product.
B. Metadata for Data Warehousing
Metadata for warehousing include metadata for integrating the heterogeneous data sources. Metadata can guide the transformation process from layer to layer in building the warehouse, and can be used to administer and maintain the warehouse. Metadata are also used in extracting answers to the various queries posed. Figure 2 illustrates metadata management in a data warehouse. The metadata repository stores and maintains information about the structure and the content of the data warehouse components. In addition, all dependencies between the different layers of the data warehouse environment, including the operational layer, the data warehouse layer, and the business layer, are represented in this repository. Figure 2 also shows the role of three different types of metadata: 1. Semantic (or business) metadata. These kinds of data are intended to provide a business-oriented description of the data warehouse content. A repository addressing semantic metadata should cover the types of metadata of the conceptual enterprise model, the multidimensional data model, etc., and their interdependencies. 2. Technical metadata. These kinds of data cover information about the architecture and schema
with respect to the operational systems, the data warehouse, and the OLAP databases, as well as the dependencies and mappings between the operational sources, the data warehouse, and the OLAP databases on the physical and implementation level. 3. Core warehouse metadata. These kinds of data are subject-oriented and are based on abstractions of the real world. They define the way in which the transformed data are to be interpreted, as well as any additional views that may have been created. A successful data warehouse must be able to deliver both the data and the associated metadata to users. A data warehousing architecture must account for both. Metadata provides a bridge between the parallel universes of operational systems and data warehousing. The operational systems are the sources of metadata as well as operational data. Metadata is extracted from individual operational systems. This set of metadata forms a model of the operational system. This metadata includes, for example, the entities/records/tables and associated attributes of the data source. The metadata from multiple operational data sources is integrated into a single model of the data warehouse. The model provides data warehouse architects with a business model through which they can understand the type of data available in a warehouse, the origin of the data, and the relationships between the data elements. The model can also provide more suitable terms for naming data than are usually present in the operational systems. From this business model, physical database design can be engineered and the actual data warehouse can be created.
Figure 2 Metadata management in a data warehouse. (The figure shows a metadata repository holding semantic, core warehouse, and technical metadata, linking the business layer with its conceptual enterprise, knowledge, and multidimensional data models, the data warehouse layer with the warehouse and data marts, the operational layer with relational DBMSs, legacy systems, and flat files, and the front-end tools for ad hoc queries, OLAP, data mining, etc.)
The metadata contained in the data warehouse and data mart models is available to specialized data warehousing tools for use in analyzing the data. In this way the data and metadata can be kept in synchronization as both flow through the data warehouse distribution channel from source to target to consumer.
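A highly simplified, hypothetical illustration of the three kinds of metadata described above for a single warehouse table; all field names and values are invented and are not taken from any real repository product.

# Hypothetical metadata repository entry for one warehouse table.
metadata_entry = {
    "semantic": {                      # business-oriented description
        "business_name": "Monthly product sales",
        "owner": "Sales department",
    },
    "technical": {                     # schema, sources, and mappings
        "table": "fact_sales",
        "source_systems": ["order_entry (legacy)", "pos_flat_files"],
        "mapping": "order_entry.AMT -> fact_sales.sales_amount",
    },
    "core_warehouse": {                # subject-oriented interpretation rules
        "subject_area": "sales",
        "refresh_cycle": "nightly",
        "derived_views": ["mv_sales_by_product"],
    },
}
print(metadata_entry["technical"]["table"])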
C. Metadata in Data Marts
Metadata in the data mart serves the same purpose as metadata in the data warehouse. Data mart metadata allows the data mart decision-support user to find out where data are in the process of discovery and exploration. Note that the types of metadata form a hierarchy: at the topmost level are the metadata for the data warehouse, underneath are metadata for mappings and transformations, followed by metadata for the various data sources. This observation explains the relationship between metadata and the multitiered data warehouse, which is built to suit the customers' needs and economics, spanning the spectrum from an enterprise-wide data warehouse to various data marts. Since multitiered data warehouses can encompass the best of both enterprise data warehouses and data marts, they are more than just a solution to a decision support problem. Multitiered implies a hierarchy, possibly including a networked data mart layer within the hierarchy. In order to build a hierarchy of data warehouses, a sound data plan must be in place as a strong foundation. It makes no difference whether a corporation starts at the bottom of the hierarchy or the top; it must have a goal in mind and a plan for relating the various levels and networks. The data plan cannot be constructed or managed without an active metadata catalog. Development of data marts can contribute tremendously to the metadata. Along with a robust metadata catalog, a tool that reverse-engineers the various data marts' metadata into a logical unit would be of tremendous value. Reliable algorithms can be used to scan the catalog and group the related items from each data mart, suggesting how they should be combined in a higher level data warehouse.
VII. DATA WAREHOUSE AND THE WEB An appropriate platform for building data warehouses and for broader deployment of OLAP is the Internet/intranet/World Wide Web, for various reasons. • The Web provides complete platform independence and easy maintenance.
• The skills and techniques used to navigate and select information, as well as the browser interface, are the same as for all other web-based applications.
• With increased security, the Internet can be used as an inexpensive wide area network (WAN) for decision-support and OLAP applications.
Web-enabled data warehouses deliver the broadest access to decision support. In order to understand how to create a web-based data warehouse architecture for maximum growth and flexibility, we need to take a look at issues related to the Internet, as well as intranets.
A. The Impact of the Internet The Internet has opened up tremendous business opportunities needing nontraditional categories of information. The true promise of the Internet is in making OLAP a mainstream technology, that is, moving OLAP from the domain of analysts to consumers. E-commerce has emerged as one of the largest applications of the Internet in decision support. The basic concepts of data warehousing and aggregation have naturally made their way onto the Web. In fact, some of the most popular web sites on the Internet are basically databases. For example, search engines such as Alta Vista and Lycos attempt to warehouse the entire Web. Aggregation as a means to navigate and comprehend the vast amounts of data on the Internet has to also be recognized. Directory services such as Yahoo and Excite attempt to aggregate the entire Web into a category hierarchy and give users the ability to navigate this hierarchy. The infrastructure for decision support is also in the process of improvement.
B. Intranets Intranets are essentially secure mini-internets implemented within organizations. An intranet can offer a single point of entry into a corporate world of information and applications. Intranets provide a powerful solution for data warehousing. A key benefit of the intranet technology is the ability to provide up-to-date information quickly and cost-effectively. Some advantages are listed below: • Intranets allow the integration of information, making it easy to access, maintain, and update. Data warehouses can be made available worldwide on public or private networks at much lower cost. Web browsers can provide a universal application delivery platform for any data mart or data warehouse user.
As a result, the enterprise can create, integrate, or deploy new, more robust applications quickly and economically. Use of intranets enables the integration of a diverse computing environment into a cohesive information network.
• Information disseminated on an intranet enables a high degree of coherence for the entire enterprise (whether the data content comes from data marts or a central data warehouse), because the communications, report formats, and interfaces are consistent. An intranet provides a flexible and scalable nonproprietary solution for a data warehouse implementation. The intuitive nature of the browser interface also reduces user support and training requirements.
• The intranet can be easily extended into WANs or extranets that serve remote company locations, business partners, and customers. External users can access internal data, drill through, or print reports through secure proxy servers that reside outside the firewall.
VIII. DATA WAREHOUSE PERFORMANCE
A. Measuring Data Warehouse Performance
The massive amount of data (in terabytes) in data warehouses makes high performance a crucial factor for the success of data warehousing techniques. Successful implementation of a data warehouse on the World Wide Web requires a high-performance, scalable combination of hardware, which can integrate easily with existing systems. Data warehousing involves extracting data from various sources and transforming, integrating, and summarizing it into relational database management systems residing over a span of World Wide Web servers. Typically, as part of the client/server architecture, such data warehouse servers may be connected to application servers which improve the performance of query and analysis tools running on desktop systems. Possibly the most important factor to consider in arriving at a high-performance data warehouse environment is that of end-user expectations. These expectations represent unambiguous objectives that provide direction for performance tuning and capacity planning activities within the data warehouse environment. The basis for measuring query performance in the data warehouse environment is the time from the submission of a query to the moment the results of the query are returned. A data warehouse query has two important measurements:
1. The length of time from the moment of submission of the query to the time when the first row/record is returned to the end user
2. The length of time from the submission of the query until the complete result is returned (a small sketch of these two measurements is given below)
The data warehouse environment attracts volumes of data that have never before been experienced in the information processing milieu. In previous environments, volumes of data were measured in the thousands (kilobytes) and millions (megabytes) of bytes. In the data warehouse environment, volumes of data are measured in gigabytes and terabytes. Thus, there are many orders of magnitude of difference between these measurements. Some aspects of improving performance, such as indexing and denormalization, have already been discussed earlier in this article. There are also other important aspects, such as the use of parallelized hardware architectures. Optimizing data structures in the data warehouse environment is an important but difficult issue, because many different sets of requirements must be satisfied all at once. Therefore great care must be taken in the physical organization of the data in the warehouse.
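Under the simplifying assumption that the warehouse can be reached through a Python DB-API connection (an in-memory SQLite database with invented data stands in for it here), the two measurements can be captured roughly as follows:

import sqlite3, time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (prod_key INTEGER, sales_amount REAL)")
conn.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                 [(i % 100, float(i)) for i in range(100_000)])

start = time.perf_counter()
cur = conn.execute("SELECT prod_key, SUM(sales_amount) FROM fact_sales GROUP BY prod_key")
first_row = cur.fetchone()                 # measurement 1: time to the first row
t_first = time.perf_counter() - start
rest = cur.fetchall()                      # measurement 2: time to the complete result
t_all = time.perf_counter() - start
print(f"first row after {t_first:.4f}s, full result after {t_all:.4f}s")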
B. Performance Issues and Data Warehousing Activities
Performance issues are closely tied to data warehousing activities at various stages:
• Base-level architecture (hardware and software). Issues that need to be considered include whether the hardware platform supports the volume of data, the types of users, the types of workload, and the number of requests that will be run against it, and whether the software platform organizes and manages the data in an efficient and optimal manner, among others.
• Design and implementation of the data warehouse platform based on usage and data. There are various issues related to different aspects, such as: 1. Database design. For example, we need to know whether the different elements of data have been profiled so that the occurrences of data that will exist for each entity are roughly approximated. 2. Usage and use profiles. For example, we need to know whether the database design takes into account the predicted and/or known usage of the data.
• Creation of the programs and configuration of the tools that will make use of the data. For example, we need to know whether the queries or other programs that will access the data warehouse have been profiled, what is known about the programmers, etc.
• Post-warehousing development. After programs are written and the data warehouse is being populated, the ongoing system utilization needs to be monitored and system guidelines (service management contracts) need to be established.
If an organization follows these guidelines and carefully considers performance at each appropriate point in time, the organization will arrive at a point where performance is truly optimal. A final remark concerns data mart performance. Although performance related to a data mart shares many considerations with the data warehouse as a whole, limiting a data mart's scope ensures that the resulting data mart will fit within the scalability limits of an OLAP database server and permits the analysis at hand to be conducted without the distractions presented by extraneous data. Using an OLAP database server, in turn, allows the use of OLAP indexing and presummarization techniques to deliver rapid response times and intuitive access.
IX. DATA WAREHOUSES, OLAP, AND DATA MINING
A. Basics of OLAP
In this article we have repeatedly emphasized the close relationship between data warehousing and OLAP. In addition, in many cases data warehouses also serve as an enabling technology for data mining. Therefore, it is important to examine the relationship among data warehouses, OLAP, and data mining. OLAP, as multidimensional analysis, is a method of viewing aggregate data called measurements (e.g., sales, expenses, etc.) along a set of dimensions such as product, brand, store, month, city, state, etc. An OLAP model typically consists of three conceptual tokens. Each dimension is described by a set of attributes. A related concept is the domain hierarchy; for example, "country," "state," and "city" form a domain hierarchy. Each of the numeric measures depends on a set of dimensions, which provides the context for the measure. The dimensions together are assumed to uniquely determine the measure. Therefore, the multidimensional data model views a measure as a value in the multidimensional space of dimensions.
There are several basic approaches to implementing an OLAP system:
• ROLAP (relational OLAP). OLAP systems that store all information (including fact tables) as relations. Note that the aggregations are stored within the relational system itself.
• MOLAP (multidimensional OLAP). OLAP systems that use arrays to store multidimensional datasets.
In general, ROLAP is more flexible than MOLAP, but has more computational overhead for managing many tables. One advantage of using ROLAP is that sparse data sets may be stored more compactly in tables than in arrays. Since ROLAP is an extension of the mature relational database technique, we can take advantage of the standard query language (SQL). In addition, ROLAP is very scalable. However, one major disadvantage is its slow response time. In contrast, MOLAP abandons the relational structure and uses a sparse matrix file representation to store the aggregations efficiently. This gains efficiency, but lacks flexibility, restricts the number of dimensions (typically 7–10), and is limited to small databases. (A remark on dimensions: a relation can be viewed as a 2-D table or as an n-D table in which each attribute represents a dimension.) One advantage of using MOLAP is that dense arrays are stored more compactly in the array format than in tables. In addition, array lookups are simple arithmetic operations, which results in an instant response. A disadvantage of MOLAP is long load times. Moreover, a MOLAP design becomes massive very quickly with the addition of multiple dimensions. To get the best of both worlds, we can combine MOLAP with ROLAP. Other approaches also exist.
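The difference in storage can be sketched in a few lines of Python; the data are invented and the two structures are deliberately simplistic.

# ROLAP-style: aggregates kept as relational rows (good for sparse data).
rolap_rows = [
    ("shoes", "east", "2024-01", 120.0),
    ("shoes", "west", "2024-02",  80.0),     # missing combinations simply have no row
]

# MOLAP-style: a dense array indexed by dimension positions (fast lookups).
products = ["shoes", "hats"]
regions  = ["east", "west"]
months   = ["2024-01", "2024-02"]
cube = [[[0.0 for _ in months] for _ in regions] for _ in products]
cube[products.index("shoes")][regions.index("east")][months.index("2024-01")] = 120.0
cube[products.index("shoes")][regions.index("west")][months.index("2024-02")] = 80.0

# Array lookup is simple index arithmetic; the relational form needs a scan or an index.
print(cube[0][0][0])                                                          # 120.0
print([r[3] for r in rolap_rows if r[:3] == ("shoes", "east", "2024-01")])    # [120.0]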
B. Relationship between Data Warehousing and OLAP
Having described the basic architecture of data warehouses, we may further describe the relationship between data warehousing and OLAP as follows. Decision-support functions in a data warehouse involve hundreds of complex aggregate queries over large volumes of data. To meet the performance demands so that fast answers can be provided, virtually all OLAP products resort to precomputing these aggregates to some degree. According to a popular characterization from the OLAP Council, a data warehouse is usually based on relational technology, while OLAP uses a multidimensional view of aggregate data to provide quick access to strategic information for further analysis. A data warehouse stores
tactical information that answers "who" and "what" questions about past events. OLAP systems go beyond those questions; they are able to answer "what if" and "why" questions. A typical OLAP calculation is more complex than simply summarizing data. Most data warehouses use star schemas to represent the multidimensional data model. In a star schema there is a single fact table (which is at the center of the star schema and contains most of the data stored in the data warehouse) and a set of dimension tables which can be used in combination with the fact table (a single table for each dimension). An example of a star schema is shown in Fig. 3. The star schema model of data warehouses makes join indexing attractive for cross-table search, because the connections between a fact table and its corresponding dimension tables are the foreign keys of the fact table and the primary keys of the dimension tables. Join indexing maintains relationships between attribute values of a dimension and the corresponding rows in the fact table. Join indices may span multiple dimensions to form composite join indices. Data mining algorithms can take advantage of this kind of facility. In fact, data mining algorithms are usually presented as operating on a single table; join indexing makes such an assumption reasonable. Join operations in a star schema may be performed only between the fact table and any of its dimensions. Data mining has frequently been carried out on a view formed by joining the fact table with one or more dimension tables, followed by possible project and select operations. In addition, to facilitate data mining, such a view is usually materialized. A very useful concept for OLAP is the data cube, which represents a materialized view of multidimensional data. The two most well-known operations for OLAP queries are
1. Roll-up. This operation takes the current data object and does a further group-by on one of the dimensions. For example, given total sales by day, we can ask for total sales by month.
2. Drill-down. As the converse of roll-up, this operation tries to get a more detailed presentation. For example, given total sales by model, we can ask for total sales by model by year.
Other operations include pivot (its result is called a cross-tabulation), slice (an equality selection, reducing the dimensionality of the data), and dice (a range selection).
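Roll-up and drill-down correspond to changing the grouping level of an aggregate query. The following sketch, using Python's sqlite3 module with a small invented sales table, illustrates both:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, month TEXT, model TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("2024-01-05", "2024-01", "A", 10.0),
    ("2024-01-20", "2024-01", "A", 15.0),
    ("2024-02-03", "2024-02", "B", 20.0),
])

# Detailed level: total sales by day.
by_day = conn.execute("SELECT day, SUM(amount) FROM sales GROUP BY day").fetchall()

# Roll-up: total sales by month (coarser grouping).
by_month = conn.execute("SELECT month, SUM(amount) FROM sales GROUP BY month").fetchall()

# Drill-down: total sales by model and month (finer grouping than by model alone).
by_model_month = conn.execute(
    "SELECT model, month, SUM(amount) FROM sales GROUP BY model, month").fetchall()

print(by_day, by_month, by_model_month, sep="\n")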
C. Integration of OLAP and Data Mining in Data Warehouses
It has been noted that there are significant semantic differences between data mining and OLAP. Although both OLAP and data mining deal with the analysis of data, the focus and methodology are quite different. In this section, we provide a much-needed discussion of this issue and use several examples to illustrate these differences.
Figure 3 A star schema. (The example shows a central sales fact table keyed by OrderNo, CustomerNo, ProdNo, StoreID, DateKey, and CityName and carrying Sales and Profit measures, surrounded by Order, Customer, Product, Store, Date, and City dimension tables.)
We examine the differences between data mining carried out at different levels, including how different types of queries can be handled, how different semantics of knowledge can be discovered at different levels, and how different heuristics may be used. Different kinds of analysis can be carried out at different levels:
• What are the features of products purchased along with promotional items? The answer to this query could be association rule(s) at the granularity level, because we need to analyze actual purchase data for each transaction that involves promotional items (we assume information about promotional items can be found in the product price).
• What kinds of products are most profitable? This query involves aggregation and can be answered by OLAP alone.
• What kinds of customers bought the most profitable products? This query can be answered in different ways. One way is to analyze individual transactions and obtain association rules between products and customers at the granularity level. An alternative way is to select all of the most profitable products, project out the whole set of customers who purchased these products, and then find the characteristics of these customers. In this case we are trying to answer the query by discovering characteristic rules at an aggregation level. (For example, customers can be characterized by their addresses.)
The above discussion further suggests that data mining at different levels may have different semantics. Since most people are familiar with the semantics of knowledge discovered at the granularity level, here we provide a discussion emphasizing what kind of difference is made by the semantics of knowledge discovered at aggregation levels (which will be referred to as aggregation semantics). Nevertheless, OLAP and data mining should be, and can be, integrated for decision support.
X. CONCLUSION In this article we discussed the basics of data warehousing techniques, as well as related issues such as OLAP and data mining. Due to the rich contents and rapid development of this area, we can only present the most fundamental ideas. Readers who are interested in more details of data warehousing are referred
to Inmon (1996), Kimball et al. (1998), and Singh (1999). Readers interested in the relationship between data warehousing and data mining, particularly detailed techniques of data mining, can find useful material in Han and Kamber (2000). Finally, users who are interested in a more global picture without much technical detail will not be disappointed by Chen (1999) and Chen (2001). There are also plenty of web sites available on data warehousing and data marts. For example, http://www.dwinfocenter.org/articles.html contains numerous articles related to data warehousing and data mart applications. For research-oriented readers, http://www-db.stanford.edu/ contains very useful materials from the well-known Stanford data warehousing project, including discussions of various future research directions related to data warehouses.
SEE ALSO THE FOLLOWING ARTICLES Data Mining • Data Modeling: Entity-Relationship Model • Data Modeling: Object-Oriented Model • Distributed Databases • Model Building Process • Network Database Systems • On-Line Analytical Processing (OLAP)
BIBLIOGRAPHY
Chaudhuri, S., and Dayal, U. (1997). An overview of data warehousing and OLAP technology. SIGMOD Record, 26(1), 65–74.
Chen, Z. (1999). Computational intelligence for decision support. Boca Raton, FL: CRC Press.
Chen, Z. (2001). Data mining and uncertain reasoning: An integrated approach. New York: John Wiley.
Han, J., and Kamber, M. (2000). Data mining: Concepts and techniques. San Francisco: Morgan Kaufmann.
Harinarayan, V. (1997). Issues in interactive aggregation. Data Engineering Bulletin, 20(1), 12–18.
Inmon, W. H. (1996). Building the data warehouse. New York: John Wiley.
Inmon, W. H. (1999). Data warehouse performance. New York: John Wiley.
Kimball, R., Reeves, L., Ross, M., and Thornthwaite, W. (1998). The data warehouse lifecycle toolkit. New York: John Wiley.
Rundensteiner, E. A., Koeller, A., and Zhang, X. (2000). Maintaining data warehouses over changing information sources. Communications of the ACM, 43(6), 57–62.
Silberschatz, A., Korth, H., and Sudarshan, S. (1997). Database system concepts, 3rd ed. New York: McGraw-Hill.
Singh, H. (1999). Interactive data warehousing. Upper Saddle River, NJ: Prentice Hall.
Decision-Making Approaches Theodor J. Stewart University of Cape Town
I. THE DECISION-MAKING PROBLEMATIQUE II. APPROACHES TO PROBLEM STRUCTURING III. COGNITIVE BIASES IN DECISION MAKING
IV. MULTIPLE CRITERIA DECISION AID AND SUPPORT V. RISK AND UNCERTAINTY IN DECISION MAKING
GLOSSARY
cognitive bias Any tendency by human decision makers to adopt simplifying heuristics, which result in the selection of certain classes or types of action, for reasons other than genuinely informed preference.
criterion Any particular point of view or dimension of preference according to which decision alternatives or courses of action can be compared in more-or-less unambiguous terms.
decision support system A computer system which assists decision makers in exploring the consequences of decisions in a structured manner and in developing an understanding of the extent to which each decision alternative or option contributes toward goals.
goal programming A mathematical programming technique for identifying decision alternatives or courses of action which approach decision-making goals most closely in an aggregate sense.
multiple criteria decision analysis (MCDA) A branch of management science in which complex decision problems are first decomposed into underlying unitary criteria; once clarity has been obtained regarding preference orderings for each individual criterion, these are gradually reaggregated to develop a holistic preference ordering.
outranking An approach to MCDA in which decision alternatives are compared pairwise in order to establish the strength of preference for and against the assertion that one alternative is at least as good as the other.
value function A numerical function constructed so as to associate a preference score with each decision alternative (either holistically or for one or more individual criteria) in order to provide a tentative rank ordering of such alternatives in a defensible manner.
value measurement An axiom-based theoretical foundation for constructing value functions.
DECISION MAKING is a fundamental activity of all management, and research and literature concerning decision-making processes and paradigms occur in many fields, including management theory, psychology, information systems, management science, and operations research. As the present volume is devoted to information systems, the primary focus of this article will be the manner in which the decision-making process can be supported and guided by appropriate decision support systems. We shall, on one hand, broaden the view of “decision support” to include not only the provision of interactive computer systems, but also other modes of facilitation by which decision-making effectiveness may be enhanced. On the other hand, we shall restrict the concept of computerized decision support systems to those systems which assist the decision maker directly in exploring and evaluating alternative courses of action. This is in contrast to other common uses of the term which refer to any system which processes or analyzes data needed for decision making (including “data mining”), but without necessarily providing direct support to the search for a preferred course of action.
Our approach will be in essence constructive (sometimes termed prescriptive) in the sense that our main thrust will be to present models that are neither fully descriptive, i.e., indicating how unaided decisions are actually made, nor normative in claiming that this is how decisions should be made. By the constructive (prescriptive) approach, we mean a process by which decision makers can be assisted to develop preferences between alternative courses of action in an internally consistent and coherent manner, so that they can have confidence that the emerging decision does in fact satisfy their long-run goals. In discussing this approach, we shall look at the structuring of decision problems, empirical research on the psychology of decision making, and the provision of quantitative decision support aids.
I. THE DECISION-MAKING PROBLEMATIQUE
At an elementary level, decision making may be viewed simply as a choice between a number of available courses of action. This is naive. In virtually all real-world settings, potential courses of action are seldom, if ever, self-evidently available, simply awaiting a choice between them. At strategic planning levels, decision problems will initially be totally unstructured. There may be no more than a general feeling of unease that things are not going as they should, that we are not achieving what we ought, and that "something needs to be done." For example, a company might have previously had an effective monopoly in a particular market, but over the past year costs have been escalating, while one or two other players are entering the market so that the simple expedient of raising prices could cause rapid erosion of market share. There is, however, incomplete knowledge of the causes of the cost escalations and of how strong the emerging competition may prove to be, but nevertheless "something has to be done" before the situation gets out of hand. At this stage, however, there may be a substantial lack of awareness, not only about what courses of action are open but even about what the organizational or personal goals actually are. What are we trying to achieve by the interventions we might be planning to make? Even when the decision problem may seem superficially to be more clear-cut, the problem is in most cases still only partially structured. The apparent decision problem may present itself as a choice between some clear alternatives: Do we launch this new product or not? Do we purchase this software system or the other? Which of the options offered by our travel
agent should we take up for our vacation this year? We say that these are only partially or semi-structured, as there remain many issues that are not yet resolved. Even for the options on the table, the consequences have probably not been completely explored. Decision makers may still need to identify their decision goals and objectives before they even know what consequences are relevant to explore. The initial apparent options may only be symptomatic of deeper underlying problems, and when these become better understood there may be a need to seek other alternative options. In fact, as the deeper issues are given consideration, it may become evident that the problem is still to a large extent unstructured. What the previous two paragraphs have sought to indicate is that nontrivial management decision problems are essentially never fully structured at the outset in the sense that the alternative courses of action, the criteria by which each needs to be assessed, and the consequences of each alternative in terms of these criteria are fully and completely known. Thus, the first step in any decision support process is to provide an adequate degree of structuring, so that informed decisions can be made in accordance with a coherent set of objectives. We shall return to this point in Section II. Even once a requisite structure has been developed for a decision problem, it is not always true that the decision itself is a simple choice between alternatives. Roy (1996), for example, provides a classification into a number of decision problematiques, which with some modification and extension we might summarize as follows: 1. Simple choice of one from among a number of explicit alternatives. 2. Sorting of alternatives into a number of categories: For example, a company might wish to classify potential subcontractors into "good," "acceptable," and "poor" categories for purposes of awarding future contracts. 3. Ranking of alternatives: For example, potential development projects might be ranked from highest to lowest priority, with the intention being that projects would be worked on in priority order until time or other resources are exhausted. 4. Designing alternatives or creating portfolios in the sense of creating a complete strategy from component elements in a coherent manner: For example, policies for future land use in a region may be constructed by a combination of public sector investments, legislative programs and controls, tax incentives, etc.
The decision support approaches described particularly in Section IV can, in principle, be applied to any of the situations described above, but the precise implementation may need to be adapted to the relevant problematique. Ultimately, decisions may be approached in a number of different ways. At the risk of oversimplification, we can identify at least three distinct approaches which may be adopted in decision-making processes as follows: 1. The political or advocacy approach: Champions for different courses of action prepare and present the arguments favoring their preferred option. This may occur in a formalized open forum or by a process of lobbying. Options backed by stronger and better arguments tend to survive at the expense of others, until either by consensus or formal voting procedures a winner ultimately emerges. Political approaches tend to dominate in multistakeholder contexts, when different groups may have substantially divergent agendas. 2. The organizational approach: This is similar to the political advocacy approach, but less directly confrontational. Different sectors of the organization may be tasked with establishing "pros" and "cons" of alternative courses of action from different perspectives. The organizational approach may be found to dominate in group decision-making contexts when group members do, to a large extent, share common goals. 3. The analytical (or "rational") approach: Here the attempt is made to identify the full set of alternatives from which the choice has to be made and the degree to which each satisfies explicitly stated management goals. In principle, then, it remains only to select the alternative for which this degree of satisfaction is maximized. A purely rational approach probably only applies in decision contexts involving a single decision maker or at most a small, relatively homogeneous group. However, as we will be discussing, a degree of analysis can provide valuable support to the political and/or organizational approaches. It is probable that none of the above approaches is used in isolation, but that aspects of each are used in any real problem. The modern decision support philosophy aims at providing some form of analytical basis from which the political and organizational components of the decision process can be supported and made more effective and internally coherent. The decision support system thus facilitates communication between stakeholders in the process and the emergence
of a decision that is broadly satisfactory and concordant with the organization's goals. It is this view of decision support which underlies the methods of "decision analysis" discussed in the remainder of this article.
II. APPROACHES TO PROBLEM STRUCTURING
As indicated at the start of the previous section, decision problems seldom, if ever, present themselves in a neatly structured form as a simple choice between a number of alternatives with the aim of achieving well-defined objectives. On the other hand, the application of the decision support procedures described particularly in Section IV does require that the problem be given some degree of structure at least. Most nontrivial decision-making contexts involve multiple stakeholders, so that the structuring phase will in general be a group effort, as will be assumed for purposes of the discussion which follows. The management literature is filled with many suggestions for structuring decision problems, ranging from relatively informal SWOT analysis (strengths, weaknesses, opportunities, and threats) to sophisticated computer-assisted brainstorming techniques. The management science literature includes such concepts as soft-systems methodology and cognitive mapping, many of which are reviewed in the volume edited by Rosenhead and Mingers (2001). The underlying basis of all of these methodologies is a two-phase approach, which may be described as: 1. A divergent phase, in which the aim is simply to identify all issues and concerns relevant to the decision at hand. 2. A convergent phase, in which the connections and interactions between these issues and concerns are systematically explored in order to structure, classify, and cluster them. It has been our experience that the divergent phase is often best carried out in a relatively low-technology manner in order to stimulate a maximum degree of human interaction between participants, although some groupware systems have been developed to mimic the informal processes described here. A simple approach is to bring group members together in some form of workshop, at which each participant is provided with a set of cards or self-adhesive notelets on which to jot down key points in response to a well-defined question (such as, "What issues do we need to take into consideration when responding to the
entry of new competitors into the market?"). These can then be placed up on a wall, initially in random positions, to serve as a focus for discussion, during which the cards or notelets can be moved around into groups or clusters representing similar issues. In this way, a shared vision of the decision problem emerges. It is useful to guide the clustering by making use of simple checklists, often represented by some form of acronym. For example, the soft-systems methodology developed by Checkland (1981) makes use of "CATWOE" (Customers, Actors, Transformations, Worldview, Owners of the problem and Environment, i.e., external constraints and demands). The author and colleagues have found it useful to build up a structure around a "CAUSE" framework defined as follows:
• Criteria: The various points of view or perspectives against which different courses of action need to be evaluated and compared, for example, costs, public image, worker morale, environmental impacts. Management goals will ultimately need to be expressed in terms of levels of performance for each criterion.
• Alternatives: The courses of action which are available to decision makers.
• Uncertainties: These may include both potentially resolvable uncertainties due to lack of current knowledge (e.g., whether or not a competitor will be launching a new product this year) and fundamentally unknowable risks (e.g., a major earthquake in California).
• Stakeholders: Who has a concern in the outcome of the decision? Who can influence the consequences of a decision (including the possibility of sabotaging it)? Who has the political or organizational power to veto certain actions?
• Environment: What external constraints and pressures limit the decision makers' freedom of action?
Once the key components building up a structured view of the decision problem have been identified as above, it is useful to guide the group into identifying causative and associative links between these elements, with the ultimate aim of building up a shared vision of the manner in which the choice of different courses of action (i.e., the decisions) impacts on the criteria (i.e., on goal achievement). The soft-systems methodologies of Checkland (1981) and the cognitive mapping concepts described by Eden and Ackermann (1998) provide useful tools in this regard, the latter being well-supported by the Decision Explorer software. At the end of the day, for purposes of applying formal
decision analytical approaches, it is necessary to summarize the problem structure which emerges from the structuring process in terms of a set (often just a finite list) of alternative courses of action and a description of the impacts of each alternative on levels of performance of each criterion, where any degree of uncertainty in evaluating these impacts is clearly identified. The structured decision problem thus becomes that of selecting the alternative which best satisfies the goals implied by each criterion.
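For illustration only, the kind of summary just described (a finite list of alternatives together with the impact of each alternative on each criterion, with uncertainties flagged) might be recorded in a simple data structure such as the following Python sketch. The class, field names, and sample values are hypothetical choices of our own and are not part of any of the methodologies cited above.

```python
from dataclasses import dataclass, field

@dataclass
class StructuredProblem:
    """Minimal record of a structured decision problem (illustrative only)."""
    alternatives: list          # finite list of candidate courses of action
    criteria: list              # points of view against which they are evaluated
    impacts: dict = field(default_factory=dict)       # (alternative, criterion) -> performance level
    uncertainty: dict = field(default_factory=dict)   # (alternative, criterion) -> note on unresolved uncertainty

problem = StructuredProblem(
    alternatives=["expand locally", "enter new market"],
    criteria=["cost", "public image", "worker morale"],
)
problem.impacts[("enter new market", "cost")] = "high"
problem.uncertainty[("enter new market", "cost")] = "depends on whether a competitor launches a new product this year"
```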
III. COGNITIVE BIASES IN DECISION MAKING In the next section we shall be discussing technical procedures of decision support. These generally require that the user, or “decision maker,” provide a variety of value judgements as part of the process of identifying a desired course of action. It is necessary, therefore, for the decision support analyst to have some understanding of the manner in which decision makers may respond to the questions asked and the potential for these responses to bias the results coming from the decision support system. Nontrivial decision-making situations must involve decision makers in two key issues, namely, (1) management goals and objectives and (2) the uncertainties and risks associated with achievement of them. For decision-making processes to be effective, both decision makers and analysts supporting them need to be aware of biases and heuristics inherent in human judgements regarding these issues. In a purely “rational” approach, all objectives would be explicit at the start of the decision-making process, and it would remain only to assess the extent to which alternative courses of action satisfy these objectives. Life is, however, seldom like that. Simon (1976) argued that people exhibit bounded rationality and tend to “satisfice” rather than to “optimize.” This means that decision makers tend to focus on a relatively small number of objectives, namely, those which are perceived to be the most critical or currently least well satisfied. In developing and/or evaluating alternative courses of action, effort is directed initially at seeking improvement on these objectives. Once an adequate level of satisfaction has been achieved for these objectives, then attention turns to other objectives which now appear more pressing. Satisficing is thus a common heuristic approach to decision making. It can also be a powerful and useful heuristic, and for this reason it is sometimes incorporated into formal decision support systems, particularly the goal programming approaches to be described in Section
IV. Decision makers and decision analysts need to be aware, however, that the quality of results obtained will be dependent upon goals for each objective being demanding but still realistic. If such goals are insufficiently demanding, the process might terminate too early, but if the goals for the initial objectives are made unrealistically demanding, then potential gains on other objectives may never be realized. Certain biases derive from the group context in which important decisions are often made (see, for example, Janis and Mann, 1977). On the one hand, defensive avoidance of conflict by some group members can lead to procrastination (postponing a decision in the hope that the conflict will vanish); attempts to shift responsibility for the decision to other people or organizational structures; and overemphasis on the favorable consequences of alternative courses of action, while downplaying unfavorable consequences. Good decision making requires that these conflict-avoidance biases be recognized and confronted. A related problem in the group decision-making context is that of “group-think,” in which the group moves to acceptance of a consensus (and overconfidence in the correctness of this consensus decision) without fully exploring consequences in terms of all goals or criteria. Such group-think may arise, inter alia, through members’ fear of ridicule or accusations of time-wasting should they speak against a perceived majority view. The multiple criteria decision analysis approach described in Section IV seeks to counter these group biases by establishing all relevant “criteria” quickly and before conflict reaches destructive levels. Some of the more detailed studies of biases in decision making have been undertaken within the context of risk and uncertainty, the pioneering work being that of the decision psychologists Kahneman and Tversky. Examination of these biases does suggest that they may be applicable to a broader range of human judgmental tasks than those of probability assessment and inductive inference and may thus be relevant to the designers of decision support systems. It is thus useful to briefly record some of these biases here. It is beyond the scope of the present article to review this topic in detail (see, for example, Kahneman, Tversky and Slovic, 1982), but in the following few subsections we present an outline of three commonly observed biases.
A. Availability When asked to assess the probabilities or risks associated with specified events, people tend to associate greater credibility with those events for which it is
easy to recall examples of similar outcomes in the past or for which examples are easy to imagine. The problem is that some events generate considerably more publicity than others and thus are easier to recall. For example, a major airline crash will receive wide news coverage for days or weeks after the event, whereas fatalities from motor accidents receive, at most, minor coverage in local newspapers. As a result, many people may tend to overestimate the risk from airline accidents and to underestimate the risk of road accidents. Similarly, greater probabilities will be associated with events for which it is easy to construct imaginary scenarios, which may or may not be related to the inherent propensity for such events to happen. A particular problem caused by the availability bias, and related also to the next bias which we discuss (“representativeness”), is that people find it easier to recall instances which confirm prior prejudice (e.g., that a particular class of driver is more reckless than others) than those which contradict it. This, in turn, leads to enhanced estimates of the associated probabilities and reinforcing of the prejudice. Good decision-making practice should thus allow time for free-thinking and brainstorming to enable the decision maker to explore and to imagine a wider range of outcomes before committing to a final decision. Good decision support system design needs to encourage and facilitate such processes.
B. Representativeness Suppose we observe a sequence of eight tosses of a coin. Most people will judge a sequence of heads and tails given by HHHHHTTT as inherently more surprising and less likely than a sequence such as HHTTHTHH. Yet statistically both outcomes are equally probable (1 in 256). The reason for the fallacious judgement appears to be that the latter is more characteristic or “representative” of what we expect from a random sequence. In other words, people rate as more likely those outcomes which are viewed as more representative of their expectations. For example, when given a personality description of an individual unknown to them, people will tend to consider it most likely that the individual belongs to the occupational group (such as engineer, lawyer, psychologist) for which the personality type appears typical without taking into consideration important factors such as the proportion of each occupational group in the relevant population. The representativeness bias may lead decision makers to ignore some critical information, such as base
rate frequencies of outcomes as in the classification into the occupational groups above. It can also lead to substantially false conclusions regarding the existence or otherwise of patterns in data. For example, the HHHHHTTT outcome may result in a conclusion that “luck” has turned in some predictable manner and that more tails can now be expected. In the same way, managers may easily overreact to recent apparent trends, leading to poor decision-making practice. As with the availability bias, the best antidote to the representativeness bias would be to take time to critically examine the data and to ensure that all relevant data are properly taken into consideration, generally through the use of formal statistical analysis.
C. Anchoring and Adjustment In forecasting risks (i.e., probabilities) or future events such as prices or demands, people will frequently start from some initial nominal value and then adjust this up or down as other factors are taken into consideration. Such adjustments tend, however, to be insufficient to account for the new information, so the estimates remain too tightly anchored to the initial values. The initial values may be entirely randomly generated (for example, during preliminary unstructured discussions) or may be linked to current situations or to simple statistical trends. The result is that the ranges of future variation may be seriously underestimated and that decision makers may be overconfident in their projections or forecasts so that risks are inadequately taken into consideration during the decision process. Once again, the bias needs active compensation by building formal means of identifying alternative futures at an early stage of deliberation before estimates are too solidly “anchored.” One means of achieving this is by incorporation of scenario planning concepts (see van der Heijden, 1996) into the total decision-making process. We shall return to this concept in the discussion of risk and uncertainty in Section V.
IV. MULTIPLE CRITERIA DECISION AID AND SUPPORT A. General Principles Perhaps the most critical demand on decision makers is the need to achieve balance between conflicting goals or objectives. If consequences of decisions were entirely one dimensional (e.g., maximization of
profit) or if it were possible to simultaneously optimize all objectives, then no true “decision making” would be involved; the process could be left to a computer. Human decision making, involving the making of value judgements or tradeoffs, comes into play when decision makers recognize many different goals or criteria for comparing different courses of action that are to a greater or lesser extent in conflict with each other. Much of the management theory and management science literature does recognize the existence of multiple goals, but very often this recognition is implicit rather than explicit. A recent notable exception has been the “balanced scorecard” concept introduced by Kaplan and Norton (1996), in which the authors clearly identify the need for balance between financial, customer-oriented, internal operation, and learning and growth goals in any organization, each of which can be subdivided further. They give attention particularly to the management structures necessary to ensure such balance. The most explicit recognition of multiple goals may be found in the range of management science techniques which have been classified as multiple criteria decision-making (MCDM) methods, or multiple criteria decision analysis or aid (MCDA). The characterizing feature of these approaches is the establishment of formal and to some extent quantified procedures for the following three phases of the problem: 1. Identification of relevant criteria, i.e., points of view or axes of preference according to which possible courses of action can be distinguished. 2. Ranking, or possibly more extensive evaluation, of alternative courses of action according to each identified criterion. 3. Aggregation across criteria to establish an overall preference ranking for the alternatives. We now briefly expand on these three phases before turning to some more detailed description of the tools of MCDA, which as we shall see can be grouped into three broad schools, namely, value measurement or scoring methods, goal and reference point methods, and outranking methods.
1. Identification of Criteria A criterion is defined in this context as any concern, interest, or point of view according to which alternative courses of action can (more-or-less) unambiguously be rank ordered. In selecting criteria for use in decision analysis, the following properties of the set being chosen should be borne in mind:
• Complete: Ensure that all substantial interests are incorporated. • Operational: Ensure that the criteria are meaningful and understandable to all role players. • Decomposable: Ensure as far as is possible that the criteria are defined in such a way that meaningful rank orders of alternatives according to one criterion can be identified without having to think about how well the alternatives perform according to other criteria. (The so-called condition of preferential independence.) • Nonredundant: Avoid double counting of issues. • Minimum size: Try to use as few criteria as possible consistent with completeness, i.e., avoid introduction of many side issues which have little likelihood of substantially affecting the final decision. Typically, the identification of appropriate criteria to be used requires a variety of brainstorming techniques, but a review of these is beyond the scope of this article. These issues are discussed further in some of the literature listed in the Bibliography. In many cases it is useful to structure the criteria into a hierarchical value tree, starting with a broad overall goal at the top, systematically broken down into increasingly precise subgoals, until at the lowest level we have the required set of criteria as described above. Such a value tree is illustrated in Fig. 1, which is based on experiences in applying MCDA to land use and water resources planning in the eastern escarpment regions of South Africa. The criteria are the right-hand boxes,
namely, household income, number of jobs, and so on down to flood levels. The advantage of such a hierarchical structure is that the application of MCDA can be decomposed, for example, by first evaluating alternatives within a subset of criteria (for example, the three contributing to social benefits) and then aggregating these to give a preference ordering according to “social” issues (thus forming a supercriterion). At a later stage, a further aggregation can combine social, economic, and environmental concerns. Sometimes a criterion may directly be associated with some measurable quantitative attribute of the system under consideration, for example, cost measures or many of the benefits listed in the value tree of Fig. 1. Such an association may facilitate the next phase of the analysis, but is not critical to the use of the tools to be described below. These can generally be applied even for entirely qualitative criteria such as the personal well-being of employees, provided that alternatives can at least be compared with each other on this basis (i.e., which of the two options contributes more to the “personal well-being”).
2. Within-Criterion Comparison of Alternatives At this stage, alternatives are compared and evaluated relative to each other in terms of each identified criterion. The alternatives may be real courses of action or may be hypothetical constructs (performance categories as described below) built up to provide a set of benchmarks against which the real alternatives can
Figure 1 Illustration of a value tree for a regional land use planning problem.
be evaluated. In both cases, however, the fundamental requirement is to be able to rank the alternatives from best to worst in terms of the criterion under consideration. If this cannot be done, then the definition of the criterion needs to be revisited. An important feature of this process is that it is carried out separately for each criterion and does not need reduction to artificial measures such as monetary equivalents. All that is required is for the decision maker, expert, or interest group to be able to compare alternatives with each other in terms of their contribution to the goals represented by the criterion under consideration. In some cases, criteria will be qualitative in nature (for example, a criterion such as personal well-being), so the rank ordering will have to be subjective or judgmental in nature. For smaller numbers of alternatives (say up to about seven or nine), this creates no problem as it will generally be possible to compare alternatives directly to generate the required rank orderings or evaluations in an unambiguous manner. For larger numbers of alternatives, however, direct comparisons become more difficult, and it is convenient to define a small number of performance categories, i.e., descriptions of different levels of performance that may be achieved, expressed as mini-scenarios (the hypothetical alternatives or outcomes mentioned earlier). Each actual alternative is then classified into that category which best matches its performance in terms of this criterion or possibly classified as falling between two adjacent categories. Since the categories are preference ordered, this implies a partial ordering of the alternatives, which is usually adequate for the application of many MCDA procedures (especially when linked to extensive sensitivity studies).
3. Aggregation across Criteria This is perhaps the most crucial phase, in which the generally conflicting preference orderings corresponding to the different criteria need to be reconciled or aggregated to produce a final overall preference ordering. The process can never be exact, as it must inevitably involve imprecise and subjective judgements regarding the relative importance of each criterion. Nevertheless, with due care and sensitivity analysis, a coherent picture can be generated as to which are the most robust, equitable, and defensible decisions. An important point to recognize is that the method of aggregation is critically dependent upon the methods of evaluation of alternatives used in the previous phase. Aggregation inevitably involves some assessment of the importance of each criterion relative to
the other criteria. This is typically expressed in terms of some form of quantitative “weight.” The meaning, interpretation, and assessment of importance weights constitute an often controversial aspect of MCDA practice, since the appropriate numerical weights to be used in any MCDA procedure must depend both on the specific procedure and on the context of the alternatives under consideration. It is, however, also true that many people will express judgements of relative importance (e.g., that environmental issues are “much more important,” or even something like “three times as important,” in comparison with economic issues) without concern for either the particular context or the methods of analysis being used. It is fallacious to incorporate such intuitive statements of relative importance uncritically into MCDA (although this often seems to be done). In the application of MCDA, care must be taken to match the elicitation of importance weights to the methods used and to the context. The different schools of MCDA differ in the manner in which they approach the second and third phases mentioned earlier. The three schools are reviewed in subsections IV.B to IV.D. In order to describe the methods, it is useful at this stage to introduce some notation. For purposes of discussion, we shall suppose that a choice has to be made between a discrete number of alternatives denoted by a, b, c, .... Many of the methods described below are easily generalized to more complicated settings, for example, multiple objective linear programming, but this would unnecessarily complicate the present discussion. We suppose that m criteria have been identified, which we shall index by i = 1, 2, ..., m. If criterion i can be associated with a quantifiable attribute of the system, we shall denote the value of this attribute for alternative a by zi(a). Note that even if the attribute is naturally expressed in categorical terms (very good, good, etc.), this is still “quantifiable” in our sense as we can associate some numerical value with each category to represent the ordering.
B. Value Measurement or Numerical Scoring Approaches In this approach, we seek to construct some form of value measure or score, V(a), for each alternative a. In principle, the value measures do not need to possess any particular numerical properties apart from preservation of preference order, i.e., such that V(a) > V(b) if and only if a is preferred to b. Within the usual framework of MCDA, we start by extracting partial values or scores for the alternatives
as evaluated in terms of each criterion. These we denote by vi(a) for i = 1, 2, ..., m. Where criteria are associated with quantifiable attributes zi(a), it is evident that these partial values need to be functions of the attributes, i.e., vi(a) = vi(zi(a)). We shall return to this case shortly, but let us first consider the general case without necessarily making the assumption of the existence of such attributes. Clearly, V(a) must be some function of the partial values v1(a), v2(a), ..., vm(a). We shall suppose that the selection of a family of criteria satisfies the property of preferential independence discussed under selection of criteria and that the partial values are constructed so as to satisfy an interval scale property [i.e., such that equal increments in any specific vi(a) have the same impact or value in terms of tradeoffs with other criteria, no matter where they occur in the available range of values]. It can be shown that under these assumptions, it is sufficient to construct V(a) as an additive function of the vi(a), i.e.,

V(a) = Σ_{i=1}^{m} wi vi(a)    (1)
where wi is an importance weight associated with criterion i. In applying value measurement theory, the key practical points are those of assessing the partial values and the weights. Partial values can be assessed by direct comparison of alternatives or indirectly through an associated quantitative attribute zi. Let us first examine the direct comparison approach. A useful way to assess partial values in this case is by means of the so-called “thermometer scale” illustrated in Fig. 2. For example, in a problem such as that on which the value tree of Fig. 1 is based, we might need to compare six policy alternatives, for example, involving three different patterns of land use (farming, forestry, and conservation) with and without the construction of a proposed large dam. For convenience, we might label the alternatives as “scenarios” A–F. Now consider a criterion such as water supply to undeveloped rural communities in the area. Since the desirability of each scenario from the point of view of this criterion may involve consideration of a number of poorly quantified issues such as convenience of access to sufficient clean water, it may not be possible to define a simple measure of performance. By the process of direct comparison on the thermometer scale, however, we can still get a meaningful evaluation for use in the value function model. We start simply by identifying the best and worst of the six alternatives according to this criterion of rural
Figure 2 Illustration of a thermometer scale.
water supply. (This judgment is left to those considered best able to make such an assessment.) Suppose that these are identified as scenarios C and E, respectively. Then C is placed at the top of the scale (denoted for convenience in Fig. 2 by an arbitrary score of 100) and E is placed at the bottom of the scale (denoted again for convenience at the 0 point of the scale). A third alternative, say scenario A, is then selected for evaluation by those performing the assessment. It is placed on the scale between C and E in such a way that the magnitudes of the relative spacings, or “gaps,” between C and A and between A and E represent the extent to which A is better than E but worse than C. For example, the position shown for scenario A in Fig. 2 is at about the 75% position, suggesting that the gap from E to A (the extent to which A is better than E) is about three times the gap from A to C. Put another way, we could say that moving from E to A achieves 3⁄4 of the gain realized by moving all the way from E to C. There is generally no need to be overly precise in these judgments, as long as the sizes of the gaps appear qualitatively correct. Thereafter, each of the remaining alternatives is examined one at a time and placed firstly in the correct rank position among the previously examined alternatives. For example, B may then be placed below
A. Once the ranking is established, the precise position of the alternative is assessed, again taking into consideration the gaps between it and the two alternatives just above and below it in the rank ordering. In this process, the user may wish to readjust the positions of the previously examined alternatives. Figure 2 illustrates a final thermometer scale for all six policy scenarios (alternatives) evaluated according to this criterion of “rural water supply.” The full rank ordering of the scenarios is C-F-A-B-D-E. The gap between C and F is perceived to be relatively small, and even A is not far behind, so that C, F, and A are all judged to be relatively good in terms of this criterion. There is then a big gap between A and B, so that the remaining three alternatives are perceived to be much less satisfactory than C, F, and A, although there is little to choose between B and D, which are still somewhat better than E. It seems that people from widely differing backgrounds can relate relatively easily to diagrams such as Fig. 2 and do participate freely in adjusting the gaps to correspond to their own perceptions of the values of the alternatives. Thus, the thermometer scale diagram is not only a useful tool for assessing partial values, but also for communication between groups. Indirect evaluation requires the association of quantitative attributes zi(a) with all criteria. The evaluation consists of two stages. We first evaluate a value function, say vi(zi), which associates scores with all possible values of the associated attribute zi between a specified minimum and maximum. In theory this should be a smooth continuous function, but in practice it is usually sufficient to use a piecewise linear function with no more than four segments. Such a function can be constructed using the thermometer scale idea described previously, but applied to (say) five evenly spaced numerical values for the attribute rather than to policy alternatives directly. For example, one of the other criteria shown in Fig. 1 was “dry season flow” in the river. This was assessed by hydrologists in terms of the percentage reduction in streamflows below current conditions. Over the alternatives under consideration, values for this attribute ranged between 0 and 20% below current levels. The value function was thus approximated by comparing the impacts of five possible levels (0, 5, 10, 15, and 20%) relative to each other on a thermometer scale. The resulting value function could then be represented as in Fig. 3. Once the function has been assessed, the partial value score for any particular alternative is obtained simply by reading off the function value (on a graph such as that illustrated in Fig. 3) corresponding to its attribute value zi(a).
Figure 3 Illustration of a value function.
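As an illustration of the indirect evaluation just described, a piecewise linear value function can be implemented as a simple interpolation. The sketch below assumes hypothetical thermometer-scale scores of 100, 90, 70, 40, and 0 for reductions of 0, 5, 10, 15, and 20% in dry season flow; in practice the scores actually assessed in the workshop would be used.

```python
import numpy as np

# Hypothetical thermometer-scale scores for five levels of the attribute
# "percentage reduction in dry season flow below current conditions".
# The increasingly negative slopes mirror the shape discussed for Fig. 3.
flow_reduction_levels = [0, 5, 10, 15, 20]      # attribute values z_i
partial_value_scores  = [100, 90, 70, 40, 0]    # assessed v_i(z_i)

def partial_value(z_percent_reduction: float) -> float:
    """Read the piecewise linear value function at a given attribute value."""
    return float(np.interp(z_percent_reduction, flow_reduction_levels,
                           partial_value_scores))

print(partial_value(8))   # an alternative causing an 8% reduction scores 78.0
```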
It is worth noting the nonlinearity in the shape of the function in Fig. 3. This is quite typical. One of the big dangers in using scoring methods such as those described here is that users and analysts often tend to construct straight-line functions as the easy way out (often even viewing this as the “objective” or “rational” approach). Research has shown clearly that the results obtained from scoring methods can be quite critically dependent upon the shape of the function, so it is incumbent upon users of these tools to apply their minds to the relative value gaps between different levels of performance. Quite frequently, it is found that the functions exhibit systematically increasing or decreasing slopes (as in Fig. 3 where the slopes become increasingly negative) or have an “S” (sigmoidal) shape (or reverse S shape). Once the partial values have been assessed as above, the weights can also be evaluated. The algebraic implication of Eq. (1) is that the weights determine the desirable tradeoffs between the partial value scores for the different criteria, and for this reason it is important to delay assessment of weights until the people involved in the assessment have established a clear understanding of the ranges of outcomes relevant to each criterion. Various procedures have been suggested for the weight assessment, but one of the simplest and easiest to apply is that of “swing weighting.” The users are presented with a hypothetical scenario in which all criteria have the same score on the partial value function scales. Often the 0 point is suggested in the literature, but in our experience people find it easier to start from a less unrealistically extreme position, for example, one in which all partial values are 50. The question is then posed: “If you
could choose one and only one criterion to swing up to the maximum partial value score of 100, which one would it be?” This establishes the criterion having the largest weight wi in Eq. (1). The question is then repeated, excluding the previously chosen criterion, to establish the second largest weight, and so on. Once we have the rank ordering of the weights in this way, we can compare each criterion with the one known to have the maximum weight, and we can then pose the second question: “What is the value of the swing on this criterion, relative to that for the criterion with maximum weight, expressed as a percentage?” In some software, the presentation of this question is facilitated by use of bar graphs, with the heights of the bars representing the relative importance. This gives relative values for the weights, which are usually then standardized in some convenient manner, e.g., so that the weights sum to 1. The above methodology for fitting a preference model of the form given by Eq. (1) is based on what is sometimes termed “SMART” (simple multiattribute rating technique). An alternative and apparently widely used approach to fitting the same type of model is the technique termed the analytic hierarchy process (AHP). This approach is based on first assessing the scores by pairwise comparison of the alternatives on each criterion. In other words, each alternative is compared with every other alternative in terms of the relative importance of its contribution to the criterion under consideration. The comparisons are expressed in ratio terms, interpreted as estimates of vi(a)/vi(b) for the pair of alternatives a and b, and these ratios are used to derive the individual scores. This process is repeated for each criterion. The weights wi are assessed in the same way by pairwise comparisons of criteria, structured hierarchically (i.e., criteria at one level in the hierarchy are compared only with others sharing the same parent criterion at the next hierarchical level). The AHP approach has appeared to be very popular and is widely described in most management science texts. Reasons for the popularity seem to include the natural language (semantic) scales on which the comparisons may be made and the availability of user-friendly supporting software. Nevertheless, the process is for most users somewhat of a black box, which in the view of the author may hinder rather than facilitate good decision-making practice. A number of theoretical objections to the validity of the process have also been raised, and references to discussions of these are provided in Belton and Stewart (2002). It should, however, also be noted that a number of modifications to the basic AHP approach have been proposed,
aimed at circumventing the more critical of these theoretical objections.
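To make the SMART-style calculations concrete, the following sketch combines hypothetical partial value scores (such as might be read off thermometer scales) with swing weights standardized to sum to 1, and aggregates them with Eq. (1). All of the numbers are illustrative assumptions rather than values from the study described above.

```python
# Hypothetical partial value scores v_i(a) on 0-100 scales for three criteria,
# and raw swing judgments ("value of the swing relative to the largest, in %").
partial_values = {
    "Scenario A": {"rural water supply": 75, "cost": 40, "jobs": 60},
    "Scenario C": {"rural water supply": 100, "cost": 55, "jobs": 30},
    "Scenario E": {"rural water supply": 0, "cost": 90, "jobs": 80},
}
raw_swings = {"rural water supply": 100, "cost": 60, "jobs": 40}  # assumed judgments

total = sum(raw_swings.values())
weights = {c: s / total for c, s in raw_swings.items()}  # standardized to sum to 1

def overall_value(alternative: str) -> float:
    """Additive value model of Eq. (1): V(a) = sum over i of w_i * v_i(a)."""
    return sum(weights[c] * v for c, v in partial_values[alternative].items())

for a in partial_values:
    print(a, round(overall_value(a), 1))
```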
C. Goal and Reference Point Approaches Goal Programming is a separate article in this encyclopedia. It is, however, useful to summarize some key concepts within the broader multicriteria decision-making framework discussed here. Goal and reference point approaches are used primarily when the criteria are associated with quantifiable attributes zi(a) and are thus possibly most appropriate to technical phases of analysis (i.e., in order to shortlist alternatives for more detailed evaluation according to qualitative, intangible, and subjective criteria). The principle is quite simple. Instead of evaluating tradeoffs and weights (as in Section IV.B), the user simply specifies some desirable goals or aspirations, one for each criterion. These aspirations define in a sense a prima facie assessment by the user of what would constitute a realistically desirable outcome. Let gi be a goal or aspiration level specified for criterion i. The interpretation of gi will depend on the manner in which the corresponding attribute is defined: • Maximizing sense: If the attribute is defined such that larger values of zi(a) are preferred to smaller values, all other things being equal (typically some form of “benefit” measure), then the implied aim is to achieve zi(a) ≥ gi. Once this value is achieved, further gains in zi(a) are of relatively much lesser importance. • Minimizing sense: If the attribute is defined such that smaller values of zi(a) are preferred to larger values, all other things being equal (typically some form of “cost” measure), then the implied aim is to achieve zi(a) ≤ gi. Once this value is achieved, further reductions in zi(a) are of relatively much lesser importance. Sometimes planners like to target some form of intermediate desirable value, possibly something like a water temperature which should not be too hot or too cold. In this case, values of zi(a) in the vicinity of the target value gi are desirable, with greater deviations on either side to be avoided. Since the reasons for avoiding deviations in each direction will generally be different, it is usually convenient to define two separate criteria (“not too hot” and “not too cold”), each using the same attribute, but with different aspiration levels. For example, if the desired temperature range is 15–18°C, then the goal for the not too cold criterion
will be temperature ≥ 15°C, while that for the not too hot criterion will be temperature ≤ 18°C. Thus, for the purposes of further explanation, we shall assume that all attributes will be defined in one of the two senses defined by the above-bulleted items. The general thrust of the so-called goal programming or reference point approaches to MCDA is based firstly on defining deviational variables δi(a) corresponding to the performance of each alternative in terms of each criterion, measuring the extent to which the goal is not met by alternative a, that is,

δi(a) = max{0, gi – zi(a)}

for attributes defined in a maximizing sense and

δi(a) = max{0, zi(a) – gi}

for attributes defined in a minimizing sense. Algebraically (for purposes of inclusion in mathematical programming code), the deviational variables may be defined implicitly via constraints of the form:

zi(a) + δi(a) ≥ gi

for attributes defined in a maximizing sense and

zi(a) – δi(a) ≤ gi

for attributes defined in a minimizing sense, linked to some process which minimizes all deviations as far as is possible. The key question at this stage relates to what is meant by minimizing all deviations. Very often, a simple and effective approach is simply to choose the alternative for which the sum of (possibly weighted) deviations is minimized. This is the basis of conventional goal programming. Without going into any detailed review at this stage, it is this author’s view that a more robust approach is to use the so-called Tchebycheff norm popularized in the approaches termed reference point techniques. In essence, we then identify the alternative a which minimizes a function of the form

max_{i=1,…,m} [wi δi(a)] + ρ Σ_{i=1}^{m} wi δi(a)    (2)

where ρ is a suitably small positive number (typically something like 0.01) and wi are weights reflecting the relative importance of deviations on each goal. It is important to emphasize that these weights are related to tradeoffs between attributes in the vicinity of the aspiration levels and are dependent upon the specific scale of measurement used. The best way to assess these weights is to evaluate the allowable tradeoffs directly. The above process can be applied in either the discrete choice or the mathematical programming contexts. For discrete choice, the calculations for each alternative are easily set up in a spreadsheet. For example, suppose that we are evaluating six policy alternatives in a regional water planning context and that four critical criteria have been identified, associated with the four quantitative attributes: investment cost ($m), water quality (ppm of contaminant), minimum flow levels in the river (m3/sec), and recreational access (thousands of person days per annum). Suppose that the values of these criteria for the six alternatives are as follows:

Alternative    Costs ($m)    Quality (ppm)    Minimum flow (m3/sec)    Recreational access (person days)
Scenario A         93            455                 1.8                          160
Scenario B        127            395                 1.9                          190
Scenario C         88            448                 1.5                          185
Scenario D        155            200                 2.5                          210
Scenario E        182            158                 3.1                          255
Scenario F        104            305                 1.7                          220

Note that the first two attributes require minimization and the latter two attributes require maximization. Suppose that goals are specified as follows: $120m for cost, 280 ppm for quality, 2.5 m3/sec for minimum flow, and 225 person days for recreational access. The unweighted deviations (δi(a)) can be computed as follows:

Alternative    Costs    Quality    Minimum flow    Recreational access
Scenario A        0        175          0.7                 65
Scenario B        7        115          0.6                 35
Scenario C        0        168          1                   40
Scenario D       35          0          0                   15
Scenario E       62          0          0                    0
Scenario F        0         25          0.8                  5

Suppose that the following tradeoffs have been assessed: a reduction of 0.1 m3/sec in the minimum flow would be equivalent in importance to changes of $4m in costs, 10 ppm in contaminants, and 10,000 person days for recreational access. Arbitrarily setting w3 = 1 (numerical weight for the minimum flow criterion), these tradeoffs translate into the following weights for the other criteria: w1 = 0.025 (costs), w2 = 0.01 (quality), and w4 = 0.01. Using these weights and ρ = 0.01, we obtain the following values of the function given by Eq. (2) for each of the alternatives:

Scenario A    1.781
Scenario B    1.173
Scenario C    1.711
Scenario D    0.885
Scenario E    1.566
Scenario F    0.811
Scenario F is then indicated as the best compromise, followed closely by scenario D. The remainder are shown to be considerably worse in the sense of having large deviations for one or more criteria. For a small number of alternatives, as in the above example, the goal programming or reference point approach does not generate too much insight. The methods come much more into their own, however, when there are a large number of alternatives that have to be screened and especially when the problem has a mathematical programming structure. In the linear programming case, the trick is to minimize a new variable D, subject to the constraints D ≥ wi δi(a), to the constraints described above for implicitly defining the deviational variables and to the natural constraints of the problem. The proper setting up of the problem for solution would generally require the assistance of a specialist skilled in (multiobjective) linear programming, and we shall not attempt to provide all the details here.
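For the discrete-choice case, the spreadsheet calculation described above is easily scripted. The sketch below recomputes the deviations and the Eq. (2) values for the six scenarios, using the goals, the tradeoff-derived weights, and ρ = 0.01 from the example; the function and variable names are our own.

```python
# Attribute values z_i(a): cost ($m), quality (ppm), minimum flow (m3/sec),
# recreational access ('000 person days). "max" means larger is better.
alternatives = {
    "Scenario A": [93, 455, 1.8, 160],
    "Scenario B": [127, 395, 1.9, 190],
    "Scenario C": [88, 448, 1.5, 185],
    "Scenario D": [155, 200, 2.5, 210],
    "Scenario E": [182, 158, 3.1, 255],
    "Scenario F": [104, 305, 1.7, 220],
}
senses  = ["min", "min", "max", "max"]   # first two minimized, last two maximized
goals   = [120, 280, 2.5, 225]           # aspiration levels g_i
weights = [0.025, 0.01, 1.0, 0.01]       # w_i from the assessed tradeoffs
rho     = 0.01                           # small positive multiplier in Eq. (2)

def deviations(z):
    """delta_i(a): extent to which each goal is not met by the alternative."""
    return [max(0.0, z_i - g) if s == "min" else max(0.0, g - z_i)
            for z_i, g, s in zip(z, goals, senses)]

def reference_point_value(z):
    """Augmented Tchebycheff function of Eq. (2)."""
    wd = [w * d for w, d in zip(weights, deviations(z))]
    return max(wd) + rho * sum(wd)

for name, z in alternatives.items():
    print(name, round(reference_point_value(z), 3))
# Scenario F scores lowest (about 0.811), matching the table above.
```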
D. Outranking Approaches In essence, the outranking approach attempts to characterize the evidence for and against assertions such as “alternative a is at least as good as alternative b,” rather than to establish any form of optimal selection per se. Initially, alternatives are compared in terms of each criterion separately, much as in value function approaches. The tendency is to make use of attribute measures [which we have previously termed zi(a)] to facilitate this comparison, although these attributes may be expressed on some form of nominal scale. The attribute values tend to be used in a relatively “fuzzy” sense, however, so that (for example) alternative a will only be inferred as definitely preferred to b if the difference zi(a) – zi(b) exceeds some threshold level. In determining whether alternative a can be said to be “at least as good as” alternative b, taking all criteria into account, two issues are taken into consideration:
1. Which criteria are concordant with the assertion? A measure of concordance is typically defined as the sum of weights associated with those criteria for which a is distinctly better than b, when the weights are standardized to sum to 1. It must be emphasized that the weights have a very different meaning to the tradeoff interpretation described for the other two schools of MCDA. For outranking, the weights may best be seen as a “voting power” allocated to each criterion, representing in an intuitive sense the power to influence outcomes that should be vested in each criterion. 2. Which criteria are strongly discordant with the assertion, to the extent that they could “veto” any consensus? A measure of discordance for attributes defined in a maximizing sense is typically defined by the magnitude of zi(b) – zi(a) (since by assumption zi(a) < zi(b) for discordant attributes, when attributes are defined in a maximizing sense) relative to some predefined norm. The overall measure of discordance is then the maximum of the individual measures for each discordant criterion. In order to illustrate the concordance and discordance principles, consider the hypothetical comparison of two locations for a large new reservoir in an environmentally sensitive area which also contains a number of villages. Suppose that the options are to be compared in terms of four criteria: cost (in $m), number of people displaced, area of sensitive ecosystems destroyed (in thousands of acres), and impact on aquatic life (measured on a 0–10 nominal scale, where 0 implies no impact, which is ecologically most desirable). Suppose assessments for the two locations have been made as follows:

                                  Cost ($m)    Number displaced    Area lost (’000 acres)    Ecological impact
Location A                            18              200                   30                      7
Location B                            25              450                    5                      4
Criterion weight                    0.35             0.25                 0.25                   0.15
Norm for assessing discordance        10              350                   30                      9

Location A is better than location B on cost and number displaced, and thus the concordance index for A versus B is 0.35 + 0.25 = 0.6. Correspondingly, the concordance for B versus A is 0.4.
The discordant criteria for A compared to B are area lost and ecological impact, with relative magnitudes 25/30 = 0.83 and 3/9 = 0.33, respectively, so that the overall measure of discordance is 0.83. Similarly, the measure of discordance for B compared to A is the maximum of 0.7 and 0.71, i.e., 0.71. The methods based on outranking principles compare all pairs of available alternatives in the above manner. Any one alternative a is said to outrank b if the concordance is sufficiently high and the discordance is sufficiently low. In some implementations, the outranking is viewed as “crisp,” i.e., an alternative either does or does not outrank another, with the decision being based on whether the concordance exceeds a predefined minimum level and the discordance does not exceed a predefined maximum level. In other implementations a fuzzy degree of outranking is constructed from the concordance and discordance measures. In either sense, the result is a measure of the extent to which the evidence favors one alternative over another. This could lead to elimination of some alternatives and/or the construction of a short list of alternatives for deeper evaluation. The techniques by which outranking methods establish partial or tentative rank orders of the alternatives are technically very complicated and beyond the scope of this article. Some details may be found in the books of Roy (1996) and of Belton and Stewart (2002). Outranking methods are relevant to situations in which (1) there are a discrete number of alternatives under consideration and (2) preference information such as detailed value trade-offs is not easily available (typically because the analysis is being carried out by expert groups on behalf of political decision makers who have been unwilling or unable to provide the sort of information required by the other two schools of MCDA).
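The concordance and discordance calculations for the reservoir example can be reproduced in a few lines; the sketch below covers only the pairwise index computation and is not a complete outranking method such as those of Roy (1996).

```python
# Attributes: cost ($m), number displaced, area lost ('000 acres), ecological impact.
# All four are defined here in a minimizing sense (smaller is better).
locations = {"A": [18, 200, 30, 7], "B": [25, 450, 5, 4]}
weights   = [0.35, 0.25, 0.25, 0.15]   # "voting power" of each criterion (sums to 1)
norms     = [10, 350, 30, 9]           # predefined norms for assessing discordance

def concordance(a, b):
    """Sum of weights of criteria supporting 'a at least as good as b'."""
    return sum(w for za, zb, w in zip(locations[a], locations[b], weights) if za <= zb)

def discordance(a, b):
    """Largest normalized amount by which a is worse than b on any criterion."""
    return max((za - zb) / n if za > zb else 0.0
               for za, zb, n in zip(locations[a], locations[b], norms))

print(round(concordance("A", "B"), 2), round(discordance("A", "B"), 2))  # 0.6, 0.83
print(round(concordance("B", "A"), 2), round(discordance("B", "A"), 2))  # 0.4, 0.71
```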
V. RISK AND UNCERTAINTY IN DECISION MAKING All nontrivial decision making has to contend in some way with issues of risk and uncertainty. The future is always unknown, and consequences of decisions can never be predicted with certainty. Thus, even with the most careful and detailed analysis, perhaps as described in the previous section, the unexpected has to be expected! We thus conclude this article with a brief review of approaches that can be used in dealing with risk and uncertainty in decision making.
A. Informal Sensitivity and Robustness Analysis One approach is simply to subject the results of decision analysis, such as that described in Section IV, to intensive sensitivity analysis. Each of the input assumptions will be critically evaluated in order to establish a plausible range of values or outcomes (recognizing, however, the dangers inherent in the anchoring and adjustment biases described in Section III.C, which may lead to underestimation of the range). The analysis may then be repeated for different assumptions across these ranges. The aim ultimately is to identify the course of action which is most robust in the sense of performing well over all plausible ranges of inputs, rather than that which optimizes performance for a single set of assumptions or inputs. Such sensitivity and robustness analysis is always to be recommended as part of the decision-making process. It must be realized, however, that this is not the panacea for all problems of risk and uncertainty. One of the difficulties is that the sensitivity analysis tends to have to proceed in a fairly ad hoc fashion by changing one or two input parameters at a time. In complex systems, results may be insensitive to changes in single inputs, but substantially more sensitive to certain combinations of inputs, and it is in general very difficult to identify these critical combinations. Some of the techniques from the soft-systems methodology of Checkland (1981) can be of value in this regard.
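A very simple, scripted form of the weight sensitivity analysis described above is sketched below for the additive model of Eq. (1): each importance weight is perturbed in turn over a plausible range, the weights are renormalized, and the preferred alternative is recomputed. The scores and base weights are hypothetical, and the sketch deliberately shares the one-at-a-time limitation noted in the text.

```python
import itertools

# Hypothetical partial value scores and base weights for three criteria.
scores = {
    "Scenario A": [75, 40, 60],
    "Scenario D": [55, 70, 50],
    "Scenario F": [65, 60, 58],
}
base_weights = [0.5, 0.3, 0.2]

def best(weights):
    """Alternative with the highest additive value for the given weights."""
    return max(scores, key=lambda a: sum(w * v for w, v in zip(weights, scores[a])))

# One-at-a-time sensitivity: vary each weight by +/-50% (then renormalize) and
# check whether the recommended alternative is robust to the change.
for i, factor in itertools.product(range(len(base_weights)), (0.5, 1.5)):
    w = list(base_weights)
    w[i] *= factor
    total = sum(w)
    w = [x / total for x in w]
    print(f"weight {i} x{factor}: best = {best(w)}")
```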
B. Statistical Decision Theory A separate article deals with decision theory, and the reader is referred to that for more details. In essence, however, statistical decision theory proceeds by establishing probability distributions on outcomes, using both subjective information and available data. However, we need to recall again the potential biases inherent in subjective probability assessments, as discussed in Section III. Decision-making values and goals are then captured in some form of utility function, representing the desirability of different outcomes. With this information, we can, in principle, identify the course of action which maximizes expected utility. This is an approach which falls very much into the category of rational approaches. While decision theory can provide useful insights (as much through the construction of probabilities and utilities as through the subsequent analysis), it needs to be used with caution. Some of the underlying theoretical assumptions have been challenged; consequently, the very exis-
tence of a “utility function” which can be assessed separately from the probabilities has been questioned. Furthermore, the extension of the utility function concept to the multicriteria type of problem discussed in Section IV raises a number of practical difficulties. Stronger assumptions are required in this case, which are even more difficult to verify, and the construction of the resultant utility models requires judgmentally rather demanding inputs from the decision maker.
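As a purely illustrative sketch of the expected-utility calculation at the core of statistical decision theory (not of the fuller multiattribute extension just discussed), the following fragment uses invented outcomes, invented subjective probabilities, and an assumed concave (risk-averse) utility function.

```python
import math

# Hypothetical outcomes (net payoff in $m) and subjective probabilities
# for two courses of action.
actions = {
    "build large dam": [(-20, 0.3), (10, 0.4), (60, 0.3)],
    "build small dam": [(0, 0.2), (15, 0.6), (25, 0.2)],
}

def utility(payoff_m):
    """Assumed risk-averse utility: concave in money (illustrative only)."""
    return 1 - math.exp(-payoff_m / 50.0)

def expected_utility(lottery):
    return sum(p * utility(x) for x, p in lottery)

for a, lottery in actions.items():
    print(a, round(expected_utility(lottery), 3))
print("maximizing expected utility selects:",
      max(actions, key=lambda a: expected_utility(actions[a])))
```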
C. Scenario Planning The concepts around the use of scenarios to represent future structural uncertainties while considering strategic planning options appear to have been developed within the Royal Dutch/Shell corporation and have been thoroughly documented by van der Heijden (1996). The idea is to construct, through an intensive brainstorming session, a small number of future scenarios. These are meant to be internally consistent descriptions of possible futures, describing a trajectory of changes in future conditions that band together in a coherent manner. Typically, a relatively small number (3–5) of detailed scenarios will be constructed, as it is important for decision makers to be able to compare decision alternatives across different scenarios, which becomes well nigh impossible if the number of scenarios exceeds the “magic number 7.” Scenario planning is similar to sensitivity analysis in the sense that there is value in identifying courses of action which are robust across all scenarios. The approach is, however, much more structured, and greater attention is paid to relationships between variables and to avoidance of biases such as anchoring and adjustment. To be done properly, scenario planning requires a considerable investment in time by senior management, but this is well justified for strategic decision making with far-reaching consequences.
D. Risk as a Criterion In some cases, it is possible to include avoidance of risk as a criterion and to handle it in the same way as the other criteria discussed in Section IV. This is, for example, routinely done in much of the portfolio investment analysis theory, where expectation and standard deviation of returns may be viewed as distinct decision criteria. In this case, the expectation is a measure of return under standard or expected conditions, while the standard deviation measures probable range of outcomes, i.e., the risk. What is thus intrinsically a monocriterion decision problem under uncertainty is ana-
lyzed as a bicriterion problem in which the uncertainty is subsumed into one of the criteria. The representation of risk, or risk avoidance, as a criterion allows the powerful tools of MCDA (Section IV) to be applied, which can be done at any level of detail appropriate to the decision context. In some cases, relatively “quick and dirty” tools will suffice if the decision consequences are limited in extent; in other cases, a deep analysis of risk preference is possible. The effective use of MCDA for this purpose is still a matter for future research. It should be noted, however, that measures of risk other than standard deviation (e.g., probabilities of certain undesirable or catastrophic consequences) appear to be more appropriate in some situations. The reader should also be warned that when using value function models for the MCDA, nominal “riskiness” scales (as an alternative to standard deviation) may easily violate the underlying assumptions of preferential independence and of the interval scale property and should be used with considerable caution.
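For the portfolio-style illustration above, treating expected return and standard deviation as two separate criteria might be sketched as follows; the return samples and the trade-off weight placed on risk are invented for illustration only.

```python
import statistics

# Hypothetical historical (or simulated) returns, in percent, for three portfolios.
returns = {
    "portfolio X": [8, 12, -3, 15, 9],
    "portfolio Y": [6, 7, 5, 8, 6],
    "portfolio Z": [20, -10, 25, -5, 18],
}

# Two criteria: expected return (to be maximized) and standard deviation,
# i.e., risk (to be minimized).
criteria = {name: (statistics.mean(r), statistics.stdev(r)) for name, r in returns.items()}

# A simple, assumed additive trade-off: one unit of expected return is judged
# to offset two units of standard deviation.
w_return, w_risk = 1.0, 0.5
score = {name: w_return * mean - w_risk * sd for name, (mean, sd) in criteria.items()}

for name, (mean, sd) in criteria.items():
    print(f"{name}: mean={mean:.1f}% sd={sd:.1f}% score={score[name]:.2f}")
print("preferred under these weights:", max(score, key=score.get))
```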
SEE ALSO THE FOLLOWING ARTICLES Corporate Planning • Data Mining • Decision Support Systems • Decision Theory • Goal Programming • Strategic Planning for/of Information Systems • Uncertainty
BIBLIOGRAPHY
Belton, V., and Stewart, T. J. (2002). Multiple Criteria Decision Analysis: An Integrated Approach, Kluwer Academic Publishers, Boston, MA.
Checkland, P. (1981). Systems Thinking, Systems Practice, Wiley, New York.
Eden, C., and Ackermann, F. (1998). Making Strategy: The Journey of Strategic Management, SAGE Publications, London.
Goodwin, P., and Wright, G. (1997). Decision Analysis for Management Judgement, 2nd ed., Wiley, New York.
Janis, I. L., and Mann, L. (1977). Decision Making, The Free Press, New York.
Kahneman, D., Tversky, A., and Slovic, P. (1982). Judgement under Uncertainty: Heuristics and Biases, Cambridge Univ. Press, Cambridge, UK.
Kaplan, R. S., and Norton, D. P. (1996). The Balanced Scorecard, Harvard Business School Press, Boston, MA.
Keeney, R. L. (1992). Value-Focused Thinking: A Path to Creative Decision Making, Harvard Univ. Press, Cambridge, MA.
Rosenhead, J., and Mingers, J., Eds. (2001). Rational Analysis for a Problematic World Revisited, Wiley, New York.
Roy, B. (1996). Multicriteria Methodology for Decision Aiding, Kluwer Academic Publishers, Dordrecht/Norwell, MA.
Simon, H. A. (1976). Administrative Behavior, 3rd ed., The Free Press, New York.
van der Heijden, K. (1996). Scenarios: The Art of Strategic Conversation, Wiley, New York.
Von Winterfeldt, D., and Edwards, W. (1986). Decision Analysis and Behavioral Research, Cambridge Univ. Press, Cambridge, UK.
Decision Support Systems Clyde W. Holsapple University of Kentucky
I. INTRODUCTION II. DECISIONS III. DECISION SUPPORT SYSTEM FUNDAMENTALS
IV. CLASSIFICATION OF DECISION SUPPORT SYSTEMS V. CONCLUSION
GLOSSARY
artificial intelligence A field of study and application concerned with identifying and using tools and techniques that allow machines to exhibit behavior that would be considered intelligent if it were observed in humans.
decision The choice of one from among a number of alternatives; a piece of knowledge indicating a commitment to some course of action.
decision making The activity that culminates in the choice of an alternative; the activity of using knowledge as raw materials in the manufacture of knowledge about what to do.
decision support system (DSS) A computer-based system composed of a language system, presentation system, knowledge system, and problem-processing system whose collective purpose is the support of decision-making activities.
descriptive knowledge Knowledge about past, present, and hypothetical states of an organization and its environment.
knowledge-based organization An organization in which the primary, driving activity is the management of knowledge.
knowledge management The activity of representing and processing knowledge.
knowledge-management technique A technique for representing knowledge in terms of certain kinds of objects and for processing those objects in various ways.
knowledge system That subsystem of a decision support system in which all application-specific knowledge is represented for use by the problem-processing system. This includes knowledge of any or all types (e.g., descriptive, procedural, reasoning) represented in a variety of ways (e.g., as databases, spreadsheets, procedural solvers, rule sets, text, graphs, forms, templates).
language system The subsystem of a decision support system that consists of (or characterizes the class of) all acceptable problem statements.
multiparticipant DSS A decision support system that supports multiple participants engaged in a decision-making task (or functions as one of the participants).
presentation system The component of a DSS that consists of all responses a problem processor can make.
problem-processing system That subsystem of a decision support system that accepts problems stated in terms of the language system and draws on the knowledge system in an effort to produce solutions.
procedural knowledge Knowledge about how to produce a desired result by carrying out a prescribed series of processing steps.
reasoning knowledge Knowledge about what circumstances allow particular conclusions to be considered to be valid.
I. INTRODUCTION Building on initial concepts introduced and demonstrated in the 1970s, the field of decision support systems (DSS) has progressed to a stage where these systems are routinely used by decision makers around the world. Organizations expend large sums to ensure that their employees, customers, suppliers, and
partners have computer-based systems that provide the knowledge they need to make timely, sound decisions. These DSS have many variants, being implemented for a wide variety of decisional applications, facilitating various aspects of decision-making processes, and utilizing a variety of technologies. They range from systems that support individuals making decisions to multiparticipant DSS, which support the collaboration of multiple individuals in making a joint decision, in making interrelated decisions, or in transorganizational decision making. An appreciation of the objectives, characteristics, uses, development, and impacts of decision support systems begins with an understanding of decisions and decision making. This leads to an examination of support, why decision makers need support, and what kinds of decision support could be beneficial. Against this background, consideration of computer-based systems that can deliver support for decisions unfolds. Included is a consideration of characteristics that distinguish such systems from other types of business computing systems, an architecture of DSS components, and an examination of various classes of DSS. Classifications are based on technology used in developing the system and on whether the system supports an individual versus a multiparticipant decision maker.
II. DECISIONS Immersed in a competitive, knowledge-rich world, managers are daily confronted with the task of making decisions about allocations of their resources, about handling disturbances to their operations and plans, about taking advantage of new opportunities, and about interacting or negotiating with others. Each decision involves the use of knowledge of varying kinds and amounts, and many can benefit from (or even require) the use of technology known as DSS. Similarly, managers in an organization’s suppliers, partners, and competitors are faced with their own decision challenges and accompanying knowledge needs; these, too, can benefit from systems that help meet those needs. Moreover, consumers make decisions about products and services to satisfy their preferences. Increasingly, web-oriented decision support systems are available to supply knowledge for consumers’ decisional efforts. The study of DSS has many technical aspects. But before delving into these, it is important to appreciate the setting in which they are used. The setting is a competitive, knowledge-rich world in which managers make decisions about what to do with their or-
ganizations’ resources. Many decisions, ranging from simple to complex, are made every day. Each decision involves the use of knowledge of varying kinds and amounts, and many can benefit from (or even require) the use of technology known as DSS.
A. Making a Decision A classic view among management theorists is that a decision is a choice about a course of action, a choice about a strategy for action, or a choice leading to a desired outcome. The classic view of decision making is that it is an activity culminating in the selection of one from among multiple alternative courses of action. Decision-making activity identifies alternative courses of action and selects one of them as the decision. The number of alternatives identified and considered could be very large. The work involved in becoming aware of alternatives often makes up a major share of a decision-making episode. It is concerned with such questions as where do alternatives come from, how many alternatives are enough, should more effort be devoted to uncovering alternatives, and how can large numbers of alternatives be managed so none is forgotten or garbled? One role of a DSS is to help decision makers cope with such issues. Ultimately, one of the alternatives is selected. But, which one? This depends on a study of the alternatives in an effort to understand their implications. The work involved in selecting one of the alternatives usually makes up a major share of a decision-making episode. It is concerned with such questions as to what extent should each alternative be studied, how reliable is our expectation about an alternative’s impacts, are an alternative’s expected impacts compatible with our purposes, what basis should be used to compare alternatives to each other, and what strategy will be followed in arriving at a choice. Another role of a DSS is to support the study of alternatives. Some DSS may even recommend the selection of a particular alternative and explain the rationale underlying that advice. Complementing the classic view of decisions, there is the knowledge-based view, which holds that a decision is a piece of knowledge indicating the nature of an action commitment. A decision could be a piece of descriptive knowledge. For instance, “spend $10,000 on advertising in the next quarter” describes a commitment of what to do about advertising expenditures. This decision is one of many alternative descriptions (e.g., spend $5000) that could have been chosen. A decision could be a piece of procedural
knowledge, involving a step-by-step specification of how to accomplish something. For instance, “determine the country with the most favorable tax structure, identify the sites within that country having sufficient qualified work forces, then visit those sites to assess their respective qualities of life, and, from among those that are acceptable, locate the new factory at the site with the best transportation infrastructure” is a chunk of procedural knowledge committing an organization to a certain sequence of actions. It is one of many alternative procedures that could have been chosen. When we regard a decision as a piece of knowledge, making a decision means we are making a new piece of knowledge that did not exist before. We are manufacturing new knowledge by transforming or assembling existing pieces of knowledge. A DSS is a system that aids the manufacturing process, just as machines aid in the manufacturing of material goods. Not only is there the new piece of knowledge called a decision, but the manufacturing process itself may have resulted in additional new knowledge as byproducts. For instance, in manufacturing a decision, we may have derived other knowledge as evidence to justify our decision. We may have produced knowledge about alternatives that were not chosen, including expectations about their possible impacts. More fundamentally, we may have developed knowledge about improving the decision manufacturing process itself. Such by-products can be useful later in making other decisions.
1. Knowledge in Decision Making A decision maker possesses a storehouse of knowledge, plus abilities to both alter and draw on the contents of that storehouse. This characterization holds for all types of decision makers—individuals, teams, groups, and organizations. In the multiparticipant cases, both the knowledge and the abilities are distributed among participants. When using a DSS, a decision maker’s storehouse is augmented by computer-based representation of knowledge, and the decision maker’s ability to process knowledge is supplemented by the DSS’s ability to process those representations. From a decision-oriented perspective, three primary types of knowledge have been identified: descriptive, procedural, and reasoning. Each of these can exist in explicit or tacit modes within a decision maker. Each can be explicitly represented in and processed by a DSS using a variety of computer-based techniques. Knowledge about the state of some world is called descriptive knowledge. Commonly referred to as data
or information, it includes descriptions of past, present, future, and hypothetical situations. A decision maker can acquire descriptive knowledge via observation and can produce it by transforming or assembling existing pieces of knowledge. Knowledge about how to do something is quite different from knowledge of a state. Because it is concerned with a step-by-step procedure for accomplishing some task, it is called procedural knowledge. As a decision maker comes into possession of more or better procedural knowledge, we say that decision maker is more skilled. Reasoning knowledge specifies the conclusion that can be drawn when a specified situation exists. A code of conduct, a set of regulations, a customer service policy, rules that prescribe forecasting approaches, and rules used to diagnose causes of situations are all examples of reasoning knowledge. Whereas procedural knowledge is “know how” and descriptive knowledge is “know what,” reasoning knowledge is “know why.” By putting together pieces of reasoning knowledge, we can reach logical conclusions and justify them by citing other reasons. This activity is known as drawing inferences. The reasoning knowledge that fuels an inference may be acquired or derived by a decision maker. Either way, as a decision maker comes to possess more or better reasoning knowledge, we say that decision maker is more of an expert. The raw materials that can go into a decision-making process are pieces of descriptive, procedural, and reasoning knowledge. These are common ingredients in decision-making recipes. During the process, varying amounts of descriptive, procedural, and reasoning knowledge may be added at different times in different combinations. That is, pieces of different types of knowledge can be made to interact, and the value of one piece may depend on having another available at the proper time. DSSs can store and use these types of knowledge to supply them when needed in a decision process or produce new knowledge for the decision process.
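To make the three knowledge types more concrete, the following sketch (in Python, with entirely hypothetical data, function, and rule names not drawn from this article) illustrates how a DSS's knowledge system might hold descriptive, procedural, and reasoning knowledge side by side and make them interact.

```python
# Illustrative sketch only; the figures, threshold, and rule are invented.

# Descriptive knowledge ("know what"): states of the world, e.g., past revenue.
past_sales = {2000: 1.2e6, 2001: 1.4e6, 2002: 1.5e6}   # revenue by year

# Procedural knowledge ("know how"): a step-by-step way to produce a result.
def forecast_next_year(sales_by_year):
    """Naive forecast: extend the average year-over-year change."""
    years = sorted(sales_by_year)
    changes = [sales_by_year[b] - sales_by_year[a] for a, b in zip(years, years[1:])]
    return sales_by_year[years[-1]] + sum(changes) / len(changes)

# Reasoning knowledge ("know why"): circumstances under which a conclusion holds.
advertising_rule = {
    "premise": lambda forecast: forecast < 1.7e6,       # hypothetical growth target
    "conclusion": "increase the advertising budget next quarter",
    "because": "forecast demand falls short of the growth target",
}

forecast = forecast_next_year(past_sales)               # procedural knowledge in use
if advertising_rule["premise"](forecast):               # reasoning knowledge in use
    print(advertising_rule["conclusion"], "-", advertising_rule["because"])
```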
2. Structured versus Unstructured Decisions When issues relevant to making a decision are well understood, the decision tends to be structured. The alternatives from which the choice is made are clear-cut, and each can be readily evaluated in light of the organization’s purposes and goals. Put another way, all the knowledge required to make the decision is available in a form that makes it straightforward to use. Oftentimes, however, the issues pertinent to producing a decision are not well understood. Some
issues may be entirely unknown to the decision maker, which is a hallmark of unstructured decisions. The alternatives from which a choice will be made are vague, are difficult to compare and contrast, or cannot be easily evaluated with respect to the organization’s purposes and goals. It may even be that there is great difficulty in attempting to discover what the alternatives are. In other words, the knowledge required to produce a decision is unavailable, difficult to acquire, incomplete, suspect, or in a form that cannot be readily used by the decision maker. A semistructured decision lies between the two extremes: some aspects of the decision-manufacturing activity may be structured, whereas others are not. Decision support systems can be developed to assist in structured, semistructured, or unstructured situations. Consider the structured decision of selecting a travel plan for making a regular monthly inspection visit to a major supplier’s factory. From destination, duration, allowable dates to travel, budget limits, traveler preferences, and prior travel plans, the parameters for the decision are well known. All that is missing for deciding on a satisfactory travel plan is a characterization of alternatives that are available for the upcoming trip (e.g., costs, times, amenities). Characterizations of these alternatives could be found and presented by a DSS each time such a decision is to be made; the DSS might even rank the alternatives based on known criteria. As part of making a semistructured decision about amounts of a part to order from various suppliers, a DSS might solve such problems as estimating demand or deriving an “optimal” allocation scheme. The decision maker uses such solutions along with other knowledge (e.g., supplier dependability in part quality and delivery times, importance of cultivating ongoing supplier relationships, impacts on orders of other parts from the same suppliers) in reaching the decision. In the course of making an unstructured decision about how to react to a revolutionary technological or competitive change that may impact the viability of a current product offering, a DSS may be useful in “what if” analysis that shows impacts of alternative courses of action or a DSS may help explore internal and external knowledge sources in order to stimulate or provoke insights about coping with the unprecedented situation.
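As an illustration of support for the structured travel-plan decision described above, the following minimal Python sketch ranks characterized alternatives by known criteria. The plans, attribute values, and weights are invented for illustration and are not part of the original example.

```python
# Hypothetical alternatives already characterized by the DSS for the upcoming trip.
travel_plans = [
    {"plan": "A", "cost": 980,  "travel_hours": 6.5, "meets_budget": True},
    {"plan": "B", "cost": 740,  "travel_hours": 9.0, "meets_budget": True},
    {"plan": "C", "cost": 1250, "travel_hours": 5.0, "meets_budget": False},
]

def score(plan, cost_weight=0.6, time_weight=0.4):
    """Lower cost and shorter travel time yield a higher score (illustrative weights)."""
    if not plan["meets_budget"]:          # hard constraint from the decision parameters
        return float("-inf")
    return -(cost_weight * plan["cost"] + time_weight * plan["travel_hours"] * 100)

for p in sorted(travel_plans, key=score, reverse=True):
    print(p["plan"], round(score(p), 1))
```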
3. Decision-Making Phases and Problem Solving Flows Three decision-making phases are widely recognized: intelligence, design, and choice. Each one is susceptible to computer-based support. The intelligence
phase is a period when the decision maker is alert for occasions to make decisions, preoccupied with collecting knowledge, and concerned with evaluating it in light of the organization’s purpose. The design phase is a period when the decision maker formulates alternative courses of action, analyzes those alternatives to arrive at expectations about the likely outcomes of choosing each, and evaluates those expectations with respect to the organization’s purpose. During the design phase, the decision maker could find that additional knowledge is needed. This would trigger a return to the intelligence phase to satisfy that need before continuing with the design activity. In a choice phase, the decision maker exercises authority to select an alternative. This is done in the face of internal and external pressures related to the nature of the decision maker and the decision context. It can happen that none of the alternatives are palatable, that several competing alternatives yield very positive evaluations, or that the state of the world has changed significantly since the alternatives were formulated and analyzed. Nevertheless, there comes a time when one that is “good enough” or “best” must be selected. If that time has not yet been reached, the decision maker may return to one of the two earlier phases to collect more up-to-date knowledge, formulate new alternatives, reanalyze alternatives, reevaluate them, etc. Within each phase of a decision-making process, the decision maker initiates various subactivities. Each of these activities is intended to solve some problem. The decision maker might need to solve such problems as acquiring a competitor’s sales figures, predicting the demand for a product, assessing the benefits and costs of a new law, inventing a feasible way of packaging a product into a smaller box, or finding out the cultural difficulties of attempting to market a certain product in foreign countries. The overall task of reaching a decision is a superproblem. Only if we solve subproblems can we solve the overall decision problem. A decision-making process is fundamentally one of both recognizing and solving problems along the way toward the objective of producing a decision. For structured decisions, the path toward the objective is well charted. The problems to be surmounted are recognized easily, and the means for solving them are readily available. Unstructured decisions take us into uncharted territory. The problems that will be encountered along the way are not known in advance. Even when stumbled across, they may be difficult to recognize and subsequently solve. Ingenuity and an exploratory attitude are vital for coping with these types of decisions.
Decision support systems are developed to facilitate the recognizing and/or solving of problems within a decision-making process. In the case of a multiparticipant decision maker, they can assist in communication and in coordinating the problem-solving flows among participants working on various problems simultaneously, in parallel, or in some necessary sequence.
B. The Need for Support Computer systems to support decision makers are not free. Not only is there the cost of purchasing or developing a DSS, there are also costs associated with learning about, using, and maintaining a DSS. It is only reasonable that the benefits of a DSS should be required to outweigh its costs. Although some DSS benefits can be difficult to measure in precise quantitative terms, all the benefits are the result of a decision maker’s need for support. When a decision maker needs support, it is because of cognitive, economic, or time limits, or because of competitive pressures. Cognitive limits refer to limits in the human mind’s ability to store and process knowledge. Because decision making is a knowledge-intensive activity, cognitive limits substantially restrict an individual’s decision-making efficiency and effectiveness. If these limits are relaxed, decision-maker productivity can improve. A DSS serves as an extension to a person’s innate knowledge-handling skills, allowing problems to be solved more reliably or rapidly. To relax cognitive limits as much as possible, we could consider forming a very large team. But as a team incorporates more and more participants, the proportion of activity spent in solving communication and coordination problems rises relative to the problem solving directly concerned with making the decision. Thus, increasing a team’s size runs into economic limits not only in terms of paying and equipping more participants, but also with respect to increased communication and coordination costs. Decision support systems can be a less expensive alternative by substituting for participants in performing knowledge-handling tasks or by facilitating the communication and coordination among the participants in a decision process. A third limit that decision makers commonly encounter is a time limit. A decision maker may be blessed with extraordinary cognitive abilities and vast monetary resources but very little time. Time limits can put severe pressure on the decision maker, increasing the likelihood of errors and poor-quality de-
cisions. There may not be sufficient time to consider relevant knowledge, to solve relevant problems, or to employ a desirable decision-making strategy. Because computers can process some kinds of knowledge much faster than humans, are not error-prone, work tirelessly, and are immune to stresses from looming deadlines, DSSs can help lessen the impacts of time limits. Aside from relaxing limits on a decision maker, DSSs are needed for another important reason. Decision makers and organizations often find themselves in situations where their continued success—or even their outright survival—depends on being competitive. If one competitor successfully uses DSSs for better decision making and another does not, then the second competitor will be at a competitive disadvantage. To keep pace, it is prudent to consider using DSSs internally and providing them to customers, suppliers, and partners. Beyond this, some organizations actively seek out opportunities for using DSSs in innovative ways in order to achieve competitive advantages. To summarize, the nature of support a DSS can offer to its user will normally include at least one of the following: 1. It alerts the user to a decision-making opportunity or challenge. 2. It recognizes problems that need to be solved as part of the decision-making process. 3. It solves problems recognized by itself or by the user. 4. It facilitates or extends the user’s ability to process (e.g., acquire, transform, explore) knowledge. 5. It offers advice, expectations, evaluations, facts, analyses, or designs to the user. 6. It stimulates the user’s perception, imagination, or creative insight. 7. It coordinates or facilitates interactions among participants in multiparticipant decision makers.
III. DECISION SUPPORT SYSTEM FUNDAMENTALS One purpose of a DSS is to help problem-solving flows go more smoothly or rapidly: stimulating the user to perceive problems needing to be solved, breaking problems posed by the user into subproblems, actually solving problems posed by a user, and possibly combining and synthesizing solutions of subproblems into the solution of a larger problem. Traditional DSS definitions suggest that the purpose of a DSS is to aid
decision makers in addressing unstructured or semistructured decisions. However, some DSSs are used to help with structured decisions by handling large volumes of knowledge or solving complex subproblems more rapidly and reliably than humans. Nevertheless, the DSS emphasis is definitely on supporting decisions that its users regard as less than fully structured. Ultimately, the purpose of a DSS is to help a decision maker manage knowledge. A DSS accepts, stores, uses, derives, and presents knowledge pertinent to the decisions being made. Its capabilities are defined by the types of knowledge with which it can work, the ways in which it can represent these various types of knowledge, and its skills in processing these representations.
A. DSS Forerunners One way to appreciate the characteristics of a DSS is to compare and contrast them with traits of two other major types of business computing systems: data processing systems and management information systems (MIS). Both predate the advent of computer-based decision support systems. All three share the trait of being concerned with record keeping. On the other hand, the three kinds of business computing systems differ in various ways, because each serves a different purpose in the management of an organization’s knowledge resources. In the 1950s and 1960s, data processing (DP) systems dominated the field of business computing. Their main purpose was and is to automate the handling of large numbers of transactions. At the heart of a DP system lies a body of descriptive knowledge (i.e., data), which is a computerized record of what is known as a result of various transactions having happened. In addition, a DP system endows the computer with two major abilities related to this stored data: record keeping and transaction generation. The first enables the computer to keep the records up to date in light of incoming transactions. The second ability is concerned with the computerized production of outgoing transactions based on the stored descriptive knowledge, transmitted to such targets as customers, suppliers, employees, or governmental regulators. Administrators of a DP system are responsible for seeing that record keeping and transaction generation abilities are activated at proper times. Unlike a DP system, the central purpose of MIS was and is to provide managers with periodic reports that recap certain predetermined aspects of an organization’s past operations. Giving managers regular snapshots of what has been happening in the organization
helps them in controlling their operations. Whereas DP is concerned with transforming transactions into records and generating transactions from records, the MIS concern with record keeping focuses on using this stored descriptive knowledge as a base for generating recurring standard reports. An MIS department typically is responsible for development, operation, and administration of DP systems and the MIS. Information contained in standard reports from an MIS certainly can be factored into decision-making activities. When this is the case, an MIS could be fairly regarded as a kind of DSS. However, the nature of support it provides is very limited due to several factors: its reports are predefined, they tend to be issued periodically, and they are based only on descriptive knowledge. The situation surrounding a decision maker can be very dynamic. Except for the most structured kinds of decisions, information needs can arise unexpectedly and change more rapidly than an MIS can be built or revised by the MIS department. Even when some needed information exists in a stack of reports accumulated from an MIS, it may be buried within other information held by a report, scattered across several reports, not presented in a fashion that is most helpful to the decision maker, or in need of further processing. Report generation by an MIS typically follows a set schedule. However, decisions that are not fully structured tend to be required at irregular intervals or unanticipated times. Knowledge needed for these decisions should be available on an ad hoc, spur-of-the-moment basis. Another limit on an MIS’s ability to support decisions stems from its exclusive focus on managing descriptive knowledge. Decision makers frequently need to manage procedural and/or reasoning knowledge as well. They need to integrate the use of these kinds of knowledge with ordinary descriptive knowledge.
B. DSS Traits and Benefits Ideally, a decision maker should have immediate, focused, clear access to whatever knowledge is needed on the spur of the moment in coping with semistructured or unstructured decisions. The pursuit of this ideal separates DSS from their DP and MIS ancestors and suggests traits we might expect to observe in a DSS: 1. A DSS includes a body of knowledge that describes some aspects of the decision-maker’s world, may specify how to accomplish various tasks, and may indicate what conclusions are valid in various circumstances.
2. A DSS has an ability to acquire and maintain descriptive knowledge and possibly other kinds of knowledge as well. 3. A DSS has an ability to present knowledge on an ad hoc basis in various customized ways as well as in standard reports. 4. A DSS has an ability to select any desired subset of stored knowledge for either presentation or for deriving new knowledge in the course of problem recognition and/or problem solving. 5. A DSS can interact directly with a decision maker or a participant in a decision maker in such a way that the user has flexibility in choosing and sequencing knowledge management activities. These traits combine to amplify a decision maker’s knowledge-management capabilities and loosen cognitive, temporal, and economic constraints. The notion of DSSs arose in the early 1970s. Within a decade, each of the traits had been identified as important and various DSSs were proposed or implemented for specific decision-making applications. By the late 1970s new technological developments were emerging that proved to have a tremendous impact on the DSS field and the popularization of DSSs in the 1980s and beyond: microcomputers, electronic spreadsheets, management science packages, and ad hoc query interfaces. Specific benefits realized from a particular DSS depend on the nature of the decision maker and the decision situation. Potential kinds of DSS benefits include the following: 1. In a most fundamental sense, a DSS augments the decision maker’s own innate knowledge management abilities. It effectively extends the decision maker’s capacity for representing and processing knowledge in the course of manufacturing decisions. 2. A decision maker can have the DSS solve problems that the decision maker alone would not even attempt or that would consume a great deal of time due to their complexity and magnitude. 3. Even for relatively simple or structured problems encountered in decision making, a DSS may be able to reach solutions faster and/or more reliably than the decision maker. 4. Even though a DSS may be unable to solve a problem facing the decision maker, it can be used to stimulate the decision maker’s thoughts about the problem. For instance, the decision maker may use the DSS for exploratory browsing, hypothetical analysis, or getting advice about dealing with the problem.
5. The very activity of constructing a DSS may reveal new ways of thinking about the decision domain or even partially formalize various aspects of decision making. 6. A DSS may provide additional compelling evidence to justify a decision-maker’s position, helping the decision maker secure agreement or cooperation of others. Similarly, a DSS may be used by the decision maker to check on or confirm the results of problems solved independently of the DSS. 7. Due to the enhanced productivity, agility, or innovation a DSS fosters within an organization, it may contribute to an organization’s competitiveness. Because no one DSS provides all these benefits to all decision makers in all decision situations, there are frequently many DSSs within an organization helping to manage its knowledge resources. A particular decision maker may make use of several DSSs within a single decision-making episode or across different decision-making situations.
C. The Generic Architecture Generally, a DSS can be defined in terms of four essential aspects: a language system (LS), a presentation system (PS), a knowledge system (KS), and a problem-processing system (PPS). The first three are systems of representation. An LS consists of all messages the DSS can accept. A PS consists of all messages the DSS can emit. A KS consists of all knowledge the DSS has stored and retained. By themselves, these three kinds of systems can do nothing. They simply represent knowledge, either in the sense of messages that can be passed or representations that have been accumulated for possible processing. These representations are used by the fourth element: the PPS, which is the active part of a DSS, the DSS’s software engine. As its name suggests, a PPS is what tries to recognize and solve problems during the making of a decision. Figure 1 illustrates how the four subsystems of a DSS are related to each other and to a DSS user. Using its knowledge-acquisition ability, a PPS acquires knowledge about what a user wants the DSS to do or what is happening in the surrounding world. Such knowledge is carried in LS messages that serve as user requests or system observations. The PPS may draw on KS contents when using its acquisition ability. Knowledge newly acquired into the KS or an interpreted message can cause the PPS’s other abilities to spring into action. When a user’s request is for the solution to some problem, the knowledge-selection/derivation
Figure 1 A generic framework of decision support systems.
ability comes into play. The PPS selectively recalls or derives knowledge that forms a solution. When a user’s request is for clarification of a prior response or for help in stating a request, the selection/derivation ability may or may not be exercised, depending on whether it needs the KS to produce the content of its response. The PPS can issue a response to the user by choosing to present one of the PS elements. The presentation choice is determined by the processing, often drawing on KS contents. This simple architecture captures the fundamental aspects common to all DSSs. To fully appreciate the nature of any particular DSS, one must know about requests that make up its LS, responses that make up its PS, knowledge representations allowed or existing in its KS, and knowledge-processing capabilities of its PPS.
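One way to picture the generic architecture is as four cooperating software components, with the PPS as the only active one. The Python sketch below is illustrative only; the class names, the toy request vocabulary, and the stored fact are assumptions rather than a standard DSS interface.

```python
# Minimal sketch of the LS/PS/KS/PPS division of labor (hypothetical names throughout).

class LanguageSystem:
    """All requests the DSS can accept."""
    def accepts(self, request: str) -> bool:
        return request.startswith(("show", "solve", "help"))

class PresentationSystem:
    """All responses the DSS can emit."""
    def render(self, content) -> str:
        return f"DSS response: {content}"

class KnowledgeSystem:
    """Application-specific knowledge the DSS has stored and retained."""
    def __init__(self):
        self.facts = {"q1_sales": 1.5e6}

class ProblemProcessingSystem:
    """The active engine: acquires requests, selects knowledge, presents responses."""
    def __init__(self, ls, ps, ks):
        self.ls, self.ps, self.ks = ls, ps, ks

    def handle(self, request: str) -> str:
        if not self.ls.accepts(request):                  # knowledge acquisition
            return self.ps.render("request not understood")
        if request.startswith("show"):
            key = request.split()[-1]
            return self.ps.render(self.ks.facts.get(key, "unknown"))   # selection
        return self.ps.render("help: try 'show q1_sales'")

dss = ProblemProcessingSystem(LanguageSystem(), PresentationSystem(), KnowledgeSystem())
print(dss.handle("show q1_sales"))
```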
D. Tools for Developing DSSs Development tools are essential for building DSSs. The tools chosen for developing a particular DSS strongly influence not only the development process, but also the features that the resultant DSS can offer to a user.
A particular tool is oriented toward one or more knowledge-management techniques (e.g., text, spreadsheet, database, solver, or rule management). Conversely, a particular technique (in its many possible variants) is offered by more than one development tool. Thus, tools can be categorized in terms of the knowledge-management techniques they furnish. A spreadsheet tool offers some variant of the spreadsheet technique for knowledge management, a database tool provides some variant of a database technique for managing knowledge, etc. Although many tools tend to emphasize one technique or another, vestiges of additional techniques are often apparent. Some tools furnish healthy doses of multiple techniques. Tools can play different roles in a DSS development process. An intrinsic tool (e.g., Microsoft’s Excel software) serves as the PPS of the developed DSS and tends to furnish a ready-made LS and PS. With such a tool, DSS development becomes a matter of populating the KS with representations that the tool is able to process. Because they do not require the programming of a PPS, intrinsic tools are widely used by nontechnical persons to build their own DSSs. A partially intrinsic tool furnishes part of the DSS’s prob-
lem processor. The database control system used to operate on database repositories is an example of such a tool. An extrinsic tool does not participate in a PPS, but helps the developer produce all or part of the PPS or create some portion of the KS contents. Tools in these two categories are of interest primarily to experienced or professional developers. Another angle from which to examine development tools involves the types of integration they permit within DSSs. This approach is relevant whenever multiple knowledge-management techniques are employed within the bounds of a single DSS. These techniques may be integrated within a single tool or across multiple tools. In the former case, nested and synergistic integration are distinct possibilities. In the latter case, integration can be via direct format conversion, clipboard, or common format approaches.
IV. CLASSIFICATION OF DECISION SUPPORT SYSTEMS Decision support systems can be classified in terms of the knowledge-management techniques used to develop them. In many cases, the focus is on a single technique, but compound DSSs employing multiple techniques are also common. A different kind of classification distinguishes DSSs that incorporate artificial intelligence methods from those that do not. Yet another is concerned with DSSs for multiparticipant decision makers.
A. Technique-Oriented Classes One way of looking at KS contents and PPS abilities is in terms of the knowledge-management techniques employed by a DSS. This gives rise to many special cases of the generic DSS architecture, each characterizing a certain class of DSSs by restricting KS contents to representations allowed by a certain knowledge-management technique and restricting the PPS abilities to processing allowed by that technique. The result is a class of DSSs with the generic traits suggested in Fig. 1, but specializing in a particular technique for representing and processing knowledge.
1. Text-Oriented DSSs For centuries, decision makers have used the contents of books, periodicals, and other textual repositories of knowledge as raw materials in the making of decisions. The knowledge embodied in text might be de-
scriptive, such as a record of the effects of similar decision alternatives chosen in the past or a description of an organization’s business activities. It could be procedural knowledge, such as a passage explaining how to calculate a forecast or how to acquire some needed knowledge. The text could embody reasoning knowledge, such as rules indicating likely causes of or remedies for an unwanted situation. Whatever its type, the decision maker searches and selects pieces of text to become more knowledgeable, to verify impressions, or to stimulate ideas. By the 1980s, text management had emerged as an important, widely used computerized means for representing and processing pieces of text. Although its main use has been for clerical activities, it can also be of value to decision makers. A text-oriented DSS has a KS comprised of textual passages of potential interest to a decision maker. The PPS consists of software that performs various manipulations on contents of any of the stored text. It may also involve software that can help a user in making requests. The LS contains requests corresponding to the various allowed manipulations. It may also contain requests that let a user ask for assistance covering some aspect of the DSS. The PS consists of images of stored text that can be projected on a console screen, plus messages that can help the decision maker use the DSS. When a DSS is built with a hypertext technique, each piece of text in the KS is linked to other pieces of text that are conceptually related to it. There are additional PPS capabilities allowing a user to request the traversal of links. In traversing a link, the PPS shifts its focus (and the user’s focus) from one piece of text to another. Ad hoc traversal through associated pieces of text continues at a user’s discretion. The benefit of this hypertext kind of DSS is that it supplements a decision maker’s own capabilities by accurately storing and recalling large volumes of concepts and connections that he or she is not inclined personally to memorize. The World Wide Web furnishes many examples of hypertext DSSs.
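A minimal sketch of the hypertext idea, assuming invented passages and link names, might represent the KS as linked pieces of text whose links the PPS traverses at the user's request.

```python
# Each KS entry: (passage text, list of conceptually related passages). Hypothetical content.
passages = {
    "forecasting": ("How to calculate a quarterly forecast ...", ["past-sales", "policy"]),
    "past-sales":  ("Record of the effects of similar past decisions ...", ["forecasting"]),
    "policy":      ("Customer service policy and related rules ...", ["forecasting"]),
}

def show(topic):
    text, links = passages[topic]
    print(f"[{topic}] {text}  (links: {', '.join(links)})")

focus = "forecasting"
show(focus)
focus = passages[focus][1][0]    # user requests traversal of the first link
show(focus)
```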
2. Database-Oriented DSSs Another special case of the DSS framework consists of those systems developed with a database (e.g., relational) technique of knowledge management. These have been used since the early years of the DSS field. Like text-oriented DSSs, they aid decision makers by accurately tracking and selectively recalling knowledge that satisfies a particular need or serves to stimulate ideas. However, the knowledge handled by a database-oriented DSS tends to be primarily descriptive, rigidly structured, and often
extremely voluminous. The computer files that make up its KS collectively are called a database. The PPS has three kinds of software: a database control system, an interactive query processing system, and various custom-built processing systems. One, but not both, of the latter two could be omitted from the DSS. The database control system consists of capabilities for manipulating database structures and contents. These capabilities are used by the query processor and custom-built processors in their effort at satisfying user requests. The query processing system is able to respond to certain standard types of requests for data retrieval (and perhaps for help). These requests constitute a query language and make up part of the DSS’s language system. Upon receiving an LS request, the query processor issues an appropriate sequence of commands to the database control system, causing it to extract the desired values from the database. These values are then presented in some standard listing format for the user to view. Users may prefer to deal with custom-built processors rather than standard query processors (e.g., faster responses, customized presentation of responses, more convenient request language). Such a processor is often called an application program, because it is a program that has been developed to meet the specific needs of a marketing, production, financial, or other application.
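The division of labor among a database KS, a standard query processor, and a custom-built application program can be sketched as follows. This uses Python's standard sqlite3 module purely for illustration; the table, columns, and the east_region_growth "application program" are hypothetical.

```python
import sqlite3

# The KS: an (in-memory) database of descriptive knowledge.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("East", "Q1", 1.2e6), ("West", "Q1", 0.9e6), ("East", "Q2", 1.4e6)])

# A standard query-language request from the LS, answered in a standard listing.
for row in conn.execute("SELECT region, SUM(revenue) FROM sales GROUP BY region"):
    print(row)

# A custom-built "application program" serving one recurring decisional need.
def east_region_growth(connection):
    q1, q2 = (connection.execute(
        "SELECT revenue FROM sales WHERE region='East' AND quarter=?", (q,)
    ).fetchone()[0] for q in ("Q1", "Q2"))
    return (q2 - q1) / q1

print(f"East growth: {east_region_growth(conn):.0%}")
```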
3. Spreadsheet-Oriented DSSs In the case of a text-oriented DSS, procedural knowledge can be represented in textual passages in the KS. About all the PPS can do with such a procedure is display it to the user and modify it at the user’s request. It is up to the user to carry out the procedure’s instructions, if desired. In the case of a database-oriented DSS, extensive procedural knowledge is not easily represented in the KS. However, the application programs that form part of the PPS can contain instructions for analyzing data retrieved from the database. By carrying out these procedures the PPS can show the user new knowledge (e.g., a sales forecast) that has been derived from KS contents (e.g., records of past sales trends). But, because they are part of the PPS, a user cannot readily view, modify, or create such procedures, as can be done in the text-oriented case. Using the spreadsheet technique of knowledge management, a DSS user not only can create, view, and modify procedural knowledge held in the KS, but also can tell the PPS to carry out the instructions they contain. This gives DSS users much more power in handling procedural knowledge than is achievable with either text management or database manage-
ment. In addition, spreadsheet management is able to deal with descriptive knowledge. However, it is not nearly as convenient as database management in handling large volumes of descriptive knowledge, nor does it allow a user to readily represent and process data in textual passages or hypertext documents. Spreadsheet-oriented DSSs are in widespread use today, being especially useful for studying implications of alternative scenarios. The KS of such a DSS is composed of spreadsheet files, each housing a spreadsheet. Each spreadsheet is a grid of cells, each having a unique name based on its location in the grid. In addition to its name, each cell can have a definition and a value. The definition can be a constant (i.e., descriptive) or a formula (i.e., procedural). The PPS allows a user to change cell definitions, calculate cell values, view those values, and customize the LS and PS (e.g., via macros).
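A stripped-down illustration of the spreadsheet technique, with hypothetical cell names and formulas, shows how constant definitions (descriptive knowledge) and formula definitions (procedural knowledge) coexist, and how recalculation supports "what if" revision.

```python
# Each cell has a definition: a constant or a formula over other cells (hypothetical content).
cells = {
    "A1": 1000.0,                                   # constant: units sold
    "A2": 24.95,                                    # constant: unit price
    "A3": lambda get: get("A1") * get("A2"),        # formula: revenue
    "A4": lambda get: get("A3") * 0.3,              # formula: estimated profit
}

def value(name):
    """The PPS's calculation ability: evaluate a cell, recursing through formulas."""
    definition = cells[name]
    return definition(value) if callable(definition) else definition

print(value("A4"))          # calculated from current definitions
cells["A1"] = 1200.0        # user revises descriptive knowledge ("what if?")
print(value("A4"))          # recalculated on request
```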
4. Solver-Oriented DSSs Another special class of DSS is based on the notion of solvers. A solver is a procedure consisting of instructions that a computer can execute in order to solve any member of a particular class of problems. For instance, one solver might be able to solve depreciation problems while another solves portfolio analysis problems, and yet another solves linear optimization problems. Solver management is concerned with the storage and use of a collection of solvers. A solver-oriented DSS is frequently equipped with more than one solver, and the user’s request indicates which is appropriate for the problem at hand. The collection of available solvers is often centered around some area of problems such as financial, economic, forecasting, planning, statistical, or optimization problems. There are two basic approaches for incorporating solvers into a DSS: fixed and flexible. In the fixed approach, solvers are part of the PPS, which means that a solver cannot be easily added to or deleted from the DSS nor readily modified. The set of available solvers is fixed, and each solver in that set is fixed. About all a user can choose to do is execute any of the PPS solvers. This ability may be enough for many users’ needs. However, other users may need to add, delete, revise, and combine solvers over the lifetime of a DSS. This flexibility is achieved when solvers are treated as pieces of knowledge in the KS. With this flexible approach, the PPS is designed to manipulate (e.g., create, delete, update, combine, coordinate) solvers according to user requests. In either case, the KS of a solver-oriented DSS is typically able to hold data sets. A data set is a parcel
of descriptive knowledge that can be used by one or more solvers in the course of solving problems. It usually consists of groupings or sequences of numbers organized according to conventions required by the solvers. For example, PPS editing capabilities may be used to create a data set composed of revenue and profit numbers for each of the past 15 years. This data set might be used by a statistics solver to give the averages and standard deviations. The same data set might be used by a forecasting solver to produce a forecast of next year’s profit, assuming a certain revenue level for the next year. In addition to data sets, it is not uncommon for a solver-oriented DSS to hold problem statements and report format descriptions in its KS. Because the problem statement requests permitted by the LS can be very lengthy, fairly complex, and used repeatedly, it may be convenient for a user to edit them (i.e., create, recall, revise them), much like pieces of text. Each problem statement indicates the solver and mode of presentation to be used in displaying the solution. The latter may designate a standard kind of presentation or a customized report. The format of such a report is knowledge the user specifies and stores in the KS.
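The flexible approach to solver management might be sketched as follows; the data set, the two solvers, and their names are invented for illustration and are not taken from any particular solver library.

```python
# A data set: descriptive knowledge organized for use by solvers (hypothetical figures).
data_set = {"revenue": [1.1e6, 1.3e6, 1.2e6, 1.5e6],
            "profit":  [0.2e6, 0.25e6, 0.22e6, 0.3e6]}

# Solvers held as manipulable pieces of procedural knowledge in the KS.
def averages_solver(data):
    return {name: sum(values) / len(values) for name, values in data.items()}

def naive_profit_forecast_solver(data, assumed_revenue):
    margin = sum(data["profit"]) / sum(data["revenue"])
    return assumed_revenue * margin

solvers = {"averages": averages_solver, "profit_forecast": naive_profit_forecast_solver}

# The user's problem statement names the solver; solvers can be added, deleted, or
# revised in this registry without rebuilding the PPS.
print(solvers["averages"](data_set))
print(solvers["profit_forecast"](data_set, assumed_revenue=1.6e6))
```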
5. Rule-Oriented DSSs The knowledge-management technique that involves representing and processing rules evolved within the field of artificial intelligence, giving computers the ability to manage reasoning knowledge. A rule has the basic form: If (premise), Then (conclusion), Because (reason). A rule says that if the possible situation can be determined to exist, then the indicated actions should be carried out for the reasons given. In other words, if the premise is true, then the conclusion is valid. The KS of a rule-oriented DSS holds one or more rule sets. Each rule set pertains to reasoning about what recommendation to give a user seeking advice on some subject. It is common for the KS to also contain descriptions of the current state of affairs, which can be thought of as values that have been assigned to variables. The problem processor for a rule-oriented DSS uses logical inference (i.e., reasons) with a set of rules to produce advice sought by a user. The problem processor examines pertinent rules in a rule set, looking for those whose premises are true for the present situation. This situation is defined by current state descriptions and the user’s request for advice. When the PPS finds a true premise, it takes the actions specified in that rule’s conclusion. This action sheds further
light on the situation, which allows premises of still other rules to be established as true, causing actions in their conclusions to be taken. Reasoning continues in this way until some action is taken that yields the requested advice or the PPS gives up due to insufficient knowledge in its KS. The PPS also has the ability to explain its behavior both during and after conducting the inference.
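A minimal sketch of the kind of forward-chaining inference described above, with hypothetical rules and state descriptions, might look like the following; a production rule engine in an actual DSS would be far more elaborate (e.g., conflict resolution, backward chaining, richer explanation facilities).

```python
# If/Then/Because rules (hypothetical reasoning knowledge).
rules = [
    {"if": lambda s: s.get("season") == "holiday",
     "then": ("demand", "high"),
     "because": "holiday seasons historically raise demand"},
    {"if": lambda s: s.get("demand") == "high" and s.get("inventory") == "low",
     "then": ("advice", "expedite replenishment orders"),
     "because": "high demand with low inventory risks stockouts"},
]

state = {"season": "holiday", "inventory": "low"}   # current state descriptions in the KS
trace = []                                          # supports explanation afterward

changed = True
while changed:                   # keep firing rules until no premise newly holds
    changed = False
    for rule in rules:
        variable, value = rule["then"]
        if rule["if"](state) and state.get(variable) != value:
            state[variable] = value                 # the conclusion sheds further light
            trace.append(rule["because"])
            changed = True

print("Advice:", state.get("advice"))
print("Because:", "; ".join(trace))
```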
B. Compound DSS Each of the foregoing special cases of the generic DSS framework supports a decision maker in ways that cannot be easily replicated by a DSS oriented toward a different technique. If a decision maker would like the kinds of support offered by multiple knowledge-management techniques, the options are to use multiple DSSs (each oriented toward a particular technique) or a compound DSS, which is a single DSS that encompasses multiple techniques. Just like a single-technique DSS, a compound DSS is a special case of the generic framework shown in Fig. 1. Its PPS is equipped with the knowledge-manipulation abilities of two or more techniques and its KS holds knowledge representations associated with each of these. An oft-cited type of compound DSS combines the database and flexible-solver techniques into a single system in which the KS is comprised of a “model base” and a database. Here, model base is the name given to the solver modules existing in a KS (i.e., procedural knowledge). The notion of data sets in a KS is replaced by a formal database (i.e., descriptive knowledge). Correspondingly, the PPS includes “model base-management system” software for manipulating solver modules in the model base portion of the KS by selecting those pertinent to a problem at hand and by executing them to derive new knowledge. This PPS also includes database management system software for manipulating data in the form of records held by the database portion of the KS by selecting what data are to be used by solver modules for the problem at hand. A “dialog generation and management system” is the name given to the PPS’s knowledge acquisition and presentation abilities; it is concerned with interpreting user requests, providing help, and presenting responses to a user. A widely used kind of compound DSS combines database and fixed-solver techniques. The database (relational or multidimensional in structure) may be a repository of real-time information or a data warehouse, which is an archive of data extracted from multiple MISs and DSSs. Warehouse contents are not
updated, but rather replaced periodically with a new composite of extracted data. Aside from offering ad hoc query facilities, the PPS has a built-in portfolio of solvers. Execution of these solvers is called on-line analytical processing (OLAP).
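To suggest how a model base and a database coexist in such a compound DSS, here is a small illustrative sketch (again using Python's sqlite3, with invented table and solver names); the PPS selects data from the database portion of the KS and feeds it to whichever solver module the user's request names.

```python
import sqlite3

# Database portion of the KS (descriptive knowledge).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE demand (month TEXT, units INTEGER)")
db.executemany("INSERT INTO demand VALUES (?, ?)",
               [("Jan", 400), ("Feb", 460), ("Mar", 520)])

# Model base portion of the KS (solver modules as procedural knowledge).
model_base = {
    "average_demand": lambda units: sum(units) / len(units),
    "trend": lambda units: units[-1] - units[0],
}

def run_model(name):
    """PPS: select data for the problem at hand and execute the chosen solver module."""
    units = [u for (u,) in db.execute("SELECT units FROM demand ORDER BY rowid")]
    return model_base[name](units)

print(run_model("average_demand"), run_model("trend"))
```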
C. Artificially Intelligent DSSs Artificially intelligent DSSs are systems that make use of computer-based mechanisms from the field of artificial intelligence (AI). Researchers in the AI field endeavor to make machines such as computers capable of displaying intelligent behavior, or behavior that would reasonably be regarded as intelligent if it were observed in humans. A cornerstone of intelligence is the ability to reason. This ability, in turn, represents a principal area of research in the AI field, concerned with the discovery of practical mechanisms that enable computers to solve problems by reasoning. An example is the inference engine, which lies at the core of the rule-management technique. Other AI advances finding their way into DSS implementations include natural language processing, machine learning, pattern synthesis, and pattern recognition. An example of the latter is data mining, which attempts to discover previously undetected patterns in large repositories of data (e.g., a data warehouse). Whereas OLAP is directed toward deriving knowledge that satisfies some goal, data mining is more exploratory in its attempt to discover knowledge not previously conceived.
D. Multiparticipant DSS Decision support systems that support multiple persons jointly involved in making a decision or a series of interrelated decisions are called multiparticipant DSSs (MDSSs). They are subject to all the generic features of DSS described previously. However, they have added features, making them suitable for supporting participants organized according to some structure of interrelated roles and operating according to some set of regulations. Multiparticipant DSS fall into two major categories: group decision support systems (GDSS) and organizational decision support systems (ODSS). A GDSS is devised to support situations involving little in the way of role specialization, communication restrictions, or formal authority differences among participants in a decision. An ODSS supports a decision situation in which participants play diverse roles, do not have completely open communication chan-
nels, and have different degrees of authority over the decisions. Cutting across the GDSS and ODSS categories is another class of systems known as negotiation support systems. A negotiation is an activity in which participants representing distinct (often conflicting) interests or viewpoints attempt to reach an agreement (i.e., joint decision) about some controversial issue. A negotiation support system (NSS) is intended to help the participants achieve an agreement.
1. The Generic MDSS Architecture As Fig. 2 shows, an MDSS has an LS, PPS, KS, and PS. Several kinds of users can interact with an MDSS: participants in the decision maker being supported, an optional facilitator who helps the participants make use of the MDSS, optional external knowledge sources that the MDSS monitors or interrogates in search of additional knowledge, and an administrator who is responsible for assuring that the system is properly developed and maintained. The MDSS itself is generally distributed across multiple computers linked into a network. That is, the PPS consists of software on multiple computers. The associated KS consists of a centralized knowledge storehouse and/or decentralized KS components affiliated with many computers. Public LS messages are the kind that any user is able to submit as a request to the MDSS. A private LS message is one that only a single, specific user knows how to submit to the MDSS. Semiprivate LS messages are those that can be submitted by a subset of users. When an MDSS does not provide a strictly public LS, some users are allowed to issue requests that are off limits to others or able to make requests in ways unknown to others. Similarly, a presentation system can have two kinds of messages: public and private. Public PS messages are those that do or can serve as responses to any user. A private PS message is a response that can go to only a single, specific user. A semiprivate PS message is a response available to some users. An MDSS that has only a public PS presents the same kind of appearances to all users. When private or semiprivate messages are permitted in a PS, some users are allowed to get knowledge that is off limits to others or able to see it presented in ways unavailable to others. A PS response can be triggered by a request from a user or by the PPS’s recognition that some situation exists (e.g., in the KS or in the environment). Figure 2 depicts a threefold classification of knowledge: knowledge about the system itself, knowledge about those with whom the system is related, and knowledge about the decision domain. Shown in the
Figure 2 A generic MDSS architecture.
upper right corner of Fig. 2, system knowledge includes knowledge about the particular infrastructure of which the MDSS is the technological part (i.e., knowledge about participant roles and relationships with which the MDSS must be concerned, plus knowledge of the regulations that it must follow, facilitate, or enforce), and knowledge about technical specifics of the computers involved and their network linkages. The remainder of the KS in Fig. 2 is partitioned into two vertical slices: domain knowledge shown on the left and relational knowledge shown on the right. Each of these categories can have public and private parts. Domain knowledge pertains to the subject matter about which decisions are to be made. It can involve
any mix of knowledge types (descriptive, procedural, reasoning). Some of this is public—available to be shared by all interested users. Other domain knowledge may be private, being accessible only to a particular individual user. Relational knowledge can also be public or private. It is concerned with characterizing the users of the MDSS, as distinct from roles they fill. Observe that the problem-processing abilities identified in Fig. 2 include all those shown in Fig. 1: knowledge acquisition, knowledge selection or derivation, and knowledge presentation. Depending on the MDSS implementation, each of these abilities can be exercised by a user doing some individual work and/or by all participants doing some collective work. As an example of
the former, a participant may work to produce a forecast as the basis for an idea to be shared with other participants. As an example of the latter, participants may jointly request the MDSS to analyze an alternative with a solver or provide some expert advice. The problem processor for an MDSS can have a participant-coordination ability. The coordination ability embodies the technological support for an organizational infrastructure’s regulations. It helps regulate the filling of roles, behaviors of roles, and relationships between roles. It draws heavily on the KS’s system knowledge; if there is little or no such knowledge, the coordination behaviors are programmed directly into the PPS. In the latter case, the PPS rigidly supports only one approach to coordination. In the former case, it begins to approach the ideal of a generalized problem processor—serving as a general-purpose tool for building a wide variety of MDSSs involving diverse coordination mechanisms. With such a tool, development of each MDSS is accomplished by specifying the coordination mechanism (along with other knowledge) in the KS.
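The public/private partitioning of an MDSS knowledge system can be sketched as a simple access check performed by the PPS's knowledge-selection ability; the participants, stored items, and dictionary layout below are purely illustrative assumptions.

```python
# Hypothetical MDSS knowledge system with public and private partitions.
knowledge_system = {
    "public": {"agenda": "select a supplier for part #204"},
    "private": {
        "participant_1": {"draft_forecast": "demand up 8% next quarter"},
        "participant_2": {"negotiation_notes": "supplier B open to a volume discount"},
    },
}

def retrieve(user, item):
    """Knowledge selection that enforces the public/private distinction."""
    if item in knowledge_system["public"]:
        return knowledge_system["public"][item]
    private = knowledge_system["private"].get(user, {})
    if item in private:
        return private[item]
    return None   # off limits to this user, or not stored at all

print(retrieve("participant_1", "agenda"))            # public: visible to everyone
print(retrieve("participant_2", "draft_forecast"))    # private to participant_1 -> None
```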
2. GDSS Group decision support systems have been the most extensively studied type of MDSS. They are a response to the perceived need for developing better ways to aid the group decision processes that are so commonplace today. The objectives of a GDSS are to reduce the losses that can result from working as a group, while keeping (or enhancing) the gains that group work can yield. There is a growing body of empirical research that has examined the effects of GDSS use on group performance. In broad terms, GDSS effects appear to depend both on situational factors and specific aspects of the technology itself. Three prominent situational factors are group size, task complexity, and task type. Research results consistently show that a GDSS increases group performance (versus not using a GDSS) more for larger groups than for smaller ones. Satisfaction of participants also tends to be greater as group size increases. There is evidence suggesting that GDSSs may be more appropriate for relatively complex decision tasks. For decision-related tasks of generating ideas, knowledge, alternatives, etc., GDSSs have been found to increase greatly participant performance and satisfaction. Researchers have also studied factors related to GDSS technology itself: anonymity, parallelism, structuring, and facilitation. The evidence is split on whether the participant anonymity allowed by a GDSS
yields better performance than lack of anonymity. It appears that the value of anonymity depends on the task being done and on the specific participants involved. Research has shown that enabling participants to work in parallel is a major benefit of GDSS technology. Investigation of structuring has found that group performance generally increases when a GDSS structures participant interactions. However, the specific structuring used must fit the situation; otherwise performance is impaired. Studies of using a facilitator versus not using one with a GDSS indicate that outcomes are better with a facilitator. Researchers have also found that repeated participant experiences with a GDSS tend to yield increased group performance.
3. ODSS The notion of an ODSS has been recognized for a long time in the DSS field. One early framework viewed an organizational decision maker as a knowledge processor having multiple human and multiple computer components, organized according to roles and relationships that divided their individual labors in alternative ways in the interest of solving a decision problem facing the organization. Each component (human or machine) is an intelligent processor capable of solving some class of problems either on its own or by coordinating the efforts of other components—passing messages to them and receiving messages from them. The key ideas in this early framework for ODSS are the notions of distributed problem solving by human and machine knowledge processors, communication among these problem solvers, and coordination of interrelated problem-solving efforts in the interest of solving an overall decision problem. Research has identified three main themes that ODSSs tend to have in common. First, an ODSS involves computer-based technologies and may involve communication technology as well. Second, an ODSS accommodates users who perform different organizational functions and who occupy different positions in the organization’s hierarchic levels. Third, an ODSS is primarily concerned with decisions that cut across organizational units or impact corporate issues. This third theme should not be regarded as a necessary condition for all ODSSs. However, it is particularly striking in the case of enterprise resource planning (ERP) systems. Although primarily touted as transaction handling and reporting systems, ERP systems have been shown by recent research to have substantial decision support benefits.
V. CONCLUSION
Over the past half century, advances in computer-based technology have had an important impact on the way in which organizations operate. In the new millennium, continuing advances will revolutionize the way in which we think about organizations and work. The very nature of organizations is changing from an emphasis on working with materials to an emphasis on working with knowledge. Work with material goods will increasingly be seen as a secondary or almost incidental aspect of an organization's mission. It will be little more than an automatic consequence of knowledge processing and resultant decisions. Furthermore, managing an organization's human and financial resources will also become exercises in knowledge management. The knowledge-management efforts of tomorrow's knowledge workers will be aided and supported by computers in ways described here, as well as in ways just beginning to be apparent. Computer coworkers will not only relieve us of the menial, routine, and repetitive, and not only be reactive, responding to users' explicit requests; they will also actively recognize needs, meet some of those needs on their own, stimulate insights, offer advice, and facilitate knowledge flows. They will highly leverage an organization's uniquely human skills, such as intuition, creative imagination, value judgment, and the cultivation of effective interpersonal relationships. The knowledge-management perspective of decision making and decision support is symptomatic of a major shift in the way in which organizations are viewed. This shift is still in a relatively early stage, but it will strongly affect the management of organizations, decision-manufacturing processes, decision support needs, and even the fabric of society. In a most fundamental sense, organizations will increasingly be regarded as joint human-computer knowledge-processing systems engaged in rich patterns of decision-making episodes. Human participants in these systems, from the most highly skilled to the least skilled positions, are knowledge workers whose decisional efforts will be supported by the increasingly powerful knowledge management capabilities of their computer-based counterparts: decision support systems.
ACKNOWLEDGMENT
Portions of this article have been reproduced by permission from Decision Support Systems: A Knowledge-Based Approach by Holsapple and Whinston, 1996, West Publishing Company.
SEE ALSO THE FOLLOWING ARTICLES Data, Information, and Knowledge • Decision-Making Approaches • Decision Theory • Executive Information Systems • Expert System Construction • Group Support Systems • Hybrid Systems • Knowledge Acquisition • Knowledge Management • Strategic Planning
BIBLIOGRAPHY
Blanning, R., et al. (1992). Model management systems. Information systems and decision processes (E. Stohr and S. Konsynski, eds.). Los Alamitos, CA: IEEE Computer Society Press.
Bonczek, R. H., Holsapple, C. W., and Whinston, A. B. (1981). Foundations of decision support systems. New York: Academic Press.
Dos Santos, B., and Holsapple, C. (1989). A framework for designing adaptive DSS interfaces. Decision Support Systems, Vol. 5, No. 1.
George, J. F. (1991). The conceptualization and development of organizational decision support systems. Journal of Management Information Systems, Vol. 8, No. 3.
Holsapple, C. W. (1995). Knowledge management in decision making and decision support. Knowledge and Policy, Vol. 8, No. 1.
Holsapple, C. W., Joshi, K. D., and Singh, M. (2000). Decision support applications in electronic commerce. Handbook of electronic commerce (M. Shaw, R. Blanning, T. Strader, and A. Whinston, eds.). Berlin: Springer-Verlag.
Holsapple, C. W., and Whinston, A. B. (1987). Knowledge-based organizations. The Information Society, Vol. 5, No. 2.
Holsapple, C. W., and Whinston, A. B. (1996). Decision support systems: A knowledge-based approach. St. Paul, MN: West Publishing Company.
Jacob, V., and Pirkul, H. (1992). Organizational decision support systems. International Journal of Man-Machine Studies, Vol. 36, No. 12.
Jessup, L., and Valacich, J., eds. (1993). Group support systems: New perspectives. New York: Macmillan.
Keen, P. G. W., and Scott Morton, M. S. (1978). Decision support systems: An organizational perspective. Reading, MA: Addison-Wesley.
Nunamaker, J., Jr., et al. (1989). Experiences at IBM with group support systems. Decision Support Systems, Vol. 5, No. 2.
Simon, H. A. (1960). The new science of management decision. New York: Harper & Row.
Sprague, R. H., Jr., and Carlson, E. D. (1982). Building effective decision support systems. Englewood Cliffs, NJ: Prentice Hall.
Decision Theory Herbert A. Simon Carnegie Mellon University
I. APPROACHES TO DECISION MAKING
II. ELEMENTS OF A THEORY OF CHOICE: THE ANATOMY OF DECISION MAKING
III. NORMATIVE THEORIES OF DECISION
IV. THE DECISIONS OF BOUNDEDLY RATIONAL ACTORS
V. DECISION THEORIES INCORPORATING LEARNING
VI. DECISION MAKING IN ORGANIZATIONS
VII. CONCLUSION

GLOSSARY
alternative generation Creating different potential courses of action among which a choice may be made.
bounded rationality Behavior that is goal-oriented, but only within the limits of human knowledge and of ability to predict and compute the consequences of particular courses of action.
game theory Theory of rational decision making with multiple actors whose goals may coincide or conflict or both, and who have complete or incomplete information about each other's actions and their consequences.
heuristic search Search for good courses of action that examines only a few possible paths, choosing these on the basis of previous experience or by rules of thumb that have been effective in finding good (not necessarily optimal) outcomes.
intuition in decision making Recognizing patterns that give memory access to information that is useful for searching selectively for good (satisficing) alternatives.
multi-agent decision making Decision making that involves the interaction and collaboration of several (often many) people.
optimizing Finding the objectively best alternative in a situation, according to some criterion. In complex real-life situations, people can seldom find the optimum, but must satisfice.
organizational identification Decision making aimed at furthering the organization's goals, often against self-interest. Identification derives from both cognition (focus of attention and information environment) and emotion (organizational loyalty).
rationality Choice of behavior that is appropriate to specified criteria (goals). It may be appropriate to the real-world situation (substantively rational) or to the situation as perceived by the actor (procedurally and boundedly rational).
satisficing Choice of action that meets criteria judged by the actor to satisfy aspirations and to be attainable, but that does not claim to be the best of all objectively possible actions.

DECISION THEORY is concerned with the ways in which people choose courses of action, or ought (rationally) to choose them; that is, it describes and explains human decision-making processes, and it evaluates the rationality of the processes and the goodness of the outcomes. Rationality refers to the appropriateness of the actions chosen to the goals toward which they are directed. This article will examine both empirical and normative theories of decision.
I. APPROACHES TO DECISION MAKING “Decision” and “choice” may refer only to the selection of an action from a given set of alternatives, or may also embrace the discovery of the alternatives from which to choose. Observation of decision processes in daily life and in organizations reveals that in making decisions, especially the important ones, most of the thought and effort goes into finding and
improving alternatives. This article considers how alternatives for decision are identified as well as how a choice is made among them. This article is especially concerned with how the limits of human knowledge, and the limits of human ability to search, calculate, and compare, affect the decision-making process. That is, it deals with the bounded rationality of people amidst the complexity of real-life decisions. Section II discusses the components of the decision-making process and their relations. Section III reviews major theories of decision that are primarily normative and prescriptive. Section IV examines theories that mainly describe and explain choice empirically, with attention to the boundedness of human rationality. Section V discusses the role of learning processes in effective decision making. Section VI examines the decisions made in organizations. The motivations, knowledge, and problem formulations of decision makers in hierarchical organizations of the kinds that conduct most economic and governmental affairs in a modern industrial society are greatly influenced by their organizational environments. Almost all of the normative theories of Section III assume that alternatives are given. Theories treated in Section IV, which also encompass the activity of generating alternatives, are usually called "problem solving," "design," or even "discovery" theories. These labels disguise, but do not remove, their concern with decision making. Research under the explicit label of "decision theory" has mostly been pursued in economics, statistics, operations research, and logic; research under the other labels, and with much more attention to the limits on human rationality, in cognitive psychology, marketing, engineering and architectural design, philosophy of science, and computer science. Until quite recently, there has been very little communication between these two groups of research communities and their literatures, an unfortunate chasm that this article seeks to bridge.
A. Two Descriptions of Choice The issues central to human decision making come sharply into focus when we contrast descriptions that approach it from opposite poles, one of them situated in behaviorist psychology, the other in economics and statistics—behavioral and global descriptions, respectively.
1. Behavioral Choice Theory Human behavior, as described by classical psychology, consists of responses to stimuli. A particular signal to an organism leads it to behave one way rather than another. Of course the state of the organism (e.g., whether it is hungry and what it knows about finding food), also affects the response to the stimulus (e.g., a sniff of prey). The response is triggered by the combined internal and external signals. Even an organism with multiple needs, but able only to take actions for satisfying one need at a time, can operate rationally to satisfy its many needs with little more machinery than that just sketched. It requires internal signals (thirst) that are transmitted when inventories of need-satisfying substances (water) are low, external signals that suggest directions of search for need-satisfying situations, and a mechanism that (1) sustains attention to sequences of acts aimed at satisfying an active need, and (2) tends to interrupt attention when another urgent need is signaled. A system with these characteristics is rational in choosing behaviors for surviving over indefinite periods of time. The scheme can be elaborated further, for example, to a traditional agricultural economy, where little deliberative choice has to be made even among sowing, cultivating, and harvesting because appropriate sensory stimuli (seasons, the stages of plant growth) focus attention at each time on the appropriate goal and the activities relevant to reaching it. In all of these scenarios, rationality means selecting actions that will meet currently signaled needs and signaling needs to replenish low inventories of the satisfying means. The system may incorporate various suboptimizations, but different wants are compared only when there is competition for attention. This kind of rationality, discussed in Section IV, is usually called procedural, and reflects the boundedness of human rationality.
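A minimal simulation can make this mechanism concrete. The sketch below is illustrative only and not drawn from the article; the needs, thresholds, and replenishment amounts are invented. Internal signals fire when an inventory falls below its threshold, attention goes to the most urgent need, and action replenishes it.

def most_urgent(inventories, thresholds):
    # Return the need whose inventory falls furthest below its threshold, or None.
    deficits = {need: thresholds[need] - level
                for need, level in inventories.items()
                if level < thresholds[need]}
    return max(deficits, key=deficits.get) if deficits else None

def live_one_period(inventories, thresholds, replenish=4, decay=1):
    # One time step: every inventory decays; the most urgent signaled need, if any, is acted on.
    for need in inventories:
        inventories[need] -= decay
    focus = most_urgent(inventories, thresholds)
    if focus is not None:
        inventories[focus] += replenish   # behavior aimed at the currently signaled need
    return focus

inventories = {"water": 5, "food": 8, "rest": 6}   # invented needs and starting levels
thresholds = {"water": 4, "food": 4, "rest": 3}
for period in range(6):
    print(period, live_one_period(inventories, thresholds), inventories)

No comparison across needs is made except through competition for attention, which is the sense of procedural rationality described above.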
2. Global Choice At the other extreme, neoclassical economics and mathematical statistics assume that choice considers much more than immediate internal and external stimuli and (formally) treat decision making as choice among alternative lives. As L. J. Savage, a major early contributor to this theory, put it: “. . . a person has only one decision to make in his whole life. He must decide how to live, and this he might in principle do once and for all.” This general notion is embedded in
a structure of choice consisting of a utility function that permits the expected utilities of all possible "lives" to be compared and the life with highest expected utility to be chosen. After making this claim of unbounded global rationality, Savage does hasten to replace it by the more modest claim that ". . . some of the individual decision situations into which actual people tend to subdivide the single grand decision do recapitulate in microcosm the mechanism of the idealized grand decision." Of course, the reason why the global view of rationality requires all of life to be considered in each decision is that actions today affect the availability not only of all other present actions, but of future actions as well. And the reason why this global procedure is only followed piecemeal, if at all, lies in the boundedness of human rationality: the inability of human beings (with or without computers) to follow it. Theories of rational decision vary widely in the attention they pay to the interrelatedness of choices at each moment and over periods of time. To the extent that they concern themselves with the optimal outcomes of choice in the real world independently of realizable choice processes, they are said to describe substantive rationality. The theories of Section III are largely theories of substantive rationality. In actual application, they tend to be applied to relatively small slices of a life and of life's decisions. To the extent that a substantive rationality model uses simplified submodels of the real world, and subdivides decisions in such a way as to cut real-world linkages, as it inevitably must, it provides only one approximation among many alternatives, becoming in its actual application another procedural theory.
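A toy calculation illustrates the global, expected-utility view in miniature; the "plans," probabilities, and utilities below are invented for illustration. Each alternative is a lottery over outcomes, and the one with the highest expected utility is chosen.

def expected_utility(lottery):
    # A lottery is a list of (probability, utility) pairs.
    return sum(p * u for p, u in lottery)

plans = {
    "steady career": [(0.9, 60), (0.1, 40)],     # invented outcomes and utilities
    "risky venture": [(0.2, 200), (0.8, 10)],
}
best = max(plans, key=lambda name: expected_utility(plans[name]))
print(best, {name: expected_utility(lottery) for name, lottery in plans.items()})

The computational and informational demands of doing this for "all possible lives" rather than for two invented plans are, of course, exactly what bounded rationality rules out.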
II. ELEMENTS OF A THEORY OF CHOICE: THE ANATOMY OF DECISION MAKING To understand decision making is to understand the nature and structure of actions, how potential actions originate, and the criteria of choice among actions. The examples of the previous section illustrate how wide a range of possibilities this framework encompasses.
A. The Structure of Actions A decision theory may deal with single actions, sequences of actions, permitting periodic revision, as new information becomes available, and actions in environments that contain other goal-oriented agents
(people, animals, or robots). The theory becomes progressively more complex as it proceeds from each of these stages to the next.
1. Discrete Decision-Action Sequences The simplest actions, one-time actions, are carried out without significant revision during execution. Even such actions may be complex (e.g., constructing a building), and arriving at decisions about the particulars of the action (e.g., designing the building) may involve gradual discovery and evaluation of alternatives through a process of search.
2. Temporal Sequences of Actions Complex decisions require a whole structure of choices. Even in design that precedes action, the action is first defined in very general terms, then its details are gradually specified. As information revealed during the latter stages of design requires reconsideration of the earlier stages, there is no sharp line between generating alternatives and evaluating them. Thus, early stages in designing a bridge might opt for a suspension bridge, but subsequent discovery of unexpected difficulties in foundation conditions might cause a switch to a cantilever design. A somewhat different scenario arises where choosing a present action will have continuing effects on the options for a whole sequence of future periods, but where the contingent decisions beyond the first can be revised at each successive decision point before new action is taken. Games like chess have this characteristic, as do investment decisions, and decisions about the rates of manufacturing operations. Each choice of a current action creates a new situation with consequences for the availability and outcomes of future actions. A player in chess is only committed to a single move at a time, but looks ahead at alternative successive moves and the opponent’s possible replies in order to evaluate the consequences of the immediate move. There is no commitment to subsequent moves after the opponent’s reply. The decision maker forms a strategy in order to evaluate and choose an initial action, but the remaining steps of the strategy can be revised as the situation develops. In the bridge design case, design decisions are reversible until the design is complete, but not (except for minor, and usually costly, corrections) after construction begins. All uncertainty about consequences is supposed to be resolved before that time. In the
case of chess, future actions remain reversible until actual execution time. Where future actions are revisable, the passage of time will resolve some of the uncertainties of the present, and this new knowledge can be used to improve the later actions.
3. Decision Making with Multiple Agents When outcomes depend on the simultaneous actions of other agents (people, animals, or robots), these other agents become an integral part of the uncertainty with which each agent is faced. Even when all are trying to achieve the same goal, each must try to guess (or learn) what the others will do before deciding what action is best for them (e.g., must predict on which side of the street the others will drive). The problem of multiple independent agents with partially competing goals was first attacked systematically by Cournot in 1838, in his theory of two-firm oligopoly, and again, nearly a century later, by Chamberlin in 1933, then in a far more general way by von Neumann and Morgenstern in their 1944 treatise on game theory. The principal contribution of these monumental theorizing efforts was to demonstrate that multi-person situations attack the very foundations of decision theory, for they destroy the widely accepted definition of rationality: The optimal (goal-maximizing) choice for each person is no longer determinable without knowledge of the choices made by the others. A plethora of alternative definitions of the criteria for rational decision have been proposed by game theorists in the past half century, with little resolution as to which are to be preferred, either for normative or descriptive theory. This problem will receive continuing attention throughout this chapter.
B. The Criteria or Goals for Decision In decision theory, the discussion of choice criteria has largely focused on two polar alternatives: optimizing, that is, selecting the best alternative according to some criterion; and satisficing, that is, selecting an alternative (not necessarily unique) that meets some specified standards of adequacy. Especially in the context of multiple agents, there has also been some discussion of variants of optimizing: minimaxing utility, or minimaxing regret.
1. Optimizing Except in the multi-agent case, it seems at first blush to be hard to fault optimization as a criterion for rational choice. How can one argue for action B if action A is better in terms of the agreed-upon criterion? A first difficulty, as already observed, is that in most multi-agent choice environments there are as many distinct utility functions as agents, so that "best" is not defined, leaving no consensus about the criterion for choice. Even when the definition of "best" is clear, computational complexity may hide the best alternative from people and from even the most powerful computers. In the finite game of chess (a space of perhaps 10^20 branches), it still remains computationally infeasible to insure the discovery of the optimal move. Yet chess is a game of perfect knowledge, where every aspect of the situation is "knowable" to both players. Games like bridge and poker, where each player has private information, add another important layer of complexity. In the case of a single agent, assuming a comprehensive utility function that embraces all dimensions of choice ignores the difficulty that people find, when placed in even moderately complex choice situations, in comparing utilities across dimensions (comparing apples with oranges). In the field of marketing, a substantial body of empirical evidence shows that consumers use a variety of choice procedures that avoid such comparisons. One common form of pseudo-optimization meets computational limits by abstracting from the real-world decision situation enough of its complexities so that the optimum for the simplified situation can be found. Of course this choice may or may not be close to the real-world optimum. For example, in making sequential decisions, dynamic programming becomes a computationally feasible method for choice if (and usually only if) radical simplifying assumptions can be made. Here "radical" could mean, for example, that the problem's cost functions can be closely approximated by sums of quadratic terms. Dynamic programming with a quadratic cost function has been used successfully to reduce costs by smoothing factory operations and inventory accumulations. This procedure simplifies the computations required for optimization by many orders of magnitude, and finds satisfactory, if not optimal, actions, provided that the quadratic approximation is not too far from the actual cost function. Moreover, with quadratic functions, only mean values of future uncertain quantities (e.g., sales) need be predicted, for the higher moments of the probabilities are irrelevant under these circumstances. A closely related method is used in computer chess programs: a function is defined for evaluating the "goodness" of positions, approximating as closely as
possible the probability that they will lead to a won game. Then, the legal moves and the opponent's legal replies are discovered by look-ahead search, as deep as the available time (regulated by tournament rules) permits. The estimated "goodness" is computed for each of the end branches of this tree of possible continuations; then each preceding move for the player is assigned the goodness of the best branch (maximum for the player), and each preceding move for the opponent is assigned the goodness of the opponent's best branch (minimum for the player), and so on, until a "best" choice is reached for the impending move. This procedure is called minimaxing. Pseudo-optimizing and minimaxing with approximate evaluation functions are not optimizations, but special cases of the satisficing methods that will be discussed below. If people had consistent and comprehensive utility functions and could actually compute the real-world optimum, pure reasoning, employing only knowledge of the utility function and of the external world, would allow people to achieve substantive rationality and economists to predict their behavior. There would be no need to study people's psychological processes. Neoclassical economics has generally behaved as if this research program were realizable. As soon as we acknowledge that global optimization is wholly infeasible in a world where human knowledge and computational capabilities are limited, and where judgments of utility are altered by shifts in attention and other internal psychological changes, a host of alternative approximate decision methods present themselves, and predicting human behavior requires a knowledge of which of these approximate methods can be and actually are used by people to arrive at their decisions. Later, we will see examples of many decision situations where, for example, the appearance of modern computers and of efficient algorithms for exploiting their computing powers has drastically changed the methods used to reach decisions. In its dependence on human knowledge, an empirical decision theory necessarily changes with the movements of history.
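The backed-up evaluation just described can be sketched in a few lines. The following depth-limited minimax routine is schematic: the game object and its legal_moves, apply, and evaluate methods are hypothetical stand-ins for a real engine's move generator and approximate "goodness" function.

def minimax(position, depth, maximizing, game):
    # Back up approximate leaf values: maximum at the player's own moves, minimum at the opponent's.
    moves = game.legal_moves(position)
    if depth == 0 or not moves:
        return game.evaluate(position), None
    best_value = float("-inf") if maximizing else float("inf")
    best_move = None
    for move in moves:
        value, _ = minimax(game.apply(position, move), depth - 1, not maximizing, game)
        if (maximizing and value > best_value) or (not maximizing and value < best_value):
            best_value, best_move = value, move
    return best_value, best_move

# Hypothetical usage: value, move = minimax(current_position, depth=4, maximizing=True, game=chess_model)

Because evaluate is only an approximation of the probability of winning, the procedure is a satisficing device dressed as an optimization, which is the point made in the text.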
2. Satisficing A large body of evidence shows that people rarely actually engage in optimization (except, as just mentioned, in pseudo-optimization after severe simplification of the full situation). Instead, they generate possible actions, or examine given ones until they find one that satisfices or that reaches an acceptable standard in terms of one or more criteria. Satisficing has a number of attractive characteristics for humans
whose knowledge and computational abilities are limited, and who demonstrably have difficulty in comparing actions that differ along several, and often among many, dimensions; for example, choosing among buying a Buick car, a sailing sloop, or an oil painting.
1. Satisficing does not require that all possible actions (usually hard to define in any operational way) be made available for comparison.
2. It does not require computing the consequences of actions with precision.
3. It does not require deriving a single utility ordering from multidimensional goals.
4. Unless the satisficing criteria are too strict, it enormously simplifies the computational task of finding an acceptable action.
With respect to 1, it is sometimes proposed to retain optimization but to incorporate a search for alternatives by comparing the expected cost of generating the next alternative with its expected advantage over the best existing alternative, and halting the process when it does not pay to search further. This proposal requires estimating both the cost of generating alternatives and the expected value of the improvement, imposing a large additional computational burden (which usually also calls for unavailable information). With respect to 3, observation of people selecting meals from restaurant menus shows how difficult choice is in a multidimensional space even when the dimensions are few. Empirical research, by Amos Tversky, Maurice Allais, and many others, has demonstrated that people do not possess a stable utility function over all of their wants and preferences that would enable them to choose consistently among unlike commodities. Choice depends heavily on context that directs attention toward particular aspects of the choice situation and diverts it from other aspects. This contingency of choice on the focus of attention is illustrated by, but extends far beyond, the well-known phenomenon of impulse buying. Linear programming has been, for a half century, a widely applied computational tool for making complex decisions. One reason for its practical importance is that it permits satisficing under the guise of optimizing. In linear programming, if there are n applicable criteria, then one (for example, cost) is selected for optimization, and satisfactory levels of the other n - 1 are set as constraints on acceptable actions, thereby wholly avoiding the comparison of goals, but of course at the expense of ignoring the
possibilities of substitutions with changes in price. In the restaurant example, this could lead one to order the cheapest meal meeting certain standards of taste and nutrition, or with a different choice of the dimension for optimization, to order the seafood meal under $25, not containing eel and with the least cholesterol. With respect to 4, comparing the task of finding the sharpest needle in a haystack with the task of finding a needle sharp enough to sew with illustrates how satisficing makes the cost of search independent of the size of the search space, although dependent on the rarity of satisficing solutions. In a world rich in infinite search spaces, this independence is essential if a search method is to be practicable. Few people examine all the world's unmarried members of the opposite sex of appropriate age before choosing a spouse. Satisficing is closely related to the psychological concept of aspiration level. In many situations, people set a plurality of goals or aspirations on the several dimensions of choice (for example, housing, careers, clothing, etc.) and use these to define a satisfactory lifestyle along each dimension. It is observed that when they reach their aspiration on any dimension, the aspiration tends to rise somewhat, and when they continue to fail to reach it for some time, it tends to decline. Trade-offs can reallocate income among dimensions without requiring a unified utility function, making comparisons of level only on single dimensions. Such an apparatus for decision making amounts to a linear programming or integer programming scheme, with the constraints adjusting dynamically to what is attainable. No special dimension need be selected for maximization or minimization. The gradual adjustments of aspiration levels will bring about near uniqueness of available satisfactory alternatives, and will take advantage of improving environments and adapt to deteriorating ones. Which alternative will be selected is likely to be highly path dependent, varying with the history of changes in the environment and the decision maker's responses to them.
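A rough sketch of satisficing with an adapting aspiration level follows; the alternatives and the adjustment rule are invented for illustration. Search stops at the first alternative that meets every aspiration, aspirations drift upward after success, and they drift downward after repeated failure.

def satisfice(alternatives, aspirations, patience=5, step=0.05):
    # Examine alternatives one at a time; accept the first that meets every aspiration.
    failures = 0
    for alternative in alternatives:
        if all(alternative[d] >= level for d, level in aspirations.items()):
            for d in aspirations:
                aspirations[d] *= 1 + step      # success: aspirations rise a little
            return alternative
        failures += 1
        if failures >= patience:                # persistent failure: aspirations decline
            for d in aspirations:
                aspirations[d] *= 1 - step
            failures = 0
    return None

candidates = [{"price": 0.4, "quality": 0.9},   # invented alternatives scored on two dimensions
              {"price": 0.7, "quality": 0.6},
              {"price": 0.8, "quality": 0.8}]
print(satisfice(iter(candidates), {"price": 0.75, "quality": 0.7}))

Note that no overall utility is ever computed; comparisons are made only against the aspiration on each dimension separately.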
C. Search Processes Where the alternatives are not given at the outset, they must be discovered or designed by a search process, often called a design, problem solving, or discovery process. When the space of alternatives is large, or when uncertain consequences extend into the indefinite future, search may have to be highly selective, examining only a small fraction of the possibilities.
The space of conceivable alternatives is seldom small enough to be searched exhaustively.
1. Selective Search Some puzzle-like problems whose search space is almost trivially small are, nevertheless, quite difficult for people. For example, in the Missionaries and Cannibals puzzle, which involves carrying a group of people across a river in a boat of limited size requiring multiple trips, and with the proviso that cannibals must never be allowed to outnumber missionaries, there are only about 20 legal problem states. Yet intelligent people, on their first encounter with the problem, often take half an hour to solve it. In this puzzle one or more essential moves are counterintuitive, appearing to retreat from the final goal, instead of approaching it. In the subject's early attempts to solve the problem, these counterintuitive moves are usually not even considered. In most real-world decision situations, however, the number of potential alternatives is very large and often infinite. These call for generating, with moderate computation, a small menu that contains at least one satisfactory alternative. A large part of problem-solving theory is concerned with describing powerful search heuristics for selecting promising paths. The substitution of satisficing for optimizing is an important heuristic of this kind.
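For contrast with human difficulty, a brute-force breadth-first search disposes of the puzzle almost instantly, which underlines how small the legal state space is. The sketch below (three missionaries, three cannibals, a boat holding two) is illustrative and not drawn from the article.

from collections import deque

def safe(m, c):
    # Missionaries are never outnumbered on either bank (m, c count the starting bank).
    return (m == 0 or m >= c) and (3 - m == 0 or 3 - m >= 3 - c)

def solve():
    start, goal = (3, 3, 1), (0, 0, 0)          # (missionaries, cannibals, boat) on the starting bank
    frontier, seen = deque([(start, [start])]), {start}
    while frontier:
        (m, c, b), path = frontier.popleft()
        if (m, c, b) == goal:
            return path
        for dm, dc in [(1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]:   # possible boat loads
            m2, c2 = (m - dm, c - dc) if b else (m + dm, c + dc)
            if 0 <= m2 <= 3 and 0 <= c2 <= 3 and safe(m2, c2):
                state = (m2, c2, 1 - b)
                if state not in seen:
                    seen.add(state)
                    frontier.append((state, path + [state]))

print(len(solve()) - 1, "crossings in the shortest solution")

The shortest solution takes 11 crossings; the machine, unburdened by the intuition that every move should approach the goal, simply enumerates them all.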
2. Search under Uncertainty In repeated decisions, the purpose of search goes beyond finding good initial alternatives and includes assessing future consequences of a choice. For example, in chess, it is trivial to generate the twenty or thirty legal moves that face a player at any moment; the formidable computational challenge is to evaluate, usually by generating a tree of subsequent replies and moves, which of these initial moves is better. As the outcomes of future or present moves are seldom known with certainty, computer chess programs, as we have seen, assign to each terminal position in their search a value estimated from the numbers, kinds, locations, and mobilities of the pieces of each player. The scheme can also allow some kind of preference for lower or higher risk due to uncertainty. This approximate value is then used, as previously described, to work backward, minimaxing to evaluate the current move. The fundamental differences between uncertainty about nature and uncertainty about the behavior of other actors have already been discussed.
Where uncertainty resides in nature, a common procedure is to maximize expected value, with perhaps some bias for risk. Minimaxing in the face of nature's uncertainty is equivalent to regarding nature as malevolent and preparing for the worst. In the case of uncertainty about human actors, goals may range from the completely complementary to the wholly opposed, creating considerable difficulty in defining in any objective way how other actors will respond to a decision maker's choice, hence making it hard to choose a unique criterion for rational choice.
D. The Knowledge Base The decision process must use its knowledge about the external environment; but in virtually all real-world situations, this knowledge is only a crude approximation to reality. Knowledge may be obtained by sensing the environment directly, by consulting reference sources, or by evoking information previously stored in some memory. To use any of these sources it must be accessed: sensory information, by focusing attention on specific parts of the stimulus; external reference sources, with the aid of more or less elaborate data-mining processes; and memory, using recognition processes to locate and bring to awareness relevant stored information. An operational theory of decision making must distinguish between the knowledge potentially available and the knowledge actually accessible during decision processes that use these modes of information retrieval. Research has shown that a fundamental basis of expertise is the extensive use of memory, accessed by recognition of familiar patterns in the material being attended. These recognitions evoke already stored information relating to the patterns. The expert solves by recognition, and thereby by use of previous knowledge and experience, many problems or problem components that less expert persons can only solve (if at all) by extensive and time-consuming search. Much, and perhaps most, of the sudden "aha's," "insight," "intuition," and "creativity" that are characteristic of expert behavior result from the recognition of familiar patterns. Both expert and novice behavior combine heuristic search with recognition of patterns. What mainly distinguishes expert from novice is the availability to the expert of a vastly bigger repertoire of patterns and associated information, making pattern recognition a much larger component, and search a smaller component, of expert than of novice problem-solving behavior.

III. NORMATIVE THEORIES OF DECISION
Formal decision theory began mainly as a normative science, initially providing advice to gamblers about good strategies. The early exchanges on this topic, in 1654, between Blaise Pascal and Fermat, and Jakob Bernoulli's treatise on The Art of Conjecture in 1713 are key events in these beginnings. Today, the theory exists in many versions; a number of the most important are described here.

A. Classical Decision Theory
Classical decision theory has retained its early shape, but with continuing debate about the interpretation of probability and the means to be used for estimating it. The main line of development has led to the theory of maximizing expected utility. Both frequency theories of probability and subjective theories of probability as "degree of warranted belief" have been entertained throughout the history of decision theory. Probability takes care of natural contingencies that may alter the consequences of choice. Frequency interpretations can be used when empirical evidence is available for estimating the probabilities, especially when there is enough evidence so that the law of large numbers assures close approximation of observed frequencies to probabilities. As already noted, when uncertainties involve not only the natural environment but also the behavior of actors other than the decision maker, the situation becomes more complex, because all actors may be seeking to adjust their behaviors to the expected behaviors of their collaborators and competitors. In some special cases, however, like the economic theory of perfect competition, these complexities are absent. Suppose there exists a price for a commodity at which the total quantity that will be offered by profit-maximizing sellers is equal to the total quantity that will be demanded by utility-maximizing buyers. A seller who supplies a larger or smaller quantity will lose profit and a buyer who buys more or less will lose utility, so no one has an incentive to alter his/her behavior, and the equilibrium of the market (under certain assumptions about the dynamical processes for reaching equilibrium) is stable. Of course this stability depends critically upon the assumption of perfect competition, and uniqueness is not guaranteed without additional assumptions, or anything but local optimality for each buyer and seller. The Nash equilibrium, the generalization of this result to any situation where no participant has an
incentive to change behavior as long as the others maintain theirs, describes one of the few cases where the definition of rationality under optimizing assumptions is not problematic in the presence of more than a single agent. An application outside economics is the traffic flow problem, where cars are proceeding independently through a network of highways. In this case, we can usually expect one or more equilibrium distributions of traffic among the different highways such that no single motorist could, on average, shorten trip time by changing route as long as the others maintained their patterns. There is no guarantee that alternative equilibria are equally efficient. A particular equilibrium pattern might be improved, for most or even all drivers, if they could all shift simultaneously to another pattern, but no driver has a motive to shift pattern without changes by the others; satisfaction of the conditions for a Nash equilibrium guarantees only a local optimum. In particular, it is not generally possible to reach global optima by market forces alone without auxiliary processes and institutions that coordinate individual behaviors to avoid inferior local optima.
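The "no incentive to change" condition can be checked mechanically for small games. The sketch below uses invented payoffs for the which-side-of-the-road coordination game mentioned earlier: a cell is a pure-strategy Nash equilibrium when neither player can gain by switching alone, and this example has two such equilibria, echoing the point that equilibria need not be unique.

def pure_nash(row_payoffs, col_payoffs):
    # A cell (i, j) is an equilibrium if neither player can gain by deviating alone.
    equilibria = []
    rows, cols = len(row_payoffs), len(row_payoffs[0])
    for i in range(rows):
        for j in range(cols):
            row_ok = all(row_payoffs[i][j] >= row_payoffs[k][j] for k in range(rows))
            col_ok = all(col_payoffs[i][j] >= col_payoffs[i][k] for k in range(cols))
            if row_ok and col_ok:
                equilibria.append((i, j))
    return equilibria

# Invented payoffs: both drivers keep left (0) or both keep right (1); mismatches are costly.
row = [[1, -1], [-1, 1]]
col = [[1, -1], [-1, 1]]
print(pure_nash(row, col))   # [(0, 0), (1, 1)]: two equally good equilibria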
B. Statistical Decision Theories; Acceptance of Hypotheses The standard procedures used to interpret statistical findings and to base decisions upon them are closely related to classical decision theory. Initially, the problem was conceived as that of computing a strength of justifiable belief, sometimes expressed as the probability of the truth of a hypothesis. Thus, if the question were whether a fertilizer made sufficient improvement in the yield of a crop to justify the cost of applying it, experimental data would be used to compute probability distributions of the yields with and without the fertilizer. If there was "reasonable certainty" that the yield with the fertilizer was enough greater than the yield without it to cover the extra cost, it would be rational to apply it. But what was reasonable certainty? An early answer, called a "test of statistical significance," was: If the probability is "high" that the difference of yield is not enough to cover the cost, then don't use the fertilizer. That simply raised the new question: What is high? In contemporary statistical practice, this question is often answered in purely conventional terms. By long custom, more than 1 chance in 20 (0.05), or more than 1 chance in 100 (0.01), has come to be regarded as high. If the probability is greater than 0.01 or 0.05 that the fertilizer's effects will not cover its
cost, then its effect is not "statistically significant," and it should not be used. Although there is no rational basis for this rule, in some domains of science and application it is deeply entrenched. A step forward was taken by J. Neyman and E. S. Pearson, who pointed out that one could err either by using the fertilizer when it was uneconomical or by failing to use it when it more than covered its cost; consequently the decision maker should compare these "errors of type 1 and type 2." For example, if, prior to the field experiment, the evidence indicated a 50–50 chance that the fertilizer would be cost-effective, then (if one accepts Bayes' Principle), the fertilizer should be used whenever the mean added crop yield is worth more than the cost of fertilizer. This is quite different from odds of 0.05 or 0.01. A next step forward, taken by Abraham Wald in 1950, was to argue that net costs should be assigned directly to errors of type 1 and type 2, that these costs should be weighted by the probabilities of committing the errors, and that the alternative with the largest expected balance of benefits over cost should be chosen.
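Wald's expected-cost reasoning can be illustrated with a small, hypothetical version of the fertilizer example; the probability and costs below are invented. Each error type is assigned a cost, the costs are weighted by the probabilities the evidence supports, and the action with the smaller expected loss is chosen.

def expected_loss(action, p_pays, cost_type1, cost_type2):
    # Type 1 error: use the fertilizer when it does not pay; type 2: skip it when it would pay.
    if action == "use":
        return (1 - p_pays) * cost_type1
    return p_pays * cost_type2

p_pays = 0.6          # invented: probability, given the trial data, that added yield covers the cost
losses = {action: expected_loss(action, p_pays, cost_type1=100, cost_type2=150)
          for action in ("use", "skip")}
print(min(losses, key=losses.get), losses)

Unlike the conventional 0.05 rule, the recommendation here changes whenever the relative costs of the two errors change, which is the substance of Wald's argument.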
C. Linear and Integer Programming Linear and integer programming were mentioned in Section II as methods for dealing with multidimensional goals. They supplement a one-dimensional measure of utility with an unlimited number of linear constraints to represent the other dimensions. The alternative is selected that maximizes the chosen dimension of utility while satisfying all of the constraints. The optimal diet problem, described earlier, illustrated how this procedure avoids comparing the relative importance of incommensurable constraints, for all of them have to be satisfied. Linear programming (LP) has a further important property: the alternatives that satisfy all the constraints form a convex set. Given one point in the set, any point in it with a higher utility can be found without any backtracking. The popularity of LP was quickly established after 1951, when George Dantzig introduced a powerful search algorithm, the simplex method, that exploits this convexity property and generally finds the optimum with acceptable amounts of computation even for problems containing thousands or tens of thousands of constraints. As soon as powerful electronic computers became available, LP became a practical tool for making many complex managerial decisions: to take two classical examples, blending crude oils in the petroleum industry, and mixing commercial cattle feeds, which allocate billions of dollars of raw materials in our economy each year. In this case, bounded human rationality (aided by computers) can be reconciled with optimization. But this is only achieved by LP after the problem has been redefined as a one-variable maximization problem subject to linear constraints. Moreover, the procedure assumes that the model embodies all the real-world consequences of the decision it is making, which is usually far from the case. Here as elsewhere, the real-world situation must be drastically simplified to fit the formal tools. The procedure may satisfice in the real world by optimizing in an approximation to that world; it may even come close to optimizing, although we cannot usually verify this. The story of integer programming (IP) is similar. Many real-world problems require integer solutions (you cannot equip a factory with 3.62 machines, but must settle for 3 or 4). A variant of LP, which searched for the optimal integral solution, was introduced by Ralph Gomory in 1958, and increasingly powerful methods for finding such solutions have steadily appeared. Even when IP cannot find an optimum, it can often reach satisficing solutions substantially better than those attainable with less powerful methods.
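The satisficing-as-constraints idea can be seen in a toy diet-style LP: one dimension (cost) is minimized while every other dimension only has to clear a threshold. The sketch below uses SciPy's linprog; the foods, nutrient values, and requirements are invented.

from scipy.optimize import linprog

cost = [2.0, 3.5, 1.5]              # invented price per unit of three foods
protein = [10, 20, 5]               # protein per unit
calories = [300, 200, 400]          # calories per unit
need_protein, need_calories = 50, 2000

# linprog minimizes cost @ x subject to A_ub @ x <= b_ub, so ">=" requirements are negated.
A_ub = [[-p for p in protein], [-c for c in calories]]
b_ub = [-need_protein, -need_calories]

result = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3, method="highs")
print(result.x, result.fun)         # chosen quantities and the minimized cost

No comparison is ever made between the "importance" of protein and calories; they simply appear as constraints that any acceptable solution must satisfy.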
D. Dynamic Programming Similar lessons have been learned with dynamic programming, whose computations are usually impracticable for problems of any generality. Systems of dynamic equations are solvable in closed form only in quite special cases, notably for linear systems with constant coefficients. The problem becomes much worse when uncertainty is introduced, for then a multitude of possible outcomes must be considered, increasing exponentially with the distance into the future the system peers. Again, we have seen that computationally tractable approximations can be obtained by making the further simplifying assumption that the costs can be approximated by sums of quadratic terms. But this approximation yields a second important simplification: now only the expected values of the probabilities remain in the equations, so that only mean values over all outcomes have to be estimated for each period. These mean values are called certainty equivalents. The standard deviations and higher moments of the probability distributions now do not have to be estimated at all. There is no guarantee, of course, that particular real-world situations can be approximated adequately
in these ways, but these forms of approximation illustrate the advantages that can be gained in many situations by optimizing a gross approximation (i.e., satisficing) instead of attempting to optimize a realistic detailed model of the situation.
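A toy production-smoothing example suggests how the approximation works: uncertain demand is replaced by its mean (a certainty equivalent), and a backward recursion over a small inventory grid plans production under quadratic smoothing and holding costs. All numbers below are invented, and the grid-based recursion stands in for the closed-form quadratic solution discussed above.

mean_demand = [4, 6, 5]                        # expected sales per period (invented)
grid = range(0, 11)                            # feasible inventory and production levels

def stage_cost(production, inventory_after):
    # Quadratic smoothing cost on production plus a quadratic holding cost around a target of 3.
    return 1.0 * production ** 2 + 0.5 * (inventory_after - 3) ** 2

value = {level: 0.0 for level in grid}         # terminal value function
policies = []
for t in reversed(range(len(mean_demand))):
    new_value, policy = {}, {}
    for inventory in grid:
        best_cost, best_production = None, None
        for production in grid:
            nxt = inventory + production - mean_demand[t]   # demand replaced by its mean
            if nxt in value:
                cost = stage_cost(production, nxt) + value[nxt]
                if best_cost is None or cost < best_cost:
                    best_cost, best_production = cost, production
        new_value[inventory], policy[inventory] = best_cost, best_production
    value = new_value
    policies.insert(0, policy)

inventory = 3
for t, policy in enumerate(policies):
    production = policy[inventory]
    print(f"period {t}: start inventory {inventory}, plan production {production}")
    inventory = inventory + production - mean_demand[t]

Only the mean of each period's demand enters the recursion; under the quadratic assumption the higher moments of the demand distribution would not change the decision, which is why they need not be estimated.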
E. Game Theory Game theory in situations with multiple actors has already been discussed in Section II.A.3 and little needs to be added. The attempt to build a general normative theory of what constitutes rationality in the choice of moves in n-person games has failed, not through any lack of talent among the researchers who tackled this problem, but from its fundamental intransigence. Progress can be made only by extending the theory to encompass both the degree of compatibility or incompatibility of the agents' goals and their means of communication and coordination of information and behaviors. Here, useful definitions of rationality in terms of optimization are unlikely to be found, both because of goal conflict and competition in multi-agent situations and because of the complexity of these situations when communication and coordination mechanisms are brought into the picture. But the matter need not be described quite so pessimistically. There has recently been, especially in experimental game theory, increasing exploration of satisficing solutions for games that allow most or all agents to reach satisfactory levels of aspiration. There are innumerable situations in the world today where ethnic, religious, national, and other divisive loyalties have created "unsolvable" social problems, creating a great need for normative procedures for arriving at satisficing arrangements that will be acceptable (if sometimes only minimally) to those engaged in such struggles. In situations like these, we cannot afford to sacrifice the attainable satisfactory for the unattainable "best."
F. The Future of Normative Theories For more than a century, the problem of defining rational decision making has been studied intensively with the help of powerful formal tools. That study arrived, first, at a body of theory focused on maximizing goal attainment, where goals are summarized in a hypothesized multidimensional utility function. At a second stage, uncertainty was brought into the picture, adding severely to the informational and computational demands upon the theory.
Then, the theory was gradually extended to multi-agent systems, not only introducing new computational problems, but also making it extremely difficult to reach consensus even on the definition of "rationality," except in a few special cases. In this literature, there were only a few attempts to deal with the generation of new alternatives. Alternative generation, if considered at all, was viewed as a search balancing costs of locating new alternatives against their expected contribution to improving decisions; but (1) this approach calls for estimating search costs and benefits, only adding to an already excessive computational burden, and (2) it does not address the processes of alternative generation or the nature of the alternatives discovered, but simply hypothesizes a cost function for the process. Although decision theorists are very far from solving these problems, they have learned much about the structure they are seeking. More and more, they are engaging in experimental and other behavioral studies to deepen their understanding of how people actually make decisions. As one consequence, the normative theory of decision is now making contact with positive, empirical study to an unprecedented degree. It has steadily become clearer that even a normative theory must give major attention to the cognitive abilities and limitations of the human agents who make the decisions, and to the capabilities and limitations of their computer aides. The next section of this article turns to the empirical theory of decision making, but also explores the implications of the empirical theory for constructing a more general and viable normative theory that can be applied to practical affairs. For, as decision making is a goal-oriented, hence normative, activity, there is no great distance between an empirically supported positive theory of human decision making and a normative theory that respects bounded rationality and is applicable to real situations.
IV. THE DECISIONS OF BOUNDEDLY RATIONAL ACTORS The developments discussed in this section have mostly taken place in cognitive psychology (problem solving and learning theory), engineering and architecture (design theory), evolutionary theory (natural selection), and history and philosophy of science (scientific discovery). Cognitive science has emphasized the empirical side, whereas design is basically a normative endeavor, but one that must be highly sensitive to implementability, hence to the bounded rationality of people and machines.
Although there is a considerable consensus about the main features of these forms of decision theory, they are best discussed in relation to their several disciplinary origins.
A. Problem Solving Problem solving, which provided a major focus for research during the first decades of artificial intelligence and the so-called “cognitive revolution,” remains a very active domain of study. Early research studied mainly puzzle-like problems where the solver did not require special information; emphasis was upon search strategies and the heuristics for achieving selectivity. The main exception was research on chess, which discovered the powerful role played by recognition and knowledge retrieval as an aid to expert search. When given a problem, human subjects typically formulate a problem space to characterize the kinds and numbers of objects involved, their properties and the relations among them, and the actions that can be taken to change one situation into another while searching for one that satisfies the goal conditions. Subjects usually try to measure the “closeness” of the current problem situation to the goal in order to select hill-climbing actions that move closer to the goal. A more selective and quite common search procedure seeks solutions by means-ends analysis. Subjects gradually learn that certain differences between current situation and goal situation can be removed by applying particular operators. They then can compare current situations with goal situations to discover the differences and apply the operators associated with these differences. In many problem domains, this means-ends process can solve problems by removing differences successively. With the achievement of a rather rich theory of problem solving in puzzle-like domains, research moved on to domains where solution depended heavily upon special domain knowledge, and where search heuristics were much more domain specific. One typical heuristic for the game of chess is the rule: “If there is an open file (sequence of empty squares crossing the board), consider placing a rook on it”—i.e., when you notice a particular pattern on the board give serious consideration to a particular move. It has been found that high-level expert chess players (masters and grandmasters) hold in memory a quarter million or more patterns of pieces (each containing from two or three to a dozen or more pieces) that are seen
in games, and that they associate with each pattern heuristic information, as in the example of the rook move, that guides the choice of moves. Even more recently, two other directions of research on problem solving have gained prominence: (1) the study of the acquisition of problem-solving skills; and (2) the extension of research to more loosely structured problem domains like architectural design, scientific discovery, and even drawing and painting. Problem-solving theory almost always takes account of uncertainty about the environment and about the consequences of actions, but less frequently (with the notable exception of research on chess and other games) uncertainty about the behavior of other agents.
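Means-ends analysis, as described above, can be sketched in a few lines: detect a difference between the current state and the goal, look up the operator associated with that difference, apply it, and repeat. The tiny travel-planning domain below, with its differences and operators, is invented for illustration.

def means_ends(state, goal, operators, differences, max_steps=10):
    # Repeatedly find the most important remaining difference and apply the operator that removes it.
    trace = []
    for _ in range(max_steps):
        difference = next((d for d in differences if d(state, goal)), None)
        if difference is None:
            return trace                        # no remaining differences: the goal is reached
        name, apply_operator = operators[difference]
        state = apply_operator(state)
        trace.append(name)
    return trace

# An invented travel-planning domain: differences and the operators associated with them.
far_away = lambda s, g: abs(s["location"] - g["location"]) > 100
nearby = lambda s, g: 0 < abs(s["location"] - g["location"]) <= 100
no_ticket = lambda s, g: far_away(s, g) and not s["ticket"]

differences = [no_ticket, far_away, nearby]     # checked in order of importance
operators = {
    no_ticket: ("buy ticket", lambda s: {**s, "ticket": True}),
    far_away: ("fly", lambda s: {**s, "location": s["location"] + 500}),
    nearby: ("take taxi", lambda s: {**s, "location": s["location"] + 50}),
}

print(means_ends({"location": 0, "ticket": False}, {"location": 550}, operators, differences))

Search is highly selective because only the operator associated with the noticed difference is considered at each step, rather than every applicable action.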
B. Design Theory Designing systems requires innumerable decisions about system components and the relations among them. Since the initial recognition of the capabilities of computers to aid in design, and even to automate design processes, there has been a rapid development of design theory, focusing upon the generation and evaluation of alternatives. Today, the theory is pursued in computer science and cognitive science as well as in architecture and all of the engineering disciplines. A notable early set of programs were written in 1955–1956 in FORTRAN by engineers at Westinghouse to design motors, generators, and transformers automatically from customers’ specifications, producing designs that went directly to the manufacturing floor. The programs could design about 70% of the devices ordered that were not already shelf items. They were modeled on procedures already regularly used by the company’s engineers: find products already designed that are similar to the customer’s order; use known function and parameter tables to modify the design to fit the customer’s specifications, and apply simple optimization methods to improve device components. Today, we see increasing numbers of programs that assist and collaborate with human designers. Computer-aided design (CAD) and computer-aided manufacture (CAM) are progressing from an aid to draftsmen and schedulers to an increasingly automated component of the design process itself. A powerful genetic algorithm can design electrical circuits of high quality (e.g., low-pass filters with specific properties) with moderate computational effort. Design
programs are used routinely in the chemical industry to discover and to assist in discovering reaction paths for synthesizing industrial chemicals and biochemicals. These are just examples from a sizeable population of design programs in current use. These programs have required their inventors to develop the theory of the design process, which is no longer simply an intuitive "skill" to be taught at the engineering drawing board. The theory applies principles of pattern recognition and heuristic search like those that have emerged from the empirical study of human decision making and problem solving. To be sure, when computers participate in design, their capabilities for rapid large-scale computation and for rapid storage of huge amounts of information (both far beyond human capabilities) are exploited, altering the balance between "brute force" and selective search. But in view of the magnitude of typical practical design problems, major use must be made in these programs of the principles that guide human problem solving: in particular, evocation of knowledge by recognition, highly selective search, and satisficing.
C. Theory of Discovery A central theme in the research on decision-making and problem-solving processes has been to expand the domain of the investigation continuously to new areas of knowledge and skill, beginning with well-structured and simple puzzle-like problems, soon moving to problems where domain knowledge is important, and gradually to problem areas that are poorly structured and where "intuition," "insight," and "creativity" come into play. One important area in this last category is scientific discovery. Scientific discovery employs a collection of diverse processes that include discovering lawful regularities in data, planning and designing experiments, discovering representations for data, and inventing scientific instruments. Each of these activities is itself a complex decision process. This section provides two illustrations, from a much larger number, of how evidence from the domain of scientific discovery contributes to the theories of procedural (satisficing) rationality. The first illustration concerns law discovery: to discover a descriptive scientific law is to find a pattern in a body of data; to discover an explanatory law is to show how the pattern in the data can be explained in terms of more detailed processes. Thus, Kepler's Third Law, P = aD^(3/2), where P is the period of revolution of a planet about
the sun, and D is its mean distance from the sun, is a descriptive law. Newton later explained it by deducing it mathematically from the laws of gravitational attraction. A law-discovery process, BACON, which has shown great power in finding historically important descriptive laws, is a very simple mechanism for choosing, one-by-one, patterns that might fit the data, testing for the goodness of fit, and if the fit is unsatisfactory, using information about the discrepancy to choose a different function. When one is found that fits adequately, a law has been discovered. BACON uses the method of generate-and-test, with heuristics to guide generation of prospective laws. Its law generator is responsive to discrepancies between the hypotheses it produces and the data that guide it, using the feedback to shape the next alternative it generates. BACON frequently finds laws after generating only a few candidates, and has rediscovered more than a dozen historically important laws of physics and chemistry with no information other than the data to be explained, and with no changes in the program from one example to another. BACON satisfices, terminating activity upon finding a pattern that meets its criterion of accuracy. When the standard is too low, it can find illusory “laws,” as Kepler did in his first hypothesis about the revolution of the planets (P ∝ D^2), which he corrected a decade later. (With a stricter criterion, BACON finds this same “law” as the second function it tries, but rejects it.)

The second illustration concerns experimentation, which, along with empirical observation, is a major activity in science, generally regarded in the philosophy of science as a means for testing hypotheses. In recent years, there has been growing attention to experiments and observation as the major means for generating new hypotheses. New and surprising phenomena, whatever motivates their discovery, then initiate efforts to explain them. The work of the great experimenter and theorist, Michael Faraday, provides striking examples of these processes. Faraday, in about 1821, conjectured, from a vague belief in symmetry between electricity and magnetism, that, as currents created magnetic fields, magnetic fields “should” create adjacent electrical currents. After ten years of intermittent unsuccessful experimenting, Faraday, in 1831, found a way to produce a brief transient current in a wire when a nearby magnet was activated. In several months of work, without any strong guidelines from theory, but with feedback from a long series of experiments, Faraday modified his apparatus, finally producing a continuous current, the basis for the modern electric generator.
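Returning to the first illustration, the generate-and-test character of BACON-style law discovery can be conveyed by a short sketch. It is not the BACON program itself; the candidate exponents, the accuracy tolerance, and the rounded planetary data are assumptions made only for illustration.

# Approximate mean distance from the sun D (astronomical units) and period P (years).
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus":   (0.723, 0.615),
    "Earth":   (1.000, 1.000),
    "Mars":    (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn":  (9.537, 29.457),
}

def fit_power_law(data, exponent):
    # Fit P = a * D**exponent by taking a as the mean ratio, then report the worst relative error.
    ratios = [p / d ** exponent for d, p in data]
    a = sum(ratios) / len(ratios)
    worst = max(abs(p - a * d ** exponent) / p for d, p in data)
    return a, worst

def discover_law(data, candidate_exponents=(1.0, 2.0, 1.5, 3.0), tolerance=0.02):
    # Generate candidate laws one by one and satisfice on the first adequate fit.
    for k in candidate_exponents:
        a, worst = fit_power_law(data, k)
        if worst <= tolerance:
            return k, a, worst
    return None

k, a, err = discover_law(list(PLANETS.values()))
print(f"Accepted: P = {a:.3f} * D^{k} (worst relative error {err:.1%})")

A sufficiently loose accuracy criterion would let an earlier, poorer candidate through, which is the behavior described above for illusory laws; the stricter criterion passes over them and settles on the 3/2 power.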
A computer program, KEKADA, capable of closely simulating Faraday’s experimental strategy (as well as the very similar strategy of Hans Krebs in his discovery of the reaction path for the in vivo synthesis of urea), has demonstrated the mechanisms used in such searches. KEKADA’s model of experimentation begins with extensive knowledge of the phenomena of interest, accesses it by recognizing patterns of phenomena, and uses heuristics to select promising experimental arrangements for producing relevant and interesting phenomena. For example, when its expectations of experimental results based on previous knowledge and experience are violated, KEKADA designs new experiments to determine the scope of the surprising phenomenon, and then designs experiments to discover possible mechanisms to explain it.
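The surprise-driven strategy that KEKADA models can also be sketched in outline. The fragment below is not KEKADA itself, and the experimental conditions, expected values, and thresholds are invented for illustration; the essential mechanism it shows is the comparison of an observed result with an expectation and the rerouting of attention, through follow-up experiments, toward whatever violates it.

import random

EXPECTED_YIELD = {"condition_A": 0.30, "condition_B": 0.05, "condition_C": 0.06}

def run_experiment(condition):
    # Stand-in for the laboratory; one condition behaves unexpectedly well.
    true_yield = {"condition_A": 0.31, "condition_B": 0.42, "condition_C": 0.05}
    return true_yield[condition] + random.gauss(0, 0.01)

def investigate(surprise_threshold=0.15):
    for condition, expected in EXPECTED_YIELD.items():
        observed = run_experiment(condition)
        if abs(observed - expected) <= surprise_threshold:
            continue                      # result matches prior knowledge; move on
        # A surprise: first run scope experiments to see how robust it is,
        # then queue experiments aimed at explaining the mechanism.
        scope = [run_experiment(condition) for _ in range(5)]
        mean = sum(scope) / len(scope)
        print(f"Surprise at {condition}: expected {expected:.2f}, observed about {mean:.2f};"
              " next, design experiments to find a mechanism")

investigate()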
V. DECISION THEORIES INCORPORATING LEARNING Effective problem solving in a knowledge-rich domain depends vitally on prior knowledge of that domain, which must have been “programmed in” or learned, and stored with indexing cues to make it accessible. An implementable theory of decision making must specify what knowledge is available to the system, its organization, and its mechanisms for knowledge acquisition. Some knowledge and skill used in making decisions is applicable in many domains, but much of it is specific to one or a few domains. Learning has been a central topic in psychology for a century and, more recently, in decision theory. In systems aimed at either optimizing or satisficing, adaptive mechanisms may gradually improve outcomes by learning from experience. The improvement may move asymptotically toward an optimum or, more commonly, it may simply move toward higher satisficing levels. Adaptive mechanisms commonly undertake selective search for alternatives to the current strategy. Hence, the learning mechanisms are themselves decision mechanisms. We cannot review the vast literature on learning, but will mention some examples of learning mechanisms to give some appreciation of their variety and power and their importance to rational decision making.
1. Many problem-solving systems store solution paths they find, and then use them as components of paths for solving new problems. Thus, a theorem-proving system may store theorems it has proved and use them in proving new theorems.
2. Learning can change one’s knowledge of the
problem space, hence of the best directions for search; it can also be used to generate new alternatives that enlarge the problem space. In fact, any alternative generator (e.g., BACON, discussed earlier) is a learning mechanism. In recent years, there has arisen a strong interest in systems that create and evaluate new alternatives by using learning mechanisms that imitate evolution by natural selection. In genetic algorithms, systems change by mutation, and changes that improve performance are preferentially retained. A program was mentioned earlier that designs electronic circuits in this way by progressively assembling and modifying sets of elementary components.
3. Generalizing, wherever uncertainty is present and it is reasonable to assume some reliable, if stochastic, process underlying the uncertain events, learning the relevant probability distributions can aid rational decision making (a minimal sketch of such a decide-and-update loop follows this list). Research along this line divides into two streams; one, associated with classical decision theory, undertakes to describe optimal learning processes; the other, associated with psychological learning theory, undertakes to describe actual human learning processes. In a wide class of stochastic decision processes, current decisions are optimal relative to current inexact estimates of the relevant parameters that describe the uncertain events, and at the same time, current data are used continuously to revise these estimates and update the parameters. A Bayesian probability model, for example, may be used for this purpose. So-called “connectionist” and “neural net” models of cognitive processes make extensive use of probabilistic processes of these kinds for learning, seeking to maximize the probabilities of correct choices.
4. In addition to connectionist mechanisms, there are powerful non-probabilistic discrimination networks, like E. Feigenbaum’s EPAM, that learn. Given a set of stimulus features, they apply successive tests to find features that discriminate among classes, building up classificatory trees that categorize stimuli.
5. When decision processes take the form of systems of if-then rules (productions), new rules can be learned by examining worked-out examples, and stored as productions that can be applied to subsequent problems. A worked-out example shows how the problem situation is steadily altered
step by step by the application of operators, to remove differences between the current and the goal situations. The learning system detects the differences and the operators that remove them, then constructs a new production whose “if” part corresponds to the difference and whose “then” part corresponds to the operator: “If [difference] is present, then apply [operator].”
6. The ease or difficulty of learning may be strongly influenced by what the learner already knows, with existing knowledge sometimes discouraging new learning that moves in novel directions. Thus, it has been widely observed that radical technological changes in industry typically cause great difficulties in adaptation for existing organizations and give birth to new organizations to exploit them.
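As anticipated in item 3 above, the following is a minimal sketch of deciding relative to current estimates while continuously revising those estimates from the data. It is not drawn from any cited system; the two suppliers, their hidden reliability rates, and the uniform Beta priors are illustrative assumptions.

import random

TRUE_RATES = {"supplier_A": 0.72, "supplier_B": 0.81}    # hidden from the decision maker

# Beta-Bernoulli bookkeeping: a uniform prior is one prior "success" and one "failure".
beliefs = {name: {"successes": 1, "failures": 1} for name in TRUE_RATES}

def estimated_rate(name):
    b = beliefs[name]
    return b["successes"] / (b["successes"] + b["failures"])   # posterior mean

def decide_and_learn(rounds=500):
    for _ in range(rounds):
        choice = max(beliefs, key=estimated_rate)        # optimal relative to current estimates
        on_time = random.random() < TRUE_RATES[choice]   # observe the uncertain event
        beliefs[choice]["successes" if on_time else "failures"] += 1

decide_and_learn()
for name in beliefs:
    print(f"{name}: estimated on-time rate {estimated_rate(name):.2f}")

Because the rule always exploits the current best estimate, it can settle on a merely satisfactory alternative; adding occasional exploratory choices is the usual remedy, itself a further trade-off between learning and deciding.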
VI. DECISION MAKING IN ORGANIZATIONS The economy of Adam Smith’s time was predominantly an economy of markets; business organizations of more than a modest size played in it an almost insignificant role. The industrial revolution saw a great burgeoning of large-scale organizations, greatly increasing manufacturing and marketing efficiency. The earliest large body of theory of organizational decision making focused upon division of labor and coordination through communications and authority relations, comparing the efficiencies of various organizational arrangements. It was initially derived from the practices of the large governmental and military organizations that have existed for the past several thousand years, and has been further developed by the fields of public and business administration. Within economics, an important effort has been made in recent years to incorporate organizations in (boundedly) rational decision theories by asking when various classes of economic transactions will be carried on through markets, and when within organizations. The transaction costs associated with particular activities when conducted by markets and by organizations are compared to determine where the economic advantage between these two institutional forms lies, under the usual assumption that agents will pursue their self-interests (opportunism). The research in this area pays a good deal of attention to empirical verification of its theories, especially through case studies of the experiences of particular industries and historical analysis, and in this way avoids the a priorism of classical theories of optimization. The particular characteristics that distinguish organizations from other multi-agent structures are
(1) their hierarchical organization of authority and membership through an employment relation that implies acceptance by employees of the organizational authority, and (2) their central concern with problems of coordination, where the correctness of a member’s decisions depends on the decisions being made by many others. Organization theory has to describe both the processes for dividing complex tasks into nearly independent subtasks and their corresponding organizational units, and the processes of coordinating units to exploit their remaining interdependencies, balancing the advantages of unit independence and coordination. In addition to transaction costs and the problems of opportunism, several other major features must be added to create a credible theory of organizational decision making. In particular, members of organizations spend their working lives in environments that direct their attention to organizational rather than personal concerns, and that select the kinds of information they receive and the beliefs they acquire. Moreover, members typically acquire loyalties to goals of the organization or of subdivisions within it. These processes of selective learning and acquisition of organizational loyalties, which collectively create organizational identification, greatly moderate the problem of keeping employees’ decisions consistent with organizational goals. We can conceive, today, of a computerized organization whose programs would make this consistency automatic. Real organizations fall far short of this condition, and a decision theory for organizations must address the problem of maintaining loyalty to organizational goals and dealing with the shirking problem. The decision processes of organizations, with their central task of coordinating the interdependent decisions of their component divisions and departments, cannot be described simply as market transactions.
VII. CONCLUSION Decision making is a ubiquitous human cognitive function that is accomplished by recognizing familiar patterns and searching selectively through problem spaces (which themselves may need to be chosen or even discovered) to arrive at a “satisficing,” or “optimal” alternative. In the course of time, two rather independent sets of decision theories have emerged. Theories of substantive rationality purport to deal with the outcomes of decision in the (idealized) real-world environment. They have been little concerned with the psychology of choice, beyond postulating that the agent orders preferences consistently among all the (given) alternatives,
and they represent uncertainty by probability distributions. They seek to discover the decision that optimizes goal achievement in the external environment. Theories of procedural rationality purport to describe how decisions are actually made by human beings, including how the alternatives are found. They focus upon the processes that people actually use, taking account of the limits on human knowledge and computational power. It would be a mistake to draw an impassable line between these two bodies of theory; indeed, interaction between them accelerated rapidly in the closing years of the 20th century. First, concern with incomplete information called attention to the complications that uncertainty adds to the decision process. One response was to replace certain knowledge with probabilities and to optimize expected values. Another was to introduce specific assumptions of irrationality due to “ignorance.” A third was to replace optimization with an adaptive learning system. Second, growing interest in imperfect competition and rationality in multi-agent situations called attention to the severe difficulty of defining rationality in such situations, leading to experimental studies of how people actually behave in these circumstances and to the development of the psychologically motivated theory of satisficing as an alternative to optimizing. Third, a new interest developed in normative theories of decision that could be implemented on computers. In the world of design, optimizing typically means finding the mathematical optimum of a simplified, operative model, which then serves as a satisficing solution to the real-world problem. The two streams of substantive and procedural theories are now converging, and the coming years will produce an improved understanding of human decision processes as well as a growing collection of partly or wholly automated man-machine systems for coping with the decisions that people, business organizations, and governments have to make.
SEE ALSO THE FOLLOWING ARTICLES Corporate Planning • Cybernetics • Decision-Making Approaches • Decision Support Systems • Game Theory • Information Measurement • Information Theory • Optimization Models • Systems Science • Uncertainty
BIBLIOGRAPHY
Akin, O. (1986). The psychology of architectural design. London, England: Pion, Ltd.
Conlisk, J. (1996). Why bounded rationality? Journal of Economic Literature, 34, 669–700.
Dixit, A., and Nalebuff, B. (1991). Thinking strategically: The competitive edge in business, politics, and everyday life. New York: Norton.
Dym, C. L. (1994). Engineering design: A synthesis of views. Cambridge, UK: Cambridge University Press.
Earl, P. E. (1983). The corporate imagination. Armonk, NY: M.E. Sharpe.
Hogarth, R. M., and Reder, M. W., eds. (1986). Rational choice: The contrast between economics and psychology. Chicago, IL: University of Chicago Press.
Kleindorfer, P. R., Kunreuther, H. C., and Schoemaker, P. J. H. (1993). Decision sciences: An integrative perspective. Cambridge, UK: Cambridge University Press.
Langley, P., Simon, H. A., Bradshaw, G. L., and Zytkow, J. M. (1987). Scientific discovery: Computational explorations of the creative process. Cambridge, MA: The MIT Press.
March, J. G., and Simon, H. A. (1993). Organizations, 2nd ed. Oxford, UK: Blackwell.
Newell, A., and Simon, H. A. (1972). Human problem solving. Englewood Cliffs, NJ: Prentice Hall.
Rumelhart, D. E., and McClelland, J. L. (1986). Parallel distributed processing (2 vols.). Cambridge, MA: The MIT Press.
Simon, H. A. (1997). Administrative behavior, 4th ed. New York: Free Press, Macmillan.
Simon, H. A. (1996). The sciences of the artificial, 3rd ed. Cambridge, MA: MIT Press.
Smith, V. L. (1991). Papers in experimental economics. Cambridge, UK: Cambridge University Press.
von Neumann, J., and Morgenstern, O. (1944). Theory of games and economic behavior. Princeton, NJ: Princeton University Press.
Williamson, O. E. (1985). The economic institutions of capitalism. New York: The Free Press.
Desktop Publishing Reza Azarmsa Loyola Marymount University
I. AN OVERVIEW OF DESKTOP PUBLISHING
II. UNDERSTANDING DESKTOP PUBLISHING
III. THE PROCESS OF DESKTOP PUBLISHING
IV. MATERIALS PRODUCED BY DESKTOP PUBLISHING
V. BENEFITS OF DESKTOP PUBLISHING
VI. EVOLUTION OF DESKTOP PUBLISHING
VII. TEX AND LATEX
VIII. PORTABLE DOCUMENT FORMAT
IX. ADOBE TYPE MANAGER
X. COMPARISON OF CONVENTIONAL AND ELECTRONIC METHODS
XI. PUBLISHING CATEGORIES
XII. DESIGN ELEMENTS IN DESKTOP PUBLISHING
XIII. DESIGN PRINCIPLES
XIV. COLOR
XV. WAYS TO IMPROVE GRAPHICS
XVI. TYPOGRAPHY
XVII. WAYS TO IMPROVE TEXT
GLOSSARY
alignment The way text lines up on a page or in a column.
ascender The portion of a lowercase letter that rises above the main body or x height as in a “b.”
baseline The imaginary line on which a line of type rests.
bullet A large, solid dot preceding text to add emphasis. Also known as a blob.
color separation The division of a multicolored original into the primary process colors of yellow, magenta, cyan, and black. A separate film is made for each color and these are each printed in turn, thus building up a color picture.
descender The portion of a letter that extends below the baseline, as in the letter y.
dots per inch (dpi) The measure of resolution for a video monitor or printer. High-resolution printers are usually at least 1000 dpi. Laser printers typically have a resolution of 300 dpi; monitors are usually 72 dpi.
drop cap A large initial letter at the beginning of the text that drops into the line or lines of text below.
Encapsulated PostScript (EPS) A file format that enables you to print line art with smooth (rather than jagged) edges and to see and resize the graphic on screen as it will print. These files can be produced
in graphic programs that produce PostScript code (such as Illustrator and FreeHand). The EPS images do not print well on non-PostScript printers.
grid A nonprinting design consisting of intersecting horizontal and vertical lines that can be used for the overall layout of a publication or for positioning drawn graphics, depending on the software. Also called layout grid.
initial cap Large, capital letters (often ornamental) that are found at the beginning of paragraphs or chapters. These date back to European manuscripts, where they were (and still are) considered works of art. Before printing presses replaced hand lettering, a few talented scribes drew the characters into spaces left in the manuscripts for that purpose.
layout The arrangement of text and graphics on a page.
leading (Pronounced “ledding.”) The distance in points from the baseline of one line of type to the baseline of the next. Also called line spacing or interline spacing.
phototypesetting The method of setting type photographically.
pica A typographic unit of measurement equal to one-sixth of an inch. Twelve points equal one pica.
pixel (stands for picture element) Pixels are square dots that represent the smallest units displayed on a computer screen. Characters or graphics are created by turning pixels on or off.
point A basic unit of typographic measurement. There are 72 points to an inch.
PostScript Adobe Systems’ page description language. Programs such as FreeHand use PostScript to create complex pages including text and graphics onscreen. This language is then sent to the printer to produce high-quality printed text and graphics.
resolution Sharpness of definition of a digitized image depending on the number of scan lines to the inch.
reverse lines White rules on a contrasting black, shaded, or colored background.
sans serif Typefaces that do not have the small “finishing” lines at the end of each stroke of a letter.
serif Typefaces that have small “finishing” lines at the end of each stroke of a letter.
template A dummy publication that acts as a model, providing the structure and general layout for another similar publication.
tracking The overall letterspacing in the text. Tracking can also be used to tighten or loosen a block of type. Some programs have automatic tracking options that can add or remove small increments of space between the characters.
TrueType An outline font format developed by Apple Computer and adopted by Microsoft Corporation. These fonts can be used for both the screen display and printing, thereby eliminating the need to have font files for each typeface.
typesetting Text produced by a laser printer or a high-quality machine known as a typesetter.
weight The degree of boldness or thickness of a letter or font.
widow A single line of text or word at the top of a page or column.
WYSIWYG (what you see is what you get) A display mode that enables the user to see an image on screen of how the text and graphics in a publication will appear on the printed page.
x-height The vertical height of a lowercase x and the height of the bodies of all lowercase letters. Also called mean line.
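Several of the measurements defined above (points, picas, leading, dpi) are related by simple arithmetic: 72 points or 6 picas to the inch, and a device dot count obtained by multiplying inches by resolution. The short sketch below only illustrates those relationships; the function names and sample values are not part of any DTP product.

POINTS_PER_INCH = 72
POINTS_PER_PICA = 12

def points_to_inches(points):
    return points / POINTS_PER_INCH

def picas_to_points(picas):
    return picas * POINTS_PER_PICA

def dots_for_points(points, dpi):
    # How many device dots a measurement given in points spans at a given resolution
    return round(points_to_inches(points) * dpi)

print(f"1 pica = {picas_to_points(1)} points = {points_to_inches(12):.4f} inch")
print(f"12-point leading at 300 dpi = {dots_for_points(12, 300)} dots between baselines")
print(f"10-point type at 72 dpi (screen) = {dots_for_points(10, 72)} pixels")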
I. AN OVERVIEW OF DESKTOP PUBLISHING Desktop publishing (DTP) is a phenomenon of desktop computing. Very few developments in microcomputers have grabbed the attention of computer users quite like DTP. It began to be widespread when Apple Computer announced the LaserWriter laser printer in January 1985.
The term desktop publishing is quite recent. Paul Brainerd, “father” of the PageMaker software program, is credited with coining the phrase. Simply, DTP refers to the process of producing documents of typeset quality using equipment that can sit on a desktop. Users can combine text and illustrations (both graphics and photographs) in a document without the traditional paste-up procedure often used in print shops, at a cost less than that of traditional offset printing. Desktop publishing systems use the principle known as WYSIWYG (pronounced wizzy-wig): What you see is what you get. As the user keys text and graphics into the computer, he or she can see exactly how the output will appear on the page before printing the final copy. Another kind of software that contributed importantly to the new publishing process was the page description language PostScript. The PostScript language, used by laser printers for page description, makes these printers versatile type compositors. The PostScript interpreter allows the page composition language that controls the printer to combine text, drawings, and photographs for printing. Desktop publishing systems, including a microcomputer, a laser printer, word processing software, a page layout program, and graphic arts software, make it easy to communicate graphically as well as textually. Research indicates that textual communication is processed primarily in the left lobe of the brain and graphic communication is processed primarily in the right lobe. The combination of textual and graphic elements produces a powerful communications structure that facilitates simultaneous multichannel processing by both the left and right lobes of the brain. As a result, communication techniques that combine both textual and graphic elements produce effective results. Desktop publishing is changing the way people produce documents. It enhances the individual’s ability to control all or most of the process of producing printed materials. This potential means that multitalented people with writing, illustrating, editing, and designing skills can do the work effectively. Users gain knowledge in a variety of disciplines—writing, typesetting, graphic design, printing, and computing. Controlling the publication process permits authors to become more involved with the visual
impact of their ideas on the reader. The relationship between form and content may take on new meaning when authors integrate ideas with words, type style, graphics, and the other features involved in the production of publications with a high level of visual impact. Creating visually informative text gives the author a chance to gain a heightened sense of categories, divisions, and orderly progression. Also affected are small businesses, designers, and large corporations that have brought the production process in-house.
II. UNDERSTANDING DESKTOP PUBLISHING DTP provides the capability to produce reader- or camera-ready originals without the need for complicated prepress operations. The tools of production generally consist of a computer system and relevant software applications and are sufficiently compact to fit on a desktop. They provide the user with total control over the content and form of the publication. In general, prerequisite skills include a working knowledge of word processing, a general understanding of graphics, and a sense of what constitutes pleasing and functional layout and design. The significance of DTP is threefold. First, it provides the user with the tools to express thoughts and ideas in text, graphic, and sometimes photographic (halftone) form. These elements are pieces of the publication production process that have traditionally required the services of trade professionals to assemble into final form. Second, the user has immediate feedback on the appearance of the final publication. The page relationships of the elements, their sizes, and their physical characteristics are immediately visible and infinitely editable. Third, the paper output from the system is in finished form, ready for distribution to small numbers of readers or for further reproduction and subsequent mass distribution.
III. THE PROCESS OF DESKTOP PUBLISHING The DTP production process involves electronically placing text and graphics on a page and reproducing the page for distribution. Produce a newsletter, a proposal, a resume, or a high school yearbook and you follow the same procedures as publishers of books and magazines, albeit on a less elaborate scale. No matter the kind of document produced or the technique used, publishing procedure follows five basic steps:
1. Plan the publication
2. Prepare the text
3. Prepare the charts and illustrations
4. Make up the publication’s pages (prepare them for reproduction)
5. Reproduce the publication
Desktop publishing computerizes and accelerates the publishing cycle. Often text is created in a software application and then edited. Charts, graphs, illustrations, rules (lines), boxes, or circles are drawn either with DTP built-in tools or by other application software. Text and charts are placed on the page and resized and reshaped according to the predetermined page specifications. When the page is complete, it may be produced on a computer printer and then can be reproduced on a photocopy machine or a commercial printing press. Figure 1 illustrates the process of desktop publishing.
IV. MATERIALS PRODUCED BY DESKTOP PUBLISHING Desktop publishing has enjoyed quick acceptance in most areas of document production within business, industry, government, and education. It is a step beyond word processing, providing users with the capability to produce better looking and more informationally potent documents. The following is a list of some of the materials that can be produced using desktop publishing: books, brochures, bulletins, calendars, catalogs, certificates, charts, cover sheets, curriculum materials, diplomas, directories, documentation, flow charts, flyers, graphs, greeting cards, indexes, instructional materials, letterhead stationery, magazines, manuals, maps, memos, menus, name tags, newsletters, newspapers, notices, pamphlets, poetry, programs, proposals, prospectuses, resumes, schedules, signs, slides, statements, tabloids, testing materials, title pages, and visual aids.
Figure 1 Process of DTP. The DTP program provides a tool with which users can assemble text and illustrations into a page format.

V. BENEFITS OF DESKTOP PUBLISHING Desktop publishing offers many advantages over the traditional printing process. Even though the initial costs of obtaining high-end DTP hardware and software
are high, the system actually often saves money. Usually, the sense of accomplishment that comes with the finished product and the flexibility regarding the printing process are enough to justify the time invested. Additional potential benefits of desktop publishing include:
1. Reduced typesetting time and costs
2. Better looking publications with improved readability, credibility, and prestige
3. More control over the publishing process
4. Increased convenience to designers
5. Increased cooperation among students
6. Possibility for electronic distribution
Reduced typesetting time and costs are a very tangible benefit of DTP. Because word processing and page composition programs allow you to directly create
and lay out your documents, there is no need for a manual typesetting step. This eliminates typesetting costs and time. Typesetting can often take days and cost as much as $50 per page. For producing newsletters or brochures on a regular basis, typesetting costs and time can be a significant problem. Desktop publishing can result in better quality publications because documents can be composed quickly, allowing more time for revisions and additions. Many documents that would not merit the time and expense of printing by traditional methods can be produced in near typeset form with DTP equipment. If documents are created using a word processing program, it is a small additional step to compose and print them. More control is achieved over the publishing process because the results are immediately available.
This makes it much easier to try out ideas and see what various options look like. In addition, potential communication problems between the document designer and the printer are avoided because they often are the same person. The availability of graphics and page composition programs makes design work much easier and faster. For example, a skilled artist can create technical illustrations much more productively using a computer than on paper. This is because it is possible to build up libraries of “clip art” that can be reused. Features such as automatic rescaling, rotating, and overlaying make the drawing process faster. Similarly, page layout can be accomplished much more easily than manual paste-up due to features such as automatic pagination, line-spacing control, text wraparound, automatic hyphenation, and margin settings. Finally, it is possible to send and exchange documents electronically, eliminating the time and costs of physical delivery. Suppose you are editing a newsletter that contains contributions from people all over the country. The articles can be transmitted to you from their personal computers to yours in a matter of minutes. You can then edit the articles, compose the newsletter, and transmit the complete newsletter to the printing site. Alternatively, the newsletter could be published electronically on the Internet, meaning that people read it on their own computers as soon as it is made available. Desktop publishing creates interesting written communications far beyond what can be created with an ordinary word processor. Some of the advantages of a DTP program are as follows:
1. Improves the appearance of documents
2. Reduces the time and number of steps required to print pages
3. Produces customized documents
4. Produces camera-ready documents in a short time
5. Reduces production costs
6. Offers more capabilities and flexibilities
7. Encourages creativity
8. Increases productivity
9. Makes the job more fun
VI. EVOLUTION OF DESKTOP PUBLISHING A review of typesetting and printing advances offers a good starting point for understanding how DTP builds on and streamlines traditional publishing methods.
A. Ancient Times Publishing coincided with the development of written language. In ancient times, papyrus was used to produce printed material. Later, around the time of Jesus, the Chinese used bamboo to create a writing surface. Before the invention of the printing press, publications were hand-copied on pieces of paper or other writing surfaces. The first form of printing involved carving a message on a piece of stone or wood, coating it with some type of ink, and pressing that “plate” against a piece of paper. This method was mostly used for reproducing artwork rather than for printing characters.
B. Movable Type In the fifteenth century, Johannes Gutenberg’s popularization of movable type became one of the milestones of European history—it reduced publishing time and expense so books became more plentiful. Until then, books had been hand-copied or printed from blocks of wood on which raised characters (letters, numbers, and punctuation marks) and illustrations were laboriously carved for each page. Movable type at that time involved small wooden blocks with individual characters that could be rearranged and used repeatedly. This wood type was later replaced by metal type for greater durability.
C. Linotype The next important publishing innovation did not occur for four centuries. With the invention of the Linotype machine in the 1880s, publishers could set an entire line of type at once rather than only character by character. Although the process entailed melting lead, it reduced publishing costs and production time.
D. Offset Printing and Phototypesetting In the 1950s and 1960s, the publishing process transformed from the “hot type” process involving molten lead into a photographic “cold type” process that is faster and cleaner. At this time, several photographic printing processes were introduced that used a flat plate rather than the raised type of earlier methods. The most important new process was offset printing, which is widely used today. Offset printing was soon joined by computer-based phototypesetting, which
projects images of columns of text onto photosensitive paper. The paper is developed, cut into sections, and pasted onto a sheet of heavy paper to produce a camera-ready page mechanical—a paste-up of the complete page for offset printing. Illustrations are sized photographically and added as needed. By the early 1980s, staff overhead presented a problem for a growing segment of the publishing industry: businesses, government offices, and educational institutions that had become major publishers of brochures, books, catalogs, and other phototypeset publications. The advent of microcomputers and software development introduced DTP in 1985. This new development gave businesses and educational institutions an affordable publishing system anyone could learn to use. For some, bringing publishing to the desktop is known as the second publishing revolution.
E. Desktop Publishing The term desktop publishing is credited to Paul Brainerd of Aldus Corporation, the father of PageMaker. Desktop publishing actually originated in 1973, when an experimental multifunction workstation was developed at the Xerox Palo Alto Research Center (PARC). The workstation was called Alto. Because its designers had graphics applications in mind, the Alto had a high-resolution bit-mapped display screen and a mouse pointing device. Altos were assigned to a few selected sites, including the White House, the House of Representatives, the Senate, a few universities (especially Stanford), and several undisclosed locations in the United States and Europe. This large base of users provided input as to what was right and what was wrong with the Alto and with the various software packages that had been developed for it. These packages included Bravo, Gypsy, Markup, Draw, SIL, and Laurel for text editing and formatting, creating pictures and diagrams, and providing electronic mail by means of a central file server. None of the Alto machines were ever commercially viable on cost grounds, and there was no affordable printer. Apple Computers, inspired by the PARC systems, designed and marketed the Lisa desktop microcomputer. This unit, released by Apple in 1983, had a high-resolution screen, a mouse, a WIMP (windows, icons, mice, and pointers) interface, and high-capacity hard and floppy drives. Lisa was discontinued in 1984 when Apple produced the first Macintosh, which led to a family of microcomputers that further extended the idea of an integrated suite of applications emulating the original PARC systems. The following year Apple
produced a laser printer with an output resolution of 300 dots per inch (dpi), which is generally considered to have started the DTP revolution. This printer acted as a plain-paper typesetter using a raster image processor (RIP), which gave a resolution approximating that of professionally published documents. At 300 dpi a letter-size page requires about 1MB of storage to be set up in the RIP for printing manipulations. The fact is that the new technology has spawned a host of desktop publishers who are publishing—writing, producing, and distributing—to a readership, and the publishing process has suddenly become democratized and affordable beyond the dreams of only a few years ago. Macintosh computers get credit for desktop publishing as we perceive it today. But today DTP is not limited to Macs. Windows-based computers are, in some respects, an ideal platform for the development of a DTP system because they are popular, powerful, and open. Using a Windows-based computer, the user must specify what kind of graphics board is available. Until recently, the interval between the development of new technologies and their introduction into printing has been quite considerable. In the case of DTP, where some of the essential tools, such as text and graphics processing, are already providing effective publishing experiences, the interval has been extremely short. Expectations have already been raised by the early release of packages that claim to be DTP systems because they provide simple manipulations of both text and graphics on a single page. A particular selling point of such systems has been their facility to design pages of newspapers and magazines for delivery on the Internet.
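The memory figure quoted above follows from simple arithmetic: a US letter page is 8.5 by 11 inches, and a 1-bit-per-pixel bitmap at 300 dpi therefore needs roughly a megabyte. The sketch below merely checks that arithmetic; the function name and the 1200-dpi comparison are illustrative, not a description of any particular RIP.

def raster_megabytes(width_in, height_in, dpi, bits_per_pixel=1):
    # Memory needed to hold one full-page bitmap in the raster image processor
    pixels = (width_in * dpi) * (height_in * dpi)
    return pixels * bits_per_pixel / 8 / 1_000_000

print(f"Letter page at 300 dpi, 1 bit per pixel: {raster_megabytes(8.5, 11, 300):.2f} MB")
print(f"Letter page at 1200 dpi, 1 bit per pixel: {raster_megabytes(8.5, 11, 1200):.2f} MB")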
F. PostScript PostScript is a computer graphics language that describes pages of text and graphics and reproduces the descriptions on a printer, typesetting machine, or other output device. PostScript was developed by Adobe Systems of Palo Alto, CA, for Apple’s LaserWriter printer. PostScript is a widely accepted page description language for the Macintosh and other computers, such as the IBM PC series and compatibles. PostScript describes pages by using mathematical formulas that represent shapes rather than by specifying individual pixels in a bit-mapped graphic image (the small dots that make up an image are known as picture elements, or pixels). PostScript translates images into the tiny dots that make up text, graphics, and halftones on printed pages. PostScript encodes typefaces into outlines that a laser printer reconstructs
at the proper size and then fills in to solidify the outline. This approach has two clear advantages: computer memory is conserved, and many different types of output devices can recreate the standard PostScript format at a wide range of resolutions. PostScript’s unique system of typeface definition creates characters in a wide variety of type styles and sizes. This approach can cut typesetting costs considerably, because typefaces are traditionally sold by the font (a set of characters of a single typeface, type style, and size). In other words, most typesetting machines require separate purchases for 9-point Times Roman, 12-point Times Roman, 12-point Times Roman italic, and so forth. On the other hand, a single purchase gets you the Times Roman typeface for a PostScript system in all the major type styles—plain text, italic, bold, underline, outline, shadow, small caps, superscript, and subscript—and in virtually any type size (the Apple LaserWriter, for example, can produce any size from 3 points up). You don’t pay separately for each variation. It is also important to keep in mind that PostScript doesn’t work only on printers. For example, you can write PostScript programs to create special graphics effects. Computers can use PostScript to describe text and graphics in a standardized common format that other computers or PostScript programmers can employ. Video displays can use PostScript to interpret page descriptions, convert the results into bit-mapped images, and display the images on a computer screen. PostScript provides a common language for various audiences and devices that precisely and concisely describes pages of information that contain text, graphics, or both (Fig. 2). PostScript has emerged as a standard among page description languages; many hardware and software companies now make PostScript-compatible products from page layout and graphics software for the Macintosh and IBM PC to page printers and typesetting machines.
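A flavor of how PostScript describes a page can be conveyed by a few lines of the language itself, here simply written out to a file from Python. The operators shown (findfont, scalefont, setfont, moveto, show, newpath, lineto, stroke, showpage) are standard PostScript; the coordinates, the output file name, and the choice of Times Roman are arbitrary examples, and any PostScript-capable printer or interpreter should be able to render the result.

# Coordinates are in points, measured from the lower-left corner of the page.
PS_PAGE = """%!PS
/Times-Roman findfont 24 scalefont setfont   % select a typeface outline at 24 points
72 720 moveto (Set in 24-point Times Roman) show
newpath 72 700 moveto 540 700 lineto stroke  % a horizontal rule drawn as a geometric path
showpage
"""

with open("sample_page.ps", "w") as ps_file:
    ps_file.write(PS_PAGE)

Because the page is described as text, paths, and outlines rather than as a fixed grid of dots, the same description can be rendered at 300 dpi on a laser printer or at far higher resolutions on an imagesetter.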
VII. TEX AND LATEX TeX is a typesetting language developed by Donald E. Knuth and is designed to produce a large range of documents typeset to extremely high-quality standards. For various reasons (e.g., quality, portability, stability, and availability) TeX spread very rapidly and can now be best described as a worldwide de facto standard for high-quality typesetting. Its use is particularly common in specialized areas, such as technical documents of various kinds, and for multilingual requirements. The TeX system is fully programmable.
Figure 2 PostScript outline of the letter P.
LaTeX is a document preparation system for high-quality typesetting. It is most often used for medium-to-large technical or scientific documents, but it can be used for almost any form of publishing. LaTeX is not a word processor. The user must write in the commands that control the layout and appearance of the text. LaTeX is based on Donald E. Knuth’s TeX typesetting language. It was first developed in 1985 by Leslie Lamport, and is now being maintained and developed by the LaTeX3 project. LaTeX is available for free by anonymous ftp.
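A minimal LaTeX source file gives the flavor of the command-driven approach described above; this fragment is only an illustration and is not taken from the article or from the LaTeX documentation.

\documentclass{article}
\begin{document}
The author marks up structure and mathematics; the typesetting engine
chooses spacing, fonts, and equation numbering. For example:
\begin{equation}
  x = \frac{-b \pm \sqrt{b^{2} - 4ac}}{2a}
\end{equation}
\end{document}

Running this file through latex or pdflatex produces a typeset page in which the displayed equation is spaced and numbered automatically.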
VIII. PORTABLE DOCUMENT FORMAT Portable Document Format (PDF) is designed to be totally cross-platform. PDF files can be viewed and printed from several different platforms (Mac or PC) with the page layout and typography of the original document intact using Acrobat Reader, as well as on other systems, including OS/2 and UNIX.
IX. ADOBE TYPE MANAGER Adobe Type Manager (ATM) is a system software component that automatically generates high-quality screen font bitmaps from PostScript or OpenType outline font data. With ATM, you can scale your fonts without the characters appearing jagged, and you can also enable “font smoothing,” which further improves the appearance of your fonts on-screen by using your computer
monitor’s color palette to intelligently improve the rendering of characters. Also, ATM allows you to print PostScript fonts on non-PostScript printers.
X. COMPARISON OF CONVENTIONAL AND ELECTRONIC METHODS In the DTP process, every step in the design and production, from conception to master copy, to short-run production, can be accomplished electronically from a desktop computer. Taken individually, each step in the electronic production cycle is direct and well-defined. Text can be electronically formatted in typographic detail; graphics can be created or scanned; and all these pieces can be assembled in DTP software.
A. Text Preparation Conventional method—Creating handwritten copy or composing copy on a typewriter requires very little training or specialized computer knowledge, but there is little flexibility or editing capability, and it is difficult to incorporate into a design process. The typewriters of today come with microprocessors, create disk files, and include such automatic editing or authoring tools as spell checkers. They just don’t require the user to be computer literate. Electronic method—Text prepared in a word processor on a computer is easy to edit, makes use of automated text tools, and incorporates into a page composition scheme. Specialized computer knowledge of operating systems and file structures is required; however, the strong advantages word processors offer compel their use.

B. Painting or Drawing Conventional method—Paper and pen or pencil are the traditional media of artists. Advantages include low cost, ease of use, and many special effects that are difficult to achieve on a computer. On the other hand, artwork is not easily altered, is not very precise, and is hard to adapt to a production process. Electronic method—Computer paint and draw programs with digital file output are easy to alter, include many special effects, are precise, and adapt easily to an electronic production cycle. Disadvantages include cost of equipment and lengthy rendering times. Electronic art is more frequently becoming the choice for a production process, but not all artists are welcoming the advantages of electronic art forms.

C. Photographic Images Conventional method—Camera-generated artwork is low-cost, high-quality, easy to generate, and can incorporate many special effects. But it is difficult to alter and incorporate into an electronic production process. Traditional photography is preferred in all situations except where moderate quality is acceptable or where there is access to very high quality computer equipment. Electronic method—Using scanned artwork, you can alter an image and create certain special effects not offered by film. Also, digital cameras and royalty-free photographs are readily available. Disadvantages are training requirements and highly varying quality, depending on equipment.

D. Layout, Typesetting, and Composition Conventional method—Paste-up boards, if in use, result in a high-resolution master. However, they are time-consuming and expensive to produce. Electronic method—Page layout programs have the support of automation tools, offer device independence, and are low cost and of good quality. Disadvantages include complex design strategy and lengthy setup times for the initial run. Desktop page layout is a better choice for composing a page, particularly for repetitive situations.
E. Printing Process Conventional method—Letterpress, offset, gravure, screen printing, and heat-transfer printing offer high quality and low cost per unit, give better color reproduction, and offer large page sizes. Disadvantages include high setup costs and times and low adaptability for mixed print runs. Electronic method—Personal laser printers, inkjet printers, and imagesetters are adaptable and have short setup times. Quality is variable but can range from good (laser printers or copiers) to excellent (imagesetters).
XI. PUBLISHING CATEGORIES Publishing can be divided into four distinct areas: business publishing, periodical publishing, book publishing, and personal publishing. All four areas can benefit from DTP technology. Business publishing involves the printing and distribution of company materials. It includes business
reports, brochures, catalogs, product documentation, forms, letterheads, and corporate communications such as internal memos, advertising and promotional materials, and other items. Periodical publishing is the printing and distribution of materials on a regular basis. The materials are usually distributed to the same group of individuals or a similar group at an interval that can be daily, as in the case of newspapers; weekly, as in the case of newsletters and magazines; monthly, as in the case of magazines and journals; or even annually, as is the case with some journals and reviews. Periodical publishing also involves coherent content and a usually similar look from issue to issue. What distinguishes book publishing from other types of publishing is that it involves the printing and distribution of a single, cohesive work. Book publishing’s production time is usually longer than that of other types of publications. Personal publishing is the printing and distribution of materials by individuals. The materials can include poetry, freelance writing, essays, reports, humor, school papers, greeting cards, wedding and birth announcements, party invitations, personalized calendars, etc. Personal publishing also encompasses artistic uses of publishing such as the publication of poetry. Often personal publishing overlaps with another type of publishing, as in the case of people who publish a book on their own or who send a periodical such as a newsletter to friends, members of an organization, or people with a common interest. The production of essays, reports, and other such documents is another area of personal publishing that benefits from DTP technology. The option of publishing your own reports can be an excellent method for dissemination of scientific information, especially in the publish-or-perish academic world. People in the personal category of publishing usually publish for their own benefit—financial gain and commercial success are not usually the prime motivations.
XII. DESIGN ELEMENTS IN DESKTOP PUBLISHING The success of many DTP materials can be attributed in large measure to the quality and effectiveness of the graphic design. These are achieved through organizing preliminary thoughts, planning, and applying the techniques outlined in this section. Many people who develop DTP materials have little or no professional art background. To avoid producing poor graphic materials, they can consider a
number of practical suggestions and guiding principles and then apply these as the need arises.
XIII. DESIGN PRINCIPLES Effective graphic design is based on knowing the building blocks of design and when to use them. In graphic design, everything is relative. Tools and techniques that work in one situation will not necessarily work in another. For example, it is impossible to define the exact layout for informal balance or the appropriate color for a newsletter or book cover. In creating a visual, important design considerations are best faced by starting with a preliminary sketch of the intended visual. This is referred to as a rough layout. At the rough layout stage, little attention is paid to rendering the artistic details, but careful consideration is given to effective design. Graphic design must be seen as a means of communication rather than mere decoration. There should be a logical reason for the way you employ every graphic tool. That tool should relate to the idea it expresses as well as to the environment in which the final product will appear. The following guidelines provide a framework for effective graphic design.
A. Planning Design must be planned with consideration for the intended audience. What is the basic message, and what format will be used (newspaper, bulletin, poster)? The more you define your project’s purpose and environment, the better you will do. You should also consider the size and dimensions of the publication in addition to the resources, skills, techniques, materials, and facilities that you can employ to produce the publication. It is useful to develop a grid or choose a predesigned one in template form. Shown in Fig. 3, the grid adds continuity to a publication by defining where margins, rules, columns, and other elements should be on each page. Publications may be designed by drawing thumbnail sketches. A thumbnail sketch is a rough sketch of a page design. This technique helps designers visualize the final product.
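Because a layout grid is ultimately a set of positions computed from the page size, margins, column count, and gutters, even a few lines of arithmetic can generate one. The sketch below is illustrative only; the function name and the sample page measurements (given in picas) are assumptions, not values taken from any DTP package.

def column_guides(page_width, left_margin, right_margin, columns, gutter):
    # Return (start, end) positions for each column of a simple layout grid.
    # All measurements share one unit, for example picas.
    live_area = page_width - left_margin - right_margin
    column_width = (live_area - gutter * (columns - 1)) / columns
    guides = []
    x = left_margin
    for _ in range(columns):
        guides.append((round(x, 2), round(x + column_width, 2)))
        x += column_width + gutter
    return guides

# A 51-pica-wide page with 4-pica side margins, three columns, and 1-pica gutters:
for start, end in column_guides(51, 4, 4, 3, 1):
    print(f"column from {start} to {end} picas")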
Figure 3 A grid contributes to the continuity of a publication.

B. Organization The visual and verbal elements of the design should be arranged in a pattern that captures the viewer’s attention and directs it toward the relevant details. The manipulation of line and space is the designer’s primary tool. The arrangement should be clear enough to attract and focus attention quickly. A configuration pattern is usually found in effective design. It may be established by the directional cues that are developed to guide the viewer to see details in proper sequence. Certain patterns guide the viewer’s eye throughout the publication.

C. Simplicity Generally speaking, the fewer elements in the design, the more pleasing it is to the eye. Simplicity is the first rule of design. Drawings should be bold, simple, and contain only key details. Picture symbols should be outlined. Use simple, easy-to-read lettering systems and a minimum of different type styles in the same visual or series of visuals.

D. Balance A psychological sense of equilibrium or balance is achieved when the design elements in a display are equally distributed on each side of the axis—either horizontally, vertically, or both. There are two kinds of balance: formal and informal. When the design is repeated on both sides, the balance is formal or symmetrical. Informal balance is asymmetrical; the elements create an equilibrium without being alike. Informal balance is dynamic and attention-getting, unlike formal balance, and it requires more imagination from the designer. Informal balance is usually regarded as the more interesting choice (Figs. 4 and 5).

Figure 4 An example of formal balance. Formal balance, if used too much, becomes monotonous.

Figure 5 An example of informal balance. There are an infinite number of possible designs in informal composition. (This graphic is reprinted by permission of Dynamic Graphics, Inc.)

E. Unity Unity is the relationship that exists among the elements of a visual. These elements function together to provide a single dominant visual to capture the reader’s attention. Unity can be achieved by overlapping elements, by using pointing devices such as arrows, and by employing visual tools (lines, shape, color, texture, and space). The single dominant visual helps organize the reader’s eye movement throughout the publication.

Figure 6 Using white space for emphasis.
F. Emphasis Through the use of size, relationships, perspective, lines, and such visual tools as color and space, emphasis can be given to the most important elements in the publication (Fig. 6).
G. Contrast Any element that is different from those surrounding it will tend to stand out. The contrast or variation may be in terms of size, shape, color, or orientation. Contrast also refers to the relative amount of space devoted to text, artwork, and white space. You can create a high-impact publication with definite light and dark areas as well as lots of white space and illustrations.
XIV. COLOR Color is an important adjunct to most visuals. Careful use of color plays an important role: (1) to heighten the realism of the image by depicting its actual colors; (2) to point out similarities and differences and to highlight important emphasis; and (3) to create a particular emotional response. Contemporary research in the psychology of motivation reveals that different colors stimulate more than the visual sense. They have “taste”, for example, blue is sweet, orange is edible. They have “smell”; pink, lavender, yellow, and green smell best. Colors also have psychological connotations; dark red and brown evoke masculine images of earth, wood, and leather; gold, silver, and black suggest the prestige and status associated with wealth. When selecting colors for visual materials, attention should be given to their elements. The hue, which is the specific color (red, blue, etc.), should be considered. Another element is the value of the color, meaning how light or dark the color should appear in the visual with relation to other visual elements. The final component is the intensity or strength of the color for its impact or coordinated effect.
A. Using Color
Color can be the most important part of presentation design. Used carefully, it can enhance your message, provide richness and depth, and put a personal stamp on your work. The most obvious reason for using color is to show things as we see them in nature—green trees, yellow bananas, and red bricks, for example. Abstractions, such as statistics, ideas, and proposals, have no intrinsic colors, but color can be used to represent symbolic associations. For instance, red can suggest warning, danger, or financial loss. The following points are some reasons for using color.
1. Advertisers try to establish associations of certain colors with consumer products, sports teams, etc. You can apply the same logic by developing identifying color schemes for your presentation.
2. Use color to distinguish between like and unlike elements. To clarify a flow chart, show file names in red and program names in blue. Color can distinguish elements or classes of information.
3. Indicate the importance or progression of data by increasing the value and saturation level. Light-to-dark or gray-to-bright sequences are excellent ways to represent levels of importance. Use a chromatic or a rainbow series to show a graduated sequence.
4. You can draw attention to elements in your presentation by choosing colors that are either brighter or lighter than the rest. Suppose you want to emphasize some of the words in a list. On a dark gray, blue, or black background, use light yellow or cream as your main text color. For contrast, select full yellow (brighter) or white (lighter) as the emphasis color. Do not try to emphasize too much at once with several different colors; they may cancel each other out.
5. Be sensitive to cultural biases as well. Some people cannot accept pink as a serious color. That does not mean that pinks are not valid presentation colors. Pink is actually a serviceable color, but be aware of personal prejudices.
6. The setting, the subject matter, and the audience's previous experience can create certain expectations. Weigh all factors involved and decide whether audience expectation is reason enough to develop a color presentation.
7. Do not sacrifice readability for pleasing color. Legibility takes precedence over all else in presentation materials, and poor color choices can interfere. Colors do not perform the same way under all conditions: pure blue on dark
backgrounds is extremely hard to read, but blue on white is fine. This isn’t an opinion; it is a fact of human vision. Base your color choices on what works well for the audience rather than on personal taste.
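Legibility claims like the blue-on-dark warning above can be checked numerically. The sketch below is an added illustration, not part of the original article; it applies the relative-luminance contrast-ratio formula later popularized by the W3C accessibility guidelines as one common way to compare foreground and background colors.

def _linearize(channel):
    # Convert an sRGB channel value (0-255) to linear light.
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb):
    r, g, b = (_linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(color1, color2):
    lighter, darker = sorted((relative_luminance(color1), relative_luminance(color2)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

# Pure blue text on a black background versus the same blue on white:
print(round(contrast_ratio((0, 0, 255), (0, 0, 0)), 2))        # low ratio: hard to read
print(round(contrast_ratio((0, 0, 255), (255, 255, 255)), 2))  # much higher ratio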
B. The Color Wheel You are probably familiar with the color wheel showing the range of colors and their relationships to each other (Fig. 7). Primary and secondary colors alternate to form the circle. We are used to thinking of red, blue, and yellow as the primary colors and green, orange, and purple as the secondary colors. Complementary colors are ranged opposite each other on the color wheel. The three sets of complements each pair a primary with a secondary color. In graphics, there are two methods of mixing colors. The first mixes colors of projected light on your computer screen, for example. The second method mixes colors of pigment, like the ink on a printed page. The differences between the two methods make it very difficult to match what you see on the screen with “hard copy” printed on paper or film. Whether it is projected light or an opaque substance, a color can be described by its properties or qualities—its hue, value (or lightness), and saturation.
Figure 7 Color wheel (the primary colors red, yellow, and blue alternate with the secondary colors orange, green, and purple).
Each color within the spectrum or around the color wheel is a hue (e.g., red or green). Value refers to the shade or the degree to which a color approaches black or white. Saturation is the intensity of a given hue.
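As a minimal illustration of the two mixing methods and of hue, value, and saturation, the following sketch uses Python's standard colorsys module. Treating CMY pigment values as simple complements of the RGB light values is an assumption made here for clarity; it ignores the black-ink separation and color management used in real prepress work.

import colorsys

def rgb_to_cmy(r, g, b):
    # Subtractive pigment values as simple complements of additive light values (all 0.0-1.0).
    return 1.0 - r, 1.0 - g, 1.0 - b

# A saturated orange, expressed as fractions of full red, green, and blue light.
r, g, b = 1.0, 0.5, 0.0
h, s, v = colorsys.rgb_to_hsv(r, g, b)   # hue, saturation, value
c, m, y = rgb_to_cmy(r, g, b)

print(f"hue={h:.2f}  saturation={s:.2f}  value={v:.2f}")
print(f"cyan={c:.2f}  magenta={m:.2f}  yellow={y:.2f}")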
C. Colors on Computer
The color you see on your computer screen is created by mixing red, green, and blue (RGB) light. As more colors are added, the image approaches white; as colors are taken away, the image approaches black. The color receptors in our eyes are sensitive to the additive primaries; that is the way we perceive the light around us. Computer color is mixed in RGB; pigment on paper or film is mixed in CYM (cyan, yellow, and magenta). If you make a color picture on screen and output it to paper, your system must translate the colors from additive to subtractive mode in order to move from colors of light to pigment colors. The translation can cause minor (or major) discrepancies between screen colors and output colors. Be prepared for this shift, first by identifying the degree to which it occurs on your particular setup, and second by choosing flexible palettes that do not depend too much on precision in order to work well. If your output is color film, you will probably notice less shift than if you are working on paper.

D. Graduated Color
Graduated or ramped color is a special effect that allows one color to dissolve into another with no discernible break. Software products differ in regard to graduated color features. Some let the user specify two end colors; others add a color specification for the middle. Users sometimes choose the number of steps to use in creating the blend. With other systems, this is handled automatically by the software. Color can ramp horizontally, vertically, or radially or in some other way, according to individual program features. Graduated color is an appealing, easy way to lend depth to a picture. People are attracted to the impression of process, transition, and motion associated with graduated color. Backgrounds that ramp from top to bottom suggest a horizon and the vault of the sky. Traditionally, colors have been tricky to blend by manual methods like airbrush, but computers are ideal for this purpose. Whenever you come upon an easy way to do what used to be difficult, it can seem like magic. In the excitement, you may forget your usual good judgment and go overboard. Color graduations can fall into this category.

E. Using Black and White
Since you will not always be working in color, it is good to be skilled at telling your story in black and white, too. Aside from a few technical considerations, the principles are much the same, whether your palette is monochrome or multicolor. Contrast, figure/ground definition and readability still top the list of design goals. Black and white means shades of gray as well. The illusion of gray is created with various black-and-white patterns. These patterns range from finely spaced dots to representational repeats of brick walls, with an infinite number of intermediate choices (Fig. 8). In commercial printing, ink density for a tint color is controlled by sandwiching a screen between the negative and printing plates. This screen stops light where the printer does not want ink to print, and lets light through where ink is desired. When a density of 50% is desired, the screen allows light through only half the image area and stops it from hitting the other
Figure 8 An example of shades of gray.
half. The screen acts as a light stencil. Screen densities are determined by the percent of coverage; resolution is determined by the number of dpi—typically 85–150. Although this technology comes from the commercial printer's need to control ink coverage on paper, most desktop graphics programs incorporate some of these techniques for producing gray values on laser printer output.
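The same percent-of-coverage idea can be imitated digitally. The sketch below is an added illustration, not a description of any particular program; it uses a classic 4 x 4 ordered-dither matrix to decide which cells of a small text grid receive "ink" for a requested tint.

# Ordered-dither sketch: approximate a gray tint by switching cells on or off.
BAYER_4X4 = [
    [ 0,  8,  2, 10],
    [12,  4, 14,  6],
    [ 3, 11,  1,  9],
    [15,  7, 13,  5],
]

def tint_pattern(coverage, rows=8, cols=32):
    # Return text rows in which '#' marks an inked cell; coverage runs from 0.0 to 1.0.
    pattern = []
    for y in range(rows):
        line = ""
        for x in range(cols):
            threshold = (BAYER_4X4[y % 4][x % 4] + 0.5) / 16.0
            line += "#" if coverage > threshold else "."
        pattern.append(line)
    return pattern

for row in tint_pattern(0.5):   # roughly a 50% screen: half the cells are inked
    print(row)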
F. Texture Texture is a visual element that may serve as a replacement for the sense of touch. Texture can be used to give emphasis, to create separation, or to enhance unity. The feel for design is perhaps caught and not taught. Examine some of the graphic materials that are part of your everyday world—magazine advertisements, outdoor billboards, television titles and commercials, etc. You can find many ideas for designing your own materials by studying the arrangement of elements in such commercial displays. Imitation and practice are the best ways to develop graphic design skills.
XV. WAYS TO IMPROVE GRAPHICS
Graphic design is creative, subjective, and personal: its functions are to inform, influence, educate, persuade, and entertain. The widespread use of laser printers and easy-to-use graphics and layout software has opened up the graphic arts to thousands of enthusiasts who otherwise might not have explored these disciplines. The advent of microcomputers and their prevalent use in education have made DTP readily available. During its short history, DTP has changed the look of many of the things we read. Desktop publishing theoretically allows any user to become a typographer, paste-up artist, editor, writer, compositor, and graphic artist, skills once practiced only by people with years of training and experience behind them. Of course, an untrained user cannot be expected to instantly acquire and use all these skills effectively. The availability of DTP must be accompanied by an attention to detail that leads to good design. Graphics that can be used in DTP include photographs, line art (illustrations such as cartoons), digitized logos, charts and graphs produced with your software package, and your own illustrations and diagrams produced with a paint or draw software package. Some color works such as photographs and transparencies can be introduced if the appropriate equipment is available. Software packages exist that enable a user to become a cartoonist and illustrator as well. If you want to use cartoons and illustrations in your publication, you can use one of the available clip art packages. These contain sets of predrawn artwork, such as symbols, that can be included in your design. Tints (screens) and boxes are a form of graphics that are an easy way to introduce variety and interest to a page. A monochrome page can be easily enhanced with the use of a 10 or 20% gray tint. Screens can be used effectively on a page that has only text and can highlight logos, headlines, and subheads or be laid over simple graphics.

A. Get Attention with Unusual Elements
Any unusual visual elements capture the reader's attention. For example, exaggerated quotation marks draw attention to the text and an exaggerated drop cap draws the eye immediately (Fig. 9).
Figure 9 An exaggerated drop cap draws attention.
B. Vary Size and Use Special Effects
Normal size may be less interesting to the reader. Altered size and special effects draw attention to the visuals.

C. Reverse Type
Reverse type is usually white letters on a black, dark, or color background. This type style is very useful for headlines or short sentences (Fig. 10).

Figure 10 Reverse type.

D. Shades of Gray
Shades of gray can contribute a great deal to the success of a visual. The purpose of the shadow is to lend a three-dimensional effect to the flat page and to draw the reader's attention to that area.

E. Contrast
A highlight of color on the printed page draws the eye immediately. When you are limited to black and white, you must be resourceful in gaining and retaining your reader's attention. Contrast is one of the most effective ways of doing so (Fig. 11).

Figure 11 The high contrast players against a white background and the bold text create a good example of the use of contrast.

F. Use Bleed Techniques
"Bleeds" refer to extra areas of ink coverage that extend beyond the trim of the page. To bleed an element means to imprint it to the edge of the paper. You can bleed rules, borders, large letter forms, photos, large solid areas, and other elements. A bleed gives the page a feeling of expansiveness; the page seems larger than it actually is, unbounded by margins.
XVI. TYPOGRAPHY Type is nothing more than letters, and everybody has worked with letters most of his or her life. First we see typography in books and perhaps packaging. Then advertising, brochures, parts lists, and manuals became part of our typographic experience. Today, typographic communication in the form of newsletters, annuals, reports, and proposals has been added to our typographic exposure. As an educator, you probably have a sense for what looks good in type, what is easiest to read, and what is accomplished by the most effective graphic communication. You know when a B looks like a B and when it does not. You know that if a typewriter malfunctions and sets letters so close to each other that the letters begin to overlap, the end result will be difficult to read. Consciously or subconsciously, you have gained knowledge regarding typographic communication. Of course, being familiar with typographic communication from a reader’s viewpoint does not make one a typographer. It is, however, a very important step in the right direction.
A. Type Style Type style is the range of shapes and thicknesses within a typeface family. Type style makes it possible to
create legible text in a quick and consistent manner and adds emphasis to the text as well. Letters may be changed and arranged to create an interesting and highly effective document. Using a variety of type styles often adds to the effectiveness of a publication. Words can be bold, italic, shadow, outline, underlined, etc. Type style can be modified to add contrast or emphasis to a publication.
B. Type Weights Weight refers to the thickness of the lines that make up letters and varies from light to heavy or even black (Fig. 12). Although a single typeface may exist in a variety of weights, no typeface currently offers every weight, and weights are not consistent from one typeface to another.
C. Typeface Since the invention of movable type nearly 500 years ago until recently, about 20 different typefaces were available. The advent of computers and DTP added more than thousands of different typefaces to the list, and it is still growing. A typeface is classified as serif or sans serif. The serif, or cross-line, added at the beginning and end of a stroke probably dates from early Rome (Fig. 13). The serif was either a way to get a clean cut at the end of a chiseled stroke or an imitation of brushwritten letter forms. The serif first appeared in ancient Roman monuments, where the letters were chiseled in stone. Later on, serifs appeared in the handwritten manuscripts prepared with quill pens by
medieval scribes. Serif, or Roman, typefaces are useful in text because the serifs help distinguish individual letters and provide continuity for the reader's eye. In French sans means without. Sans serif typefaces have no serifs (Fig. 14). In sans serif typefaces the strokes of the letters are usually of nearly even thickness, which tends to give them a very austere, mechanical appearance. This even, mechanical look works against sans serif faces when they are used in text applications. Their uniform strokes tend to melt together before the human eye. Sans serif typefaces are most useful in large point sizes—in newspaper headlines, book titles, advertisements, and chapter headings—or in small bursts that are designed to contrast with surrounding serif type. Sans serif type is typically used for signs because of the simplicity of its letter forms and because it does not seem to taper at a distance. These same characteristics make sans serif type useful for highlighting amid serif type, especially in listing, directory, and catalog formats.

Figure 12 Type weights.
Figure 13 Examples of serif typefaces.
D. Leading Leading, or line spacing, is used to improve the appearance and readability of the publication. Leading (pronounced ledding) originally referred to the practice of placing small strips of lead between lines of type to make a page more readable. Although the job is no longer done with actual lead, the purpose and process are basically the same. Leading for lines of type specifies the distance from the baseline of one line to the baseline of the next line. It is usually at least the height of the font to prevent the bottoms of the letters on the top line from printing over the tops of the letters on the bottom line. For smaller sizes of type, line spacing is often set one or two points (approximately 20%) greater than the type size to increase legibility. For example, if the
type size is 12 points, the recommended leading is 14 points and usually is shown as 12/14 (12-point type with 14-point leading). Leading that is too spacious makes the reader's eyes wander at the end of one line trying to find the beginning of the next line. Closed leading can be used as a design tool. Sometimes you might want to tighten leading so that descenders (the portion of a letter that extends below the baseline, as in the letter y) from one line of type touch the ascenders (the portion of a lowercase letter that rises above the main body, as in the letter b) from the line below.

Figure 14 Examples of sans serif typefaces.
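The 12/14 convention described above is simple arithmetic, sketched here as an added illustration; the 20% allowance is only the rule of thumb quoted in this section, and page-layout programs let you override it.

def default_leading(point_size, factor=1.2):
    # Suggest baseline-to-baseline leading roughly 20% greater than the type size.
    return round(point_size * factor)

for size in (9, 10, 12, 14):
    # Printed as "type size/leading"; for example, 12/14 means 12-point type on 14-point leading.
    print(f"{size}/{default_leading(size)}")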
E. Word Spacing (Trackings) Word spacing refers to the amount of space between words in a line of type. When words are closer together, more words can be included on each line. Word spacing increases or decreases the density of type. In certain situations, that also can reduce the number of hyphenated words. If you reduce word spacing too much, however, the text becomes difficult to read.
F. Paragraph Spacing Adding space between paragraphs enhances readability and makes each paragraph more like a selfcontained unit. Paragraph spacing can be adjusted in many software packages.
G. Text Alignment Text alignment determines how lines of text are placed between the right and left margins. Lines can be flush left (aligned on the left and ragged on the right), flush right (aligned on the right and ragged on the left), justified (aligned on both the left and the right), or centered. Often text is aligned flush left. This style creates an informal, contemporary, and open publication where each word is separated by equal space. Hyphenation is kept to a minimum. Western readers normally read from left to right; therefore, flush right/ragged left should be used cautiously. When flush right/ragged left is used, readers tend to slow their reading speed and spend more time identifying words by putting together individual letters. Justified type, or flush left/flush right, because the text is forced together to fit in a line, is considered more difficult to read. Also, justified text contains
more hyphenated words. Centered text is useful for short headlines. If a centered headline is more than three or four lines, however, readers have to search for the beginning of each line (Fig. 15). Text does not always have to be presented in either justified or ragged right blocks. Often, it is desirable to "wrap" copy around the contours of a graphic.
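For plain text, the flush-left, flush-right, and centered styles described above can be imitated with Python's standard string methods. This is only a toy illustration added here; true justification requires adjusting word spacing the way a page-layout program does.

import textwrap

text = ("The butterfly, in the heart of the pure blue sky, heard the roar "
        "of the green valley's waterfall.")
width = 36
lines = textwrap.wrap(text, width)

for line in lines:
    print(line.ljust(width))   # flush left, ragged right
for line in lines:
    print(line.rjust(width))   # flush right, ragged left
for line in lines:
    print(line.center(width))  # centered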
H. Indention An indent is the amount of space a given line or paragraph is inset from the normal margin of a paragraph or from the column guides. Software packages offer automatic and manual indent options. Indention enhances readability—especially when there is a large number of consecutive paragraphs on a page. The first line of a paragraph can have an indent or outdent. First-line indent is the most common. In this style, the first line of each paragraph is indented from the left margin (Fig. 16). Hanging indent, or outdenting the first line, is the opposite of the first-line indent. In this style, the first line in a paragraph is at the left margin and the rest of the lines in the paragraph are indented from the left margin. Hanging indents are useful in bibliographical references and directory listings—the telephone book and a yearbook are typical examples (Fig. 17).
I. Tabs Tabs are used to align text at a particular location on the page, allowing great flexibility in formatting text. Most DTP packages space words and letters proportionally. Therefore, it is recommended to use tabs
rather than spaces to align text; otherwise, the printed text will be uneven. Tabs are especially useful in creating tables and columns.

Figure 15 Flush left, flush right, justified, and centered text.
Figure 16 A first-line indent.
J. Line Length The length of a line of text, also known as the column width, can affect readability and the overall look of a publication. The line length can be as narrow as one letter or as wide as the boundaries of the paper. If a line of text is too long, a reader may become weary. If a line is too short, the text may give a choppy appearance and may be difficult to read. A good rule of thumb is to relate line length to type size: the bigger the type size, the longer the line length can be. An appropriate line length enhances the appearance of the publication and increases its readability. A good-sized line, depending upon the type of document, should be from 8 to 15 words long.
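The rule of thumb relating type size to line length can be turned into a rough estimate. The averages used in the sketch below (a character occupying about half the point size in width, and five letters plus a space per word) are common approximations and are not figures from this article.

POINTS_PER_PICA = 12

def estimated_words_per_line(column_width_picas, point_size,
                             avg_char_width_em=0.5, chars_per_word=6):
    # Convert the column measure to points, then estimate characters and words per line.
    column_points = column_width_picas * POINTS_PER_PICA
    chars_per_line = column_points / (point_size * avg_char_width_em)
    return chars_per_line / chars_per_word

# A 24-pica (4-inch) column of 10-point type:
print(round(estimated_words_per_line(24, 10)))   # about 10 words, inside the 8-to-15-word guideline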
K. Hyphenation Hyphenation is the placing of a hyphen into a word so the word can be broken between two lines of text.
Hyphenation can improve the appearance of your publication by helping you avoid particularly unequal line lengths in flush left/ragged right text and unsightly word spacing in justified text. Hyphenation is an area of trade-offs. A break in the middle of a word slows down reading. Desktop publishing programs offer different hyphenation options. You can have the program hyphenate the entire publication automatically or you can turn off the automatic hyphenation and hyphenate your document manually.
L. Drop Capital An enlarged first character at the beginning of a paragraph that extends (drops) into the following lines of text is known as a drop capital. A drop capital breaks up large blocks of text. Typically, drop capitals are used to begin a chapter or section in a book or magazine. Drop capitals provide an important visual transition between the headline and body copy. Drop capitals can have the same or different character attributes, such as font, size, color, and style, as the rest of the text in the paragraph (Fig. 18).
M. Stick-Up Capital An enlarged initial letter that extends above the body text is called a stick-up capital. A stick-up capital is used as a graphic element to draw attention to the beginning of a story or chapter (Fig. 19).
N. Type Size The type size of a letter (also called height or point size) is measured by its vertical height, in points. One point is equal to one-twelfth of a pica, which in turn is almost exactly one-sixth of an inch. Therefore, there are 72 points per inch. Point is normally abbreviated “pt.” For example, 72-pt. type would measure 72 points from the top of its ascender to the bottom of its descender. This
means that x heights, as well as ascenders and descenders, will vary in size for different fonts even though their point sizes are the same. Except for f, j, l, and t, the body of lowercase letters rises to the x height (Fig. 20).

Figure 17 Outdents.
Figure 18 An example of drop capital.
Figure 19 An example of stick-up capital.
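The point and pica arithmetic in the preceding section can be captured in a few lines. The constants below follow the 72-points-per-inch convention stated there; the small historical difference between traditional printer's points and desktop-publishing points is ignored in this added sketch.

POINTS_PER_PICA = 12
PICAS_PER_INCH = 6
POINTS_PER_INCH = POINTS_PER_PICA * PICAS_PER_INCH   # 72

def points_to_inches(points):
    return points / POINTS_PER_INCH

def picas_to_points(picas):
    return picas * POINTS_PER_PICA

print(points_to_inches(72))   # 1.0 inch
print(picas_to_points(10))    # a 10-pica column measure is 120 points wide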
O. Letterspacing Letterspacing refers to the amount of space between each letter of a line or block of type. Each letter consists not only of the letter itself but also a tiny amount of space before and after the letter called the left- and right-side bearing. In general, it is desirable to keep the letterspacing range to a minimum, as wide spacing fluctuations give an uneven look to the page. Also, larger typefaces call for tighter letterspacing. Some DTP packages allow you to control the letterspacing.
P. Kerning
Kerning means the adjustment of space between pairs of characters to create visually even text that is easy to read. Most of the popular DTP programs are capable of adding or subtracting space between characters in minuscule increments. Certain pairs of letters appear to be separated by too much space. According to the Adobe type catalog Font & Function, the pairs that have a hard time "getting along" include the following:
AO AT AV AW AY Ac Ad Ae Ao Au Av Aw FA Ka Ke Ko LY Ly OV OW OX PA TA Ta Te To Tr Tu Tv Tw Ty VA Va Ve Vo Vu Vy WA Wa We Wo Wr Wu Wy YA Ya Ye Yo Yu av aw ay ev ew ex ey ov ow ox oy rw ry va vo wa we wo xc xe xo ya yc yr yo
Figure 20 Type size anatomy (showing ascender, cap height, waistline/x-height, baseline, and descender).
Too much word spacing breaks the line into separate elements, inhibiting reading. It also creates gaps within columns. When word spacing is greater than line spacing, reading is difficult because the eye tends to move from top to bottom instead of from left to right. Spacing between words also affects your type’s appearance. Kern word spaces in harmony with letterspaces to create evenness and balance within a line or block of type. Once you know the basic typographic rules and have had a little firsthand experience with preparing simple typographic communication, it becomes relatively easy to produce a variety of documents. A surprising amount of beautiful and efficient communication can be created with just your printer’s core fonts. More than 80% of all printed communication is set with just these two typefaces, and this includes lots of very creative and sophisticated graphic design work.
XVII. WAYS TO IMPROVE TEXT The primary function of words is to communicate instantly and effectively. A difference in type style not only helps to attract the reader’s eye but also helps to organize information in terms of importance. The following points may enhance the effectiveness of text.
A. Avoid Too Many Typefaces Desktop publishing manufacturers loaded their software with typefaces. In practice, this abundance can create an absolute disaster in terms of good typographic design, and overloads of typefaces should be avoided. As a general rule, two typefaces (a headline face and a body copy face) are adequate for most page designs. A variety of textures and styles can be conveyed by your choice of leading (the space between lines of type), weight (bold, light, or thin), size, and alteration of spacing between letters.
B. Avoid Fancy Typefaces Although you want distinctive type, do not choose a typeface that is too elaborate. Fancy typefaces can be very difficult to read.
C. Avoid All Caps Uppercase type has its uses, but use of only caps slows down readers. Some typefaces cannot be set in all caps.
D. Avoid Large Blocks of Text in Italic or Bold Try not to set a great deal of text in italic or bold, particularly if you are using small type. Also, italic’s small size in all caps tends to fit badly and is thus more difficult to read. Use italic type sparingly for emphasis or when irony or humor is intended. It can also imply an interjected conversational tone or a quotation. The use of italic can preclude quotation marks. Italic type is often used to set captions and the titles of books and other creative works.
E. Avoid Excessive Underlining Underlining is a means of emphasis. However, the availability of italic and bold types makes most underlining unnecessary. More than a few underlined words causes visual clutter and confusion. Also, it takes more time for readers to separate the words from the horizontal lines.
F. Control Widows and Orphans A widow is a line, a word, or a syllable of text at the end of a page or column that is separate from the paragraph that it finishes, which is at the end of the previous page or column. An orphan is a line, a word, or a syllable of text at the bottom of a page or column that is separate from the paragraph it begins, which starts the next page or column. Widows and orphans can be controlled in several ways: edit the paragraph and omit unnecessary words; substitute a shorter word for a longer one and tighten the spacing of any loosely set line. (You can do this by reducing the tracking gradually.) Also, widows and orphans can be eliminated by rejuggling the page or the column lengths. Desktop publishing software packages offer widow and orphan control, and you can specify the number of lines of widows or orphans. In short, DTP has changed the way people produce documents. It has enhanced the individual’s ability to
control all or most of the process of producing printed materials. The temptation is to add many different graphic and typographic elements in one publication. Such a mixture, though, usually defeats the purpose of attracting and keeping the reader’s attention. By keeping your design simple and using graphics and type selectively, you will produce the most effective and pleasing publication.
SEE ALSO THE FOLLOWING ARTICLES Copyright Laws • Electronic Mail • End-User Computing Tools • Hyper-Media Databases • Internet Homepages • Multimedia • Productivity • Speech Recognition • Spreadsheets • Word Processing
BIBLIOGRAPHY Azarmsa, R. (1998). Desktop publishing for educators: Using Adobe PageMaker. Boston: Allyn & Bacon. Bauermeister, B. (1988). A manual of comparative typography: The PANOSE system. New York: Van Nostrand Reinhold. Beaumont, M. (1987). Type: Design, color, character & use. Cincinnati, OH: North Light Books. Brown, A. (1989). In print: Text and type in the age of desktop publishing. New York: Watson-Guptill. Collier, D., and Floyd, K. (1989). Layouts for desktop design. Cincinnati, OH: North Light Books. Durbin, H. C. (1995). Desktop publishing systems. Easton, PA: Durbin Associates. Green, C. (1997). Desk top publisher’s idea. New York: Random House. Heinich, R. (1985). Instructional media and the new technology of instruction, 2d ed. New York: John Wiley & Sons. Henry, J. H. (1996). Do’s & don’ts of desktop publishing design. Ann Arbor, MI: Promotional Perspectives. Maxymuk, J. (1996). Using desktop publishing to create newsletters, library guides, & web pages: A how-to-do-it manual for librarians. New York: Neal-Schuman Publishers. Parker, R. C. (1990). Looking good in print, 2nd ed. Chapel Hill, NC: Ventana Press. Seybold, J. W. (May 1987). The desktop-publishing phenomenon, Byte, 12:149–165. Shushan, R., Wright, D., and Lewis, L. (1996). Desktop Publishing by Design, 4th ed. Redmond, WA: Microsoft Press. Silver, G. A. (1997). Layout, design, & typography: For the desktop publisher. Encino, CA: Editorial Enterprises. Tilden, S. (1987). Harnessing desktop publishing: How to let the new technology help you do your job better. Pennington, NJ: Scott Tilden. Willams, J. B., and Murr, L. E. (Spring 1987). Desktop publishing: New right brain documents, Library Hi Tech, 7–13.
Developing Nations Elia Chepaitis Fairfield University
I. INTRODUCTION II. CRITERIA: IMBALANCE AND UNDERDEVELOPMENT III. THE CRITICALITY OF INFORMATION AND COMMUNICATION TECHNOLOGY FOR DEVELOPMENT IV. CHANGING MARKET CONDITIONS: AN ACUTE PROBLEM FOR IS DEVELOPMENT V. CHALLENGES IN SYSTEMS DESIGN: INFORMATION INFRASTRUCTURES
VI. THE REGULATORY ENVIRONMENT: POLITICAL AND CULTURAL CONTROL VII. THE DIGITAL DIVIDE: INTERNATIONAL PROJECTS AND ACTION PLANS VIII. ICTS: TOOLS FOR PROJECT COORDINATION AND RESEARCH DISSEMINATION IX. ETHICS: BEYOND PIRACY AND PRIVACY X. CONCLUSION
GLOSSARY
developing economies Formerly referred to as lesser-developed countries; nations characterized by a gross domestic product (GDP) under 2000 and high rates of illiteracy, infant mortality, and an inadequate material infrastructure.
digital divide An intellectual construct used to describe two information environments, one for the information "haves" and the other for the information "have-nots."
emerging economies Economies in transition from communism to a market economy, generally used for nations in the former Soviet bloc.
hard information infrastructure The sum of those material factors, such as telephone lines or computer networks, upon which computer and communications systems rely.
ICTs (information and communication technologies) Information systems that are also used for communications.
information ethics Moral codes of behavior characterized by integrity and fairness in the creation, dissemination, and ethical use of information.
information poverty An endemic problem, characterized by a dearth of quality information.
IS (information system) A system that creates, processes, stores, and outputs information.
soft information infrastructure The political, economic, and sociocultural factors which affect information quality, management, and access.
I. INTRODUCTION Information systems (IS) in developing, emerging, and newly industrialized economies are vital for economic prosperity and stability, and their development is one of the greatest challenges of the early 21st century. Increasingly, as the computer evolves as an invaluable communication device, IS are referred to as information and communication systems (ICTs). ICTs deeply affect the tempo and direction of social and political change in developing and transition economies, especially where wealth and infrastructures are inadequate and unevenly distributed. Moreover, effective ICTs are necessary for survival in a global economy, a competitive precondition for strategic relationships. Before the 1990s, before the advent of the Internet and e-business, organizations in developing economies who lacked the capacity for electronic data interchange (EDI) often were bypassed by potential external partners in favor of connected enterprises. In the 2000s, at every level of economic development, connectivity and information management are more critical for prosperity and stability than ever.
Technology is not neutral: the danger that ICTs will be more pernicious than benign in some developing areas has been researched extensively. However, in general, technologically and empirically, greater opportunity exists in the early 21st century for those previously excluded from the Information Age since the advent of the global Internet and the introduction of wireless communications.
II. CRITERIA: IMBALANCE AND UNDERDEVELOPMENT Developing economies, once referred to as lesserdeveloped economies (LDCs), are characterized by a poor infrastructure, inferior growth rates, an imbalanced economy, and extremely low personal incomes. These economies lack necessary skills and resources to escape a heavy reliance on production from agriculture or mineral resources. This imbalance is a legacy of both external aggression and domination, but also of internal factors: poor natural endowments, inadequate human resources, political instability, and social inequality. Developing economies tend to be concentrated in Africa, Latin America, and the Middle East, and also in some states and clients of the former Soviet Union. In the 1990s, a new classification, emerging economies, was coined to describe postcommunist countries that face an arduous transition to a market economy. Russia, Belarus, and Ukraine share endemic developmental obstacles with traditional developing economies. Since 1980, numerous nations such as India, Brazil, Estonia, South Korea, and Turkey dramatically improved their GDP and became known as newly industrialized economies (NICs). However, these NICs also feature imbalanced economies and are globally competitive only in selective sectors. Clusters of poverty within NICs and emerging economies in the former Soviet Union resemble poverty in developing economies. A discussion of information systems in underdeveloped areas often applies to clusters in all three types of economies; technological dualism is common within all three models. Conversely, within developing economies, oases of advanced ICTs resemble more prosperous nations. Indeed, the rural poor in India (a newly industrialized country), Honduras (a developing country), and Ukraine (an emerging economy) share numerous problems with underdeveloped areas worldwide who lag in the information revolution. Conversely, wired clusters of urban, relatively affluent and cosmopolitan populations flourish within all three eco-
nomic models; these information-rich oases access global information and communications and resemble advanced economies. Current data do not reflect the maldistribution of GDP, telephone lines, and personal computers within economies. However, World Bank data illustrate the huge differences in information systems availability between advanced economies (the United States) on the one hand, and NICs, emerging economies, and developing economies on the other hand (Table I). The correlation among GDP per capita, the number of telephone lines, and personal computer ownership is evident.
III. THE CRITICALITY OF INFORMATION AND COMMUNICATION TECHNOLOGY FOR DEVELOPMENT At every economic level, but especially for developing economies, information and communications are critical, both for internal economic development and also for global competitiveness. Internally, information can supplement and expand scarce resources such as capital, know-how, labor, inventory, basic materials, and management skills. Also, unlike these resources, information is not depleted when it is utilized but increases in both quality and quantity. Moreover, information can supplement factors of production simultaneously, not one at a time, with significant synergies emerging. For example, if information resources are used to minimize inventory, less capital, labor, and warehouses are required; also, more current and complete knowledge of consumer preferences, supply, and distribution is acquired. In resource-poor environments, such optimization is critical not only for the expansion of wealth, but also for the formulation of successful business strategies in volatile and turbulent environments. More dramatically, computers have emerged as essential communication devices in the past decade, compensating for egregious deficits in traditional telecommunications infrastructures. The maturation of wireless communications holds enormous promise for impoverished areas that have been unable to afford land-based telephone systems. Extended information and communication systems facilitate joint ventures with partners in advanced economies, improve customer relations internally, and enable developing economies to participate in global markets with flexibility and speed. They also can facilitate inexpensive village-to-village communication and expand scarce services in fields such as medicine, banking, insurance, and education.
Table I  The Correlation among GDP/Capita, Telephone Lines, and Personal Computers: A Wide Range of IS Development from Tanzania to the United States

Country               GDP/capita   Telephone lines   Personal computers
Algeria                     1540              51.9                  5.8
Bolivia                      990              61.7                 12.3
Colombia                    2170             160.0                 33.7
Egypt                       1380              75.0                 12.0
Guatemala                   1680              55.0                  9.9
Indonesia                    580              29.0                  9.1
Jordan                      1620              61.8                 13.9
Morocco                     1190              52.6                 10.8
Nigeria                      250               3.9                  6.4
Pakistan                     510              18.4                  4.3
Romania                     1510             167.0                 26.8
Russian Federation          1750             210.0                 37.4
South Africa                3160             125.0                 54.7
Tanzania                     260               4.5                  2.4
Uzbekistan                   640              66.0                  N/A
United States              31920             664.0                510.5

Source: World Bank Data Profiles of GDP, Telephone Lines, and Personal Computers in 1999. “Indicators Database,” World Bank Group, July 2001.
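The correlation the article calls evident can be quantified. The sketch below is an added illustration, not part of the original article; it computes Pearson correlation coefficients from the 1999 figures in Table I, leaving out Uzbekistan because its personal-computer value is not available.

from math import sqrt

# (GDP per capita, telephone lines, personal computers) from Table I; Uzbekistan omitted (PC value N/A).
data = {
    "Algeria": (1540, 51.9, 5.8),        "Bolivia": (990, 61.7, 12.3),
    "Colombia": (2170, 160.0, 33.7),     "Egypt": (1380, 75.0, 12.0),
    "Guatemala": (1680, 55.0, 9.9),      "Indonesia": (580, 29.0, 9.1),
    "Jordan": (1620, 61.8, 13.9),        "Morocco": (1190, 52.6, 10.8),
    "Nigeria": (250, 3.9, 6.4),          "Pakistan": (510, 18.4, 4.3),
    "Romania": (1510, 167.0, 26.8),      "Russian Federation": (1750, 210.0, 37.4),
    "South Africa": (3160, 125.0, 54.7), "Tanzania": (260, 4.5, 2.4),
    "United States": (31920, 664.0, 510.5),
}

def pearson(xs, ys):
    # Pearson product-moment correlation coefficient.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

gdp, phones, pcs = (list(col) for col in zip(*data.values()))
print(round(pearson(gdp, phones), 2))   # GDP per capita vs. telephone lines
print(round(pearson(gdp, pcs), 2))      # GDP per capita vs. personal computers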
However, the gap between wired and nonwired communities threatens to become increasingly significant as ICTs mature and improve. The movements toward increased bandwidth and toward the fusion of information technologies in advanced economies may deepen the digital divide, the gap between the information “haves” and “have-nots.” Although advances such as wireless communications convey “democratic” and broad benefits, the rate of adoption is sporadic and sluggish in developing areas and is not keeping pace with changes in global markets.
Indeed, in 2002, although IS problems have changed, the gap in ICT development relative to advanced economies resembles the gap in computerization 20 years earlier (Table II).
Table II  Changing Problems in IS in Developing Economies
1980: high cost of hardware; high level of education; lack of effective software development and use; poor electrical/telecommunications/education infrastructure; lack of EDI capability.
2002: political and cultural resistance; lack of external partnerships; information quality; Internet access/content regulations; financial market imperfections (lack of credit, fraud, mistrust); inadequate property/commercial law; ethics and corruption; language; urban–rural split.

IV. CHANGING MARKET CONDITIONS: AN ACUTE PROBLEM FOR IS DEVELOPMENT
Developing countries face a dilemma: an adequate telecommunications infrastructure is becoming increasingly necessary to attract global business partners (including customers); yet capital and partners are necessary to construct that infrastructure. No longer can developing economies rely on cheap labor resources to attract direct and indirect investment, or to ensure price competitive exports. Labor costs as a percentage of the costs of production have been steadily declining as a result of a confluence of factors, and typically represent 5 to 15% of the costs of production, depending on the product. (High tech products typically are tilted toward 5%, the lower end of the range.) Labor markets are shifting dramatically and nations that depend on low wage rates for competitive advantage are in peril, in large part as a legacy of the total quality movement (TQM). TQM emphasized superior design, continuous product improvement, capital-intensive equipment and processes, and less direct labor with less inspection. Enterprises are shifting strategies away from island hopping across low-wage markets with shoddy workmanship, unreliable infrastructures, and low quality cultures toward high skilled labor and quality cultures. To utilize ICTs for economic competitiveness, developing economies need capital, partners, and information resources as soon as possible. The movement toward superior production technology and processes is inexorable; ICTs are vital for quality improvements, integrated processes, a skilled workforce, and cost containment. Producers in developing economies find that affluent clusters in the domestic market are neither loyal, local, nor captive. The middle to upper class urban householders are not only on the privileged side of the digital divide, but also positioned to take advantage of e-commerce, improved and affordable transportation, and global services.
V. CHALLENGES IN SYSTEMS DESIGN: INFORMATION INFRASTRUCTURES Systems designers must consider that information systems in developing and emerging economies are affected by many of the same macro- and microeconomic forces which shape the economy as a whole: capital shortages, a lack of skilled labor, inefficient distribution channels, poor resource endowments, and remote locations or inhospitable terrain or climate. Information technology in many of these areas is more important for communications than for computation, but salient barriers to cost-effective ICTs in developing economies are not confined to the telecommunications infrastructure. In addition to the technology infrastructure, the local information infrastructure affects ICT effectiveness. The informa-
tion infrastructure is the context within which information and communications are managed. Information systems reside and develop in discrete environments which, although affected by global markets, feature distinctive, nonglobal components: spatial, material, cultural, institutional, organizational, political, and economic elements. These components must be considered integral and organic to any discussion of international information infrastructures. In both developed and emerging economies, ICT effectiveness is affected by two information infrastructures: hard and soft. The hard information infrastructure includes factors such as connectivity, electrical supply, available bandwidth, wireless resources, and the availability of IT vendors, maintenance, and supplies (Table III). The collection and manipulation of data, as well as the dissemination of information and knowledge, also depend on a soft information infrastructure: the legal, political, economic, and social information environment that impacts ICTs (Table IV). This “soft” information infrastructure is largely composed of nontechnical elements: spatial, cultural, organizational, political, and economic variables which are highly germane to the tempo and direction of economic development. For example, information quality may be variable in emerging economies. Information quality is affected by cultural, economic, and political variables such as social trust, informal relationships, and commercial law, respectively (Tables V–VII). These variables may be intrinsic to a gray economy, may compensate for unreliable distribution or credit mechanisms, or may simply be traditional but atavistic. Social characteristics that affect an information environment include habits of trust and integrity, language and alphabets, occupational mobility, educational patterns, ethics and equity, and ways of knowing (Table VII). Across information cultures, for examples, trust has a seminal impact not only on information quality and access, but also on a range of factors—from the viability of e-business and web page
design, to the degree of covertness in security. Moreover, trust has an egregious impact on information ethics in cultures with well-established habits of secrecy and deception, and thus can cripple integrated and extended ICTs. Within each list of soft factors, some potential impacts are presented in Tables V–VII. These environmental elements may not only be ubiquitous, but also be dynamic and critical in ICT development: they impact the accepted division of labor, wealth creation, and market development significantly. Also, political and legal factors are integral to the information infrastructure: authority and legitimacy, the definition of rights and obligations, and citizen/subject compliance affect information quality, integrity, and access.

Table III  The Hard Information Infrastructure
Communications
  Wireless
  Ground wire
Platforms: hardware and software
Electricity and other utilities
Transportation

Table IV  The Soft Infrastructure
Political and legal variables
Economic and financial variables
Sociocultural variables
In the 21st century, information system developers face a dynamic set of challenges: to understand, exploit, promote, and work around elements of hard and soft information infrastructures to optimize economic development. Since the information infrastructure is not only a skeletal series of linkages but also an organic membrane in which communications pass and evolve, the internal information infrastructure itself changes and resists change. Especially in emerging economies, the soft and organic information infrastructure is the underbelly of successful ICT development. One of the most significant issues in ICT development in developing economies is the regulation of the Internet.
VI. THE REGULATORY ENVIRONMENT: POLITICAL AND CULTURAL CONTROL Government priorities shifted in past decades as information systems became affordable and powerful, and as local computer retailers and maintenance industries proliferated. Also, popular culture and global
communications popularized the collateral benefits of ICTs, and more democratic governments succumbed to pressure for relatively unrestricted Internet access.

Table V  The Soft Information Infrastructure: Social and Cultural Variables and Some Possible Impacts
Information poverty: data integrity and completeness
Soft information: reticence, oral traditions, fraud, lack of knowledge worker skills
Existing elites: proprietary or hoarded data, monopolies
Indirect planning: tempo of decision making, external variables weighted
Traditional information content and format
Bargaining discourse: areas of extreme limited disclosure
Social capital, including trust and language: possible mistrust of online data
Ways of knowing
Storytelling: information format; neural nets, fuzzy logic
Message style: interface and output format
Nonverbal signals: agreements or malleable positions not textual
Timing of exchanges
Literacy and knowledge access
Occupational mobility
Language and alphabets/characters: dialect
Logic
Views of time
Team-building patterns: incremental problem-solving
Value on consensus: layers of duplicated and verified effort
Ethics: piracy desirable for infant economy
Equity: every worker reassigned if displaced
Codes of conduct: tolerance of cheating

Table VI  The Soft Information Infrastructure: Economic and Financial Variables and Some Possible Impacts
Financial institutions and practices: data integrity and availability
Accounting standards and practices: choice of models; information availability
Logistics and communication: processing speed
Fiscal and monetary stability: range of real and nominal variability
Surplus and division of labor: disguised underemployment
Labor productivity: output per unit variable
GDP and distribution of wealth: marketing projections skewed
Wealth creation mechanisms: gray economy affects GDP reliability
Labor organization: hours worked, benefits, holidays

However, throughout the world, the Internet poses a dilemma: the value of connectivity must be balanced against a range of problems such as fraud and pornography, problems which encourage legal and political controls. Regulations proliferate in some developing economies, where often the government is the sole Internet service provider (ISP), and prohibitions against online political discussions are brutally enforced. Occasionally, the regulatory environment is onerous, and compliance with cultural control and political censorship can curtail ICTs' aid to market development (Tables VIII and IX). In traditional international trade and investment, developing economies often adopted regulations for a variety of purposes: quotas and other non-tariff barriers, luxury taxes on pricey imports, tariffs, protec-
for infant industry, special labor benefits, tolerance for intellectual property thefts, and special taxation of foreign enterprises. Restrictions on online commerce often protect authoritarian governments, fundamentalism, or sociocultural elites, rather than infant industry, labor, or distressed sectors of the economy. Some governments are hostile to the Internet, and political controls may restrict ICT in critical developmental areas (Table IX). The current regulatory environment often affects partnerships which depend on e-commerce, multiple ISPs, and intellectual property. Restrictive policies reflect the priorities, tensions, traditions, and tempo of economic development.
Table VII The Soft Information Infrastructure Political and Legal Variables: Some Possible Impacts (in italics)
Authority and legitimacy: varying degrees of enforcement/interpretation
Property and other commercial law: ability to raise and protect capital
Well-defined rights and obligations
Regulation of the Internet: persistent digital divide
Successful appeal mechanisms
Functional bureaucracy
Citizen or subject loyalty/compliance: reliability of tax and GDP data; nonconfiscatory taxation, tariffs, and nontariff policy; confiscation, expropriation, nationalization
Conflict resolution mechanisms: degree of disclosure, methods of closure
Government aid: national priorities; fields necessary for demonstration of need
Supervision of electronic communities: privacy legislation; security of intellectual property
Table VIII Internet Regulations: A Contrast between Common Legal Issues in All Economies and Legal, Political, and Cultural Regulation Areas in Developing Economies
Common global legal issues | Additional issues in developing economies
Fraud | Political opinions and criticism
Privacy | Regulated data collection
Theft of intellectual property | Regulated computer ownership
Pornography | Secularism in fundamentalist societies
Hacking | Government ownership of ISPs
Cyberterrorism | External news reports
Sabotage and electronic crime | Constraints on intellectual property law
Prostitution rings | Western culture
Gambling offshore | Hardware, software, network costs
Taxation | Maintenance
Anti-trust activity | Information access
Table IX Political Resistance to Cyberspace^a
Belarus | Single, government-owned ISP (Belpak)
Burma | State monopoly on Internet access; computer ownership must be reported to the government
Tajikistan | Single, government-owned ISP (Telecom)
Turkmenistan | More restricted than Tajikistan
Uzbekistan and Azerbaijan | Private ISPs controlled by telecommunications ministry
Kazakhstan and Kyrgyzstan | Private ISPs but exorbitant government fees for usage and connection
China | Users monitored and government-registered
Cuba | Government-controlled Internet
Iran | Censorship and blocked sites; content restrictions apply to discussions on sexuality, religion, the United States, Israel, and selected medical and scientific data (anatomy)
Iraq | No direct access to the Internet; official servers located in Jordan; few citizens own computers
Libya | Internet access impossible
North Korea | Internet access impossible; government servers located in Japan
Saudi Arabia | Science and Technology Center screens information offensive to Islamic values
Sierra Leone | Opposition and online press persecuted
Sudan | Single, government-owned ISP (Sudanet)
Syria | Private Internet access illegal
Tunisia | Two government-owned ISPs
Turkey | February 2001 content restrictions
Vietnam | Two government-owned ISPs
^a From Reporters without Borders, "The Twenty Enemies of the Internet," August 9, 1999, press release; Molly Moore, "Turkey Cracks Down on Youths Using Internet," Hartford Courant, February 4, 2001, A10.
VII. THE DIGITAL DIVIDE: INTERNATIONAL PROJECTS AND ACTION PLANS
Since the 1970s, before computers evolved into potent communication devices, international resolutions, conferences, and manifestos called for a new international information order for the poorest nations. As the agencies of the United Nations turned their attention from peacekeeping toward development, information was perceived as a key to affluence and independence. Resentment toward multinationals' perceived monopoly on information resources fueled a movement by nonaligned "Third World" nations to embrace development programs subscribed to by more than 90 members of the United Nations by the 1970s. A 1976 UNESCO commission on communications recommended balanced and equal access to information, and specifically noted that "diverse solutions to information and communication problems are required because social, political, cultural, and economic problems differ from one country to another and within a country, from one group to another." The attention of international agencies shifted from television and telephones to computers and the Internet in the following 20 years, and the issue of government-controlled media emerged as salient and divisive. Regional and national projects have been influenced by the European Union's resolutions on privacy, by the World Trade Organization's studies of mandatory data control standards, and by transnational efforts to combat undesirable Internet content
and costs. More than ever, the fusion of information technologies and the emergence of wireless communications have placed ICTs in the forefront of three developmental problems: the need for capital, the need for technical assistance, and the growing digital divide.
VIII. ICTS: TOOLS FOR PROJECT COORDINATION AND RESEARCH DISSEMINATION
Numerous information systems journals, international aid organizations, and national commissions are currently dedicated to the democratization of computing and communication systems. An influential online journal, Eldis, is sponsored by Sussex's Institute of Development Studies. Eldis features the World Employment Report 2001 (International Labor Organization), which explores the impact of ICTs on work in developing areas (Table X). This report focuses on the disparity between the positive impacts of e-business in advanced economies and the devastating effects on labor in developing economies. Online research and reports on economic development initiatives by scores of aid organizations, such as the World Bank, USAID, and OECD, as well as multiagency reviews, are available. On a regional basis, within Africa alone, dozens of projects are coordinated and expanded through ICTs, such as Project Africa (centered in Senegal), the Leland Project (USAID), the Pan African Development Information System (PADIS) Project, and Projet Afrinet (French). Within Latin America, almost every country hosts a Web site which publicizes the goals and strategies of ICT adoption. Global audiences can reach Argentina, Bolivia, or Chile through UNESCO to read
Table X World Employment Report 2001: An Excerpt of Issues for Developing Economies^a
Life at work in the information economy
Irreversibility and speed
A widening digital divide
How will markets be affected
How will work organizations be affected
Education matters most of all
How will the quality of life and work be affected
Social factors and social choices for addressing negative consequences
^a Eldis, World Employment Report 2001, International Labor Organization (ILO), 2001.
the latest statutes or development initiatives for ICTs. UNESCO also maintains an Internet Public Policy Network, G8 Global Information Society Pilot Projects, OECD infrastructure guidelines, and current ASEAN ICT framework agreements. The growth of virtual research and development centers for ICT development holds significant promise for swifter and more comprehensive responses to developmental challenges.
To move into the information economy, developing countries must set priorities and learn from the mistakes of the early leaders. International aid organizations and forward-looking political leaders typically support the following goals: universal information access, decision support systems, access to international information highways, vibrant private sector leadership, and global access to data within developing economies. However, the lack of capital, the question of intellectual property, the liberalization of national communications and public broadcasting services, and the development of human resources are salient problems. For example, not only have human resources been depleted by brain drain out of the country before the Internet era, but skilled workers can now work for foreign rather than national enterprises online, without emigrating.
The most persistent digital divides may endure within developing economies, if current patterns of economic development mirror uneven patterns of growth and investment, and if political patronage and largesse are a guide. Just as the gap between developing and advanced economies has widened in the past decade, so the gap between wired and nonwired populations will probably increase within the poorest nations, similar to the patterns we see in NICs like India today. Patterns of economic development are uneven in developing and emerging economies, and, barring seismic political and cultural changes, the digital divide will deepen along these lines, destabilizing the economy and constraining balanced long-term growth. The seismic fault lines of the digital divide will probably produce egregious inequity where there are rural, mountainous, or interior regions with poor or nonexistent utilities; where dialects or minority languages are spoken; where minority religions, castes, tribes, or female populations are predominant; where investments in education are lacking or minimal; and where political elites have no vested interest in popular support (Table XI). The digital divide is pernicious: it hinders not only the development of an informed population and the linking of remote populations to global markets, but also the spread of the services of pharmacists, educators, physicians, agronomists, managers, and technicians.
In advanced economies, the information highway has opened opportunities to small nations, small businesses, innovative shops, and minority chat groups.
Table XI The Digital Divide: Developing Economies
Resources needed: Capacity building; Technical assistance; Capital; Political leadership
Most critical areas: Fault lines across and between sectors; Deprivation along telephone systems; Topography matters; Political nonelites excluded; Agricultural–rural divisions; Littoral–inland splits; Industry-specific contrasts; Education levels; Language: the problem of dialect
In developing areas, the potential for the democratic expansion of e-commerce depends on substantial external aid, extensive economic integration, and responsible political leadership.
IX. ETHICS: BEYOND PIRACY AND PRIVACY
Corruption, inequity, and immoral behavior in economic development migrate to information systems. In the 1990s, substantial intellectual and cultural resources were used to identify and define ethical problems in international business, to achieve a rough consensus on priorities, and to design solutions. Unfortunately, these designers were often working with arm-chair philosophers and social engineers who did not focus on unethical behaviors within developing areas. However, the will to address moral behavior was established both globally and locally.
In advanced economies, information ethics emerged as a logical and vital extension of computer ethics. Before this extension, computer ethics focused on software piracy and other crimes against property such as the theft of computer time for personal use. Social and economic issues surfaced, such as egregious failure to develop ergonomic and user-friendly information systems. In the 2000s, riveting problems demand attention: security, unemployment, and equitable data access for a wide range of shareholders. With the expansion of e-commerce and the need for economic justice, not only ethicists and IS professionals, but also international organizations and global consumer advocates seek effective professional and moral codes of behavior. In developing economies, the thrust and the centrality of information ethics differ from those in societies with mature markets and global information systems.
Without the ethical development and use of information, information poverty cannot be eradicated. In addition, the stakes are critical: first, establish stability, popular trust, and a consensus on the new economic order; second, focus on the expansion of wealth, opportunity, and equity. From Honduras to Russia, computer and information ethics are preconditions for both stable market mechanisms and the maturation of information systems. The lack of information integrity and access contributes to low or stagnant growth and cripples orderly transitions to an information economy. Without intervention, immoral ICT behaviors will become immoral behaviors via ICTs, crippling economic development from Mexico to China. Mismanaged, hoarded, and distorted information results in missed opportunities and lost partnerships, inefficient distribution and supply channels, fraud, unproductive hidden assets abroad, egregious personal aggrandizement, tax avoidance, and unfair advantage—luxuries which developing economies cannot afford.
The search for an ethical consensus draws upon experience, development theory, and multicultural resources, and represents a marked shift away from utopianism and toward action-based pragmatism and individual accountability. The stakes are enormous; the evolution of information ethics affects not only the viability of ICTs, but the digital divide within the most needy societies. The success of open, integrated, and extended systems depends upon the availability of information technology as well as equitable and honest information-handling behaviors and other moral considerations. Information systems cannot succeed in environments with endemic corruption, information-hoarding elites, illegality, and other moral failures. The significance of historical experience, consequence-oriented traditions, and culture is germane but must not be overstated. Although ethical problems in information management arise from local economic, political, legal, and cultural factors, research on information ethics reveals a craving for moral consensus at every stage of economic development. At all levels of society, a debate on ethics is taking place, conflicting ethical norms have surfaced, and strategies to deal with corruption sit at the top of political and economic agendas. A critical mass of evidence on unethical behavior has accumulated, and the short- and long-term repercussions are self-evident and instructive.
Action-based ethics feature three major reference points: a reevaluation of existing practices and values in developing economies, a critical analysis of the relationship between ethics and a market economy both
internally and abroad, and a search across multiple cultures and within layers of cultures for the identification, definition, and solution of ethical problems. Although intercultural moral norms, or the "missionary position," cannot simply be transferred from advanced to developing economies, developing economies are dynamic, instructive workshops and laboratories positioned at the leading edge of ethical inquiry in information ethics and other moral problems in economic development. The 1990s were a time of severe trial and soul-searching. Substantial intellectual and cultural resources are now available to identify and define ethical problems with rigor, to achieve a rough consensus on priorities, and to begin to design solutions in nations replete with arm-chair philosophers and social engineers. Most importantly, the will to address ethical questions and habits of investigation into novel and multiple resources have been established.
X. CONCLUSION
An overview of research priorities and projects in the past 20 years shows significant shifts in development priorities and information systems (Table XII). At present, information systems are mixed blessings for developing countries. On the one hand, ICTs hold the promise of assisting balanced and accelerated economic development. On the other hand, ICTs are deepening the digital divide, erasing the modest comparative advantages which developing areas have held in cheap labor markets or other natural endowments. Furthermore, the success of ICTs depends upon economic and political reform and upon weathering painful cultural disruptions. Economic developers cannot succeed without broad infrastructure improvements, or without external supplies of capital and technical assistance. Start-up costs, partnerships, increased competition, and the globalization of domestic markets are daunting challenges.
The development of information systems in developing economies is both critical and problematic. The resource shortages, inequities, and cultural obstacles which impede balanced growth and wealth creation also influence the effectiveness of ICTs. The limits of protectionism, tradition-based solutions, and cultural congruence are clear. A timely movement toward open, extended, and integrated ICTs is mandated. The international, regional, and national programs for ICT development require unprecedented investments but promise to reap massive, long overdue benefits. The campaign to spread the advantages of the information revolution to the poorest areas of the world is one of the salient challenges of this generation.
Table XII IS Research Areas
1980s | 2000s
EDI | e-commerce
National computer industries (Brazil) | India's software industry
Impact on rural/minority/female population | Technological dualism
Brain drain | Wireless
Networks | Ethics and equity
TQM | The digital divide
Globalization | "Soft" factors
Joint ventures | Information quality
Multinational investment | Open, integrated, and extended systems
Systems design and maintenance | International assistance
SEE ALSO THE FOLLOWING ARTICLES
Digital Divide, The • Digital Goods: An Economic Perspective • Economic Impacts of Information Technology • Electronic Commerce • Ethical Issues • Future of Information Systems • Global Information Systems • Globalization • Globalization and Information Management Strategy • Internet, Overview • People, Information Systems Impact on
BIBLIOGRAPHY
Amoako, K. Y. (1996). Africa's information society initiative: An action framework to build Africa's information and communication infrastructure. Information and Communication for Development Conference Proceedings, Midrand, South Africa.
Avgerou, C., and Walsham, G. (2000). Information technology in context. Aldershot: Ashgate.
Castells, M. (2000). The rise of the network society, Vol. I. New York: Blackwell.
Chepaitis, E. (1990). Cultural constraints in the transference of information technology to third world countries, in International Science and Technology: Philosophy, Theory and Policy (Mekki Mtewa, Ed.). New York: St. Martin's.
Chepaitis, E. (1993). After the command economy: Russia's information culture and its impact on information resource management. Journal of Global Information Management, Vol. 1, No. 2.
Chepaitis, E. (2001). The criticality of information ethics in emerging economies: Beyond privacy and piracy. Journal of Information Ethics.
Chepaitis, E. (2002). E-commerce and the information environment in an emerging economy: Russia at the turn of the century, in Global Information Technology and Electronic Commerce: Issues for the New Millennium (Prashant Palvia et al., Eds.), 53–73. Marietta, GA: Ivy League Publishing.
Davenport, T. (1997). Information ecology. New York: Oxford Univ. Press.
Fukuyama, F. (1995). Trust: The social virtues and the creation of prosperity. New York: Free Press.
Hedley, R. A. (2000). The information age: Apartheid, cultural imperialism, or global village? in Social Dimensions of Information Technology: Issues for the New Millennium (G. David Garson, Ed.), 278–290. Hershey, PA: Idea Group Publishing.
Iansiti, M. (1998). Technology integration: Making choices in a dynamic world. Cambridge, MA: Harvard Univ. Press.
Kannan, P. K., Chang, A.-M., and Whinston, A. B. Electronic communities in e-business: Their roles and issues. Information Systems Frontiers, Vol. 1, No. 4, 415–426.
Tallon, P. P., and Kraemer, K. L. (2000). Information technology and economic development: Ireland's coming of age with lessons for developing countries. Journal of Global Information Technology Management, Vol. 3, No. 2, 4–23.
Wild, J. L., Wild, K. L., and Han, J. C. Y. (2000). International business: An integrated approach: E-business edition. New York: Prentice Hall.
World Bank Data Profiles. (July 2001). World Bank Group.
Digital Divide, The
Randal D. Pinkett
Building Community Technology (BCT) Partners, Inc.
I. THE DIGITAL DIVIDE
II. DIGITAL DIVIDE POLICY
III. CLOSING THE DIGITAL DIVIDE
IV. THE INTERNATIONAL DIGITAL DIVIDE
V. THE DIGITAL DIVIDE DEBATE
GLOSSARY
community-based organization (CBO) Private, nonprofit organizations that are representative of segments of communities.
community building An approach to community revitalization that is focused on strengthening the capacity of residents, associations, and organizations to work, individually and collectively, to foster and sustain positive neighborhood change.
community content The availability of material that is relevant and interesting to some target audience (e.g., low-income residents) to encourage and motivate the use of technology.
community network Community-based electronic network services, provided at little or no cost to users.
community technology Community-based initiatives that use technology to support and meet the goals of a community.
community technology center (CTC) Publicly accessible facilities that provide computer and Internet access as well as technical instruction and support.
community telecenter Similar to a community technology center, but community telecenter is a term more commonly used in remote or rural areas outside of the United States.
digital divide A phrase commonly used to describe the gap between those who benefit from new technologies and those who do not.
Free-Net Loosely organized, community-based, volunteer-managed electronic network services. They provide local and global information sharing and discussion at no charge to the Free-Net user or patron.
ICT Information and communications technology including computers and the Internet.
National Information Infrastructure (NII) An interconnected network of computers, databases, handheld devices, and consumer electronics.
nongovernmental organization (NGO) Similar to community-based organizations (CBOs), but NGO is a term more commonly used outside of the United States.
I. THE DIGITAL DIVIDE
The digital divide is a phrase commonly used to describe the gap between those who benefit from new technologies and those who do not. The phrase was first popularized by the National Telecommunications and Information Administration (NTIA) in the U.S. Department of Commerce in their 1995 report "Falling through the Net: A Survey of the Have Nots in Rural and Urban America." Thereafter, the NTIA released three additional reports: "Falling through the Net. II. New Data on the Digital Divide" in 1998, "Falling through the Net. III. Defining the Digital Divide" in 1999, and "Falling through the Net. IV. Toward Digital Inclusion" in 2000. In their most recent report, the NTIA wrote:
A digital divide remains or has expanded slightly in some cases, even while Internet access and computer ownership are rising rapidly for almost all groups. For example, the August 2000 data show that noticeable divides still exist between those with different levels of income and education, different racial and ethnic groups, old and young, single and dual-parent families, and
those with and without disabilities. . . . Until everyone has access to new technology tools, we must continue to take steps to expand access to these information resources.
Excerpts from the 1999 NTIA report include:
1. Income—Households with incomes of $75,000 and higher were more than 9 times as likely to have a computer at home (see Fig. 1), and more than 20 times as likely to have access to the Internet, as those with incomes of $5000 or less (see Fig. 2).
2. Education—The percentage-point difference between those with a college education or better, when compared to those with an elementary school education, was as high as 63% for computer penetration (see Fig. 3), and 45% for Internet penetration (see Fig. 4).
3. Race—Black and Hispanic households were approximately one-half as likely as households of Asian/Pacific Islander descent, as well as White households, to have a home computer (see Fig. 5), and approximately one-third as likely as households of Asian/Pacific Islander descent, and roughly two-fifths as likely as White households, to have home Internet access (see Fig. 6).
4. Geography—Americans living in rural areas lagged behind those in urban areas and the central city, regardless of income level. For example, at the lowest level of income ($5000 and below), those in urban areas and the central city were almost one and a half times as likely to have a home computer (see Fig. 7) and more than twice as likely to have Internet access as those in rural areas (see Fig. 8).
5. Income and race—For households earning between $35,000 and $74,999, 40.2% of Blacks and 36.8% of Hispanics owned a computer, compared to 55.1% of Whites (see Fig. 9), while for households earning between $15,000 and $34,999, 7.9% of Blacks and 7.6% of Hispanics had Internet access, compared to 17% of Whites (see Fig. 10). A similar pattern emerged in each income category.
Corroborating evidence can be drawn from a number of related studies. In 1995, RAND's Center for Information Revolution Analyses (CIRA) published the results of a 2-year study entitled "Universal Access to E-Mail: Feasibility and Societal Implications." Consistent with the NTIA reports, they found "large differences in both household computer access and use of network services across income categories . . . large differences in household computer access by educational attainment . . . [and] rather large and persistent differences across race/ethnicity in both household computer access and network services usage."
In 1997, Bellcore released the results of a national public opinion survey entitled "Motivations for and Barriers to Internet Usage: Results of a National Public Opinion Survey."
Figure 1 The 1998 computer penetration rates by income (percentage of U.S. households by income bracket, shown for the U.S. overall and for rural, urban, and central city households). Data from the U.S. Department of Commerce.
Figure 2 The 1998 Internet penetration rates by income (percentage of U.S. households by income bracket, shown for the U.S. overall and for rural, urban, and central city households). Data from the U.S. Department of Commerce.
In similar fashion to the NTIA, they found that a disproportionately high 58% of those with a household income below $25,000 reported a lack of awareness of the Internet. The Spring 1997 CommerceNet/Nielsen Internet Demographic Study (IDS), conducted in December
1996/January 1997 by Nielsen Media Research, also confirmed the NTIA's observations. This study was the first to collect data on patterns of use with computers and communications technologies as a function of income and race. In 1998, using the IDS data, Vanderbilt University released a report, "The Impact of Race on Computer Access and Use," that examined differences in PC access and Web use between African-Americans and Whites.
Figure 3 The 1998 computer penetration rates by education (percentage of U.S. households by educational attainment, shown for the U.S. overall and for rural, urban, and central city households). Data from the U.S. Department of Commerce.
Figure 4 The 1998 Internet penetration rates by education (percentage of U.S. households by educational attainment, shown for the U.S. overall and for rural, urban, and central city households). Data from the U.S. Department of Commerce.
The most interesting finding from their study was that income did not explain race differences in home computer ownership. While 73%
of White students owned a home computer, only 33% of African-American students owned one, a difference that persisted when a statistical adjustment was made for the students' reported household income. They concluded that "in terms of students' use of the Web, particularly when students do not have a home computer, race matters."
Figure 5 The 1998 computer penetration rates by race (percentage of U.S. households for White, Black, Hispanic, and other households, 1984–1998, with gaps annotated at 16.8, 21.5, and 23.4 percentage points). Data from the U.S. Department of Commerce.
Figure 6 The 1998 Internet penetration rates by race (percentage of U.S. households for White, Black, Hispanic, and other households, 1994 and 1998, with gaps annotated at 2.7 and 13.8 percentage points). Data from the U.S. Department of Commerce.
Figure 7 The 1998 computer penetration rates by income by geographic location (percentage of U.S. households across income brackets from under $5,000 to $75,000+, shown for the U.S. overall and for rural, urban, and central city households). Data from the U.S. Department of Commerce.
Figure 8 The 1998 Internet penetration rates by income by geographic location (percentage of U.S. households across income brackets from under $5,000 to $75,000+, shown for the U.S. overall and for rural, urban, and central city households). Data from the U.S. Department of Commerce.
In 2000, Stanford University released a preliminary report, "Internet and Society," which focused on the digital divide, but with a noticeably different explanation for the problem. Focusing on the Internet solely (to the exclusion of computer ownership), they concluded that education and age were the most important factors facilitating or inhibiting access. According to their analysis, "a college education boosts rates of Internet access by well over 40 percentage points compared to the least educated group, while people over 65 show a more than 40 percentage point drop in their rates of Internet access compared to those under 25." Regardless of the explanatory construct, all of these statistics suggest that the digital divide is a very real phenomenon. In addition to the aforementioned studies, the digital divide has been framed along a number of other dimensions, including inequities in computer and Internet access, use, and curriculum design between urban and suburban schools, or predominantly minority and predominantly White schools; the lack of participation of underrepresented groups in computer-
related and information technology-related fields and businesses, and the dearth of online content, material, and applications, geared to the needs and interests of low-income and underserved Americans.
II. DIGITAL DIVIDE POLICY
United States policy related to the digital divide originates within the context of universal service which, at its inception, referred to universal access to "plain old telephone service" (POTS). This was later expanded to include information and communications technology (ICT). In 1913, the Kingsbury Commitment placed the national telephony network of American Telephone and Telegraph (AT&T) under control of the federal common carrier. In exchange, AT&T received further protection from competition while agreeing to provide universal access to wiring and related infrastructure. The Communications Act of 1934 extended this accord and ensured affordable telephone service for all citizens by regulating pricing, subsidizing the cost of physical infrastructure to remote areas, and establishing the Federal Communications Commission (FCC).
Figure 9 The 1998 computer penetration rates by income by race (percentage of U.S. households across income brackets, for White, Black, Hispanic, and other households). Data from the U.S. Department of Commerce.
In the Communications Act of 1934, the role of the FCC was described as follows: For the purpose of regulating interstate and foreign commerce in communication by wire and radio so as to make available, so far as possible to all the people of the United States, a rapid, efficient, nation-wide,
and world-wide wire and radio communications service with adequate facilities at reasonable charges.
The FCC would then play the central role in ensuring that national telephone service was both available and affordable; it would be almost 60 years before telecommunications policy was revisited in the U.S. In 1993, the Clinton Administration released a report, "National Information Infrastructure: Agenda for Action," which identified strategies for expanding the National Information Infrastructure (NII)—an interconnected network of computers, databases, handheld devices, and consumer electronics.
Figure 10 The 1998 Internet penetration rates by income by race (percentage of U.S. households across income brackets, for White, Black, Hispanic, and other households). Data from the U.S. Department of Commerce.
The report addressed the following areas:
1. Promoting private sector investment through appropriate tax and regulatory policies
2. Extending the universal service concept to ensure that information resources were available at affordable prices
3. Acting as a catalyst to promote technological innovation and new applications
4. Promoting seamless, interactive, user-driven operation of the NII
5. Ensuring information security and network reliability
6. Improving management of the radio frequency spectrum
7. Protecting intellectual property rights
8. Coordinating with other levels of government and with other nations
9. Providing access to government information and improving government procurement
Meanwhile, in 1994, the NTIA in the U.S. Department of Commerce began gathering data related to computer and Internet access by adding a series of questions to the U.S. Census Bureau's "Current Population Survey." This resulted in the first of the four "Falling through the Net" reports that highlighted the digital divide—defined in the report as the gap "between those who have access to and use computers and the Internet." Soon after, Congress passed the Telecommunications Act of 1996 with the aim of deregulating the telecommunications industry, thus promoting competition, lowering the overall cost of services to consumers, and expanding the notion of universal service through the inclusion of additional mechanisms that supported its provision.
The Telecommunications Act of 1996 also established one of several major policy initiatives aimed at closing the digital divide—the Universal Service Fund, also known as the "E-Rate" Program. The Universal Service Fund was administered by the Universal Service Administrative Company (USAC), a private, not-for-profit corporation responsible for providing every state and territory in the United States with access to affordable telecommunications services. All of the country's communities—including remote communities—such as rural areas, low-income neighborhoods, rural health care
providers, public and private schools, and public libraries were eligible to seek support from the Universal Service Fund. To accomplish this task, USAC administered four programs: the High Cost Program, the Low Income Program, the Rural Health Care Program, and the Schools and Libraries Program. Each of these programs provided affordable access to modern telecommunications services for consumers, rural health care facilities, schools, and libraries, regardless of geographic location or socioeconomic status, via 20 to 90% discounts on telecommunications services, Internet access, and internal connections.
In addition to USAC, a few of the other major policy initiatives of the Clinton Administration included the following:
1. U.S. Department of Commerce, Technology Opportunities Program—The Technology Opportunities Program promoted the widespread availability and use of advanced telecommunications technologies in the public and nonprofit sectors. As part of the NTIA, TOP gave grants for model projects demonstrating innovative uses of network technology. TOP evaluated and actively shared the lessons learned from these projects to ensure the benefits were broadly distributed across the country, especially in rural and underserved communities.
2. U.S. Department of Education, Community Technology Center (CTC) Program—The purpose of the Community Technology Centers program was to promote the development of model programs that demonstrated the educational effectiveness of technology in urban and rural areas and economically distressed communities. These community technology centers provided access to information technology and related learning services to children and adults.
3. U.S. Department of Housing and Urban Development, Neighborhood Networks Program—The Neighborhood Networks initiative encouraged the establishment of resource and community technology centers in privately owned apartment buildings that received support from HUD to serve low-income people. The centers offered computer training and educational programs, job placement, and a diverse array of other support. The goal of Neighborhood Networks was to foster economic opportunity and encourage life-long learning.
4. National Science Foundation, Connections to the Internet Program—The Connections to the Internet program encouraged U.S. research and education institutions and facilities to connect to
the Internet and to establish high performance connections to support selected meritorious applications. This included three connection categories: (1) connections for K-12 institutions, libraries, and museums that utilized innovative technologies for Internet access; (2) new connections for higher education institutions; and (3) connections for research and education institutions and facilities that had meritorious applications with special network requirements (such as high bandwidth and/or bounded latency) that could not readily be met through commodity network providers.
5. U.S. Department of Agriculture, Construction and Installation of Broadband Telecommunications Services in Rural America—The Rural Utilities Service (RUS) administered a loan program that made funds available to finance the construction and installation of broadband telecommunications services in rural America. The purpose of the program was to encourage telecommunications carriers to provide broadband service to rural consumers where such service did not currently exist. This program provided loan funds, on an expedited basis, to communities of up to 20,000 inhabitants to ensure rural consumers enjoyed the same quality and range of telecommunications services that were available in urban and suburban communities.
6. U.S. General Services Administration, Computers for Learning Program—The Computers for Learning (CFL) program transferred excess Federal computer equipment to schools and educational nonprofit organizations, giving special consideration to those with the greatest needs. Federal agencies used the CFL web site to connect the registered needs of schools and educational nonprofit organizations with available government computer equipment.
III. CLOSING THE DIGITAL DIVIDE
Three primary models have emerged for closing the digital divide. These efforts fall under the heading of community technology, or community-based initiatives that use technology to support and meet the goals of a community. The first model is community networks, or community-based electronic network services, provided at little or no cost to users. The second model is community technology centers (CTCs), or publicly accessible facilities that provide computer and Internet access as well as technical instruction and support.
The third model is community content, or the availability of material that is relevant and interesting to some target audience (e.g., low-income residents) to encourage and motivate the use of technology. These approaches can be classified according to what they provide: hardware, software, training, infrastructure, online access, or content. They can also be classified according to the groups they target: individuals, schools, youth, community organizations, and the general public, or specific groups such as a neighborhood, racial or ethnic minorities, the homeless, and the elderly. Each model is described in greater detail below.
A. Community Networks
Community networks are community-based electronic network services, provided at little or no cost to users. In essence, community networks establish a new technological infrastructure that augments and restructures the existing social infrastructure of the community. Most community networks began as part of the Free-Net movement during the mid-1980s. According to the Victoria Free-Net Association, Free-Nets are "loosely organized, community-based, volunteer-managed electronic network services. They provide local and global information sharing and discussion at no charge to the Free-Net user or patron." This includes discussion forums or real-time chat dealing with various social, cultural, and political topics such as upcoming activities and events, ethnic interests, or local elections, as well as informal bartering, classifieds, surveys and polls, and more.
The Cleveland Free-Net, founded in 1986 by Dr. Tom Grundner, was the first community network. It grew out of the "St. Silicon's Hospital and Information Dispensary," an electronic bulletin board system (BBS) for health care that evolved from an earlier bulletin board system, the Chicago BBS. In 1989, Grundner founded The National Public Telecomputing Network (NPTN) which, according to the Victoria Free-Net Association, "evolved as the public lobbying group, national organizing committee, and public policy representative for U.S.-based Free-Nets and [contributed] to the planning of world-wide Free-Nets." NPTN grew to support as many as 163 affiliates in 42 states and 10 countries. However, in the face of rapidly declining commercial prices for Internet connectivity, and a steady increase in the demands to maintain high-quality information services, NPTN (and many of its affiliates) filed for bankruptcy in 1996. While a number of Free-Nets still exist today,
many of the community networking initiatives that are presently active have incorporated some aspects of the remaining models for community technology—community technology centers and community content.
The aforementioned study at RAND's CIRA involved the evaluation of the following five community networks: (1) Public Electronic Network, Santa Monica, CA; (2) Seattle Community Network, Seattle, WA; (3) Playing to Win Network, New York, NY; (4) LatinoNet, San Francisco, CA; and (5) Blacksburg Electronic Village (BEV), Blacksburg, VA. Their findings included increased participation in discussion and decision making among those who were politically or economically disadvantaged, in addition to the following results:
• Improved access to information—Computers and the Internet allowed "individuals and groups to tap directly into vast amounts and types of information from on-line databases and from organizations that advertise or offer their products and services online."
• Restructuring of nonprofit and community-based organizations—Computers and the Internet assisted these organizations in operating more effectively.
• Delivery of government services and political participation—Computers and the Internet promoted a more efficient dissemination of local and federal information and services and encouraged public awareness of, and participation in, government processes.
Examples of other community networks include Big Sky Telegraph, Dillon, Montana; National Capital FreeNet, Ottawa, Ontario; Buffalo Free-Net, Buffalo, New York; and PrairieNet, Urbana-Champaign, Illinois.
B. Community Technology Centers
Community technology centers (CTCs), or community computing centers, are publicly accessible facilities that provide computer and Internet access, as well as technical instruction and support. According to a 1999 study by the University of Illinois at Urbana-Champaign, CTCs are an attractive model for a number of reasons. First, they are cost-effective when compared to placing computers in the home. Second, responsibility for maintaining computer resources is assumed by an external agent. Third, knowledgeable staff members are present to offer technical support and training. Fourth and finally, peers and other community members are present, creating a pleasant social atmosphere.
Consequently, CTCs are, by far, the most widely employed strategy to date for community technology initiatives. For more than two decades, significant public and private funds have been invested in the development of CTCs nationwide, including the Intel Computer Clubhouse Network, the Community Technology Centers' Network (CTCNet), PowerUp, the U.S. Department of Education CTC program, the U.S. Department of Housing and Urban Development Neighborhood Networks (NN) program, and more. CTCs have been the focus of numerous studies relating to computer and Internet access and use, and the factors influencing their effectiveness have been well-researched and documented, including determining space; selecting hardware, software, and connectivity; scheduling and outreach; budgeting; funding; and more.
The Community Technology Centers' Network (CTCNet), housed at Education Development Center, Inc. (EDC), in Newton, Massachusetts, is a national membership organization that promotes and nurtures nonprofit, community-based efforts to provide computer access and learning opportunities to the general public and to disadvantaged populations. CTCNet is a network of more than 350 affiliate CTCs including multiservice agencies, community networks, adult literacy programs, job training and entrepreneurship programs, public housing facilities, YMCAs, public libraries, schools, cable television access centers, and after-school programs.
CTCNet has conducted two evaluations examining the impact of computers and the Internet on individuals and families. Their first evaluation, "Community Technology Centers: Impact on Individual Participants and Their Communities," was a qualitative study that involved semistructured interviews with 131 participants at five intensive sites: (1) Brooklyn Public Library Program, Brooklyn, NY; (2) Somerville Community Computing Center, Somerville, MA; (3) Old North End Community Technology Center (ONE), Burlington, VT; (4) New Beginnings Learning Center, Pittsburgh, PA; and (5) Plugged In, East Palo Alto, CA. The results of the study included:
• Increased job skills and access to employment opportunities—Individuals were able to access information and resources about job search and employment opportunities (14%), improve job skills including computer and literacy skills (38%), and consider new, higher-wage, career options that involved the use of technology (27%).
• Education and improved outlook on learning—Individuals gained access to lifelong learning
opportunities such as computer literacy and mathematics programs (15%), changed their goals for learning and educational attainment (e.g., decided to pursue a GED or more) (27%), and improved their outlook and perspective on learning (e.g., using the computer they "learned that they can learn") (27%).
• Technological literacy as a means to achieve individual goals—Individuals obtained greater computer awareness and new computer skills that increased their comfort with technology as a tool for accomplishing their goals (91%).
• New skills and knowledge—Individuals improved their reading and writing (37%), mathematics skills, and interest in science (8%).
• Personal efficacy and affective outcomes—Individuals achieved greater personal autonomy (18%) and feelings of pride and competence as a result of success with computers (e.g., decided to stay off drugs) (23%).
• Use of time and resources—Individuals found productive uses for their time (15%) which resulted in positive outcomes such as reduced reliance on public assistance (4%).
• Increased civic participation—Individuals identified new avenues for voicing their opinions on a range of social and political issues (5%), gained access to community, municipal, and government services and resources, and demonstrated greater interest in and engagement with current events (5%).
CTCNet’s second evaluation, “Impact of CTCNet Affiliates: Findings from a National Survey of Users of Community Technology Centers,” was a quantitative study that surveyed 817 people from 44 sites. This evaluation corroborated the findings from the 1997 study and also found that more than one-third of users with employment goals, half of users with educational goals, and more than half of users with goals of self-confidence and overcoming fear reported achieving their goals. Examples of other CTCs include Computer Clubhouse, Boston, Massachusetts; Austin Learning Academy, Austin, Texas; PUENTE Learning Centers, Los Angeles, California; New Beginnings Learning Center, Pittsburgh, Pennsylvania; and West Side Community Computing Center, Cleveland, Ohio.
C. Community Content
Community content refers to the generation and availability of local material that is relevant and interesting
to a specific target audience (e.g., low-income residents) to encourage and motivate the use of technology. Community content can be broadly classified along two dimensions: information vs communication, and active vs passive.
The information vs communication dimension highlights the Internet's ability to both deliver information and facilitate communication. Interestingly, a 1997 study at Carnegie Mellon University found that people use the Internet more for communication and social activities than they do for information purposes. A simple example of the difference between these two forms of community content is the difference between reading and writing about community-related matters (information), and discussing and dialoguing about community-related matters (communications). Information-based community content takes the form of databases and documents that can be accessed online such as a directory of social service agencies, a listing of recommended websites, or a calendar of activities and events. Communications-centered community content takes the form of interactive, synchronous tools such as chat rooms and instant messaging, or asynchronous tools such as listservs (email lists) and discussion forums. Here, the distinguishing factor when contrasted with other forms of content is that the nature of the information or communication exchange is solely focused on, or of use to, members of the community.
The active vs passive dimension, at one extreme, positions community members as the active producers of community content, while at the other extreme, it positions community members as the passive recipients of community content, with varying degrees of each designation found at each point along the continuum. A simple example that highlights the distinction between these two orientations toward community content is the difference between browsing a community website (passive) and building a community website (active). A passive disposition is static, unidirectional, and sometimes described as "one-to-many" because content is generated by a third party (one) and delivered to the community (many). It typically manifests itself in the form of centralized, one-way repositories of information that can be accessed by community members such as an information clearinghouse of city or municipal services; an online entertainment guide that lists movies, shows, live performances, and restaurants; or a web portal of local news, weather, sports, etc. Here, the distinguishing factor is that although very little, if any, content is produced by the community, it is still intended for the community. An active disposition is
dynamic, multidirectional, and often described as "many-to-many," because content is generated by the community (many) for the community (many). It typically manifests itself in the form of multiple-way, interactive communication and information exchange between end-users such as an online, neighborhood-based, barter and exchange network, an e-mail listserv for the purpose of online organizing and advocacy, or a community-generated, web-based, geographic information system (GIS) that maps local resources and assets. Here, the distinguishing factor is that most, if not all, content is produced by the community.
Community content was the focus of a report authored by the Children's Partnership, a children's advocacy, nonprofit organization, entitled "Online Content for Low-Income and Underserved Americans." In the report, community content was defined according to the following five categories:
• Information that is more widely available
• Information that can be customized by the user
• Information that flows from many to many
• Information that allows for interaction among users
• Information that enables users to become producers of information
Based on an evaluation that included discussion with user groups, interviews with center and community network directors, interviews with other experts, and analysis of the web, they concluded the following with respect to low-income and underserved populations and existing Internet content: (1) a lack of local information, (2) literacy barriers, (3) language barriers, and (4) a lack of cultural diversity. This suggests that while the web has emerged as a valuable resource for mainstream users, additional effort must be made to make the Internet more attractive to local community residents and ethnic and cultural groups, as well as more accessible to users with limited literacy, users with disabilities, and nonnative English speakers. Community content is an emerging strategy for community technology initiatives, and additional work will be required to overcome these barriers.
Notable community content sites include CTCNet (http://www.ctcnet.org), the Digital Divide Network (http://www.digitaldividenetwork.org), the America Connects Consortium (http://www.americaconnects.net), the Children's Partnership's Content Bank (http://www.contentbank.org), and the Community Connector (http://databases.si.umich.edu/cfdocs/community/index.cfm).
IV. THE INTERNATIONAL DIGITAL DIVIDE
Although the digital divide was initially recognized as an issue within the United States, the phrase was very quickly adopted to encompass the disparities in information and communications technology (ICT) access and use across the globe. The following is an overview of the digital divide in selected countries and continents.
A. Africa
Internet access arrived in Africa circa 1998, which clearly explains why most African countries have experienced relatively low levels of computer and Internet penetration when compared to countries in other regions. In 1999, the United Nations Development Program (UNDP) released the "World Report on Human Development," which identified Africa as having 0.1% of the hosts on the Internet and the U.S. as having 26.3%. In 2000, Nua estimated that there were 407.1 million people online, of which 3.11 million were in Africa (less than 1%) from among its 738 million people. That same year, the International Telecommunications Union (ITU) reported that while Africa represented 12.8% of the world's population, it had just 2% of wired telephones, 1.5% of wireless telephones, and 1.5% of personal computers.
Efforts to close the digital divide and expand ICT diffusion in Africa were bolstered in April 1995, when the United Nations Economic Commission for Africa (CEA), ITU, United Nations Educational, Scientific and Cultural Organization (UNESCO), International Development Research Center (IDRC), and Bellanet International held the "African Regional Symposium on Telematics for Development" in Addis Ababa. One month later, at the annual meeting of the CEA, ministers from various African countries gathered and passed Resolution 795, "Building the African Speedway for Information," which led to the implementation of a national telecommunications infrastructure for the purpose of planning and decision making, and the establishment of a committee comprised of ICT experts charged with preparing Africa for the digital age. One year later, the CEA Ministers passed Resolution 812, which established an even broader initiative known as the "African Information Society Initiative (AISI)." AISI, the recognized plan for fostering an environment within Africa that could support broad deployment of ICT throughout the continent, was based on the following nine points:
1. Develop national plans for building information and communication infrastructure
2. Eliminate legal and regulatory barriers to the use of information and communications technologies
3. Establish an enabling environment to foster the free flow and development of information and communications technologies in society
4. Develop policies and implement plans for using information and communications technologies
5. Introduce information and communications applications in the areas of highest impact on socioeconomic development at the national level
6. Facilitate the establishment of locally based, low-cost, and widely accessible Internet services and information content
7. Prepare and implement plans to develop human resources in information and communications technologies
8. Adopt policies and strategies to increase access to information and communications facilities with priorities in servicing rural areas, grassroots society, and other disenfranchised groups, particularly women and youth
9. Create and raise awareness of the potential benefits of African information and communications infrastructure

In 1998, following the ITU Africa Telecom Conference, African telecommunications ministers established the "African Connection" and the African Telecommunications Union (ATU) to further promote strategies to bring ICT to the continent on a large-scale basis. This was followed by a series of related conferences and forums focused on building the information and communications infrastructure of Africa. Alongside the efforts of AISI and the ATU, certain African countries have made noticeable progress toward strengthening the availability and use of ICT within their regions, most notably South Africa. According to Nua, in 2000 the largest share of Internet users in Africa was located in Southern Africa, with South Africa alone representing roughly three-quarters of the continent's total users, followed by Zimbabwe and Botswana. During the changeover years to post-apartheid South Africa, the Center for Developing Information and Telecommunications Policy was established in coordination with the transitioning government of the African National Congress (ANC). After the ANC came into complete power in 1994, a process was undertaken to understand and subsequently address the disparity in ICT in South Africa, which included the Information Society and Development
Conference (ISAD) in May 1996. This led to the Telecommunications Act of 1996, which established an independent regulatory agency for ICT in South Africa and created the Universal Service Agency (USA), a public agency responsible for overseeing ICT-related initiatives throughout South Africa. Since then, a number of initiatives have been undertaken, including training programs by the Department of Labor and efforts to bring computers into schools by 2005 sponsored by the Department of Education and led by nongovernmental organizations (NGOs) such as SchoolNet and the Multi-Purpose Community Centers Program. South Africa has also witnessed the emergence of various Internet cafes (commercial access sites) and USA telecenters (public access sites) in addition to phone shops, libraries equipped with computers and Internet access, and other programs that provide ICT training and support.
B. Australia

In 2000, the Australian Bureau of Statistics (ABS) released figures concerning the digital divide in Australia, which determined that 40% of Australian households were connected to the Internet by the end of the year, while 50% of Australian adults had accessed the Internet during the past year. Their research also identified the following factors related to computer and Internet access: household income, showing that households with an annual income above $50,000 were almost twice as likely to have a computer (77%) and Internet (57%) at home than those below this figure (37 and 21%, respectively); region, showing that households located in metropolitan areas (i.e., the east coast of Australia) were more likely to have computer and Internet access (59 and 40%, respectively) when compared to nonmetropolitan areas (52 and 32%, respectively); family composition, showing that households with children under 18 were more likely to access computers and the Internet (48%) when compared to their counterparts without children under 18 (32%); age, showing that adults between the ages of 18 and 24 were the most likely to have a computer and Internet access at home (88%) and adults above the age of 55 were the least likely to do so; employment status, showing that employed adults were more than twice as likely to have a computer and Internet access (82%) than unemployed adults (38%); gender, showing that women (47%) were less likely than men (53%) to use computers and the Internet; and education, showing that computer and Internet
access for adults with a bachelor's degree (64%) was more than twice the level of access among adults with a secondary school education (28%). Subsequently, in June 2001, the National Office for the Information Economy (NOIE) in the Commonwealth Government of Australia released a report entitled "Current State of Play," which examined the digital divide in Australia from an even broader perspective. Their report corroborated the findings from the ABS and focused on three areas: (1) readiness, or access to ICT as indicated by levels of penetration and infrastructure; (2) intensity, or the nature, frequency, and scope of ICT use; and (3) impacts, or the benefits of ICT to citizens, businesses, and communities. This information was then used to inform a variety of national, regional, and local initiatives aimed at closing the digital divide. Nationally, NOIE worked with the Department of Communications, Information Technology and the Arts through the Networking the Nation (NTN) project, which has provided online access primarily in nonmetropolitan areas, created a multicultural web-based portal with community information and content, established a national directory of organizations providing subsidized access and free computers, offered training programs aimed at women and the elderly, and more. Regionally and locally, NTN has funded a number of programs such as AccessAbility, an online resource to raise awareness concerning disabled users of the Internet; Farmwide, which provides assistance to residents who have to place long-distance telephone calls to access the Internet; Building Additional Rural Networks (BARN), which supports network infrastructure and innovative technology development in rural areas; and more. Finally, to specifically address the needs of indigenous people, NOIE has instituted several programs such as the Remote Islands and Communities Fund, which provides assistance for the ICT needs of people in remote islands and communities; the Open Learning Projects to Assist Indigenous Australians, which offers funds for educational packages designed to serve indigenous students and electronic networks to link indigenous postgraduate students with other academic institutions; and Connecting Indigenous Community Links, which provides public Internet access sites in seven indigenous communities.
C. Canada

Canada has been tracking statistics related to the digital divide through the "General Social Survey of Internet Use" and other similar surveys. The 2000 survey revealed that 53% of Canadians age 15 years and older had used the Internet in the last 12 months. The breakdown of Internet use in the last 12 months along various social, geographic, and demographic lines was 56% of men and 50% of women; 90% of people aged 15 to 19 and 13% of people aged 65 to 69; 30% of people with an annual income below $20,000 and 81% of people with an annual income above $80,000; 79% of people with a university education and 13% of people with less than a high school diploma; 55% of people in urban areas and 45% of people in rural areas; and as high as 61% of people in Alberta and British Columbia and as low as 44% of people in Newfoundland and New Brunswick.

Beginning in the early 1990s, Canada witnessed a concerted effort to foster greater levels of civic participation among its citizens by leveraging the affordances of ICT. In 1992, the Victoria Free-Net and National Capital Free-Net were founded, modeled after the Cleveland Free-Net in the United States. These early Free-Nets were the predecessors of a number of community networking initiatives in Canada that aimed to engage citizens in the public sphere. In 1994, Telecommunities Canada was created as an umbrella and advocacy organization for community networks, and between 1995 and 1998, the number of Free-Nets in Canadian cities increased from approximately 24 to 60. However, much like the community networking movement in the United States, a number of these Free-Nets experienced financial difficulties that caused them to shift their focus or completely disband. Similarly, the Coalition for Public Information (CPI), a federally incorporated nonprofit organization, was founded in 1993 as a coalition of organizations, public interest groups, and individuals with a mandate to foster universal access to affordable, useable information and communications services and technology. After tremendous activity around ICT advocacy and policy between 1993 and 1999, CPI has since become dormant in its efforts to take action related to these issues.

To further promote ICT access and use throughout the country, the Canadian government has played a very active role through its "Connecting Canadians" initiative. These efforts involved a variety of government-sponsored programs and activities aimed at making Canada the most connected country in the world. The initiative was based on the following six pillars:

• Canada Online—This included the creation of up to 10,000 public Internet access sites in rural, remote, and urban communities. It also involved connecting Canada's public schools and libraries to the Internet and upgrading Canada's network infrastructure.
• Smart Communities—These initiatives were focused on promoting the use of ICT for community development such as better delivery of health care, improved education and training, and enhanced business and entrepreneurial opportunities.
• Canadian Content Online—A series of activities were organized to digitize and make widely available information and content related to Canadian people, culture, and history. This also included new application and software development to support information and content delivery.
• Electronic Commerce—The Canadian government worked with specific provinces, territories, businesses, and other community members to implement its Electronic Commerce Strategy, released in 1998, which identified strategies for promoting greater availability and use of electronic commerce as an integral component of Canada's economic landscape.
• Canadian Governments Online—A comprehensive effort to deliver government services electronically as a means to provide more efficient and effective access to information and better respond to the needs of Canadian citizens.
• Connecting Canada to the World—As an overarching endeavor, the Connecting Canadians initiative aimed to establish Canada as a recognized leader in the digital age, so as to bolster the country's attractiveness to foreign investors and global businesses and strengthen the overall economy.
D. China

In China, the phrase digital divide was first referenced in the Okinawa Charter on Global Information Society and refers to the differences and the polarization between rich and poor that arise from differing levels of information and communications technology among countries, areas, industries, enterprises, and groups in the process of global digitalization. According to the China Internet Network Information Center, by the end of 2000, telephone penetration in China had reached 398 million households (24.4%)—281 million wired and 117 million wireless—computer penetration had reached 30 million people (2.5%), and Internet penetration had reached 18 million people (1.5%), with noticeable differences between the more
affluent areas of eastern China, such as Beijing, Shanghai, and Guangdong, and the less developed provinces of central and western China. On September 25, 2000, the Chinese government took a first comprehensive step toward a telecommunications policy by enacting the Telecommunications Statutes. This was done to encourage increased investment in telecommunications infrastructure and promote greater competition in the telecommunications industry, toward lowering prices for these services. This included associated steps to break up the then-monopoly China Telecom into three separate companies for wired communications (China Telecom), wireless communications (China Mobile), and paging/satellite communications (transferred to Unicom). Furthermore, the Chinese Ministry of Information Industry (MII) began regulating access to the Internet while the Ministries of Public Security and State Security began monitoring its use. This included Internet Information Services Regulations that banned the dissemination of information that could potentially subvert the government or endanger national security. A number of initiatives were undertaken in China to close the digital divide. The Digital Alliance was formed as a result of the 2001 High Level Annual Conference of Digital Economy and Digital Ecology in Beijing. It included representatives from the telecommunications industry, the media industry, and academia, with the aim of eliminating the gap between developed countries and China, between the eastern and western regions of China, and among different industries and social and demographic groups. This was to be accomplished by providing technological resources, assistance, consultation, and application development for enterprises and community technology centers throughout China. A number of cities in China also implemented strategies to deliver information and content electronically. For example, Shanghai completed an information exchange network, an international trade electronic data interchange (EDI) network, a community service network, a social insurance network, and an electronic commerce network. Similarly, Beijing undertook a comprehensive online project that spanned the areas of e-commerce, e-government, social insurance and community service, and science, technology, and education. Finally, in 1999, the Chinese government released the "Framework of National E-Commerce Development" and the International E-Commerce Center of China under the Ministry of Foreign Trade announced the Western Information Service Project, both with the objective of promoting more widespread
deployment and use of electronic commerce throughout the region.
E. Europe

In Europe, efforts to close the digital divide have been very closely aligned with efforts to establish an "information society"—a new economy fueled by ICT—among the member countries of the European Union. Toward this end, the European Commission released the results of a multinational survey called "Measuring Information Society 2000," which examined ownership and use of digital technologies, interest and intent to purchase digital technologies, use of the Internet, and Internet connection speed, among other indicators, for each of the member countries of the European Union. According to the report, across the entire European Union, desktop computer ownership stood at 35%, with measurable differences between men (39%) and women (32%), people age 15 to 24 (46%) and 55 and up (16%), people at lower and upper levels of educational attainment (16% for people age 15 at the completion of formal schooling and 53% for people age 20 and up at the completion of formal schooling), as well as people in the lower and upper quartiles with respect to income (16 and 61%, respectively). Similarly, Internet access stood at 18%, with measurable differences between men (21%) and women (16%), people age 15 to 24 (23%) and 55 and up (8%), people at lower and upper levels of educational attainment (6% for people age 15 at the completion of formal schooling and 63% for people age 20 and up at the completion of formal schooling), as well as people in the lower and upper quartiles with respect to income (8 and 37%, respectively). Not surprisingly, the digital divide manifests itself to a greater or lesser extent among the various countries located in the region. For desktop computer ownership and Internet access, the countries with the highest levels of home penetration were the Netherlands (66 and 46%, respectively), Denmark (59 and 45%, respectively), Sweden (56 and 48%, respectively), Luxembourg (45 and 27%, respectively), and Finland (45 and 28%, respectively), whereas the countries with the lowest levels of penetration were Spain (34 and 10%, respectively), France (29 and 15%, respectively), Ireland (28 and 17%, respectively), Portugal (20 and 8%, respectively), and Greece (15 and 6%, respectively). European efforts to establish an information society began in 1993, when the Brussels European Council
convened a group of experts to draft the "Bangemann Report," which identified the initial steps toward its implementation. This was followed by the creation of the Information Society Project Office (ISPO, later renamed the Information Society Promotion Office) in 1994, an entity responsible for coordinating the activities of relevant public and private organizations. Subsequently, in 1995, the G7 Ministerial Conference on the Information Society was held in Brussels, where eight core principles were identified for fostering wider access and use of ICT throughout Europe. Another major milestone was achieved in 1997 at the Ministerial Conference on "Global Information Networks," where ministers of 29 European countries met to identify strategies for reducing barriers, increasing cooperation, and devising mutually supportive national and international agendas that would advance the development of telecommunications infrastructure and services. Finally, in December 1999, the eEurope Initiative was announced along with a comprehensive plan, "eEurope: An Information Society for All," which delineated specific steps toward the realization of an information society among the member countries of the European Union. The key objectives of eEurope were:

• Bringing every citizen, home, and school and every business and administration into the digital age and online
• Creating a digitally literate Europe, supported by an entrepreneurial culture ready to finance and develop new ideas
• Ensuring the whole process is socially inclusive, builds consumer trust, and strengthens social cohesion

This was to be accomplished by focusing on 10 priority areas: (1) European youth in the digital age, (2) cheaper Internet access, (3) accelerating e-commerce, (4) fast Internet for researchers and students, (5) smart cards for secure electronic access, (6) risk capital for high-tech small-to-medium sized enterprises (SMEs), (7) eParticipation for the disabled, (8) healthcare online, (9) intelligent transport, and (10) government online.
F. Latin America

According to Gartner Inc.'s Dataquest, access to telecommunications infrastructure such as telephone lines, and narrowband and broadband Internet connections in Latin America still lags behind that of
other developing areas. Their 2000 report "What Will It Take to Bridge the Digital Divide in Latin America?" found that while 80% of United States residents had a telephone connection, Latin America ranged from a high of 24.5% in Chile to a low of 7.9% in Peru. With respect to Internet access, while there were more than 6 million broadband connections in the United States, Latin America included only four countries with significant penetration: Brazil (53,000 connections), Argentina (38,000 connections), Chile (22,000 connections), and Mexico (20,000 connections). Similarly, according to "Nua Internet Surveys 2001," while the United States and Canada together accounted for 40% of the world's Internet users, Latin America represented only 4% of this total. However, despite these statistics it is also clear that Latin America represents a region that is adopting ICT at a fairly aggressive rate. For example, according to the Internet Software Consortium, Latin America experienced the largest increase in Internet hosts in 1999 (136%), compared to North America (74%), Asia (61%), Europe (30%), and Africa (18%). Similarly, Latin America is leading the way with respect to online content. For example, according to Funredes, in 2000, 10.5% of the world population were native English speakers, in addition to 6.3% who were native Spanish speakers and 3.2% who were native Portuguese speakers. Latin countries made the greatest strides toward increasing the number of Internet sites being offered in their native languages, with Portuguese- and Spanish-oriented sites leading the way with 162% and 92% increases between 1998 and 2000, versus a 20% decrease for English-oriented sites during the same period. In addition to establishing access and providing relevant content, there are two related issues that are central to closing the digital divide in Latin America. First is the need to reduce the relatively high rate of illiteracy throughout the region, which prevents entire segments of the population from even accessing information online. This is particularly problematic in countries such as Brazil, El Salvador, Guatemala, Honduras, Nicaragua, and Haiti. Despite recent gains in offering more content in the native languages of these countries, this issue is only exacerbated by the fact that English is still the dominant language in cyberspace. Second is the need to further promote competition in the ICT industry despite successful efforts to privatize nearly all of the major telecommunications firms in each country. According to CEPAL's Division of Production, Productivity and Management, "the primary focus of privatization policies in the
telecommunication sector of Latin America might—with the notable exception of Brazil—have not been an increase in competition, but rather to maximize foreign direct investment and establish access to the international financial markets (Argentina) or to defend an important national operator (Mexico)." According to the ITU, in the years following privatization, investment in fixed-line telephony actually fell in many countries, but the number of fixed telephone lines in Latin America and the Caribbean reached approximately 80 million by the end of 2000, up from 60 million in 1998. Similarly, with relatively greater levels of competition in the wireless industry, the trend is now in the direction of a more competitive and dynamic marketplace. For example, in 2000 Pyramid Research estimated the share of mobile Internet access at 0%, whereas other forms of access were estimated at 93.4% (dial-up subscription), 2.8% (dial-up free), 0.8% (cable modem), 0.2% (DSL), 0.9% (ISDN), and 1.9% (leased lines). For 2002, Pyramid Research projected mobile Internet access to increase to 14.7%, whereas the other forms of access were projected to change to 46.5% (dial-up subscription), 34.3% (dial-up free), 2.3% (cable modem), 2.6% (DSL), 1.4% (ISDN), and 1.3% (leased lines), respectively. In summary, CEPAL characterizes the Latin American telecommunications industry as having low competition in the fixed-line segment, medium competition in the mobile segment, and high competition in the Internet segment.
V. THE DIGITAL DIVIDE DEBATE

Two issues have emerged in the debate surrounding the digital divide since the phrase was first introduced in the mid-1990s. The first issue is related to the framing of the problem. While several studies have measured the digital divide in terms of access to ICT, a number of leading research and policy organizations have recommended that the digital divide should instead be measured against the outcomes that ICT can be used as a tool to address, such as improvements in the quality of life for community members. In 2001, the Morino Institute, a nonprofit organization that explores the opportunities and risks of the Internet and the New Economy to advance social change, released a report, "From Access to Outcomes: Raising the Aspirations for Technology Investments in Low-Income Communities." They wrote: "To date, most initiatives aimed at closing the digital divide have focused on providing low-income communities with greater access to computers, Internet connections, and other technologies. Yet technology
is not an end in itself. The real opportunity is to lift our sights beyond the goal of expanding access to technology and focus on applying technology to achieve the outcomes we seek—that is, tangible and meaningful improvements in the standards of living of families that are now struggling to rise from the bottom rungs of our economy."
The argument here is that access to technology alone, without appropriate content and support, as well as a vision of its transformative power, can lead not only to limited uses, but to shortsighted ones as well. This suggests that framing the digital divide in terms of access obscures and oversimplifies its root causes—social and economic inequities—whereas framing the digital divide in terms of outcomes shifts the focus to more appropriate indicators of social and economic equality. The second issue is related to the use of ICT to improve the lives of individuals, families, and communities. Many have argued that the digital divide is simply a modern-day reflection of historical social and economic divides that have plagued society for years. Over the past decade, the community technology movement has gathered momentum toward closing the gap with programs targeted at access, training, content, and more. However, over the past century there has been a parallel effort to revitalize distressed communities, often referred to as the community building movement—one that has wrestled with complementary issues in its efforts to alleviate poverty by instituting programs aimed at education, health care, employment, economic development, and the like. The community technology movement, primarily in the form of community technology centers (CTCs), and the community building movement, primarily in the form of community-based organizations (CBOs), have historically existed in separate, rather than holistic, spheres of practice. In 2001, PolicyLink, a national nonprofit research, communications, capacity building, and advocacy organization, released a report, "Bridging the Organizational Divide: Toward a Comprehensive Approach to the Digital Divide." In this report they termed this disconnect the "organizational divide" and wrote, "As we develop policies and programs to bridge the Digital Divide we must ensure that these are linked to broader strategies for social change in two ways. First, we must allow the wisdom and experience of existing community infrastructure to inform our work. Second, we must focus our efforts on emerging technologies as a tool to strengthen and support the community infrastructure." Demonstration projects such as the Camfield Estates-MIT Creating Community Connections Project in Roxbury, Massachusetts, were conducted around this same time to serve as models for
how community technology and community building could work in concert. The argument here is that leaders in both of these fields must devise strategies to connect these two movements toward unleashing their collective transformative power. This suggests that from a certain perspective the digital divide should actually be envisioned as a digital opportunity.
SEE ALSO THE FOLLOWING ARTICLES

Developing Nations • Digital Goods: An Economic Perspective • Economic Impacts of Information Technology • Electronic Commerce • Ethical Issues • Future of Information Systems • Global Information Systems • Globalization • Internet, Overview • National and Regional Economic Impacts of Silicon Valley • People, Information Systems on • Telecommuting
BIBLIOGRAPHY

Beamish, A. (1999). Approaches to community computing: Bringing technology to low-income groups, in High technology in low-income communities: Prospects for the positive use of information technology (D. Schön, B. Sanyal, and W. J. Mitchell, Eds.), 349–368. Cambridge, MA: MIT Press.
Bishop, A. P., Tidline, T. J., Shoemaker, S., and Salela, P. (1999). Public libraries and networked information services in low-income communities. Urbana-Champaign, IL: Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign.
Cohill, A. M., and Kavanaugh, A. L. (1997). Community networks: Lessons from Blacksburg, Virginia. Blacksburg, VA: Artech House Telecommunications Library.
Contractor, N., and Bishop, A. P. (1999). Reconfiguring community networks: The case of PrairieKNOW. Urbana-Champaign, IL: Department of Speech Communication, University of Illinois at Urbana-Champaign.
Hooper, P. (1998). They have their own thoughts: Children's learning of computational ideas from a cultural constructionist perspective, unpublished Ph.D. dissertation. Cambridge, MA: MIT Media Laboratory.
Morino, M. (1994). Assessment and evolution of community networking. Paper presented at Ties That Bind, Apple Computer, Cupertino, CA.
National Telecommunication and Information Administration. (1995). Falling through the net: A survey of the "have nots" in rural and urban America. Full Report, July. Available at http://www.ntia.doc.gov/ntiahome/digitaldivide/.
National Telecommunication and Information Administration. (1998). Falling through the net II: New data on the digital divide. Full Report, July. Available at http://www.ntia.doc.gov/ntiahome/digitaldivide/.
National Telecommunication and Information Administration. (1999). Falling through the net III: Defining the digital divide. Full Report, July. Available at http://www.ntia.doc.gov/ntiahome/digitaldivide.
National Telecommunication and Information Administration. (2000). Falling through the net IV: Toward digital inclusion. Full Report, October. Available at http://www.ntia.doc.gov/ntiahome/digitaldivide/.
O'Bryant, R. (2001). Establishing neighborhood technology centers in low-income communities: A crossroads for social science and computer information technology. In Townsend, A. (Ed.), Projections: The MIT Student Journal of Planning—Making places through information technology, 2(2), 112–127.
Pinkett, R. D. (2000). Bridging the digital divide: Sociocultural constructionism and an asset-based approach to community technology and community building. Paper presented at the 81st Annual Meeting of the American Educational Research Association (AERA), New Orleans, LA, April 24–28. Available at http://www.media.mit.edu/~rpinkett/papers/aera2000.pdf.
Resnick, M., Rusk, N., and Cooke, S. (1998). The computer clubhouse: Technological fluency in the inner city, in High
technology in low-income communities: Prospects for the positive use of information technology (D. Schön, B. Sanyal, and W. J. Mitchell, Eds.), 263–286. Cambridge, MA: MIT Press.
Schön, D. A., Sanyal, B., and Mitchell, W. J. (Eds.). (1999). High technology and low-income communities: Prospects for the positive use of advanced information technology. Cambridge, MA: MIT Press.
Shaw, A. C. (1995). Social constructionism and the inner city: Designing environments for social development and urban renewal, unpublished Ph.D. dissertation. Cambridge, MA: MIT Media Laboratory.
Turner, N. E., and Pinkett, R. D. (2000). An asset-based approach to community technology and community building, in Proceedings of Shaping the Network Society: The Future of the Public Sphere in Cyberspace, Directions and Implications of Advanced Computing Symposium 2000 (DIAC-2000), Seattle, WA, May 20–23. Available at http://www.media.mit.edu/~rpinkett/papers/diac2000.pdf.
Digital Goods: An Economic Perspective

Claudia Loebbecke
University of Cologne
I. INTRODUCTION: CONCISE SUBJECT DEFINITION
II. DIGITAL GOODS: CORE OF THE DIGITAL ECONOMY
III. DEFINITION, PROPERTIES, AND DIFFERENTIATION CRITERIA
IV. ISSUES OF LEGAL AND TECHNICAL PROTECTION
V. SELECTED PRICING ISSUES
VI. UNBUNDLING AND BUNDLING
VII. ON-LINE DELIVERED CONTENT
VIII. ECONOMICS OF DIGITAL CONTENT PROVISION ON THE WEB

GLOSSARY

copyright One of several legal constructs introduced to ensure that inventors of intellectual property receive compensation for the use of their creations. Copyrights include the right to make and distribute copies; copyright owners have the right to control public display or performance and to protect their work from alteration.
digital goods Goods that can be fully expressed in bits so that the complete commercial business cycle can be executed based on an electronic infrastructure such as the Internet.
on-line delivered content (ODC) Data, information, and knowledge tradable on the Internet or through other on-line means. Examples include digital online periodicals, magazines, music, education, searchable databases, advice, and expertise. ODC can be offered without a link to physical media. ODC explicitly excludes executable software.
watermarking Hiding of data within digital content. A technique for adding data that can be used to identify the owners of various rights, to record permissions granted, and to note which rights may be attached to a particular copy or transmission of a work.

I. INTRODUCTION: CONCISE SUBJECT DEFINITION

Digital goods are goods that can be fully expressed in bits so that the complete commercial business cycle can be executed based on an electronic infrastructure such as the Internet. This article first positions digital goods at the core of the digital economy. It points to the main economic characteristics of digital goods as well as to criteria for differentiating among different kinds of digital goods. In more detail, the article then covers five specific areas relevant to digital goods: (1) legal and technical protection, (2) pricing, (3) bundling and unbundling, (4) peculiarities of online delivered content, and (5) economics of digital content provision on the Web.
II. DIGITAL GOODS: CORE OF THE DIGITAL ECONOMY

A major characteristic of the digital economy is its shift to the intangible. Terms with similar connotations include the intangible economy, Internet economy, virtual economy, and information society. The creation and manipulation of dematerialized content has become a major source of economic value affecting many sectors and activities. It profoundly transforms economic relationships and interactions, the way firms and markets are organized, and how transactions are carried out. The digital economy is not limited to the Internet. Analog technologies such as radio and TV are also to be considered integral parts of the digital economy because these technologies are used to an increasing degree and further media integration is foreseeable in the near future.
To some extent, the digital economy runs squarely against the conventional logic of economics. Digital goods are not limited by physical constraints and are not limited to traditional economic characteristics, such as "durable," "lumpy," "unique," and "scarce." Instead, digital goods can simultaneously be durable and ephemeral, lumpy and infinitely divisible, unique and ubiquitous, scarce and abundant. The business of purely digital goods is different from conventional electronic business areas, which focus on trading or preparing to trade physical goods or hybrids between physical and digital goods. Trading of digital goods demands new business models and processes. Classical economic theory does not usually address the issue of digital goods as tradable goods. The value of digital goods, especially information, is traditionally seen as derived exclusively from reducing uncertainty. In the digital economy, however, digital goods—information/content—are simultaneously production assets and goods. This article focuses primarily on digital goods in their capacity as goods to be sold. From a supplier's perspective, the growing importance of digital goods as intangible assets and the resulting complexity can be seen in the differences between book value and stock market values. These differences can partly be explained by the crucial role attributed to brands, content, publishing rights, and intellectual capital, which may emerge via, be embedded in, or be stimulated by digital goods. Increasing discussions about information markets—mainly driven by information sciences—are targeting the peculiar case of markets for specialized digital goods such as knowledge or specialized information. Research on information brokers, structuring, retrieval, pricing of on-line databases, etc. represents an important point of reference. Despite all the work during the past 50 years, the problem of classifying, storing, and retrieving digital goods remains a
major problem regardless of media type. Multimedia searching by document content is a technology that has reached the initial demonstration phase, but is still in its infancy. There is, essentially, no mature method of storage for retrieval of text, images, and sounds (including speech and music) other than through the use of words, normally in the keyword format. The challenge is to obtain the essential digital content rapidly and attractively. The implied challenge is to modernize the style of presentation to make the key digital goods accessible with as little effort and time as possible. Major efforts are required to establish the right mix of media to convey a particular type of digital good.
III. DEFINITION, PROPERTIES, AND DIFFERENTIATION CRITERIA

Simply speaking, digital goods are goods that can be expressed in bits and bytes. Table I shows selected kinds of digital goods with some illustrations. Due to the variety of terms used, some comments are necessary. A commonly quoted analysis provided by Shapiro and Varian focuses on information, which the authors define as everything that is digitized, i.e., everything that can be represented as a sequence of bits. As examples of information goods they mention books, magazines, films, music, stock market prices, sporting events, and web sites. Hence, the term information as applied by Shapiro and Varian covers basically the same concepts as our term digital goods stated above. In addition to such a variety of information goods, we also include software and interactive services such as chat rooms under the term digital goods. In the following, we first take a broader perspective to help us understand and characterize the phenomenon of digital goods, including content and software.
Table I. Kinds of Digital Goods

Kinds of digital goods ➲ Illustrations
Searchable databases ➲ Restaurant guides, phone books
Dynamic information ➲ Financial quotes, news
On-line magazines and newspapers ➲ International, national, regional; general and special interest publications
Reports and documents ➲ Easy multiplication and indexing
Multimedia objects ➲ Music, video files, texts, and photos
Information services ➲ Offerings by travel agencies, ticket agencies, stock brokerages
Software ➲ Off-the-shelf products, customized products
Interactive services ➲ On-line forums, chat rooms, telephone calls, games
We offer a discussion of special properties of digital goods in this broad sense. After that we offer some criteria for further differentiation among the different kinds of digital goods. We then focus on the equally common, even if narrower, concepts of digital content in general and on-line delivered content more specifically. In the legal literature, the concept of digital assets is more common than that of digital goods. This concept tends to embrace everything that has value and is available in digital form, or could be valuable if it were converted into digital form. The term digital goods is often used interchangeably with the term intangible goods. However, in the literature the term intangibility refers to two rather different concepts. Levitt suggests that the terms goods and services be replaced by tangibles and intangibles and hence observes that intangible products are highly people-intensive in their production and delivery mode. This does not really match a more recent interpretation of intangibility, which is also suitable for digital goods and refers to immaterial goods (not services) that are often expressible in bits and bytes. Digital goods are affected by electronic business from their inception all the way to their use by an end user, facilitating extensive transformations and product innovations. They take advantage of the digitization of the market and of distribution mechanisms. Hence, they are particularly suited for electronic markets. The whole set of business processes can be handled digitally, thus minimizing not only transaction costs, but also order fulfillment cycle time. However, as Shapiro and Varian state, ". . . the so-called new economy is still subject to the old laws of economics."
A. Economic Properties of Digital Goods

Digital goods possess some basic properties that differentiate them from physical goods. Digital goods are indestructible or nonsubtractive, meaning that they are not subject to wearing out from usage, which can often occur in the case of physical products. They are easily transmutable; manipulation of digital products is easier than that of physical products. Last, digital goods are easily and cheaply reproducible. Digital goods are characterized by high fixed costs (first copy costs), dominated by sunk costs, and by low variable and marginal costs. This constellation typically leads to vast economies of scale. In practice, digital goods can be copied at almost no cost and can be transmitted with minimum delay to almost everywhere. This copying of digital products at almost no
marginal costs, the ease of transformation in the production process, and the interactivity of the products are transforming the economics of producing digital goods compared to the production economics of physical goods. Costs for content creation (programming) for high-end multimedia digital titles are often high. As a consequence, companies that create digital goods have an interest in reusing the same content as many times as possible and in as many media as possible without having to pay "first copy" costs again. We can speak of a systematic disconnection of production and usage, which naturally has an impact on distribution. It leads to the idea of windowing: Pay for content creation once, then reuse it for free. To secure profitability, the producer of digital goods must be able to recoup at least its costs at the first showing of the product. While a free exchange of information (digital goods) is a crucial prerequisite for innovation, the incentives for innovation and investments are diminished by the difficulties of claiming property rights for digital products. The digitization of content creates a considerable degree of freedom for the provision and the transformation of content. Consumers are partly involved in the production of information, i.e., the choice of content, mode of display, transformation, etc., and therefore evolve into prosumers. For some authors the role of the prosumer is restricted to the simultaneity of production and consumption and the nonstorability of services, extended by processes of simple self-service. The disconnection of production and usage leads to the so-called "value paradox": Only when products are well known and highly in demand are they attributed a high value and the possibility of generating revenues. That is why comparatively unknown providers of digital goods distribute their products to the widest possible public for free (e.g., artists or freeware coders). At the same time, customers are only willing to pay for "scarce" products. Different from physical products, scarcity in digital goods does not come naturally. Instead it has to be created by limited editions and individualization of the copies (e.g., through watermarking) or other restrictive measures. A frequently applied distinction of products is that between search goods and experience goods. This distinction is based on customers' ability to judge the value of a product. The quality of search goods can be determined without actually using them. With experience goods, knowledge about quality is learned from experiencing the product, i.e., from using the good. Search features of a product can be evaluated prior to its usage (e.g., price), but experience features can be evaluated only after usage (e.g., taste).
There is also the additional category of "credence goods," where, even after usage, consumers cannot judge the value properly because they lack some necessary skills (e.g., clinical diagnostics). These three terms offer a continuum of judgment, starting from search products—which can be assessed easily—to experience products and finally credence products. In many cases, digital goods tend to be experience goods or even credence goods. To overcome the implied difficulties for advertising and sales (why should one buy information that one has already experienced or tried?), many digital goods are sold based on strong brands or teasers. For instance, customers buy a newspaper on the strength of its brand without having experienced the individual articles. Further, teasers such as abstracts or chapters serve as triggers for book and magazine sales. Digital goods, especially information and content products, are often classified as public goods. Public goods share two main characteristics: nonrivalry and nonexclusiveness in usage. Nonrivalry is a product feature normally present in the case of digital goods due to the low costs and ease of reproduction. Nonexclusiveness is a feature of the legal system. In legal systems emphasizing private property, technical and legal means are in place to prevent unwanted joint usage. An automobile is protected by a lock (technical solution) or the threat of police punishment (legal solution) to prevent its use by unauthorized persons. In the digital world, copyrights grant creators of digital products certain rights, which—at least supposedly—can be enforced via technical or legal means. Therefore, digital products cannot be generally termed public goods, even when it is technically difficult to prevent unauthorized persons from using them. In addition, the basic laws and constitutions of most countries grant citizens access to "relevant news." Such news offered in digital form, for example on the Internet, could therefore be characterized as a public good. The specific article written about the news, however, could be copyright protected and hence not be a public good.
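To make the cost structure described at the start of this section concrete, here is a minimal numerical sketch in Python. The fixed (first-copy) and marginal costs are invented round figures chosen purely for illustration, not data from this article; the point is only that the average cost of a digital good collapses toward its near-zero marginal cost as the number of copies grows, while a physical good's average cost stays anchored at its per-unit cost.

```python
# Illustrative only: the fixed and marginal costs below are invented round numbers,
# not figures taken from the article.

def average_cost(fixed_cost: float, marginal_cost: float, copies: int) -> float:
    """Average cost per copy: the first-copy (sunk) cost spread over all copies,
    plus the constant marginal cost of each additional copy."""
    return fixed_cost / copies + marginal_cost

DIGITAL = dict(fixed_cost=1_000_000.0, marginal_cost=0.01)   # high first-copy cost, near-zero marginal cost
PHYSICAL = dict(fixed_cost=1_000_000.0, marginal_cost=8.00)  # same first-copy cost, substantial per-unit cost

for copies in (1_000, 100_000, 10_000_000):
    print(f"{copies:>10} copies: "
          f"digital {average_cost(copies=copies, **DIGITAL):10.2f}  "
          f"physical {average_cost(copies=copies, **PHYSICAL):10.2f}")
```

This is the arithmetic behind the vast economies of scale noted above, and it is also why cost-based pricing breaks down for digital goods, an issue taken up in Section V.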
B. Criteria Used to Differentiate among Digital Goods

Digital goods represent a variety of economic goods, which require different business processes and economic models. To distinguish within the group of digital goods, we use the following criteria: transfer mode, timeliness, usage frequency, usage mode, external effects, and customizability. With them we are partially
following the discussion offered by Choi, Stahl, and Whinston. Concerning the transfer mode, we distinguish between delivered and interactive goods. Delivered goods are transferred to the user as a whole or in pieces, i.e., by daily updates, etc. Interactive goods or services require a synchronous interaction with the user. Examples are remote diagnostics, videoconferences, and interactive computer games. Some careful observation is necessary: Many services on the Internet today are called "interactive," although in reality they are "supply on demand." For instance, when watching "interactive" television, the user merely downloads pieces over time. Neither is a search engine fully interactive, because searches are only orders for personalized delivery. Most digital goods are based on delivery as the transfer mode. Only a real-time application with the need for consecutive questions and answers implies interactivity. Interactive goods are by definition tailored to the specific user, making problems of resale and copyright irrelevant. The criterion of timeliness covers how the value of digital goods depends on, and changes over, time. Products like news, weather forecasts, or stock prices normally lose value as time goes by. The timeliness of any product correlates with the intended usage. For instance, when planning an excursion, weather data are only valuable ahead of time. On the other hand, for scientists studying the accuracy of weather forecasts, the data deliver value only after the predicted day. The third criterion is usage frequency. Some goods are intended for single use. They lose their customer value after or through use. For instance, a query on a search engine has no recurring value. Other products are designed for multiple uses; examples include software and games. The perceived total value of digital goods designed for multiple uses may well accumulate with the number of uses. One can observe different patterns of marginal utility functions over time. Computer games tend to become boring after a while, leading to negative marginal utility. Software applications, on the other hand, often generate learning effects, leading to increasing marginal utility. Regarding the usage mode, we can distinguish between fixed and executable goods. Fixed documents allow handling and manipulation in different ways and by different means than executable goods. With executable goods such as software, suppliers define the form in which the good can be used. Furthermore, the transformation of fixed documents into executable software increases the possibilities of control by the supplier. For example, suppliers could distinguish among read-only access, sort-and-print access, and a
deluxe package that allows the user to make changes to the data pool and to define any possible data queries. Thus, differentiated products to be sold at varying prices can be created out of a common data pool. Another differentiation criterion within digital goods is the external effects associated with products. Products with positive external effects raise the value for customers with increasing numbers of users. For instance, the more participants who agree on a common standard, the more potential partners for exchange exist. In the same way, multi-user dungeon computer games deliver more opportunities through a larger number of participants. But with restricted capacities, too many participants can cause traffic jams or obstructions, turning positive effects into negative ones. Negative external effects imply a higher value for users resulting from a lower or restricted number of other participants. This is especially applicable for exclusive information providing competitive advantages, such as internal corporate information used as the basis for speculation on the stock exchange. Customizability reflects the extent to which goods can be customized to specific customer needs. An electronic newspaper has a high degree of customizability in that an average customer is able to design a personal version through combinations of articles. But the articles themselves—being equal for all customers—show low customizability. Consequently, the level of analysis has to be specified (the entire, personalized newspaper or the standard article) in order to be able to judge the customizability of digital goods.
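The six criteria above amount to a simple classification scheme. The sketch below shows one way such a scheme might be encoded as a data structure; the enumeration values and the two sample profiles are illustrative assumptions, not classifications taken from this article.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical encodings of the differentiation criteria discussed above.
class TransferMode(Enum):
    DELIVERED = "delivered"
    INTERACTIVE = "interactive"

class UsageMode(Enum):
    FIXED = "fixed document"
    EXECUTABLE = "executable"

@dataclass
class DigitalGoodProfile:
    name: str
    transfer_mode: TransferMode
    time_sensitive: bool            # timeliness: does value decay quickly over time?
    multiple_use: bool              # usage frequency: designed for repeated use?
    usage_mode: UsageMode
    positive_network_effects: bool  # external effects: more users, more value?
    customizable: bool

# Two illustrative profiles (example values, not data from the article).
stock_quote_feed = DigitalGoodProfile(
    name="real-time stock quotes",
    transfer_mode=TransferMode.DELIVERED,
    time_sensitive=True,
    multiple_use=False,
    usage_mode=UsageMode.FIXED,
    positive_network_effects=False,
    customizable=True,
)

online_game = DigitalGoodProfile(
    name="multiplayer online game",
    transfer_mode=TransferMode.INTERACTIVE,
    time_sensitive=False,
    multiple_use=True,
    usage_mode=UsageMode.EXECUTABLE,
    positive_network_effects=True,
    customizable=False,
)

for good in (stock_quote_feed, online_game):
    print(good)
```

Profiling goods this way makes it easy to see, for example, that a real-time quote feed and a multiplayer game call for very different business processes even though both are purely digital.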
IV. ISSUES OF LEGAL AND TECHNICAL PROTECTION

Legal questions, usually interpreted in the sense of legal protection of the value of digital products, are of high interest. The development and the application of legal rules have to take into account the properties of digital products and the corresponding technical possibilities and constraints. Content creators and owners need to protect their property. Traditional content media (such as paper documents, analog recordings, celluloid film, canvas paintings, and marble sculpture) yield degraded content when copied or require expensive and specialized equipment to produce high-quality copies. The technical burden on traditional content creators (such as book authors) for protecting their material has been small. For digital goods, there is no obvious limit to the value that can be added by creating and providing access
to digital content. High-quality copies (in fact, identical copies) of digital content are easy to produce. In this context, legal issues are partially dealt with by application of already existing legal institutions (civil law, criminal law, and international law) and partially covered by rather new legal constructs (concerning contracts or media).
A. Copyrights

Copyrights are one of several legal constructs relevant to any business producing digital goods. However, they are not new in the world of digital goods. They have been introduced to ensure that inventors of intellectual property receive compensation for the use of their creations. In the international context, copyrights are granted on the basis of the Treaty of Bern, the Treaty of Rome, and the Trade-Related Aspects of Intellectual Property Rights (TRIPS) Agreement. Copyrights have different components. The most notable component is the right to make and distribute copies. In addition, copyright owners have the right to control public display or performance and to protect their work from alteration. Further, content owners hold the rights over derivative works, that is, the creation of modified versions of the original. As a means to protect intellectual capital, copyrights have gained special importance in the context of digital goods. Because digital goods are easy and cheap to duplicate, copyright protection is essential for ensuring the above-mentioned compensation for product inventors and creators. If creators cannot get paid, what would ensure the continued creation of digital goods? Therefore not only the creators but also digital intermediaries and distributors have high economic incentives to see to it that copyrights are respected and remunerated. However, traditional copyright laws have not been designed for handling digitized goods. Nationally and internationally updated rule sets are under development.
B. Watermarking

Watermarking represents a technical solution fostering the implementation of the above-mentioned copyrights. Digital watermarks are designed to add value to legitimate users of the protected content and to prevent piracy. In addition, digital watermarks can be utilized for market research. Following Acken, digital watermarking can be defined as the hiding of data within the digital content.
It provides a technological way to add data that can be used to identify the owners of various rights, to record permissions granted, and to note which rights may be attached to a particular copy or transmission of a work. Digital watermarks are invisible when the content is viewed. They indicate an original, but do not control somebody's access to it. This means that—similar to an original signature—watermarks do not prevent photocopying, which might be needed for fair use. Digital watermarks can add value for different legitimate uses while increasing barriers to pirates. The benefits depend on the particular digital good and the associated and differentiated needs, burdens, and benefits for their creators, distributors, and recipients. For many business applications, there is great value in being able to reconstruct relevant events. In accounting, the resulting timeline is called an audit trail. For digital content, digital watermarks can be used to indicate recipients or modifiers without the administrative burden of keeping the associated information and links separate from the digital content itself. Digital watermarking needs to support scalability to be able to match the different value requirements of digital goods. Some digital goods, for example, a film classic like High Noon, carry high, long-term value. Other digital goods, such as yesterday's stock quotes, have only limited value. The longer the value of a digital good lasts, the more time pirates have to break the protection methods. Therefore, a scalable system is required that renews itself over time.
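As a concrete, if deliberately simplified, illustration of hiding data within digital content, the sketch below embeds a short rights-holder identifier into the least significant bits of an array of 8-bit samples standing in for image pixels. The payload string, the sample values, and the plain least-significant-bit scheme are all assumptions made for the example; commercial watermarking systems add keying, redundancy, and robustness against compression and tampering that this sketch omits.

```python
# Minimal least-significant-bit (LSB) watermarking sketch.
# Assumptions: 8-bit samples in a flat list, the payload fits into the cover,
# and no attacker tries to remove the mark; real schemes are far more robust.

def embed(cover: list[int], payload: bytes) -> list[int]:
    """Hide the payload bits in the least significant bit of each sample."""
    bits = [(byte >> i) & 1 for byte in payload for i in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("payload too large for cover")
    marked = cover.copy()
    for pos, bit in enumerate(bits):
        marked[pos] = (marked[pos] & ~1) | bit  # overwrite only the LSB
    return marked

def extract(marked: list[int], n_bytes: int) -> bytes:
    """Recover n_bytes of hidden data from the samples' least significant bits."""
    out = bytearray()
    for i in range(n_bytes):
        byte = 0
        for bit_pos in range(8):
            byte = (byte << 1) | (marked[i * 8 + bit_pos] & 1)
        out.append(byte)
    return bytes(out)

cover_samples = list(range(64, 192))   # a stand-in for image pixel values
owner_id = b"RIGHTS:ACME-2003"         # hypothetical rights-holder identifier
marked_samples = embed(cover_samples, owner_id)
assert extract(marked_samples, len(owner_id)) == owner_id
print("watermark recovered:", extract(marked_samples, len(owner_id)))
```

Because each sample changes by at most one intensity level, the mark stays invisible when the content is viewed yet can later identify the rights holder or help reconstruct an audit trail, as described above.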
V. SELECTED PRICING ISSUES

Production costs cannot be used as a guideline for pricing because there is no link between input and output. Mass consumption does not require mass production. Economies of scale are determined by consumption, not by production. Economies of scale in digital goods production are limited; economies of scale in digital goods distribution can be significant due to a combination of the high fixed costs of creating the necessary infrastructure and the low variable costs of using it. Economies of scale in distribution are accentuated by consumption characteristics: Consumers tend to use the supplier with the largest variety although they take advantage of less than 5% of the choices available. Due to the issues that derive from the above-mentioned characteristics of digital goods, neither cost-based pricing nor competition-based pricing are
a reasonable pricing strategy for digital goods. Marginal costs are zero or near zero, so applying cost-based or competition-based pricing mechanisms would drive sales prices toward zero as well. But prices near zero make it impossible for producers to recover their high fixed costs. The only reasonable strategy for pricing information goods is therefore to set the price according to the value the customer places on them. Because consumer valuations differ, it is also important to differentiate prices. Different approaches can be used as the basis for price differentiation; the two most popular are probably grouping and versioning. Grouping refers to charging different customer groups different prices for the same product. Typical examples from the nondigital world are reduced prices for students or elderly people. The problem with grouping when selling digital goods over the Internet lies in the difficulty of proving people's identity and "characteristics." How do we check, for example, whether a student number from an unknown university is valid? How do we find out where the potential customer is actually located? Technical verification procedures are on the market, but they are rarely applicable at reasonable effort and cost. Versioning refers to price differentiation based on slightly different product characteristics: different product versions are sold at different prices. Versioning is already familiar from nondigital information goods; consider the pricing of hardcover versus paperback books. For digital goods, Shapiro and Varian suggest numerous ways to create different versions (see Table II). A consumer's willingness to pay is often influenced by the consumption or nonconsumption of others. Accordingly, individual willingness to pay alone is not an adequate basis for assessing the value of digital goods, given the ease of replication and sharing and the associated externalities. Furthermore, the pricing of digital goods raises the fundamental issue of the inherent volatility of valuation when the value of a digital good is highly time sensitive. For instance, stock market information may be worth millions in the morning and have little value in the afternoon. Finally, offering digital goods over an extended period of time may lead to the establishment of electronic communities. Electronic communities are likely to create value in five different ways: usage fees, content fees, transactions (commissions), advertising, and synergies with other parts of the business. Translating these income opportunities to the more narrowly defined area of digital goods, usage fees could be in the form of fixed subscriptions, paying per page, or paying per time period independent of the quality of the
content. Content fees would most likely be based on fixed amounts per page, but should tackle the issue of valuing the content (quality/relevance). Commissions and advertising income are triggered by attractive digital goods on display. Strictly speaking, however, the subsequent income would not stem from the digital goods themselves, but from attracting customers to a page regardless of its content or from offering space for third-party advertising in addition to the actual digital goods offered. The range of pricing schemes for digital goods is becoming broader and more sophisticated. Pricing models may involve giving the actual goods away for free and then charging for complementary services, updates, and the like. They are developed for bundles of digital products as well as for single units. Economists are developing theoretical solutions to these problem areas. However, some of the mechanisms developed demand an enormous amount of data, raising questions about the trade-off between allocation efficiency and operational cost-effectiveness.

Table II  Approaches to Versioning of Digital Goods (basis for versions, with illustrations)

Delay: Books, FedEx
User interface: Search capability
Convenience: More or less restricted time or place of service availability
Image resolution: Higher resolution depending on storage format, etc.
Features and functions: Quicken vs. Quicken Deluxe, which includes a mortgage calculator
Flexibility of use: Allowing users to store, duplicate, or print information
Speed of operation: Time to download or to execute programs
Capability: Number of words for dictionary/voice recognition
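As a rough, hypothetical illustration of the value-based pricing and versioning logic described above, the following sketch compares the revenue from a single price with the revenue from two versions sold at different prices. The consumer valuations and prices are invented for illustration and are not drawn from the article.

```python
# Hypothetical consumer valuations for a "deluxe" and a degraded "light"
# version of a digital good. Marginal cost is near zero, so revenue is a
# reasonable proxy for profit in this toy example.

valuations = [
    # (value of deluxe version, value of light version) per consumer
    (100, 30), (90, 28), (30, 25), (25, 22), (20, 18),
]

def revenue_single_price(price):
    # Everyone whose deluxe valuation meets the price buys the full product.
    return sum(price for deluxe, _ in valuations if deluxe >= price)

def revenue_two_versions(p_deluxe, p_light):
    total = 0
    for deluxe, light in valuations:
        # Each consumer picks the version giving the larger non-negative surplus.
        surplus_d, surplus_l = deluxe - p_deluxe, light - p_light
        if max(surplus_d, surplus_l) < 0:
            continue
        total += p_deluxe if surplus_d >= surplus_l else p_light
    return total

print(revenue_single_price(90))       # 2 buyers x 90 = 180 (best single price here)
print(revenue_single_price(20))       # 5 buyers x 20 = 100
print(revenue_two_versions(70, 18))   # 2 x 70 + 3 x 18 = 194
```

With near-zero marginal cost, the degraded "light" version costs essentially nothing to produce, yet it lets the seller serve low-valuation consumers without giving up the higher price paid by high-valuation consumers.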
VI. UNBUNDLING AND BUNDLING

We see at least two different trends in the digital age: (1) a trend toward unbundling and disintermediation, because the former economies of scale in printing and distributing content no longer apply, and (2) a trend toward bundling as a tool to shift consumer surplus to producers. Traditionally, many digital goods have been bundled solely to save on the following costs:
• Transaction and distribution costs: the cost of distributing a bundle of goods and administering the related transactions, such as arranging for payment
• Binding costs: the cost of binding the component goods together for distribution as a bundle, such as formatting changes necessary to include news stories from wire services in a newspaper bundle
• Menu costs: the cost of administering multiple prices. If a mixed bundling strategy is pursued, where the available components are offered in different combinations, then a set of n goods may require as many as 2^n − 1 prices (one for each subset of one or more goods).
Yet these costs are much lower on the Internet than they used to be for physical goods. Thus software and other types of content may be increasingly disaggregated and metered, as on-demand software applets or as individual news stories and stock quotes. Such a phenomenon is described as unbundling. Unbundling also goes along with the separation of digital goods from the delivery medium. Traditionally the pricing of content has been based on the delivery medium—mostly measured in convenience—rather than on actual quality. For instance, the price of a book depends heavily on its printing quality and the number of pages, while the price of an excellent book is almost the same as that of a poor one. Electronic trading in digital goods technically allows unbundling. The Internet is precipitating a dramatic reduction in the marginal costs of production and distribution for digital goods, while micropayment technologies are reducing the transaction costs of their commercial exchange. Content can be priced separately from the medium, allowing for price differentiation based on the estimated value of the content. Unbundling, however, also raises problems as administration becomes more complex. On the other hand, the low marginal costs as well as the low transaction costs of digital goods also lead to other ways of packaging digital goods
through strategies such as site licensing, subscriptions, and rentals. These aggregation schemes can be thought of as bundling of digital goods along some dimension. For instance, aggregation can take place across products, as when software programs are bundled for sale in a software suite or when access to the various content of an on-line service is provided for a fixed fee. Aggregation can also take place across consumers, as with the provision of a site license to multiple users for a fixed fee, or over time, as with subscriptions. Following Bakos and Brynjolfsson, aggregation or bundling is a powerful strategy for improving profits when marginal production costs are low and consumers are relatively homogeneous, because bundling changes the shape of the demand curve. The economic logic of bundling is based on differences in consumers' valuations for bundled and unbundled goods. The larger the number of goods bundled, the greater the typical reduction in the variance of consumers' average valuation per good. Because uncertainty about consumer valuations hinders effective pricing and efficient transactions, this predictive effect of bundling can be valuable. For example, consumer valuations for an on-line sports scoreboard, a news service, or a daily horoscope will vary. A monopolist selling these goods separately will typically maximize profits by charging a price for each good that excludes some consumers with low valuations for that good and forgoes significant revenues from some consumers with high valuations. Alternatively, the seller could offer all the information goods as a bundle. Under a very general set of conditions, the law of large numbers guarantees that the distribution of valuations for the bundle has proportionately fewer extreme values. Such a reduction in buyer diversity typically helps sellers extract higher profits from consumers as a whole. The law of large numbers makes it much easier to predict consumers' valuations for a bundle of goods than their valuations for the individual goods sold separately. Thus, the bundling strategy takes advantage of the law of large numbers to average out unusually high and low valuations, and can therefore result in a demand curve that is more elastic near the mean valuation of the population and more inelastic away from the mean. When different market segments of consumers differ systematically in their valuations for goods, simple bundling will no longer be optimal. However, by offering a menu of different bundles aimed at each market segment, bundling makes traditional price discrimination strategies more powerful by reducing the role of unpredictable idiosyncratic components of valuations.
In summary, bundled goods typically have a valuation distribution with a lower variance per good than the separate goods. Hence, bundling can help to improve a seller's profits; it can be shown to do so, for example, when consumer preferences are negatively correlated.
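The statistical effect behind this argument can be made concrete with a small simulation. The sketch below is not taken from Bakos and Brynjolfsson's analysis; it simply assumes independent, uniformly distributed valuations and shows that, as more goods are bundled, the per-good valuation of the bundle concentrates around its mean, so a single bundle price extracts more revenue per good.

```python
import random
import statistics

random.seed(1)
CONSUMERS = 2_000

def best_single_price_revenue(values):
    # A zero-marginal-cost monopolist picks the revenue-maximizing single price,
    # searching over the observed valuations as candidate prices.
    return max(p * sum(1 for v in values if v >= p) for p in set(values))

for n_goods in (1, 10, 100):
    # Each consumer's valuation for each good is i.i.d. uniform on [0, 1];
    # the per-good valuation of the bundle is the consumer's average valuation.
    per_good_value = [
        sum(random.random() for _ in range(n_goods)) / n_goods
        for _ in range(CONSUMERS)
    ]
    spread = statistics.pstdev(per_good_value)
    bundle_rev = best_single_price_revenue(
        [round(v, 3) for v in per_good_value]) / CONSUMERS
    print(f"{n_goods:>3} goods: std of per-good valuation {spread:.3f}, "
          f"revenue per good per consumer {bundle_rev:.3f}")
```

The shrinking standard deviation is the law-of-large-numbers effect described above; the rising revenue per good reflects the seller's improved ability to price near the mean valuation.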
VII. ON-LINE DELIVERED CONTENT (ODC)

Loebbecke introduces the concept of on-line delivered content as a special kind of digital good. ODC deserves further attention because the concept covers precisely those forms of digital goods that have gained prominence in the Internet age.
A. Concept, Examples, and Characteristics

On-line delivered content is data, information, and knowledge that can be traded on the Internet or through other on-line means. Examples include digital on-line periodicals, magazines, music, education, searchable databases, advice, and expertise. The decisive characteristic of ODC is that it can be offered independently of any physical medium by selling it through a communication network. Whether the ODC is subsequently transferred to a physical medium (e.g., printed out or burned onto a CD) is irrelevant to its classification as ODC. Streaming content, such as a digital video transfer, and the transfer of data that can be viewed later off-line are equally valid forms of ODC. ODC focuses on the content of digital products. For that reason, software products, including computer games, are not covered by the ODC concept. Unlike broader concepts of digital goods, the term ODC, as defined and applied here, is limited to stand-alone products consisting solely of content/information. Hence, the term ODC implies that only the content is the object of a transaction; no physical product is shifted among suppliers, customers, or other players. When trading ODC, the complete commercial cycle—offer, negotiation, order, delivery, payment—is conducted via a network such as the Internet. Figure 1 illustrates this definition of ODC.

Figure 1 Conceptualization of ODC. [Adapted from Choi, S., Stahl, D., and Whinston, A. (1997). The economics of electronic commerce. Indianapolis, IN: Macmillan Technical Publishing.]

The ODC concept can be illustrated by three examples:
1. Music. ODC refers to music that can be downloaded from the Web. Afterwards, if desired, it can be stored on a CD-ROM. ODC does not include the ordering of a CD-ROM to be delivered to one's home, since ODC, by definition, refers only to the content and excludes the need for any physical medium.
2. Databases. Databases are offered by on-line bookstores, and various kinds of content are offered on web pages maintained by TV stations. The information/content contained in those web sites is a form of ODC, even if it is usually not traded separately. Possibilities for commercializing such content include pay per view, pay per page, or pay per time concepts. By trying to sell such content (instead of offering it for free and counting on a positive impact on other product lines such as books or TV programs), suppliers would rely on the actual value that potential customers associate with it.
3. Tickets. Tickets for planes, trains, or concerts actually represent a counterexample. Certainly, all paper-based products, like posters, calendars, and all sorts of tickets, could be converted into or replaced by digital counterparts. Further, one can imagine ordering and receiving tickets for trains, planes, or concerts on-line. In the near future, technology will allow individuals to print tickets (administered wherever) just as travel agencies or event agencies do today. However, for consumers this is not the full delivery cycle. They do not pay for the piece of paper called a ticket; they pay for being moved from point A to point B or for attending a concert or stage performance. Those services of "being moved" or "concert performance" are the actual values bought, and they will never be delivered via any technical infrastructure (at least not within the limits of current imagination). Therefore, a ticket, even if bought and, with regard to the piece of paper, delivered over the Web, does not represent unbundled, stand-alone value of content. It does not belong to ODC as understood in this article. (For simplicity, this illustration leaves out the possibility of reselling a ticket and thus giving it a monetary function.)
In addition to the issues inherent in trading physical goods on the Web, trading ODC on the Internet raises concerns such as version control, authentication of the product, control over intellectual property rights (IPRs), and the development of profitable intra- and interorganizational business models. Most forms of ODC belong to the group of experience goods (see above), for which the quality of the content is learned only by using or consuming the good. However, treating ODC as an experience good, i.e., letting potential clients "experience" the ODC, implies giving the actual content away for free (i.e., not trading it) and, in all likelihood, counting on receiving revenue via some synergy mechanism. Once potential customers have experienced ODC, they have no more reason to buy it. ODC suppliers will try to resolve this dilemma by shifting ODC as much as possible into the category of search goods. Possible steps include establishing a strong brand reputation for web sites or publishers and offering abstracts, sample chapters, or reviews as triggers to buy the whole product. As a consequence of the characteristics of digital goods such as indestructibility, transmutability, and reproducibility, the exclusivity of ODC may be difficult to maintain durably. Sharing may be simultaneous or sequential; in either case it affects the allocation of property rights. While the seller of a physical good loses his or her property right, a seller of ODC may continue to hold it. Even illegal sharing of ODC often creates positive network externalities, which may even exceed the expected cost of being caught sharing. Where ODC has positive network externalities, control over reproduction and sharing is the primary objective of copyright protection. Related to the issue of externalities is the issue of value generation. Often there is no direct link between a transaction and the generation of value. Furthermore, ODC value can hardly be measured in monetary terms alone. For instance, the appreciation of free TV could be measured in the time budgets allocated to it, and the appreciation of academic papers (increasingly
often provided as ODC) may be measured in the number of citations. Indirect value creation and the related problem of measuring ODC value lead to the problem of adequately pricing ODC, as discussed elsewhere in this article. While the conventional logic of economics is concerned with scarcity, the dematerialization logic inherent in ODC is concerned with abundance. Abundance and the resulting ODC overload (the huge variety of ODC available to almost everybody) confront consumers with a dilemma. They want to take advantage of the increased choice of ODC and, at the same time, they seek to minimize the costs of searching. In response to the first objective, new modes of consumption have emerged: zapping, browsing, and surfing. These are characterized by short attention spans, latency, high frequency of switching, and capriciousness. The distinction between consumption and nonconsumption becomes difficult, rendering pricing problems even more intractable. The expanded choice of content makes consumer choice more difficult, thus continuously raising the cost of acquiring information about the content. To minimize these costs, choices are increasingly determined by criteria other than product characteristics, e.g., brand familiarity or fashion. Low transaction costs lead to an excessive volume of transactions that generate noise rather than useful content. The abundance of products and services stimulates the development of activities whose purpose is to monitor, evaluate, and explain their characteristics and performance.
B. Trading in ODC

While the offering of free ODC has become extremely popular in the Internet arena, only a few companies have actually started trading it. To trade in ODC, several roles have to be fulfilled. The value chain depicted in Fig. 2 was outlined by the European Commission for the electronic publishing business. It differentiates between two layers. The content-related layer addresses content creation, content packaging, and market making. The infrastructure-related layer comprises transportation, delivery support, and end-user interfaces. The framework suggests the following strategic roles (Fig. 3): online networks manage a full electronic marketplace, community organizers focus on interest-centered target groups, interactive studios create content with new levels of functionality, content rights agencies manage rights and match content to market needs, and, finally, platform providers create end-to-end, easy-to-use technical platforms for authors, publishers, and end users.
Figure 2 Electronic publishing value chain. [From European Commission (1996). Electronic publishing—Strategic developments for the European publishing industry towards the year 2000. Brussels.]
More recent concepts suggest that such activities be organized as value networks instead of value chains. The strategic roles to be fulfilled do not change significantly, regardless of whether they are conceptualized as a chain or as a web. Syndication is also of particular interest as a business model in the context of ODC trading. Syndication involves the sale of the same good to many intermediaries, who then integrate the good with others and redistribute the whole. First, syndication can only work with information goods, since they can be duplicated and consumed by an unlimited number of people without becoming exhausted. Second, syndication requires stand-alone, modular products that also function well as part of a larger whole. Third, syndication requires multiple points of distribution; the millions of existing web sites theoretically offer many such points. In such an environment, trading in ODC can be used to supply innovative content, especially differently packaged, more targeted information. It combines communication with content, leading to higher quality and thus added value for customers. Furthermore, ODC customers are more in control of how much and what kind of content they want to obtain. When substituting ODC for print products, customers will request additional value such as availability (newest information, access to data from any location), format (multimedia such as video clips and sound), transparency and interactivity (user-friendly downloading, search functions, etc.), and innovative content (Fig. 4).

Figure 3 Strategic roles in electronic publishing: online networks, interactive studios, community organizers, content rights agencies, and platform providers. [From European Commission (1996). Electronic publishing—Strategic developments for the European publishing industry towards the year 2000. Brussels.]

Figure 4 Dimensions of ODC added value.

In summary, ODC refers to digital goods that are manufactured, delivered, supported, and consumed via the Internet or similar networks. Typical examples of ODC are music, information, and expert knowledge. For these types of goods, as for almost all kinds of digital goods, traditional economic models based on scarcity and uniqueness, leading to a market governed by demand and supply, do not apply. Once created, ODC is extremely easy and cheap to replicate, distribution costs are almost zero, and most other transaction costs, except perhaps those for marketing and sales, barely exist.
VIII. ECONOMICS OF DIGITAL CONTENT PROVISION ON THE WEB

We distinguish four possibilities for profiting from providing digital content on the Web: (1) increasing the number of units sold, (2) increasing the margin per unit sold, (3) selling digital content as a stand-alone product, and (4) generating advertising income from web pages. In the first two cases, the digital good is a free enhancement of the main, nondigitizable product offered (cars, coffee, computers), which cannot be delivered via the Internet. In the third setting, the product offered consists of information and thus can be transmitted digitally via the Internet (magazines, music, etc.); for such a good, the term on-line delivered content was introduced in the previous section. In the fourth setting, the focus is not on the actual goods, but on the space for sale around the content on the Web.
1. Increased number of units sold. Internet-based marketing and public relations aim at increasing awareness of a company and its product and service range. As with traditional marketing, this is costless for consumers; profit is made when the marketing costs are compensated by additional sales. Currently the largest potential in Internet-based marketing is seen in attracting new customers worldwide and in establishing distant, long-term customer relationships. In most instances it is difficult to discover how many additional units are sold because of a web presence. Further, some of these may be substitutes for traditional sales (internal channel cannibalization). As long as overall worldwide or regional sales do not increase, but almost every bookstore, computer dealer, etc. is present on the Web (with rather different offers), it is not obvious how they all could increase their total turnover. It seems to be more like a football league: every team gathers strength during the summer, but by the end of the following season there are few winners, and there will always be some losers. There is no doubt, however, that Internet-based turnover is predicted to grow during the next few years. But with more efficient business processes and price transparency leading to decreasing margins, there is not much reason to foresee an increase in total (traditional and Internet-based) turnover, and especially profits.
2. Increased margin per unit sold. Larger margins per unit could theoretically be achieved by lowering costs (efficiency) or by charging higher prices per unit. Lower costs may be achieved by using the Web for various processes such as internal communications, receiving orders and payments, or providing customer service (process/business reengineering). Customers could, for instance, download information from the company's web site, and special requests could be answered via (automatic) e-mail. From a more in-depth perspective, most efficiency gains will result from the decreased working capital achieved by introducing electronic commerce, e.g., Internet-based activities. Higher prices charged per unit need to be based on value added for customers. This means that a particular book, computer, or type of coffee that is advertised and sold via the Internet is more expensive than if it were sold via
traditional marketing media and sales channels. This notion is the reverse of the more popular idea of selling more cheaply via the Internet due to economies of scale, improved transparency, and fewer players in the value chain. If, however, the Internet sale of a digital good provides no added value, then competition may well squeeze prices down to the level of the marginal cost of the goods.
3. Digital content sold as a stand-alone product. This is the ODC situation, which was discussed at length in the previous section.
4. Advertising income generated from web pages. The market for advertising space on the Web is booming. Only those companies whose content attracts a certain number of site visitors can sell additional space to others who then place their ads. While this opportunity for profit is gaining importance, it is mainly suitable for large companies whose sites are well known and frequently visited, e.g., TV stations, newspapers, and magazines. It does not appear to be a feasible source of income for the millions of small and medium-sized enterprises (SMEs) that also offer content on the Web.

Large company infrastructures to market specific products are no longer required either for content provision on the Web or for the actual sale of digital goods. This causes an enormous growth of digital product and service offerings. However, small content providers still mainly count on positive, but indirect, contributions of their Internet activities to their overall cost–benefit structure. For SMEs to continuously provide digital content on the Web, shifts in financial flows along intercorporate value chains are required. Table III outlines two scenarios regarding potential sources of income for digital content providers and the related shifts in intercorporate value chains. To clarify the terminology of Table III, Internet providers "transport" content from content providers to customers. They are comparable to common carriers expecting payment for this intermediary service. If they manage to enhance their service line beyond transmission, e.g., with value-added services, this should allow them to charge consumers for more than just the transmission fee.

Table III  Shifts in Financial Flows along Intercorporate Value Chains

Currently. Content provider: receives no payment for content provided. Internet provider: receives payment on a time/volume basis. Consumer: content mostly free; pays for time and volume.
Scenario 1. Content provider: receives payment based on content directly from the consumer. Internet provider: receives payment on a time/volume basis. Consumer: pays for content, time, and volume.
Scenario 2. Content provider: receives a predefined share from the Internet provider. Internet provider: receives payment on a time/volume/content basis and shares it with the content provider. Consumer: pays for time/volume.

Scenario 1: Digital content providers receive payment for their content directly from the consumers, who not only have to pay the Internet providers but also the content providers for the information they access. Competition for customers among content providers would begin to develop; hence, the quality of information is likely to improve. The situation for Internet providers would mostly stay the same, unless, due to the higher overall Internet consumption price for users, total Internet traffic were to decrease drastically.
Scenario 2: Digital content providers receive payment from Internet providers, who forward part of their income to the content providers. Internet providers can only win in this scenario if the lower price of content and service in comparison to the previous scenario leads to a drastic increase in overall Internet traffic. The situation for consumers would remain largely the same.
In summary, electronic media enable organizations to deliver products and services more cost-effectively and efficiently. In cases where the Internet is supposed to support the traditional business (e.g., book sales), the increasingly sophisticated services offered go beyond pure marketing efforts. They provide additional value to customers. While these services constitute extra costs, they barely generate additional profits. Potential clients take advantage of these services (e.g., searching the bookstore database) without necessarily becoming customers. Involvement in Web-based activities, and increasingly also content provision on the Web, seems to have become compulsory in many industry sectors. If eventually all companies achieve significantly lower costs for customized product and service delivery, the result cannot be a competitive advantage, but rather lower margins for the average player in the sector. Offering content on the Web has to be attractive for providers in one of two ways: (1) strengthening a company's competitive position with respect to its traditional products (e.g., higher turnover as a consequence of Web activities), or (2) expanding toward additional, profitable product lines (e.g., selling information/content-based products and services).
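To make the two financing scenarios more concrete, the following sketch models the payment flows of Table III for a single, hypothetical content provider. All traffic volumes, fees, and revenue shares are invented for illustration.

```python
# Hypothetical monthly figures for one content provider (all numbers assumed).
PAGE_VIEWS = 500_000          # accesses to the provider's content
TRANSPORT_FEE = 0.010         # per view, paid by consumers to the Internet provider

def scenario_1(content_fee=0.02):
    """Consumers pay the content provider directly, on top of transport."""
    provider_income = PAGE_VIEWS * content_fee
    consumer_cost = PAGE_VIEWS * (content_fee + TRANSPORT_FEE)
    return provider_income, consumer_cost

def scenario_2(bundled_fee=0.012, provider_share=0.25):
    """Consumers pay only the Internet provider, which forwards a share."""
    provider_income = PAGE_VIEWS * bundled_fee * provider_share
    consumer_cost = PAGE_VIEWS * bundled_fee
    return provider_income, consumer_cost

for name, (income, cost) in [("Scenario 1", scenario_1()),
                             ("Scenario 2", scenario_2())]:
    print(f"{name}: content provider earns {income:,.0f}, consumers pay {cost:,.0f}")
```

Whether either scenario is sustainable depends, as noted above, on how strongly the higher or lower consumer price affects overall traffic.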
SEE ALSO THE FOLLOWING ARTICLES Advertising and Marketing in Electronic Commerce • Business-to-Business Electronic Commerce • Copyright Laws • Desktop Publishing • Digital Divide, The • Economic Impacts of Information Technology • Electronic Commerce, Infrastructure for • Electronic Data Interchange • Marketing
BIBLIOGRAPHY

Acken, J. M. (1998). How watermarking adds value to digital content. Communications of the ACM, Vol. 41, No. 7, 75–77.
Bakos, Y. (1998). The emerging role of electronic marketplaces on the Internet. Communications of the ACM, Vol. 41, No. 8, 35–42.
Bakos, Y., and Brynjolfsson, E. (1999). Bundling information goods: Pricing, profits and efficiency. Management Science, Vol. 45, No. 12, 1613–1630.
Choi, S., Stahl, D., and Whinston, A. (1997). The economics of electronic commerce. Indianapolis, IN: Macmillan Technical Publishing.
Evans, P. B., and Wurster, T. S. (Sep.–Oct. 1997). Strategy and the new economics of information. Harvard Business Review, 71–82.
European Commission (1996). Electronic publishing—Strategic developments for the European publishing industry towards the year 2000. Brussels: ECSC.
Loebbecke, C. (1998). Content provision on the Web—An economic challenge for TV stations. Australian Journal of Information Systems, Vol. 6, Special Edition 1998—Electronic Commerce, 97–106.
Loebbecke, C. (1999). Electronic trading in on-line delivered content. Proc. Thirty-Second Hawaii International Conference on System Sciences (HICSS-32), A. Dennis and D. R. King (Eds.).
Shapiro, C., and Varian, H. R. (1998). Versioning: The smart way to sell information. Harvard Business Review, 76(6), 106–114.
Shapiro, C., and Varian, H. R. (1999). Information rules: A strategic guide to the network economy. Boston: Harvard Business School Press.
Wang, R. Y. et al. (Summer 1998). Manage your information as a product. Sloan Management Review, 95–105.
Werbach, K. (May–June 2000). Syndication: The emerging model for business in the Internet era. Harvard Business Review, 78(3), 84–93.
Disaster Recovery Planning
Ata Nahouraii, Indiana University of Pennsylvania
Trevor H. Jones, Duquesne University
Donald Robbins, Indiana University of Pennsylvania

I. INTRODUCTION
II. BACKGROUND
III. STANDARDS
IV. EMERGING TECHNOLOGIES
V. BACKUP/COPY RESTORATION TECHNIQUES
VI. CONCLUSION
GLOSSARY
access time Time and transfer time.
accounting The organization and procedures concerned with asset safeguarding and the reliability of financial records.
address modification The process of changing the address part of a machine instruction.
auditing The examination of information by a third party other than the user in order to perform substantive and compliance tests. The tests may be conducted internally by internal auditors or externally by an external auditor.
audit software A collection of programs and routines associated with a computer that facilitates the evaluation of machine-readable records for audit purposes.
audit trail A means for systematically tracing machine-generated output back to the original source (input) to verify the accuracy of the processes.
auxiliary storage A supplemental part of a computer's permanent storage.
backup Pertaining to equipment or application programs that are available for use in the event of failure. This provision is now an important factor in the design of every information processing system, especially in the design of real-time systems, where a system failure may bring the total operations of a business to a standstill.
block diagram A diagram of a system, computer, or program represented by annotated boxes and interconnecting lines to show relationships. It should be noted that a flowchart is a special type of flow diagram that shows the structure or sequence of operations in a program or a process.
computer audit program A computer program written for a specific audit purpose or procedure.
data Raw facts or entities without assigned meaning.
disk storage A type of magnetic storage that uses one or more rotating flat concentric plates with a magnetic surface on which data can be recorded by magnetization.
documentation The collecting, organizing, storing, and dissemination of documents or of the information recorded in documents.
systems analysis The examination of an activity, procedure, method, or technique in order to determine what must be specified, designed, and developed, and how the necessary operations may best be accomplished.
A DISASTER RECOVERY AND CONTINGENCY PLAN
ensures a business's survival when it is faced with a technological and/or information systems breakdown. Its primary objective is to prevent a calamity from occurring and to limit the impact of destructive events related to computer-based information systems. Most disaster recovery plans, when put into effect, fail to serve their intended purpose. The value of a properly implemented disaster recovery approach is that prevention seeks to stop incidents before they occur, while recovery restores services so that assets are not adversely affected. This article attempts to provide some insight into the environment in which
these occurrences can happen, what resources should be considered to achieve the goals of safeguarding assets and maintaining successful information flow within the organization, and what alternatives exist for recovery should front-end security fail and information loss be imminent. Additionally, innovative technological trends for information security and asset safeguarding will be introduced.
I. INTRODUCTION

It is important to recall significant events that have taken place that may affect, and have affected, the flow of organizational information. Increasingly, terrorist incidents and Internet abuses over the past years have become more prominent and affect what approaches are needed for implementing emergency response plans (Table I). Examples of some of the most recent high-profile situations are:
• The bombing of the USS Cole and the sinking of the Russian submarine Kursk
• The bombing of the Oklahoma Federal Building and the World Trade Center (1993)
• The San Francisco earthquakes and Chicago floods
• On September 11, 2001, three hijacked jetliners—American Airlines Flight 11, United Airlines Flight 175, and American Airlines Flight 77—were crashed into the north and south towers of the World Trade Center, and the Pentagon, respectively. The people of the United States stood in shock and horror as this terrorist action took place in less than an hour's time, killing thousands of people and leaving countless injured. The World Trade Center housed many Wall Street technology-based companies, banking firms, law offices, and international trading firms.
When we investigate examples such as these, we begin to realize the significant economic impact they have. Heavy winter storms in 1994 caused a complete collapse of the roof of the MAC card data center in northern New Jersey. Thousands of MAC transactions and an unknown, substantial amount of revenue were lost. World Trade Center systems could not be accommodated because the banks' alternate processing facilities were already committed. Recent crises in the state of California concerning power generation have had negative impacts on business. The Wall Street Journal reported in its January 26, 2001, issue that Phelps Dodge, the second largest
copper mining company, has had to close its operations, affecting 2,350 employees. Intrusion via software or user access must also be considered a potential threat. Examples of security breaches were brought to light by Hoffer and Straub in 1989 and also by Hein and Erickson in 1991. One such example is Ronald Cojoe's attempt to use his high-level system access and knowledge to defraud the city of Detroit of $123,000. A more humorous example is a malicious employee prank at the Calgary Herald. The prank consisted of placing obscene messages in files that seemed to randomly appear on the computer screens of the newspaper's reporters, editors, and management information systems staff. The Hamburg Chaos Computer Club, which intruded into a variety of United States and European corporate and government computers in 1987, is another example of a security breakdown. This incident, along with the observations of Allen in 1977, Steinauer in 1986, Hootman in 1989, Vacca in 1994, and Nahouraii in 1995, showed the weaknesses present in the security design of networks. It was also a prelude to the WORM attack by Robert T. Morris, a first-year computer science graduate student at Cornell, who created code that could disrupt the activity of UNIX operating systems, making them inoperative. A WORM is software that propagates itself across a network and causes the resources of one network to attack another. On November 2, 1988, this type of code was unleashed over the Internet. The devastation resulted in disruption of the operations of 6000 computers nationwide. Since 1988 at least 11 well-identified intrusions causing deletion of files, distributed denial of service (DDoS), or erasure of hard drives have struck governments and corporations around the world. These viruses, code named Michelangelo (1991), Word Concept (1995), Wazzu (1996), Melissa (1999), Chernobyl (1999), Explore.Zip (1999), BubbleBoy (1999), The Love Bug (2000), Killer Resume (2000), Zombies (1/2001), and most recently Anna (2/2001), generally appear in the form of file attachments. A DDoS assault was launched against Network Associates' DNS server. The attack lasted for nearly 90 minutes, and users had difficulty connecting to Network Associates' web servers. A similar attack was launched on Microsoft's DNS records the week before, prompting the realization that possible weaknesses existed in the Berkeley Internet Name Domain (BIND), the most widely used implementation of DNS, according to P. J. Connolly (Pj_conolly@infoworld.com), a senior analyst in the InfoWorld Test Center. However dramatic we might consider these circumstances to be, we should keep these occurrences in perspective. According to David Ballard, Product Marketing Manager for Veritas Software Corp., the number one reason for loss of data is accidental deletion, corruption, or other software errors.
Table I  Types of Intrusions

Passive: Foreign/domestic collection to gain a competitive edge in the global/military arena; typically technical in nature. Interests identified by a U.S. defense report include radiation transfer technology, equipment testing and diagnostic software, and infrared signature measurement software [IR]. Penetration methods: targeting international conferences, exploiting joint research/ventures, and co-opting former employees [GM vs. VW].

Active: Terroristic in nature, to gain political/cultural/global attention. Incidents: 1993, New York's World Trade Center; 1994, Israeli Embassy in London; 1995, Japan's subway; 1995, Federal Building in Oklahoma City; 1996, several bombings in Tel Aviv and Jerusalem by HAMAS; 1997, killings at the Hatshepsut Temple in Egypt; 2000, USS Cole bombing; 2001, September 11th World Trade Center and Pentagon attacks.

Historical incidents, physical: 1988, Kevin Mitnick attack on Digital Equipment; 1992, Chicago flood.

Historical incidents, environmental: 1989, Hurricane Hugo; 1989, San Francisco earthquake; 1992, Hurricane Andrew; 1994, Iowa flood; 2001, Ahmedabad earthquake (India).

Historical incidents, technical (viruses): 1987, Chaos Computer Club; 1987, IBM Christmas Tree Exec; 1989, "No Nukes Worm"; 1990, Hide and Seek Trojan horse; 1991, SYSMAN.EXE Trojan horse; 1995, Word Concept; 1996, Wazzu; 1999, Melissa; 1999, Chernobyl; 2000, Love Bug; 2000, Killer Resume; Dec. 2000, Kris virus.

Historical incidents, fraud/embezzlement: 1991, Charles Keating defrauds U.S. Federal Home Loan Banks; 1992, looting of the United Fund by its CEO, William Aramony; 1994, Orange County, California, bankruptcy; 1995, the fall of the Bank of England; 1995, the New Era Philanthropy fraud by John G. Bennett; 1995, Clark Candy CEO Michael Carlow defrauding PNC Bank; 1999, accounting irregularities (Informix, Xerox, Lucent).
There are often debates about the design requirements, specifications, and the language for the development of an application program interface as an ancillary part of a disaster recovery tool. McEnrue and Bourne studied various models of human performance and reaction, including psychological and cognitive processes. In their study, subjects were introduced to threatening situations posed by software that led to computer site shutdowns in order to impede data loss. In 1962, Hunt modified the previous model by including synchronous systems and analyzed how humans perceive security and take measures for safeguarding sensitive files with or without automated information systems. Similarly, in the 156 cases of computer fraud that Allen studied, he found that 108 involved addition, deletion, or modification of input transactions. It should be noted that no system is sufficiently secure to stop all breaches of security. This is especially true in cases that involve fire or flood. As reported by Ron Weber as well as Wayne and Turney, fire and water are often the major causes of computer outages. Because of these possibilities, a considerable amount of effort has been devoted by information scientists, EDP auditors, psychologists, and law enforcement analysts to formulating a better description of the design and application of disaster recovery and computer security. With this in mind, designers and users have proposed various control objectives and techniques needed for emergency response. These include a variety of operational procedures for disaster recovery using recent emerging technologies, such as biometrics and authentication, hot sites, cold sites, firewalls, encryption, and steganography, that can be deployed on platforms of differing sizes running IBM, Sun, Microsoft Windows, Linux, or UNIX based operating systems. In general, disasters can be grouped into four categories:
1. Physical. These include terroristic actions and input/output device failures, e.g., disk head crashes or system failures resulting from a sudden power surge or power interruptions caused by lightning.
2. Technical. Virus intrusions are categorized here, as well as operating system or application software faults (bugs).
3. Environmental. This may be the most noticeable and newsworthy, but least considered, category. Examples are flood, fire, and earthquake, as well as more insidious items such as a misplaced water main, which may freeze and burst, causing subsequent water damage.
4. Fraud/embezzlement. This category includes deliberate and unauthorized data access, use, and manipulation.
II. BACKGROUND

In the world of technology, in order for a company to remain viable, top management must establish a policy to safeguard the integration of servers, software, and storage systems. If top management's system of controls is not reliable, it is unlikely that the viability of the enterprise can be assured. The intent of the controls is to prevent fraud as well as to limit the damage that natural disasters may cause. These controls are generally assimilated into environmental controls by management. Thus, environmental controls require well-planned, well-executed procedures that assure successful communication, commerce, competitiveness, and growth both internally and among external business partners, so that the firm remains functional at all times. The enterprise must also safeguard against the possibility of fraud or sabotage by implementing internal controls that effectively fulfill auditing compliance standards:
1. Proper segregation of duties
2. Control over installations and changes
3. Control over the system librarian and utilities
4. Password control
5. Control over programmable read-only memory [PROM] and erasable programmable read-only memory [EPROM] programs
6. Control over utility scans
These controls, as suggested by Weber, must be periodically reviewed for modification by management so that system security aberrations caused by environmental hazards, I/O device failures, human error, or computer abuse can be detected easily. Environmental hazards include fires, floods, tornadoes, earthquakes, failures of the facility management systems of nuclear power plants, and other natural causes. I/O errors involve input or output devices. They include damage to disk packs by faulty disk drives, reader errors, off-line key-to-disk errors or errors in application programs that destroy or damage data, and the mounting of incorrect files by the operational staff. Computer abuse is the violation of a computer system to perform malicious damage, crime, or invasion of privacy. Therefore, management
should be alert that malicious damage, including deliberate sabotage, intentional misstatement of profit or loss as a means of tax evasion, or concealment of performance from shareholders of publicly traded securities, will be visible during an internal audit. Similarly, crimes that include embezzlement, industrial espionage, the sale of commercial secrets, invasion of privacy through access to confidential salary information, and the review of sensitive data by a competing company should be easily detected by the internal controls of top management. The frequency of occurrence of computer abuse is difficult to determine, but the cost per reported incident is considered to be phenomenal. To put it simply, if environmental controls are absent, so goes the enterprise. That is, in today's economy, if the enterprise fails to connect with its external suppliers and customers as a result of network crashes, or if databases fail without any asset safeguarding plan in place, serious and unpredictable problems ranging from embezzlement to system failures will result. Thus, internal controls should be put in place by management before a disaster strikes.
III. STANDARDS

The Foreign Corrupt Practices Act holds corporate executives accountable for failing to plan adequately for a disaster. Fines of up to $10,000 and 5 years' imprisonment may be levied for such a failure. Similar legislation, the Computer Security Act, was approved by the United States Senate in 1987. This legislation gives the United States National Bureau of Standards a role in setting computer security measures in the civilian sector, as shown by Brandt in 1977. The act mandated that by 1992 a computer system or network must be certifiable as secure in four areas, as previously mentioned by Mitch in 1988 and Miller in 1991:
User identification Authentication Access control and file security Transmission security and management security
In response to these federal requirements, local area network (LAN) vendors are developing software that will support the foregoing requirements. In addition, banking laws, the federal system requirements, the Securities and Exchange Commission along with
the Internal Revenue Service (IRS) all have regulations dealing with the responsibility for controlling exposures relating to disasters. For example, the Federal Reserve Board requires that when companies use an Electronic Fund Transfer (EFT) system they must have a plan readily available for recovery within 24 hours. This is discussed in the writings of Wong, Monaco, and Sellaro in 1994. The Auditing Standards Board of the American Institute of Certified Public Accountants (AICPA) issued Statement of Auditing Standards (SAS) No. 82, "Consideration of Fraud in Financial Statement Audits." By increasing the probability of uncovering or detecting fraud, the AICPA intends this standard to increase the integrity and reliability of financial statements. Management security control is generally interpreted as an environmental control by auditing firms. These controls, as suggested by Porter and Perry in 1992, must be guided by a set of objectives. Management must:
1. Establish security objectives. These objectives contain standards against which actual system security can be judged.
2. Evaluate security risks. Management should evaluate file maintenance and recovery security risks for likelihood and cost of occurrence. For example, a flood or earthquake will have a low probability of occurrence and a high cost per occurrence, whereas damage to I/O devices because of human negligence will occur more frequently but at a low cost per incident. Internal controls should estimate the probability and cost associated with each possible security failure so that the computed expected values of loss may be used as guides to the effectiveness of the security controls (a minimal sketch of this expected-loss calculation appears at the end of this section).
3. Develop a security plan. Management should consider a plan that will be cost effective at an acceptable level, should describe all controls, and should identify the purpose of their inclusion.
4. Assign responsibilities. Management should assign accountability and responsibilities that include implementation of the plan and monitoring of the controls on an ongoing basis.
5. Test system security. The controls should be tested by management to make sure the staff is properly trained and aware of the consequences of a possible disaster. This process will determine where weaknesses exist in the security plan and assure that the existing plan provides recovery from security failures as intended.
6. Test the physical isolation of the computer facilities. Isolation can be accomplished by having a separate building for the computer center or a secure location within the enterprise.
7. Test the use of construction standards. Security risks can be reduced by adequate construction standards. The walls and doors of the computer facilities should be in compliance with American Society of Mechanical Engineers (ASME) earthquake structural specifications and standards. Windows should be avoided completely. Data files and documentation should be stored in safes and vaults, and power and communication lines should be protected with a secondary power company supplier or a stand-alone power generator.
8. Test compliance with fire and water standards. The risk of environmental damage can be reduced by following construction standards for fire, which include the use of fire-resistant walls, floors, ceilings, and doors. Fire extinguishing systems such as halon gas should be used to extinguish fires and minimize damage to data processing equipment and personnel. Water standards include the use of pumps and drains to minimize water damage from sprinklers or floods. Watertight floors in the computer room will help keep out water from floods or from other parts of the building.
9. Evaluate system security. Regularly plan a pilot run to test the recovery plan; the results of testing should be used to evaluate the effectiveness of controls in meeting disaster recovery objectives.
10. Test the installation of disaster recovery software. This test is usually made to make sure the software transparency claimed by its vendor exists. The test ensures that the data center operating system platform can be easily revised if needed.
In summary, management must include in its internal control policies the procedures needed for planning, building, and maintaining asset safeguarding methods to guard against disaster. This is necessary not only for fraud prevention but also because of the impact on the environment. According to Wayne and Turney (1984), management must be certain that it has evaluated the likelihood of failure and has adopted procedures to expeditiously return the organization to normal infrastructure functioning while minimizing data and asset loss, should a catastrophe occur. Management must also ensure its computer security
to protect hardware and data against unauthorized access, destructive programs, sabotage, environmental changes, and fraudulent use. Corporations thus need to become increasingly vigilant about these issues due to:
• An increase in installed micro-based computers in their workplaces, currently being augmented and superseded by personal digital assistants (PDAs) or other hand-held devices
• The use of automated systems to attain competitive business advantages
• The sophistication of managers and employees in computing technologies
• The increased volumes of sensitive data being kept on various secondary storage devices which are accessible through both internal and external sources
• Greater interconnectivity of computers through networks
• Increased investments in hardware, software, and networking technologies
• The high cost of recovery as a result of data loss
• The sophistication of potential intruders and digital viruses
• The arsenal of mobile computing tools
• The increasing cost effectiveness of e-commerce and the consequential reliance on digital technologies to support these models
• Crowded skies and increased digital information movement created by the economy for consumers
Furthermore, management should continuously address questions such as: What platforms should we use? How do we plan for the unexpected? Will it grow when we grow? Will it work with new technology in the future? Will it build upon our current systems? Can we link to our customers' and suppliers' systems? Can open standards be used? What about outsourcing? How do we finance all of this?
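The expected-loss calculation referred to in objective 2 above can be sketched as follows; the threat list, probabilities, and cost figures are purely illustrative assumptions, not recommended values.

```python
# Annualized expected loss = probability of occurrence per year x cost per occurrence.
# Hypothetical figures; a real plan would use the organization's own estimates.
threats = {
    "earthquake/flood":        {"p_per_year": 0.02, "cost": 2_500_000},
    "fire":                    {"p_per_year": 0.05, "cost": 1_200_000},
    "I/O damage (negligence)": {"p_per_year": 2.00, "cost": 4_000},
    "virus outbreak":          {"p_per_year": 0.80, "cost": 60_000},
    "fraud/embezzlement":      {"p_per_year": 0.10, "cost": 300_000},
}

def expected_losses(threats):
    # Rank exposures by expected annual loss, largest first.
    return sorted(
        ((name, t["p_per_year"] * t["cost"]) for name, t in threats.items()),
        key=lambda item: item[1],
        reverse=True,
    )

for name, loss in expected_losses(threats):
    print(f"{name:<26} expected annual loss: ${loss:,.0f}")
# The ranking suggests where controls and recovery spending are most effective.
```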
IV. EMERGING TECHNOLOGIES

The use of access control can be traced back to around 1000 B.C., when the Chinese developed a control system to guard their imperial palaces. Each member of the palace staff wore a ring engraved with intricate designs that identified which areas of the palace they were allowed to enter. Today, access controls range from simple locks and keys to sophisticated systems using biometric physiological or physical techniques. The private sector as well as governmental agencies
have assimilated the use of these techniques in their security and recovery plans to safeguard their information systems and data centers. Biometric identification systems are machines that verify the identity of a person based on the examination and assessment of unique personal physiological features (Fig. 1). Characteristics such as signatures, retinal blood vessel patterns, fingerprints, hand geometry, face prints, and speech are ideal as a basis for an identification system because, unlike keys, they cannot be lost or stolen. Biometry refers to the application of statistical methods to describe and analyze data concerning the variation of biological characteristics obtained by either observation or experiment. Biometric identification systems require that the characteristics of the people who will be using the system be gathered in advance. The importance of this was discussed in the writings of Zalud (1989) and Rosen (1990). When making an identification, biometric systems read in a user's characteristics and convert them into a form in which they can be compared to a set of reference samples. Computer algorithms are used to make the comparisons and verify identity. The success of a biometric method is based upon the characteristic selected for comparison and the nature of the allowances made for variations.
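The comparison step common to these systems can be sketched as a simple template match with a tolerance threshold. The feature vector, distance measure, tolerance, and update rule below are hypothetical simplifications for illustration, not the algorithm of any commercial system mentioned in this article.

```python
import math

# A stored reference template: an averaged feature vector from several
# enrollment samples (e.g., pen pressure/velocity features for a signature).
ENROLLMENT = [
    [0.82, 0.40, 1.10, 0.55],
    [0.80, 0.43, 1.05, 0.57],
    [0.79, 0.41, 1.12, 0.54],
]
template = [sum(col) / len(col) for col in zip(*ENROLLMENT)]
TOLERANCE = 0.15  # allowed variation; tuned to balance false accepts and rejects

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def verify(sample, template, tolerance=TOLERANCE):
    return distance(sample, template) <= tolerance

def update_template(template, sample, weight=0.1):
    # Slowly adapt the stored reference after each successful use, since
    # characteristics (hands, voices, signatures) drift over time.
    return [(1 - weight) * t + weight * s for t, s in zip(template, sample)]

attempt = [0.81, 0.42, 1.08, 0.56]
if verify(attempt, template):
    template = update_template(template, attempt)
    print("access granted")
else:
    print("access denied")
```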
For signature verification, a person will create a reference sample by signing his or her name about six times with a special pen whose movement is recorded by a computer. This information is stored as a reference sample, which is often referred to as a template. The computer then evaluates the variations in the signature with the stored templates. If the matching pressure and movements of the person signing are correctly matched, access is granted. Most of the characteristics that the systems work with can vary over time. Hands can swell from work, heat or allergies, fingerprints can be marred by scratches or embedded dirt, and voices can vary from colds. For these reasons, identified by Rosen (1990), Miller (1991), and Reynolds (1998), most of the machines allow for some degree of variability in the measured characteristics and update the file containing the reference sample after each use. Of equal importance is the retinal blood vessel analysis introduced for commercial use as a security system by Eyedentify Inc. Eyedentify’s system bases verifications on the unique blood vessel patterns in the retina of the user’s eye. Eyedentify’s retinal pattern verifier is equipped with a lighted concentric circle eye target and a headset. The back of the eye is similar to a map, and the machine reads only a small area of this map. Since no one knows which portion
Figure 1 Face recognition system.
of the map is being read, there is no way to duplicate or imitate a retinal pattern. As in signature matching, which is prone to variation over time, a person's retinal patterns change if the subject experiences a heart attack or becomes diabetic, as noted by Sherman (1992). Fingerprinting is one of the easiest ways to identify a user and serves as the most obvious way to distinguish users. In 1990, Rosen stated that fingerprint analysis can range from reading the entire fingerprint to reading exact portions of the finger. Machines used for fingerprint analysis have a cylindrical light that rotates around the finger and reads the bumps and ridges from a section of the finger. If the fingerprint matches the image stored, access is granted. For hand geometry, methods that measure hand profile and thickness by light have been found very dependable. Miller (1991), Rosen (1990), and Sherman (1992) stated that a reader uses light to construct a three-dimensional image of a person's hand, examining such characteristics as finger length, width, and hand thickness. Also, in 1990, Rosen noted that the reference template of the person's hand is updated following each verification. Originally developed at MIT for law enforcement agencies, face printing combines the use of "fuzzy logic" and artificial intelligence algorithms. One system, called FaceTrac, captures images by calculating distances between the eyes, thickness of the lips, angle of cheekbones, and other features such as the slope of the nose, and generates a profile for each facial characteristic. Such a system was installed during Super Bowl XXXV at the Raymond James Stadium in Tampa by Graphco Technologies, located in Pennsylvania, which markets FaceTrac. The Graphco program has the ability to measure up to 128 distinct facial features. The uniqueness of face printing is that no matter what kind of camouflaging techniques one may use to evade recognition, the disguise can easily be unmasked. Faceprinting systems, previously thought to be too complex and expensive, can now be successfully developed. In 1992 Sherman explained that the machine uses low-cost, microprocessor-driven cameras and fuzzy logic with neural networking firmware for the recording of facial images. Facial printing is an excellent method of identification because facial imprints are less susceptible to change from mood and nervousness. Its major advantage is that it does not require any physical contact with equipment. Speech recognition is becoming more popular within the field of biometrics (Figs. 2 and 3). As with the fingerprint technique which analyzes patterns, the voice technique not only measures the voice but
Figure 2 Voice recognition system illustration.
also other characteristics. Jaw opening and tongue shape and position are identified. To use this, the user enters a preassigned personal code on the voicekey keypad and then utters the password into the unit's microphone. A decision to grant or deny access is based on the matching of voice templates, as described by Rosen (1990) and Tiogo (1990). Steganography is a new entry in message delivery. The word means "covered writing" in Greek. In contrast to data encryption or cryptography, where data need to be decrypted and can easily be recognized as an encrypted message, steganography aims to hide its messages inside other harmless messages. The obvious message is so benign that it does not suggest that it carries a second message. David Kahn's "The Code Breakers" provides an excellent accounting of this topic. Bruce Norman, in his Secret Warfare: The Battle of Codes and Ciphers, recounts numerous examples of the use of steganography during times of war. Null ciphers, commonly called unencrypted messages, are also employed by this technique. The method camouflages the real message within an ordinary, innocent-looking message.
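The null cipher idea can be shown with a minimal sketch: the real message travels in the first letter of each word of an apparently harmless sentence. The carrier text below is invented purely for demonstration and does not come from any of the programs named later.

```python
# Illustrative null cipher: the hidden message is carried by the first
# letter of each word of an innocent-looking sentence.
def extract_null_cipher(carrier):
    """Recover the hidden message from the first letter of each word."""
    return "".join(word[0] for word in carrier.split()).upper()

carrier = "Hotel empty, luggage pending. Meet Ed."
print(extract_null_cipher(carrier))   # prints HELPME
```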
Figure 3 Voice synthesis system illustration.
The advantage gained by the use of steganography is that when it is detected, a new steganographic application can be devised. The new application can be in the form of drawings, varying lines, colors, or other elements in pictures in order to conceal or reveal the message. Some available programs are JDeg.Shell, BPCS Steganography, PGM Stealth, and Piilo, a Unixbased program which uses images to hide messages. As we have discussed, biometrics may be used for authentication and securing information resources. However, the practical use of biometrics systems is limited to verification. Verification means that the machine accepts or rejects the claim of an unknown person. The downside of these techniques is that they do not identify users who are not part of its set of reference samples. The physical techniques are the other methods for computer security. These techniques are specifically designed for the network’s safety by authorizing access to the network users through the use of electronic “tokens.” The Digital Pathways Inc., of Mountain View, California produces a hand-held authorization device known as Secure Net Key. In 1987 the Delaware Valley Disaster Recovery Information Exchange Group reported that the device uses data encryption techniques to safeguard data transmissions from unauthorized intrusions, thereby reducing potential abuse. This was again stated by Taschek (1993) and Baig (1994). The National Semiconductor Corp. has also recently introduced a “Persona Card” which meets Computer Mem-
ory Card International Association (PCMCIA) standards and fits easily in the personal computer slots. It is designed to transmit encrypted data to the source for evaluation or processing. Security Dynamics of Cambridge, Massachusetts, has also created an access card known as "SECURE ID," which generates a random code every 60 seconds. Vacca (1994) reported that users must present the latest code to gain access. More recently, many organizations have adopted a Single-Sign-On (SSO) approach for their computer security. This method is a software-driven technique which permits users to gain access into multiple interrelated applications using a limited number of passwords for user identification and access control. Access control requirements for systems based on centralized concepts differ from those for systems based on a decentralized format. Decentralized security devices, ranging from high-tech, high-cost random password generators to network routers and callback modems, require "firewall" configurations. Allen (1977) and then Tiogo (1988) wrote that Information Systems of Glenwood, Maryland, Harris Computer Systems in Ft. Lauderdale, Florida, and Digital Equipment Corp. are among the companies that build firewall security gateways between private networks and the Internet. The idea is that all traffic must pass through the firewall. The Sidewinder Fire Wall Software from Secure Computing Corp. of Roseville, Minnesota, lets businesses strike back by feeding an intruder false data and tracing him or her back to his or her computer. Another company for physical security is Data Security Inc. of Redwood City, California. It produces user software that employs mathematical cryptographic algorithms to scramble messages and create "digital signatures," the equivalent of fingerprints, as explained by Snyder and Caswell (1989). Data Security Inc. worked on a joint venture with Enterprise Integration Technologies Corp. of Palo Alto, California, to produce a system called TERISA-Systems, which uses this cryptographic technology to secure transactions on World Wide Web (WWW) servers, as detailed by Nolle (1989), Rosen (1990), Snyder and Caswell (1989), Steinauer (1986), and Tiogo (1990).
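The products above are described only at a high level. As a generic illustration of the underlying idea, namely using cryptographic algorithms so that a received message can be checked against a key-dependent "fingerprint," the following sketch uses Python's standard hmac and hashlib modules. It shows a keyed message digest, a simpler symmetric cousin of the digital signatures mentioned above, and is not the actual mechanism of any product named in the text; the key and message are hypothetical.

```python
# Generic illustration of a cryptographic "fingerprint" for a message,
# using only the Python standard library (HMAC with SHA-256).
import hmac
import hashlib

secret_key = b"shared-secret-key"            # hypothetical shared key
message = b"Transfer $500 to account 1234"

# Sender computes a tag and transmits it along with the message.
tag = hmac.new(secret_key, message, hashlib.sha256).hexdigest()

# Receiver recomputes the tag and compares in constant time.
def verify(message, received_tag, key=secret_key):
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, received_tag)

print(verify(message, tag))                        # True: message unchanged
print(verify(b"Transfer $9,999 to 9999", tag))     # False: tampering detected
```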
V. BACKUP/COPY RESTORATION TECHNIQUES The necessity for backup operations may be one of the most overlooked and underplanned areas in the technology environment. Because of regulatory requirements, auditors now routinely verify an organization's ability to recover from a disaster. However, the necessity for backup operations is still one of the
most overlooked and underplanned areas in the technology environment. According to Info Security News magazine (http://www.infosecnews.com):
• Companies begin meltdown in less than 5 days after losing critical data.
• Fifty percent of companies that did not recover within 10 days never fully recovered.
• Ninety-three percent of those companies went out of business within 5 years.
The various strategies and technologies employed in backup operations vary according to the type of operation, the function it is to serve, and the outcome expected. For centralized operations, the prevalent backup technology remains the movement of data between disk and local tape. This has been increasingly employed in enterprise-level applications where the volumes of data stored for real-time access continue to escalate, driven by technologies such as data warehouses. However, the use of tape backup procedures has begun to come under efficiency pressure as the operational windows for backups shrink and the volumes of data scale into the multi-terabyte environment. At the time of writing, various partnerships of technology giants have demonstrated the ability to back up data in excess of one terabyte per hour. However, as technologies and the use of technologies change, different problems arise. Backups to tape are sufficient when dealing with a single source or a small number of data sources, such as enterprise databases. However, as employees move to remote computing, the number of storage locations increases and can conceivably be outside and disconnected from the enterprise. Strategic Research Corp., in a survey of over 200 sites, reports that only 18% backed up workstations in addition to the servers. Alternatives to tape backups include parallel backups with tape arrays as well as real-time data replication through Redundant Array of Independent Disk (RAID) configurations. Since disk read speeds are superior to those of tape, backups can be performed very efficiently, although at a higher cost. For distributed environments, which are now becoming much more prevalent due to open system configurations, techniques for the control of backup operations are moving away from server-centric approaches to the latest technology, known as the storage area network (SAN). According to Dataquest, Inc., "software functionality does not have to reside on the server. The server does not always act as the intermediary for the backup. It can go directly from the user to the backup device." Additionally, more people are putting the software protocol, or agent, on the network. Soon the SAN itself will be able to create the backup.
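One common policy implied above, copying only the files that have changed since the last backup ran, can be sketched in a few lines. The source and target paths and the timestamp file below are hypothetical placeholders, not references to any of the products or services discussed in this article.

```python
# Minimal sketch of an incremental file backup: copy only files modified
# since the previous backup. Paths and the timestamp file are hypothetical.
import os
import shutil
import time

SOURCE = "/data/workstations"     # hypothetical source tree
TARGET = "/backup/workstations"   # hypothetical backup location
STAMP = os.path.join(TARGET, ".last_backup")

def last_backup_time():
    try:
        with open(STAMP) as f:
            return float(f.read())
    except (FileNotFoundError, ValueError):
        return 0.0  # no previous backup: copy everything

def incremental_backup():
    cutoff = last_backup_time()
    for root, _dirs, files in os.walk(SOURCE):
        for name in files:
            src = os.path.join(root, name)
            if os.path.getmtime(src) > cutoff:
                rel = os.path.relpath(src, SOURCE)
                dst = os.path.join(TARGET, rel)
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)   # copy data and timestamps
    os.makedirs(TARGET, exist_ok=True)
    with open(STAMP, "w") as f:
        f.write(str(time.time()))

if __name__ == "__main__":
    incremental_backup()
```

A full backup corresponds to the case where no timestamp exists, so everything is copied; the same trade-off between backup window and recovery effort drives the tape, RAID, and SAN choices discussed above.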
Imposed on the exact protocol for creating backups are various site strategies and options for the physical storage of the backed up data. Off-site facilities are generally classified into three categories:
1. Hot sites: Separate locations where classified records, based on their priorities, can be instantly replicated. These sites have computer workstations, file servers, their own functional security, backup generators, and dedicated or dial-up leased lines. This allows a failed operation to immediately access copies of damaged data and continue operations.
2. Warm sites: This refers to partially equipped backup sites. They consist of peripheral equipment in an off-line mode with minimal CPU capacity to enable the continuation of mission-critical tasks only.
3. Cold sites: Sometimes referred to as "relocatable shells." These are sites without any resources, except suitable power supplies, phone lines, and ventilation. The shells are computer-ready and transportable to the disaster site for the salvage of equipment and data where possible. The platforms can be salvageable technology or obtained on a short-term lease basis.
Finally, one phenomenon which is finding its way into many areas of technology adoption and management, and which is observable within the area of backup and recovery, is that of outsourcing. Due to the scope and specialized and technical requirements of this work, backup procedures, processes, and technologies can now be outsourced to third parties, removing the management requirements and expertise from mainstream operations. Internet capabilities have particularly lent themselves to this format. For example, Datasave Services, PLC, provides a service of data storage and archiving in an off-site format. The method is to transfer all changes to a customer's data to an off-site location, where they are stored on digital linear tape (DLT). The data are encrypted and transferred via modem overnight. This provides for extremely high levels of transport security while minimizing disruption to daily operations. Examples of corporations that provide systems for supporting backup and restoration operations are:
1. Data Recovery System (DRS): The Data Recovery System was developed by Integrity Solutions Inc., located in Denver, Colorado. The DRS is enhanced with a backward file recovery capability which adds to all DRS/Update and
DRS/Recover. The backward file recovery uses "before images" to speed up the recovery process and reduce redundant backups. As indicated in Software Magazine, this feature is particularly useful when an abend (abnormal end) occurs near the end of a large job. The updated version of DRS also includes enhanced reporting capabilities, journal merge options, a compare facility, and control interval journaling. DRS supports CICS/VS, VSAM files, DL/1, and DB2 under DOS, AS/400, or MVS/XA. The price ranges from $2000 to $17,000, depending on the size of the mainframe selected.
2. The Data Center Planner: The Data Center Planner was developed by Vycor Corp., located in Landover, Maryland. This is a series of PC software packages for managing DP assets. The Disaster Recovery Planner creates and maintains an on-line disaster recovery plan; Supply Planner controls data center supplies; PC Manager controls the inventory of PCs, including components and software; and Communication Pioneer tracks voice and data communication lines and wire paths. The price of this software ranges from $995 to $5000.
3. VMCENTER II: VMCENTER II was developed by VM Software, Inc., located in Reston, Virginia. This software provides system security and management of Direct Access Storage Devices (DASD), operations, performance, capacity, and recovery. Easy-to-use screens simplify tasks such as mounting tapes or disks on auxiliary devices and scheduling. The installation and maintenance procedure is centralized so that all components and features can be administered by a single user. The price ranges from $25,000 to $88,000, depending on the machine used.
4. SPANEX: SPANEX is an automated job scheduling and job restart system for MVS operation, developed by Westinghouse Management Systems Software of Pittsburgh, Pennsylvania, with which one of the authors was involved. It provides allowance for daily schedule variations and interaction with external or noncomputer tasks or events. Controls may be centralized or decentralized. The scheduler's restart facility is built in; it automatically detects the point of failure and initiates the appropriate recovery plan. The system is priced at $20,000 and runs under IBM MVS. The software group was sold to a British company in 1994.
5. STRATUS (a subsidiary of SGI): The STRATUS system handles failure without complicated
housekeeping by hardwiring computer components together. This system works on one job at a time, but gives the job to two pairs of independent Motorola microprocessors so that it can compare results. Suppose the job at hand is adding $2 to $3. Each microprocessor crunches the numbers, and then the paired microprocessor compares results. If the results from one pair of microprocessors are $5 and $6, then those processors are shut down momentarily, while the computer simply uses the result from the other pair.
6. Remote Journaling: Remote journaling offers a higher level of data protection. Software monitors updates to Virtual Storage Access Method (VSAM) files in the IBM world. A remote on-line log is kept of all updates of records. If a failure occurs, data can be updated back to the point of failure (see the sketch following this list). It may take a few hours to reconstruct files, depending on their size and the number of updates made. Both Sunguard and Comdisco Inc. offer remote journaling. Earlier this year IBM announced a plan to offer remote journaling software for its IMS and DB2 databases. Some users are using distributed database management system (DBMS) technologies to provide on-line remote data protection.
7. Electronic Vaulting: Electronic vaulting is the batch transmission of data over networks to a remote location, while remote journaling, a more advanced technology, remotely records data updates as they occur. Database shadowing is the most advanced of these new services. With this technology, copies of entire databases are maintained at remote places. Electronic vaulting, which has existed in product form for about two years, allows for the batch transmission of critical data sets via T1 or T3 to a remote site, usually adjacent to standby data centers. One problem, however, is that products are not necessarily designed for multivendor environments. Sunguard Recovery Services of Wayne, Pennsylvania, which was the first to offer electronic vaulting services, originally marketed this service for IBM mainframe customers, but now covers a diverse range of platforms.
8. The CTAM Method: Sheri Anderson, Senior Vice President for Production Systems and Services at Charles Schwab and Co. Inc. in San Francisco, has come to depend on what is known in the disaster recovery business as the CTAM method for protecting and accessing critical data in an emergency. Chevy Track Access Method (CTAM)
works like this: users attempting to protect their data copy it onto tape periodically and transport the tape by truck or jet to a second location for safekeeping. If a disaster occurs at a data center, the tapes can be rushed to a backup data center, loaded, and run. Unfortunately, such an approach can result in loss of data if a disaster occurs any time after the last backup, and physically transporting tapes and loading data into backup CPUs can take several hours or even days. For this reason, several firms are evaluating technologies for transporting copies of important data to backup sites electronically. Vendors in the booming disaster recovery business are attempting to accommodate them by offering products and services for electronic vaulting and remote journaling of data. A few have even announced full database shadowing, which has the potential to recover all data back to the point of failure in a matter of minutes.
9. IBM's Business Recovery Service: IBM has set its sights on the disaster recovery market with the recent introduction of a service designed to back up data for its large system customers in the event of a disaster. IBM's Business Recovery Service (BRS) is designed to restore data on mid-range and mainframe systems, as well as PCs that are integrated into those environments through LANs. Under BRS, firms will be able to access backup data that is archived in one of 12 "hot sites" IBM is setting up across the nation. For PC data recovery, clients will be able to set up facilities closer to their home offices. In putting together a recovery plan, IBM would define a solution with an alternate site in proximity to the customer's office. In the event of a flood or fire, IBM would be able to go in and duplicate the work environment. Working with IBM, clients will develop a disaster recovery plan that stipulates the square footage required in an alternative facility and also specifies cabling, LAN design, and necessary equipment. Should a disaster occur, clients will have access to equivalent systems for data recovery, and IBM will provide technical support. The fee for the disaster recovery service, which is available on a limited basis, ranges from $500 to $40,000 per month, depending on systems configuration. For example, a mid-range System/AS400 "with a fairly rich configuration" will cost approximately $40,000 per month. Clients can subscribe to BRS for 1-, 3-, or 5-year periods.
10. Ashton-Tate Corp.'s File Recovery (Ashton-Tate is a subsidiary of Borland): Ashton-Tate's File Recovery is a recovery program designed to diagnose and repair damaged data files. Originally designed as its Dbase File Recovery product and now enhanced for PC-based relational databases, it is almost completely automated. It successfully restores damaged files, but suffers from poor documentation and provides little guidance on the type of damage or the steps required to repair it.
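The remote-journaling idea from item 6 above can be shown with a minimal sketch: every record update is appended to a log kept at another location, and after a failure the log is replayed over the last full backup to bring records forward to the point of failure. The file name, record format, and example data below are hypothetical and are not drawn from any of the products listed.

```python
# Minimal sketch of remote journaling: append every record update to an
# off-site log, then replay the log over the last backup copy to recover
# to the point of failure. File name and record format are hypothetical.
import json

JOURNAL = "remote_journal.log"   # in practice this lives at a remote site

def journal_update(key, value, journal=JOURNAL):
    """Record an update before (or as) it is applied to the live database."""
    with open(journal, "a") as f:
        f.write(json.dumps({"key": key, "value": value}) + "\n")

def recover(backup_copy, journal=JOURNAL):
    """Rebuild current state: start from the last backup, then apply every
    journaled update in order."""
    state = dict(backup_copy)
    try:
        with open(journal) as f:
            for line in f:
                entry = json.loads(line)
                state[entry["key"]] = entry["value"]
    except FileNotFoundError:
        pass  # no journal yet; the backup is already current
    return state

# Example: nightly backup taken, two updates journaled, then a failure.
backup = {"acct-1": 100, "acct-2": 250}
journal_update("acct-1", 75)
journal_update("acct-3", 40)
print(recover(backup))   # {'acct-1': 75, 'acct-2': 250, 'acct-3': 40}
```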
VI. CONCLUSION Protecting a business information system and computer resources requires complete planning. The Computer Virus Industry Association recommends specific solutions to reduce problems regarding computer security from errors of omission or deliberate sabotage. These solutions have been outlined and discussed by multiple authors as well as the Delaware Valley Disaster Recovery Information Exchange Group. These solutions include:
1. Access control
2. Good documentation practices
3. Effective employee training
4. Proper dissemination of information
5. A system of checks and balances
Alternatively, Miller (1991) wrote that a business’ security system should have physical safeguards (locks), administrative safeguards (policy), secondary storage safeguards (write-protect), software safeguards (access security), and communications safeguards (data encryption). This checklist is a basic “blueprint” to set up a successful computer security/disaster recovery planning system. Computer security involves the protection of computer hardware, software, and databases, from unauthorized use and possible deliberate destruction. Management has always been concerned with the protection of business and its client data. Management actions that can maximize security include: • Segregation of duties in the information system environment • Built-in internal and external system controls • Audit trail for file and program access controls • Use of security specialists • Thorough personnel investigation before hiring • Bonding of staff
• Prompt removal of discharged personnel
• Good documentation and cross-training of personnel
As technology and the need for data storage and manipulation become more entwined and pervasive in daily lives and operations, the need to protect these collections and allow for the recovery of damaged operations becomes more important. Technologies dealing with user and consumer access to these data continue to become more complex. Without adequate safeguards for security and backup procedures for recovery, institutions and societies reliant on these types of stored information become more susceptible to unauthorized intrusion and disruption.
SEE ALSO THE FOLLOWING ARTICLES Computer Viruses • Crime, Use of Computers in • Documentation for Software and IS Development • Firewalls • Security Issues and Measures • Systems Analysis
BIBLIOGRAPHY Burch, J. G., Jr., and Sardinas, J. L., Jr. (1978). Computer control and audit—A total systems approach. New York: John Wiley & Sons. Clowes, K. W. (1998). EDP auditing. Toronto: Holt, Rinehart and Winston. Garrett, P. (2001). Making, breaking codes: An introduction to cryptology. Upper Saddle River, NJ: Prentice Hall. Lynch, R. M., and Williamson, R. W. (1976). Accounting for management—Planning and control; 2nd ed. New York: McGraw-Hill. Murdick, R. G., Ross, J. E. , and Claggett, J. R. (1984). Information systems for modern management, 3rd ed. Englewood Cliffs, NJ: Prentice Hall. Pfleeger, C. P. (2000). Security in computing, 2nd ed. Upper Saddle River, NJ: Prentice Hall PTR. Porter, T. W., and Perry, W. E. (1991). EDP controls and auditing, 5th ed. Boston: Kent Publishing. Rahman, M., and Halladay, M. (1988). Accounting information systems—Principles, applications, and future directions. Englewood Cliffs, NJ: Prentice Hall. Stallings, W. (2000). Network security essentials: Applications and standards. Upper Saddle River, NJ: Prentice Hall. Weber, R. (1999). Information systems control and audit. Upper Saddle River, NJ: Prentice Hall.
Discrete Event Simulation Jerry Banks AutoSimulations, Inc.
I. DEFINITION OF SIMULATION II. SIMULATION EXAMPLE III. MODELING CONCEPTS
IV. ADVANTAGES AND DISADVANTAGES OF SIMULATION V. STEPS IN A SIMULATION STUDY
GLOSSARY
discrete-event simulation model One in which the state variables change only at those discrete points in time at which events occur.
event An occurrence that changes the state of the system.
model A representation of a real system.
simulation The imitation of the operation of a real-world process or system over time. Simulation involves the generation of an artificial history of the system, and the observation of that artificial history to draw inferences concerning the operating characteristics of the real system that is represented.
system state variables Collection of all information needed to define what is happening within the system to a sufficient level (i.e., to attain the desired output) at a given point in time.
I. DEFINITION OF SIMULATION Simulation is the imitation of the operation of a real-world process or system over time. Simulation involves the generation of an artificial history of the system, and the observation of that artificial history to draw inferences concerning the operating characteristics of the real system that is represented. Simulation is an indispensable problem-solving methodology for the solution of many real-world problems. Simulation is used to describe and analyze the behavior of a system, ask "what if" questions about the real system, and aid in the design of real systems. Both existing and conceptual systems can be modeled with simulation.
II. SIMULATION EXAMPLE Consider the operation of a one-teller bank where customers arrive for service between one and ten minutes apart in time, integer values only, each value equally likely. The customers are served in a time between 1 and 6 minutes, also integer valued, and equally likely. Restricting the times to integer values is an abstraction of reality, since time is continuous, but this aids in presenting the example. The objective is to simulate the bank operation, by hand, until twenty customers are served, and to compute measures of performance such as the percentage of idle time of the teller, the average waiting time per customer, etc. Admittedly, twenty customers is far too few to draw conclusions about the operation of the system for the long run. To simulate the process, random interarrival and service times need to be generated. Assume that the interarrival times are generated using a spinner that has possibilities for the values 1 through 10. Further assume that the service times are generated using a die that has possibilities for the values 1 through 6. Table I is called an ad hoc simulation table. The setup of the simulation table is for the purpose of this problem, but does not pertain to all problems. Column 1, Customer, lists the 20 customers that arrive to the system. It is assumed that Customer 1 arrives at time 0, thus a dash is indicated in Row 1 of
Table I  Ad Hoc Simulation

(1) Customer | (2) Time between arrivals | (3) Arrival time | (4) Service time | (5) Time service begins | (6) Time service ends | (7) Time in system | (8) Idle time | (9) Time in queue
1      | — | 0  | 2 | 0  | 2  | 2  | 0  | 0
2      | 5 | 5  | 2 | 5  | 7  | 2  | 3  | 0
3      | 1 | 6  | 6 | 7  | 13 | 7  | 0  | 1
...    |   |    |   |    |    |    |    |
18     | 4 | 84 | 5 | 84 | 89 | 5  | 0  | 0
19     | 7 | 91 | 3 | 91 | 94 | 3  | 2  | 0
20     | 7 | 98 | 1 | 98 | 99 | 1  | 4  | 0
Totals | — | —  | — | —  | —  | 72 | 34 | 7
Column 2, Time between Arrivals. Rows 2 through 20 of Column 2 were generated using the spinner. Column 3, Arrival Time, shows the simulated arrival times. Since Customer 1 is assumed to arrive at time 0, and there is a 5-minute interarrival time, Customer 2 arrives at time 5. There is a 1-minute interarrival time for Customer 3, thus, the arrival occurs at time 6. This process of adding the interarrival time to the previous arrival time is called bootstrapping. By continuing this process, the arrival times of all 20 customers are determined. Column 4, Service Time, contains the simulated service times for all 20 customers. These were generated by rolling the die. Now, the simulation of the service process begins. At time 0, Customer 1 arrived, and immediately began service. The service time was 2 minutes, so the service period ended at time 2. The total time in the system for Customer 1 was 2 minutes. The bank teller was not idle since the simulation began with the arrival of a customer. The customer did not have to wait for the teller. At time 5, Customer 2 arrived, and immediately began service as shown in Column 5. The service time was 2 minutes so the service period ended at time 7 as shown in Column 6. The bank teller was idle from time 2 until time 5, so 3 minutes of idle time occurred. Customer 2 spent no time in the queue. Customer 3 arrived at time 6, but service could not begin until time 7 as Customer 2 was being served until time 7. The service time was 6 minutes, so service was completed at time 13. Customer 3 was in the system from time 6 until time 13, or for 7 minutes as in-
dicated in Column 7, Time in System. Although there was no idle time, Customer 3 had to wait in the queue for 1 minute for service to begin. This process continues for all 20 customers, and the totals shown in Columns 7 (Time in System), 8 (Idle Time), and 9 (Time in Queue) are entered. Some performance measures can now be calculated as follows:
Average time in system = 72/20 = 3.6 minutes
% idle time = (34/99)(100) = 34%
Average waiting time per customer = 10/20 = 0.5 minutes
Fraction having to wait = 5/20 = 0.25
Average waiting time of those that waited = 7/3 = 2.33 minutes
This very limited simulation indicates that the system is functioning well. Only 25% of the customers had to wait. About one-third of the time the teller is idle. Whether a slower teller should replace the current teller depends on the cost of having to wait versus any savings from having a slower server. This small simulation can be accomplished by hand, but there is a limit to the complexity of problems that can be solved in this manner. Also, the number of customers that must be simulated could be much larger than 20, and the number of times that the simulation must be run for statistical purposes could be large. Hence, using the computer to solve real simulation problems is almost always appropriate.
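The hand simulation above can be reproduced on a computer. The following is a minimal sketch in Python, not taken from any particular simulation package: it draws integer interarrival times from 1 to 10 and service times from 1 to 6 (standing in for the spinner and the die), applies the same bootstrapping logic, and prints the performance measures. Because the times are random, the results will differ from run to run and from the hand computation above; the function name and seed are illustrative choices.

```python
# Minimal sketch of the one-teller bank simulation described above.
# Interarrival times (1-10) stand in for the spinner; service times (1-6)
# stand in for the die.
import random

def simulate_bank(num_customers=20, seed=None):
    rng = random.Random(seed)
    arrival = 0          # Customer 1 arrives at time 0
    teller_free_at = 0   # time at which the teller finishes the current service
    total_system = total_idle = total_queue = num_waited = 0

    for customer in range(1, num_customers + 1):
        if customer > 1:
            arrival += rng.randint(1, 10)      # bootstrapping the arrival time
        service_time = rng.randint(1, 6)

        start = max(arrival, teller_free_at)   # wait if the teller is busy
        total_idle += max(0, arrival - teller_free_at)
        wait = start - arrival
        if wait > 0:
            num_waited += 1
        total_queue += wait

        teller_free_at = start + service_time
        total_system += teller_free_at - arrival

    return {
        "average time in system": total_system / num_customers,
        "percent idle time": 100 * total_idle / teller_free_at,
        "average wait per customer": total_queue / num_customers,
        "fraction who waited": num_waited / num_customers,
        "average wait of those who waited":
            total_queue / num_waited if num_waited else 0.0,
    }

for measure, value in simulate_bank(seed=42).items():
    print(f"{measure}: {value:.2f}")
```

Increasing num_customers, or calling the function many times with different seeds, addresses the two limitations noted above: the small number of customers and the need for repeated runs for statistical purposes.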
III. MODELING CONCEPTS There are several concepts underlying simulation. These include system and model, events, system state variables, entities and attributes, list processing, activities and delays, and finally the definition of discrete-event simulation.
A. System, Model, and Events A model is a representation of an actual system. Immediately, there is a concern about the limits or boundaries of the model that supposedly represents the system. The model should be complex enough to answer the questions raised, but not too complex. Consider an event as an occurrence that changes the state of the system. In the example, events include the arrival of a customer for service at the bank, the beginning of service for a customer, and the completion of a service. There are both internal and external events, also called endogenous and exogenous events, respectively. For example, an endogenous event in the example is the beginning of service of the customer since that is within the system being simulated. An exogenous event is the arrival of a customer for service since that occurrence is outside of the system. However, the arrival of a customer for service impinges on the system, and must be taken into consideration. This encyclopedia entry is concerned with discrete-event simulation models. These are contrasted with other types of models such as mathematical models, descriptive models, statistical models, and input-output models. A discrete-event model attempts to represent the components of a system and their interactions to such an extent that the objectives of the study are met. Most mathematical, statistical, and input-output models represent a system's inputs and outputs explicitly, but represent the internals of the model with mathematical or statistical relationships. An example is the mathematical model from physics, Force = Mass × Acceleration, which is based on theory. Discrete-event simulation models include a detailed representation of the actual internals. Discrete-event models are dynamic, i.e., the passage of time plays a crucial role. Most mathematical and statistical models are static in that they represent a system at a fixed point in time. Consider the annual budget of a firm. This budget resides in a spreadsheet. Changes can be made in the budget and the
spreadsheet can be recalculated, but the passage of time is usually not a critical issue. Further comments will be made about discrete-event models after several additional concepts are presented.
B. System State Variables The system state variables are the collection of all information needed to define what is happening within the system to a sufficient level (i.e., to attain the desired output) at a given point in time. The determination of system state variables is a function of the purposes of the investigation, so what may be the system state variables in one case may not be the same in another case, even though the physical system is the same. Determining the system state variables is as much an art as a science. However, during the modeling process, any omissions will readily come to light. (On the other hand, unnecessary state variables may be eliminated.) In the bank teller example, at clock time 5 we might have the system state variables LQ(5) = 0 and LS(5) = 1. This is interpreted as the number in the queue at time 5 is 0, and the number in the system at time 5 is 1. Having defined system state variables and given an example, a contrast can be made between discrete-event models and continuous models based on the variables needed to track the system state. The system state variables in a discrete-event model remain constant over intervals of time and change value only at certain well-defined points called event times. Continuous models have system state variables defined by differential or difference equations giving rise to variables that may change continuously over time. Some models are mixed discrete-event and continuous. There are also continuous models that are treated as discrete-event models after some reinterpretation of system state variables, and vice versa.
C. Entities and Attributes An entity represents an object that requires explicit definition. An entity can be dynamic in that it “moves” through the system, or it can be static in that it serves other entities. In the example, the customer is a dynamic entity, whereas the bank teller is a static entity. An entity may have attributes that pertain to that entity alone. Thus, attributes should be considered as local values. In the example, an attribute of the entity could be the time of arrival. Attributes of interest in
one investigation may not be of interest in another investigation. Thus, if red parts and blue parts are being manufactured, the color could be an attribute. However, if the time in the system for all parts is of concern, the attribute of color may not be of importance. From this example, it can be seen that many entities can have the same attribute or attributes (i.e., more than one part may have the attribute "red").
D. Resources
A resource is an entity that provides service to dynamic entities. The resource can serve one or more than one dynamic entity at the same time, i.e., operate as a parallel server. A dynamic entity can request one or more units of a resource. If denied, the requesting entity joins a queue, or takes some other action (i.e., diverted to another resource, ejected from the system). (Other terms for queues include files, chains, buffers, and waiting lines.) If permitted to capture the resource, the entity remains for a time, then releases the resource. In the bank example, the teller is a resource. There are many possible states of the resource. Minimally, these states are idle and busy. But other possibilities exist including failed, blocked, or starved.

E. List Processing
Entities are managed by allocating them to resources that provide service, by attaching them to event notices thereby suspending their activity into the future, or by placing them into an ordered list. Lists are used to represent queues. Lists are often processed according to first in first out (FIFO), but there are many other possibilities. For example, the list could be processed by last in first out (LIFO), according to the value of an attribute, or randomly, to mention a few. An example where the value of an attribute may be important is in shortest processing time (SPT) scheduling. In this case, the processing time may be stored as an attribute of each entity. The entities are ordered according to the value of that attribute with the lowest value at the head or front of the queue.

F. Activities and Delays
An activity is a duration of time whose duration is known prior to commencement of the activity. Thus, when the duration begins, its end can be scheduled. The duration can be a constant, a random value from a statistical distribution, the result of an equation, input from a file, or computed based on the event state. For example, a service time may be a constant 10 minutes for each entity; it may be a random value from an exponential distribution with a mean of 10 minutes; it could be 0.9 times a constant value from clock time 0 to clock time 4 hours, and 1.1 times the standard value after clock time 4 hours; or it could be 10 minutes when the preceding queue contains at most 4 entities and 8 minutes when there are 5 or more in the preceding queue. A delay is an indefinite duration that is caused by some combination of system conditions. When an entity joins a queue for a resource, the time that it will remain in the queue may be unknown initially since that time may depend on other events that may occur. An example of another event would be the arrival of a rush order that preempts the resource. When the preemption occurs, the entity using the resource relinquishes its control instantaneously. Another example is a failure necessitating repair of the resource. Discrete-event simulations contain activities that cause time to advance. Most discrete-event simulations also contain delays as entities wait. The beginning and ending of an activity or delay is an event.

G. Discrete-Event Simulation Model
Sufficient modeling concepts have been defined so that a discrete-event simulation model can be defined as one in which the state variables change only at those discrete points in time at which events occur. Events occur as a consequence of activity times and delays. Entities may compete for system resources, possibly joining queues while waiting for an available resource. Activity and delay times may "hold" entities for durations of time. A discrete-event simulation model is conducted over time ("run") by a mechanism that moves simulated time forward. The system state is updated at each event along with capturing and freeing of resources that may occur at that time.
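The definition above can be made concrete with a small event-scheduling sketch, again using the one-teller bank of Section II. It is an illustration of the general mechanism rather than the design of any particular simulation language: a future event list, kept ordered by event time in the spirit of the list processing described earlier, holds arrival and departure event notices, and the clock jumps from one event time to the next while the system state (queue length, teller busy or idle) is updated. The function name and the seed are illustrative choices.

```python
# Sketch of the event-scheduling mechanism for the one-teller bank:
# a future event list ordered by event time drives the simulated clock.
import heapq
import random

def event_driven_bank(num_customers=20, seed=42):
    rng = random.Random(seed)
    fel = [(0, "arrival")]            # future event list: (time, event type)
    clock, arrivals, served = 0, 0, 0
    queue_length, teller_busy = 0, False

    while fel and served < num_customers:
        clock, event = heapq.heappop(fel)   # advance to the next event time

        if event == "arrival":
            arrivals += 1
            if arrivals < num_customers:    # schedule the next arrival
                heapq.heappush(fel, (clock + rng.randint(1, 10), "arrival"))
            if teller_busy:
                queue_length += 1           # join the queue: a delay begins
            else:                           # capture the resource; schedule its end
                teller_busy = True
                heapq.heappush(fel, (clock + rng.randint(1, 6), "departure"))

        elif event == "departure":
            served += 1
            if queue_length > 0:            # next entity captures the teller
                queue_length -= 1
                heapq.heappush(fel, (clock + rng.randint(1, 6), "departure"))
            else:
                teller_busy = False         # resource becomes idle

    return clock, served

print(event_driven_bank())   # (final clock time, customers served)
```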
IV. ADVANTAGES AND DISADVANTAGES OF SIMULATION
Competition in the computer industry has led to technological breakthroughs that are allowing hardware companies to continually produce better products. It
seems that every week another company announces its latest release, each with more options, memory, graphics capability, and power. What is unique about new developments in the computer industry is that they often act as a springboard for other related industries to follow. One industry in particular is the simulation-software industry. As computer hardware becomes more powerful, more accurate, faster, and easier to use, simulation software does too. The number of businesses using simulation is rapidly increasing. Many managers are realizing the benefits of utilizing simulation for more than just the one-time remodeling of a facility. Rather, due to advances in software, managers are incorporating simulation in their daily operations on an increasingly regular basis.
A. Advantages For most companies, the benefits of using simulation go beyond just providing a look into the future. These benefits are mentioned by many authors and are included in the following: • Choose correctly. Simulation lets you test every aspect of a proposed change or addition without committing resources to their acquisition. This is critical, because once the hard decisions have been made, the bricks have been laid, or the materialhandling systems have been installed, changes and corrections can be extremely expensive. Simulation allows you to test your designs without committing resources to acquisition. • Time compression and expansion. By compressing or expanding time simulation allows you to speed up or slow down phenomena so that you can thoroughly investigate them. You can examine an entire shift in a matter of minutes if you desire, or you can spend two hours examining all the events that occurred during one minute of simulated activity. • Understand “Why?” Managers often want to know why certain phenomena occur in a real system. With simulation, you determine the answer to the “why” questions by reconstructing the scene and taking a microscopic examination of the system to determine why the phenomenon occurs. You cannot accomplish this with a real system because you cannot see or control it in its entirety. • Explore possibilities. One of the greatest advantages of using simulation software is that once
667 you have developed a valid simulation model, you can explore new policies, operating procedures, or methods without the expense and disruption of experimenting with the real system. Modifications are incorporated in the model, and you observe the effects of those changes on the computer rather than the real system. • Diagnose problems. The modern factory floor or service organization is very complex—so complex that it is impossible to consider all the interactions taking place in one given moment. Simulation allows you to better understand the interactions among the variables that make up such complex systems. Diagnosing problems and gaining insight into the importance of these variables increases your understanding of their important effects on the performance of the overall system. The last three claims can be made for virtually all modeling activities, queueing, linear programming, etc. However, with simulation the models can become very complex and, thus, have a higher fidelity, i.e., they are valid representations of reality. • Identify constraints. Production bottlenecks give manufacturers headaches. It is easy to forget that bottlenecks are an effect rather than a cause. However, by using simulation to perform bottleneck analysis, you can discover the cause of the delays in work-in-process, information, materials, or other processes. • Develop understanding. Many people operate with the philosophy that talking loudly, using computerized layouts, and writing complex reports convinces others that a manufacturing or service system design is valid. In many cases these designs are based on someone’s thoughts about the way the system operates rather than on analysis. Simulation studies aid in providing understanding about how a system really operates rather than indicating an individual’s predictions about how a system will operate. • Visualize the plan. Taking your designs beyond CAD drawings by using the animation features offered by many simulation packages allows you to see your facility or organization actually running. Depending on the software used, you may be able to view your operations from various angles and levels of magnification, even 3-D. This allows you to detect design flaws that appear credible when seen just on paper in a 2-D CAD drawing. • Build consensus. Using simulation to present design changes creates an objective opinion. You
668 avoid having inferences made when you approve or disapprove of designs because you simply select the designs and modifications that provided the most desirable results, whether it be increasing production or reducing the waiting time for service. In addition, it is much easier to accept reliable simulation results, which have been modeled, tested, validated, and visually represented, instead of one person’s opinion of the results that will occur from a proposed design. • Prepare for change. We all know that the future will bring change. Answering all of the “what-if ” questions is useful for both designing new systems and redesigning existing systems. Interacting with all those involved in a project during the problemformulation stage gives you an idea of the scenarios that are of interest. Then you construct the model so that it answers questions pertaining to those scenarios. What if an automated guided vehicle (AGV) is removed from service for an extended period of time? What if demand for service increases by 10%? What if . . . ? The options are unlimited. • Wise investment. The typical cost of a simulation study is substantially less than 1% of the total amount being expended for the implementation of a design or redesign. Since the cost of a change or modification to a system after installation is so great, simulation is a wise investment. • Train the team. Simulation models can provide excellent training when designed for that purpose. Used in this manner, the team provides decision inputs to the simulation model as it progresses. The team, and individual members of the team, can learn by their mistakes, and learn to operate better. This is much less expensive and less disruptive than on-the-job learning. • Specify requirements. Simulation can be used to specify requirements for a system design. For example, the specifications for a particular type of machine in a complex system to achieve a desired goal may be unknown. By simulating different capabilities for the machine, the requirements can be established.
B. Disadvantages The disadvantages of simulation include the following: • Model building requires special training. It is an art that is learned over time and through experience. Furthermore, if two models of the same system are constructed by two competent
individuals, they may have similarities, but it is highly unlikely that they will be the same.
• Simulation results may be difficult to interpret. Since most simulation outputs are essentially random variables (they are usually based on random inputs), it may be hard to determine whether an observation is a result of system interrelationships or randomness.
• Simulation modeling and analysis can be time-consuming and expensive. Skimping on resources for modeling and analysis may result in a simulation model and/or analysis that is not sufficient for the task.
• Simulation may be used inappropriately. Simulation is used in some cases when an analytical solution is possible, or even preferable. This is particularly true in the simulation of some waiting lines where closed-form queueing models are available, at least for long-run evaluation.
C. Offsetting the Disadvantages In defense of simulation, these four disadvantages, respectively, can be offset as follows: • Simulators. Vendors of simulation software have been actively developing packages that contain models that only need input data for their operation. Such models have the generic tag “simulators” or templates. • Output analysis. Most simulation-software vendors have developed output-analysis capabilities within their packages for performing very extensive analysis. This reduces the computational requirements on the part of the user, although they still must understand the analysis procedure. • Faster and faster. Simulation can be performed faster today than yesterday, and even faster tomorrow. This is attributable to the advances in hardware that permit rapid running of scenarios. It is also attributable to the advances in many simulation packages. For example, many simulation software products contain constructs for modeling material handling using transporters such as conveyors, and automated guided vehicles. • Limitations of closed-form models. Closed-form models are not able to analyze most of the complex systems that are encountered in practice. In nearly fourteen years of the author’s consulting practice and current employment with a simulation software and consulting vendor, not one problem was encountered that could have been solved by a closed-form solution.
V. STEPS IN A SIMULATION STUDY Figure 1 shows a set of steps to guide a model builder in a thorough and sound simulation study.
1. Problem Formulation Every simulation study begins with a statement of the problem. If the statement is provided by those that have the problem (client), the simulation analyst must take extreme care to insure that the problem is clearly understood. If a problem statement is prepared by the simulation analyst, it is important that the client understand and agree with the formulation. It is suggested that a set of assumptions be prepared by the simulation analyst and agreed to by the client. Even with all of these precautions, it is possible that the problem will need to be reformulated as the simulation study progresses.
2. Setting of Objectives and Overall Project Plan Another way to state this step is “prepare a proposal.” This step should be accomplished regardless of location of the analyst and client, viz., as an external or internal consultant. The objectives indicate the questions that are to be answered by the simulation study. The project plan should include a statement of the various scenarios that will be investigated. The plans for the study should be indicated in terms of time that will be required, personnel that will be used, hardware and software requirements if the client wants to run the model and conduct the analysis, stages in the investigation, output at each stage, cost of the study and billing procedures, if any.
3. Model Conceptualization The real-world system under investigation is abstracted by a conceptual model, a series of mathematical and logical relationships concerning the components and the structure of the system. It is recommended that modeling begin simply and that the model grow until a model of appropriate complexity has been developed. For example, consider the model of a manufacturing and material-handling system. The basic model with the arrivals, queues, and servers is constructed. Then, add the failures and shift schedules. Next, add the material-handling capabilities. Finally, add the special features. Constructing an unduly complex model will add to the cost of the study and the time for its completion without increasing the quality of the output. Maintaining client involvement will en-
Figure 1 Steps in a simulation study. [Reprinted with permission from Banks, J., Carson, J. S., Nelson, B. L., and Nicol, D. M. (2000). Discrete event system simulation, 3rd ed. Englewood Cliffs, NJ: Prentice Hall.]
hance the quality of the resulting model and increase the client’s confidence in its use.
4. Data Collection Shortly after the proposal is “accepted” a schedule of data requirements should be submitted to the client. In
the best of circumstances, the client has been collecting the kind of data needed in the format required, and can submit these data to the simulation analyst in electronic format. Oftentimes, the client indicates that the required data are indeed available. However, when the data are delivered they are found to be quite different than anticipated. For example, in the simulation of an airline reservation system, the simulation analyst was told "we have every bit of data that you want over the last five years." When the study commenced, the data delivered were the average "talk time" of the reservationist for each of the years. Individual values were needed, not summary measures. Model building and data collection are shown as contemporaneous in Fig. 1. This is to indicate that the simulation analyst can readily construct the model while the data collection is progressing.
5. Model Translation The conceptual model constructed in Step 3 is coded into a computer recognizable form, an operational model.
6. Verified? Verification concerns the operational model. Is it performing properly? Even with small textbook-sized models, it is quite possible that they have verification difficulties. These models are orders of magnitude smaller than real models (say 50 lines of computer code versus 2000 lines of computer code). It is highly advisable that verification take place as a continuing process. It is ill advised for the simulation analyst to wait until the entire model is complete to begin the verification process. Also, use of an interactive run controller, or debugger, is highly encouraged as an aid to the verification process.
7. Validated? Validation is the determination that the conceptual model is an accurate representation of the real system. Can the model be substituted for the real system for the purposes of experimentation? If there is an existing system, call it the base system, then an ideal way to validate the model is to compare its output to that of the base system. Unfortunately, there is not always a base system. There are many methods for performing validation.
8. Experimental Design For each scenario that is to be simulated, decisions need to be made concerning the length of the simulation run, the number of runs (also called replications), and the manner of initialization, as required.

9. Production Runs and Analysis Production runs, and their subsequent analysis, are used to estimate measures of performance for the scenarios that are being simulated.
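Step 8 above calls for choosing the number of replications, and a common way to judge whether more runs are needed is to compute a confidence interval for a performance measure across independent replications. The sketch below is illustrative only: it assumes the simulate_bank function from the Section II sketch and uses a fixed normal critical value for brevity rather than the more appropriate t value for a small number of replications.

```python
# Rough check of whether more replications are needed: estimate a 95%
# confidence interval for average time in system across independent runs.
# Assumes the simulate_bank sketch shown earlier in this article.
import statistics

def confidence_interval(num_replications=10):
    samples = [simulate_bank(seed=r)["average time in system"]
               for r in range(num_replications)]
    mean = statistics.mean(samples)
    half_width = 1.96 * statistics.stdev(samples) / (num_replications ** 0.5)
    return mean, mean - half_width, mean + half_width

mean, low, high = confidence_interval()
print(f"mean {mean:.2f}, 95% CI roughly ({low:.2f}, {high:.2f})")
```

If the resulting interval is too wide for the decision at hand, additional replications are run, which is precisely the question posed in Step 10.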
10. More Runs? Based on the analysis of runs that have been completed, the simulation analyst determines if additional runs are needed and if any additional scenarios need to be simulated.
11. Documentation and Reporting Documentation is necessary for numerous reasons. If the simulation model is going to be used again by the same or different analysts, it may be necessary to understand how the simulation model operates. This will enable confidence in the simulation model so that the client can make decisions based on the analysis. Also, if the model is to be modified, this can be greatly facilitated by adequate documentation. The result of all the analysis should be reported clearly and concisely. This will enable the client to review the final formulation, the alternatives that were addressed, the criterion by which the alternative systems were compared, the results of the experiments, and analyst recommendations, if any.
12. Implementation The simulation analyst acts as a reporter rather than an advocate. The report prepared in Step 11 stands on its merits, and is just additional information that the client uses to make a decision. If the client has been involved throughout the study period, and the simulation analyst has followed all of the steps rigorously, then the likelihood of a successful implementation is increased.
SEE ALSO THE FOLLOWING ARTICLES Continuous System Simulation • Model Building Process • Optimization Models • Simulation Languages • Software Process Simulation
BIBLIOGRAPHY Banks, J., ed. (1998). Handbook of simulation: Principles, methodology, advances, applications, and practice. New York: John Wiley.
Banks, J., Carson, J. S., Nelson, B. L., and Nicol, D. M. (2000). Discrete event system simulation, 3rd ed. Upper Saddle River, NJ: Prentice Hall. Banks, J., and Norman, V. (November 1995). Justifying simulation in today's manufacturing environment. IIE Solutions. Carson, J. S. (1993). Modeling and simulation world views, in Proceedings of the 1993 Winter Simulation Conference (G. W. Evans, M. Mollaghasemi, E. C. Russell, and W. E. Biles, eds.),
pp. 18–23, 1–4, Piscataway, NJ: Institute of Electrical and Electronics Engineers. Law, A. M., and Kelton, W. D. (2000). Simulation modeling and analysis, 3rd ed. New York: McGraw-Hill. Pegden, C. D., Shannon, R. E., and Sadowski, R. P. (1995). Introduction to simulation using SIMAN, 2nd ed. New York: McGraw-Hill. Schriber, T. J. (1991). An introduction to simulation using GPSS/H. New York: John Wiley.
Distributed Databases M. Tamer Özsu University of Waterloo
I. INTRODUCTION
II. DATA DISTRIBUTION ALTERNATIVES
III. ARCHITECTURAL ALTERNATIVES
IV. OVERVIEW OF TECHNICAL ISSUES
V. DISTRIBUTED QUERY OPTIMIZATION
VI. DISTRIBUTED CONCURRENCY CONTROL
VII. DISTRIBUTED RELIABILITY PROTOCOLS
VIII. REPLICATION PROTOCOLS
GLOSSARY
atomicity The property of transaction processing whereby either all the operations of a transaction are executed or none of them are (all-or-nothing).
client/server architecture A distributed/parallel DBMS architecture where a set of client machines with limited functionality access a set of servers that manage data.
concurrency control algorithm An algorithm that synchronizes the operations of concurrent transactions that execute on a shared database.
deadlock An occurrence where each transaction in a set of transactions circularly waits on locks that are held by other transactions in the set.
distributed database management system A database management system that manages a database that is distributed across the nodes of a computer network and makes this distribution transparent to the users.
durability The property of transaction processing whereby the effects of successfully completed (i.e., committed) transactions endure subsequent failures.
isolation The property of transaction execution which states that the effects of one transaction on the database are isolated from other transactions until the first completes its execution.
locking A method of concurrency control where locks are placed on database units (e.g., pages) on behalf of transactions that attempt to access them.
logging protocol The protocol that records, in a separate location, the changes that a transaction makes to the database before the change is actually made.
one copy equivalence Replica control policy that asserts that the values of all copies of a logical data item should be identical when the transaction that updates that item terminates.
query optimization The process by which the "best" execution strategy for a given query is found from among a set of alternatives.
query processing The process by which a declarative query is translated into low-level data manipulation operations.
quorum-based voting algorithm A replica control protocol where transactions collect votes to read and write copies of data items. They are permitted to read or write data items if they can collect a quorum of votes.
read-once/write-all protocol (ROWA) The replica control protocol which maps each logical read operation to a read on one of the physical copies and maps a logical write operation to a write on all of the physical copies.
serializability The concurrency control correctness criterion that requires that the concurrent execution of a set of transactions be equivalent to the effect of some serial execution of those transactions.
termination protocol A protocol by which individual sites can decide how to terminate a particular transaction when they cannot communicate with other sites where the transaction executes.
transaction A unit of consistent and atomic execution against the database.
transparency Extension of data independence to distributed systems by hiding the distribution, fragmentation, and replication of data from the users.
two-phase commit (2PC) An atomic commitment protocol that ensures that a transaction is terminated the same way at every site where it executes. The name comes from the fact that two rounds of messages are exchanged during this process.
two-phase locking A locking algorithm where transactions are not allowed to request new locks once they release a previously held lock.
I. INTRODUCTION The maturation of database management system (DBMS) technology has coincided with significant developments in computer network and distributed computing technologies. The end result is the emergence of distributed DBMS. These systems have started to become the dominant data management tools for highly data-intensive applications. Many DBMS vendors have incorporated some degree of distribution into their products. A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network. A distributed database management system (distributed DBMS) is the software system that permits the management of the distributed database and makes the distribution transparent to the users. The term “distributed database system” (DDBS) is typically used to refer to the combination of DDB and the distributed DBMS. These definitions point to two identifying architectural principles. The first is that the system consists of a (possibly empty) set of query sites and a nonempty set of data sites. The data sites have data storage capability while the query sites do not. The latter only run the user interface routines in order to facilitate the data access at data sites. The second is that each site (query or data) is assumed to logically consist of a single, independent computer. Therefore, each site has its own primary and secondary storage, runs its own operating system (which may be the same or different at different sites), and has the capability to execute applications on its own. A computer network, rather than a multiprocessor configuration, interconnects the sites. The important point here is the emphasis on loose interconnection between processors that have their own operating systems and operate independently.
II. DATA DISTRIBUTION ALTERNATIVES A distributed database is physically distributed across the data sites by fragmenting and replicating the data.
Given a relational database schema, fragmentation subdivides each relation into horizontal or vertical partitions. Horizontal fragmentation of a relation is accomplished by a selection operation that places each tuple of the relation in a different partition based on a fragmentation predicate (e.g., an Employee relation may be fragmented according to the location of the employees). Vertical fragmentation divides a relation into a number of fragments by projecting over its attributes (e.g., the Employee relation may be fragmented such that the Emp_number, Emp_name, and Address information is in one fragment, and Emp_number, Salary, and Manager information is in another fragment). Fragmentation is desirable because it enables the placement of data in close proximity to its place of use, thus potentially reducing transmission cost, and it reduces the size of relations that are involved in user queries. Based on the user access patterns, each of the fragments may also be replicated. This is preferable when the same data are accessed from applications that run at a number of sites. In this case, it may be more cost-effective to duplicate the data at a number of sites rather than continuously moving it between them. Figure 1 depicts a data distribution where Employee, Project, and Assignment relations are fragmented, replicated, and distributed across multiple sites of a distributed database.
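As a rough illustration (mine, not from the article), the Python sketch below shows how horizontal fragmentation by selection and vertical fragmentation by projection might be expressed, together with the localization programs (union and join) that reconstruct the global relation; the relation, attribute, and fragment names are hypothetical.

```python
# Hypothetical Employee tuples; attribute names are illustrative only.
employees = [
    {"emp_number": 1, "emp_name": "Ane", "address": "Paris",  "salary": 50, "manager": 9, "location": "Paris"},
    {"emp_number": 2, "emp_name": "Bob", "address": "Boston", "salary": 60, "manager": 9, "location": "Boston"},
]

# Horizontal fragmentation: a selection predicate assigns each tuple to a fragment.
def horizontal_fragment(relation, predicate):
    return [t for t in relation if predicate(t)]

paris_emps  = horizontal_fragment(employees, lambda t: t["location"] == "Paris")
boston_emps = horizontal_fragment(employees, lambda t: t["location"] == "Boston")

# Vertical fragmentation: projections over attribute subsets; the key (emp_number)
# is kept in every fragment so the relation can be reconstructed by a join.
def vertical_fragment(relation, attributes):
    return [{a: t[a] for a in attributes} for t in relation]

emp_personal = vertical_fragment(employees, ["emp_number", "emp_name", "address"])
emp_payroll  = vertical_fragment(employees, ["emp_number", "salary", "manager"])

# Localization programs: union of horizontal fragments, join of vertical fragments.
reconstructed_h = paris_emps + boston_emps
reconstructed_v = [
    {**p, **q}
    for p in emp_personal
    for q in emp_payroll
    if p["emp_number"] == q["emp_number"]
]
```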
III. ARCHITECTURAL ALTERNATIVES There are many possible alternatives for architecting a distributed DBMS. The simplest is the client/server architecture, where a number of client machines access a single database server. The simplest client/server systems involve a single server that is accessed by a number of clients (these can be called multiple-client/single-server). In this case, the database management problems are considerably simplified since the database is stored on a single server. The pertinent issues relate to the management of client buffers and the caching of data and (possibly) locks. The data management is done centrally at the single server. A more distributed, and more flexible, architecture is the multiple-client/multiple-server architecture where the database is distributed across multiple servers that have to communicate with each other in responding to user queries and in executing transactions. Each client machine has a "home" server to which it directs user requests. The communication of the servers among themselves is transparent to the users. Most current DBMSs implement one or the other type of
Figure 1 A fragmented, replicated, and distributed database example (Boston, Paris, Toronto, and San Francisco sites connected by a communication network; each site stores fragments such as Boston employees, Paris employees, Boston projects, Paris projects, Toronto employees, Toronto projects, San Francisco employees, and San Francisco projects, with some fragments replicated at more than one site).
the client/server architectures. A truly distributed DBMS does not distinguish between client and server machines. Ideally, each site can perform the functionality of a client and a server. Such architectures, called peer-to-peer, require sophisticated protocols to manage the data that are distributed across multiple sites. The complexity of the required software has delayed the offering of peer-to-peer distributed DBMS products. If the databases at various sites are autonomous and (possibly) exhibit some form of heterogeneity, they are usually referred to as multidatabase systems or federated database systems. If the data and DBMS functionality distribution is accomplished on a multiprocessor computer, it is referred to as a parallel database system. Both differ from a DDBS, in which the logical integration among distributed data is tighter than in multidatabase or federated database systems, while the physical control is looser than in parallel DBMSs. In this article, we do not consider multidatabase systems or parallel database systems.
IV. OVERVIEW OF TECHNICAL ISSUES A distributed DBMS has to provide the same functionality that its centralized counterparts provide, such as support for declarative user queries and their optimization, transactional access to the database involving concurrency control and reliability, enforcement of integrity constraints, and others. In the remaining sections we discuss some of these functions; in this section we provide a brief overview. Query processing deals with designing algorithms that analyze queries and convert them into a series of data
manipulation operations. Besides the methodological issues, an important aspect of query processing is query optimization. The problem is how to decide on a strategy for executing each query over the network in the most cost-effective way, however cost is defined. The factors to be considered are the distribution of data, communication costs, and the lack of sufficient locally available information. The objective is to exploit the inherent parallelism of the distributed system to improve the performance of executing the query, subject to the above-mentioned constraints. The problem is NP-hard in nature, and the approaches are usually heuristic. User accesses to shared databases are formulated as transactions, which are units of execution that satisfy four properties: atomicity, consistency, isolation, and durability, jointly known as the ACID properties. Atomicity means that a transaction is an atomic unit and either the effects of all of its actions are reflected in the database, or none of them are. Consistency generally refers to the correctness of the individual transactions; i.e., a transaction does not violate any of the integrity constraints that have been defined over the database. Isolation addresses the concurrent execution of transactions and specifies that actions of concurrent transactions do not impact each other. Finally, durability concerns the persistence of database changes in the face of failures. The ACID properties are enforced by means of concurrency control algorithms and reliability protocols. Concurrency control involves the synchronization of accesses to the distributed database, such that the integrity of the database is maintained. The concurrency control problem in a distributed context is somewhat
different than in a centralized framework. One not only has to worry about the integrity of a single database, but also about the consistency of multiple copies of the database. The condition that requires all the values of multiple copies of every data item to converge to the same value is called mutual consistency. Reliability protocols deal with the termination of transactions, in particular, their behavior in the face of failures. In addition to the typical failure types (i.e., transaction failures and system failures), distributed DBMSs have to account for communication (network) failures as well. The implication of communication failures is that, when a failure occurs and various sites become either inoperable or inaccessible, the databases at the operational sites remain consistent and up to date. This complicates the picture, as the actions of these sites have to be eventually reconciled with those of failed ones. Therefore, recovery protocols coordinate the termination of transactions so that they terminate uniformly (i.e., they either abort or they commit) at all the sites where they execute. Furthermore, when the computer system or network recovers from the failure, the distributed DBMS should be able to recover and bring the databases at the failed sites up to date. This may be especially difficult in the case of network partitioning, where the sites are divided into two or more groups with no communication among them. Distributed databases are typically replicated; that is, a number of the data items reside at more than one site. Replication improves performance (since data access can be localized) and availability (since the failure of a site does not make a data item inaccessible). However, management of replicated data requires that the values of multiple copies of a data item are the same. This is called the one copy equivalence property. Distributed DBMSs that allow replicated data implement replication protocols to enforce one copy equivalence.
V. DISTRIBUTED QUERY OPTIMIZATION Query processing is the process by which a declarative query is translated into low-level data manipulation operations. SQL is the standard query language that is supported in current DBMSs. Query optimization refers to the process by which the "best" execution strategy for a given query is found from among a set of alternatives. In distributed DBMSs, the process typically involves four steps (Fig. 2): (1) query decomposition; (2) data localization; (3) global optimization; and (4) local optimization. Query decomposition takes an SQL query and translates it into one expressed in relational algebra.
Figure 2 Distributed query processing methodology (a calculus query on distributed relations passes through query decomposition using the global schema, data localization using the fragment schema, and global optimization using statistics on fragments at the control site, producing an optimized fragment query with communication operations; local optimization using the local schema at the local sites then produces the optimized local queries).
In the process, the query is analyzed semantically so that incorrect queries are detected and rejected as early as possible, and correct queries are simplified. Simplification involves the elimination of redundant predicates that may be introduced as a result of query modification to deal with views, security enforcement, and semantic integrity control. The simplified query is then restructured as an algebraic query. The initial algebraic query generated by the query decomposition step is input to the second step: data localization. The initial algebraic query is specified on global relations irrespective of their fragmentation or distribution. The main role of data localization is to localize the query's data using data distribution information. In this step, the fragments that are involved in the query are determined and the query is transformed into one that operates on fragments rather than global relations. As indicated earlier, fragmentation is defined through fragmentation rules that can be expressed as relational operations (horizontal fragmentation by selection, vertical fragmentation by projection). A distributed relation can be reconstructed by applying the inverse of the fragmentation rules. This is called a localization program. The localization program for a horizontally (vertically) fragmented query is the union
Distributed Databases (join) of the fragments. Thus, during the data localization step each global relation is first replaced by its localization program, and then the resulting fragment query is simplified and restructured to produce an equivalent query that only involves fragments that contribute to the query result. Simplification and restructuring may be done according to the same rules used in the decomposition step. As in the decomposition step, the final fragment query is generally far from optimal; the process has only eliminated those queries whose performance is likely to be worse (due to their involvement of unnecessary fragments). For a given SQL query, there is more than one possible algebraic query. Some of these algebraic queries are “better” than others. The quality of an algebraic query is defined in terms of expected performance. The process of query optimization involves taking the initial algebraic query and, using algebraic transformation rules, transforming it into other algebraic queries until the “best” one is found. The “best” algebraic query is determined according to a cost function that calculates the cost of executing the query according to that algebraic specification. In a distributed setting, the process involves global optimization to handle operations that involve data from multiple sites (e.g., join) followed by local optimization for further optimizing operations that will be performed at a given site. The input to the third step, global optimization, is a fragment query, that is, an algebraic query on fragments. The goal of query optimization is to find an execution strategy for the query that is close to optimal. Remember that finding the optimal solution is computationally intractable. An execution strategy for a distributed query can be described with relational algebra operations and communication primitives (send/receive operations) for transferring data between sites. The previous layers have already optimized the query; for example, by eliminating redundant expressions. However, this optimization is independent of fragment characteristics such as cardinalities. In addition, communication operations are not yet specified. By permuting the order of operations within one fragment query, many equivalent query execution plans may be found. Query optimization consists of finding the “best” one among candidate plans examined by the optimizer.* The final step, local optimization, takes a part of the global query (called a subquery) that will run at a particular site and optimizes it further. This step is very similar to query optimization in centralized
DBMSs. Thus, it is at this stage that local information about data storage, such as indexes, is used to determine the best execution strategy for that subquery. The query optimizer is usually modeled as consisting of three components: a search space, a cost model, and a search strategy. The search space is the set of alternative execution plans that represent the input query. These plans are equivalent, in the sense that they yield the same result, but they differ in the execution order of operations and the way these operations are implemented. The cost model predicts the cost of a given execution plan. To be accurate, the cost model must have accurate knowledge about the parallel execution environment. The search strategy explores the search space and selects the best plan. It defines which plans are examined and in which order. In a distributed environment, the cost function, often defined in terms of time units, refers to computing resources such as disk space, disk I/Os, buffer space, CPU cost, communication cost, etc. Generally, it is a weighted combination of I/O, CPU, and communication costs. Nevertheless, a typical simplification made by distributed DBMSs is to consider communication cost as the most significant factor. This is valid for wide area networks, where the limited bandwidth makes communication much more costly than local processing. To select the ordering of operations it is necessary to predict execution costs of alternative candidate orderings. Determining execution costs before query execution (i.e., static optimization) is based on fragment statistics and on formulas for estimating the cardinalities of the results of relational operations. Thus the optimization decisions depend on the available statistics on fragments. An important aspect of query optimization is join ordering, since permutations of the joins within the query may lead to improvements of several orders of magnitude. One basic technique for optimizing a sequence of distributed join operations is through use of the semijoin operator. The main value of the semijoin in a distributed system is to reduce the size of the join operands and thus the communication cost. However, more recent techniques, which consider local processing costs as well as communication costs, do not use semijoins because they might increase local processing costs. The output of the query optimization layer is an optimized algebraic query with communication operations included on fragments.
*The difference between an optimal plan and the best plan is that the optimizer does not, because of computational intractability, examine all of the possible plans.
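As an illustration only (the article gives no concrete cost formula), the sketch below shows the kind of weighted cost function a distributed optimizer might use to compare candidate execution plans; the plan names, resource figures, and weights are made-up values chosen so that communication dominates, as in a wide area network.

```python
# Hypothetical per-plan resource estimates; units and weights are illustrative only.
candidate_plans = {
    "ship_whole_relation": {"io": 900, "cpu": 150, "msgs": 4,  "bytes_sent": 5_000_000},
    "semijoin_reduction":  {"io": 400, "cpu": 220, "msgs": 10, "bytes_sent":   800_000},
}

# Weighted combination of I/O, CPU, and communication costs; W_MSG and W_BYTE
# dominate to reflect the common simplification that communication matters most.
W_IO, W_CPU, W_MSG, W_BYTE = 1.0, 0.2, 5.0, 0.0001

def plan_cost(p):
    communication = W_MSG * p["msgs"] + W_BYTE * p["bytes_sent"]
    return W_IO * p["io"] + W_CPU * p["cpu"] + communication

best_plan = min(candidate_plans, key=lambda name: plan_cost(candidate_plans[name]))
print(best_plan, {name: round(plan_cost(p), 1) for name, p in candidate_plans.items()})
```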
VI. DISTRIBUTED CONCURRENCY CONTROL
Whenever multiple users access (read and write) a shared database, these accesses need to be synchronized to ensure database consistency. The synchronization is
achieved by means of concurrency control algorithms that enforce a correctness criterion such as serializability. User accesses are encapsulated as transactions, whose operations at the lowest level are a set of read and write operations to the database. Concurrency control algorithms enforce the isolation property of transaction execution, which states that the effects of one transaction on the database are isolated from other transactions until the first completes its execution. The most popular concurrency control algorithms are locking-based. In such schemes, a lock, in either shared or exclusive mode, is placed on some unit of storage (usually a page) whenever a transaction attempts to access it. These locks can be of two types: shared, indicating that more than one transaction may access the data unit, and exclusive, indicating that the transaction needs to be the only one accessing the data unit. Shared locks are also called read locks, since two transactions can read the same data unit, while exclusive locks are also called write locks, indicating that two transactions cannot revise the values of the data unit concurrently. The locks are placed according to lock compatibility rules such that read-write, write-read, and write-write conflicts are avoided. The compatibility rules are the following:
1. If transaction T1 holds a shared lock on data unit D1, transaction T2 can also obtain a shared lock on D1 (no conflict).
2. If transaction T1 holds a shared lock on data unit D1, transaction T2 cannot obtain an exclusive lock on D1 (read-write conflict).
3. If transaction T1 holds an exclusive lock on data unit D1, transaction T2 cannot obtain a shared lock (write-read conflict) or an exclusive lock (write-write conflict) on D1.
It is a well-known theorem that if lock actions on behalf of concurrent transactions obey a simple rule, then it is possible to ensure the serializability of these transactions: "No lock on behalf of a transaction should be set once a lock previously held by the transaction is released." This is known as two-phase locking, since transactions go through a growing phase when they obtain locks and a shrinking phase when they release locks. In general, releasing locks prior to the end of a transaction is problematic. Thus, most locking-based concurrency control algorithms are strict in that they hold on to their locks until the end of the transaction.
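A minimal sketch (not from the article) of the compatibility rules and the two-phase rule just described; the class names, the page-level lock table, and the error handling are simplified assumptions, and lock upgrades and waiting are omitted.

```python
# Lock modes and the compatibility rules listed above:
# shared/shared is allowed; any combination involving an exclusive lock conflicts.
SHARED, EXCLUSIVE = "S", "X"

def compatible(held_mode, requested_mode):
    return held_mode == SHARED and requested_mode == SHARED

# Two-phase locking: once a transaction releases any lock (shrinking phase),
# it may not acquire new ones (its growing phase is over).
class Transaction:
    def __init__(self, tid):
        self.tid = tid
        self.shrinking = False
        self.locks = {}                      # data unit -> mode

    def acquire(self, unit, mode, lock_table):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after a release")
        for other_tid, other_mode in lock_table.get(unit, []):
            if other_tid != self.tid and not compatible(other_mode, mode):
                return False                 # conflict: a real DBMS would make the transaction wait
        lock_table.setdefault(unit, []).append((self.tid, mode))
        self.locks[unit] = mode
        return True

    def release(self, unit, lock_table):
        self.shrinking = True
        lock_table[unit] = [(t, m) for t, m in lock_table[unit] if t != self.tid]
        del self.locks[unit]

table = {}
t1, t2 = Transaction(1), Transaction(2)
print(t1.acquire("page_7", SHARED, table))      # True
print(t2.acquire("page_7", SHARED, table))      # True: shared locks are compatible
print(t2.acquire("page_7", EXCLUSIVE, table))   # False: read-write conflict with t1
```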
In distributed DBMSs, the challenge is to extend both the serializability argument and the concurrency control algorithms to the distributed execution environment. In these systems, the operations of a given transaction may execute at multiple sites where they access data. In such a case, the serializability argument is more difficult to specify and enforce. The complication is due to the fact that the serialization order of the same set of transactions may be different at different sites. Therefore, the execution of a set of distributed transactions is serializable if and only if the execution of the set of transactions at each site is serializable, and the serialization orders of these transactions at all these sites are identical. Distributed concurrency control algorithms enforce this notion of global serializability. In locking-based algorithms there are three alternative ways of enforcing global serializability: centralized locking, primary copy locking, and distributed locking. In centralized locking, there is a single lock table for the entire distributed database. This lock table is placed, at one of the sites, under the control of a single lock manager. The lock manager is responsible for setting and releasing locks on behalf of transactions. Since all locks are managed at one site, this is similar to centralized concurrency control and it is straightforward to enforce the global serializability rule. These algorithms are simple to implement, but suffer from two problems. The central site may become a bottleneck, both because of the amount of work it is expected to perform and because of the traffic that is generated around it; and the system may be less reliable since the failure or inaccessibility of the central site would cause system unavailability. Primary copy locking is a concurrency control algorithm that is useful in replicated databases where there may be multiple copies of a data item stored at different sites. One of the copies is designated as a primary copy, and it is this copy that has to be locked in order to access that item. All the sites know the set of primary copies for each data item in the distributed system, and the lock requests on behalf of transactions are directed to the appropriate primary copy. If the distributed database is not replicated, primary copy locking degenerates into a distributed locking algorithm. In distributed (or decentralized) locking, the lock management duty is shared by all the sites in the system. The execution of a transaction involves the participation and coordination of lock managers at more than one site. Locks are obtained at each site where the transaction accesses a data item. Distributed locking algorithms do not have the overhead of centralized locking ones. However, both the communication overhead to obtain all the locks and the complexity of the algorithm are greater. One side effect of all locking-based concurrency
control algorithms is that they cause deadlocks. The detection and management of deadlocks in a distributed system is difficult. Nevertheless, the relative simplicity and better performance of locking algorithms make them more popular than alternatives such as timestamp-based algorithms or optimistic concurrency control.
VII. DISTRIBUTED RELIABILITY PROTOCOLS Two properties of transactions are maintained by reliability protocols: atomicity and durability. Atomicity requires that either all the operations of a transaction are executed or none of them are (the all-or-nothing property). Thus, the set of operations contained in a transaction is treated as one atomic unit. Atomicity is maintained in the face of failures. Durability requires that the effects of successfully completed (i.e., committed) transactions endure subsequent failures. The underlying issue addressed by reliability protocols is how the DBMS can continue to function properly in the face of various types of failures. In a distributed DBMS, four types of failures are possible: transaction, site (system), media (disk), and communication failures. Transactions can fail for a number of reasons: due to an error in the transaction caused by input data, an error in the transaction code, or the detection of a present or potential deadlock. The usual approach in cases of transaction failure is to abort the transaction, resetting the database to its state prior to the start of the transaction. Site (or system) failures are due to a hardware failure (e.g., processor, main memory, power supply) or a software failure (bugs in system code). The effect of system failures is the loss of main memory contents. Therefore, any updates to the parts of the database that are in the main memory buffers (also called the volatile database) are lost as a result of system failures. However, the database that is stored in secondary storage (also called the stable database) is safe and correct. To achieve this, DBMSs typically employ logging protocols, such as Write-Ahead Logging, which record changes to the database in system logs and move these log records and the volatile database pages to stable storage at appropriate times. From the perspective of distributed transaction execution, site failures are important since the failed sites cannot participate in the execution of any transaction. Media failures refer to the failure of secondary storage devices that store the stable database. Typically, these failures are addressed by introducing redundancy of storage devices and maintaining archival copies of the database. Media failures are frequently
treated as problems local to one site and therefore are not specifically addressed in the reliability mechanisms of distributed DBMSs. The three types of failures described above are common to both centralized and distributed DBMSs. Communication failures, on the other hand, are unique to distributed systems. There are a number of types of communication failures. The most common ones are errors in the messages, improperly ordered messages, lost (or undelivered) messages, and line failures. Generally, the first two of these are considered to be the responsibility of the computer network protocols and are not addressed by the distributed DBMS. The last two, on the other hand, have an impact on the distributed DBMS protocols and, therefore, need to be considered in the design of these protocols. If one site is expecting a message from another site and this message never arrives, this may be because (1) the message is lost, (2) the line(s) connecting the two sites may be broken, or (3) the site that is supposed to send the message may have failed. Thus, it is not always possible to distinguish between site failures and communication failures. The waiting site simply times out and has to assume that the other site is incommunicado. Distributed DBMS protocols have to deal with this uncertainty. One drastic result of line failures may be network partitioning, in which the sites form groups where communication within each group is possible but communication across groups is not. This is difficult to deal with in the sense that it may not be possible to make the database available for access while at the same time guaranteeing its consistency. The enforcement of atomicity and durability requires the implementation of atomic commitment protocols and distributed recovery protocols. The most popular atomic commitment protocol is two-phase commit. The recoverability protocols are built on top of the local recovery protocols, which are dependent upon the supported mode of interaction (of the DBMS) with the operating system. Two-phase commit (2PC) is a very simple and elegant protocol that ensures the atomic commitment of distributed transactions. It extends the effects of local atomic commit actions to distributed transactions by insisting that all sites involved in the execution of a distributed transaction agree to commit the transaction before its effects are made permanent (i.e., all sites terminate the transaction in the same manner). If all the sites agree to commit a transaction then all the actions of the distributed transaction take effect; if one of the sites declines to commit the operations at that site, then all of the other sites are required to
abort the transaction. Thus, the fundamental 2PC rule states: if even one site declines to commit (i.e., votes to abort) the transaction, the distributed transaction has to be aborted at each site where it executes; if all the sites vote to commit the transaction, the distributed transaction is committed at each site where it executes. The simple execution of the 2PC protocol is as follows (Fig. 3). There is a coordinator process at the site where the distributed transaction originates, and participant processes at all the other sites where the transaction executes. Initially, the coordinator sends a "prepare" message to all the participants, each of which independently determines whether or not it can commit the transaction at that site. Those that can commit send back a "vote-commit" message, while those that are not able to commit send back a "vote-abort" message. Once a participant registers its vote, it cannot change it. The coordinator collects these messages and determines the fate of the transaction according to the 2PC rule. If the decision is to commit, the coordinator sends a "global-commit" message to all the participants; if the decision is to abort, it sends a "global-abort" message to those participants that had earlier voted to commit the transaction. No message needs to be sent to those participants that had originally voted to abort, since they can assume, according to the 2PC rule, that the transaction is eventually going to be globally aborted. This is known as the "unilateral abort" option of the participants.
Figure 3 2PC protocol actions (coordinator and participant state diagrams: the coordinator writes begin_commit in the log and sends PREPARE; each participant either writes abort in the log and replies VOTE-ABORT (unilateral abort) or writes ready in the log and replies VOTE-COMMIT; the coordinator then writes the global decision in the log and sends GLOBAL-COMMIT or GLOBAL-ABORT; the participants write commit or abort in their logs, acknowledge, and the coordinator writes end_of_transaction in the log).
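The vote-collection logic just described can be sketched as follows. This is my own toy, failure-free rendering, not the article's specification: the class and function names are hypothetical, and the force-written log records, timeouts, and termination protocol of a real implementation are reduced to comments.

```python
# A toy, failure-free rendering of the 2PC rule described above.
def two_phase_commit(participants):
    """participants: objects exposing prepare(), commit(), and abort()."""
    # Phase 1: the coordinator asks every participant to prepare and collects votes.
    votes = [p.prepare() for p in participants]    # True = "vote-commit", False = "vote-abort"

    decision = all(votes)                          # 2PC rule: any abort vote aborts the transaction

    # Phase 2: the coordinator broadcasts the global decision.
    for p, voted_commit in zip(participants, votes):
        if decision:
            p.commit()                             # "global-commit"
        elif voted_commit:
            p.abort()                              # "global-abort" only to commit-voters;
                                                   # abort-voters already aborted unilaterally
    return decision

class Participant:
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit = name, can_commit
    def prepare(self):   # a real participant force-writes a ready or abort log record here
        return self.can_commit
    def commit(self):    # write commit record, make changes permanent
        print(self.name, "commit")
    def abort(self):     # write abort record, undo changes
        print(self.name, "abort")

print(two_phase_commit([Participant("Boston"), Participant("Paris", can_commit=False)]))
```

Note that between sending its vote and receiving the global decision a participant can do nothing but wait, which is the blocking behavior discussed next.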
Distributed Databases to be eventually globally aborted. This is known as the “unilateral abort” option of the participants. There are two rounds of message exchanges between the coordinator and the participants; hence the name 2PC protocol. There are a number of variations of 2PC, such as the linear 2PC and distributed 2PC, that have not found much favor among distributed DBMS vendors. Two important variants of 2PC are the presumed abort 2PC and presumed commit 2PC. These are important because they reduce the message and I/O overhead of the protocols. Presumed abort protocol is included in the X/Open XA standard and has been adopted as part of the ISO standard for Open Distributed Processing. One important characteristic of 2PC protocol is its blocking nature. Failures can occur during the commit process. As discussed above, the only way to detect these failures is by means of a timeout of the process waiting for a message. When this happens, the process (coordinator or participant) that timeouts follows a termination protocol to determine what to do with the transaction that was in the middle of the commit process. A nonblocking commit protocol is one whose termination protocol can determine what to do with a transaction in case of failures under any circumstance. In the case of 2PC, if a site failure occurs at the coordinator site and one participant site while the coordinator is collecting votes from the participants, the remaining participants cannot determine the fate of the transaction among themselves, and they have to remain blocked until the coordinator or the failed participant recovers. During this period, the locks that are held by the transaction cannot be released, which reduces the availability of the database. There have been attempts to devise nonblocking commit protocols (e.g., three-phase commit), but the high overhead of these protocols has precluded their adoption. The inverse of termination is recovery. When a failed site recovers from the failure, what actions does it have to take to recover the database at that site to a consistent state? This is the domain of distributed recovery protocols. If each site can look at its own log and decide what to do with the transaction, then the recovery protocol is said to be independent. For example, if the coordinator fails after it sends the “prepare” command and while waiting for the responses from the participants, upon recovery, it can determine from its log where it was in the process and can restart the commit process for the transaction from the beginning by sending the “prepare” message one more time. If the participants had already terminated the transaction, they can inform the coordinator. If they were blocked, they can now resend their earlier votes
and resume the commit process. However, this is not always possible and the failed site has to ask others for the fate of the transaction.
VIII. REPLICATION PROTOCOLS In replicated distributed databases, each logical data item has a number of physical instances. For example, the salary of an employee (logical data item) may be stored at three sites (physical copies). The issue in this type of a database system is to maintain some notion of consistency among the copies. The most discussed consistency criterion is one copy equivalence, which asserts that the values of all copies of a logical data item should be identical when the transaction that updates it terminates. If replication transparency is maintained, transactions will issue read and write operations on a logical data item x. The replica control protocol is responsible for mapping operations on x to operations on physical copies of x (x1, ..., xn). A typical replica control protocol that enforces one copy equivalence is known as Read-Once/Write-All (ROWA) protocol. ROWA maps each read on x [Read(x)] to a read on one of the physical copies xi [Read(xi)]. The copy that is read is insignificant from the perspective of the replica control protocol and may be determined by performance considerations. On the other hand, each write on logical data item x is mapped to a set of writes on all copies of x. The ROWA protocol is simple and straightforward, but it requires that all copies of all logical data items that are updated by a transaction be accessible for the transaction to terminate. Failure of one site may block a transaction, reducing database availability. A number of alternative algorithms have been proposed which reduce the requirement that all copies of a logical data item be updated before the transaction can terminate. They relax ROWA by mapping each write to only a subset of the physical copies. The majority consensus algorithm is one such algorithm which terminates a transaction as long as a majority of the copies can be updated. Thus, all the copies of a logical data item may not be updated to the new value when the transaction terminates. This idea of possibly updating only a subset of the copies, but nevertheless successfully terminating the transaction, has formed the basis of quorum-based voting for replica control protocols. The majority consensus algorithm can be viewed from a slightly different perspective: it assigns equal votes to each copy and a transaction that updates that logical data item
can successfully complete as long as it has a majority of the votes. Based on this idea, a quorum-based voting algorithm assigns a (possibly unequal) vote to each copy of a replicated data item. Each operation then has to obtain a read quorum (Vr) or a write quorum (Vw) to read or write a data item, respectively. If a given data item has a total of V votes, the quorums have to obey the following rules:
1. Vr + Vw > V (a data item is not read and written by two transactions concurrently, avoiding the read-write conflict);
2. Vw > V/2 (two write operations from two transactions cannot occur concurrently on the same data item, avoiding the write-write conflict).
The difficulty with this approach is that transactions are required to obtain a quorum even to read data. This significantly and unnecessarily slows down read access to the database. An alternative quorum-based voting protocol that overcomes this serious performance drawback has also been proposed. However, this protocol makes unrealistic assumptions about the underlying communication system. It requires that all sites detect failures that change the network's topology instantaneously, and that each site has a view of the network consisting of all the sites with which it can communicate. In general, communication networks cannot guarantee to meet these requirements. The single copy equivalence replica control protocols are generally considered to be restrictive in terms of the availability they provide. Voting-based protocols, on the other hand, are considered too complicated with high overheads. Therefore, these techniques are not used in current distributed DBMS products.
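A small sketch (my own, not from the article) of checking read and write quorums against these two rules; the copy names and vote assignment are hypothetical.

```python
# Hypothetical vote assignment: one (possibly unequal) vote per physical copy.
votes = {"copy_at_boston": 1, "copy_at_paris": 1, "copy_at_toronto": 2}
V = sum(votes.values())            # total votes for the logical data item

def quorums_valid(v_read, v_write, total):
    # Rule 1: Vr + Vw > V  -> any read quorum and any write quorum intersect,
    #                         so read-write conflicts are detected.
    # Rule 2: Vw > V/2     -> any two write quorums intersect,
    #                         so write-write conflicts are detected.
    return v_read + v_write > total and v_write > total / 2

def has_quorum(reachable_copies, needed):
    return sum(votes[c] for c in reachable_copies) >= needed

Vr, Vw = 2, 3
assert quorums_valid(Vr, Vw, V)
print(has_quorum(["copy_at_boston", "copy_at_toronto"], Vw))   # True: 3 votes collected
```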
More flexible replication schemes have been investigated where the type of consistency between copies is under user control. A number of replication servers have been developed or are being developed with this principle. Unfortunately, there is no clear theory that can be used to reason about the consistency of a replicated database when the more relaxed replication policies are used.
SEE ALSO THE FOLLOWING ARTICLES
Database Systems • Hyper-Media Databases • Management Information Systems • Object-Oriented Databases • Relational Database Systems • Structured Query Language • Temporal Databases
BIBLIOGRAPHY Bernstein, P. A., and Newcomer, E. (1997). Principles of transaction processing for the systems professional. San Mateo, CA: Morgan Kaufmann. Gray, J., and Reuter, A. (1993). Transaction processing: Concepts and techniques. San Mateo, CA: Morgan Kaufmann. Helal, A. A., Heddaya, A. A., and Bhargava, B. B. (1997). Replication techniques in distributed systems. Boston, MA: Kluwer Academic Publishers. Kumar, V., (ed.) (1996). Performance of concurrency control mechanisms in centralized database systems. Englewood Cliffs, NJ: Prentice Hall. Özsu, M. T., and Valduriez, P. (1999). Principles of distributed database systems, 2nd ed. Englewood Cliffs, NJ: Prentice Hall. Sheth, A., and Larson, J. (September 1990). Federated databases: Architectures and integration. ACM Computing Surveys, Vol. 22, No. 3, 183–236. Yu, C., and Meng, W. (1998). Principles of query processing for advanced database applications. San Francisco: Morgan Kaufmann.
Documentation for Software and IS Development Thomas T. Barker Texas Tech University
I. DEVELOPMENT DOCUMENTATION
II. PURPOSE AND SCOPE OF DEVELOPMENT DOCUMENTATION
III. DEVELOPMENT MODELS IN SOFTWARE
IV. TYPES OF DEVELOPMENT DOCUMENTS
V. USER DOCUMENTATION
VI. PURPOSE AND SCOPE OF USER DOCUMENTATION
VII. USER DOCUMENTATION DEVELOPMENT METHODOLOGIES
VIII. TYPES OF USER DOCUMENTS
GLOSSARY
application A computer program, usually the "application" of a file-sharing, client-server, router, or other specialized technology.
developer A manager or producer within an organization who is responsible for creating software programs.
development methodology A series of stages of a software or hardware creation following a pattern based on experience and theory of program design.
Information Process Maturity Model (IPMM) A tool to create and measure processes used for technical publications whereby a core process repeats and replicates itself into a self-sustaining and constantly improving activity.
software A set of computer instructions, called "code," designed to perform a purpose.
specifications (specs) A document containing requirements for a computer program or software manual. It tells the contents and purpose of the program or document.
DOCUMENTATION FOR SOFTWARE AND INFORMATION SYSTEMS DEVELOPMENT is the text or discourse
system whereby computer programs and systems used by organizations and society are created, explained, analyzed, and taught to the people who use them. Documentation falls into two types: development documentation and user documentation. Development documentation comprises those texts that support the programming activities required to produce the computer program. User documentation comprises those texts that teach, guide, and support work by novice and experienced users of the computer program. Often, but not always, these activities occur simultaneously, during development, so that the information obtained through both program user requirements research and audience analysis can inform and improve the actual product development. More commonly, however, the program is fully or nearly developed before it is turned over to the writers for user documentation. The focus with documentation has traditionally been on the implementation side of computer software and hardware production, as opposed to the marketing side or the social and cultural side. It articulates activities associated with a recurrent cycle of development and implementation leading to more refined development and implementation. It uncovers and makes available to others in the discourse community the developers' thinking on how to make the best products possible within the constraints of users and organizations. By its very nature, documentation (especially the audience analysis and system testing activities) captures the needs of a population and makes them available in rich detail to those with the necessary technical skill to respond appropriately with yet better programs. In fact, the value of documentation comes from its ability to articulate important human activities, ones that operate at the very core of the information revolution.
Software documentation, both of the development and user type, has had to struggle against rejection by the community of users it dedicates itself to serving because of inconsistency in quality. Inconsistent quality has shaped the character of the documents produced to support computer programs. For instance, poor quality in early spreadsheet documentation sent writers to libraries and other design resources to reshape their work into something more functional and user-based. Thus, inconsistent quality shaped a whole trend in document design techniques based on cognitive psychology. Current trends in usability testing offer another example of the shaping of design by reactions to charges of inconsistent quality. In the area of development documentation, developers have to deal with a number of problems that plague software engineering: flaws in the design and development process, lack of quality assurance, and lack of consistent testing of the product. The most important charge lies in the area of consistency in development. Because of the antiquated, design-first model in use, developers and clients had little way to realistically predict quality in software systems. The lack of an efficient process led software developers to develop more user-responsive methods, a process that led to many failures as well as successes and continues to this day.
I. DEVELOPMENT DOCUMENTATION Development documentation comprises those documents that propose, specify, plan, review, test, and implement the products of development teams in the software industry. Those products include programs such as operating systems (Linux, Microsoft Windows), application programs (Microsoft Quicken, Eudora), and programming languages (C++, Visual Basic). Development documents reside on computer drives where librarians, the so-designated members of a development team, keep track of the information generated in them, making the most recent versions available to the persons who will use it. Development documents include proposals, user or customer requirements descriptions, test and review reports (suggesting product improvements), and self-reflective documents written by team members, analyzing the process from their perspective. Most development documents go into the company's knowledge archive, a resource that becomes increasingly important with the development of the new information tracking and control practices called "knowledge management."
II. PURPOSE AND SCOPE OF DEVELOPMENT DOCUMENTATION The purpose of development documentation is both to drive development and track development. So it serves to create product and also create information about development itself. In driving development it represents communication among members of the development team. Table I shows the members of a software development team and their duties. The roles of development team members can vary greatly depending on specific projects. For example, some projects require intensive design, so a graphic artist or design specialist may join the team, or the quality assurance function might fall to separate persons such as a usability tester or “configuration manager” (someone who monitors the development stages themselves and the communication among members). But despite variations in the team, all members contribute to improving the software quality. The onset of electronic communication tools— e-mail, file transfer, web pages—has greatly helped development team members communicate among themselves about their work. As we will see below, useful electronic forms such as discussion lists have developed to allow for user information, in the form of requirements or suggestions, to enter in a meaningful way into the development process and affect the outcome. However, of the two purposes for development documentation, the tracking function, not the project management function, has led to greater quality improvement in software because it allows managers and team members (programmers, client representatives, writers, and testers) to reflect on their efforts and improve on them in subsequent development cycles. Typically, tracking information consists of production totals, cost totals broken down into categories of resource, cost per page of manuals, total man-hours on a project broken down into project participants, lines of code per hour, and lines of code per module. Tracking information also consists of usability data that publication and production designers can apply to the product cycle. Sophisticated project management systems allow managers to track finely grained behaviors that, theoretically, allow for optimal performance. Tracking information for development projects surged with the introduction, during the 1980s, of computerized management systems in business and industry—the same wave that put millions of novice users in front of computer screens. Tracking information, which has a predictive aspect, collects the data in a timely fashion so managers can adjust the process.
Table I Software Development Team Members (role on the development team, duties, and kinds of documents)
Client. Duties: Assess investment potential of the project; arrange for funding. Documents: Requirements specifications, product review forms.
Developer. Duties: Guides the project participants; arranges with client. Documents: Proposal, project correspondence.
Project manager. Duties: Organizes and keeps the project on schedule; provides resources for members. Documents: Program specifications, document specifications, project plan, review report.
Designer. Duties: Uses the software requirements document to create a design for the program. Documents: Design specifications.
Programmer(s). Duties: Writes computer instructions that conform with the design specifications document. Documents: Computer program files, internal program documentation, test reports.
Technical writer(s). Duties: Performs a user analysis; designs and writes user documentation. Documents: User's guide, user analysis report.
Quality assurance manager. Duties: Plans and executes usability tests and process checks. Documents: Test report, quality assurance report.
Tracking information also helps inform managers’ strategy and enhance their strategic power in development meetings. The information, collected efficiently and analyzed correctly, helps support the writer as advocate for user involvement in documentation. Finally, tracking information in the form of performance statistics for individual development team members can affect the hiring trends and, thus, the character of subsequent work in the mature organization.
III. DEVELOPMENT MODELS IN SOFTWARE Development models derive from the engineering community (such as the Software Engineering Institute at Carnegie Mellon University) or from whatever discipline the product development occurred in. These models basically follow two patterns: the waterfall method and the rapid application development method (both described below). The waterfall method benefits from the clear specification of the product at the time of design, and then the execution of that design specification through a series of stages based on institutional departments until the product is complete. The rapid application development method benefits from the experience of building a prototype or model of the entire product, and then testing that model and reforming it according to the test results until the product is complete. A third pattern, the object modeling method, varies both by allowing for both upfront design and prototyping while providing improved coordination with programming.
A. Waterfall Method and Its Documentation The waterfall method of software documentation consists of a series of stages called "phases" of the development life cycle. The life cycle of the product includes the stages in Fig. 1. The waterfall method derives its name from the stair-step fashion by which development events proceed from one stage to another. This development method assumes that all or most of the important information about user requirements is available to the development team at the beginning of the project. The development team (usually cross-functional) then follows the stages from idea to implementation, with each stage building on the one before it and not really going back. The process assumes a sign-off from one phase to the next as each phase adds value to the developing product. Because of the phase-by-phase structure in this model, it is often best used to develop complex products requiring detailed specifications and efficient team communication. The degree of communication required to make this model work makes it difficult to handle in anything but a mature organization where resources, processes, and communication behaviors and protocols are well established. In the waterfall method of development, a heavy emphasis falls on the product specification document.
Figure 1 The waterfall method (product concept, requirements analysis, system and software design, code and debug, operation and maintenance).
Figure 2 The rapid application development method (product concept, design and implement prototype, usability test, release and maintain).
Seen as a blueprint for the entire project, the specification document needs to communicate with all the development team members (programmers, quality control persons, writers, sponsors, clients, managers, supervisors, and process control representatives). In the best of projects the product specifications document gets updated regularly to maintain its function as the central, directing script of the development. More commonly, however, the specification document gets forgotten as the programmers default to what is known as the code and fix process. The code and fix process is extremely time consuming and inefficient because it follows a random pattern of reacting to bugs and problems instead of a coordinated, document-driven process. The communication overhead required by the code and fix process can soon wreck the schedule and consume the entire remaining project budget. Beyond that drawback, the market window for a product often closes before this time-consuming process results in a marketable product.
B. Rapid Application Development Method and Its Documentation

The rapid application development (or RAD) method of development follows a different philosophy than the waterfall method. It responds to one main problem of the waterfall method: the gradual divergence of what the programmers actually create from the original specifications (a digression sometimes called "code creep"). As illustrated in Fig. 2, the rapid application development method places an emphasis on user involvement (during cycles of testing) as an ongoing source of design innovation. In contrast to the waterfall method, the idea is to use prototypes of products (software program interface mockups) to draw design requirements from actual users and implement them quickly. Quick implementation (up to one-third of the time used for the waterfall method) allows teams to respond more readily to market demands and to the demands of technological advances that also drive development. The rapid application development model requires a very high-tech environment where software interface design tools and publications management processes allow for fast response to the ever-evolving specification of user needs and requirements. For example, extensive usability testing often requires a lab or coordination with other testing groups in an organization. Budgeting and managing of tests, recruiting of test subjects, and planning for extra redesign meetings can slow down a process designed to be flexible and streamlined.

C. Object Modeling and Its Documentation

Object modeling is a software development methodology that requires the developers and customers to express and record user requirements and development tasks using a consistent, highly abstract annotation system. Known as a modeling language, the annotation system for object modeling results in case models (embodying user specifications) that allow all the members of a development team to work in parallel, thus creating a very time-efficient method of production (see Fig. 3).
Figure 3 The object modeling development method (stages: product concept, model of user case, parallel testing and production, integration of testing and production, final testing and release).
Object modeling allows developers and programmers to analyze the user's experience (say, "renting a video," or "filling a shopping cart") in ways that lead to a greater degree of control over programming and development. This control comes from the ability of standardized languages such as the OPEN Process Specification and the Unified Modeling Language to express relationships among user requirements. Using the video rental store as an example, the case of "user rents a video" (which describes a user activity with the system) can be made dependent on other cases, such as "registers as a customer" and "has no outstanding rentals." With a finely grained analysis of dependencies and relationships among user requirements universally specified, the members of the development team can work independently yet remain coordinated, with increased efficiency. At the level of management of the documentation process, object modeling requires many of the same documents as the waterfall or the rapid application development methods. However, it relies heavily on diagrams (such as the user case model diagram or the business model diagram) and on declarations of the modeling standards used in the particular project. These very complicated, convention-governed diagrams play the key role in describing the abstract model of user tasks and activities that lies at the heart of the object modeling method.
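To make the notion of use-case dependencies concrete, the following sketch (written in Python purely for illustration; the names and data structure are hypothetical and are not UML or OPEN notation) records each use case together with the cases it depends on, so a team member can check which requirements must hold before "user rents a video" can proceed.

    # Illustrative sketch only: a minimal way to record use cases and the cases
    # they depend on, in the spirit of the video rental example. The names and
    # the structure are hypothetical; this is not UML or OPEN notation.
    use_cases = {
        "user rents a video": ["registers as a customer", "has no outstanding rentals"],
        "registers as a customer": [],
        "has no outstanding rentals": [],
    }

    def prerequisites(case, model=use_cases):
        """Return every use case that must hold before 'case' can proceed."""
        found = []
        for dep in model.get(case, []):
            if dep not in found:
                found.append(dep)
                # pull in indirect dependencies as well
                found.extend(p for p in prerequisites(dep, model) if p not in found)
        return found

    print(prerequisites("user rents a video"))
    # ['registers as a customer', 'has no outstanding rentals']

Even a toy record like this shows why the method supports parallel work: each team member can see at a glance which cases a given case assumes, without reading the rest of the model.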
IV. TYPES OF DEVELOPMENT DOCUMENTS

The genres of development documents fall into two categories: internal and external. Internal development documents track project status and report on changes, tests, reviews, and all the other tasks surrounding organizational projects. External development documents are documents intended for customers and other users of the finished software product.

A. Product Specification (Internal)

A product specification document, often referred to as a product "spec," is the document that tells the people on the project (developers, programmers, writers, marketers) what the outcomes or deliverables will be. These deliverables include the program and the user documents. The specification document itself contains an overview of the concept for the software product, the tools for production, a list of program features, the user requirements, and the overall design of the program code.
B. Project Management Plan (Internal)

A project management plan consists of statements that describe the actions taken by the development team, using whichever method is chosen, to produce the software product. Like project plan documents in other engineering fields, it takes its shape and form from engineering management models. The management plan includes a schedule, a project task list, a list of the responsibilities of team members, and any other documents (in an archive) relating to the project.
C. Internal Code Documentation (Internal)

Internal code documentation consists of written text inside the actual program files that records information about a specific module of code. A program that contains thousands of modules (each about a paragraph or so of outline-looking words and symbols) could contain an equal number of brief descriptive paragraphs written by the programmers at the time the code was written and tested. Internal code documentation is written using special "remark" tags that distinguish the programmer's notes from the program code itself. Internal code documentation can help users who adapt programs (scripts such as JavaScript and CGI scripts) and who require only an explanation of how a module works and what variables it uses to make it work for them.
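As a minimal sketch of what internal code documentation looks like in practice (the module, function, and rate below are invented examples, not drawn from any particular program), the "remark" tags here are ordinary Python comments and a docstring:

    # Internal code documentation: the comments and the docstring are the
    # programmer's notes; the interpreter ignores them, but a later maintainer
    # (or a user adapting the script) reads them to learn what the module does.

    def late_fee(days_overdue, daily_rate=1.50):
        """Return the late fee owed on a rental.

        days_overdue -- whole days past the due date (0 or more)
        daily_rate   -- fee charged per day, in dollars (hypothetical default)
        """
        if days_overdue <= 0:
            return 0.0          # nothing owed when the rental is on time
        return round(days_overdue * daily_rate, 2)

    # Maintenance note: adjust daily_rate above if the late-fee policy changes.

The descriptive text costs the programmer a few minutes at the time the code is written and tested, but it is often the only record of intent that survives once the original author moves on.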
D. Test/Usability Report (Internal)

The test report is one of the most important information gathering and synthesizing documents in the array of development documents. It records the results of the various forms of product testing done by programmers, writers, and, increasingly, usability experts on the development team. With object modeling and rapid application development, usability testing has become not just something done after a program is finished, but a method of development itself, allowing software engineers and technical writers to work closely with clients in application development. Test reports usually contain descriptions of test situations, criteria for evaluators, test protocols, presentation of results, discussion of results, and a listing of rectifications made to the product as a result of the test.
E. Maintenance Documentation (External)

Maintenance documentation is text, usually step-by-step procedures, that informs programmers and system administrators how to fix (sometimes called "patch") a program once it is in full operation. Maintenance also means resetting files after business cycles and other routine, non-problem-oriented programming. This kind of maintenance usually refers to large, enterprise programs such as those used in government, utilities, and manufacturing and shipping. Such programs do not get replaced often and frequently require programmers to add or alter functionality. Programmers rely on maintenance documentation (and internal code documentation, discussed above) to keep the program running and responding to evolving user requirements.
F. System Overview Documents (Internal)

System overview documents are those documents that describe the overall technical and functional structure of a program. Readers of these documents are typically administrators in charge of the department supported by the technology, and sometimes software and hardware vendors interested in bidding on new systems or reengineering the old ones. System overviews themselves look like charts with boxes arranged in ways that represent the whole system, along with use cases, user scenarios, and other design and management documents.
V. USER DOCUMENTATION

User documentation refers to texts directed to an audience of users of computer systems (as opposed to those who build, sell, and maintain them). Users make up a very diverse group because they work in all fields of industry, government, and education.
The growth in the diversity of users, in fact, is one of the most interesting stories in software documentation. Before the mid to late 1970s "computers" consisted of enormous mainframes that worked primarily with numbers, creating counts and calculations for finance and government. They maintained databases of tax, payroll, inventory, and other business and government related information. User documentation (computer manuals) at this time consisted of descriptions of systems, emphasizing the control structure (modules) in the program and the interface (menus). Descriptions of interfaces, for example, emphasized menu structures ("Using the Data Entry Screen") instead of the familiar step-by-step of today ("How to Enter Client Data").

As computing technology became much smaller and cheaper during the early 1980s, in the form of the personal computer, many people began using the computing machine. These people, of course, did not have the training in computer science to help them figure out how to use software. So, overall, advancements such as competing operating systems and ever-increasing processing speeds kept users dazed and confused. To respond to the large population of naïve users with highly technical information needs, technical communicators looked to related areas of research and began appropriating ideas to help get computer concepts across to people who knew little about the programs they used. For example, the academic literature of this period reflected explorations of cognitive psychology as a feeder discipline for technical communication. Cognitive psychology emphasizes understanding through mental models. Writers and document designers reckoned that they could help users master the hurdles of reconceptualizing computers by using information structuring and presentation methods developed in cognitive psychology. Other disciplines explored for document design solutions included communications theory, rhetoric, end-user computing, human factors, artificial intelligence, and various information design theories and methods (user-centered design, Information Mapping, Standard Typography for Organizing Proposals).

User documentation matured in the 1990s, absorbing the Internet and the shift from single computers to networked, client/server configurations and functioning much more efficiently as a delivery system for technical information to naïve and intermittent users. Documentation shifted in the early 1990s to both print and online delivery (using Microsoft's WinHelp help compiler) and in the early 2000s is predominantly an online medium with print as a backup.
The influences from feeder disciplines have resulted in a highly flexible information product, delivering information on demand and relying heavily on usability-based development methods, minimal presentation, and single-sourced information architectures.
VI. PURPOSE AND SCOPE OF USER DOCUMENTATION

Along with the shift of users from geeks to the general public, the purpose and scope of user documentation have evolved from a focus on narrowly defined instruction in program features to a focus on integration of software into workplace surroundings. Document designers can reflect elements of the user's workplace and precise information needs in document products so that the user is not only able to use the program but can flourish at work with it. The purpose of documentation, thus, has grown to encompass designing the user's experience (taking a holistic and systematic view of the user's training and information needs) and responding to ensure peak performance. The next section examines some of the methods currently in place for developing software documentation.
VII. USER DOCUMENTATION DEVELOPMENT METHODOLOGIES

Developing user documents follows methods derived from two groups of researchers: the engineering community and the English studies community. In fact, as the population of computer users changes, matures, grows, and learns, methodologies have also changed. Each methodology below implies a different method of using information about user needs in document development. The choice of method depends highly on the type and style of organization, the organizational culture, and the compatibility of publications production and management with product development and its teams. In many workplaces, whatever the actual practice, you use the methodology you say you use, without making sharp distinctions between one approach and another. The terms "task analysis" and "minimalism" are used fairly loosely.
A. Task Analysis

Of all the methodologies for developing successful user documentation, task analysis has gained more followers and resulted in more overall productivity in the workplace than any competing type of document design.
Certainly task-oriented manuals fit the information-rich workplace of today better than the warmed-over system documentation that companies used to shrink-wrap and foist on the bewildered public as "manuals." Overall, the primary difference between task analysis and the default method of document design lies in the definition of the workplace "task" as the primary unit of information. It localizes the solution to the learning problem as one of content. As content, tasks differ from jobs in that a job can consist of many tasks. Tasks become the primary unit of content development, breaking as they do into discrete stages and easy-to-follow steps. In contrast to manuals using a descriptive orientation (which characterized documentation from the earlier era), tasks represent an organization catering to the use of the program by a professional in the workplace. Because tasks represent the structure of jobs and, ultimately, organizations, task orientation in manuals means that manuals reflect significant information about the readers' organization.
B. Minimalism

The minimalist methodology for organizing user documentation is based on the principles of cognitive psychology and learning theory. Those theories posit that the user brings an understanding or "schema" of thought to a task, and the idea of instruction is to make sure that the learner's schema, after the instruction, matches the appropriate one needed to operate the software. For example, a learner encountering information on financial transfers needs to acquire the "schema" of electronic funds transfer (as in altering computerized records) instead of the schema of physical funds transfer (as in an armored truck). The easiest way to create the appropriate schema in the user's mind, minimalists argue, is to expose the learner to the kinds of tools and expectations offered by a technology (a software program) and allow him or her to make productive connections independently as part of the learning. The idea behind minimalism underlies other attempts by instructional theorists to create "learning environments" in which situations do the teaching. Clearly, theorists reason, such learning would guarantee an individual experience for each learner and thus enhance the transfer of knowledge into the workplace. Such experiential learning is based, in the case of the primary minimalist researcher today, John Carroll, formerly of IBM, on the following four principles: (1) emphasize learner actions, (2) provide tools relevant to the user's task, (3) support error recognition and recovery (to mirror natural learning), and (4) support different learning purposes: doing, understanding, and locating.
While these principles underlie the minimalist approach, they belie the revolutionary nature of the manuals its practitioners create. Minimalist manuals bury the step-by-step approach, preferring stories ("scenarios") that mimic workplace actions. These manuals also are a lot shorter than the compendium types often produced using task orientation. Shorter manuals with less verbiage allow for open-ended, yet highly focused, learning.
C. Usability

The usability approach to software documentation takes its lead from the rapid design prototyping of its sibling industry: software engineering. Like software engineering, the usability approach starts with an information technology basis, for the simple reason that its main requirement, repeated document testing, can hardly work in the slow-poke world of print. The usability approach seeks the solution to the software learning problem in design issues. Design issues, such as navigation, hyperlinking, and so forth, can be measured and improved based on user characteristics (as in usability-based system design). Thus, the emphasis in this approach falls on navigation, structure, search and site mapping, and other technically oriented measures that attempt to achieve a usable information interface. Drafts, help-system prototypes, and web help pages are documents that lend themselves to the exhaustive prototype testing and constant revision required by the usability testing approach. The usability method is also useful in the design of interface-based help systems such as performance-enhancing, embedded help systems.

D. Information Process Maturity Model

The most successful and fully developed model of document production is, beyond doubt, the Information Process Maturity Model (IPMM) described by JoAnn Hackos and used in many organizations as the foundation of their quality documentation processes. Used to rank organizations and measure the maturity of processes, this model is not so much a methodology as it is a model of how to run your methodology. It encourages publications managers to develop consistency and measurability in processes. The key to the IPMM, like the SEI CMM, is to identify key practices as part of a repeatable process, and then convince your team and organization to follow the process while at the same time developing ways to measure and improve it. Once measurements are in place the process keeps improving, with constant, monitored improvement as the goal. As you can see, such an approach to developing publications requires considerable interaction among diverse groups of people: developers working in parallel or in coordination with publications, personnel who monitor and evaluate one another's processes, managers and supervisors who may need to be convinced of the wisdom of the best-practices approach, and so on.

VIII. TYPES OF USER DOCUMENTS

User documents fall into categories of purpose, and there are three general purposes for reading associated with software use: reading to learn, reading to do, and reading to understand. These three purposes relate roughly to three kinds of documents: tutorials that teach, procedures that guide action, and reference manuals that explain in detail. Other interesting contrasts can be found among these purposes, but for the most part they identify the top-level categories of user documentation for software systems.
A. Procedural

Procedural documents are based on step-by-step units organized into segments that, ideally, should follow the user's workplace activities. In fact, the procedure gains its strength from the user's interest not in the document itself, but in using the document to put the program to work. Procedural documents include user's guides and online help.
1. User's Guide

The user's guide is the most popular and useful type of user documentation because it essentially encapsulates all that a user can do with a program between two covers. Pages of user's guides are filled with step-by-step procedures that lead or "guide" the reader from one action to another, chronologically, to a specified end result. Examples of procedures include:

• How to install DocPro
• Using the Client Interview feature to set up accounts
• Printing your work
• Importing files from other word processors
• Obtaining server use statistics by IP number

User's guides are usually organized by categories of software features. Software features often are grouped by the most frequently used program modules or menus, by sequences of jobs or workplace tasks, or by some other method relating to the user's need for intermittent assistance. Information access in user's guides usually involves a table of contents, an index, a list of figures and tables, header and footer information, and chapter indicators (tabs, icons, colored pages, edge bleeds).
2. Online Help

Online help began as the online equivalent of the user's guide, but quickly developed into a variety of formats for delivering step-by-step and other information while the user is actually in front of the screen and has, for some reason, gotten stuck. For this reason, online help satisfies the user's need for point-of-need troubleshooting by providing a variety of access methods to documents. The typical online help file uses what is known as the three-pane layout: help buttons across the top pane (back, next, etc.), a clickable table of contents in the left vertical pane (Getting Started, Setting up Accounts, Adding New Vendors, etc.), and the information (procedure or help topic) in the information window or viewing area under the top pane. Because of their electronic nature, help systems, usually developed using Microsoft's help development program WinHelp, can provide more than one access method in the left-hand pane; typically the access methods are (1) table of contents, (2) keywords, and (3) index. The user can click on a simulated index tab at the top of the left pane and select from the topics listed to view a specific procedure in the right-hand viewing area.
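The sketch below (in Python, with invented topic titles and keywords) is one simplified way to picture how a single set of help topics can be reached through the three access methods just described: a table of contents, keyword search, and an index.

    # Hypothetical help topics: each entry carries the text shown in the viewing
    # area plus the keywords used for keyword lookup and the index.
    help_topics = {
        "Getting Started":     {"text": "How to install and open the program...", "keywords": ["install", "setup"]},
        "Setting up Accounts": {"text": "Step-by-step account creation...",       "keywords": ["account", "customer"]},
        "Adding New Vendors":  {"text": "How to register a vendor...",            "keywords": ["vendor", "supplier"]},
    }

    def table_of_contents():
        # the left-hand pane: clickable topic titles in their stated order
        return list(help_topics)

    def lookup_by_keyword(word):
        # the keyword tab: every topic whose keyword list matches the query
        return [title for title, t in help_topics.items() if word.lower() in t["keywords"]]

    def index():
        # the index tab: an alphabetical list of (keyword, topic) pairs
        return sorted((kw, title) for title, t in help_topics.items() for kw in t["keywords"])

    print(table_of_contents())
    print(lookup_by_keyword("vendor"))   # ['Adding New Vendors']

The point of the sketch is only that the topics are written once and then exposed through several navigation routes, which is what distinguishes online help from a linear printed guide.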
B. Reference

Reference documentation consists of manuals whose purpose is to provide background information about all the elements and functionality of a program (and sometimes its source code). Reference documentation is most familiar to advanced users who know how to operate a program but also need to maintain it and, in some cases, modify it. To perform these tasks users need descriptive information about the features of the software from a programmer's point of view: What source data files does the program use? What variables does the program identify? And so on.
Reference documents take their organization from the information they contain: patterning sections of manuals on program modules, alphabetical order, numerical order, and so on. For this reason they are often referred to as "system manuals." Reference documents, whether online or hard copy, rely heavily on indexes and search keywords to help experienced users locate important but small amounts of information quickly.
1. System Manual

The system manual is usually a system-oriented document that lists functionalities and program parts and explains what they do and how they do them. Often this form of documentation consists of lists and tables of data describing program modules, overviews of how the systems work, and explanations of program logic. A typical system manual might have the following contents:

• Overview of System Modules
• Functional Modules
• Program Applets
• List of Data Files
• Database Specifications
• Program Integration Codes
2. Troubleshooting Manual

The troubleshooting manual is a hybrid of procedures and tables that helps maintenance programmers and administrators configure and set up programs. Readers of troubleshooting manuals are usually reacting to problems with the system and usually seek a procedure or explanation to fill their information needs. Troubleshooting manuals usually follow a problem-by-problem organization, or take the form of a decision tree, for example:

Problem | Solution
System dead | Check plug
System frozen | Pull plug
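As a rough illustration of the decision-tree form (using only the toy problems and remedies from the table above; the question wording is invented), a troubleshooting entry can be expressed as a question with yes/no branches that end in a remedy:

    # Toy decision tree mirroring the table above; every question and remedy
    # is illustrative only.
    tree = {
        "question": "Does the system respond at all?",
        "no":  {"remedy": "System dead: check the plug."},
        "yes": {"question": "Is the screen frozen?",
                "yes": {"remedy": "System frozen: pull the plug and restart."},
                "no":  {"remedy": "Consult the reference manual."}},
    }

    def walk(node, answers):
        """Follow yes/no answers through the tree until a remedy is reached."""
        while "remedy" not in node:
            node = node[answers.pop(0)]
        return node["remedy"]

    print(walk(tree, ["yes", "yes"]))   # System frozen: pull the plug and restart.

In the printed manual the same structure appears as nested questions or a flowchart; the reader supplies the answers by observing the system.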
3. FAQ and User-Based Documents

FAQs and user-based documents are a newer breed of document that takes the idea of user functionality to the next level: gaining the user's participation in developing content for manuals. As the number of computers running a system grows, its users begin to apply it in different situations, and with communication technologies (Internet forums and e-mail lists) they can communicate their questions, successes, and failures to one another and back to the company.
Documents so developed are called user-based documents because their content consists of the discourse among knowledgeable fellow users. Such documents also channel traffic away from the help desk and support personnel, acting as a filter for support center callers. One disadvantage, of course, is that often the questions are so narrowly focused that they don't, in reality, get asked very frequently. The other disadvantage of user-based documents is that they lead to formulaic FAQs consisting of program features repackaged as "questions," as in: How do I use the Print feature? What are the advantages of upgrading to a more expensive version of this program?
C. Tutorial

Tutorial documentation is writing that intends to teach a program to a user so that the user can perform the tasks from memory. Tutorials often focus on the main tasks associated with what is referred to as the "typical use" scenario. This scenario consists of the steps a user takes to perform the most common program tasks. Typical uses of a word processor, for example, would include opening the program, entering text, saving the text, and closing the program. The idea behind tutorial user documentation is to encourage adoption of the program's features by the user by teaching him or her the basics and letting job motivations take over. Tutorial user documentation consists of quick start guides, job performance aids, wizards, and demonstrations.
1. Quick Start Guide

The quick start guide is a form of tutorial that takes a subset of program functions (usually ones related closely to the typical use scenario for the target reader) and leads the reader through them as an introduction to the program. Quick start guides often bypass the usual steps in an attempt to satisfy the user who is impatient with plodding through the user's manual. Often they are detached from the main set of documents and presented in brochure, pamphlet, or poster-board media.
2. Job Performance Aid

Job performance aids are a kind of tutorial-on-demand for users. Known as JPAs, performance aids make training in the software more usable by making it available when the user really needs it.
The design of performance-based documentation systems derives from research on training and aims to provide opportunities for learning to the user while on the job. In its more advanced forms, job performance support is informed by knowledge management processes within an organization. Because the user can determine which training to implement and when, job performance aids support autonomous learning, a pattern consistent with current viewpoints of the knowledge-managed workplace.

a. Wizard

Wizards are types of online tutorial documents that come to the user's rescue when he or she faces a difficult (long, technically complicated) task and needs teaching on the spot. Wizard information can encompass everything from basic skill instruction to complicated, advanced procedures (that only a "wizard" would know). Common situations requiring wizards would be setting up an initial web site (requiring much configuration) or importing spreadsheets from other programs. The wizard is a true hybrid form, merging procedural aims (just guide and don't teach) and instructional aims (teach through performance support). Wizards also rely on technological delivery, as the most immediate medium for delivering the information is the user's computer, operating system, and browser window. Structurally, the wizard technology interacts with the user following a step-by-step sequence. Along the way the wizard program performs the background work (calling up screens, searching for programs, checking for hardware, and so on) while the user can focus on supplying just the specific information needed in the workplace. For example, if I need to register the person sitting across the desk from me, I want to concentrate on getting the right information from that person (name, address, e-mail, and so on) and not have to worry about finding and opening the registration database, checking and saving the information, and closing the database. From my point of view, all I need to see is a series of sentences explaining what I'm supposed to submit and then confirmation that I have performed the task successfully. To the user the process is almost seamless.

b. Embedded Help

Embedded help is a form of performance aid that includes procedural and other information in the interface itself. An example of embedded help is the sentence that appears above a data entry field telling the user whether the field requires case-sensitive text or not.
In other examples, you will find more elaborate and systematized presentational methods. The goal of all embedded help is to eliminate the need for a manual or help system by providing screen task basics as a part of the interface. Being part of the interface means that writers of embedded help must work with software engineers to develop the interface. While embedded help has the advantage of easy access, it also gives the user little control over the content of the help system. Other examples of embedded help include mouse-over text, pop-up text, stretch text, and image alternative text.
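A minimal sketch, using invented prompts and data, of the wizard pattern just described: the user answers one plain-language prompt per step, the short parenthetical hints stand in for embedded help, and the program quietly performs the background work of opening, updating, and closing the store of registration records.

    # Hypothetical registration wizard: each step is a prompt plus an embedded
    # hint; the background work (opening and saving the record store) is hidden
    # from the user, who only supplies the answers.
    import json, os, tempfile

    STEPS = [
        ("name",    "Enter the customer's full name", "as it appears on their ID"),
        ("email",   "Enter an e-mail address",        "used only for rental reminders"),
        ("address", "Enter a street address",         "include the apartment number if any"),
    ]

    def run_wizard(answers, store_path):
        record = {}
        for field, prompt, hint in STEPS:
            print(f"{prompt} ({hint}):")      # prompt plus embedded-help hint
            record[field] = answers[field]    # in a real wizard this would be typed in
            print(f"  -> {record[field]}")
        # background work the user never sees: open, update, and close the store
        existing = []
        if os.path.exists(store_path):
            with open(store_path) as f:
                existing = json.load(f)
        existing.append(record)
        with open(store_path, "w") as f:
            json.dump(existing, f, indent=2)
        print("Registration saved.")          # the confirmation described above

    run_wizard({"name": "Pat Example", "email": "pat@example.com", "address": "12 Main St."},
               os.path.join(tempfile.gettempdir(), "customers.json"))

The design choice the sketch illustrates is the division of labor: the wizard script owns the technical chores, while the prompts and hints carry the documentation directly into the interface.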
SEE ALSO THE FOLLOWING ARTICLES

Database Development Process • End-User Computing Managing • End-User Computing Tools • Ergonomics • Prototyping • System Development Life Cycle • Systems Design • User/System Interface Design
BIBLIOGRAPHY

Barker, T. T. (2002). Writing software documentation: A task-oriented approach. New York: Allyn & Bacon.
Bødker, S. (1991). Through the interface: A human activity approach to user interface design. Hillsdale, NJ: Erlbaum.
Boehm, B. W. (1988). A spiral model of software development and enhancement. IEEE Computer, Vol. 21, No. 5, 61–72.
Carroll, J. M. (Ed.) (1998). Minimalism beyond the Nurnberg funnel. Cambridge, MA: MIT Press.
Greenbaum, J., and Kyng, M. (Eds.) (1991). Design at work: Cooperative design of computer systems. Hillsdale, NJ: Erlbaum.
Haas, C. (1996). Writing technology: Studies on the materiality of literacy. Hillsdale, NJ: Erlbaum.
Hackos, J. T., and Redish, J. C. (1998). User and task analysis for interface design. New York: J. Wiley.
Hutchins, E. (1995). Cognition in the wild. Cambridge, MA: MIT Press.
Miller, C. R. (1984). Genre as social action. Quarterly Journal of Speech, Vol. 70, 157–178.
Mirel, B. (1998). "Applied constructivism" for user documentation. Journal of Business and Technical Communication, Vol. 12, No. 1, 7–49.
Nardi, B. A., and O'Day, V. L. (1999). Information ecologies: Using technology with heart. Cambridge, MA: MIT Press.
Norman, D. A., and Draper, S. W. (Eds.) (1986). User centered system design: New perspectives on human-computer interaction. Hillsdale, NJ: Erlbaum.
Winsor, D. (1999). Genre and activity systems: The role of documentation in maintaining and changing engineering activity systems. Written Communication, Vol. 16, No. 2, 200–224.
Zuboff, S. (1988). In the age of the smart machine: The future of work and power. New York: Basic Books.