Advances in COMPUTERS
VOLUME 17

Contributors to This Volume
HSU CHANG, TIEN CHI CHEN, MING T. LIU, NAOMI SAGER, ALAN F. WESTIN, W. A. WOODS
Advances in COMPUTERS

EDITED BY
MARSHALL C. YOVITS
Department of Computer and Information Science
Ohio State University
Columbus, Ohio

VOLUME 17

ACADEMIC PRESS
New York San Francisco London
A Subsidiary of Harcourt Brace Jovanovich, Publishers
1978
COPYRIGHT © 1978, BY ACADEMIC PRESS, INC. ALL RIGHTS RESERVED. NO PART OF THIS PUBLICATION MAY BE REPRODUCED OR TRANSMITTED IN ANY FORM OR BY ANY MEANS, ELECTRONIC OR MECHANICAL, INCLUDING PHOTOCOPY, RECORDING, OR ANY INFORMATION STORAGE AND RETRIEVAL SYSTEM, WITHOUT PERMISSION IN WRITING FROM THE PUBLISHER.
ACADEMIC PRESS, INC.
111 Fifth Avenue, New York, New York 10003

United Kingdom Edition published by
ACADEMIC PRESS, INC. (LONDON) LTD.
24/28 Oval Road, London NW1 7DX

LIBRARY OF CONGRESS CATALOG CARD NUMBER: 59-15761
ISBN 0-12-012117-4
PRINTED IN THE UNITED STATES OF AMERICA
Contents

CONTRIBUTORS TO VOLUME 17
PREFACE

Semantics and Quantification in Natural Language Question Answering
W. A. Woods

1. Introduction
2. Historical Context
3. Overview
4. The Meaning Representation Language
5. The Semantics of the Notation
6. Semantic Interpretation
7. Problems of Interpretation
8. Post-Interpretive Processing
9. An Example
10. Loose Ends, Problems, and Future Directions
11. Syntactic/Semantic Interactions
12. Conclusions
References
Natural Language Information Formatting: The Automatic Conversion of Texts to a Structured Data Base
Naomi Sager

1. Introduction
2. Principles and Methods of Analysis
3. Computer Programs for Information Formatting
4. Applications
References
Distributed Loop Computer Networks
Ming T. Liu

1. Introduction
2. Message Transmission Protocols and Formats
3. Loop Interface Design
4. Network Operating System Design
5. User Access and Network Services
6. Performance Studies
7. Conclusion
References
Magnetic Bubble Memory and Logic
Tien Chi Chen and Hsu Chang

Introduction
1. The Magnetic Bubble Phenomenon
2. Bubbles as Memory
3. Magnetic Bubble Logic
4. Steering of Bubbles for Text Editing
5. Storage Management
6. Sorting
7. Information Selection and Retrieval
8. Summary and Outlook: More than Memory
References
Computers and the Public's Right of Access to Government Information
Alan F. Westin

1. Information Technology and Government Secrecy
2. Computer Impact on Public Access: Reports from the Information-Holders and Information-Seekers
3. An Analysis of the Access Situation
4. Recommendations for Action
5. The Future of Information Technology and Democratic Government
References

AUTHOR INDEX
SUBJECT INDEX
CONTENTS OF PREVIOUS VOLUMES
Contributors to Volume 17

Numbers in parentheses indicate the pages on which the authors' contributions begin.

HSU CHANG, IBM Watson Research Center, Yorktown Heights, New York 10598 (223)
TIEN CHI CHEN, IBM San Jose Research Laboratory, San Jose, California 95193 (223)
MING T. LIU, Department of Computer and Information Science, Ohio State University, Columbus, Ohio 43210 (163)
NAOMI SAGER, Linguistic String Project, New York University, New York, New York 10012 (89)
ALAN F. WESTIN, Department of Political Science, Columbia University, New York, New York 10027 (283)
W. A. WOODS, Bolt Beranek and Newman Inc., Cambridge, Massachusetts 02138 (1)
Preface
Volume 17 of Advances in Computers treats several topical, significant areas of major interest in the very dynamic computer field. Since the first volume of this serial publication appeared in 1960, computer and information technology and applications have experienced a number of major revolutions, each changing the field significantly. The current situation, primarily involving microprocessing and microelectronics, is no exception, and changes are taking place at an ever increasing rate.

Over the years, Advances in Computers has attempted to cover the major areas of interest in computer and information science and technology, accurately presenting some of the more important trends. This serial publication is highly regarded. It is the oldest continuous series in this field. At this time, it is appropriate to quote from the Preface of the first volume of the series, dated April 1960: ". . . advances in other fields have proved very successful in maintaining a sense of unity in these areas. It is hoped that the present venture will accomplish the same for the computer field." I would like to think that Advances in Computers has done just that. The dynamic nature of the field has made this a difficult but exciting task.

The present volume treats some complementary topics. There is the consideration, always of interest but now of particular emphasis, of natural language processing. Hardware is considered for extending the capabilities of microprocessors and for use in, and with, microprocessors. Finally, the volume is concerned with an area of the utmost importance, namely, public access to the vast amount of information collected and stored by the United States Government.

There are two articles that deal with the increasingly significant area of natural language processing.
This is of interest first because of the desirability of communicating with computer systems by natural language, the most convenient medium for humans, and second because it is recognized that the more we understand about natural language processing, the more we begin to know about the fundamental nature of human intelligence in general. In the first of these two articles, William Woods, of Bolt Beranek and Newman, Inc., discusses the developments in communication between humans and machines, from the early requirement that the human use machine language to the present situation, which permits communication involving notations that are natural and meaningful to human users. He
considers the possibility that the ultimate product of such an evolution will be the use of a person's own natural language. Several natural language question-answering systems that Woods and his colleagues have constructed in the past are reviewed and summarized. He presents a number of techniques and discusses the relative advantages and disadvantages of several alternatives leading toward natural language question-answering systems. He considers several examples, such as flight schedules question-answering systems, the LUNAR system which answers questions about the chemical analysis of moon rocks, the so-called augmented transition network (ATN) grammar, and a trip planning and budget management system. A number of future directions of research in natural language understanding are discussed, particularly involving the relationship between syntax and semantics, the understanding of ungrammatical sentences, and the role of pragmatics. He concludes that the field of computational linguistics is reaching a degree of sophistication which permits progress in a more general treatment of pragmatic issues. This will move the field of language understanding, he believes, in the direction of some of the traditional areas of artificial intelligence research.

In the second of the two articles on natural language processing, Naomi Sager, of New York University, points out that since language is the major medium for both storage and transmission of information, a theory and model concerning how information is carried by language and how common meanings are extracted from different linguistic forms is important. She approaches this question from the viewpoint of information science and considers applications primarily in the retrieval of science information and in data base management. She considers how information contained in a collection of documents can be accessed from different points of view for a variety of different tasks.
Presently, data must be supplied to such systems in structured form. The question is raised whether a data base recorded in natural language can also be processed by a computer. Dr. Sager includes a number of potential applications which might be feasible if we could obtain structured data bases from natural language storage.

Ming T. Liu, of Ohio State University, considers the area of distributed data processing and computer networking, especially with regard to local computer networks and, particularly, loop computer networks. He is primarily concerned with the formation of a unified distributed processing system from a network of separate computers. He surveys different types of local loop computer networks and discusses typical design problems involving these networks. He concludes that research on distributed loop computer networks shows that these types of networks should be very
helpful for mini- and microcomputer users in a geographically localized community. He points out that DLCNs have many cost and performance advantages for such computing groups. He believes that these types of networks can become forerunners of future distributed processing systems.

Part of the microelectronic revolution has been the development of magnetic bubbles and magnetic bubble memories. Barely a decade after magnetic bubbles were suggested for information storage, these memory chips are now available commercially. T. C. Chen and Hsu Chang of IBM point out that a 92,000-bit bubble memory chip is now available, and a million-bit chip will probably be announced shortly. Their article, accordingly, discusses the use of bubbles as memory devices but concentrates on the more recent area of their logical uses, taking advantage of the synchronized movement of bubble memory information. They also review the physical basis of the bubble phenomenon. They point out that many of the key ideas are fundamentally simple, and yet the field is potentially rich in practical applications. Their high density, simple fabrication, low power dissipation, and nonvolatility make magnetic bubbles a very attractive medium for information storage. Logic is a simple extension. They point out that bubble logic requires only simple additional controls, whereas electronic logic for the same purpose requires considerable additional cost. They particularly stress the advantage of the nonvolatility of bubbles for reliable performance in hostile environments. They suggest that, with the availability of very large capacity, very low cost bubble devices, truly massive associative memories will soon be practicable.

In the final article, Alan Westin, of Columbia University, treats an issue that all of us, as computer professionals and especially as informed citizens, should be very much concerned about, namely the right of public access to Government information. Dr.
Westin points out that an increasing proportion of the information that Government agencies store about people is currently being automated. What effect does this have on our right of, and ability to achieve, access to this information? He summarizes legitimate fears that have been raised regarding use of these computerized files and access to them. He points out that it is necessary to understand the legal and political settings of Government secrecy, as well as basic patterns of computer usage in the Government agencies that have established these automated files. Dr. Westin concludes that we need the Jeffersonian spirit today more than ever. Technology will continue to become increasingly pervasive, and the struggle over its control and uses will intensify. He asks for whose benefit these new tools are being used, and who controls them. Are they on the side
of civil liberty or of arbitrary authority? Dr. Westin thinks it is time that those concerned with access to the data now be heard, just as the advocates of applying computers to various organizational goals have been heard in the past. Traditionally, the adversary process among informed citizens of varying points of view has been a major impetus toward the development of wise public policy.

I am pleased to thank the contributors to this volume. They have given extensively of their time and energy and have made this volume an important, timely, and interesting contribution to the literature. This volume continues the tradition established for Advances in Computers of providing authoritative summaries of important topics reflecting current growth in the field. Despite the currency of these topics, we expect that this volume will be of long-term interest. Editing this volume has been a most rewarding and pleasant experience.
MARSHALL C. YOVITS
ADVANCES IN COMPUTERS, VOL. 17
Semantics and Quantification in Natural Language Question Answering

W. A. WOODS
Bolt Beranek and Newman Inc.
Cambridge, Massachusetts
1. Introduction
2. Historical Context
   2.1 Airlines Flight Schedules
   2.2 Answering Questions about ATN Grammars
   2.3 The LUNAR System
   2.4 TRIPSYS
3. Overview
   3.1 Structure of the LUNAR System
   3.2 Semantics in LUNAR
4. The Meaning Representation Language
   4.1 Designators
   4.2 Propositions
   4.3 Commands
   4.4 Quantification
   4.5 Specification of the MRL Syntax
   4.6 Procedural/Declarative Duality
   4.7 Opaque Contexts
   4.8 Restricted Class Quantification
   4.9 Nonstandard Quantifiers
   4.10 Functions and Classes
   4.11 Unanticipated Requests
5. The Semantics of the Notation
   5.1 Procedural Semantics
   5.2 Enumeration Functions
   5.3 Quantified Commands
6. Semantic Interpretation
   6.1 Complications Due to Quantifiers
   6.2 Problems with an Alternative Approach
   6.3 The Structure of Semantic Rules
   6.4 Relationship of Rules to Syntax
   6.5 Organization of the Semantic Interpreter
   6.6 Organization of Rules
   6.7 The Generation of Quantifiers
7. Problems of Interpretation
   7.1 The Order of Quantifier Nesting
   7.2 Interaction of Negations with Quantifiers
   7.3 Functional Nesting and Quantifier Reversal
   7.4 Relative Clauses
   7.5 Other Types of Modifiers
   7.6 Averages and Quantifiers
   7.7 Short Scope/Broad Scope Distinctions
   7.8 Wh Questions
8. Post-Interpretive Processing
   8.1 Smart Quantifiers
   8.2 Printing Quantifier Dependencies
9. An Example
10. Loose Ends, Problems, and Future Directions
   10.1 Approximate Solutions
   10.2 Modifier Placement
   10.3 Multiple Uses of Constituents
   10.4 Ellipsis
   10.5 Plausibility of Alternative Interpretations
   10.6 Anaphoric Reference
   10.7 Ill-Formed Input and Partial Interpretation
   10.8 Intensional Inference
11. Syntactic/Semantic Interactions
   11.1 The Role of Syntactic Structure
   11.2 Grammar Induced Phasing of Interpretation
   11.3 Semantic Interpretation while Parsing
   11.4 Top-Down versus Bottom-Up Interpretation
   11.5 Pragmatic Grammars
   11.6 Semantic Interpretation in the Grammar
   11.7 Generating Quantifiers while Parsing
12. Conclusions
References

Copyright © 1978 by Academic Press, Inc. All rights of reproduction in any form reserved. ISBN 0-12-012117-4
1. Introduction
The history of communication between man and machines has followed a path of increasing provision for the convenience and ease of communication on the part of the human. From raw binary and octal numeric machine languages, through various symbolic assembly, scientific, business, and higher level languages, programming languages have increasingly adopted notations that are more natural and meaningful to a human user. The important characteristic of this trend is the elevation of the level at which instructions are specified from the low level details of the machine operations to high level descriptions of the task to be done, leaving out details that can be filled in by the computer. The ideal product of such continued evolution would be a system in which the user specifies what he wants done in a language that is so natural that negligible mental effort is required to recast the specification from the form in which he formulates it to that which the machine requires. The logical choice for
such a language is the person's own natural language (which in this paper I will assume to be English). For a naive, inexperienced user, almost every transaction with current computer systems requires considerable mental effort deciding how to express the request in the machine's language. Moreover, even for technical specialists who deal with a computer constantly, there is a distinction between the things that they do often and remember well, and many other things that require consulting a manual and/or much conscious thought in order to determine the correct machine "incantation" to achieve the desired effect. Thus, whether a user is experienced or naive, and whether he is a frequent or occasional user, there arise occasions where he knows what he wants the machine to do and can express it in natural language, but does not know exactly how to express it to the machine. A facility for machine understanding of natural language could greatly improve the efficiency of expression in such situations, both in speed and convenience and in decreased likelihood of error.

For a number of years, I have been pursuing a long range research objective of making such communication possible between a man and a machine. During this period, my colleagues and I have constructed several natural language question-answering systems and developed a few techniques for solving some of the problems that arise. In this paper, I will present some of those techniques, focusing on the problem of handling natural quantification as it occurs in English. As an organizing principle, I will present the ideas in a roughly historical order, with commentary on the factors leading to the selection of various notations and algorithms, on limitations that have been discovered as a result of experience, and on directions in which solutions lie. Among the systems that I will use for examples are a flight schedules question-answering system (Woods, 1967, 1968), a system
to ask questions about an augmented transition network (ATN) grammar (not previously published), the LUNAR system, which answers questions about the chemical analyses of the Apollo 11 moon rocks (Woods et al., 1972; Woods, 1973b), and a system for natural language trip planning and budget management (Woods et al., 1976). Some of the techniques used in these systems, especially the use of the ATN grammar formalism (Woods, 1969, 1970, 1973a), have become widely known and are now being used in many different systems and applications. However, other details, including the method of performing semantic interpretation, the treatment of quantification and anaphoric reference, and several other problems, have not been adequately described in accessible publications. (Principal contributors to one or more of the systems described here include Madeleine Bates, Bertram Bruce, Ronald Kaplan, and Bonnie Nash-Webber, now Webber.)

This paper is intended to be a discussion of a set of techniques, the problems they solve, and the relative advantages and disadvantages of several alternative approaches. Because of the length of the presentation, no attempt has been made to survey the field or give an exhaustive comparison of these techniques to those of other researchers. In general, most other systems are not sufficiently formalized at a conceptual level that such comparisons can be made on the basis of published information. In some cases, the mechanisms described here can be taken as models of what is being done in other systems. Certainly, the general notion of computing a representation of the meaning of a phrase from representations of the meanings of its constituents by means of a rule is sufficiently general to model virtually any semantic interpretation process. The details of how most systems handle such problems as the nesting of multiple quantification, however, are difficult to fathom. Hopefully the presentation here and the associated discussion will enable the reader to evaluate for himself, with some degree of discrimination, the capabilities of other systems.

2. Historical Context

2.1 Airlines Flight Schedules
Airlines flight schedules was the focusing context for a gedanken system for semantic interpretation that I developed as my Ph.D. thesis at Harvard University (Woods, 1967). In that thesis, I was concerned with the problem of "semantic interpretation", that is, making the transition from a syntactic analysis of input questions (such as could be produced by parsing with a formal grammar of English) to a concrete specification of what the computer was to do to answer the question. Prior to that time, this problem had usually been attacked by developing a set of structural conventions for storing answers in the data base and transforming the input questions (frequently by ad hoc procedures) into patterns that could be matched against that data base. Simmons (1965) presents a survey of the state of the art of the field at that time. In many of the approaches existing at that time, the entire process of semantic interpretation was built on particular assumptions about the structure of the data base. I was searching for a method of semantic interpretation that would be independent of particular assumptions about data base structure and, in particular, would permit a single language
understanding system to talk to many different data bases and permit the specification of requests whose answers required the integration of information from several different data bases. In searching for such an approach, I looked more to the philosophy of language and the study of meaning than to data structures and data base design. The method I developed was essentially an interpretation of Carnap's notion of truth conditions (Carnap, 1964a). I chose to represent those truth conditions by formal procedures that could be executed by a machine. The representation that I used for expressing meanings was at once a notational variant of the standard predicate calculus notation and also a representation of an executable procedure. The ultimate definition of the meanings of expressions in this notation were the procedures that they would execute to determine the truth of propositions, compute the answers to questions, and carry out commands. This notion, which I referred to as "procedural semantics," picks up the chain of semantic specification from the philosophers at the level of abstract truth conditions, and carries it to a formal specification of those truth conditions as procedures in a computer language. The idea of procedural semantics has since had considerable success as an engineering technique for constructing natural language understanding systems, and has also developed somewhat as a theory of meaning. In my paper "Meaning and Machines" (Woods, 1973c), I discuss some of the more theoretical issues of the adequacy of procedural semantics as a theory of meaning. The flight schedules application initially served to focus the issues on particular meanings of particular sentences.
The application assumed a data base essentially the same as the information contained in the Official Airline Guide (OAG, 1966), that is, a list of flights, their departure and arrival times from different airports, their flight numbers and airlines, number of stops, whether they serve meals, etc. Specific questions were interpreted as requesting operations to be performed on the tables that make up this data base to compute answers. The semantic interpretation system presented in my thesis was subsequently implemented for this application with an ATN grammar of English to provide syntax trees for interpretation, but without an actual data base. The system produced formal semantic interpretations for questions such as:

"What flights go from Boston to Washington?"
"Is there a flight to Washington before 8:00 A.M.?"
"Do they serve lunch on the 11:00 A.M. flight to Toronto?"
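The procedural-semantics idea can be made concrete with a small sketch: the meaning of a question is a procedure built from an enumerated class, a truth condition, and an action. The flight table, predicate name, and quantifier function below are invented for illustration; the actual systems were implemented in LISP, not Python.

```python
# A minimal sketch of procedural semantics: the "meaning" of a question
# is an executable procedure. The flight table and the predicate are
# invented for this example.

FLIGHTS = [
    {"number": "AA57", "from": "BOSTON", "to": "WASHINGTON", "departs": "07:30"},
    {"number": "EA12", "from": "BOSTON", "to": "WASHINGTON", "departs": "09:00"},
    {"number": "TW80", "from": "BOSTON", "to": "CHICAGO",    "departs": "08:15"},
]

def connect(flight, origin, dest):
    # Truth condition for "flight goes from origin to dest".
    return flight["from"] == origin and flight["to"] == dest

def for_every(cls, restriction, action):
    # Quantifier as procedure: enumerate the class, test the restriction,
    # and perform the action on each member that satisfies it.
    return [action(x) for x in cls if restriction(x)]

# "What flights go from Boston to Washington?"
answer = for_every(FLIGHTS,
                   lambda f: connect(f, "BOSTON", "WASHINGTON"),
                   lambda f: f["number"])
print(answer)  # -> ['AA57', 'EA12']
```

The point of the sketch is that the same quantifier procedure works for any table, which is the data-base independence discussed in the next section.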
2.2 Answering Questions about ATN Grammars
To prove the point that the semantic interpretation system used in the flight schedules domain was in fact general for arbitrary data bases and independent of the detailed structure of the data base, immediately after completing that system, I looked for another data base to which I could apply the method. I wanted a data base that had not been designed to satisfy any assumptions about the method of question interpretation to be used. The most convenient such data base that I had at hand was the data structure for the ATN grammar that was being used by the system to parse its input sentences. This data base had a structure that was intended to support the parser, and had not been designed with any forethought to using it as a data base for question answering. An ATN grammar, viewed as a data base, conceptually consists of a set of named states with arcs connecting them, corresponding to transitions that can be made in the course of parsing. Arcs connecting states are of several kinds depending on what, if anything, they consume from the input string when they are used to make a transition. For example, a word arc consumes a single word from the input, a push arc consumes a constituent phrase of the type pushed for, and a jump arc consumes no input but merely makes a state transition (see Woods, 1970, 1973a, 1975a, for further discussion of ATN grammars). These states and arcs constitute the data base entities about which questions may be asked. In addition to the entities that actually exist as data objects in the internal structure for the grammar, there are some other important objects that exist conceptually but are not explicit in the grammar. The most important such entity is a path. A path is a sequence of arcs that connect to each other in the order in which they could be taken in the parsing of a sentence.
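The states-and-arcs view just described can be sketched as a small data structure. The dictionary layout, the arc representation, and the path enumerator below are illustrative assumptions and not Woods's actual internal structure; only the state-name style (S/, S/NP) follows the examples in this section.

```python
# Sketch of an ATN grammar viewed as a data base: named states connected
# by arcs of several kinds (word, push, jump, pop). This layout is
# invented for illustration.

GRAMMAR = {
    "S/":   [("push", "NP/", "S/NP")],    # consume a constituent (an NP)
    "S/NP": [("word", "VERB", "S/VP"),    # consume a single word
             ("jump", None, "S/VP")],     # consume no input
    "S/VP": [("pop", None, None)],        # accept the phrase
}

def arcs_leaving(state):
    # Arcs are explicit data objects that can be queried directly.
    return GRAMMAR.get(state, [])

def paths(state, seen=()):
    # Paths exist only conceptually in the grammar, so a question about
    # paths must enumerate them; "seen" keeps the enumeration nonlooping.
    for kind, label, dest in arcs_leaving(state):
        arc = (state, kind, label, dest)
        yield (arc,)
        if dest is not None and dest not in seen:
            for rest in paths(dest, seen + (state,)):
                yield (arc,) + rest

# "How many arcs leave state S/NP?"
print(len(arcs_leaving("S/NP")))  # -> 2
```

The contrast between `arcs_leaving` (a lookup on explicit data objects) and `paths` (a computed enumeration) mirrors the distinction the text draws between entities that exist in the data structure and entities that exist only conceptually.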
Although paths are implicit in the grammar, they are not explicit in the data structure, i.e., there is no internal data object that can be pointed to in the grammar that corresponds to a path. Nevertheless, one should be able to talk about paths and ask questions about them. The techniques I will describe can handle such entities. Examples of the kinds of sentences this "grammar information system" could deal with are

"Is there a jump arc from state S/ to S/NP?"
"How many arcs leave state NP/?"
"How many nonlooping paths connect state S/ with WPOP?"
"Show me all arcs entering state S/VP."

2.3 The LUNAR System
The LUNAR system (Woods et al., 1972; Woods, 1973b) was originally developed with support from the NASA Manned Spacecraft Center as a
research prototype for a system to enable a lunar geologist to conveniently access, compare, and evaluate the chemical analysis data on lunar rock and soil composition that was accumulating as a result of the Apollo moon missions. The target of the research was to develop a natural language understanding facility sufficiently natural and complete that the task of selecting the wording for a request would require negligible effort for the geologist user. The application envisaged was a system that would be accessible to geologists anywhere in the country by teletype connections and would enable them to access the NASA data base without having to learn either the programming language in which the system was implemented or the formats and conventions of the data base representations. For example, the geologist should be able to ask questions such as "What is the average concentration of aluminum in high-alkali rocks?" without having to know that aluminum was conventionally represented in the data base as AL203, that the high-alkali rocks (also known as "volcanics" or "fine-grained igneous") were conventionally referred to as TYPEAS in the data base, nor any details such as the name of the file on which the data was stored, the names of the fields in the data records, or any of a myriad of other details normally required to use a data base system. To a substantial extent, such a capability was developed, although never fully put to the test of real operational use. In a demonstration of a preliminary version of the system in 1971 (Woods, 1973b), 78% of the questions asked of the system were understood and answered correctly, and another 12% failed due to trivial clerical errors such as dictionary coding errors in the not fully debugged system. Only 10% of the questions failed because of significant parsing or semantic interpretation problems.
Although the requests entered into the system were restricted to questions that were in fact about the contents of the data base, and comparatives (which were not handled at that time) were excluded, the requests were otherwise freely expressed in natural English without any prior instructions as to phrasing and were typed into the system exactly as they were asked. The LUNAR system allowed a user to ask questions, compute averages and ratios, and make listings of selected subsets of the data. One could also retrieve references from a keyphrase index and make changes to the data base. The system permitted the user to easily compare the measurements of different researchers, compare the concentrations of elements or isotopes in different types of samples or in different phases of a sample, compute averages over various classes of samples, compute ratios of two constituents of a sample, etc., all in straightforward natural English.
W. A. WOODS
Examples of requests understood by the system are

“Give me all lunar samples with magnetite.”
“In which samples has apatite been identified?”
“What is the specific activity of Al26 in soil?”
“Analyses of strontium in plagioclase.”
“What are the plag analyses for breccias?”
“What is the average concentration of olivine in breccias?”
“What is the average age of the basalts?”
“What is the average potassium/rubidium ratio in basalts?”
“In which breccias is the average concentration of titanium greater than 6 percent?”

2.4 TRIPSYS
TRIPSYS is a system that was developed as the context for a research project in continuous speech understanding (Woods et al., 1976). The overall system of which it was a part was called HWIM (for “Hear What I Mean”). TRIPSYS understands and answers questions about planned and taken trips, travel budgets and their status, costs of various modes of transportation to various places, per diems in various places, conferences and other events for which trips might be taken, people in an organization, the contracts they work on, the travel budgets of those contracts, and a variety of other information that is useful for planning trips and managing travel budgets. It is intended to be a small-scale example of a general management problem. TRIPSYS also permits some natural language entry of information into the data base, and knows how to prompt the user for additional information that was not given voluntarily. Examples of the kinds of requests that TRIPSYS was designed to handle are

“Plan a trip for two people to San Diego to attend the ASA meeting.”
“Estimate the cost of that trip.”
“Is there any money left in the Speech budget?”

3. Overview
Since the LUNAR system is the most fully developed and most widely known of the above systems, I will use it as the principal focus throughout this paper. A brief overview of the LUNAR system was presented at the 1973 National Computer Conference (Woods, 1973b), and an extensive technical report documenting the system was produced (Woods et al., 1972). However, there has been no generally available document that
gives a sufficiently complete picture of the capabilities of the system and how it works. Consequently, I will first give a brief introduction to the structure of the system as a whole, and then proceed to relatively detailed accounts of some of the interpretation problems that were solved. Examples from the other three systems will be used where they are more self-explanatory or more clearly illustrate a principle. Where the other systems differ in structure from the LUNAR system, that will be pointed out.
3.1 Structure of the LUNAR System

The LUNAR system consists of three principal components: a general purpose grammar and parser for a large subset of natural English, a rule-driven semantic interpretation component using pattern → action rules for transforming a syntactic representation of an input sentence into a representation of what it means, and a data base retrieval and inference component that stores and manipulates the data base and performs computations on it. The first two components constitute a language understanding component that transforms an input English sentence into a disposable program for carrying out its intent (answering a question or making some change to the data base). The third component executes such programs against the data base to determine the answer to queries and to effect changes in the data base. The system contains a dictionary of approximately 3500 words, a grammar for a fairly extensive subset of natural English, and two data bases: a table of chemical analyses with 13,000 entries, and a topic index to documents with approximately 10,000 postings. The system also contains facilities for morphological analysis of regularly inflected words, for maintaining a discourse directory of possible antecedents for pronouns and other anaphoric expressions, and for determining how much and what information to display in response to a request. The grammar used by the parsing component of the system is an augmented transition network (ATN). The ATN grammar model has been relatively well documented elsewhere (Woods, 1970, 1973a), so I will not go into detail here describing it, except to point out that it produces syntactic tree structures comparable to the “deep structures” assigned by a Chomsky-type transformational grammar, vintage 1965 (Chomsky, 1965).
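To make the division of labor concrete, the three-stage flow just described (parse, interpret, execute) can be sketched in miniature. The following Python sketch is purely illustrative; every name, the single pattern → action rule, and the toy data base are invented stand-ins, not the actual LUNAR code (which was written in LISP).

```python
# Illustrative sketch of the three-stage pipeline: parse -> interpret -> execute.
# All names, rules, and data here are invented toy stand-ins.

def parse(sentence):
    """Stand-in for the ATN parser: returns a toy 'syntactic tree'."""
    return ("S", sentence.split())

def interpret(tree):
    """Stand-in for the rule-driven semantic interpreter: one pattern -> action
    rule that turns the tree into an executable 'disposable program'."""
    _, words = tree
    if len(words) == 4 and words[0] == "does" and words[2] == "contain":
        sample = words[1]
        mineral = {"olivine": "OLIV", "plagioclase": "PLAG"}[words[3]]
        return lambda db: "yes" if mineral in db[sample] else "no"
    raise ValueError("no interpretation rule matched")

def execute(program, database):
    """Stand-in for the retrieval component: run the program on the data base."""
    return program(database)

toy_db = {"S10046": {"OLIV", "PLAG"}}
answer = execute(interpret(parse("does S10046 contain olivine")), toy_db)
print(answer)  # -> yes
```

The point of the sketch is only the architecture: the understanding components produce a program, and a separate component runs it against the data.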
Likewise, I will not go into much detail describing the inner workings of the data base inference and retrieval component, except to describe the semantics of the formal meaning representation language and discuss some of its advantages. What I will describe here are the problems of semantic interpretation that were handled by the system.
All of the systems mentioned in Section 2 share this same basic structure, with the following exceptions:

1) The airline flight schedules problem was implemented up through the parsing and interpretation stage, but was never coupled to a real data base. This system was implemented solely to validate the formal semantic interpretation procedure.

2) The TRIPSYS system does not construct a separate syntactic tree structure to be given to a semantic interpreter; rather, the ATN grammar builds semantic interpretations directly as its output representation.
3.2 Semantics in LUNAR

A semantic specification of a natural language consists of essentially three parts:

a) a meaning representation language (MRL), a notation for representing the meanings of sentences,
b) a specification of the semantics of the MRL notation, i.e., a specification of what its expressions mean, and
c) a semantic interpretation procedure, i.e., a procedure to construct the appropriate semantic representations for a given natural language sentence.

Accordingly, the semantic framework of the LUNAR system consists of three parts: a semantic notation in which to represent the meanings of sentences, a specification of the semantics of this notation (by means of formal procedures), and a procedure for assigning representations in the notation to input sentences. In previous writings on LUNAR, I have referred to the semantic notation as a query language, but I will refer to it here, following currently more popular terminology, as a “meaning representation language” or MRL. To represent expressions in the MRL, I will use the so-called “Cambridge Polish” notation, in which the application of an operator to its arguments is represented with the operator preceding its operands and the entire group surrounded by parentheses. This notation places the operator in a standard position independent of the number of arguments it takes and uses the parentheses to indicate scoping of operators rather than depending on a fixed degree of the operator as in the “ordinary” Polish prefix notation (thus facilitating operators that take a variable number of arguments). Cambridge Polish notation is the notation used for the S-expressions of the programming language LISP (Bobrow et al., 1968), in which LUNAR is implemented.
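For readers unfamiliar with the notation, the essential property of Cambridge Polish (operator first, parentheses delimiting each application, variable arity) can be illustrated with a small evaluator. This is a hypothetical Python sketch with an invented operator table, not anything from LUNAR itself:

```python
from math import prod

# Cambridge Polish expressions as nested tuples: (operator, arg1, arg2, ...).
# The tuple boundaries (the 'parentheses'), not a fixed arity, delimit each
# application, so an operator like PLUS can take any number of arguments.
OPS = {
    "PLUS": lambda *xs: sum(xs),
    "TIMES": lambda *xs: prod(xs),
    "AND": lambda *xs: all(xs),
    "NOT": lambda x: not x,
}

def evaluate(expr):
    if not isinstance(expr, tuple):   # an atom: a constant denotes itself
        return expr
    op, *args = expr
    return OPS[op](*(evaluate(a) for a in args))

print(evaluate(("PLUS", 1, 2, 3)))               # -> 6
print(evaluate(("AND", ("NOT", False), True)))   # -> True
```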
Occasionally, the notations used for illustration will be slightly simplified from the form actually used in LUNAR to avoid confusion. For example, the DATALINE function used in LUNAR actually takes an additional argument for a data file that is omitted here.
4. The Meaning Representation Language
There are a number of requirements for a meaning representation language, but the most important ones are these:

a) It must be capable of representing precisely, formally, and unambiguously any interpretation that a human reader can place on a sentence.
b) It should facilitate an algorithmic translation from English sentences into their corresponding semantic representations.
c) It should facilitate subsequent intelligent processing of the resulting interpretation.

The LUNAR MRL consists of an extended notational variant of the ordinary predicate calculus notation and contains essentially three kinds of constructions:

- designators, which name or denote objects (or classes of objects) in the data base,
- propositions, which correspond to statements that can be either true or false in the data base, and
- commands, which initiate and carry out actions.

4.1 Designators
Designators come in two varieties: individual specifiers and class specifiers. Individual specifiers correspond to proper nouns and variables. For example, S10046 is a designator for a particular sample, OLIV is a designator for a certain mineral (olivine), and X3 can be a variable denoting any type of object in the data base. Class specifiers are used to denote classes of individuals over which quantification can range. They consist of the name of an enumeration function for the class plus possible arguments. For example, (SEQ TYPECS) is a specification of the class of type C rocks (i.e., breccias) and (DATALINE S10046 OVERALL OLIV) is a specification of the set of lines of a table of chemical analyses corresponding to analyses of sample S10046 for the overall concentration of olivine.
4.2 Propositions
Elementary propositions in the MRL are formed from predicates with designators as arguments. Complex propositions are formed from these by use of the logical connectives AND, OR, and NOT and by quantification. For example, (CONTAIN S10046 OLIV) is a proposition formed by substituting designators as arguments to the predicate CONTAIN, and (AND (CONTAIN X3 OLIV) (NOT (CONTAIN X3 PLAG))) is a complex proposition corresponding to the assertion that X3 contains olivine but does not contain plagioclase.

4.3 Commands
Elementary commands consist of the name of a command operator plus arguments. As with propositions, complex commands can be constructed using logical connectives and quantification. For example, TEST is a command operator for testing the truth value of a proposition given as its argument. Thus (TEST (CONTAIN S10046 OLIV)) will answer yes or no depending on whether sample S10046 contains olivine. Similarly, PRINTOUT is a command operator which prints out a representation for a designator given as its argument.

4.4 Quantification
An important aspect of the meaning of English sentences that must be captured in any MRL is the use of quantifiers such as “every” and “some.” Quantification in the LUNAR MRL is represented in an elaborated version of the traditional predicate calculus notation. An example of an expression in this notation is

(FOR EVERY X1 / (SEQ SAMPLES) : (CONTAIN X1 OVERALL SILICON) ; (PRINTOUT X1)).
This says, “for every object X1 in the set of samples such that X1 contains silicon, print out (the name of) X1.” In general, an instance of a quantified expression takes the form

(FOR (quant) X / (class) : (p X) ; (q X))
where (quant) is a specific quantifier such as EVERY or SOME, X is the variable of quantification and occurs open in the expressions (p X)
and (q X), (class) is a set over which quantification is to range, (p X) is a proposition that restricts the range, and (q X) is the expression being quantified (which may be either a proposition or a command). For the sake of simplifying some examples, I will generalize the format of the quantification operator so that the restriction operation implied by the ":" can be repeated any number of times (including zero if there is no further restriction on the range), giving rise to forms such as
(FOR (quant) X / (class) ; (q X) ) and
(FOR (quant) X / (class) : (p X) : (r X) ; (q X) ). When there is no restriction on the range of quantification, this can also be indicated by using the universally true proposition T, as in
(FOR (quant) X / (class) : T ; (q X) ).

4.5 Specification of the MRL Syntax
A formal BNF specification of the LUNAR MRL is given here:

(expression) = (designator) | (proposition) | (command)
(designator) = (individual constant) | (variable) | ((function) (expression)* )
(proposition) = (elementary proposition) | (quantified proposition)
(elementary proposition) = ((propositional operator) (expression)* )
(propositional operator) = (predicate) | (logical operator)
(logical operator) = AND | OR | NOT | IF-THEN . . .
(quantified proposition) = (FOR (variable) / (class) ; (proposition))
(class) = (elementary class) | (restricted class)
(elementary class) = (class name) | ((class function) (expression)* )
(restricted class) = (class) : (proposition)
(command) = (elementary command) | (quantified command)
(elementary command) = ((command operator) (expression)* )
(quantified command) = (FOR (variable) / (class) ; (command))

In addition to the above BNF constraints, each general operator (i.e., function, predicate, logical operator, class function, or command operator) will have particular restrictions on the number and kinds of expressions that it can take as arguments in order to be meaningful. Each operator also specifies which of its arguments it takes literally as given, and which it will evaluate to obtain a referent (see the discussion of opaque contexts below). Predicates, functions, class names, class functions, command operators, and individual constants are all domain-dependent entities which are to be specified for a particular application domain and defined in terms of procedures. In LUNAR, they are defined as LISP subroutines. Individual constants are defined by procedures for producing a reference pointer to the appropriate internal object in the computer’s model of the world; functions are defined by procedures for producing a reference pointer to the appropriate value given the values for the arguments; class names and class functions are defined by procedures that (given the appropriate values for arguments) can enumerate the members of their class one at a time; predicates are defined by procedures which, given the values of their arguments, determine a truth value for the corresponding proposition; and command operators are defined by procedures which, given the values of their arguments, can carry out the corresponding commands. I should point out that the definitions given here for classes and commands are not adequate for a general theory of semantics, but are rather more pragmatic definitions that facilitate question answering and computer response to commands. For a general semantic theory, the requirement for semantic definition of a class is merely a procedure for recognizing a member, and the semantic definition for a command is a procedure for recognizing when it has been carried out. That is, to be said to know the meaning of a command does not require the ability to carry it out, and to know the meaning of a noun does not require an ability to enumerate all members of its extension.
The distinction between knowing how, and just knowing whether, marks the difference between pragmatic utility and mere semantic adequacy. The requirements placed on the definitions of the classes and commands in the LUNAR system are thus more stringent than those required for semantic definition alone.

4.6 Procedural/Declarative Duality
The meaning representation language used in LUNAR is intended to serve both as a procedural specification that can be executed to compute an answer or carry out a command, and as a “declarative” representation that can be manipulated as a symbolic object by a theorem prover or other inference system. By virtue of the definition of primitive functions and predicates as LISP functions, the language can be viewed simultaneously as a higher level programming language and as an extension of the predicate calculus. This gives rise to two different possible types of inference for answering questions, corresponding roughly to Carnap’s distinction between intension and extension (Carnap, 1964b). First, because of its definition by means of procedures, a question such as “Does every sample contain silicon?” can be answered extensionally (that is, by appeal to the individuals denoted by the class name “samples”) by enumerating the individual samples and checking whether silicon has been found in each one. On the other hand, this same question could have been answered intensionally (that is, by consideration of its meaning alone, without reference to the individuals denoted) by means of the application of inference rules to other (intensional) facts (such as the assertion “Every sample contains some amount of each element”). Thus the expressions in the meaning representation language are capable either of direct execution against the data base (extensional mode) or of manipulation by mechanical inference algorithms (intensional mode). In the LUNAR system, the principal mode of inference is extensional, that is, the direct evaluation of the formal MRL expression as a procedure. However, in certain circumstances, this expression is also manipulated as a symbolic object. Such cases include the construction of descriptions for discourse entities to serve as antecedents for anaphoric expressions and the use of “smart quantifiers” (to be discussed later) for performing more efficient quantification. Extensional inference has a variety of limitations (e.g., it is not possible to prove assertions about infinite sets in extensional mode), but it is a very efficient method for a variety of question-answering applications.

4.7 Opaque Contexts
As mentioned above, the general operators in the meaning representation language are capable of accessing the arguments they are given either literally or after evaluation. Thus, an operator such as ABOUT in an expression like (ABOUT D70-181 (TRITIUM PRODUCTION)) (meaning “Document D70-181 discusses tritium production”) can indicate as part of its definition that, in determining the truth of an assertion, the first argument (D70-181 in this case) is to be evaluated to determine its referent, while the second argument (TRITIUM PRODUCTION) is to be taken unevaluated as an input to the procedure, to be used in some special way as an intensional object (in this case, as a specification of a topic that D70-181 discusses).
16
W. A. WOODS
This distinction between two types of argument passing is a relatively standard one in some programming languages, frequently referred to as call by value versus call by name. In particular, in the programming language LISP, there are two types of functions (referred to as LAMBDA and NLAMBDA functions), the first of which evaluates all of its arguments and the second of which passes all of its arguments unevaluated to the function (which then specifies in its body which arguments are to be evaluated and what to do with the others). This ability to pass subordinate expressions literally as intensional objects (to be manipulated in unspecified ways by the operator that gets them) avoids several of the antinomies that have troubled philosophers, such as the nonequivalence of alternative descriptions of the same object in belief contexts. Although belief contexts do not occur in LUNAR, similar problems occur in TRIPSYS, for example, in interpreting the object of the verb “create,” where the argument to the verb is essentially a description of a desired object, not an object denoted by the description. In LUNAR, functions with opaque contexts are also used to define the basic quantification function FOR as well as general purpose counting, averaging, and extremal functions: NUMBER, AVERAGE, MAXIMUM, and MINIMUM. Calls to these functions take the forms:

(NUMBER X / (class) : (P X) )
“The number of X’s in (class) for which (P X) is true.”

(AVERAGE X / (class) : (P X) ; (F X) )
“The average of the values of (F X) over the X’s in (class) for which (P X) is true.”

(AVERAGE X / (class) : (P X) )
“The average value of X (a number) over the X’s in (class) for which (P X) is true.”

(MAXIMUM X / (class) : (P X) )
“The maximum value of X in the set of X’s in (class) for which (P X) is true.”

(MINIMUM X / (class) : (P X) )
“The minimum value of X in the set of X’s in (class) for which (P X) is true.”

The proposition (P X) in each of these cases has to be taken as an intensional entity rather than a referring expression, since it must be repeatedly evaluated for different values of X.
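The need to treat (P X) intensionally can be made concrete in any language with first-class functions: the restriction is passed as a callable to be re-applied for each candidate X, rather than as a value computed once. A minimal Python sketch of NUMBER and AVERAGE under that assumption (LUNAR itself achieved the effect with NLAMBDA functions in LISP):

```python
# NUMBER and AVERAGE with the restriction P (and optional function F) passed
# as callables: the analogue of an unevaluated, 'intensional' argument that is
# re-applied to each candidate X rather than evaluated once.

def NUMBER(cls, P):
    """The number of X's in cls for which P(X) is true."""
    return sum(1 for x in cls if P(x))

def AVERAGE(cls, P, F=lambda x: x):
    """The average of F(X) over the X's in cls for which P(X) is true."""
    vals = [F(x) for x in cls if P(x)]
    return sum(vals) / len(vals)

nums = [1, 2, 3, 4, 5, 6]
print(NUMBER(nums, lambda x: x % 2 == 0))   # -> 3
print(AVERAGE(nums, lambda x: x > 3))       # -> 5.0
```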
Opaque context functions are also defined for forming the intensional descriptions of sets and the intensional union of intensionally defined sets:

(SETOF X / (class) : (P X) )
“The set of X’s in (class) for which (P X) is true.”

(UNION X / (class) : (P X) ; ((setfn) X) )
“The union, over the X’s in (class) for which (P X) is true, of the sets generated by ((setfn) X).”

4.8 Restricted Class Quantification
One of the major features of the quantifiers in the LUNAR MRL is the separation of the quantified expression into distinct structural parts: (1) the basic class over which quantification is to range, (2) a set of restrictions on that class, and (3) the main expression being quantified. There are a number of advantages to maintaining these distinctions, one of which is the uniformity of the interpretation procedure over different kinds of noun phrase determiners that it permits. For example, the determiners “some” and “every,” when translated into the more customary logical representations, give different main connectives for the expression being quantified. That is, “every man is mortal” becomes (∀x)(Man(x) → Mortal(x)), while “some man is mortal” becomes (∃x)(Man(x) & Mortal(x)). With the LUNAR format, the choice of determiner affects only the choice of quantifier. Other advantages of this kind of quantifier are the facilitation of certain kinds of optimization operations on the MRL expressions, and the generation of appropriate antecedents for various anaphoric expressions. Recently, Nash-Webber and Reiter (1977) have pointed out the necessity of making a distinction between the quantification class and the predicated expression if an MRL is to be adequate for handling verb phrase ellipsis and “one”-anaphora.

4.9 Nonstandard Quantifiers
Another advantage of the restricted class quantifier notation is the uniform treatment of a variety of nonstandard quantifiers. For example, LUNAR treats the determiner “the” in a singular noun phrase as a quantifier, selecting the unique object that satisfies its restriction (and complaining if the presupposition that there is a unique such object is not satisfied). This differs from the traditional representation of definite description by means of the iota operator, which constructs a complex designator for a constituent rather than a governing quantifier. In the traditional notation, the sentence “The man I see is mortal” would be represented something like

MORTAL((ιx)(MAN(x) & SEE(I,x))).

In the LUNAR MRL it would be

(FOR THE X / MAN : (SEE I X) ; (MORTAL X)).
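The behavior of THE as a governing quantifier with a uniqueness presupposition can be sketched as follows; this is an illustrative Python rendering with invented names, not LUNAR's implementation:

```python
# FOR THE: find the unique member of the class satisfying the restriction p,
# then apply the quantified expression q to it; 'complain' (here, raise) if
# the uniqueness presupposition fails.  All names are invented for illustration.

def for_the(cls, p, q):
    matches = [x for x in cls if p(x)]
    if len(matches) != 1:
        raise ValueError("presupposition failure: expected exactly one match, "
                         "got %d" % len(matches))
    return q(matches[0])

men = ["Socrates", "Plato"]
seen = {"Socrates"}
# (FOR THE X / MAN : (SEE I X) ; (MORTAL X)), with SEE and MORTAL as toys:
print(for_the(men, lambda m: m in seen, lambda m: m + " is mortal"))
# -> Socrates is mortal
```

If no man, or more than one man, satisfies the restriction, the presupposition check fails instead of silently picking an arbitrary member.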
Quantifiers such as “many” and “most,” whose meaning requires knowledge of the size of the class over which quantification ranges (as well as the size of the class for which the quantified proposition is true), can be adequately handled by this notation, since the range of quantification is specifically mentioned. These quantifiers were not implemented in LUNAR, however. Among the nonstandard quantifiers handled by LUNAR are numerical determiners (both cardinal and ordinal) and comparative determiners. Ordinal quantifiers (“the third X such that P”) are handled by a special quantifier (ORDINAL n) that can be used in the (quant) slot of the quantifier form. In general this ordinal quantifier should take another parameter that names the ordering function to be used, or at least require a preferred ordering function to be implied by context. The ordering of the members of the class used by LUNAR is the order of their enumeration by the enumeration function that defines the class (see Section 5.2). Numerical quantification and comparative quantification are handled with a general facility for applying numeric predicates to a parameter N in the FOR function that counts the number of successful members of the range of quantification that have been found. Examples are (GREATER N (number)), (EQUAL N (number)), or even (PRIME N) (i.e., N is a prime number). The interpretation of general numeric predicates as quantifiers is that if any number N satisfying the predicate can be found such that N members of the restricted class satisfy the quantified proposition (or successfully complete a quantified command), then the quantified proposition is true (or a quantified command is considered completed). In the implementation, the current value of N is tested as each successful member of the restricted class is found, until either the count N satisfies the numeric predicate or there are no more members in the class.
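The counting scheme just described lends itself to a short sketch: the quantifier increments a count N for each member satisfying both the restriction and the quantified proposition, testing the numeric predicate after each success so that it can stop early. A hypothetical Python rendering, with invented names:

```python
# Numeric-predicate quantifier: succeed as soon as the running count N of
# members satisfying both the restriction p and the quantified proposition q
# itself satisfies num_pred.  'More than 3' can therefore return without
# enumerating the whole class or computing the exact count.

def for_numeric(num_pred, cls, p, q):
    n = 0
    for x in cls:
        if p(x) and q(x):
            n += 1
            if num_pred(n):          # tested after each success: early exit
                return True
    return False

more_than_3 = lambda n: n > 3
positive = lambda x: x > 0
even = lambda x: x % 2 == 0
# "More than 3 positive numbers below 20 are even" -> True
print(for_numeric(more_than_3, range(20), positive, even))  # -> True
```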
The numeric predicate quantifier can be used directly to handle comparative determiners such as “at least” and “more than,” and can be used in a negated quantification to handle “at most” and “fewer than.” The procedure for testing such quantifiers can return a value as soon as
a sufficient number of the class have been found, without necessarily determining the exact number of successful members. The numerical determiner “exactly (n)” is handled in LUNAR by the generalized counting function NUMBER embedded in an equality statement. (It could also be handled by a conjunction of “at least” and “not more than,” but that would not execute as efficiently.) The LUNAR MRL also permits a generic quantifier GEN, which is assigned to noun phrases with plural inflection and no determiner. Such noun phrases sometimes behave like universal quantification and sometimes like existential quantification. In LUNAR, unless some higher operator indicates that it should be interpreted otherwise, a generic quantifier is evaluated exactly like EVERY. Examples of types of quantification in LUNAR are

(FOR EVERY X / CLASS : (P X) ; (Q X))
“Every X in CLASS that satisfies P also satisfies Q.”

(FOR SOME X / CLASS : (P X) ; (Q X))
“Some X in CLASS that satisfies P also satisfies Q.”

(FOR GEN X / CLASS : (P X) ; (Q X))
“A generic X in CLASS that satisfies P will also satisfy Q.”

(FOR THE X / CLASS : (P X) ; (Q X))
“The single X in CLASS that satisfies P also satisfies Q.”

(FOR (ORDINAL 3) X / CLASS : (P X) ; (Q X))
“The third X in CLASS that satisfies P also satisfies Q.”

(FOR (GREATER N 3) X / CLASS : (P X) ; (Q X))
“More than 3 X’s in CLASS that satisfy P also satisfy Q.”

(FOR (EQUAL N 3) X / CLASS : (P X) ; (Q X))
“At least 3 X’s in CLASS that satisfy P also satisfy Q.”

(NOT (FOR (EQUAL N 3) X / CLASS : (P X) ; (Q X)))
“Fewer than 3 X’s in CLASS satisfy P and also satisfy Q.”

(EQUAL 3 (NUMBER X / CLASS : (P X) : (Q X) ))
“Exactly 3 X’s in CLASS satisfy P and also satisfy Q.”
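The classes these quantifiers range over are defined in LUNAR by enumeration functions that return one member at a time together with a restart pointer (described in Section 5.2 below). A hedged Python sketch of that protocol, using an invented integer enumerator and a SOME-style quantifier:

```python
# Enumeration-function protocol: the function is called with a restart index
# and returns (value, next_index); the index carries all the state, so the
# quantifier need not know how members are produced (table lookup, dynamic
# computation, or an unbounded series like the integers).

def enum_integers(index):
    """Enumerate 1, 2, 3, ...; the previous integer is the restart pointer.
    The initial index True plays the role of LISP's T."""
    nxt = 1 if index is True else index + 1
    return (nxt, nxt)

def for_some(enum, p, q, limit=1000):
    """(FOR SOME X / class : p ; q) over a possibly unbounded enumeration.
    The limit is a safety cap for this sketch only."""
    index = True
    for _ in range(limit):
        result = enum(index)
        if result is None:           # class exhausted
            return False
        x, index = result
        if p(x) and q(x):
            return True
    return False

is_prime = lambda n: n > 1 and all(n % d for d in range(2, n))
# "Some integer less than 10 is a prime" -> True (succeeds at X = 2)
print(for_some(enum_integers, lambda x: x < 10, is_prime))  # -> True
```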
4.10 Functions and Classes
Another of the attractive features of the LUNAR MRL is the way that quantification over classes, single- and multiple-valued functions, and the attachment of restrictive modifiers are all handled uniformly, both individually and in combination, by the quantification operators. Specifically, a noun phrase consisting of a function applied to arguments is represented in the same way as a noun phrase whose head is a class over which quantification is to range. For example, “The departure time of flight 557 is 3:00” can be represented as

(FOR THE X / (DEPARTURE-TIME FLIGHT-557) : T ; (EQUAL X 3:00))

(where T is the universally true proposition, signifying here that there are no further restrictions on the range of quantification). This permits exactly the same mechanisms for handling the various determiners and modifiers to apply to both functionally determined objects and quantification over classes. This uniformity of treatment becomes especially significant when the function is not single valued and when the class of values is being quantified over or restricted by additional modifiers, as in

(FOR EVERY X / (DATALINE S10046 OVERALL SIO2) : T ; (PRINTOUT X))

and

(FOR THE X / (DATALINE S10046 OVERALL SIO2) : (REF* X D70-181) ; (PRINTOUT X))

where (DATALINE (sample) (phase) (constituent)) is the function used in LUNAR to enumerate measurements in its chemical analysis table and (REF* (table entry) (document)) is a relation between a measurement and the journal article it was reported in.

4.11 Unanticipated Requests
The structure of the meaning representation language, when coupled with general techniques for semantic interpretation, enables the user to make very explicit requests with a wide range of diversity within a natural framework. As a consequence of the modular composition of MRL expressions, it is possible for the user to combine the basic predicates and functions of the retrieval component in ways that were not specifically anticipated by the system designer. For example, one can make requests such as “List the minerals”, “What are the major elements?”,
“How many minerals are there?”, etc. Although these questions might not be sufficiently useful to merit special effort to handle them, they fall out of the mechanism for semantic interpretation in a natural way with no additional effort required. If the system knows how to enumerate the possible samples for one purpose, it can do so for other purposes as well. Furthermore, anything that the system can enumerate, it can count. Thus, the decomposition of the retrieval operations into basic units of quantification, predicates, and functions provides a very flexible and powerful facility for expressing requests.

5. The Semantics of the Notation
5.1 Procedural Semantics
As mentioned before, the semantic specification of a natural language requires not only a semantic notation for representing the meanings of sentences, but also a specification of the semantics of the notation. As discussed previously, this is done in LUNAR by relating the notation to procedures that can be executed. For each of the predicate names that can be used in specifying semantic representations, LUNAR requires a procedure or subroutine that will determine the truth of the predicate for given values of its arguments. Similarly, for each of the functions that can be used, there must be a procedure that computes the value of that function for given values of its arguments. Likewise, each of the class specifiers for the FOR function requires a subroutine that enumerates the members of the class. The FOR function itself is also defined by a subroutine, as are the logical operators AND, OR, and NOT, the general counting and averaging functions NUMBER and AVERAGE, and the basic command functions TEST and PRINTOUT. Thus any well-formed expression in the language is a composition of functions that have procedural definitions in the retrieval component and are therefore themselves well-defined procedures capable of execution on the data base. In the LUNAR system, the definition of all of these procedures is done in LISP, and the notation of the meaning representation language is so chosen that its expressions are executable LISP programs. These function definitions and the data base on which they operate constitute the retrieval component of the system.

5.2 Enumeration Functions
One of the engineering features of the LUNAR retrieval component that makes the quantification operators both efficient and versatile is the
W. A. WOODS
definition of quantification classes by means of enumeration functions. These are functions that compute one member of the class at a time and can be called repeatedly to obtain successive members. Enumeration functions take an enumeration index argument, which is used as a restart pointer to keep track of the state of the enumeration. Whenever FOR calls an enumeration function to obtain a member of a class, it gives it an enumeration index (initially T), and each time the enumeration function returns a value, it also returns a new value of the index to be used as a restart pointer to get the next member. This pointer is frequently an inherent part of the computation and involves negligible overhead to construct. For example, in enumerating integers, the previous integer suffices, while in enumerating members of an existing list, the pointer to the rest of the list already exists. The enumeration function formulation of the classes used in quantification frees the FOR function from explicit dependence on the structure of the data base; the values returned by the enumeration function may be searched for in tables, computed dynamically, or merely successively accessed from a precomputed list. Enumeration functions also enable the quantifiers to operate on potentially infinite classes and on classes of objects that do not necessarily exist prior to the decision of the quantifier to enumerate them. For example, in an expression such as (FOR SOME X / INTEGER : (LESSP X 10) ; (PRIME X))
(“some integer less than 10 is a prime”), a general enumeration procedure for integers can be used to construct successive integers by addition, without having to assume that all the integers of interest exist in the computer’s memory ahead of time. Thus, the treatment of this kind of quantification fits naturally within LUNAR’s general quantification mechanism without having to be treated as a special case. In the grammar information system application, an enumeration function for paths computes representations for paths through the grammar, so that paths can be talked about even though there are no explicit entities in the internal grammar representation that correspond to paths. (See the discussion on “smart” quantifiers below for a further discussion of the problems of quantifying over such entities.) An enumeration function can indicate termination of the class in one of two ways: either by returning NIL, indicating that there are no more members, or by returning a value with a NIL restart pointer, indicating that the current value is the last one. The latter can save one extra call to the enumeration function if the information is available at the time the last value is returned (e.g., for single-valued functions). This avoids what
would otherwise be an inefficiency in treating multiple- and single-valued functions the same way. In LUNAR, a general purpose enumeration function SEQ can be used to enumerate any precomputed list, and a similar function SEQL can be used to enumerate singletons. For example, (FOR EVERY X1 / (SEQ TYPECS) : T ; (PRINTOUT X1)) is an expression that will print out the sample numbers for all of the samples that are type C rocks. Functionally determined objects and classes, as well as fixed classes, are implemented as enumeration functions, taking an enumeration index as well as their other arguments and computing successive members of their class one at a time. In particular, intensional operators such as AVERAGE, NUMBER, SETOF, and UNION are defined as enumeration functions and also use enumeration functions for their class arguments. Thus quantification over classes, computation of single-valued functions, and quantification over the values of multiple-valued functions are all handled uniformly, without special distinctions having to be made.

5.3 Quantified Commands
As mentioned earlier, both propositions and commands can be quantified. Thus one can issue commands such as (FOR (EQ N 5) X / SAMPLES : (CONTAIN X SIO2) ; (PRINTOUT X)) (“Print out five samples that contain silicon”). The basic commands in such expressions are to be iterated according to the specifications of the quantifier. However, it is possible for such commands to fail due to a violation of presuppositions or of necessary conditions. For example, in the above case, there might not be as many as five samples that contain silicon. In order for the system to be aware of such cases, each command in the system is defined to return a value that is non-null if the command has been successfully executed and NIL otherwise. Given this convention, the FOR operator will automatically return T if such an iterated command has been successfully completed and NIL otherwise. There are other variations of this technique that could be useful but were not implemented in LUNAR, such as returning comments indicating the kind of failure when a command fails. In LUNAR, such comments were sometimes printed to the user directly by the procedure that failed, but the system itself had no opportunity to “see” those comments and take some action of its own in response to them (such as trying some other way to achieve the same end).
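The enumeration-function protocol and the success convention for quantified commands can be sketched together in modern terms. The following Python fragment is an illustrative rendering only (LUNAR itself was written in LISP); the names seq and for_n and the sample data are invented for this sketch:

```python
def seq(lst):
    """Like LUNAR's SEQ: enumerate a precomputed list one member at a time.
    The returned function takes a restart index (initially True, playing the
    role of T) and returns (value, next_index), or None when exhausted."""
    def enum(index):
        rest = lst if index is True else index
        if not rest:
            return None                      # NIL: no more members
        # the restart pointer is just the tail of the list; a NIL (None)
        # pointer returned along with a value marks it as the last member
        return rest[0], (rest[1:] or None)
    return enum

def for_n(n, enum, restriction, command):
    """(FOR (EQ N n) X / class : restriction ; command): iterate the command,
    returning True (T) only if it succeeds n times, else False (NIL)."""
    index, done = True, 0
    while index is not None and done < n:
        item = enum(index)
        if item is None:
            break
        value, index = item
        if restriction(value) and command(value):  # non-null means success
            done += 1
    return done == n

printed = []
ok = for_n(2, seq(["S10046", "S10084", "S10017"]),
           restriction=lambda x: True,
           command=lambda x: printed.append(x) or True)
print(ok, printed)  # True ['S10046', 'S10084']
print(for_n(5, seq(["S10046"]), lambda x: True, lambda x: True))  # False
```

As in the text, the driver neither knows nor cares whether the enumeration function walks a stored list or computes members on demand; only the (value, restart-pointer) protocol matters.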
In LUNAR, interpretations of commands are given directly to the retrieval component for evaluation, although in a more intelligent system, as in humans, the decision to carry out a command once it is understood would not necessarily automatically follow.
6. Semantic Interpretation
Having specified the notation in which the meanings of English sentences are to be represented, and having specified the meanings of expressions in that notation, we are now left with the specification of the process whereby meanings are assigned to sentences. This process is referred to as semantic interpretation, and in LUNAR it is driven by a set of formal semantic interpretation rules. For example, the interpretation of the sentence “S10046 contains silicon,” to which the parser would assign the syntactic structure

S DCL
   NP NPR S10046
   AUX TNS PRESENT
   VP V CONTAIN
      NP NPR SILICON
is determined by a rule that applies to a sentence when the subject is a sample, the object is a chemical element, oxide, or isotope, and the verb is “have” or “contain.” This rule specifies that such a sentence is to be interpreted as an instance of the schema (CONTAIN x y), where x is to be replaced by the interpretation of the subject noun phrase of the sentence, and y is to be replaced by the interpretation of the object. This information about conditions on possible arguments and substitutions of subordinate interpretations into “slots” in the schema is represented in LUNAR by means of the pattern → action rule

[S:CONTAIN
   (S.NP (MEM 1 SAMPLE))
   (S.V (OR (EQU 1 HAVE) (EQU 1 CONTAIN)))
   (S.OBJ (MEM 1 (ELEMENT OXIDE ISOTOPE)))
   → (QUOTE (CONTAIN (# 1 1) (# 3 1)))]
The name of the rule is S:CONTAIN. The left-hand side, or pattern part, of the rule consists of three templates that match fragments of syntactic structure. The first template requires that the sentence being
interpreted have a subject noun phrase that is a member of the semantic class SAMPLE; the second requires that the verb be either “have” or “contain;” and the third requires a direct object that is either a chemical element, an oxide, or an isotope. The right-hand side, or action part, of the rule follows the right arrow and specifies that the interpretation of this node is to be formed by inserting the interpretations of the subject and object constituents into the schema (CONTAIN (# 1 1) (# 3 1)), where the expressions (# m n) mark the “slots” in the schema where subordinate interpretations are to be inserted. The detailed structure of such rules is described in Section 6.3. (Note that the predicate CONTAIN is the name of a procedure in the retrieval component, and it is only by the “accident” of mnemonic design that its name happens to be the same as the English word “contain” in the sentence that we have interpreted.) The process of semantic interpretation can conveniently be thought of as a process that applies to parse trees produced by a parser to assign semantic interpretations to nodes in the tree. In LUNAR and the other systems above, except for TRIPSYS, this is how the interpretations are produced. (In TRIPSYS, they are produced directly by the parser without an intermediate syntax tree representation.) The basic interpretation process is a recursive procedure that assigns an interpretation to a node of the tree as a function of its syntactic structure and the interpretations of its constituents. The interpretations of complex constituents are thus built up modularly by a recursive process that determines the interpretation of a node by inserting the interpretations of certain constituent nodes into open slots in a schema. The schema to be used is determined by rules that look at a limited portion of the tree.
At the bottom level of the tree (i.e., the leaves of the tree), the interpretation schemata are literal representations without open slots, specifying the appropriate elementary interpretations of basic atomic constituents (e.g., proper names). In LUNAR, the semantic interpretation procedure is implemented in such a way that the interpretation of nodes can be initiated in any order. If the interpretation of a node requires the interpretation of a constituent that has not yet been interpreted, then the interpretation of that constituent is performed before that of the higher node is completed. Thus, it is possible to perform the entire semantic interpretation by calling for the interpretation of the top node (the sentence as a whole). This is the normal mode in which the interpreter is operated in LUNAR. I will discuss later (Sections 11.3 and 11.4) some experiments in which this mechanism is used for “bottom-up” interpretation.
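The recursive scheme can be sketched compactly. This Python fragment is a toy illustration, not LUNAR's implementation: the tuple-based tree shape and the single hard-wired rule are invented for the example.

```python
def interp(node):
    """Assign an interpretation to a parse-tree node from its syntactic
    structure and the (recursively computed) interpretations of its parts."""
    kind = node[0]
    if kind == "NPR":                 # leaf: literal interpretation of a name
        return node[1]
    if kind == "S":                   # clause: rule keyed by the verb
        _, subj, verb, obj = node
        if verb in ("have", "contain"):
            # schema (CONTAIN x y), slots filled by recursive interpretation
            return ("CONTAIN", interp(subj), interp(obj))
    raise ValueError("no interpretation rule matches %r" % (node,))

tree = ("S", ("NPR", "S10046"), "contain", ("NPR", "SIO2"))
print(interp(tree))  # ('CONTAIN', 'S10046', 'SIO2')
```

The point of the sketch is the modularity: the clause rule never inspects how its constituents were interpreted, only where their interpretations plug into the schema.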
6.1 Complications Due to Quantifiers
In the above example, the interpretation of the sentence is obtained by inserting the interpretations of the proper noun phrases “S10046” and “silicon” (in LUNAR these are “S10046” and “SIO2,” respectively) into the open slots of the right-hand side schema to obtain (CONTAIN S10046 SIO2). When faced with the possibility of a quantified noun phrase, however, the problem becomes somewhat more complex. If the initial sentence were “Every sample contains silicon,” then one would like to produce the interpretation
(FOR EVERY X / SAMPLE ; (CONTAIN X SIO2)). That is, one would like to create a variable to fill the “container” slot of the schema for the main verb, and then generate a quantifier governing that variable to be attached above the predicate CONTAIN. As we shall see, the LUNAR semantic interpretation system specifically provides for the generation and appropriate attachment of such quantifiers.

6.2 Problems with an Alternative Approach
Because of the complications discussed above, one might ask whether there is some other way to handle quantification without generating quantifiers that are extracted from their noun phrase and attached as dominant operators governing the clause in which the original noun phrase was embedded. One might, instead, attempt to interpret the quantified noun phrase as some kind of a set that the verb of the clause takes as its argument, and require the definition of the verb to include the iteration of its basic predicate over the members of the class. For example, one might want a representation for the above example something like (CONTAIN (SET X / SAMPLE : T) SIO2) with the predicate CONTAIN defined to check whether its first argument is a set and if so, check each of the members of that set. However, if one were to take this approach, some way would be needed to distinguish giving CONTAIN a set argument over which it should do universal quantification from one in which it should do existential quantification. One would similarly have to be able to give it arguments for the various nonstandard quantifiers discussed above, such as numerical quantifiers and quantifiers like “most.” Moreover, the same thing would have to be done separately for the second argument to
CONTAIN as well as the first (i.e., the chemical element as well as the sample), and one would have to make sure that all combinations of quantifiers in the two argument positions worked correctly. Essentially one would have to duplicate the entire quantificational mechanism discussed above as part of the defining procedure for the meaning of the predicate CONTAIN. Moreover, one would then have to duplicate this code separately for each other predicate and command in the system. Even if one managed to share most of the code by packaging it as subroutines, this is still an inelegant way of handling the problem. Even if one went to the trouble just outlined, there are still logical inadequacies, since there is no way with the proposed method to specify the differences in meaning that correspond to the different relative scopes of two quantifiers (e.g., “Every sample contains some element” versus “There is some element that every sample contains”). Likewise, there is no mechanism to indicate the relative scopes of quantifiers and sentential operators such as negation (“Not every sample contains silicon” versus “Every sample contains no silicon”). It appears, therefore, that treating quantifiers effectively as higher operators is essential to correct interpretation in general.

6.3 The Structure of Semantic Rules
As discussed above, in determining the meaning of a construction, two types of information are used: syntactic information about sentence construction and semantic information about constituents. For example, in interpreting the above example, it is both the syntactic structure of the sentence (subject = S10046; verb = “contain;” object = silicon) plus the semantic fact that S10046 is a sample and silicon is a chemical element that determine the interpretation. Syntactic information about a construction is tested by matching tree fragments such as those indicated below against the node being interpreted:
S.NP   = S NP (1)                  (subject of a sentence)
S.V    = S VP V (1)                (main verb of a sentence)
S.OBJ  = S VP NP (1)               (direct object of a sentence)
S.PP   = S VP PP PREP (1) NP (2)   (preposition and object NP modifying a verb phrase)
NP.ADJ = NP ADJ (2)                (adjective modifying a noun phrase)
Fragment S.NP matches a sentence if it has a subject and also associates the number 1 with the subject noun phrase. S.PP matches a sentence that contains a prepositional phrase modifying the verb phrase and associates the numbers 1 and 2 with the preposition and its object, respectively.
The numbered nodes can be referred to in the left-hand sides of rules for checking semantic conditions, and they are used in the right-hand sides for specifying the interpretation of the construction. These tree structure fragments can be named mnemonically as above for readability. The basic element of the left-hand side of a rule is a template consisting of tree fragments plus additional semantic conditions on the numbered nodes of the fragment. For example, the template (S.NP (MEM 1 SAMPLE)) matches a sentence if its subject is semantically marked as a sample. The pattern part of a rule consists of a sequence of templates, and the action of the rule specifies how the interpretation of the sentence is to be constructed from the interpretations of the nodes that match the numbered nodes of the templates. Occasionally, some of the elements that are required to construct an interpretation may be found in one of several alternative places in a construction. For example, the constituent to be measured in an analysis can occur either as a prenominal adjective (“a silicon analysis”) or as a post-nominal prepositional phrase (“an analysis of silicon”). To handle this case, basic templates corresponding to the alternative ways the necessary element can be found can be grouped together with an OR operator to form a disjunctive template that is satisfied if any of its disjunct templates are. For example, (OR (NP.ADJ (MEM 2 ELEMENT)) (NP.PP (AND (EQU 1 OF) (MEM 2 ELEMENT)))).
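Basic, disjunctive, and negated templates can be pictured as composable predicates. In this hypothetical Python sketch, a phrase is a flat dictionary rather than a LUNAR tree fragment, and all the names are invented:

```python
def np_adj(cls):
    """Template NP.ADJ with a semantic condition: adjective is in class cls."""
    return lambda phrase: phrase.get("adj") in cls

def np_pp(prep, cls):
    """Template NP.PP: preposition equals prep and its object is in cls."""
    return lambda phrase: phrase.get("prep") == prep and phrase.get("pobj") in cls

def OR(*templates):
    """Disjunctive template: satisfied if any disjunct template matches."""
    return lambda phrase: any(t(phrase) for t in templates)

def NOT(template):
    """Negated template: satisfied only if the embedded template fails."""
    return lambda phrase: not template(phrase)

ELEMENT = {"silicon", "aluminum"}
# "a silicon analysis" or "an analysis of silicon"
measured = OR(np_adj(ELEMENT), np_pp("of", ELEMENT))

print(measured({"adj": "silicon"}))                 # True
print(measured({"prep": "of", "pobj": "silicon"}))  # True
print(NOT(np_adj({"modal"}))({"adj": "silicon"}))   # True
```

The disjunctive template succeeds on either surface realization of the measured constituent, which is exactly the motivation given in the text.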
Also occasionally, two rules will be distinguished by the fact that one applies when a given constituent is present while the other requires it to be absent. In order to write the second rule so that it will not match in circumstances where it is not intended, a basic template can be embedded in a negation operator NOT to produce a negated template that is satisfied if its embedded template fails to match and is not satisfied when its embedded template succeeds. For example, (NOT (NP.ADJ (EQU 2 MODAL))). In general, the left-hand side of a rule consists of a sequence of templates (basic, disjunctive, or negated).

6.3.1 Right-Hand Sides
The right-hand sides (or actions) of semantic rules are schemata into which the interpretations of embedded constituents are inserted before the resulting form is evaluated to give a semantic interpretation. The
places, or “slots,” in the right-hand sides where subordinate interpretations are to be inserted are indicated by expressions called REFS, which begin with the atom # and contain one or two numbers and an optional “TYPEFLAG.” The numbers indicate the node in the tree whose interpretation is to be inserted by naming first the sequence number of a template of the rule, and then the number of the corresponding node in the tree fragment of that template. Thus the reference (# 2 1) represents the interpretation of the node that matches node 1 of the 2nd template of the rule. In addition, the single number 0 can also be used to reference the current node, as in (# 0 TYPEFLAG). The TYPEFLAG element, if present, indicates how the subordinate node is to be interpreted. For example, in LUNAR there is a distinction between interpreting a node normally and interpreting it as a topic description. Thus (# 0 TOPIC) represents the interpretation of the current node as a topic description. There are a variety of types of interpretation used for various purposes in the rules of the system. The absence of a specific TYPEFLAG in a REF indicates that the interpretation is to be done in the normal mode for the type of node that it matches. 6.3.2 Right-Hand Side Evaluation
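Instantiating a right-hand side can be pictured as a recursive substitution of REFs. In this illustrative Python sketch, a REF (# m n) is rendered as the tuple ('#', m, n), and the bindings table standing in for the matched constituents' interpretations is invented:

```python
def instantiate(schema, bindings):
    """Recursively replace REFs ('#', m, n) with the interpretation bound
    to node n of template m; everything else is copied unchanged."""
    if isinstance(schema, tuple) and schema and schema[0] == "#":
        return bindings[schema[1:]]          # look up the (m, n) pair
    if isinstance(schema, tuple):
        return tuple(instantiate(s, bindings) for s in schema)
    return schema

# schema (CONTAIN (# 1 1) (# 3 1)) with subject and object interpretations
schema = ("CONTAIN", ("#", 1, 1), ("#", 3, 1))
bindings = {(1, 1): "S10046", (3, 1): "SIO2"}
print(instantiate(schema, bindings))  # ('CONTAIN', 'S10046', 'SIO2')
```

TYPEFLAGs are omitted from the sketch; in LUNAR they would direct how the referenced constituent is itself interpreted before its value lands in the slot.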
In many cases, the semantic interpretation to be attached to a node can be constructed by merely inserting the appropriate constituent interpretations into the open slots in a fixed schema. However, occasionally, more than this is required and some procedure needs to be executed to modify or transform the resulting instantiated schema. To provide for this, the semantic interpreter treats right-hand sides of rules as expressions to be evaluated to determine the appropriate interpretation. For rules in which the desired final form can be given literally, the right-hand side schema is embedded in the operator QUOTE which simply returns its argument unchanged. This is the case in the example above. In special cases, right-hand side operators can do fairly complex things, such as searching a discourse directory for antecedents for anaphoric expressions and computing intensional unions of sets. In the usual case, however, the operator is either QUOTE or one of the two operators PRED and QUANT that handle quantifier passing (discussed below). 6.4 Relationship of Rules to Syntax
In many programming languages and some attempts to specify natural language semantics, semantic rules are paired directly with syntactic phrase structure rules so that a single compact pairing specifies both the syntactic structure of a constituent and its interpretation. This type of
specification is clean and straightforward and works well for artificial languages that can be defined by context-free or almost context-free grammars. For interpreting natural language sentences, whose structure is less isomorphic to the kind of logical meaning representation that one would like to derive, it is less convenient, although not impossible. Specifically, with the more complex grammars for natural language, e.g., ATN’s and transformational grammars, the simple notion of a syntactic rule with which to pair a semantic rule becomes less clear. Consequently, the rules in the LUNAR system are not paired with the syntactic rules, nor are they constrained to look only at the immediate constituents of a phrase. In general they can look arbitrarily far down into the phrase they are interpreting, picking up interpretations of subordinate constituents at any level, and looking at various syntactic aspects of the structure they are interpreting, as well as the semantic interpretations of constituents. The rules are invoked not by virtue of applying a given syntactic rule, but by means of rule indexing strategies described below. 6.5 Organization of the Semantic Interpreter
The overall operation of the semantic interpreter is as follows: A top level routine calls the recursive function INTERP looking at the top level of the parse tree. Thereafter, INTERP attempts to match semantic rules against the specified node of the tree, and the right-hand sides of matching rules specify the interpretation to be given to the node. The possibility of semantic ambiguity is recognized, and therefore the routine INTERP produces a list of possible interpretations (usually a singleton, however). Each interpretation consists of two parts: a node interpretation (called the SEM of the node) and a quantifier “collar” (called the QUANT of the node). The QUANT is a schema for higher operators (such as quantification) that is to dominate any interpretation in which the SEM is inserted (used for quantifier passing; see Section 6.7). Thus the result of a call to INTERP for a given node P is a list of SEM-QUANT pairs, one for each possible interpretation of the node.

6.5.1 Context-Dependent Interpretation
The function INTERP takes two arguments: the construction to be interpreted and a TYPEFLAG that indicates how to interpret it. The TYPEFLAG mechanism is intended to allow a constituent to be interpreted differently depending on the higher level structure within which it is embedded. The TYPEFLAG permits a higher level schema to pass down information to indicate how it wants a constituent interpreted. For example, some verbs can specify that they want a noun phrase interpreted
as a set rather than as a quantification over individuals. The TYPEFLAG mechanism is also used to control the successive phases of interpretation of noun phrases and clauses (discussed below). When interpreting a node, INTERP first calls a function HEAD to determine the head of the construction and then calls a function RULES to determine the list of semantic rules to be used (which depends, in general, on the type of node, its head word, and the value of TYPEFLAG). It then dispatches control to a routine MATCHER to try to match the rules. If no interpretations are found, then, depending on the TYPEFLAG and various mode settings, INTERP either returns a default interpretation T, goes into a break with a comment that the node is uninterpretable (permitting a systems programmer to debug rules), or returns NIL indicating that the node has no interpretations for the indicated TYPEFLAG.

6.5.2 Phased Interpretation
In general, there are two types of constituents in a sentence that receive interpretations: clauses and noun phrases. The former receive interpretations that are usually predications or commands, while the latter are usually designators. The interpretations of these two different kinds of phrase are slightly different, but also remarkably similar. In each case there is a governing “head” word: the verb in the case of a clause, and the head noun in the case of the noun phrase. The interpretation of a phrase is principally determined by the head word (noun or verb) of the construction. However, there are also other parts of a construction that determine aspects of its interpretation independent of the head word. These in turn break down into two further classes: (1) modifying phrases (which themselves have dominating head words) that augment or alter the meaning of the head, and (2) function words that determine governing operators of the interpretation that are independent of the head word and its modifiers. In the case of clauses, these latter include the interpretation of tense and aspect and various qualifying operators such as negative particles. In the case of noun phrases, these include the interpretation of articles and quantifiers and the inflected case and number of the head noun. As a consequence of these distinctions, the semantic interpretation of a construction generally consists of three kinds of operations: determining any governing operators that are independent of the head word, determining the basic interpretation of the head, and interpreting any modifiers that may be present. In LUNAR, these three kinds of interpretation are governed by three different classes of rules that operate in three phases.
The phases are controlled by the rules themselves by using multiple calls to the interpreter with different TYPEFLAGs. The above description is not the only way such phasing could be achieved. For example, it would be possible to gain the same phasing of interpretation by virtue of the structures assigned to the input by the parser (see Section 11.2) or by embedding the phasing in the control structure of the interpreter. In the original flight schedules and grammar information implementations, this phasing was embedded in the control structure of the interpreter. Placing the phasing under the control of the rules themselves in LUNAR provided more flexibility. In TRIPSYS, the equivalent of such phasing is integrated, along with the semantic interpretation, into the parsing process. In general, the interpretation of a construction is initially called for with TYPEFLAG NIL. This first interpretation may in turn involve successive calls for interpretation of the same node with other TYPEFLAGs to obtain subsequent phases of interpretation. For example, clauses are initially interpreted with TYPEFLAG NIL, and the rules invoked are a general set of rules called PRERULES that look for negative particles, tense marking, conjunctions, etc., to determine any governing operators that should surround the interpretation of the verb. Whichever of these rules matches will then call for another interpretation of the same construction with an appropriate TYPEFLAG. The basic interpretation of the verb is done by a call with TYPEFLAG SRULES, which invokes a set of rules stored on the property list of the verb (or reachable from the entry for that verb by chaining up a generalization hierarchy). For example, in interpreting the sentence “S10046 doesn’t contain silicon”, the initial PRERULE PR-NEG matches with a right-hand side (PRED (NOT (# 0 SRULES))).
The SRULE S:CONTAIN discussed above then matches, producing eventually (CONTAIN S10046 SIO2), which is then embedded in the PR-NEG schema to produce the final interpretation (NOT (CONTAIN S10046 SIO2)). Ordinary noun phrases are usually interpreted by an initial phase that interprets the determiner and number, a second phase that interprets the head noun and any arguments that it may take (i.e., as a function), and a third phase that interprets other adjectival and prepositional phrase modifiers and relative clauses.
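The phasing can be caricatured as re-entrant calls with different TYPEFLAGs. This Python fragment is a toy sketch, not LUNAR's mechanism: the clause representation and the single PRERULE/SRULE pair are invented for illustration.

```python
def interp_clause(clause, typeflag=None):
    """Phase 1 (typeflag None) plays the role of the PRERULES pass; it
    re-calls the interpreter on the same node with TYPEFLAG 'SRULES'."""
    if typeflag is None:
        inner = interp_clause(clause, "SRULES")
        if clause.get("neg"):          # like PR-NEG: (PRED (NOT (# 0 SRULES)))
            return ("NOT", inner)
        return inner
    if typeflag == "SRULES":           # verb-governed basic interpretation
        return ("CONTAIN", clause["subj"], clause["obj"])

clause = {"subj": "S10046", "obj": "SIO2", "neg": True}
print(interp_clause(clause))  # ('NOT', ('CONTAIN', 'S10046', 'SIO2'))
```

The governing operator is thus discovered before, and wrapped around, the verb-governed interpretation, mirroring the PR-NEG example in the text.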
6.5.3 Proper Nouns and Mass Terms
In addition to the rules discussed above for ordinary noun phrases, there are two special classes of noun phrases, proper nouns and mass terms, that have their own rules. Proper nouns are the direct names of individuals in the data base. Their identifiers in the data base, which are not necessarily identical to their normal English orthography, are indicated in the dictionary entry for the English form. Mass terms are the names of substances like silicon and hydrogen. Proper nouns are represented in the LUNAR syntactic representations as special cases of noun phrases by a rule equivalent to NP → NPR, while mass terms are represented as ordinary noun phrases with determiner NIL and number SG. In general, the interpretation of mass terms requires a special treatment of quantifiers, similar to but different from the ordinary quantifiers that deal with count nouns (e.g., “some silicon” means an amount of stuff, while “some sample” means an individual sample). In the LUNAR system, however, mass terms are used only in a few specialized senses in which they are almost equivalent to proper nouns naming a substance.
As mentioned above, the semantic rules for interpreting sentences are usually governed by the verb of the sentence. That is, out of the entire set of semantic rules, only a relatively small number of them can possibly apply to a given sentence because of the verb mentioned in the rule. Similarly, the rules that interpret noun phrases are governed by the head noun of the noun phrase. For this reason, most semantic rules in LUNAR are indexed according to the heads of the constructions to which they could apply, and recorded in the dictionary entry for the head words. Specifically, associated with each verb is a set of “SRULES” for interpreting that verb in various contexts, and associated with each noun is a set of “NRULES” for interpreting various occurrences of that noun. In addition, associated with each noun is a set of “RRULES” for interpreting various restrictive modifiers that may be applied to that noun. Each rule essentially characterizes a syntactic/semantic environment in which a word can occur, and specifies its interpretation in that environment. The templates of a rule thus describe the necessary and sufficient constituents and semantic restrictions for a word to be meaningful. In addition to indexing rules directly in the dictionary entry for a given word, certain rules that apply generally to a class of words are indexed in an inheritance hierarchy (frequently called an “is-a” hierarchy in
semantic network notations) so that they can be recorded once at the appropriate level of generality. Specifically, each word in the dictionary has a property called MARKERS which contains a list of classes of which it is a member (or subclass), i.e., classes with which this word has an “is-a” relationship. Each of these classes also has a dictionary entry that may contain SRULES, NRULES, and RRULES. The set of rules used by the interpreter for any given phrase is obtained by scanning up these chains of inheritance and gathering up the rules that are found. These accesses are quite shallow in LUNAR but would be used more heavily in a less limited topic domain. In situations in which the set of rules does not depend on the head of the construction, the rules to be used are taken from a global list determined by the value of TYPEFLAG and the type of the constituent being interpreted. For example, in interpreting the determiner structure of a noun phrase, a global list of DRULES is used.
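Gathering rules up the "is-a" chains can be sketched as a simple traversal. The dictionary fragment below is invented for illustration; only the MARKERS-chaining idea comes from the text:

```python
# Hypothetical dictionary fragment: each entry lists its "is-a" MARKERS
# and any rules recorded at that level of generality.
DICTIONARY = {
    "basalt": {"MARKERS": ["ROCK"], "NRULES": ["N:BASALT"]},
    "ROCK":   {"MARKERS": ["SAMPLE"], "NRULES": ["N:ROCK"]},
    "SAMPLE": {"MARKERS": [], "NRULES": ["N:SAMPLE"]},
}

def rules_for(word, rule_type):
    """Gather rules for a word by scanning up its MARKERS ("is-a") chains."""
    gathered, frontier = [], [word]
    while frontier:
        entry = DICTIONARY.get(frontier.pop(0), {})
        gathered += entry.get(rule_type, [])
        frontier += entry.get("MARKERS", [])
    return gathered

print(rules_for("basalt", "NRULES"))  # ['N:BASALT', 'N:ROCK', 'N:SAMPLE']
```

A rule written once for SAMPLE is thereby available to every word that chains up to it, which is the point of recording rules at the appropriate level of generality.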
6.6.1 Rule Trees
Whether indexed by the head words of constructions or taken from global lists, rules to be tried are organized into a tree structure that can make rule matching conditional on the success or failure of previous rules. A rule tree specifies the order in which rules are to be tried and after each rule indicates whether a different tree of rules is to be tried next, depending on the success or failure of previous rules. The format for a rule tree is basically a list of rules (or rule groups; see multiple matches below) in the order they are to be tried. However, after any given element in this list, a new rule tree can be inserted to be used if any of the rules preceding it have succeeded. If no rules preceding it have succeeded, then the inserted tree is skipped and rules continue to be taken from the rules that follow it in the list. For example, the tree (R1 R2 (R4 R5) R3 R4 R5) indicates that R1 and R2 are to be tried in that order and if either of them succeeds, the subsequent rules to be tried are R4 and R5. If neither R1 nor R2 succeeds, then the remaining list R3, R4, R5 is to be tried next. This example illustrates how a rule tree can be used to skip around rules that are to be omitted if previous rules have succeeded. The most usual cases of rule trees in LUNAR are simple lists (i.e., no branching in the tree), and lists of rules with inserted empty trees (i.e., the empty list NIL) serving as “barriers” to stop the attempted matching of rules once a successful rule has been found.
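Rule-tree traversal, including the skip-around behavior and the empty-tree barriers, can be sketched as follows (an illustrative Python rendering; rule success is stubbed out as a predicate):

```python
def try_rules(tree, succeeds):
    """Return the rules actually tried, following rule-tree branching:
    a nested list is taken instead of the remainder if any preceding
    rule succeeded; an empty nested list therefore acts as a barrier."""
    tried, matched = [], False
    for item in tree:
        if isinstance(item, list):     # an inserted rule tree
            if matched:
                return tried + try_rules(item, succeeds)
            continue                   # skip it and keep going in this list
        tried.append(item)
        matched = matched or succeeds(item)
    return tried

tree = ["R1", "R2", ["R4", "R5"], "R3", "R4", "R5"]
print(try_rules(tree, lambda r: r == "R2"))  # ['R1', 'R2', 'R4', 'R5']
print(try_rules(tree, lambda r: False))      # ['R1', 'R2', 'R3', 'R4', 'R5']

# An empty inserted tree as a barrier: stop after A or B succeeds
print(try_rules(["A", "B", [], "C"], lambda r: r == "A"))  # ['A', 'B']
```

The first two calls reproduce the (R1 R2 (R4 R5) R3 R4 R5) example from the text; the third shows the NIL-barrier idiom.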
NATURAL LANGUAGE QUESTION ANSWERING
6.6.2 Multiple Matches
Since the templates of a rule may match a node in several ways, and since several rules may simultaneously match a single node, it is necessary to indicate how the interpretation of a node is to be constructed in such a case. To provide this information, the lists of rules at each level of a rule tree can be organized into groups, with each group indicating how (or whether) simultaneous matches by different rules are to be combined. The format of a rule group is a list of rules (or other groups) preceded by an operator specifying the mode for combining simultaneous matches. Outside the scopes of rule groups, the mode to be used is specified by a default value determined by TYPEFLAG and the type of node being interpreted. Possible modes are AND (which combines multiple matches with an AND, i.e., treats multiple matches as finding different parts of a single conjoined meaning), OR (which combines multiple matches with an OR), SPLIT (which keeps multiple matches separate as semantic ambiguities), and FAIL (which prohibits multiple matches, i.e., complains if it finds any). To illustrate the behavior of rule groups in rule trees, a rule list of the form (A B NIL C (OR D E)) with default mode AND indicates that if either of the rules A or B is successful, then no further matches are tried (NIL is a barrier); otherwise, rules C, D, and E are tried. If both D and E match, then the results are OR’ed together, and if C matches together with D or E or both, it is AND’ed to the results of the OR group. The modes (AND, OR, SPLIT, and FAIL) also apply to multiple matches of a single rule. A rule may either specify the mode for multiple matches as its first element prior to the list of templates, or else it will be governed by the rule group or default mode setting at the time it is matched.
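The four combination modes can be sketched as a small dispatcher. The representation of the combined result as a tagged list is an invented convenience in the spirit of the MRL notation, not LUNAR's internal format:

```python
def combine_matches(matches, mode):
    """Combine simultaneous rule matches under a LUNAR-style mode.
    `matches` is a list of instantiated schemata (tagged lists here)."""
    if len(matches) == 1:
        return matches[0]
    if mode == "AND":
        return ["AND"] + matches       # parts of one conjoined meaning
    if mode == "OR":
        return ["OR"] + matches
    if mode == "SPLIT":
        return ["SPLIT"] + matches     # kept separate as ambiguities
    if mode == "FAIL":
        raise ValueError("multiple matches prohibited in FAIL mode")
    raise ValueError(f"unknown mode {mode!r}")
```

A single match passes through unchanged regardless of mode; FAIL only complains when there is genuinely more than one match, mirroring the behavior described above.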
6.7 The Generation of Quantifiers
As mentioned above, the LUNAR interpretation system specifically provides for the generation and appropriate attachment of quantifiers governing the interpretations it produces. Central to this capability is the division of the interpretation of a constituent into two parts: a SEM that is to be inserted into the appropriate slot of the schema for some higher constituent, and a QUANT that serves as a “collar” of higher operators that is to be passed up to some higher level of the tree (around which the collar will be “worn”). A quantifier to be attached to some higher constituent is represented as a schema, which itself contains a slot into which
W. A. WOODS
the interpretation of that higher constituent is to be inserted. This slot (the “hole” in the collar) is indicated by a marker DLT. In the unquantified example sentence considered in Section 6.1, the SEM of the subject noun phrase is simply S10046, and the QUANT is the “empty” collar DLT. The quantifier schema in the second example would be represented as (FOR EVERY X / SAMPLE ; DLT).

6.7.1 Steps in Interpretation
The general procedure for interpreting a construction is:

a) Match an interpretation rule against the construction, subject to the control of the rule tree.

b) If it matches, then determine from the right-hand side of the rule the set of constituent nodes that need to be interpreted.

c) Call for the interpretation of all of the constituents required, associate their SEMs with the slots in the schema that they are to fill, and gather up all of the QUANTs that are generated by those interpretations. Call a function SORTQUANT to determine the order in which those quantifiers (if there are several) should be nested.

d) Depending on an operator in the right-hand side of the rule, either attach the quantifiers so generated around the outside of the current schema, or pass them further up the tree as the QUANT of the resulting interpretation.

e) If multiple matches are to be combined with an AND or OR, it is their SEMs that are so combined. Their QUANTs are nested one inside the other to produce the QUANT of the result.

6.7.2 Quantifier Passing Operators
There are three principal operators for use in the right-hand sides of rules to determine the behavior of quantifier passing up the tree. These are the operators PRED, QUOTE, and QUANT. The first indicates that the schema it contains is a predication that will accept quantifiers from below; it causes any quantifiers that arise from constituent interpretations to be attached around the current schema to become part of the resulting SEM. The QUANT associated with such an interpretation will be the empty QUANT DLT. The operator QUANT, on the other hand, indicates that the schema it contains is itself a quantifier schema, and that the result of its instantiation is to be passed up the tree (together with other quantifiers that may have resulted from constituent interpretations) as the QUANT of the interpretation. The SEM associated with
such an interpretation is the variable name that is being governed by the quantifier. The operator QUOTE is used around a schema that is transparent to quantifier passing, so that any quantifiers that accumulate from constituent interpretations are simply aggregated together and passed on up the tree as the QUANT of the interpretation. The SEM of such an interpretation is simply the instantiated schema inside the QUOTE. In the LUNAR implementation, a function SEMSUB, which substitutes the SEMs of lower interpretations into the right-hand sides of rules, maintains a variable QUANT to accumulate the nesting of quantifiers returned from the lower interpretations. Then, after making the substitutions, the right-hand side of the rule is evaluated to determine the SEM-QUANT pair to be returned. The result of the evaluation is the desired SEM of the pair, and the value of QUANT (which may have been changed as a side effect of the evaluation) is the QUANT of the pair. The operators PRED and QUANT in the right-hand sides of rules manipulate the variable QUANT to grab and insert quantifiers.
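The three operators and the shared QUANT accumulator can be sketched as follows. This is an illustrative Python reconstruction, not LUNAR's LISP implementation; schemata are nested lists and the DLT hole is the string "DLT":

```python
# Sketch of PRED / QUANT / QUOTE acting on a shared quantifier accumulator.
class Interp:
    def __init__(self):
        self.quant = "DLT"          # accumulated collar, initially empty

    def _subst(self, collar, body):
        """Replace the DLT hole of a (possibly nested) collar with `body`."""
        if collar == "DLT":
            return body
        if isinstance(collar, list):
            return [self._subst(e, body) for e in collar]
        return collar

    def pred(self, schema):
        """PRED: grab accumulated quantifiers around `schema`; the
        resulting QUANT passed up is the empty collar."""
        sem = self._subst(self.quant, schema)
        self.quant = "DLT"
        return sem

    def quantifier(self, schema):
        """QUANT: `schema` is itself a collar; nest it into the accumulator
        and return the governed variable as the SEM."""
        self.quant = self._subst(self.quant, schema)
        return schema[2]            # variable position in (FOR quant X / ...)

    def quote(self, schema):
        """QUOTE: transparent; leave the accumulator alone."""
        return schema
```

Run on “Every sample does not contain silicon” (Section 7.2), the QUANT operator deposits the collar, QUOTE passes (CONTAIN X SIO2) through untouched, and PRED around the NOT grabs the collar to yield the reading with the negation inside the quantifier.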
7. Problems of Interpretation

7.1 The Order of Quantifier Nesting
In the general quantification schema

(FOR (quant) X / (class) : (p X) ; (q X))

both the expressions (p X) and (q X) can themselves be quantified expressions. Sentences containing several quantified noun phrases result in expressions with a nesting of quantifiers dominating the interpretation of the main clause. For example, the sentence “Every sample contains some element” has a representation

(FOR EVERY X / SAMPLE ; (FOR SOME Y / ELEMENT ; (CONTAIN X Y))).

Alternative interpretations of a sentence corresponding to different orderings of the quantifiers correspond to different relative nestings of the quantifier operations. For example, the above sentence has an unlikely interpretation in which there is a particular element that is contained in every sample. The representation of this interpretation is

(FOR SOME Y / ELEMENT ; (FOR EVERY X / SAMPLE ; (CONTAIN X Y))).
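That the two nestings are genuinely different readings can be checked on toy data (the samples and their contents below are invented for the illustration):

```python
# Invented toy model: which elements each sample contains.
samples = {"S1": {"SI"}, "S2": {"FE"}}
elements = {"FE", "SI"}

# (FOR EVERY X / SAMPLE ; (FOR SOME Y / ELEMENT ; (CONTAIN X Y)))
# "every sample contains some element" -- EVERY outermost
every_some = all(any(y in samples[x] for y in elements) for x in samples)

# (FOR SOME Y / ELEMENT ; (FOR EVERY X / SAMPLE ; (CONTAIN X Y)))
# "some particular element is in every sample" -- SOME outermost
some_every = any(all(y in samples[x] for x in samples) for y in elements)
```

Here every sample contains some element, but no single element is contained in every sample, so the first reading is true and the second false.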
Thus, in interpreting a sentence, it is necessary to decide the appropriate order of nesting of quantifiers to be used. In general, this ordering is the left-to-right order of occurrence of the quantifiers in the sentence, but this is not universally so (for example, when a function is applied to a quantified noun phrase; see functional nesting below). In situations where the order of quantifiers is not otherwise determined, LUNAR assumes the left-to-right order of occurrence in the sentence.

7.2 Interaction of Negations with Quantifiers

The construction of an interpretation system that will handle sentences containing single instances of a quantification or simple negation without quantification is not difficult. What is difficult is to make it correctly handle sentences containing arbitrary combinations of quantifiers and negatives. The interpretation mechanism of LUNAR handles such constructions fairly well. Consider the sentence “Every sample does not contain silicon.” This sentence is potentially ambiguous between two interpretations:

(NOT (FOR EVERY X / SAMPLE ; (CONTAIN X SIO2)))

and

(FOR EVERY X / SAMPLE ; (NOT (CONTAIN X SIO2))).

The difference lies in the relative scopes of the quantifier and the negative. One interpretation of the above sentence is handled in LUNAR by the interaction of the rules already presented. The PRERULE PR-NEG, discussed in Section 6.5.2, has the right-hand side

(PRED (NOT (# 0 SRULES))),

whose governing operator indicates that it grabs quantifiers from below. The interpretation of the noun phrase “every sample” produces the quantifier “collar”

(FOR EVERY X / SAMPLE : T ; DLT),

which is passed up as the QUANT together with the SEM X. The right-hand side of S:CONTAIN is embedded in the operator QUOTE, which is transparent to quantifiers, producing the SEM (CONTAIN X SIO2) and passing on the same QUANT.
The top level rule PR-NEG now executes its instantiated right-hand side

(PRED (NOT (CONTAIN X SIO2))),

which grabs the quantifier to produce the interpretation

(FOR EVERY X / SAMPLE : T ; (NOT (CONTAIN X SIO2))).

The alternative interpretation of the above sentence can be obtained
by an alternative PRERULE for sentential negatives whose right-hand side is

(BUILDQ (NOT #) (PRED (# 0 SRULES)))

where BUILDQ is an operator whose first argument is a literal schema into which it inserts the values of its remaining arguments. In this case, the PRED expression produces

(FOR EVERY X / SAMPLE : T ; (CONTAIN X SIO2))

and the BUILDQ produces

(NOT (FOR EVERY X / SAMPLE : T ; (CONTAIN X SIO2))).

If these two negative rules both existed in the list PRERULES, then the LUNAR interpreter, when interpreting a negative sentence, would find them both and would produce both interpretations. In the case where no quantifier is returned by the subordinate SRULES interpretation, both rules would produce the same interpretation and the duplicate could be eliminated. In the case where a quantifier is returned, the two interpretations would be different and a genuine ambiguity would have been found, resulting in a request by the system to the user to indicate which of the two interpretations he intended. However, if one decides to legislate that only one of the two possible scope choices should be perceived by the system, then only the corresponding rule for negation should be included in the PRERULES list. This is the choice that was taken in the demonstration LUNAR system. Since the interpretation of the negative operator outside the scope of the quantifier can be unambiguously expressed using locutions such as “Not every sample contains silicon,” LUNAR’s rules treat sentential negation as falling inside any quantifiers (as expressed by the PR-NEG rule discussed previously). Rules for interpreting determiners such as “not every” can easily be written to produce quantifier expressions such as

(NOT (FOR EVERY X / (class) ; DLT))

to give interpretations in which the negative operator is outermost.

7.3 Functional Nesting and Quantifier Reversal
As previously mentioned, an interesting example of quantifier nesting occurs when an argument to a function is quantified. As an example, consider the flight schedules request, “List the departure times from Boston of every American Airlines flight that goes from Boston to Chicago.” This sentence has a bizarre interpretation in which there is one
time at which every American Airlines flight from Boston to Chicago departs. However, the normal interpretation requires taking the subordinate quantifier “every flight” and raising it above the quantifier of the higher noun phrase “the departure time.” Such nesting of quantifiers is required when the range of quantification of one of them (in this case, the departure times) contains a variable governed by the other (in this case, the flights). In the logical representation of the meaning of such sentences, the higher quantifier must be the one that governs the variable on which the other depends. This logical dependency is exactly the reversal of the “syntactic dependency” in the parse tree, where the argument to the function is contained within (i.e., “dependent” on) the phrase the function heads. The LUNAR system facility for interpreting such constructions automatically gets the preferred interpretation, since the quantifiers from subordinate constituents are accumulated and nested before the quantifier for a given noun phrase is inserted into the quantifier collar. To illustrate the process in detail, consider the interpretation of the above example. In the processing of the constituents of the noun phrase whose head is “departure time,” the quantifier

(FOR EVERY X2 / FLIGHT : (EQUAL (OWNER X2) AMERICAN) ; DLT)

is returned from the interpretation of the “flight” noun phrase (which gets the SEM X2). The temporary QUANT accumulator in the function SEMSUB (discussed in Section 6.7) at this point contains the single “empty” quantifier collar DLT. This is now modified by substituting the returned quantifier for the DLT, so that the QUANT accumulator now contains the returned quantifier (with its DLT now marking the “hole” in the collar).
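The substitution sequence in this subsection can be compressed into a short sketch. This is an illustrative Python reconstruction; structures are nested lists, the restriction on the flight class is dropped for brevity, and the DLT hole is the string "DLT":

```python
def subst_dlt(collar, body):
    """Replace the DLT hole of a (possibly nested) quantifier collar."""
    if collar == "DLT":
        return body
    if isinstance(collar, list):
        return [subst_dlt(e, body) for e in collar]
    return collar

acc = "DLT"  # the temporary QUANT accumulator, initially the empty collar

# QUANT returned first, by the subordinate "every ... flight" noun phrase:
flight = ["FOR", "EVERY", "X2", "/", "FLIGHT", ";", "DLT"]
acc = subst_dlt(acc, flight)

# QUANT produced later for the higher "the departure time" noun phrase;
# it lands inside the universal's scope because it is substituted second:
dtime = ["FOR", "THE", "X1", "/", ["DTIME", "X2", "BOSTON"], ";", "DLT"]
acc = subst_dlt(acc, dtime)

# A PRED operator at the top grabs the whole collar around (PRINTOUT X1):
result = subst_dlt(acc, ["PRINTOUT", "X1"])
```

Because the subordinate quantifier enters the accumulator before the noun phrase's own quantifier, the logical nesting comes out reversed relative to the syntactic dependency, exactly as the text describes.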
When all of the subordinate constituents have been interpreted, and their SEMs have been inserted into the right-hand side schema of the rule interpreting the “departure time” noun phrase, the resulting instantiated schema will be

(QUANT (FOR THE X1 / (DTIME X2 BOSTON) : T ; DLT)).

This is then evaluated, again resulting in the DLT in the temporary QUANT accumulator being replaced with this new quantifier (thus inserting the definite quantification THE inside the scope of the universal
quantifier EVERY that is already there). The result of this interpretation is to return the SEM-QUANT pair consisting of the SEM X1 and the QUANT

(FOR EVERY X2 / FLIGHT : (EQUAL (OWNER X2) AMERICAN) ; (FOR THE X1 / (DTIME X2 BOSTON) : T ; DLT)).

The right-hand side for the next higher rule (the one that interprets the command “list x”) contains a PRED operator, so that when its instantiated schema

(PRED (PRINTOUT X1))

is executed, it will grab the quantifier collar from below to produce the interpretation

(FOR EVERY X2 / FLIGHT : (EQUAL (OWNER X2) AMERICAN) ; (FOR THE X1 / (DTIME X2 BOSTON) : T ; (PRINTOUT X1))).

7.4 Relative Clauses
One of the features of the LUNAR system that makes it relatively powerful in the range of questions it can handle is its general treatment of relative clause modifiers. This gives it a natural ability to handle many questions that would be awkward or impossible to pose to many data management systems. Relative clauses permit arbitrary predicate restrictions to be imposed on the range of quantification of some iterative search. The way in which relative clauses are interpreted is quite simple within LUNAR’s general semantic interpretation framework. It is done by a general RRULE R:REL, which is implicitly included in the RRULES for any noun phrase. The rule R:REL will match a noun phrase if it finds a relative clause structure modifying the phrase. On each such relative clause, it will execute a function RELTAG that will find the node in the relative clause corresponding to the relative pronoun (“which” or “that”), and will mark this found node with the same variable X that is being used for the noun phrase that the relative clause modifies. This pronoun will then behave as if it had already been interpreted and assigned that variable as its SEM. The semantic interpreter will then be called on the relative clause node, just like any other sentence being interpreted, and the result will be a predicate with a free occurrence of the variable X. This resulting predicate is then taken, together with any other RRULE predicates obtained from adjectival and prepositional phrase modifiers, to form the restriction on the range of quantification of the modified noun phrase.
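The RELTAG step can be sketched very simply. The clause representation below (a nested list with the relative pronoun as a leaf) is an invented simplification for illustration, not LUNAR's parse-tree format:

```python
def reltag(clause, variable):
    """Sketch of RELTAG: replace the relative-pronoun node in a clause
    with the variable of the noun phrase the clause modifies, so the
    clause interprets to a predicate with that variable free."""
    if clause in ("WHICH", "THAT"):
        return variable
    if isinstance(clause, list):
        return [reltag(e, variable) for e in clause]
    return clause
```

For “samples that contain silicon,” tagging the simplified clause (CONTAIN THAT SIO2) with the noun phrase's variable X yields the restriction predicate (CONTAIN X SIO2) with X free.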
One consequence of a relative clause being interpreted as a subordinate S node (in fact, a consequence of any subordinate S node interpretation) is that, since the PRERULES used in interpreting the subordinate S node all have PRED operators in their right-hand sides, any quantifiers produced by noun phrases inside the relative clause will be grabbed by the relative clause itself and not passed up to the main clause. This rules out interpretations of sentences like “List the samples that contain every major element” in anomalous ways such as
(FOR EVERY X / MAJORELT : T ; (FOR EVERY Y / SAMPLE : (CONTAIN Y X) ; (PRINTOUT Y)))

(i.e., “For every major element list the samples that contain it”) instead of the correct

(FOR EVERY Y / SAMPLE : (FOR EVERY X / MAJORELT : T ; (CONTAIN Y X)) ; (PRINTOUT Y)).

Except in certain opaque context situations, the latter seems to be the preferred interpretation. As in other cases, however, although LUNAR’s interpretation system is capable of producing the alternative interpretations for some other criterion to choose between, the demonstration prototype instead uses rules that determine just those interpretations that seem to be most likely in its domain.
7.5 Other Types of Modifiers

In addition to relative clauses, there are other kinds of constructions in English that function as predicates to restrict the range of quantification. These include most adjectives and prepositional phrases. They are interpreted by RRULES that match the appropriate structures in a noun phrase and produce a predicate with free variable X (which will be instantiated with the variable of quantification for the noun phrase being interpreted). I will call such modifiers predicators since they function as predicates to restrict the range of quantification. Examples of predicators are modifiers like “recent” and “about olivine twinning” in phrases like “recent articles about olivine twinning.” The interpretation of this phrase would produce the quantifier

(FOR GEN X / DOCUMENT : (AND (RECENT X) (ABOUT X (OLIVINE TWINNING))) ; DLT).
Note that not all adjectives and prepositional phrases are interpreted as just described. Many fill special roles determined by the head noun,
essentially serving as arguments to a function. For example, in a noun phrase such as “the silicon concentration in S10046,” the adjective “silicon” is specifying the value of one of the arguments to the function “concentration,” rather than serving as an independent predicate that the concentration must satisfy. (That is, this phrase is not equivalent to “the concentration in S10046 which is silicon,” which does not make sense). Similarly, the prepositional phrase “in S10046” is filling the same kind of argument role, and is not an independent modifier. I will call this class of modifiers role fillers. In some cases, there are modifiers that could either be treated as restricting predicates or as filling argument roles in a function, depending on the enumeration function that is being used to represent the meaning of the head noun. For example, a modifier like “to Chicago” in “flights to Chicago” could either be interpreted as an independent predicate (ARRIVE X CHICAGO) modifying the flight, or as an argument to a specialized flight enumeration function FLIGHT-TO which enumerates flights to a given destination. In the flight schedules application, the former interpretation was taken, although later query optimization rules (see smart quantifiers, below) were able to transform the resulting MRL expression to a form equivalent to the latter to gain efficiency. In general English, there are cases in which it seems moot whether one should treat a given phrase as filling an argument role or as a restricting predicate. However, there are also clear cases where the head noun is definitely a function and cannot stand alone without some argument being either explicitly present or inferable from context. In these cases such modifiers are clearly role fillers. On the other hand, the diversity of possible modifiers makes it unlikely that all adjectives and prepositional phrases could be interpretable as role fillers in any general or economical fashion. 
Thus, the distinction between predicators and role fillers seems to be necessary. There is another use of a modifier that neither fills an argument role nor stands as an independent predicate, but rather changes the interpretation of the head noun. An example is “modal” in “modal olivine analyses.” This adjective does not describe a kind of olivine, but rather a kind of analysis that is different from the normal interpretation one would make of the head “analysis” by itself. Such modifiers might be called specializers since they induce a special interpretation on the head noun. Note that these distinctions in types of modification refer to the role a modifier plays in a given construction, not to anything inherent in the modifier itself. The sentence “List modal olivine analyses for lunar samples that contain silicon” contains a mixture of the different kinds of modifiers. The
presence of the specializer adjective “modal” blocks the application of the normal NRULE N:ANALYSIS (it has a NOT template that checks for it), and it enables a different rule N:MODAL-ANALYSIS instead. The adjective “olivine” and the prepositional phrase are both interpreted by REFS in the right-hand side of this rule to fill argument slots in the enumeration function DATALINE. There are no predicators modifying “analyses,” but there is a potential predicator “lunar” modifying “samples” and a restrictive relative clause also modifying samples. In LUNAR, the apparently restrictive modifier “lunar” modifying a word like “samples” is instead interpreted as a specializer that does not make a difference, since LUNAR knows of no other kind of sample. However, this is clearly not a limitation of the formalism. The relative clause modifying “samples” is interpreted as described above to produce the predicate (CONTAIN X2 SIO2). The interpretation of the noun phrase “lunar samples that contain silicon” thus consists of the SEM X2 and the QUANT

(FOR GEN X2 / SAMPLE : (CONTAIN X2 SIO2) ; DLT).

This SEM-QUANT pair is returned to the process interpreting the noun phrase “modal olivine analyses for ...,” which in turn produces a SEM X1 and a QUANT
(FOR GEN X2 / SAMPLE : (CONTAIN X2 SIO2) ; (FOR GEN X1 / (DATALINE X2 OVERALL OLIV) : T ; DLT)).

This is returned to the rule interpreting the main verb “list,” whose right-hand side produces the SEM (PRINTOUT X1) with the same QUANT as above. This process returns to the PRERULE for positive imperative sentences, where the quantifiers are grabbed to produce the interpretation

(FOR GEN X2 / SAMPLE : (CONTAIN X2 SIO2) ; (FOR GEN X1 / (DATALINE X2 OVERALL OLIV) : T ; (PRINTOUT X1))).

7.6 Averages and Quantifiers
An interesting class of quantifier interaction problems occurs with certain operators such as “average,” “sum,” and “number.” In a sentence such as “What is the average silicon concentration in breccias?” it is clear that the generic “breccias” is not to be interpreted as a universal quantifier dominating the average computation, but rather the
average is to be performed over the set of breccias. A potential way of interpreting such phrases would be to treat average as a specializer adjective which, when applied to a noun like “concentration,” produces a specialized enumeration function that computes the average. This special interpretation rule would then interpret the class being averaged over in a special mode as a role filler for one of the arguments to the AVERAGE-CONCENTRATION function. However, this approach would lack generality, since it would require a separate interpretation rule and a separate AVERAGE-X function for every averageable measurement X. Instead, one would like to treat average as a general operator that can apply to anything averageable. Doing this, and making it interact correctly with various quantifiers, is handled in the LUNAR system by a mechanism of some elegance and generality. I will describe here the interpretation of averages; the interpretations of sums and other such operators are similar. Note that there are two superficial forms in which the average operator is used: one as a simple adjective modifying a noun (“the average concentration ...”), and one as a noun referring to a function that is explicitly applied to an argument (“the average of concentrations ...”). LUNAR’s grammar standardizes this variation by transforming the first kind of structure into the second (effectively inserting an “of ... PL” into the sentence). As a result, average always occurs in syntactic tree structures as the head noun of a noun phrase with a dependent prepositional phrase whose object has a “NIL ... PL” determiner structure and represents the set of quantities to be averaged. In interpreting such noun phrases, the NRULE invoked by a head noun “average” or “mean” calls for the interpretation of the set being averaged with the special TYPEFLAG SET.
This will result in that node’s being interpreted with a special DRULE D:SETOF, which will construct an intensional set representation for the set being averaged. The data base function AVERAGE knows how to use such an intensional set to enumerate members and compute the average. The NRULE for “average” is

[N:AVERAGE
  (NP.N (MEM 1 (MEAN AVERAGE)))
  (NP.PP (MEM 2 (QUANTITY)))
  → (QUOTE (SEQL (AVERAGE X / (# 2 2 SET))))
].
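The idea of an AVERAGE that consumes an intensional set, i.e. a description it can enumerate on demand rather than an explicit list, can be sketched as follows. This is an illustrative Python reconstruction (using a closure over a generator in place of LUNAR's set representation), and the measurement values are invented:

```python
def setof(enumerate_fn, restriction=lambda x: True):
    """Intensional set: an enumerator paired with a restriction predicate.
    Nothing is computed until the set is actually enumerated."""
    return lambda: (x for x in enumerate_fn() if restriction(x))

def average(intensional_set):
    """Enumerate the members of an intensional set and average them."""
    members = list(intensional_set())
    return sum(members) / len(members)

# Stand-in for (SETOF X / (DATALINE ...) : T) with invented measurements:
measurements = setof(lambda: iter([40.0, 42.0, 44.0]))
```

Because the set is represented intensionally, the same description can be re-enumerated, filtered, or unioned with others before any averaging takes place, which is what makes the general treatment of "average" possible.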
7.7 Short Scope/Broad Scope Distinctions
Another interesting aspect of quantifier nesting is a fairly well-known distinction between so-called short-scope and broad-scope interpretations of
quantifiers. For example, Bohnert and Backer (1967) present an account of the differences between “every” and “any” and between “some” and “a” in contexts such as the antecedents of if-then statements by giving “any” and “some” the broadest possible scope and “every” and “a” the narrowest. For example, using the LUNAR MRL notation:

If any soldier stays home, there is no war
(FOR EVERY x / soldier ; (IF (home x) THEN (not war)))

If every soldier stays home, there is no war
(IF (FOR EVERY x / soldier ; (home x)) THEN (not war))
If some soldier stays home, there is no war
(FOR SOME x / soldier ; (IF (home x) THEN (not war)))

If a soldier stays home, there is no war
(IF (FOR SOME x / soldier ; (home x)) THEN (not war)).

The scope rules of Bohnert and Backer are enforced rules of an artificial language that approximates English and are not, unfortunately, distinctions that are always followed in ordinary English. In ordinary English, only a few such distinctions are made consistently, while in other cases the scoping of quantifiers appears to be determined by which is most plausible (see discussion of plausibility evaluation in Section 10.5).
In LUNAR, a slightly different form of this short/broad scope distinction arose in the interaction of operators like average with universal quantifiers. For example, the sentence “List the average concentration of silicon in breccias” clearly means to average over all breccias, while “List the average concentration of silicon in each breccia” clearly means to compute a separate average for each breccia. (In general, there are multiple measurements to average even for a single sample.) The sentences “List the average concentration of silicon in every breccia” and “List the average concentration of silicon in all breccias” are less clear,
but it seems that the average over all breccias is slightly preferred in these cases. At any rate, the treatment of quantifiers needs to be able to handle the fact that there are two possible relative scopings of the average operator with universal quantifiers, and the fact that the choice is determined at least for the determiner “each” and for the “generic” or NIL-PL determiner. LUNAR handles these scope distinctions for the “average” operator by a general mechanism that applies to any operator that takes a set as its argument. As discussed above, the right-hand side of the N:AVERAGE rule calls for the interpretation of the node representing the set being averaged over with TYPEFLAG SET. This causes a DRULE D:SETOF to be used for interpreting that node. The right-hand side of D:SETOF is

(SETGEN (SETOF X / (# 0 NRULES) : (# 0 RRULES)))

where SETGEN is a function that grabs certain quantifiers coming from subordinate interpretations and turns them into UNION operations instead. The generic quantifier is grabbed by this function and interpreted as a union. However, the quantifier EACH is not grabbed by SETGEN but is passed on up as a dominating quantifier. Thus, the sentence “What is the average concentration of silicon in breccias?” becomes

(FOR THE X4 / (SEQL (AVERAGE X5 / (UNION X7 / (SEQ TYPECS) : T ; (SETOF X6 / (DATALINE X7 OVERALL SIO2) : T)))) : T ; (PRINTOUT X4))

(i.e., the average is computed over the set formed by the union over all type C rocks X7 of the sets of measurements of SIO2 in the individual X7’s). On the other hand, “What is the average concentration of silicon in each breccia?” becomes

(FOR EACH X12 / (SEQ TYPECS) : T ; (FOR THE X9 / (SEQL (AVERAGE X10 / (SETOF X11 / (DATALINE X12 OVERALL SIO2) : T))) : T ; (PRINTOUT X9)))

(i.e., a separate average is computed for each type C rock X12).

7.8 Wh Questions
In addition to simple yes/no questions and imperative commands to print the results of computations, LUNAR handles several kinds of so-called wh questions. Examples are “What is the concentration of silicon in S10046?”, “Which samples contain silicon?”, and “How many samples are there?” These fall into two classes: those in which an interrogative pronoun stands in the place of an entire noun phrase, as in the first example, and those in which an interrogative determiner introduces an otherwise normal noun phrase. In both cases, the noun phrase containing the interrogative word is usually brought to the front of the sentence from the position that it might otherwise occupy in normal declarative word order, but this is not always the case.
7.8.1 Interrogative Determiners
The natural representation of the interrogative determiners would seem to be to treat them just like any other determiner and represent a sentence such as the second example above as
S Q
  NP DET WHQ
     N SAMPLE
     NU PL
  AUX TNS PRESENT
  VP V CONTAIN
     NP NPR SILICON

The interpretation procedure we have described seems to work quite well on this structure using a DRULE that matches the interrogative noun phrase and generates the quantifier

(FOR EVERY X / (# 0 NRULES) : (AND (# 0 RRULES) DLT) ; (PRINTOUT X)).

Note that the DLT in the quantifier (where the interpretation of the main clause is to be inserted) is part of the restriction on the range, and the quantified operator is a command to print out the answer. The structure of the quantifier in this case seems somewhat unusual, but the effect is correct and the operation is a reasonably natural one given the capabilities of the semantic interpreter. However, when we try to apply this kind of analysis to conjoined
sentences, such as “What samples contain silicon and do not contain sodium?”, the standard kind of deep structure assigned by a transformational grammar to conjoined sentences is not compatible with this interpretation. The usual reversal of the conjunction reduction transformations in a transformational grammar would produce a structure something like

S AND
  S Q
    NP DET WHQ
       N SAMPLE
       NU PL
    AUX TNS PRESENT
    VP V CONTAIN
       NP NPR SILICON
  S Q NEG
    NP DET WHQ
       N SAMPLE
       NU PL
    AUX TNS PRESENT
    VP V CONTAIN
       NP NPR SODIUM.
This structure corresponds to the conjunction of the two questions “What samples contain silicon?” and “What samples do not contain sodium?”, which is the interpretation that it would receive by the LUNAR rules with the above DRULE for wh-determiners. However, this is not what the original conjoined question means; the intended question is asking for samples that simultaneously contain silicon and not sodium. In order to handle such sentences, it is necessary to distinguish some constituent that corresponds to the conjunction of the two predicates “contain silicon” and “not contain sodium,” which is itself a constituent of a higher level “what samples” operator. To handle such constructions correctly for both conjoined and nonconjoined constructions, LUNAR’s ATN grammar of English was modified to assign a different structure to wh-determiner questions than the one that is assigned to other determiners. These sentences are analyzed as a special type of sentence, a noun phrase question (NPQ), in which the top level structure of the syntactic representation is that of a noun phrase, and the matrix sentence occurs as a special kind of subsidiary relative clause. For example, the sentence
W. A. WOODS
“Which samples contain silicon?” is represented syntactically as
S NPQ NP DETWHICHQ N SAMPLE NU PL S QREL NP DETWHR N SAMPLE NU PL AUX TNS PRESENT VP VCONTAIN NP DET NIL N SILICON NU SG. This structure provides an embedded S node inside the higher level question, whose interpretation is a predicate with a free variable bound in the question operator above. This embedded S node can be conjoined freely with other S nodes, while remaining under the scope of a single question operator. In this case, the appropriate DRULE (for a wh-determiner in a plural NPQ utterance) is simply [D:WHQ-PL
(NP.DET (AND (MEM 1 WHQ) (EQU 2 PL))) →
(QUANT (FOR EVERY X / (# 0 NRULES) : (# 0 RRULES) ; (PRINTOUT X)))]. Since the matrix sentence has been inserted as a relative clause in the syntactic structure assigned by the grammar, it will be interpreted by the RRULE R:REL in the subordinate interpretation (# 0 RRULES). A similar rule for interpreting singular noun phrases (“which sample contains ...”) produces a quantifier with (quant) = THE, instead of EVERY, thus capturing the presupposition that there should be a single answer. All of the interrogative determiners, “which,” “what,” and “how many,” are treated in the above fashion. The right-hand side of the “how
NATURAL LANGUAGE QUESTION ANSWERING
51
many” rule is (FOR THE X / (NUMBER X / (# 0 NRULES) : (# 0 RRULES)) ; (PRINTOUT X)). Here again, the interpretation of the matrix sentence is picked up in the call (# 0 RRULES). (The use of the same variable name in two different scopes does not cause any logical problems here, so no provision was made in LUNAR to create more than one variable for a given noun phrase.)

7.8.2 Interrogative Pronouns
A general treatment of the interrogative pronouns would require modifications of the assigned syntactic structures similar to the ones discussed above for interrogative determiners in order to handle conjunctions correctly. That is, sentences such as “What turns generic quantifiers into set unions and passes ‘each’ quantifiers through to a higher level?” seem to require an embedded S node to serve as a conjoined proposition inside a single “what” operator. However, it is far more common for conjoined questions with interrogative pronouns to be interpreted as a conjunction of two separate questions. This is especially true for conjoined “what is ...” questions. For example, “What is the concentration of silicon in S10046 and the concentration of rubidium in S10084?” is clearly not asking for a single number that happens to be the value of the concentration in both cases. The LUNAR system contains rules for handling interrogative pronouns only in the special case of “what is ...” questions. In this special case, conjoined questions fall into two classes, both of which seem to be handled correctly without special provisions in the grammar. In questions where the questioned noun phrase contains an explicit relative clause, that clause will contain an S node where conjunctions can be made and LUNAR’s current techniques will treat this as one question with a conjoined restriction (e.g., “What is the sample that contains less than 15% silicon and contains more than 5% nickel?”). On the other hand, when there is no explicit relative clause, LUNAR will interpret such questions as a conjunction of separate questions (e.g., “What is the concentration of silicon in S10046 and the concentration of rubidium in S10084?”). The conventional structure assigned to “what is ...” sentences by a transformational grammar represents the surface object as the deep subject, with a deep verb “be” and predicate complement corresponding to
the interrogative pronoun “what.” For example, in LUNAR the question “What is the concentration of silicon in S10046?” becomes S Q NP
DETTHE N CONCENTRATION NU SG PP PREPOF NP DET NIL N SILICON NU SG PP PREPIN NP NPR S10046 AUX TNS PRESENT VP V BE NP DETWHQ N THING NU SG/PL
A special SRULE for the verb “be” with complement “WHQ THING SG/PL” handles this case with a right-hand side schema:
(QUOTE (PRINTOUT (# 1 1))) where the REF (# 1 1) refers to the subject noun phrase. A somewhat more general treatment of the interrogative pronoun “what” would involve a DRULE whose right-hand side was
(FOR EVERY X / THING : DLT ; (PRINTOUT X)), where the interpretation of the matrix sentence is to be inserted as a restriction on the range of quantification, and the overall interpretation is a command to print out the values that satisfy it. (THING in this case is meant to stand for the universal class.) One would not want to apply this rule in general to the simple “What is ...” questions as above, since it would result in an interpretation that was less efficient (i.e., would enumerate all possible things and try to filter out the answer with an equality predicate). For example, “what is the concentration of silicon in S10046” would be interpreted
(FOR THE X / (DATALINE S10046 OVERALL SI02) : T ; (FOR EVERY Y / THING : (EQUAL X Y) ; (PRINTOUT Y)))
instead of (FOR THE X / (DATALINE S10046 OVERALL SI02) : T ; (PRINTOUT X)). Thus, one would still want to keep the special “what is ...” rule, and LUNAR would only use the general rule in cases where the “what is ...” rule did not apply. (When the “what is ...” rule does apply, it does not even call for the interpretation of the “what” noun phrase that it has matched, so the general rule would not be invoked.) Alternatively, one could use the general rule for all cases and then perform post-interpretive query optimization (see Section 8) to transform instances of filtering with equality predicates to a more efficient form that eliminates the unnecessary quantification.

7.8.3 Other Kinds of Wh Questions

Note that LUNAR interprets “what is ...” questions only as a request for the value of some function or the result of some search or computation, and not as requesting a definition or explanation. For example, if LUNAR is asked “what is a sample” it will respond with an example (e.g., “S10046”), and if it is asked “what is S10046,” it will respond “S10046.” LUNAR is not aware of the internal structure of the defining procedures for its terms, nor does it have any intensional description of what samples are, so it has no way of answering the first type of question. There is no difficulty, however, in defining another rule for “what is ...” to apply to proper nouns and produce an interpretation with an operator NAME-CLASS (instead of PRINTOUT) that will print the class of an individual instead of its name. “What is S10046?” would then be interpreted as (NAME-CLASS S10046), which would answer “a sample.” Getting LUNAR to say something more complete about how S10046 differs from other samples, such as “a sample that contains a large olivine inclusion,” is another matter.
Among other problems, this would begin to tread into the area of pragmatics, where considerations such as the user’s probable intent in asking the question and appropriateness of response in a particular context, as well as semantic considerations of meaning, become an issue (see Section 11.5). All of this is well beyond the scope of systems like LUNAR. However, deciding what semantic representation to assign as the intent of such a question is not nearly as
difficult as deciding what the defining procedure for some of the possible intents should be. LUNAR’s mechanisms are suitable for generating the alternative possible semantic representations.
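The equality-filter optimization mentioned in the discussion of “what is ...” questions above can be sketched as a small tree rewrite over MRL expressions. This is a hypothetical Python rendering in which MRL expressions are nested tuples; LUNAR itself did not use this representation:

```python
# MRL expressions as nested tuples:
# ("FOR", quant, var, class, restriction, body).
def substitute(expr, var, value):
    # Replace every occurrence of var in expr by value.
    if expr == var:
        return value
    if isinstance(expr, tuple):
        return tuple(substitute(e, var, value) for e in expr)
    return expr

def optimize(expr):
    # Rewrite (FOR EVERY Y / THING : (EQUAL X Y) ; body) into body[Y := X],
    # eliminating the enumeration over the universal class.
    if (isinstance(expr, tuple) and len(expr) == 6
            and expr[:2] == ("FOR", "EVERY") and expr[3] == "THING"
            and isinstance(expr[4], tuple) and expr[4][0] == "EQUAL"
            and expr[4][2] == expr[2]):
        return substitute(expr[5], expr[2], expr[4][1])
    if isinstance(expr, tuple):
        return tuple(optimize(e) for e in expr)
    return expr

# The inefficient interpretation of "what is the concentration of silicon
# in S10046" from the text:
query = ("FOR", "THE", "X", ("DATALINE", "S10046", "OVERALL", "SI02"), "T",
         ("FOR", "EVERY", "Y", "THING", ("EQUAL", "X", "Y"),
          ("PRINTOUT", "Y")))

assert optimize(query) == (
    "FOR", "THE", "X", ("DATALINE", "S10046", "OVERALL", "SI02"), "T",
    ("PRINTOUT", "X"))
```

The rewrite turns the nested filtering quantifier into the direct (PRINTOUT X) form that the special rule would have produced in the first place.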
8. Post-Interpretive Processing
As mentioned before, the LUNAR meaning representation language has been designed both as a representation of executable procedures and as a symbolic structure that can be manipulated as an intensional object. Although every expression in the LUNAR MRL has an explicit semantics defined by its straightforward execution as a procedure, that procedure is frequently not the best one to execute to answer a question or carry out a command. For example, in the flight schedules applications, the literal interpretation of the expression (FOR EVERY X / FLIGHT : (CONNECT X BOSTON CHICAGO) ; (PRINTOUT X)) is to enumerate all of the flights known to the system, filtering out the ones that do not go from Boston to Chicago, and printing out the rest. However, in a reasonable data base for this domain, there would be various indexes into the flights, breaking them down by destination city and city of origin. If such an index exists, then a specialized enumeration function FLIGHT-FROM-TO could be defined for using the index to enumerate only flights from a given city to another. In this case, the above request could be represented as
(FOR EVERY X / (FLIGHT-FROM-TO BOSTON CHICAGO) : T ; (PRINTOUT X)), which would be much more efficient to execute. Given the possibility of using specialized enumeration functions, one can then either write special interpretation rules to use the more specific enumeration function in the cases where it is appropriate, or one can perform some intensional manipulations on the interpretation assigned by the original rules to transform it into an equivalent expression that is more efficient to execute. The first approach was used in the original flight schedules system. An approach similar to the latter was used in the grammar information system, and to some extent in LUNAR, by
using “smart” quantifiers (see below). Recently, Reiter (1977) has presented a systematic treatment of a class of query optimizations in systems like LUNAR that interface to a relational data base. Other post-interpretive operations on the MRL expression are performed in LUNAR to analyze the quantifiers and make entries in a discourse directory for potential antecedents of anaphoric expressions. Subsequently, definite descriptions and pronouns can make reference to this directory to select antecedents. I will not go into the treatment of anaphoric expressions in this paper other than to say that the search for the antecedent is invoked by an operator ANTEQUANT in the right-hand side of the DRULES that interpret anaphoric noun phrases. In general, this results in the generation of a quantifier, usually a copy of the one that was associated with the antecedent. Occasionally, the antecedent will itself fall in the scope of a higher quantifier on which it depends, in which case such governing quantifiers will also be copied and incorporated into the current interpretation. Some of the characteristics of LUNAR’s treatment of anaphora are covered in Nash-Webber (1976) and Woods et al. (1972).

8.1 Smart Quantifiers
In the grammar information system, the notion of a “smart” quantifier was introduced: rather than blindly executing the quantification procedure obtained from semantic interpretation, a smart quantifier made an effort to determine whether there was a more specific enumeration function that could be used to obtain an equivalent answer. In general, the restriction on the range of quantification determines a subclass of the class over which quantification is ranging. If one can find a specialized enumeration function that enumerates a subclass of the original class but is still guaranteed to include any of the members that would have passed the original restriction, then that subclass enumeration function can be used in place of the original. In the grammar information system, tables of specialized enumeration functions, together with sufficient conditions for their use, were stored associated with each basic class over which quantification could range. A resolution theorem prover à la Robinson (1965) was then used to determine whether the restriction of a given quantification implied one of the sufficient conditions for a more specialized class enumeration function. If so, the more specialized function was used. Unlike most applications of resolution theorem proving, the inferences required in this case are all very short, and since the purpose of the inference is to
improve the efficiency of the quantification, a natural bound can be set on the amount of time the theorem prover should spend before the attempt should be given up and the original enumeration function used. In general, sufficiency conditions for specialized enumeration functions are parameterized with open variables to be instantiated during the proof of the sufficiency condition and then used as parameters for the specialized enumeration function. The resolution theorem proving strategies have a nice feature of providing such instantiated parameters as a result of their proofs; e.g., by using a mechanism such as the “answer” predicate of Green (1969). Smart quantifiers were intended in general to be capable of other operations, such as estimating the cost of a computation from the sizes of the classes being quantified over and the depth of quantifier nesting (and warning the user if the cost might be excessive), saving the results of inner loop quantifications where they could be reused, interchanging the scopes of quantification to bring things that do not change outside a loop, etc. The capabilities actually implemented, however, are much more limited.
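As a concrete illustration of substituting a specialized enumeration function, the flight schedules example from the beginning of this section can be sketched in Python. The data, the function names, and the index layout are all hypothetical; LUNAR and the flight schedules system were implemented in Lisp:

```python
# Toy flight records: (flight_id, origin, destination). Hypothetical data.
FLIGHTS = [
    ("AA1", "BOSTON", "CHICAGO"),
    ("AA2", "BOSTON", "NEWYORK"),
    ("AA3", "CHICAGO", "BOSTON"),
]

# An index by (origin, destination), built once.
INDEX = {}
for f in FLIGHTS:
    INDEX.setdefault((f[1], f[2]), []).append(f)

def connect(x, origin, dest):
    return x[1] == origin and x[2] == dest

def flight_from_to(origin, dest):
    # The specialized enumeration function: uses the index to enumerate
    # only flights from the given origin to the given destination.
    return INDEX.get((origin, dest), [])

# (FOR EVERY X / FLIGHT : (CONNECT X BOSTON CHICAGO) ; ...): filter them all.
naive = [x for x in FLIGHTS if connect(x, "BOSTON", "CHICAGO")]

# (FOR EVERY X / (FLIGHT-FROM-TO BOSTON CHICAGO) : T ; ...): use the index.
indexed = flight_from_to("BOSTON", "CHICAGO")

assert naive == indexed == [("AA1", "BOSTON", "CHICAGO")]
```

Both forms enumerate the same flights; the indexed form simply never touches the records that the restriction would have filtered out.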
8.1.1 Path Enumeration in ATN's

Smart quantifiers were essential for efficiency in the grammar information system’s enumeration of paths through its ATN. The system contained a variety of specialized path enumeration functions: one for paths between a given pair of states, one for paths leaving a given state, one for paths arriving at a given state, one for paths irrespective of end states, and versions of all of these for looping and nonlooping paths. Each specialized enumeration function was associated with a parameterized sufficiency condition for its use. For example, the function for nonlooping paths leaving a given state had a table entry equivalent to (PATHSEQ Y T) if (AND (NOLOOP X) (START X Y)), where X refers to the variable of the class being quantified over, Y is a parameter to be instantiated, and (PATHSEQ Y T) is the enumeration function to be used if the sufficiency condition is satisfied. Thus, if a quantification over paths had a restriction such as (AND (CONNECT-PATH X S/ S/VP) (NOLOOP X)) and the theorem prover had axioms such as (CONNECT-PATH X Y Z) → (START X Y), then
the theorem prover would infer that the sufficiency condition (AND (NOLOOP X) (START X Y)) is satisfied with Y equal to S/ and therefore the specialized enumeration function (PATHSEQ S/ T) can be used. Notice that the order of conjuncts in the restriction is irrelevant, and the restriction need only imply the sufficiency condition, not match it exactly. In the above, there are still conditions in the restriction that will have to be checked as a filter on the output of the specialized enumeration function to make sure that the end of the path is at state S/VP. In general, it would be nice to remove from the restriction that portion that is already guaranteed to be satisfied by the new enumeration function, but that is easier said than done. In the grammar information system the original restriction was kept and used unchanged.
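This sufficiency-condition proof can be sketched in miniature. In the Python sketch below, predicates are tuples, lowercase symbols are pattern variables to be instantiated (the parameter Y of the text appears as "y"), and a single round of forward chaining stands in for the resolution prover, which of course handled much more than one-step inferences:

```python
# The restriction on a quantification over paths, with variable "X".
restriction = [("CONNECT-PATH", "X", "S/", "S/VP"), ("NOLOOP", "X")]

# One-step axioms, e.g. (CONNECT-PATH X Y Z) -> (START X Y).
axioms = [(("CONNECT-PATH", "x", "y", "z"), ("START", "x", "y"))]

def match(pattern, fact, env=None):
    # Match a predicate pattern against a fact, extending the binding env.
    env = dict(env or {})
    if len(pattern) != len(fact):
        return None
    for p, f in zip(pattern, fact):
        if p.islower():                      # pattern variable
            if env.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return env

def derive(facts):
    # One round of forward chaining with the one-step axioms.
    out = list(facts)
    for fact in facts:
        for lhs, rhs in axioms:
            env = match(lhs, fact)
            if env is not None:
                out.append(tuple(env.get(t, t) for t in rhs))
    return out

def prove(condition, facts):
    # Match each conjunct of a sufficiency condition against some derived
    # fact, accumulating parameter bindings (much as Green's "answer"
    # predicate returns instantiations from a proof).
    env = {}
    for conj in condition:
        for fact in facts:
            e = match(conj, fact, env)
            if e is not None:
                env = e
                break
        else:
            return None
    return env

env = prove([("NOLOOP", "X"), ("START", "X", "y")], derive(restriction))
assert env == {"y": "S/"}    # so (PATHSEQ S/ T) can replace the enumeration
```

As in the text, the order of the conjuncts in the restriction does not matter, and the binding returned for the parameter is exactly what is needed to call the specialized enumeration function.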
8.1.2 Document Retrieval in LUNAR
In the LUNAR system, a special case of smart quantifiers, without a general theorem prover, is used to handle enumeration of documents about a topic. When the FOR function determines that the class of objects being enumerated is DOCUMENT, it looks for a predicate (ABOUT X TOPIC) in the restriction (possibly in the scope of a conjunction but not under a negative). It then uses this topic as a parameter to an inverted file accessing routine which retrieves documents about a given topic.
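This special case can be sketched as a simple scan over the restriction (a hypothetical Python rendering; the restriction representation and the topic constants are invented):

```python
# Search a restriction for an (ABOUT X topic) predicate, descending through
# conjunctions but not through negations, and return the topic found.
def find_about(expr):
    if not isinstance(expr, tuple):
        return None
    if expr[0] == "ABOUT":
        return expr[2]                     # the topic argument
    if expr[0] == "AND":
        for sub in expr[1:]:
            topic = find_about(sub)
            if topic is not None:
                return topic
    return None                            # NOT (and anything else): no topic

restriction = ("AND", ("ABOUT", "X", ("QUOTE", "FERROMAGNETISM")),
                      ("NOT", ("ABOUT", "X", ("QUOTE", "TEKTITES"))))
assert find_about(restriction) == ("QUOTE", "FERROMAGNETISM")
```

The topic returned would then parameterize the inverted file access in place of enumerating the whole DOCUMENT class.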
8.2 Printing Quantifier Dependencies
The LUNAR MRL permits the natural expression of fairly complex requests such as “What is the average aluminum concentration in each of the type C rocks?” The interpretation of this request would be (FOR EVERY X / (SEQ TYPECS) : T ; (FOR THE Y / (AVERAGE Z / (DATALINE X OVERALL AL203)) : T ; (PRINTOUT Y))). If the PRINTOUT command does nothing more than print out a representation for the value of its argument, the result of this command will be nothing more than a list of numbers, with no indication of which number goes with which of the rocks. Needless to say, this is usually not what the user expected.
For special classes of objects, say concentrations, a pseudo-solution to this problem would be to adopt a strategy of always printing out all conceivable dependencies for that object (e.g., the sample, phase, and element associated with that concentration). This would be sufficient to indicate what dependencies each answer had on values of arguments, but would take no account of which of those dependencies was currently varying and which were fixed by the request. Moreover, this approach would not work in the above case, since the objects being printed are the results of a general purpose numerical averaging function, which does not necessarily have any dependencies, depending on what is being averaged and what classes are being averaged over. LUNAR contains a general solution to this quantifier dependency problem that is achieved by making the PRINTOUT command an opaque operator that processes its argument in a semi-intelligent way as an intensional object. PRINTOUT examines its argument for the occurrence of free variables. If the argument is itself a variable, it looks up the corresponding governing quantifier in the discourse directory (the same directory used for antecedents of anaphoric expressions) and checks that quantifier for occurrences of free variables. If it finds free variables in either place, it means that the object it is about to print has a dependency on those variables. In that case it prints out the current values of those variables along with the value that it is about to print out. In the case of the example above, the variable Y has the corresponding class specification (DATALINE X OVERALL AL203) with restriction T, and is thus dependent on the variable X, which is ranging over the rocks. As a result, the printout from this request would look like

S10018 12.48 PCT
S10019 12.80 PCT
S10021 12.82 PCT

This mechanism works for arbitrary nesting of any number of quantifiers.
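The dependency check can be sketched in Python. The directory contents, variable names, and data values below are all invented; LUNAR's actual PRINTOUT operated on MRL expressions in Lisp:

```python
# A toy reconstruction of the dependency-aware PRINTOUT.
DIRECTORY = {"Y": ("DATALINE", "X", "OVERALL", "AL203")}  # var -> class spec
BINDINGS = {"X": "S10018", "Y": "12.48 PCT"}              # current loop values
VARS = set(BINDINGS)

def free_vars(expr):
    # Collect quantified variables occurring anywhere in an MRL tuple.
    if isinstance(expr, tuple):
        found = set()
        for e in expr:
            found |= free_vars(e)
        return found
    return {expr} if expr in VARS else set()

def printout(arg):
    # If the argument is itself a variable, check its governing quantifier's
    # class specification for free variables; the value to print depends on
    # the current values of those variables, so print them as a prefix.
    if isinstance(arg, str) and arg in DIRECTORY:
        deps = free_vars(DIRECTORY[arg]) - {arg}
    else:
        deps = free_vars(arg)
    prefix = [BINDINGS[v] for v in sorted(deps)]
    return " ".join(prefix + [BINDINGS.get(arg, str(arg))])

assert printout("Y") == "S10018 12.48 PCT"
```

Because Y's class specification mentions X, the printed value is prefixed with the current value of X, giving one line of the sample/percentage listing shown above.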
9. An Example
As an example of the overall operation of the semantic interpreter to review and illustrate the preceding discussions, consider the sentence “What is the average modal plagioclase concentration for lunar samples that contain rubidium?”
This sentence has the following syntactic structure assigned to it by the grammar: S Q NP
DETTHE N AVERAGE NU SG PP PREPOF NP DET NIL ADJ MODAL ADJ N PLAGIOCLASE N CONCENTRATION NU PL PP PREPFOR NP DET NIL ADJ LUNAR N SAMPLE NU PL S REL NP DETWHR N SAMPLE NU PL AUX TNS PRESENT VP VCONTAIN NP DET NIL N RUBIDIUM AUX TNS PRESENT NU SG VP V BE NP DETWHQ N THING NU SG/PL.
Semantic interpretation begins with a call to INTERP looking at the topmost S node with TYPEFLAG NIL. The function RULES looking at an S node with TYPEFLAG NIL returns the global rule tree PRERULES. These rules look for such things as yes/no question markers, sentential negations, etc. In this case, a rule PR6 matches and its right-hand side, (PRED (# 0 SRULES)), specifies a call to INTERP for the same node with TYPEFLAG SRULES. The function RULES looking at the S node with TYPEFLAG SRULES
returns a rule tree which it gets from the dictionary entry for the head of the sentence (the verb BE), and in this case a rule S :BE-WHAT matches. Its right-hand side is
(PRED (PRINTOUT (# 1 1))) specifying a schema into which the interpretation of the subject noun phrase is to be inserted. The semantic interpreter now begins to look at the subject noun phrase with TYPEFLAG NIL. In this case, RULES is smart enough to check the determiner THE and return the rule tree:
(D:THE-SG2 NIL D:THE-SG NIL D:THE-PL) of which, the rule D: THE-SG matches successfully. The right-hand side of this rule is
(QUANT (FOR THE X / (# 0 NRULES) : (# 0 RRULES) ; DLT)) which specifies that a quantifier is to be constructed by substituting in the indicated places the interpretations of this same node with TYPEFLAGS NRULES and RRULES. The call to interpret the subject noun phrase with TYPEFLAG NRULES finds a list of NRULES in the dictionary entry for the word “average,” consisting of the single rule N:AVERAGE. This rule, which we presented previously in Section 7.6, has a right-hand side
(QUOTE (SEQL (AVERAGE X / (# 1 1 SET) ))) which calls for the interpretation of the “concentration” noun phrase with TYPEFLAG SET. The call to interpret the “average” node with TYPEFLAG RRULES, which will be done later, will result in the empty restriction T. The call to interpret the “concentration” noun phrase with TYPEFLAG SET uses a list of rules (D:SETOF NIL D:NOT-SET) where D : SETOF, which has been discussed previously in Section 7.7, checks for a determiner and number consistent with a set interpretation (i.e., determiner THE or NIL and number PL) and D:NOT-SET will match anything else. In this case, D : SETOF matches, with right-hand side
(SETGEN (SETOF X / (# 0 NRULES) : (# 0 RRULES))) and calls for the interpretation of the same node with TYPEFLAGs NRULES and RRULES. The call with NRULES finds a matching rule N:MODAL-CONC after failing to match N:CONCENTRATION because of the presence of the adjective MODAL, which is rejected by a negated template. N:MODAL-CONC is used to interpret modal concentrations of minerals in samples as a whole, and has the form [N:MODAL-CONC (NP.N (MEM 1 (CONCENTRATION))) (OR (NP.PP (MEM 2 (SAMPLE))) (NP.PP.PP (MEM 2 (SAMPLE))) (DEFAULT (2 NP (DET EVERY) (N SAMPLE) (NU SG)))) (OR (NP.PP (MEM 2 (PHASE MINERAL ELEMENT OXIDE ISOTOPE))) (NP.ADJ#2 (MEM 2 (PHASE MINERAL ELEMENT OXIDE ISOTOPE)))) → (QUOTE (DATALINE (# 2 2) OVERALL (# 3 2)))]. (DEFAULT is a special kind of template that always succeeds and that makes explicit bindings for use in the right-hand side. In the above case, if the “concentration” noun phrase had not mentioned a sample, then the default “every sample” would be assumed.) N:MODAL-CONC in turn calls for the interpretations of the “sample” noun phrase and the constituent “rubidium.” In interpreting the “sample” noun phrase, it again goes through the initial cycle of DRULES selected by TYPEFLAG NIL looking at a noun phrase, in this case finding a matching rule D:NIL whose right-hand side is (QUANT (FOR GEN X / (# 0 NRULES) : (# 0 RRULES) ; DLT)). This in turn invokes an NRULES interpretation of the same phrase, which uses the rule tree (N:TYPEA N:TYPEB N:TYPEC N:TYPED NIL N:SAMPLE) that looks first for any of the specific kinds of samples that might be referred to, and failing any of these, tries the general rule N:SAMPLE. N:SAMPLE checks for the head “sample” with an optional adjective “lunar” or the complete phrase “lunar material” and has a right-hand side (QUOTE (SEQ SAMPLES)), where SEQ is the general enumeration function for known lists, and SAMPLES is a list of all the samples in the data base. The RRULES interpretation uses the rule tree ((AND R:SAMPLE-WITH R:SAMPLE-WITH-COMP R:QREL R:REL R:PP R:ADJ)), which contains a single AND group of rules, all of which are to be tried and the results of any successful matches conjoined.
The rule R:REL matches the relative clause, tagging the relative pronoun with the variable of interpretation X13 and then calling for the interpretation of the relative
clause via the right-hand side (PRED (# 1 1)). The interpretation of the relative clause, like that of the main clause, begins with a set of PRERULES, of which a rule PR6 matches with right-hand side (PRED (# 0 SRULES)). This again calls for the interpretation of the same node with TYPEFLAG SRULES. This interpretation finds the rule S:CONTAIN (presented earlier in Section 6), whose right-hand side calls for the interpretation of its subject noun phrase (which it finds already interpreted with the variable of quantification from above) and its object noun phrase “rubidium.” The latter is interpreted by a rule D:MASS, whose right-hand side looks up the word “rubidium” in the dictionary to get its standard data base representation RB (from a property name TABFORM) and produces the interpretation (QUOTE RB). As a SEM-QUANT pair, this is ((QUOTE RB) DLT). This interpretation, together with that of the relative pronoun, is returned to the process interpreting the “contain” clause, where they produce (after substitution and right-hand side evaluation) the SEM-QUANT pair ((CONTAIN X13 (QUOTE RB)) DLT). This same SEM-QUANT pair is returned unchanged by the R:REL rule, and since that is the only matching RRULE, no conjoining needs to be done to obtain the result of the RRULES interpretation of the “sample” noun phrase. Inserting this and the NRULES interpretation into the right-hand side of D:NIL, and executing, produces the SEM-QUANT pair
(X13 (FOR GEN X13 / (SEQ SAMPLES) : (CONTAIN X13 (QUOTE RB)) ; DLT)) where the right-hand side evaluation of the QUANT operator has embedded the quantifier in the QUANT accumulator and returned the SEM X13. We now return to the NRULES interpretation of the “concentration” noun phrase, whose right-hand side called for the above interpretation and now calls for the interpretation of “plagioclase.” Again, the D:MASS rule applies, looking up the TABFORM of the word in the dictionary and resulting in the SEM-QUANT pair ((QUOTE PLAG) DLT).
The substitution of these two into the right-hand side of the rule N:MODAL-CONC (and evaluating) produces the SEM-QUANT pair: ((DATALINE X13 OVERALL (QUOTE PLAG)) (FOR GEN X13 / (SEQ SAMPLES) : (CONTAIN X13 (QUOTE RB)) ; DLT))
where the quantifier from below is still being passed up. The RRULES interpretation of the “concentration” noun phrase produces T, since there are no predicating modifiers, and the insertion of these two into the right-hand side of the rule D:SETOF produces (SETGEN (SETOF X12 / (DATALINE X13 OVERALL (QUOTE PLAG)) : T))
while the quantifier accumulator QUANT contains the quantifier (FOR GEN X13 / (SEQ SAMPLES) : (CONTAIN X13 (QUOTE RB)) ; DLT). The execution of the function SETGEN grabs the generic quantifier from the register QUANT, leaving QUANT set to DLT, and produces the SEM (UNION X13 / (SEQ SAMPLES) : (CONTAIN X13 (QUOTE RB)) ; (SETOF X12 / (DATALINE X13 OVERALL (QUOTE PLAG)) : T)).
The quantification over samples has now been turned into a union of sets of data lines over a set of samples. The resulting SEM and QUANT are returned to the process that is interpreting the “average” phrase, where the insertion into the right-hand side of the rule N:AVERAGE and subsequent evaluation yields the SEM-QUANT pair ((SEQL (AVERAGE X11 / (UNION X13 / (SEQ SAMPLES) : (CONTAIN X13 (QUOTE RB)) ; (SETOF X12 / (DATALINE X13 OVERALL (QUOTE PLAG)) : T)))) DLT).
Interpretation of the “average” phrase with TYPEFLAG RRULES produces the SEM-QUANT pair (T DLT), and the insertion of this and the above into the right-hand side of the DRULE D:THE-SG and evaluating yields the SEM-QUANT pair (X11 (FOR THE X11 / (SEQL (AVERAGE X11 / (UNION X13 / (SEQ SAMPLES) : (CONTAIN X13 (QUOTE RB)) ; (SETOF X12 / (DATALINE X13 OVERALL (QUOTE PLAG)) : T)))) : T ; DLT)).
This is returned to the SRULE S:BE-WHAT, where the SEM X11 is embedded in the right-hand side to produce (PRED (PRINTOUT X11)). Evaluating this expression grabs the quantifier to produce the new SEM, which the next higher rule, PR6, passes on unchanged as the final interpretation: (FOR THE X11 / (SEQL (AVERAGE X11 / (UNION X13 / (SEQ SAMPLES) : (CONTAIN X13 (QUOTE RB)) ; (SETOF X12 / (DATALINE X13 OVERALL (QUOTE PLAG)) : T)))) : T ; (PRINTOUT X11)).
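The final interpretation can be executed directly. The following Python sketch evaluates it over a toy data base; the sample numbers reuse the text's, but all concentration values and contents are invented, and the function names are merely Python stand-ins for the MRL operators:

```python
# Toy data base: each sample's contained constituents and its overall
# plagioclase concentration. All values are invented for illustration.
DB = {
    "S10046": {"contains": {"RB", "OLIV"}, "PLAG": 30.0},
    "S10084": {"contains": {"OLIV"},       "PLAG": 50.0},
    "S10018": {"contains": {"RB"},         "PLAG": 40.0},
}
SAMPLES = list(DB)

def contain(sample, constituent):
    return constituent in DB[sample]["contains"]

def dataline(sample, phase, constituent):
    # One measurement per (sample, constituent) in this toy data base.
    return [DB[sample][constituent]]

def union(cls, restriction, setexpr):
    # (UNION X / class : restriction ; set-expr): union over a class.
    out = []
    for x in cls:
        if restriction(x):
            out.extend(setexpr(x))
    return out

def average(xs):
    return sum(xs) / len(xs)

# (FOR THE X11 / (SEQL (AVERAGE X11 / (UNION X13 / (SEQ SAMPLES) :
#     (CONTAIN X13 (QUOTE RB)) ;
#     (SETOF X12 / (DATALINE X13 OVERALL (QUOTE PLAG)) : T)))) : T ;
#  (PRINTOUT X11))
lines = union(SAMPLES,
              lambda x13: contain(x13, "RB"),
              lambda x13: dataline(x13, "OVERALL", "PLAG"))
answer = average(lines)     # the THE quantifier over the singleton SEQL
assert answer == 35.0
```

On this toy data the union collects the plagioclase data lines of the two rubidium-bearing samples, and the THE quantifier delivers their average as the value to be printed.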
10. Loose Ends, Problems, and Future Directions
The techniques that I have described make a good start in handling the semantic interpretation of quantification in natural English, especially in the interaction of quantifiers with each other, with negatives, and with operators like “average.” However, problems remain. Some reflect LUNAR’s status as an intermediate benchmark in an intended ongoing project. Others reflect the presence of some difficult problems that LUNAR would eventually have had to come up against. In the remaining sections, I will discuss some of the limitations of LUNAR’s techniques, problems left unfaced, and trends and directions for future work in this area.

10.1 Approximate Solutions
One characteristic of some of the techniques used in LUNAR and many other systems is that they are only approximate solutions. A good example of an approximate solution to a problem is illustrated by LUNAR’s use of the head word of a constituent as the sole source of features for the testing of semantic conditions in the left-hand sides of rules. To be generally adequate, it seems that semantic tests should be applied to the interpretation of a phrase, not just its syntactic structure (and especially not just its head). Some of the problems with the approximate approach became apparent when LUNAR first began to handle conjoined phrases. For example, its simple semantic tests were no longer adequate when, instead of a single noun phrase of type X, a conjunction was encountered. This was due to a prior decision that the head of a conjoined phrase should be the conjunction operator (e.g., AND), since a constituent should have a unique head and there is no other unique
candidate in a coordinate conjunction. However, since a conjunction operator would never have the semantic features expected by a rule, selectional restrictions applied to the head would not work. A possible solution to this problem is to define the semantic features of a conjoined phrase to be the intersection of the features of its individual conjuncts. This has the attractive feature of enforcing some of the well-known parallelism constraints on conjunctions in English (i.e., conjoined constituents should be of like kind or similar in some respect). However, this solution is again only an approximation of what is required to fully model parallelism constraints. For example, it does not consider factors of size or complexity of the conjuncts. Further experience with such a model will almost certainly uncover still more problems. Another example where obtaining the features from the head alone is inadequate involves noun phrases in which an adjective modifying the head contributes essential information (e.g., obtaining a feature +TOY from the phrase “toy gun”). In general, semantic selectional restrictions seem to require intensional models of potential referents rather than just syntactic structures. (In fact, their applying to such models is really the only justification for calling such constraints “semantic.”) In my paper “Meaning and Machines” (Woods, 1973c), I discuss more fully the necessity for invoking models of semantic reference for correctly dealing with such restrictions. More seriously, the whole treatment of selectional restrictions as prerequisites for meaningfulness is not quite correct, and the details of making selectional restrictions work correctly in various contexts such as modal sentences (especially assertions of impossibility) are far from worked out.
For example, there is nothing wrong with the assertion “Rocks cannot love people” even if there seems to be something odd about “the rock loved John.” Again, Woods (1973c) discusses such problems more fully.

10.2 Modifier Placement
Another area in which LUNAR’s solution to a problem was less than general is in the interpretation of modifiers that are syntactically ambiguous as to what they modify. For example, in the sentence “Give me the average analysis of breccias for all major elements,” there are at least three syntactic possibilities for the modifier “for all major elements” (it can modify the phrases headed by “breccias,” “analysis,” or “give”). In this case, our understanding of the semantics of the situation tells us that it modifies “analysis,” since one can analyze a sample for an element, while “breccias for all major elements” does not make sense.
Without a semantic understanding of the situation, the computer has no criteria for selecting which of these three cases to use. One of the roles that one might like the syntactic component to play in a language understanding system would be to make the appropriate grouping of a movable modifier with the phrase it modifies, so that the subsequent semantic interpretation rules will find the constituent where they would like it to be. However, since there is not always enough information available to the parser to make this decision on the basis of syntactic information alone, this would mean requiring the parser to generate all of the alternatives, from which the semantic interpreter would then make the choice. This in turn would mean that the interpreter would have to spend effort trying to interpret a wrong parsing, only to have to throw it away and start over again on a new one. It would be better for the parser to call upon semantic knowledge earlier in the process, while it is still trying to enumerate the alternative possible locations for the movable modifier. The question it would ask at this point would simply be whether a given phrase can take the kind of modifier in question, rather than a complete attempt to interpret each possibility.

10.2.1 Selective Modifier Placement
In general, the ATN grammars used in LUNAR tend to minimize the amount of unnecessary case analysis of alternative possible parsings by keeping common parts of different alternatives merged until the point in the sentence is reached where they make different predictions. At such a point, the choice between alternatives is frequently determined by having only one of their predictions satisfied. However, one place where this kind of factoring does not significantly constrain the branching of possibilities is at the end of a constituent where the grammar permits optional additional modifiers (e.g., prepositional phrase modifiers at the end of a noun phrase, as in the above example). Here, the alternatives of continuing to pick up modifiers at the same level and popping to a higher level have to be considered separately. If, when the alternative of popping a constituent is chosen, the construction at the higher level can also take the same kind of modifier as the lower constituent, then a real ambiguity will result unless some restriction makes the modifier compatible with only one of the alternatives. The LUNAR parser contains a facility called "selective modifier placement" for dealing with such "movable modifiers." When this facility is enabled, each time a movable modifier is constructed, the parser returns to the level that pushed for it to see if the configuration that caused the
push could also have popped to a higher level and, if so, whether that higher level could also have pushed for the same thing. It repeats this process until it has gathered up all of the levels that could possibly (syntactically) use the modifier. It then asks semantic questions to rank-order the possibilities, choosing the most likely one and generating alternatives for the others. In the classic example "I saw the man in the park with a telescope," the phrase "in the park" could modify either "man" or "see," and "with a telescope" could modify either "park," "man," or "see" (with the possible exception, depending on your dialect, of forbidding "with a telescope" from modifying "man" if "in the park" is interpreted as modifying "see"). The selective modifier placement facility chooses the interpretation "see with a telescope" and "man in the park" when given information that one can see with an optical instrument. Woods (1973a) describes this facility for selective modifier placement more fully.

10.2.2 Using Misplaced Modifiers
Although the selective modifier placement facility in LUNAR's parser is probably very close to the right solution to this problem of movable modifiers, the mechanism as implemented requires the semantic information that it uses to be organized in a slightly different form from that used in the semantic interpretation rules. Rather than duplicate the information, LUNAR's demonstration prototype used a different approach. In this system, the grammar determined an initial placement of such modifiers based solely on what prepositions a given head noun could take as modifiers. Subject to this constraint, the movable modifier was parsed as modifying the nearest preceding constituent (i.e., as deep in the parse tree as permitted by the constraint). Subsequently, during interpretation, if the semantic interpreter failed to find a needed constituent at the level it wanted it, it would look for it attached to more deeply embedded levels in the tree. If this procedure for looking for misplaced modifiers had been handled by a general mechanism for looking for misplaced constituents subject to appropriate syntactic and semantic guidance, it would provide an alternative approach of comparable generality to selective modifier placement, raising an interesting set of questions as to the relative advantages of the two approaches. In the demonstration prototype, however, it was handled by the simple expedient of using disjunctive templates in the rules to look for a constituent in each of the places where it might occur. Each rule thus had to be individually tailored to look for its needed constituents wherever they might occur. Problems were also present in
making sure that all modifiers were used by some rule and in avoiding use of the same modifier more than once. A number of such decisions were made in LUNAR for the expedient of getting it working, and are not necessarily of theoretical interest. This particular one is mentioned here because of its suggestion of a possible way to handle a problem, and also to illustrate the difference between solving a problem in general and patching a system up to handle a few cases.

10.3 Multiple Uses of Constituents
Alluded to above in the discussion of LUNAR's method of looking for misplaced modifiers was the potential for several different rules to use the same constituent for different purposes. In general, one expects a given modifier to have only one function in a sentence. However, this is not always the case. For example, an interesting characteristic of the "average" operator is the special use of a prepositional phrase with the preposition "over," which usurps one of the arguments of the function being averaged. Specifically, in "the average concentration of silicon over the breccias," the prepositional phrase "over the breccias" is clearly an argument to the average function, specifying the class of objects over which the average is to be computed. However, it is also redundantly specifying the variable that will fill the constituent slot of the concentration schema, even though it does not have any of the prepositions that would normally specify this slot.

The semantic interpretation framework that the LUNAR system embodies does not anticipate the simultaneous use of a constituent as a part of two different operators in this fashion (although the implemented mechanism does not forbid it). The rules in the implemented LUNAR system deal with this problem (as opposed to solving it) by permitting the prepositional phrase with "over" to modify "concentration" rather than "average." This choice was made because the average operator is interpretable without a specific "over" modifier, whereas the concentration is not interpretable without a constituent whose concentration is being measured. However, this "solution" leaves us without any constraint that "over" can only occur with averages. Consequently, phrases such as "the concentration of silicon over S10046" would be acceptable.
Such lack of constraint is generally not a serious problem in very restricted topic domains and with relatively simple sentences, because users are unlikely to use one of the unacceptable constructions. However, as the complexity of the language increases, especially with the introduction of constructions such
as reduced relative clauses and conjunction reduction, the possibility increases that some of these unacceptable sequences may be posed as partial parsings of an otherwise acceptable sentence, and can either result in unintended parsings or long excursions into spurious garden path interpretations. This kind of ad hoc "solution" to the "average ... over ..." problem is typical of the compromises made in many natural language systems, and is brought up here to illustrate the wrong way to attack a problem. It contrasts strongly with the kinds of general techniques that typify LUNAR's solutions to other problems.

10.4 Ellipsis
Possibly the correct solution to the problem of "average ... over ..." is one that handles a general class of ellipsis: those cases where an argument is omitted because it can be inferred from information available elsewhere in a sentence. In this account, the "over" phrase would be an argument to "average" and the subordinate "concentration" phrase would have an ellipsed specification of the constituent being measured. A similar problem with ellipsis occurs in the flight schedules context, where a sentence such as

    List the departure time from Boston of every TWA flight to Chicago.

would be interpreted literally as asking for the Boston departure times of all TWA flights that go to Chicago, regardless of whether they even go through Boston. To express the intended request without ellipsis, the user would have to say

    List the departure time from Boston of every TWA flight from Boston to Chicago.

As I pointed out in my thesis (Woods, 1967), the information in the semantic rules provides the necessary information for the first step in treating such ellipsis: the recognition that something is missing. Capitalizing on this, however, requires a rule-matching component that is able to find and remember the closest matching rule when no rule matches fully, and to provide specifications of the missing pieces to be used by some search routine that tries to recover the ellipsis. This latter routine would have to examine the rest of the structure of the sentence, and perhaps some of the discourse history, to determine if there are appropriate contextually specified fillers to use. Research problems associated with such ellipsis have to do with the resolution of alternative possible fillers that meet the description, finding potential fillers that are not
explicitly mentioned elsewhere but must be inferred, and characterizing the regions of the surrounding context that can legitimately provide antecedents for ellipsis (e.g., can they be extracted out of subordinate relative clauses that do not dominate the occurrence of the ellipsis?).
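The first step described above, finding the closest-matching rule and reporting its missing pieces, might be sketched as follows; the rule inventory, slot names, and recovery routine are invented for illustration and are far simpler than what a real system would need:

```python
# Hedged sketch of ellipsis detection: when no semantic rule matches fully,
# find the rule whose required slots best overlap the slots actually present,
# and hand its missing slots to a recovery routine that searches elsewhere
# in the sentence (or discourse) for contextually supplied fillers.

RULES = {
    # "departure time" needs both a flight and an origin
    "DEPARTURE-TIME": {"FLIGHT", "ORIGIN"},
    "CONCENTRATION":  {"SUBSTANCE", "SAMPLE"},
}

def closest_rule(filled_slots):
    """Return (rule, missing_slots) for the rule whose requirements
    best overlap the slots actually present in the phrase."""
    best, best_overlap = None, -1
    for name, required in RULES.items():
        overlap = len(required & filled_slots)
        if overlap > best_overlap:
            best, best_overlap = name, overlap
    return best, RULES[best] - filled_slots

def recover(missing, context):
    """Try to fill missing slots from fillers mentioned elsewhere."""
    return {slot: context[slot] for slot in missing if slot in context}

# "the departure time of every TWA flight to Chicago": no ORIGIN given,
# but "from Boston" occurs elsewhere in the sentence.
rule, missing = closest_rule({"FLIGHT"})
print(rule, missing)                      # DEPARTURE-TIME {'ORIGIN'}
print(recover(missing, {"ORIGIN": "Boston", "DESTINATION": "Chicago"}))
```

The hard research problems listed above (competing fillers, inferred fillers, legitimate antecedent regions) all live inside a far less trivial version of `recover`.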
10.5 Plausibility of Alternative Interpretations

In general, the correct way to handle many of the potential ambiguities that arise in English seems to be to construct representations of alternative interpretations, or alternative parts of interpretations, and evaluate the alternatives for their relative plausibility. LUNAR does not contain such a facility. Instead, it makes the best effort it can to resolve ambiguities, given what it knows about general rules for preferred parsings, criteria for preferred interpretations, and specific semantic selectional restrictions for nouns and verbs. LUNAR does quite well within these constraints in handling a wide variety of constructions. This is successful largely because of the limited nature of the subject matter and consequent implicit constraints on the kinds of questions and statements that are sensible. However, a variety of phenomena seem to require a more general plausibility evaluator to choose between alternatives. If one had such an evaluator of relative plausibility, the mechanisms used in LUNAR would be adequate to generate the necessary alternatives.
10.6 Anaphoric Reference

Anaphoric reference is another problem area in which LUNAR's treatment does not embody a sufficiently general solution. Every time an interpretation is constructed, LUNAR makes entries in a discourse directory for each constituent that may be subsequently referred to anaphorically. Each entry consists of the original syntactic structure of a phrase, plus a slightly modified form of its semantic interpretation. In response to an anaphoric expression such as "it" or "that sample," LUNAR searches this directory for the most recent possible antecedent and reuses its previous interpretation. LUNAR's anaphoric reference facility is fairly sophisticated, including the ability to refer to an object that is dependent on another quantified object, in which case it will bring forward both quantifiers into the interpretation of the new sentence (e.g., "What is the silicon content of each volcanic sample?" "What is its magnesium concentration?"). It also handles certain cases of anaphora where only part of the intensional description of a previous phrase is reused (e.g., "What is the concentration of silicon in breccias?" "What is it in volcanics?"). However, this facility contains a number of loose ends. One of the most serious is that
only the phrases typed in by the user are available for anaphoric reference, while the potential antecedents implied by the responses of the system are not (responses were usually not expressed in English, and in any case were not entered into the discourse directory). Anaphoric reference in general contains some very deep problems, some of which are revealed in LUNAR. Nash-Webber (1976, 1977), Nash-Webber and Reiter (1977), and Webber (1978) discuss these problems in detail.

10.7 Ill-Formed Input and Partial Interpretation
One of the problems that face a real user of a natural language understanding system is that not everything that he tries to say to the system is understandable to it. LUNAR tried to cope with this problem by having a grammar sufficiently comprehensive that it would understand everything a lunar geologist might ask about its data base. The system actually came fairly close to doing that. In other systems, such as the SOPHIE system of Brown and Burton (1975), this has been achieved even more completely. In a limited topic domain, this can be done by systematically extending the range of the system’s understanding every time a sentence is encountered that is not understood, until eventually a virtual closure is obtained. Unfortunately, in less topic-specific systems, it is more difficult to reach this kind of closure, and in such cases it would be desirable for the system to provide a user with some partial analysis of his request to at least help him develop a model of what the machine does and does not understand. LUNAR contains no facility for such partial understanding, although it does have a rudimentary facility to comment about modifiers that it does not understand in an otherwise understandable sentence and to notify the user of a phrase that it does not understand in a sentence that it has managed to parse but cannot interpret. Given the size of its vocabulary and the extensiveness of its grammar, there are large classes of sentences that LUNAR can parse but not understand. For these, LUNAR will at least inform the user of the first phrase that it encounters that it cannot understand. However, it cannot respond to questions about its range of understanding or be of much help to the user in finding out whether (and, if so, how) one can rephrase a request to make it understandable. More seriously, if a sentence fails to parse (a less common occurrence, but not unusual), LUNAR provides only the cryptic information that it could not parse the input. 
The reason for this is as follows. If the user has used words that are not in its dictionary, LUNAR of course informs him of this fact and the problem is clear. If, however, the user has used known words in a way that does not parse, all LUNAR
knows is that it has tried all of its possible ways to parse the input and none of them succeeded. In general, the parser has followed a large number of alternative parsing paths, each of which has gotten some distance through the input sentence before reaching an inconsistency. LUNAR in fact keeps track of each blocked path, and even knows which one of them has gotten the farthest through the sentence. However, experience has shown that there is no reason to expect this longest partial parse path to be correct. In general, the mistake has occurred at some earlier point, after which the grammar has continued to fit words into its false hypothesis for some unknown distance before an inconsistency arises. Beyond simply printing out the words used in this longest path (letting the user guess what grammatical characteristic of his sentence was unknown to the computer), there is no obvious solution to this problem. In this respect, a language with a deterministic grammar has an advantage over natural English, since there will only be one such parse path. In that case, when the parser blocks, there is no question about which path was best.

Note that there is no problem here in handling any particular case or anticipated situation. Arbitrary classes of grammatical violations can be anticipated and entered into the grammar (usually with an associated penalty to keep them from interfering with completely grammatical interpretations). Such sentences will no longer be a problem. What we are concerned with here requires a system with an understanding of its own understanding, and an ability to converse with a user about the meaning and use of words and constructions. Such a system would be highly desirable, but is far from realization at present. The grammar information system discussed above, which knows about its own grammar and can talk about states and transitions in the grammar, is a long way from being able to help a user in this situation.
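The bookkeeping just described, recording each blocked parse path and how far it got, can be illustrated with a toy recursive-descent parser; the grammar, sentence, and path descriptions below are invented present-day Python, not anything from LUNAR:

```python
# Toy parser (invented grammar, not LUNAR's) that records every blocked
# parse path and the position it reached before hitting an inconsistency.
# A complete sentence parse is one that consumes all the words.

GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["the", "N"], ["the", "N", "PP"]],
    "VP": [["V", "NP"]],
    "PP": [["in", "NP"]],
    "N":  [["sample"], ["crater"]],
    "V":  [["contains"]],
}

blocked = []  # (words consumed, the hypothesis path that failed)

def parse(symbol, words, pos, path):
    if symbol not in GRAMMAR:                     # terminal word
        if pos < len(words) and words[pos] == symbol:
            return [pos + 1]
        blocked.append((pos, path + [symbol]))    # dead end: record it
        return []
    ends = []
    for production in GRAMMAR[symbol]:
        positions = [pos]
        for part in production:
            positions = [q for p in positions
                         for q in parse(part, words, p, path + [symbol])]
        ends.extend(positions)
    return ends

words = "the sample contains the the crater".split()   # mistyped "the the"
ok = [e for e in parse("S", words, 0, []) if e == len(words)]
print(ok)              # [] -- no complete parse; all LUNAR could report
print(max(blocked)[0]) # position where the farthest blocked path stopped
```

Even with this record in hand, as the text notes, the farthest-reaching path is only a guess at where the user's actual mistake lies.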
One technique from the HWIM speech understanding system (Woods et al., 1976) that could help in such a situation is to find maximally consistent islands in the word string using a bidirectional ATN parser that can parse any fragment of a correct sentence from the middle out. One could then search in the regions where such islands abut or overlap for possible transitions that could connect the two. A special case of the ungrammatical sentence problem is the case of a mistyped word. If the misspelling results in an unknown word, then the problem is simple; when LUNAR informs the user of an unknown word, it also gives him the opportunity to change it and continue. However, if the misspelling results in another legal word, then the system is likely to go into the state discussed above, where all parsing paths fail and there is little the system can say about what went wrong. In this
case, the user can probably find his mistake by checking the sentence he has typed, but sometimes a mistake will be subtle and overlooked. Again, some of the techniques from the HWIM system could be used here. Specifically, HWIM's dictionary look-up is such that it finds all words that are sufficiently similar to the input acoustics and provides multiple alternatives with differing scores, depending on how well they agree with the input. An identical technique can enumerate possible known words that could have misspellings corresponding to the typed input, with scores depending on the likelihoods of those misspellings. These alternatives would then sit on a shelf to be tried if no parsing using the words as typed were found.

10.8 Intensional Inference
As discussed previously, the LUNAR prototype deals only with extensional inferences, answering questions with quantifiers by explicitly enumerating the members of the range and testing propositions for individual members. LUNAR contains a good set of techniques for such inference, such as the use of general enumeration functions and smart quantifiers. However, although this is a very efficient mode of inference, it is not appropriate for many types of questions. The ability to deal with more complex types of data entities, even such specialized things as descriptions of shape and textural features of the lunar samples, will require the use of intensional inference procedures. For this reason, LUNAR’s MRL was designed to be compatible with both intensional and extensional inference. Intensional inference is necessary for any type of question whose answer requires inference from general facts, rather than mere retrieval or aggregation of low-level observations. In particular, it is necessary in any system that is to accept input of new information in anything other than a rigid stylized format. Although LUNAR contained some rudimentary facilities for adding new lines to its chemical analysis data base and for editing such entries, it contained no facility for understanding, storing, or subsequently using general facts and information. For example, a sentence such as “All samples contain silicon” is interpreted by LUNAR as an assertion to be tested and either affirmed or denied. It is not stored as a fact to be used subsequently. However, there is nothing in LUNAR’s design that prohibits such storage of facts. In particular, a simple PRERULE for declarative sentences with a right-hand side (PRED (STORE (# 0 SRULES))) could generate interpretations that would store facts in an intensional data base (where STORE is assumed to be a function that stores facts in an intensional data base).
The function STORE could interface to any mechanical inference system to store its argument as an axiom or rule. For example, with a resolution theorem-proving system such as Green's QA3 (Green, 1969), STORE could transform its argument from its given (extended) predicate calculus form into clause form and enter the resulting clauses into an indexed data base of axioms. TEST could then be extended to try inferring the truth of its argument proposition from such axioms either prior to, or after, attempting to answer the question extensionally. TEST could in fact be made smart enough to decide which mode of inference to try first on the basis of characteristics of the proposition being tested. Moreover, procedures defining individual predicates and functions could also call the inference component directly. For example, the predicate ABOUT that relates documents to topics could call the inference facility to determine whether a document is about a given topic due to one of its stored topics subsuming or being subsumed by the one in question.

The incorporation of intensional inference into the LUNAR framework is thus a simple matter of writing a few interfacing functions to add axioms to, and call for inferences from, some mechanical inference facility (assuming one has the necessary inference system). The problem of constructing such an inference facility to efficiently handle the kinds of inferences that would generally be required is not trivial, but that is another problem, beyond the scope of this paper. A number of other natural language systems have capabilities for natural language input of facts (e.g., Winograd, 1972), but few have very powerful inference facilities for their subsequent use.
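The STORE/TEST interface described above might be sketched as follows. The fact representation (bare tuples), the toy data bases, and the trivial membership "inference" are all stand-ins for a real MRL and mechanical inference facility, not LUNAR's implementation:

```python
# Hedged sketch of the STORE/TEST division of labor: STORE enters a fact
# into an intensional data base of axioms; TEST tries to establish a
# proposition intensionally first, then falls back on the extensional
# data base.  All names and data here are invented for illustration.

AXIOMS = set()                              # intensional data base
SAMPLES = {"S10046": {"SILICON", "OXYGEN"}, # extensional data base
           "S10047": {"SILICON"}}

def STORE(fact):
    """Enter a fact (here, just a tuple) as an axiom for later inference."""
    AXIOMS.add(fact)

def TEST(fact):
    """Try to establish a proposition intensionally, then extensionally."""
    if fact in AXIOMS:                      # intensional: follows from axioms
        return True
    pred, sample, element = fact            # extensional: check observations
    return pred == "CONTAIN" and element in SAMPLES.get(sample, set())

# "All samples contain silicon," stored as a (schematized) general fact
# rather than merely tested and discarded:
STORE(("CONTAIN", "EVERY-SAMPLE", "SILICON"))

print(TEST(("CONTAIN", "S10046", "OXYGEN")))         # True, extensionally
print(TEST(("CONTAIN", "EVERY-SAMPLE", "SILICON")))  # True, from the axiom
```

A real version would, as the text says, translate facts into clause form, index them, and let TEST choose its inference mode from the shape of the proposition.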
Among the shifts in emphasis that would probably be made in a semantic interpretation system to permit extensive intensional inference would be increasing attention to the notational structure of intensional entities to make them more amenable to inspection by various computer programs (as opposed to being perspicuous to a human). The effectiveness of the MRL used in LUNAR derives from its overall way of decomposing meanings into constituent parts, but is not particularly sensitive to notational variations that preserve this decomposition. When such MRL expressions are used as data objects by intensional processors, internal notational changes may be desired to facilitate such things as indexing facts and rules, relating more general facts to more specific ones, and making the inspection of MRL expressions as data objects more efficient for the processes that operate on them. In particular, one might want to represent the MRL expressions in some network form such as that described in Woods (1975b) to make them accessible by associative retrieval. However, whatever notational variations one might want to adopt for
increasing the efficiency of intensional processing, it should not be necessary, and is certainly not desirable, to sacrifice the fundamental understanding of the semantics of the notation and the kinds of structural decompositions of meanings that have been evolved in LUNAR and her sister systems.

11. Syntactic/Semantic Interactions
A very important question, for which LUNAR's techniques are clearly not the general answer, has to do with the relative roles of syntactic and semantic information in sentence understanding. Since this is an issue of considerable complexity and confusion, I will devote the remainder of this paper to discussing the issues as I currently understand them. The question of how syntax and semantics should interact is one that has been approached in a variety of ways. Even the systems discussed above contain representatives of two extreme approaches. LUNAR exemplifies one extreme: it produces a complete syntactic representation which is only then given to a semantic interpretation component for interpretation. TRIPSYS, on the other hand, combines the entire process of parsing and semantic interpretation in a grammar that produces semantic interpretations directly, without any intermediate syntactic representation. Before proceeding further in this discussion, let me first review the role of syntactic information in the process of interpretation.

11.1 The Role of Syntactic Structure
The role of a syntactic parsing in the overall process of interpreting the meaning of sentences includes answering such questions as "What is the subject noun phrase?", "What is the main verb of the clause?", and "What determiner is used in this noun phrase?"; all of this is necessary input information for the semantic interpretation decisions. Parsing is necessary to answer these questions because, in general, the answers cannot be determined by mere local tests in the input string (such as looking at the following or preceding word). Instead, such answers must be tentatively hypothesized and then checked out by discovering whether the given hypothesis is consistent with some complete analysis of the sentence. (The existence of "garden path" sentences, whose initial portion temporarily misleads a reader into a false expectation about the meaning, is convincing evidence that such decisions cannot be made locally.) Occasionally, the interpretation of a sentence depends on which of
several alternative possible parsings of the sentence the user intends (i.e., the sentence is ambiguous). In this case the parser must perform the case analysis required to separate the alternative possibilities so they can be considered individually.

A syntactic parse tree, as used in LUNAR and similar systems, represents a concise total description that answers all questions about the grouping and interrelationships among words for a particular hypothesized parsing of a sentence. As such, it represents an example of what R. Bobrow (Bobrow and Brown, 1975) calls a "contingent knowledge structure," an intermediate knowledge structure that is synthesized from an input to summarize fundamental information from which a large class of related questions can then be efficiently inferred. In general, there is an advantage to using a separate parsing phase to discover and concisely represent these syntactic relationships, since many different semantic rules may ask essentially the same questions. One would not want to duplicate the processing necessary to answer them repeatedly from scratch.

In addition to providing a concise description of the interrelationships among words, the parse trees can serve an additional role by providing levels of grouping that will control the semantic interpretation process, assigning nodes to each of the phrases that behave as modular constituents of the overall semantic interpretation. The semantic interpreter then walks this tree structure, assigning interpretations to the nodes corresponding to phrases that the parser has grouped together. The syntax trees assigned by the grammar thus serve as a control structure for the semantic interpretation. For historical reasons, LUNAR's grammar constructed syntactic representations as close as possible to those that were advocated at the time by transformational linguists as deep structures for English sentences (Stockwell et al., 1968).
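The control role of the parse tree described above can be illustrated with a minimal present-day sketch; the tree shape and the interpretation rules are invented, not LUNAR's:

```python
# Minimal sketch of an interpreter walking a parse tree: each node's
# interpretation is assembled from the interpretations of its constituents,
# so the tree's grouping controls the order of semantic interpretation.
# Node labels and rules are invented for illustration.

def interpret(node):
    label, children = node[0], node[1:]
    if label == "NPR":                 # proper noun: denotes itself
        return children[0]
    if label == "NP":                  # noun phrase: interpret its head
        return interpret(children[-1])
    if label == "VP":                  # verb phrase: predicate plus object
        verb, obj = children
        return (verb[1], interpret(obj))
    if label == "S":                   # sentence: apply the VP to the subject
        subj, vp = children
        pred = interpret(vp)
        return (pred[0], interpret(subj), pred[1])
    raise ValueError(label)

tree = ("S",
        ("NP", ("NPR", "S10046")),
        ("VP", ("V", "CONTAIN"),
               ("NP", ("NPR", "SILICON"))))

print(interpret(tree))    # ('CONTAIN', 'S10046', 'SILICON')
```

The point is only the control structure: the interpreter never re-derives the grouping of words; it trusts the nodes the parser has already assigned.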
The complex patterns of semantic rules in LUNAR and the multiple-phase interpretation are partly mechanisms that were designed to provide additional control information that was not present in those tree structures. An alternative approach could have been to modify the syntactic structures to gain the same effect (see below). The approach that was taken provides maximum flexibility for applying a set of semantic interpretation rules to an existing grammar. It also provides a good pedagogical device for describing interpretation rules and strategies, independent of the various syntactic details that stand between the actual surface word strings and the parse structures assigned by the grammar. However, the use of such powerful rules introduces a cost in execution time that would not be required by a system that adapted the grammar more to the requirements of semantic interpretation.
11.2 Grammar-Induced Phasing of Interpretation
As mentioned above, most of the control of multiple phase interpretation that is done in LUNAR by means of successive calls to the interpreter with different TYPEFLAGS could be handled by having the parser assign a separate node for each of the phases of interpretation. If this were done, the phasing of interpretation would be governed entirely by the structure of the tree. For example, one could have designed a grammar to assign a structure to negated sentences that looks something like

    S  DCL
       NEG
       S  NP  NPR S10046
          VP  V CONTAIN
              NP  DET NIL
                  N SILICON
                  NU SG

instead of

    S  DCL
       NEG
       NP  NPR S10046
       VP  V CONTAIN
           NP  DET NIL
               N SILICON
               NU SG
In such a structure, there is a node in the tree structure to receive the interpretation of the constituent unnegated sentence, and thus the separate phasing of the PRERULES and the SRULES used in LUNAR would be determined by the structure of the tree. Similarly, noun phrases could be structured something like
    NP  DET THE
        NU SG
        NOM  NOM  ADJ  N SILICON
                  NOM  N CONCENTRATION
             PP  PREP IN
                 NP  NPR S10046
instead of the structure
    NP  DET THE
        ADJ  N SILICON
        N CONCENTRATION
        NU SG
        PP  PREP IN
            NP  NPR S10046

which is used in the LUNAR grammar. In such a structure, the nested NOM phrases would receive the interpretation of the head noun plus modifiers by picking up modifiers one at a time. It is not immediately obvious, given LUNAR's separation of syntactic and semantic operations, which of the two ways of introducing the phasing is most efficient. Introducing phasing via syntax requires it to be done without the benefit of some of the information that is available at interpretation time, so that there is the potential of having to generate alternative syntactic representations for the interpreter to later choose between. On the other hand, doing it with the semantic interpretation rules requires extra machinery in the interpreter (but does not seem to introduce much extra run-time computation). One might argue for the first kind of structure in the above examples on syntactic grounds alone. If this is done, then the efficiency issue just discussed is simply one more argument. If it turns out that the preferred structure for linguistic reasons is also the most efficient for interpretation, that would be a nice result. Whether this is true or not, however, is not clear to me at present.

11.3 Semantic Interpretation while Parsing
The previous discussion illustrates some of the disadvantages of the separation of parsing and semantic interpretation phases in the LUNAR system. The discussion of placement of movable modifiers illustrates another. In general, there are a variety of places during parsing where the use of semantic information can provide guidance that is otherwise not available, thus limiting the number of alternative hypothetical parse paths considered by the parser. It has frequently been argued that performing semantic interpretation during parsing is more efficient than performing it later by virtue of this pruning of parse paths. However, the issue is not quite as simple as this argument makes it appear. Against this savings, one must weigh the cost of doing semantic interpretation on partial parse paths that will eventually fail for syntactic reasons. Which of the two approaches is superior in this respect depends on (1) the
NATURAL LANGUAGE QUESTION ANSWERING
relative costs of doing semantic versus syntactic tests and (2) which of these two sources of knowledge provides the most constraint. Both of these factors will vary from one system to another, depending on the fluency of their grammars and the scope of their semantics. At one point, a switch was inserted in the LUNAR grammar that would call for the immediate interpretation of each newly formed constituent rather than waiting for a complete parse tree to be formed. This turned out not to have an efficiency advantage. In fact, sentences took longer to process (i.e., parse and interpret). This was due in part to the fact that LUNAR’s grammar did a good job of selecting the right parse without semantic guidance. In such circumstances, semantic interpretations do not help to reject incorrect paths. Instead, they merely introduce an extra cost due to interpretations performed on partial parse paths that later fail. Moreover, given LUNAR’s rules, there are constituents for which special interpretations are required by higher constructions (e.g., with TYPEFLAG SET or TOPIC). Since bottom-up interpretation may not know how a higher construction will want to interpret a given constituent, it must either make an assumption (which may usually be right, but occasionally will have to be changed) or else make all possible interpretations. Either case will require more interpretation than waiting for a complete tree to be formed and then doing only the interpretation required. All of these considerations make semantic interpretation during parsing less desirable unless some positive benefit of early semantic guidance outweighs these costs.
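The tradeoff just described can be made concrete with a toy sketch (not LUNAR’s actual code; all names here are invented). Interpreting constituents early runs semantic checks on every partial path, including those that syntax alone would have rejected; interpreting after parsing runs them only on the syntactic survivors:

```python
def explore(paths, syntactic_ok, semantic_ok, check_semantics_early):
    """Filter candidate parse paths, counting semantic checks performed.

    If check_semantics_early is True, semantics is applied to every
    partial path before the syntactic test; otherwise semantics is
    applied only to paths that survive the syntactic test.
    """
    checks = 0
    survivors = []
    for path in paths:
        if check_semantics_early:
            checks += 1
            if not semantic_ok(path):
                continue            # pruned early by semantics
        if not syntactic_ok(path):
            continue                # pruned by syntax
        if not check_semantics_early:
            checks += 1
            if not semantic_ok(path):
                continue            # interpreted only after syntax succeeded
        survivors.append(path)
    return survivors, checks
```

When the "grammar" already rejects most paths on its own (as LUNAR’s did), both strategies find the same survivors, but the early-semantics strategy pays for many more interpretations.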
11.4 Top-Down versus Bottom-Up Interpretation
In the experiment described above, in which LUNAR was modified to perform bottom-up interpretation during parsing, the dilemma of handling context-dependent interpretations was raised. In those experiments, the default assumption was made to interpret every noun phrase with TYPEFLAG NIL during the bottom-up phase. In cases where a higher construction required some other interpretation, reinterpretation was called for at that point in the usual top-down mode. Since LUNAR maintains a record of previous interpretations that have been done on a node to avoid repeating an interpretation, it was possible to efficiently use interpretations that were made bottom-up when they happened to be the kind required, while performing new ones if needed.

An alternative approach to this problem of bottom-up interpretation in context is to make a default interpretation that preserves enough information so that it can be modified to fit unexpected contexts without actually having to redo the interpretation. This would be similar to the kind of thing that SETGEN (in the right-hand side of the D:SET rule) does to the quantifiers it picks up to turn them into UNIONs. In the HERMES grammar (Ash et al., 1977), R. Bobrow uses this approach, which he calls “coercion” (intuitively, forcing the interpretation of a constituent to be the kind that is expected). In this case, when the higher construction wants the interpretation of a constituent in some mode other than the one that has already been done, it asks whether the existing one can be coerced into the kind that it wants rather than trying to reinterpret the original phrase.

Many of these questions of top-down versus bottom-up interpretation, and of syntax-only parsing before semantic interpretation or vice versa (or both together), do not have clear-cut answers. In general, there is a tension between doing work on a given portion of a sentence in a way that is context free (so that the work can be shared by different alternative hypotheses at a higher level) and doing it in the context of a specific hypothesis (so that the most leverage can be gained from that hypothesis to prune the alternatives at the lower level). It is not yet clear whether one of the extremes or some intermediate position is optimal.
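The two mechanisms described above, caching interpretations per (node, TYPEFLAG) so that bottom-up results can be reused, and coercing an existing interpretation into the mode a higher construction wants, can be sketched as follows. This is an illustrative reconstruction, not LUNAR’s or HERMES’s implementation; all names and data shapes are invented:

```python
class Node:
    """A parse-tree node that remembers its previous interpretations."""
    def __init__(self, label):
        self.label = label
        self.interpretations = {}   # typeflag -> cached interpretation

def interp(node, typeflag, rules):
    """Interpret `node` in mode `typeflag`, reusing a cached result if any."""
    if typeflag in node.interpretations:
        return node.interpretations[typeflag]
    result = rules(node, typeflag)
    node.interpretations[typeflag] = result
    return result

def interp_with_coercion(node, typeflag, rules, coerce):
    """Prefer coercing an existing interpretation over reinterpreting.

    `coerce(value, have, want)` returns an adapted interpretation,
    or None if the coercion is not possible.
    """
    if typeflag in node.interpretations:
        return node.interpretations[typeflag]
    for have, value in list(node.interpretations.items()):
        coerced = coerce(value, have, typeflag)
        if coerced is not None:
            node.interpretations[typeflag] = coerced
            return coerced
    return interp(node, typeflag, rules)
```

With a coercion function that, say, wraps a NIL-mode interpretation into a SET-mode one (as SETGEN turns quantifiers into UNIONs), a higher construction that asks for TYPEFLAG SET never triggers a full reinterpretation of the phrase.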
11.5 Pragmatic Grammars

One thing that should be borne in mind when discussing the role of grammars is that it is not necessary that the grammar characterize exactly those sentences that a grammarian would consider correct. The formal grammar used by a system can characterize sentences as the user would be likely to say them, including sentences that a grammarian might call ungrammatical. For example, LUNAR accepts isolated noun phrases as complete utterances, implicitly governed by an operator “give me.” In the classical division of problems of meaning into the areas of syntax, semantics, and pragmatics, the latter term is used to denote those aspects of meaning determined not by general semantic rules, but by aspects of the current situation, one’s knowledge of the speaker, etc. For example, in situations of irony, a speaker says exactly the opposite of what he means. Likewise, certain apparent questions should in fact be interpreted as commands or as other requests (e.g., “Do you have the time?” is usually a “polite” way of asking “What time is it?”). Moreover, certain ungrammatical utterances nevertheless have a meaning that can be inferred from context. In general, the ultimate product of language understanding is the pragmatic interpretation of the utterance in context. This interpretation, while not necessarily requiring a syntactically and semantically correct input sentence, nevertheless depends on an understanding of normal syntax and semantics.
In LUNAR, there is no systematic treatment of pragmatic issues, although in some cases pragmatic considerations as well as semantic ones were used in formulating its interpretation rules. For example, the rule that interprets the head “analysis,” when it finds no specification of the elements to be measured, makes a default assumption that the major elements are intended. This is due to the pragmatic fact that (according to our geologist informant) this is what a geologist would want to see if he made such a request, not because that is what the request actually means. In this way, LUNAR can handle a small number of anticipated pragmatic situations directly in its rules.

In TRIPSYS, a small step toward including pragmatics in the grammar was taken. The TRIPSYS grammar takes into account not only semantic information such as class membership and selectional restrictions of words, but also pragmatic information. This includes factual world knowledge, such as what cities are in which states and the actual first and last names of people, and discourse history information, such as whether appropriate referents exist for anaphoric expressions. The TRIPSYS system is only beginning to explore these issues and has not begun to develop a general system for pragmatic interpretation. Much more work remains to be done in this area, and interest in it seems to be building as our mastery of the more basic syntactic and semantic issues matures.

The “pragmatic” grammar of TRIPSYS is only one exploration of a philosophy of combined syntactic and semantic grammars that has arisen independently in several places. Other similar uses of ATN or ATN-like grammars combining syntactic and semantic (and possibly pragmatic) information are the “Semantic Grammars” of Burton (1976), the “Performance Grammars” of Robinson (1975), the SHRDLU system of Winograd (1972), and the HERMES grammar of R. Bobrow (Ash et al., 1977).

11.6 Semantic Interpretation in the Grammar
In separating parsing and semantic interpretation into two separate processes (whether performed concurrently or in separate phases), LUNAR gains several advantages and also incurs several disadvantages. On the positive side, one obtains a syntactic characterization of a sizable subset of English that is independent of a specific topic domain and hence transferable to other applications. All of the domain-specific information is contained in the dictionaries and the semantic interpretation rules. On the other hand, there is a conceptual expense in determining what syntactic structure to use for many of the less standard constructions. One would like such structures to be somehow motivated by linguistic principles and yet, at the same time, have them facilitate subsequent interpretation. In many cases, the desired interpretation is more clear to the grammar designer than is a suitable syntactic representation. In a number of situations, such as those discussed previously for handling wh-questions with conjunction reduction and for handling averages, I have found it desirable to change what had initially seemed a suitable syntactic representation in order to facilitate subsequent semantic interpretation. If semantic interpretations were to be produced directly by the grammar instead of using an intermediate syntactic representation, then such problems would be avoided.

The integration of semantic interpretation rules into the grammar could be done in a number of ways, one of which would be to develop a rule compiler that would use the templates of rules such as LUNAR’s to determine where in the grammar to insert the rule. Another would be to write the interpretation rules into the grammar in the first place. The latter is the approach taken in the TRIPSYS system. It seems clearly an appropriate thing to do for such rules as the PRERULES for sentences and the DRULES for noun phrases, where the principal information used is largely syntactic. For the equivalent of SRULES, NRULES, and RRULES, writing specific rules into the grammar would make the grammar itself more topic-specific than one might like. However, writing generalized rules that apply to large classes of words, using information from their dictionary entries for word-specific information such as case frames, selectional restrictions, permitted prepositions, and corresponding MRL translations, should produce a grammar that is relatively topic-independent. This is the approach taken by Robinson (1975) and by R. Bobrow (Ash et al., 1977). Integrating semantic interpretation with a grammar is not an obvious overall improvement, since by doing so one gives up features as well as gaining them.
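The generalized, dictionary-driven rule just described might look like the following sketch. Everything here (the lexicon format, the role names, the MRL shape) is invented for illustration; the point is only that one rule serves any verb whose dictionary entry supplies a case frame, selectional restrictions, and an MRL translation:

```python
# A toy lexicon: per-word information lives in dictionary entries,
# keeping the rule itself topic-independent.
LEXICON = {
    "contain": {
        "frame": ("SUBJ", "OBJ"),
        "restrictions": {"SUBJ": "sample", "OBJ": "substance"},
        "mrl": "CONTAIN",
    },
}

def general_verb_rule(verb, args, sem_class):
    """Build an MRL term for any verb whose lexicon entry supplies the details.

    `args` maps case roles to constituent interpretations; `sem_class`
    returns the semantic class of a constituent. Returns None when a
    selectional restriction fails, i.e., the rule does not apply.
    """
    entry = LEXICON[verb]
    for role in entry["frame"]:
        if sem_class(args[role]) != entry["restrictions"][role]:
            return None
    return (entry["mrl"],) + tuple(args[role] for role in entry["frame"])
```

Adding a new verb then means adding a dictionary entry, not writing a new rule into the grammar.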
For example, as discussed earlier, the “advantage” of using semantic interpretation to prune parse paths is not always realized. However, there are some other efficiencies of the combined syntactic/semantic grammars that have nothing to do with pruning. One of these is the avoidance of pattern-matching. One of the costs of the separate semantic interpretation phase used in LUNAR is the cost of pattern-matching the rules. Much of this effort is redundant, since the various pieces of information that are accessed by the rules were mostly available in registers during the parsing process. From there they were packaged up by actions in the grammar into the parse tree structures that are passed on to the interpreter. The pattern-matching in the interpreter recovers these bindings so that the right-hand side of the rule can use them. If the right-hand side schema of the rule
could be executed while these bindings were still available during the parsing process, considerable computation could be avoided. Moreover, much of the syntactic information that is checked in the rules is implicitly available in the states of the grammar by virtue of the fact that the parser has reached that state (and more of that information could be put into the states if desired). Thus, in many cases, much of the testing that goes on in the pattern-matching of rules would be avoided if the right-hand side of the rule, paired with whatever semantic tests are required, were inserted as an action at the appropriate points in the grammar. For example, at certain points in the parsing, the grammar would know that it had enough information to construct the basic quantifier implied by the determiner and number of a noun phrase. At a later point, it would know all of the various modifiers that are being applied to the head noun. As the necessary pieces arrive, the interpretation can be constructed incrementally.

The effectiveness of this kind of combined parser/interpreter depends partly on the discovery that the kinds of associations of REFS to constituent nodes that are made by LUNAR’s rules are usually references to direct constituents of the node being interpreted. Thus, they correspond closely to the constituents that are being held in the registers by the ATN grammar during its parsing. The original semantic rule format was designed to compensate for rather large potential mismatches between the structure that a grammar assigns and the structure that the interpreter would like to have (since it was intended to be a general facility applicable to any reasonable grammar). When a grammar is specifically designed to support the kinds of structures required by the interpreter, this very general “impedance matching” capability of the rules is not required.
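The incremental construction described above, in which grammar actions extend a partial interpretation as the determiner, head noun, and modifiers arrive, can be sketched as follows. The representation and the determiner-to-quantifier mapping are invented for the example; they are not LUNAR’s:

```python
class NounPhraseInterp:
    """Partial interpretation of a noun phrase, built up by grammar actions."""

    def __init__(self):
        self.quant = None
        self.cls = None
        self.restrictions = []

    def on_determiner(self, det, number):
        # At this point the grammar already has enough information to
        # pick the basic quantifier (illustrative mapping only).
        if det == "the":
            self.quant = "EVERY" if number == "PL" else "THE"
        else:
            self.quant = "SOME"

    def on_head_noun(self, noun):
        self.cls = noun.upper()

    def on_modifier(self, mod):
        # Modifiers arrive one at a time, as with the nested NOM structure.
        self.restrictions.append(mod.upper())

    def result(self):
        return ("FOR", self.quant, "X", self.cls, tuple(self.restrictions))
```

No pattern-matching over a finished tree is needed; each arc's action simply extends the object it already holds in a register.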
Thus, when fully integrated with the parsing process in an ATN grammar, the process of semantic interpretation requires fewer computation steps than when it is done later in a separate phase. This clearly has a bearing on the previous discussion of the relative costs of syntactic and semantic processing. Another advantage of this kind of integrated parsing and interpretation process is that the single nondeterminism mechanism already present in the parser can be used to handle alternative interpretations of a given syntactic structure, without requiring a separate facility for finding and handling multiple rule matches. This not only eliminates extra machinery from the system, but appears to be more efficient. It also permits a more flexible interaction between the ranking of alternative syntactic choices and the ranking of alternative choices in semantic interpretation.

A disadvantage of this integrated approach is that the combined syntactic/semantic grammar is much more domain-specific and less transportable unless clear principles for separating domain-specific from general knowledge are followed. Moreover, the fact that a given semantic constituent can be found in different places by different arcs in the grammar seems to require separate consideration of the same semantic operations at different places in the grammar.

11.7 Generating Quantifiers while Parsing
The generation of separate SEMs and QUANTs when performing interpretation while parsing appears to complicate the integration of semantic interpretation into the grammar, but in fact it is not difficult. One can stipulate that any constituent parsed will return a structure that contains both a SEM and a QUANT, as currently assigned by the INTERP function in LUNAR. The parsing at the next higher level in the grammar will then accumulate the separate QUANTs from each of the constituents that it consumes, give them to a SORTQUANT function to determine the order of nesting, and construct the interpretation of the phrase being parsed out of the SEMs of the constituent phrases. All of the quantifier-passing operations described previously can be carried out during the parsing with little difficulty. One advantage of this procedure is that the job of SORTQUANT is simplified by the fact that the quantifiers will be given to it in surface structure order rather than in some order determined by the deep structure assigned by the grammar. LUNAR’s SORTQUANT function has to essentially reconstruct surface word order.
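The scheme above can be sketched in a few lines. The (SEM, QUANT-list) pairing, the quantifier format, and the default sorting policy are all invented for illustration, not LUNAR’s code:

```python
def sortquant(quants):
    """Determine the nesting order of collected quantifiers.

    Here the quantifiers arrive already in surface order, so the default
    policy is simply to keep that order; LUNAR's SORTQUANT, working from
    deep structure, had to reconstruct surface order instead.
    """
    return list(quants)

def interpret_phrase(constituents, build_sem):
    """Interpret a phrase from its constituents' (SEM, QUANT-list) pairs.

    `constituents` is a list of (sem, quants) pairs in surface order;
    `build_sem` combines the constituent SEMs into the phrase's own SEM.
    """
    sems = [sem for sem, _ in constituents]
    quants = [q for _, qs in constituents for q in qs]
    body = build_sem(sems)
    # Nest the collected quantifiers around the phrase's SEM, the first
    # (leftmost) quantifier getting the widest scope.
    for q in reversed(sortquant(quants)):
        body = ("FOR", q, body)
    return body, []   # the higher quantifiers have been discharged here
```

A clause whose subject carried a quantifier thus ends up with that quantifier wrapped around the clause's SEM, exactly the passing behavior described in the earlier sections.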
12. Conclusions
The LUNAR prototype marks a significant step in the direction of fluent natural language understanding. Within the range of its data base, the system permits a scientist to ask questions and request computations in his own natural English in much the same form as they arise to him (or at least in much the same form that he would use to communicate them to another human being). However, although the LUNAR prototype exhibits many desired qualities, it is still far from fully achieving its goal. The knowledge that the current system contains about the use of English and the corresponding meanings of words and phrases is very limited outside the range of those English constructions that pertain to the system's data base of chemical analysis data. This data base has a very simple structure; indeed it was chosen as an initial data base because
its structure was simple and straightforward. For less restricted applications, such systems will require much greater sophistication in both the linguistic processing and the underlying semantic representations and inference mechanisms. In this paper, I have presented some of the solutions that were developed in LUNAR (and several related systems) for handling a variety of problems in semantic interpretation, especially in the interpretation of quantifiers. These include a meaning representation language (MRL) that facilitates the uniform interpretation of a wide variety of linguistic constructions, the formalization of meanings in terms of procedures that define truth conditions and carry out actions, efficient techniques for performing extensional inference, techniques for organizing and applying semantic rules to construct meaning representations, and techniques for generating higher quantifiers during interpretation. These latter include methods for determining the appropriate relative scopes of quantifiers and their interactions with negation, and for handling their interactions with operators such as “average.” Other techniques are described for post-interpretive query optimization and for displaying quantifier dependencies in output. I have also discussed a number of future directions for research in natural language understanding, including some questions of the proper relationship between syntax and semantics, the partial understanding of “ungrammatical” sentences, and the role of pragmatics. In the first area especially, I have discussed a number of advantages and disadvantages of performing semantic interpretation during the parsing process, and some aspects of the problem of separating domain specific from general knowledge. As discussed in several places in the paper, there are a variety of loose ends and open problems still to be solved in the areas of parsing and semantic interpretation. 
However, even in the four systems discussed here, it is apparent that as the system becomes more ambitious and extensive in its scope of knowledge, the need for pragmatic considerations in selecting interpretations becomes increasingly important. I believe that, as a result of increasing understanding of the syntactic and semantic issues derived from explorations such as the LUNAR system, the field of computational linguistics is now reaching a sufficient degree of sophistication to make progress in a more general treatment of pragmatic issues. In doing so, it will become much more concerned with general issues of plausible inference and natural deduction, moving the field of language understanding in the direction of some of the other traditional areas of artificial intelligence research, such as mechanical inference and problem solving.
ACKNOWLEDGMENTS
Work described in this paper has been supported in part by the following contracts and grants: National Science Foundation Grant GS-2301; NASA Contract No. NAS9-1115; ARPA Contracts N00014-75-C-0533 and N00014-77-C-0378; and ONR Contract N00014-77-C-0371.

REFERENCES

Ash, W., Bobrow, R., Grignetti, M., and Hartley, A. (1977). “Intelligent On-Line Assistant and Tutor System,” Final Tech. Rep., Rep. No. 3607. Bolt Beranek and Newman, Cambridge, Massachusetts.
Bobrow, D. G., Murphy, D. P., and Teitelman, W. (1968). “The BBN-LISP System,” BBN Rep. No. 1677. Bolt Beranek and Newman, Cambridge, Massachusetts.
Bobrow, R. J., and Brown, J. S. (1975). Systematic understanding: Synthesis, analysis, and contingent knowledge in specialized understanding systems. In “Representation and Understanding: Studies in Cognitive Science” (D. Bobrow and A. Collins, eds.), pp. 103-129. Academic Press, New York.
Bohnert, H. G., and Backer, P. O. (1967). “Automatic English-to-Logic Translation in a Simplified Model. A Study in the Logic of Grammar,” IBM Res. Pap. RC-1744. IBM Research, Yorktown Heights, New York.
Brown, J. S., and Burton, R. R. (1975). Multiple representations of knowledge for tutorial reasoning. In “Representation and Understanding: Studies in Cognitive Science” (D. Bobrow and A. Collins, eds.), pp. 311-349. Academic Press, New York.
Burton, R. (1976). “Semantic Grammar: An Engineering Technique for Constructing Natural Language Understanding Systems,” Rep. No. 3453. Bolt Beranek and Newman, Cambridge, Massachusetts.
Carnap, R. (1964a). Foundations of logic and mathematics. In “The Structure of Language: Readings in the Philosophy of Language” (J. Fodor and J. Katz, eds.), pp. 419-436. Prentice-Hall, Englewood Cliffs, New Jersey.
Carnap, R. (1964b). “Meaning and Necessity.” Univ. of Chicago Press, Chicago, Illinois.
Chomsky, N. (1965). “Aspects of the Theory of Syntax.” MIT Press, Cambridge, Massachusetts.
Green, C. (1969).
“The Application of Theorem Proving to Question-Answering Systems,” Tech. Rep. CS 138. Stanford University Artificial Intelligence Project, Stanford, California.
Nash-Webber, B. L. (1976). Semantic interpretation revisited. Presented at 1976 Int. Conf. Comput. Linguist. (COLING-76), Ottawa. (Available as BBN Rep. No. 3335. Bolt Beranek and Newman, Cambridge, Massachusetts.)
Nash-Webber, B. L. (1977). Inference in an approach to discourse anaphora. Proc. 8th Annu. Meet. North Eastern Linguist. Soc. (NELS-8) (K. Ross, ed.), pp. 123-140. University of Massachusetts, Amherst. (Also as Tech. Rep. No. 77. Center for the Study of Reading, University of Illinois, Urbana.)
Nash-Webber, B. L., and Reiter, R. (1977). Anaphora and logical form: On formal meaning representations for English. Proc. Int. Jt. Conf. Artif. Intell., 5th, MIT, Cambridge, Massachusetts, pp. 121-131. (Also as Tech. Rep. No. 36. Center for the Study of Reading, University of Illinois, Urbana, and Bolt Beranek and Newman, Cambridge, Massachusetts.)
OAG (1966). “Official Airline Guide,” Quick Reference North American Edition. Standard reference of the Air Traffic Conference of America.
Reiter, R. (1977). “An Approach to Deductive Question-Answering,” Rep. No. 3649. Bolt Beranek and Newman, Cambridge, Massachusetts.
Robinson, J. A. (1965). A machine-oriented logic based on the resolution principle. J. ACM 12, 23-41.
Robinson, J. J. (1975). Performance grammars. In “Speech Recognition: Invited Papers Presented at the 1974 IEEE Symposium” (D. R. Reddy, ed.), pp. 401-427. Academic Press, New York.
Simmons, R. F. (1965). Answering English questions by computer: A survey. Commun. ACM 8(1), 53-70.
Stockwell, R. P., Schachter, P., and Partee, B. H. (1968). “Integration of Transformational Theories on English Syntax,” Rep. ESD-TR-68-419. Electronic Systems Division, L. G. Hanscom Field, Bedford, Massachusetts.
Webber, B. L. (1978). A formal approach to discourse anaphora. Ph.D. Thesis, Harvard University, Cambridge, Massachusetts.
Winograd, T. (1972). “Understanding Natural Language.” Academic Press, New York.
Woods, W. A. (1967). “Semantics for a Question-Answering System,” Rep. NSF-19. Harvard University Computation Laboratory, Cambridge, Massachusetts. (Available from NTIS as PB-176-548.)
Woods, W. A. (1968). Procedural semantics for a question-answering machine. AFIPS Natl. Comput. Conf. Expo., Conf. Proc. 33, 457-471.
Woods, W. A. (1969). “Augmented Transition Networks for Natural Language Analysis,” Rep. No. CS-1. Aiken Computation Laboratory, Harvard University, Cambridge, Massachusetts. (Available from NTIS as Microfiche PB-203-527.)
Woods, W. A. (1970). Transition network grammars for natural language analysis. Commun. ACM 13, 591-602.
Woods, W. A. (1973a). An experimental parsing system for transition network grammars. In “Natural Language Processing” (R. Rustin, ed.), pp. 111-154. Algorithmics Press, New York.
Woods, W. A. (1973b). Progress in natural language understanding: An application to LUNAR geology. AFIPS Natl. Comput. Conf. Expo., Conf. Proc. 42, 441-450.
Woods, W. A. (1973c). Meaning and machines. In “Computational and Mathematical Linguistics” (A. Zampolli, ed.), pp. 769-792. Leo S. Olschki, Florence.
Woods, W. A. (1975a).
Syntax, semantics, and speech. In “Speech Recognition: Invited Papers Presented at the 1974 IEEE Symposium” (D. R. Reddy, ed.), pp. 345-400. Academic Press, New York.
Woods, W. A. (1975b). What’s in a link: Foundations for semantic networks. In “Representation and Understanding: Studies in Cognitive Science” (D. Bobrow and A. Collins, eds.), pp. 35-82. Academic Press, New York.
Woods, W. A., Kaplan, R. M., and Nash-Webber, B. (1972). “The Lunar Sciences Natural Language Information System: Final Report,” BBN Rep. No. 2378. Bolt Beranek and Newman, Cambridge, Massachusetts.
Woods, W. A., Bates, M., Brown, G., Bruce, B., Cook, C., Klovstad, J., Makhoul, J., Nash-Webber, B., Schwartz, R., Wolf, J., and Zue, V. (1976). “Speech Understanding Systems-Final Report, 30 October 1974 to 29 October 1976,” BBN Rep. No. 3438, Vols. I-V. Bolt Beranek and Newman, Cambridge, Massachusetts.
ADVANCES IN COMPUTERS, VOL. 17
Natural Language Information Formatting: The Automatic Conversion of Texts to a Structured Data Base

NAOMI SAGER
Linguistic String Project
New York University
New York, New York

1. Introduction
   1.1 Language Processing
   1.2 Text Analysis
   1.3 Natural Language Data Bases
   1.4 Automatic Information Formatting
2. Principles and Methods of Analysis
   2.1 The Form-Content Relation in Language
   2.2 Sublanguage Grammars
   2.3 Information Structures in Science Subfields
   2.4 Automatic Generation of Subfield Word Classes
   2.5 The Sublanguage Method Summarized
3. Computer Programs for Information Formatting
   3.1 Linguistic Framework
   3.2 Representation of the Grammar
   3.3 Parsing Program
   3.4 Restrictions and Routines
   3.5 English Transformations
   3.6 Formatting Transformations
   3.7 Format Normalization
   3.8 Performance
4. Applications
   4.1 Fact Retrieval
   4.2 Quality Assessment
   4.3 Data Summarization
   4.4 Data Alert
References
1. Introduction
1.1 Language Processing

The field of computerized language processing encompasses a wide range of goals and methodologies, ranging from such theoretical objectives as the modeling of human linguistic behavior (e.g., Schank, 1975; Lindsay and Norman, 1977) and human language acquisition (Reeker, 1976), to such applicational goals as machine translation (Josselson, 1971; Lehmann, 1978), natural language systems for man-machine communication (Waltz, 1977; Winograd, 1972; Thompson and Thompson, 1975), and speech recognition (Hill, 1971; Otten, 1971; Walker, 1975). Recent reviews of the entire field are to be found in Walker (1973) and Damerau (1976).

What unites these varied endeavors is the need to come to grips with the special features of natural language as a communication system. Language is the major medium of both storing and transmitting information. Thus, whether the goal is to transfer that information from one language to another, or to access the stored information in response to user queries, or to model linguistic processes for their own sake, some theory as to how information is carried by language and how common meanings are extracted from different linguistic forms is required. In addition, it is by now clear that the procedural formulation of linguistic processes, as opposed to the purely descriptive, sets up its own requirements. “Decoding” the linguistic communication (recognition rather than generation) involves questions of strategy: How shall the requisite facts about language be organized for computation? How much of this information should be stored as properties of individual words and how much relegated to grammatical and semantic procedures? How should the burden of processing be divided between syntactic and semantic components? What should the processing algorithm be? And, assuming that grammatical facts are well established (though computational linguists have had to reformulate them for use in procedures), there remains the question: how shall the relevant semantic categories be established?
These questions and others have helped to define the specialized field of natural language processing and have also led to divergent methods within the field. The present article approaches these problems from the viewpoint of information science, with applications envisioned primarily in science information retrieval and data base management. It attempts to provide a general solution to the following problem: Given a collection of documents on a particular subject written in English or another natural language, how can computer programs arrange the information contained in the documents so that it can be accessed from different points of view for a variety of informational tasks? This assumes that there are underlying common features in the texts that can be made explicit by formal procedures. For the field of information management, the solution to this problem would extend the data bases on which data processing functions
which have already been automated can operate. Presently, these operations require that the data be supplied in structured form, that is, in the form of a table or the equivalent. While natural language “front ends” for such systems have been developed, mainly in the form of question-answering programs (Simmons, 1970; Woods et al., 1972; Petrick, 1975; Plath, 1975; Waltz, 1977), this has left open the question as to whether the data base itself, if recorded in natural language, can also be processed by computer programs.

1.2 Text Analysis
The problem of how to structure free running text so that its content is made accessible for processing has been under study since the early days of computer science. Some of the first large undertakings in automatic language processing (Harris, 1959; Kuno and Oettinger, 1963; Zwicky et al., 1965; Keyser and Petrick, 1967) had immediate or future goals of this type. However, numerous obstacles prevented the early achievement of such programs. Among these were the lack of an appropriate formalism for natural language computations, the lack of suitable grammars and dictionaries, the lack of mechanisms to treat syntactic ambiguity and to recover implicit elements from the discourse, and the unresolved problem of semantic representation. Fortunately, the decade or more of research on these problems has yielded solutions which, if not complete, nevertheless demonstrate that a text-structuring capability is being created and has already reached the stage of application to certain advanced information processing tasks.

1.3 Natural Language Data Bases
A stimulus to research on text-structuring programs has been the advent of machine readable natural language data bases. With changes in the technology of publishing, large quantities of machine readable text are being created daily. In current practice the machine readable form of the texts is often destroyed. However, it is not difficult to imagine that agreements will be formulated whereby some portions of this resource will become available in the future as full-text data bases (Lancaster, 1977). More immediately, the wide use of computers in file management and the development of magnetic and optical devices for capturing written material in machine readable form have raised the question as to whether computer techniques can be developed for accessing and processing the information in large natural language files, such as are now found in quantity in medicine, government, business, and scientific research. If
NAOMI SAGER
routine clerical operations could be performed automatically on the original natural language form of the information, it could become worthwhile for institutions to have their document files stored in machine readable form. In some cases, natural language files have already been put into machine readable form, not for purposes of processing the contents by computer, but for convenience of access and storage. The existence of the files in computerized form then raises the question as to whether further data processing operations can be carried out on them without having to transfer the information manually into preset formats. An example of this process is the case of a hospital which computerizes its patient files for quick back-up to the written charts and for transactional purposes; then, finding itself with this large natural language data base, it seeks computer techniques for processing the contents of the documents to obtain the summaries and other information required for health care evaluation and for clinical research. We are bound to see a pressure of this sort arising wherever natural language files are computerized, since once information is available in computer readable form, users inevitably want the computer to process it.

1.4 Automatic Information Formatting
This article will survey the methods and computer programs by means of which the information in natural language documents in a given subject area can be converted into a structured data base, so that subsequent computer programs can retrieve specific facts and summarize the different types of information present in the original documents. The technique is based upon the prior development of a natural language parsing system equipped with a comprehensive grammar of English. To this have been added procedures which reduce paraphrastically the variety of syntactic forms found in sentences and align parts of different sentences which have similar grammatical and informational standing in their respective sentences. When these procedures are applied to documents in the same subject area, the result is a table-like structure, called an information format, which contains the same information as the original documents, but arranged in a structured, rather than narrative, form. To begin with an example, suppose the given document collection is a set of medical records of the type shown in Fig. 1, which is a typical hospital discharge summary. We would like the computer program to arrange the information in this document, and many others like it, so that specific information can be automatically retrieved. A system for formatting textual information has been developed by the New York University Linguistic String Project.

FIG. 1. A sample text.

FIG. 2. An information format. (Column headings include PATIENT STATE and SIGN/SYMPTOM; a note in the figure observes that the word "with" is transformationally related to the verb "have": with symptom = patient has symptom.)

A simplified version of the information format obtained for hospital discharge summaries is shown in Fig. 2, with several sentences from the HISTORY paragraph of the document in Fig. 1 shown as they are mapped by the programs into the medical information format. As can be seen in Fig. 2, the column headings of an information format are specific to the subject matter. But it should be noted from the outset that these categories are not obtained by an a priori semantic analysis of the documents or from knowledge of the subject matter. Rather, they are determined by an analysis of word distribution patterns in sample texts from the field, as described in Section 2 of this article. Likewise, the computer programs for transferring the words of text sentences into the information formats are based on grammatical and word-usage regularities characteristic of the textual material. If it were not for the utilization of the inherent regularities in the language material, it would be impossible for a computer program to recognize and arrange the information contained in freely written English sentences. Some general properties of information formats can be seen in Fig. 2. (Details of the particular formats obtained for medical records are given in Sections 3.6 and 3.7.)
(1) All of the words in the original sentences are placed somewhere in the format. While content words play a crucial role, function words like prepositions and conjunctions also carry information and are formatted. Some verbs, like be, are simply carriers of the subject-predicate relation and could have been dropped as far as the content is concerned, but the placing of all words in the format serves as a check that no information has been lost in the formatting process.
(2) Some linguistic transformations that do not change the information are performed during the processing so as to regularize the English syntactic structures. An example is the expansion of the conjunctional construction in lines 7 and 8: He was hospitalized for a month and released becomes He was hospitalized for a month and (he was) released, where the words in parentheses are supplied by an EXPAND CONJUNCTION transformation that operates on the output of a syntactic analysis of the sentence. (The stages of processing are described in Section 3.) As a result of features (1) and (2), the original sentences or paraphrases of them can be reconstructed from the format entries. Thus, while the form of the sentence is changed, no information is lost, and because only English transformations that are meaning-preserving and valid for the whole language are applied, no information that was not present in the original sentence is added.
(3) Words carrying the same kind of information within the given type of documents are aligned; they are placed in the same column and an appropriate column heading is supplied. (The procedures for determining which words carry the same type of information in the given sublanguage are described in Section 2.4.) This feature is central to the use of information formats in practical text-processing applications.
It means that when particular kinds of information are being asked for, e.g., in the discharge summaries the presence of particular symptoms, the information can be found by scanning a particular column of the format for these terms instead of scanning the entire body of texts. Thus, pain would be located by scanning the SIGN/SYMPTOM column. In addition, because the context of each occurrence of the terms has also been analyzed, a precise answer can be returned. For example, if evidence of pain was being queried (present in lines 2 and 6 of Fig. 2), the location of the pain and the time at which the pain occurred could also be retrieved by reference to the columns BODY PART and TIME in the same format line as the SIGN/SYMPTOM entry. The ability to draw upon the syntactic context is especially important when the finding is qualified or
negated (mild pain, pain was not present). Such information is difficult to supply reliably without an analysis of the sentence.

2. Principles and Methods of Analysis
2.1 The Form-Content Relation in Language
The form-content relation in language, how much there is of it and how it can be established, has occupied the attention of linguists since the beginning of modern linguistics. Language description made a great step forward when language forms were considered to be objects of study in their own right and not only as the derivative of meanings, assumed to be known. [See, e.g., Sapir (1925) and Bloomfield (1926), both reprinted and discussed in Joos (1957).] Pioneers in this development were often accused of ignoring the importance of meaning, though their purpose was rather to arrive at an explanation of how language, viewed as a system, carries out its function of conveying meaning. In any case, the success of this approach in describing the phonemic, morphologic, and syntactic structures of many languages, including those for which Latin-based grammars were inapplicable, showed that this methodology was fruitful, even though many questions concerning meaning remained unanswered. Within this approach, the method of immediate constituent analysis, first introduced under that name by Leonard Bloomfield (Bloomfield, 1933) and systematized by Harris (Harris, 1951), provided the basis for a formalization of the generative description of sentence structure (Chomsky, 1957) and for a procedural approach to sentence analysis. In immediate constituent analysis, the structure of a sentence is described as a sequence of certain kinds of segments (e.g., noun phrase plus verb phrase) each of which is characterized as being composed in turn of certain segment sequences, down to the words or morphemes of the language. While it was seen that this type of grammar is readily expressed as a set of context free productions, not all grammatical constraints that contribute to sentence well-formedness fit easily into the system.
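The observation that constituent structure is readily expressed as a set of context free productions can be made concrete with a small sketch. The grammar, lexicon, and parser below are invented for illustration and are in no way the grammars discussed in this article; they merely show the mechanism:

```python
# A toy immediate-constituent grammar as context-free productions.
# Illustrative only; not a grammar from the systems described here.
GRAMMAR = {
    "S":  [["NP", "VP"]],   # sentence = noun phrase + verb phrase
    "NP": [["DET", "N"]],
    "VP": [["V", "NP"]],
}
LEXICON = {"the": "DET", "ion": "N", "membrane": "N", "crosses": "V"}

def parse(symbol, words):
    """Naive top-down parse: return (tree, remaining words) or None."""
    if symbol in LEXICON.values():                 # preterminal category
        if words and LEXICON.get(words[0]) == symbol:
            return (symbol, words[0]), words[1:]
        return None
    for production in GRAMMAR.get(symbol, []):
        children, rest = [], words
        for child_sym in production:
            result = parse(child_sym, rest)
            if result is None:
                break
            tree, rest = result
            children.append(tree)
        else:                                      # every child matched
            return (symbol, children), rest
    return None

tree, leftover = parse("S", "the ion crosses the membrane".split())
print(tree)      # nested (symbol, children) constituents down to the words
print(leftover)  # [] -- all words consumed
```

The parse tree records only segment-within-segment structure; as the text goes on to note, this by itself does not carry one very far toward meaning.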
Nor does the analysis carry one very far toward a structure that correlates with meaning, even though a certain amount of information important to the meaning of the sentence is conveyed by its constituent structure. But the importance of immediate constituent analysis was only partly what it showed about sentence structure. Mainly, it demonstrated that grammar could be formulated as a system for analyzing (work of Bloomfield, Harris) or generating (Chomsky's work) sentences, in contrast to previous grammars, which were at best detailed descriptions of grammatical
phenomena (e.g., as in Jespersen, 1914-1929), or more often simply episodic collections of facts and rules. The discovery of linguistic transformations in the early 1950's was a major advance toward correlating language structure with meaning. Sentences of different forms which were known to contain the same information (e.g., active and passive sentences) were now described as stemming grammatically from the same source sentence. The process of sentence construction could now be viewed as the successive transformation of underlying "kernel" sentences, the elementary assertions contained in the final sentence. Each transformation either combined components, rearranged the sentence paraphrastically, or added a fixed increment of meaning, the same increment for all sentences on which the transformation operated (e.g., seem in They seem to like it, They seem dissatisfied, etc.). Not only was it now possible by transformational analysis to show a common source element in every sentence which contained the effect of that source element in its meaning, but transformational analysis clarified many linguistic phenomena. For example, it was shown that many ambiguities come about because of degeneracies in the transformational history of sentences, i.e., two different transformational histories resulting in the same final form. It was immediately clear that transformational analysis had a great potential for language information processing. If sentences could be reduced to a canonical form of primitive assertions plus a certain few operators, where each operator could be associated with an increment of meaning or a paraphrastic change in form, then clearly such operations as recognizing the presence of a particular informational component in a sentence or comparing sentences for overlap in content could be carried out in a systematic, perhaps even computable, way.
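The reduction of a transformed sentence to its kernel can be sketched, very crudely, as a program. The pattern below handles only a single simple passive construction and is an invented illustration, not a general transformational analyzer:

```python
import re

# Toy paraphrastic transformation: simple passive -> active kernel
# (subject, verb, object). Vocabulary and pattern are illustrative
# assumptions only; real analysis requires a full syntactic parse.
PASSIVE = re.compile(
    r"^(?P<obj>.+?) (?:is|was) (?P<verb>\w+)ed by (?P<subj>.+?)\.?$"
)

def to_kernel(sentence):
    """Return the active-voice (subject, verb, object) triple, or None."""
    m = PASSIVE.match(sentence.strip())
    if m:
        return (m.group("subj"), m.group("verb"), m.group("obj"))
    return None

print(to_kernel("Uptake was affected by cardiac glycosides."))
# -> ('cardiac glycosides', 'affect', 'Uptake')
```

Once two sentences are reduced to the same kernel triple, comparing them for overlap in content becomes a matter of comparing structures rather than word strings.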
It also appeared that a formal theory of language structure, showing how language carries its semantic burden, was in the offing.*

2.2 Sublanguage Grammars
* The term "kernel sentences" arose from Harris' initial algebraic formulation of the transformational relation: they were the sentences in the kernel of the defined natural mapping [cf. Harris (1957, Sect. 5.4) and Harris (1968, Sect. 4.3.2)]. With regard to developments in linguistic theory since the discovery of transformations, major works in Chomsky's theory, which combines some of Harris' transformations with other features in a formal generative system, are Chomsky (1957, 1964, 1972, 1975). For the extension of Harris' transformational analysis toward a theory of information in language, see Harris (1976) and papers in Harris (1970).

Regularities in language exist on two levels, one common to the language as a whole and one specific to the subject matter. The regularities
holding for the whole language are summarized in its grammar (phonology, morphology, and syntax) while the features pertaining to the specific content of discourses are often referred to as the semantic component of the description. Linguists for the most part have been concerned only with the former, and as was sketched above, progress relevant to computational goals has been made on this level. Computer scientists dealing with language, on the other hand, have for the most part dealt with the language material around specific structured data bases, or with instructions or conversations of a restricted type. Here it is sometimes possible to come to grips with the semantic categories and relations of the data as they appear in the language material without the use of exhaustive syntactic analysis (Charniak and Wilks, 1976) and without a general method for arriving at the relevant semantic categories. However, when the subject matter and the language material are more complex, as in the case of science writing, it is difficult, and would be costly, to proceed without a more general methodology suited to the purpose. Within both linguistics and computer science some attempts are currently being made to formulate semantic theories and systems valid for the whole language, for example, to state a set of semantic primitives and rules of combination from which the meanings of sentences would be generated. For science information, however, it is hardly possible to conceive of a system of this type that would generate the meanings of science sentences. A practical consideration is that specialists on the level needed for such specification (assuming the task doable) are not available for this work. Where the goal is to process the information in text sentences there is the additional constraint that a computer program must be able to map text sentences into the defined structures.
And in view of the effort required to program a natural language processor, it is also important that there be methods to adapt the system for use on different subject matters. These considerations led to the search for general methods of determining the appropriate semantic structures for text processing in science subfields. It was clear that communications within one universe of discourse differed linguistically from those in another. This is especially marked in scientific fields, where the vocabulary of content-words differs sharply from field to field and where whole sentences which are mainly composed of common English words are intelligible only to persons engaged in that particular discipline. This specialized use of language in science subfields is discussed by Bross (Bross et al., 1972) using the notion of a science sublanguage, first introduced in conjunction with the definition of sublanguage grammars (Harris, 1968, Sect. 5.9). In the case of a whole language we know that there are grammatical
rules because some word sequences are accepted by native speakers of the language as grammatical while other sequences are rejected as ungrammatical. A similar situation exists in a community of individuals engaged in a specialized field of science. Certain statements will be accepted as possible within the discipline while others will be rejected as impossible or outlandish. What is involved is not truth versus falsity or even accepted versus unconventional formulations, but rather whether the statements would be nonsensical or run counter to fundamental knowledge in the discipline. Thus, for example, the statement the ion crosses the membrane would be acceptable to a cell biologist (it may or may not be true in a given case), whereas the membrane crosses the ion would be rejected as unsayable in the science. This linguistic behavior on the part of the scientist indicates that rules analogous to the rules of grammar for the whole language are operating in the language of a particular scientific discipline. These rules are called a sublanguage grammar. The idea of a sublanguage grammar suggests a possible method for determining semantic structures in a science subfield. If a grammar summarizes the restrictions on occurrence that characterize a language, then a sublanguage grammar should capture the restrictions on occurrence that distinguish one area of scientific discourse from another. One would expect sublanguage grammatical categories and rules to have semantic standing, since what is captured in the sublanguage grammar over and above ordinary grammatical regularities is precisely what is special to the subject matter. This has proved to be the case. If one applies descriptive linguistic methods similar to those used in developing a grammar for a whole language to a corpus of texts in a science subfield, one obtains detailed patterns of word co-occurrence from which characteristic word subclasses and word-subclass sequences can be stated (i.e., a grammar).
These word categories and syntactic formulas of the sublanguage grammar correlate closely with the classes of real-world objects and relations that are of special interest in the subfield. They thus provide a set of semantic structures for representing subfield information. And because the structures are based on syntactic regularities in the textual material (of both the general and sublanguage types) it is then possible to define procedures that locate occurrences of the structures in the sentences of subfield texts.

2.3 Information Structures in Science Subfields
The sublanguage-grammar hypothesis was first tested by applying the methods of distributional linguistics to a corpus of texts in a subfield of
pharmacology, the mechanisms of action of digitalis. In this study, nineteen texts were used, about 200 journal pages, divided into two sets, one to establish the grammar and the other to test it. Syntactic analysis was applied (manually) to the sentences of the first set, including the application of a small number of well-established paraphrastic transformations to regularize the sentence representation. Word classes were established on the basis of similarity of co-occurrence, for example, grouping together nouns which occurred as the subject of the same verb, and similarly verbs which occurred with a particular selection of nouns as their subject. It was found that there were distinct classes of this type in the texts, and it was possible, as conjectured, to write a specialized grammar summarizing the co-occurrence patterns of the classes (Sager, 1972a, 1975). The word classes in the digitalis sublanguage grammar correlated with recognizable semantic classes in the textual material, as verified by a consulting cardiologist. The analysis produced noun classes corresponding to: cardiac glycosides, cations, contractile proteins, enzymes (ATPase), heart muscle, cell, cell substructure, etc., and verb classes corresponding to specific relations among the noun classes (e.g., move verbs connecting cations to cell), quantitative relations (e.g., increase, decrease), causal relations (e.g., affect, produce), as well as relations of the human investigator to all of the above (e.g., observe, report, study). The grammatical structure of the sublanguage was found to consist of:
(1) a set of elementary sentence types composed of the subfield-specific word classes (e.g., N-ION V-MOVE N-CELL, corresponding to such occurrences as Sodium flows out of the cell, Potassium moves into the cell);
(2) aspectual operations on verbs (e.g., not, fail to, appear to, tend to, persist, continue, commence,
etc.);
(3) quantifiers Q (e.g., amount of, rate of) on certain verbs or nouns, and quantifying verbs VQ (e.g., increase, decrease) operating on Q or VQ;
(4) the wh connective (relative clause), as in all of English;
(5) conjunctions of English, and conjunctional verbs that connect nominalized sentences (e.g., affect, be concerned in, cause, produce, accompany, interfere with, involve, as in calcium exchange accompanies the excitation);
This brief summary is elaborated in Sager (1972a).
* There were about 20 of the elementary sentence types, with some elementary sentences in the texts not repeated with sufficient frequency to constitute a type. The unique elementary sentences were often drawn from neighboring fields, referring to phenomena without discussing them.
(6) verbs and predicate sequences that operate on sentences (e.g., It is not certain whether, We observe that, etc.).
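The sublanguage word classes and elementary sentence types listed above can be sketched as data. The class memberships and the type inventory below are a small invented sample in the spirit of the digitalis study, not the study's actual grammar:

```python
# Sketch of a sublanguage grammar: subfield word classes plus the
# elementary sentence types (allowed class sequences). Memberships
# here are illustrative, not the digitalis study's actual classes.
WORD_CLASS = {
    "sodium": "N-ION", "potassium": "N-ION", "calcium": "N-ION",
    "flows": "V-MOVE", "moves": "V-MOVE",
    "cell": "N-CELL", "mitochondria": "N-CELL",
    "digitalis": "N-DRUG",
    "affects": "V-CAUSE", "produces": "V-CAUSE",
}

# Elementary sentence types as (subject, verb, object) class triples.
SENTENCE_TYPES = {
    ("N-ION", "V-MOVE", "N-CELL"),   # e.g., "Sodium flows out of the cell"
}

def classify(subject, verb, obj):
    """Return the matching elementary sentence type, or None."""
    triple = (WORD_CLASS.get(subject), WORD_CLASS.get(verb),
              WORD_CLASS.get(obj))
    return triple if triple in SENTENCE_TYPES else None

print(classify("sodium", "flows", "cell"))   # a valid N-ION V-MOVE N-CELL
print(classify("cell", "flows", "sodium"))   # None: unsayable in the science
```

The second call mirrors the earlier point that the membrane crosses the ion is rejected: the word sequence is grammatical English but instantiates no sentence type of the sublanguage.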
The pharmacology sublanguage study gave rise to the notion of an information format, since it was found that the grammatical structure of the sublanguage could also be presented as a prototype sentence form, in terms of which each successive text sentence could be seen as the realization of certain particular options within the overall hierarchical organization of the sublanguage grammar. The choice of options in each case showed precisely which types of information were present in the sentence. The overall structure of the format is given by English grammar. That is, the underlying grammatical relations of English determine the major columns in the format. These major relations are the subject-verb-object structure of elementary sentences; the fact that for certain verbs the subjects and objects consist of whole sentences (in nominalized form); and the fact that certain distinguished adjuncts, e.g., quantifiers and time expressions, have special syntactic status in sentences. The specialization of language usage in the given subfield, i.e., the sublanguage grammatical relations, shows up in the following: Particular sublanguage word-classes in the subject-verb-object positions constitute particular subtypes of sentences that occur frequently in the material. These subtypes have a specific semantic character, enabling us to attach sublanguage-specific labels to the major columns and subcolumns of the format. The information carried by sentences of these types is thereby classified and can be mapped into the appropriate columns of the format. This can best be illustrated by examples of formatted sentences.

Elementary Fact Units
The formats obtained in the pharmacology sublanguage study contained one case or another of the basic unit illustrated in the format in Fig. 3 for the sentence Calcium uptake into liver mitochondria appears not to be affected by cardiac glycosides. In this sentence, as in most others in this material, there is a "bottom level," elementary assertion (shown in the format between double bars) consisting of a verb with its subject and object, which are concrete nouns. In the pharmacology texts, this assertion described an elementary physiological or biochemical event, which in the case of Fig. 3 is the uptake of calcium into the
[Footnotes: The remainder of this section and Figs. 5-7 are from Sager (1977). "Bottom level" refers to its position in the operator-operand hierarchy obtained by syntactic decomposition of the sentence (see Fig. 7).]
FIG. 3. Formatted pharmacology sentence (1): GL64, 13.6.11, "Calcium uptake into liver mitochondria appears not to be affected by cardiac glycosides." Format columns: DRUG (cardiac glycosides), V-CAUSE (affect, appears not to), ARG1 (calcium), V-PHYS (uptake), ARG2 (liver mitochondria), CONJ.
mitochondria of the liver. The elementary assertion here is an instance of a subclass sequence that recurred over and over in these texts, N-ION V-MOVE N-CELL, in which a noun in the ion class is connected to a noun in the cell or cell substructure class by a verb in a class whose common semantic feature is movement. Despite the clear meaning features of these subclasses, it should be kept in mind that they are defined syntactically, by their position of occurrence vis-a-vis other classes. Examples of other elementary assertions encountered in this literature were those covering ion interactions, enzyme activity, tissue contraction or contractility, protein behavior, and ions binding to molecules. Operating on the elementary assertion, very often, was a noun-verb pair, shown in Fig. 3 to the left of the double bars, consisting of a drug word and a verb of roughly causal character (affect, influence, etc.), possibly negated or quantified, as in this sentence. The causal pair is said to operate on the elementary assertion because the latter appears as the object of the causal verb. A paraphrastic transformation (passive to active) was required in order to reveal that uptake is the object of affect, since uptake appears in the sentence as the subject of the passive construction appears not to be affected. Also, while uptake appears in the sentence as a noun, it is in fact a nominal form of the verb take up, so it is put in the verb column. The subject and object of this verb occur in the sentence as adjuncts of the nominal form (uptake) but in the format they are restored to verb-argument status. As illustrated in the format of this simple sentence, the major fact type in this pharmacology material was composed of an elementary assertion drawn from a prior science (cell physiology, biochemistry), with the pharmacological agent entering only on a higher grammatical level, as an operator on the elementary assertion.

Fact versus "Meta-Fact"
The somewhat longer sentence formatted in Fig. 4 utilizes format columns that were not shown in Fig. 3 because they were empty there. Two new columns appear on the left, labelled HUMAN and V-STUDY.
FIG. 4. Formatted pharmacology sentence (2): GL64, 2.2.1, "More detailed studies of the effects of cardiac glycosides on sodium and potassium movements in red cells have been made by Kahn and Acheson (99), Solomon et al. (168) and Glynn (67)." Format rows: K and A (99), S et al. (168), and G (67) | HAVE MADE | DETAILED STUDIES OF | EFFECT | sodium / potassium | MOVE IN | red cells.
Factual assertions involving only the concrete objects of investigation in the science and their interrelations (the two inner sections of the format) are syntactically separable from the words describing the scientist's relation to the fact. Verbs like study, present, discuss, assume, report, which have exclusively human subject nouns and carry the connotation of the scientists' intellectual activity, appear as higher level operators in the operator-structure already built up from the words in the "object language" of the science. Another new feature in Fig. 4 is the conjunction column CONJ on the right which contains words that connect one line (or several grouped lines) of the format to another line or lines. This is a major departure from tables for quantitative data. In Fig. 4 the conjunction is and, but in other cases the conjunction may have the form of a verb or a phrase (e.g., is associated with, is the basis for). Apart from grammatical conjunctions, only words which have the grammatical property of operating on a pair of nominalized sentences are accepted in the CONJ column. The words in the CONJ column are much the same in different subfields, whereas the words in the innermost columns are highly specific to the field. A last point to notice in Fig. 4 is the presence of reconstructed word occurrences, shown in square brackets. Sodium and potassium movements in red cells is expanded by paraphrastic transformation to sodium movements in red cells and potassium movements in red cells. The expansion of the phrase into two assertions does not imply that the events are independent of each other; only that the connection between them is not more explicit here than their conjoining by and.

Data Structures versus Argument

A third formatted sentence, shown in Fig. 5, is sufficiently complex that it illustrates some of the regularizing effect that formatting achieves
FIG. 5. Formatted pharmacology sentence (3): LA72, 1.1.5, "The possibility that administration of digitalis, through its inhibition of the Na+-K+ coupled system, produces an increase in Na+-Ca++ coupled transport and thereby an increase of influx of Ca++ to the myofilaments is discussed and is presented as a possible basis for the mechanism of digitalis action."
for a whole text. When the sentence is read without reference to the format, it is not apparent that it is composed of repeating sequences of similar elements. As the format shows, the sentence consists of four interconnected factual units of the same basic type. The texture, and the intellectual content, comes from the interrelations among these subtypes, and from several other linguistic features: the use of conjunctions at different levels of grouping, the introduction of qualifying modifiers and higher level operators, and the use of reference, either explicitly via pronouns or implicitly via ellipsis. These features belong to the argument or reasoning in the text, which can be separated from the factual units shown in the inner portions of the format lines. Turning first to the individual fact units in Fig. 5, the inner portion of the first line, stripped of its qualifiers, says that digitalis produces an increase in Na+-Ca++ coupled transport. In this unit, Na+-Ca++ coupled transport is an instance of the formula N-ION V-MOVE N-CELL seen previously, even though the cell word is not present here. In often-repeated material, the subject or object of the verb is frequently dropped; sometimes the verb is dropped if it is unique to the stated subject or object. This is the case in the third line, where transport is suppressed but easily reconstructed because of the subject, Na+-K+ coupled system. Figure 5 introduces a new column V-QUANT between the innermost assertion and the columns DRUG, V-CAUSE. The V-QUANT column was not shown in the preceding formats because no words like increase or decrease were present in the sentences. In line 3, the V-QUANT
column is empty. Another possibility is to split the word inhibition into two column entries, V-CAUSE and V-QUANT, since we find in the texts that inhibit, produce a decrease, and cause a decrease occur in similar environments. The format in Fig. 5 also illustrates the use of pronouns and other devices of reference. In the third line, the antecedent of the pronoun its, namely digitalis, has been reconstructed as the subject of inhibit. This follows the pattern throughout that the class of pharmacological agents occurs as the subject of verbs in the V-CAUSE class. The fourth format line has the interesting property that the whole object language, or factual, portion of the format is empty of physically occurring words. The three preceding format lines, seen as a unit, are repeated implicitly as the first operand (subject) of the binary relation is a basis for, where the second operand (object) is mechanism of digitalis action. Reasoning in science writing is characterized by devices of this sort. A single assertion becomes a nominalized sentence within another sentence; a sequence of interconnected sentences becomes an element of a later sentence by implicit repetition or by pronominal reference (this, this process, etc.). In this way it becomes possible for complicated interrelations to be expressed in the physically linear medium of language.

Properties of Science Information
From a study of information formats in different subfields, one gets a picture of how scientific information is carried by language both with respect to the unique informational characteristics of each science and with respect to the general properties of information viewed over science as a whole. First, the information formats of a particular science reflect the properties of information in that particular science in contrast with other sciences. The pharmacology formats, for example, displayed a characteristic predicational hierarchy in which the word for the pharmacological agent occurred in the predicate on an embedded sentence of the type found in cell physiology or biochemistry, thus reflecting the role of the drug as an outside element that affects ongoing processes. Quantity and quantity-change were important in the pharmacology formats (not all quantity columns are shown in the example formats, e.g., dosage) reflecting the importance of quantity relations in this science. This type of format contrasts with one that was obtained for medical records, illustrated in Fig. 2. In the medical format, columns for time words are essential to the information, whereas they were almost entirely absent in the pharmacology formats. In the medical formats, there is very little
NAOMI SAGER
predicational hierarchy and virtually no argument, both of which were present in the pharmacology formats. The structure of the clinical information, displayed in the formats, is an interplay between columns containing treatment words and columns containing words that describe the patient's state: successive rows are linked primarily through time sequence, with the conjunction columns playing a secondary role. While the formats for different subfields differ, as they should, to capture the specific character of information in each field, they have certain properties in common that appear to hold for all of science writing. To mention just a few:

(1) Statements about science facts are separated by the grammar from the science facts proper; the role of the human investigator is carried by a grammatically distinct class of verbs and is syntactically separable from the report of factual events.

(2) The report of a complex event has a structure composed of a hierarchy of different types of operators, the "bottom level" operand being the carrier of the most elementary objects and events. When a given science draws upon a prior science, the material from the prior science appears as the operand of material from the given science.

(3) Argument is carried by connectives between the data structures built up in this hierarchical fashion. Whole units are carried forward by the telescoping of an operator-hierarchy into a single noun phrase, pronoun or "pro-sentence," or by the controlled dropping of words permitted by the grammar.

(4) A surprising amount of repetition and regularity is found in all science writing, once stylistic variations and equivalent grammatical forms are eliminated. Every individual piece of writing contains some repetition (or it would not be connected discourse). Across a single specialized discipline, the same items repeat in different combinations and with variations, as though all the texts were part of a single extended discourse.
Although the texts each bring in some new feature they are sufficiently similar as to fit into an overall structural characterization. These structures, or information formats, are then a powerful tool for organizing the information in subfield texts.
2.4 Automatic Generation of Subfield Word Classes
The manual study of a pharmacology sublanguage, described above, showed that sublanguage word classes can be established on the basis of co-occurrence similarities of words in grammatically analyzed subfield sentences, and that these word classes constitute a basis for defining information formats for the textual material. However, the work of manually grouping together words on the basis of co-occurrence similarities proved a tedious task. And although only words which occur frequently in the same environments were put in one class, the results could not be as precise as if numerical similarity coefficients were calculated. It was therefore felt that a clustering program should be written to automate this tedious step and to provide a more rigorous demonstration of the relevance of distributional analysis. The steps in the procedure are summarized schematically in Fig. 6. Steps 2-5 constitute the clustering program proper, which was written especially for linguistic data. It is intended that the program operate from the output of the LSP sentence analyzer (parser + transformational component). However, at the time the experiment described here was performed, the transformational component was in process of implementation, and the input was obtained manually by applying standard English transformations to text sentences. The procedure can be described informally with reference to Fig. 7 and Tables I and II, which carry through the analysis for several sample

[Figure 6 shows the procedure as a flowchart: a sample of subfield texts undergoes (1) transformational decomposition into operator-argument pairs; then (2) the frequency of occurrence of each operator-argument pair is tabulated, (3) similarity coefficients (SC) are computed for all word pairs, (4) clusters are formed of words whose average SC exceeds a threshold, and (5) a merging procedure yields the subfield word classes.]

FIG. 6. Automatic generation of subfield word classes.
[Figure 7 shows the transformational trees for three sample sentences:

S1. THIS RESULTS FROM THE SLOWING OF THE INFLUX OF POTASSIUM INTO THE CELL.
    RESULT (THIS, SLOW (FLOW (POTASSIUM, CELL)))

S2. THE INFLUX OF SODIUM IS FOLLOWED BY AN EFFLUX OF POTASSIUM.
    FOLLOW (FLOW (POTASSIUM, ( )), FLOW (SODIUM, ( )))

S3. HAVING ESTABLISHED IN BROAD OUTLINE THE MOVEMENTS OF SODIUM AND CALCIUM, ...
    ESTABLISH (MOVE (SODIUM), MOVE (CALCIUM)), with IN BROAD OUTLINE attached to ESTABLISH as an adjunct (double line)]

FIG. 7. Transformational trees.
sentences from the corpus of pharmacology sentences which was used in the experiment. Fig. 7 shows the transformational trees for the sample sentences. Each tree is made up exclusively of terminal nodes labelled with the base forms of the lexical items arranged in an operator-argument hierarchy; the verb dominates its subject and object(s), conjunctions dominate the verbs of the conjoined sentences, etc. Adjuncts (i.e., modifiers, as in S3) are shown connected to the element they modify by a double line. Transformations are used to denominalize a verb in nominalized sentences, e.g., slow in the slowing of the efflux of potassium (S2), move in movements of sodium and calcium (S3), to undo passives (e.g., in S2 is followed by → follow with reversal of subject and object), and to expand conjunctions, as in S3 where the movements of sodium and calcium → the movement of sodium and the movement of calcium and thence by denominalization to sodium moves and calcium moves, and finally to sodium move and calcium move, since tenses are removed. The purpose is to bring all occurrences of the same base morphemes into a single form so that they can be counted, and to do a similar job for all
TABLE I

OPERATOR-ARGUMENT OCCURRENCES

    Operator          First argument    Second argument   Third argument
S1. RESULT (FROM)     THIS              SLOW
    SLOW              FLOW (INTO)
    FLOW (INTO)       POTASSIUM         CELL
S2. FOLLOW            FLOW (OUT)        FLOW (IN)
    FLOW (OUT)        POTASSIUM
    FLOW (IN)         SODIUM
S3. ESTABLISH         MOVE              MOVE
    MOVE              SODIUM
    MOVE              CALCIUM
occurrences of a particular operator-argument relation (verb-subject, verb-object). The input to the clustering program consists of a linearized form of the transformational trees obtained for the sentences. From these, all operator-argument pairs can be listed and their frequency of occurrence tabulated, as illustrated in Tables I and II for the data in sentences S1-S3. In reading off operator-argument pairs from the transformational trees, the conjunction nodes (e.g., and in S3) are "transparent." Thus, in S3, move is obtained as argument of establish. To compute similarity coefficients, each word Wi is assigned a characteristic vector Vi which has 6n components if there are n distinct words
TABLE II

FREQUENCY OF OPERATOR-ARGUMENT PAIRS (Noun Arguments Only)

                          Flow   Move
1st argument POTASSIUM     2      0
1st argument SODIUM        1      1
1st argument CALCIUM       0      1
1st argument CELL          0      0
2nd argument POTASSIUM     0      0
2nd argument SODIUM        0      0
2nd argument CALCIUM       0      0
2nd argument CELL          1      0
3rd argument POTASSIUM     0      0
3rd argument SODIUM        0      0
3rd argument CALCIUM       0      0
3rd argument CELL          0      0
in the corpus. The number 6n arises because there are 6 possible relations which each Wi could have to another word Wj:

(1) Wi could be an operator with Wj as its first argument
(2) Wi could be an operator with Wj as its second argument
(3) Wi could be an operator with Wj as its third argument
(4) Wj could be an operator with Wi as its first argument
(5) Wj could be an operator with Wi as its second argument
(6) Wj could be an operator with Wi as its third argument.
The value of the component is the number of occurrences in which Wi and Wj have the stated relation. Thus, in Table II, which lists the components of flow and move (only as operators) with respect to the four nouns in S1-S3 (as arguments), the number 2 under FLOW in the row labelled 1st ARGUMENT POTASSIUM is the number of occurrences in S1-S3 of flow as operator (= verb) with potassium as its first argument (= subject). As Table II illustrates, the characteristic vectors are sparse; only a few of the components are nonzero. In the clustering procedure, each characteristic vector is normalized to unit length, and multiplied by a weighting factor to reduce the effect of infrequently occurring words. The similarity coefficient between two words is then calculated by taking the inner product of the normalized, weighted characteristic vectors of the two words. Clusters are built up one word at a time. Two words form a cluster if their similarity coefficient exceeds a threshold value, which is a parameter of the program. A word W may be added to a cluster if and only if the average of the similarity coefficients of W with each word in the initial cluster exceeds the threshold. The reason that clusters are built up one word at a time is to avoid grouping together unrelated small clusters because of high similarity coefficients within the small clusters, which might be high enough to compensate for the lower similarity coefficients between words from the different component clusters. The output of the clustering program generally contains a number of clusters with overlapping membership. Clusters with 2/3 or more members in common are merged to form the final output.
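The procedure just described can be sketched end to end in Python. This is an illustrative reconstruction, not the LSP program: the tree encoding, function names, and the omission of the weighting factor for infrequent words are all simplifications introduced here.

```python
# Illustrative sketch of the clustering procedure: read operator-argument
# pairs off transformational trees (conjunction nodes are transparent),
# build the characteristic vectors, compare them by inner product of the
# unit-normalized vectors, and grow clusters one word at a time against
# an average-similarity threshold.
import math
from collections import Counter

CONJUNCTIONS = {"and", "or"}

def arguments(node):
    """Argument subtrees of an operator, looking through conjunctions."""
    for child in node[1]:
        if child[0] in CONJUNCTIONS:
            yield from arguments(child)
        else:
            yield child

def pairs(node):
    """(operator, slot, argument) triples from a transformational tree."""
    out = []
    for slot, arg in enumerate(arguments(node), start=1):
        out.append((node[0], slot, arg[0]))
        out.extend(pairs(arg))
    return out

def characteristic_vector(w, counts):
    """6n-component vector of w: relations 0-2 mean w is the operator with
    the other word as 1st/2nd/3rd argument; relations 3-5 are the converse."""
    v = Counter()
    for (op, slot, arg), n in counts.items():
        if op == w:
            v[(arg, slot - 1)] += n
        if arg == w:
            v[(op, slot + 2)] += n
    return v

def similarity(v1, v2):
    """Inner product of unit-normalized characteristic vectors."""
    def unit(v):
        d = math.sqrt(sum(x * x for x in v.values()))
        return {k: x / d for k, x in v.items()} if d else {}
    a, b = unit(v1), unit(v2)
    return sum(a[k] * b.get(k, 0.0) for k in a)

def grow_clusters(words, vectors, threshold):
    """Each word joins the first cluster whose average similarity with it
    exceeds the threshold; otherwise it starts a new cluster."""
    clusters = []
    for w in words:
        for c in clusters:
            if sum(similarity(vectors[w], vectors[x]) for x in c) / len(c) > threshold:
                c.append(w)
                break
        else:
            clusters.append([w])
    return clusters

# Trees for S1 and S2 (cf. Fig. 7); "( )" marks a zeroed argument.
s1 = ("result", [("this", []),
                 ("slow", [("flow", [("potassium", []), ("cell", [])])])])
s2 = ("follow", [("flow", [("potassium", []), ("( )", [])]),
                 ("flow", [("sodium", []), ("( )", [])])])
counts = Counter(pairs(s1) + pairs(s2))
```

On this two-sentence corpus, `characteristic_vector("potassium", counts)` has the value 2 for the component (flow, relation 3), matching the entry 2 under FLOW in the row 1st ARGUMENT POTASSIUM of Table II, and the two cations cluster together while cell does not.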
Experimental Results

The clustering program was run on a set of 400 sentences taken from 6 articles on the mechanism of action of digitalis. Sentences were not specially selected, except that the Methods sections of the articles were excluded. The sentences yielded approximately 4000 operator-argument
7 Summarized from Hirschman et al. (1975).
pairs and a vocabulary of some 750 words. The similarity coefficients between all pairs of words were computed and the words were grouped into clusters by the algorithm described above. While 400 sentences is a small corpus, it turned out, rather surprisingly, that the main subfield word classes and the main members in each class were obtained by the computer program. Table III shows one part of the clustering output, the noun classes. It can be seen that the classes are coherent and very few words are in the wrong class. However, to evaluate the results further, the same 400

TABLE III

CLUSTERING PROGRAM OUTPUT (Noun Classes Only)

Merged classes. Run of 11.13.74. t = 0.250

Noun classes:
CG class: agent, cardiotonic glycoside, CG, compound, digitalis, drug, erythrophleum alkaloid, inhibitor, ouabain, strophanthidin, strophanthidin 3 bromoacetate, strophanthin
Cation class: Ca, Ca++, calcium, electrolyte, glucose, ion, K, K+, Na+, potassium, sodium
Muscle class: atrium, heart muscle, muscle, ventricle
Protein class: actomyosin, cardiac fiber protein
Enzyme class: Na+-K+ ATPase, enzyme
SR class: sarcoplasmic reticulum, SR
False clusters: myocardium, cell; ADP, El; ion, substance
sentence corpus was analyzed manually and the computer-generated classes were compared with those obtained manually to determine two points: (1) How many of the classes recognized as significant by the human analyzer were represented in the computer output? (2) What proportion of the words in each manually prepared class (which can be assumed to be relatively complete) were present in the corresponding computer-generated class? The results of the comparison of the noun classes in answer to Question 1 are shown in Table IV. Of the 11 major noun classes found manually, 10 are accounted for by the computer: 6 by merged clusters and 4 by single member classes. One major class recognized manually (phosphorylated compounds) did not appear, due to a minor mistake in the program. On the average the computer classes accounted for 84% of the nouns in each manual class. Overall the computer classes + single member classes account for 1335 of 2016 occurrences = 66% of pair-occurrences of concrete nouns in the corpus. Table V shows the results of comparing the membership of the computer-generated class of cardiac glycosides (CG) with the manually obtained CG class, in answer to Question 2. In the case of the CG class, and others, the computer-generated class includes the major nouns of the class, and with minor exceptions does not include nouns from other classes. The computer CG class accounted for 89% of the pair occurrences of words in that class. In the case of the cation class the corresponding number was 96%.

2.5 The Sublanguage Method Summarized
Table VI summarizes the sublanguage method as it has evolved since the first study. In approaching a new subfield, first a linguistic analysis is performed on a sample of the subfield texts in order to determine the word classes and information structures in the material (Part A of Table VI). As we have seen, this analysis can be done on the basis of word distribution patterns in grammatically analyzed sentences. While some of the steps have been automated, and more may be in the future, this remains an essentially human task. Once the subfield information format (or formats) has been defined, subfield texts can be converted to a structured data base by the six-step procedure outlined in Part B of Table VI. Of these, the first two steps produce the computer lexicon to be used in the processing, and the remaining four carry out the processing of the text sentences. The text processing procedures have been fully automated and are described in detail in the next section.
TABLE IV

COMPARISON OF NOUN CLASSES OBTAINED MANUALLY AND BY COMPUTER

Class                                Manual(a)        Computer(b)      % COMP/MAN
                                     No. OCC/No. N    No. OCC/No. N
Major classes(c):
  CG                                 442/22           395/11           89
  Cation                             412/14           394/9            96
  Enzyme                             192/13           157/3            82
  Protein                            136/21           63/3             45
  SR                                 101/5            97/2             97
  Cell                               82/6             77/1             94
  Phosph. Cmpds(d)                   66/10            xxxxxx           xx
  Membrane                           55/5             42/1             76
  Heart                              53/3             39/1             74
  Heart parts                        44/3             35/1             80
  Muscle                             45/6             38/2             84
Minor classes(c):
  Human agent                        95/54
  Drug, not incl. CG                 88/25
  Ultrastructure, not incl. SR       42/15
  Native org. sub.                   33/12
  Organism                           23/9
  Tissue                             20/3
  Organ not heart                    17/8
  Inorg. molecule not incl. cation   15/6
  Expt. medium                       13/3
  Physical forces                    12/5
  Miscellaneous                      52/12

(a) Entries are: total number of pair occurrences of nouns in class/number of nouns in class.
(b) Single member classes are shown in correspondence to manual classes if the single word in question accounts for two-thirds or more of the pair occurrences of words in the manual class. In almost all cases, this word is identical to the name of the class.
(c) Major classes are classes which have 50 or more total occurrences, and at least one member with more than 8 occurrences. Minor classes have either less than 50 occurrences total, or no member with more than 8 occurrences, as in the human agent class.
(d) A phosphorylated compounds class was obtained on previous runs (five nouns, with 71% coverage of the manual class). Due to a small error, this class did not appear in this run.
With regard to the preparation of a suitable computer lexicon, this is an essential and nontrivial operation. To format the text sentences a correct parse must be delivered to the transformational component. This requires that the words (or most of the words; cf. below) have correct English syntactic classifications down to the subclass level. To proceed
TABLE V

PROPORTION OF WORD CLASS MEMBERS IN COMPUTER OUTPUT (Cardiac Glycoside Class Only)

Computer-generated CG class (with pair occurrences(b)): CG (156), digitalis (118), ouabain (70), drug (15), agent (8), strophanthidin, strophanthidin 3 bromoacetate, strophanthin, cardiotonic glycoside, compound, inhibitor, erythrophleum alkaloid(c). Total: 395/442 = 89%.

Manually obtained CG class: the above members plus glycoside, digoxin, acetyl strophanthidin, cardioactive glycoside, digitalis glycoside, digitoxigenin, strophanthoside, cardiac glycoside, digitoxin, digitalis compound, strophanthin K. Total: 442.

(a) Agent, drug, and compound are classifiers for words of the CG class, as well as of the more general DRUG class. Inhibitor is also a classifier, which classifies according to function.
(b) An occurrence of a word either as the operator or operand in a pair. Pair-occurrences are more numerous than text occurrences for several reasons. Recoverably zeroed material is reconstructed and contributes to pair formation. Also each operator can appear in a pair as the operand of its operator, as well as with each one of its arguments. (Thus a two-argument verb can appear in three pairs.) For concrete nouns, however, this does not occur, and the pair-occurrences correlate more closely with the number of actual occurrences in the text.
(c) Erythrophleum alkaloid does not belong in the CG class; it is a drug whose effect is compared to that of the cardiac glycosides.
TABLE VI

SUBLANGUAGE METHOD

A. Discovery of information structures
   1. Select a representative sample of subfield texts.
   2. Determine the sublanguage word classes based on similarity of word environments in text sentences.
   3. Determine the sublanguage grammar based on co-occurrence patterns of sublanguage word classes.
   4. Define the subfield information format based on the sublanguage grammar.

B. Automatic conversion of subfield texts to a structured data base
   Preliminary
   1. Look up text words in LSP lexicon; print list of new words.
   2. Prepare lexical entries for new words(a), update lexicon.
   Text processing
   3. Parse sentences using LSP parser, lexicon, and English grammar.
   4. Regularize parse trees using LSP English transformations.
   5. Map regularized parse trees into information format using sublanguage formatting transformations.
   6. Regularize formats using procedures that recover implicit material.

(a) Currently a manual step.
further, i.e., to map the words into their correct information-format slots, requires that the content words also have correct sublanguage classifications. Together, these requirements present a considerable burden of lexical classification. With training, coders can prepare a lexicon for each subfield of application, and once the job is done the lexicon can be used for many subfield texts, with only minor updating required. However, automation of this stage would clearly be desirable, and at this writing, research on automating the lexical coding is underway.8

3. Computer Programs for Information Formatting

Computer programs for processing English sentences have been under development for almost 20 years. Advances in the computer field, particularly higher level languages and syntax-driven compilers, along with bigger, faster machines, have aided in this development, but they have not in themselves solved the special problems of natural language. Among these problems are the inherent syntactic ambiguity of sentences, the existence of implicit elements (ellipsis), the rich network of conjunction and comparative constructions, the many detailed constraints that apply to word subclasses, the large amount of lexical variation (every word is different), the need for a semantic representation suited to the application, and the very size and complexity of the system needed to cope with all these features in algorithmic terms.

8 The fact that 50-80% of the words in any new text are found in the LSP English lexicon (the percentage depends on how many previous texts in the subfield have been processed) suggests that a combination of morphological clues and parse tree context may make it possible to achieve correct parses of sentences containing new words, without hand coding all the new words. Further, the words in the environment of the new word which already have sublanguage classifications may in some cases determine the sublanguage class of the new word. These conjectures are being tested.
3.1 Linguistic Framework

The choice of linguistic framework plays an important role in how these problems are solved. In the case of the LSP system, the use of linguistic string analysis as the grammatical framework for the programs* has led to an economical organization of the grammar and a special method of dealing with linguistic relations. As will be seen below, the components of sentences under linguistic string analysis are such that all syntactic and semantic constraints on the words of a sentence operate locally, either within one component, or between contiguous components in the string analysis of the sentence. (A few constraints operate between elements separated by a chain of such units.) All linguistic operations can thus be carried out using a small number of basic routines corresponding to the simple grammatical relations which hold in the local case. This results in very great economy in specifying the grammar and appears to be the mechanism whereby language carries its enormous amount of detail without burdening the processor (human in this case) with thousands of complicated rules (Sager, 1967, 1972b).

String Analysis
Briefly, under string analysis, a sentence consists of an elementary central sentence (the main clause, stripped of modifiers) called the center string, and an optional number of adjunct strings (modifiers) and conjunctional strings, each of which occurs in the sentence at a stated position in the string it adjoins, usually to the left or right of a particular element, such as a noun or verb. For example, in during the first admission, the adjective first is a single-element adjunct string adjoined to the left of the noun admission in the prepositional string (PN: preposition + noun) during admission. Also, a sentence may contain a sentence nominalization string occurring in the position of a noun and its adjuncts, e.g., Your being present in Your being present would be helpful, parallel to Your presence would be helpful. Strings combine to form sentences in accord with their membership in sets of the above types; thus, for example, in constructing a sentence, strings in the set left adjuncts of N (LN) are inserted to the left of the noun they are to adjoin, strings in the set right adjuncts of the verb (RV) are inserted to the right of the verb, and so forth. The string analysis of a sentence is very similar to what many of us were taught in grade school under the name of sentence diagramming. Fig. 8 shows a simple string diagram of a typical sentence from a hospital discharge summary, Patient was found to have sickle cell disease during first admission to Bellevue for H. influenzae meningitis. (The dropping of the definite article is a regular feature of the laconic style of notes and records.) In the diagram, each linguistic string is written on a separate line, and its point of adjunction in the host string is indicated by a vertical line drawn to that point in the host-string line. The compound nouns sickle cell disease and H. influenzae meningitis in this sentence are treated as single units, although linguistically they could be further analyzed into noun adjuncts on a host noun (e.g., disease of the sickle cell type).

* A summary of this theory is given in Harris (1968, Sects. 3.4 and 3.5).

[Figure 8 shows the string diagram: the center string PATIENT WAS FOUND TO HAVE SICKLE CELL DISEASE on one line, with the adjunct strings DURING ADMISSION, TO BELLEVUE, FOR H. INFLUENZAE MENINGITIS, and FIRST each written on its own line below its point of adjunction.]

FIG. 8. Simple string diagram.

Relation to Information
The simplicity of string analysis and its regular rules of sentence composition are strong recommendations for its use in computerized sentence analysis. Another feature, which is important for information processing, is the fact that the linguistic strings are informational units as well as grammatical units of the sentence. Thus, in the example of Fig. 8, the center string patient was found to have sickle cell disease is one asserted fact in the sentence. The prepositional strings during admission, to Bellevue, and for H. influenzae meningitis each add a unit of information to the sentence. Transformational analysis shows further that these prepositional strings are part of a single larger informational unit: the connective during followed by a nominalized form of the assertion patient was admitted to Bellevue for H. influenzae meningitis, another asserted fact in the sentence. The adjective string first adds the information that the mentioned admission was the first such event.

Relation to Transformations
The fact that linguistic strings are at the same time grammatical and informational units in the sentence is explained linguistically by the relation of linguistic strings to transformations (Section 2.1 above). The linguistic strings in a sentence are closely related to the elementary assertions and operators in the transformational decomposition of the sentence. Aside from the purely paraphrastic operators, these elementary assertions and operators are the individual informational components of the sentence. In the course of constructing a sentence out of such components, the elementary assertions and operators are transformed (into linguistic strings), and in this form they combine by simple rules of adjunction and substitution10 to form the final sentence. When the relation of linguistic strings to transformations is understood, it is not hard to see also why grammatical and semantic constraints apply locally to the components of the sentence under string analysis. Most of these constraints, for example the fact that only certain nouns are appropriate subjects of particular verbs in a given sublanguage, are initially constraints within the elementary component assertions of the sentence. As an example, patient complained of pain is an acceptable sentence in the medical sublanguage, whereas pain complained of patient is not. In a sentence, this assertion might occur in the form the complaint of pain by the patient, having been transformed into a noun with prepositional adjuncts so as to fit into a noun position. However, transformations do not move the parts of the original assertion into arbitrary or distant positions. In the case of a nominalized sentence, the transformed arguments of the verb (here, of pain, by the patient) are in adjunct positions near the noun-form of the verb (complaint). This makes it relatively straightforward to recover these underlying grammatical relations and apply the appropriate constraints (such as, for example, here, that pain is not an acceptable sublanguage subject for complain).

10 A particularly simple statement of the rules of combination is obtained by considering the sentence nominalization strings to have entered the sentence by substitution for a subject or object noun. For example, I know that he was here would be formed from two source sentences N-TV-N (I know something) and N-TV-ADJ (He was here), by nominalizing the second sentence (He was here → that he was here) and substituting it for the object N (something) in the first sentence.
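The string decomposition illustrated in Fig. 8 can be sketched as a simple data structure. This is an illustrative reconstruction only; the names and shapes below are assumptions of this sketch, not the LSP's internal representation.

```python
# Illustrative sketch: a linguistic string is a word sequence plus
# adjunct strings attached at stated positions, to the left or right
# of a host element (given by its index in the host string).

def string_node(words, adjuncts=()):
    return {"words": list(words), "adjuncts": list(adjuncts)}

def flatten(node):
    """Read out the sentence, inserting each adjunct string at its host."""
    out = []
    for i, word in enumerate(node["words"]):
        for pos, side, adj in node["adjuncts"]:
            if pos == i and side == "left":
                out.extend(flatten(adj))
        out.append(word)
        for pos, side, adj in node["adjuncts"]:
            if pos == i and side == "right":
                out.extend(flatten(adj))
    return out

# The Fig. 8 sentence: a center string with a PN adjunct string whose
# host noun "admission" itself carries "first" as a left adjunct and
# two further PN strings as right adjuncts.
admission = string_node(
    ["during", "admission"],
    adjuncts=[(1, "left", string_node(["first"])),
              (1, "right", string_node(["to", "Bellevue"])),
              (1, "right", string_node(["for", "H. influenzae meningitis"]))])
center = string_node(
    ["patient", "was", "found", "to", "have", "sickle cell disease"],
    adjuncts=[(5, "right", admission)])
```

Flattening `center` reads out the original word order of the example sentence, which illustrates the point above that adjunct strings occupy stated positions relative to their host elements.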
3.2 Representation of the Grammar

As seen above, string analysis is a grammatical theory which has the advantages of simplicity and of providing relevant units of description for information processing. Nevertheless, the number of grammatical facts needed for sentence analysis (in any framework) is such that a very real question is how to represent these facts in a form suited to computation. In the LSP case, the English grammar is divided into several components, each of which is written in an appropriate formalism. The major syntactic constructions of the language, in our case the linguistic strings of the language and certain auxiliary constructs, are specified by context-free productions. For the convenience of parsing, these are written as a set of Backus-Naur Form (BNF) definitions. The parser uses these definitions to construct a parse tree for the input sentence, drawing upon a lexicon which gives the parts of speech and subclass memberships of the input vocabulary. The lexical entries are also written in a BNF formalism. A second component of the grammar, the restrictions, states conditions on the parse tree which must be met in order for the parse tree to constitute a correct analysis of the input sentence. This component carries the many detailed grammatical constraints (e.g., number agreement and the like) that are not easily expressed in a context-free formalism. The restrictions are procedures which test parse subtrees and attributes of the sentence words as the parse tree is being constructed. If the test succeeds, parsing continues as though no interruption had occurred. If the test fails, the parser backs up and tries to rebuild the parse tree using other options of the grammar. The procedures are written in a special programming language, the Restriction Language (RL), which was developed by the LSP for the writing of computer grammars of natural languages and other language-like systems (Sager and Grishman, 1975).
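The division of labor between context-free definitions and restrictions can be illustrated with a hypothetical miniature. Everything here is an assumption of the sketch: the tiny BNF-style table, the tree encoding, and the attribute names stand in for the much richer LSP grammar and Restriction Language.

```python
# Hypothetical miniature of the two-part grammar organization:
# a table of context-free (BNF-style) definitions, and a "restriction"
# procedure that tests a detailed constraint (here, subject-verb number
# agreement) on a parse tree built from those definitions.

DEFINITIONS = {
    "SENTENCE": [["SUBJECT", "VERB"]],
    "SUBJECT":  [["N"]],
    "VERB":     [["TV"]],
}

def number_agreement(tree):
    """Restriction: the subject noun and the tensed verb must agree in number."""
    _, ((_, (noun,)), (_, (verb,))) = tree
    return noun["attributes"]["number"] == verb["attributes"]["number"]

# A parse tree for "Patient was ...": terminal nodes point at lexical
# entries, whose subclasses become attributes available to restrictions.
patient = {"word": "patient", "class": "N",  "attributes": {"number": "singular"}}
was     = {"word": "was",     "class": "TV", "attributes": {"number": "singular"}}
tree = ("SENTENCE", (("SUBJECT", (patient,)), ("VERB", (was,))))
```

In the real system a failed restriction causes the parser to back up and try other options of the grammar; here the restriction simply returns a truth value for the given tree.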
A third component of the grammar is a set of routines, described in Section 3.4, which are used by the restrictions and also by the transformational component of the grammar. These routines embody the basic linguistic relations of string and transformational grammar as they appear in the parse tree. Their use in restrictions and transformations and in the definitions of the routines themselves shortens the grammar by thousands of lines. The routines are also written in the Restriction Language. The last, and most recently implemented component of the grammar is a set of English transformations (Hobbs and Grishman, 1976; Raze, 1976b). These are procedures for restructuring the parse tree in accordance with stated conditions so as to eliminate alternative grammatical forms for the same information. The procedures are written in the Restriction Language, utilizing routines of the grammar and operations for
replacing and inserting nodes in the parse tree and for sequencing the transformations under user control. Lastly, a part of the LSP system that is used in information formatting is a set of sublanguage transformations. These are not part of the English grammar proper. However, like the parsing grammar and the English transformations, they are written in Restriction Language and are a necessary component of the information-formatting system in any application. 3.3 Parsing Program
When a sentence is read into the computer for processing, the program first looks up the sentence words in the computer lexicon and associates with each word of the input string its major classifications and subclassifications. For example, in the LSP lexicon, the major classes associated with each word of the following sentence are those shown beneath the word: Patient was admitted to hospital for meningitis.
N/ADJ TV VEN P N P N

(Here, N stands for noun, ADJ for adjective, TV for a verb with tense suffix, VEN for a past participle, P for preposition.) Each major-class entry X of a word may have an attribute list associated with it giving the subclasses of X that the word belongs to; e.g., patient as N has the attribute SINGULAR. There are 115 attributes defined for the LSP English grammar (Fitzpatrick and Sager, 1974).

Parsing Algorithm
The syntactic structure is obtained by a parsing algorithm that draws upon the lexical classifications of the sentence words and a grammar which specifies the well-formed sentence structures of the language in terms of the lexical classifications and other defined grammatical constructs. A number of such algorithms have been developed for natural language parsing (Grishman, 1975). The LSP system uses as the core of the analysis algorithm a top-down serial parser with automatic back-up. It builds a parse tree of the input sentence and, if the sentence is ambiguous, generates the different parse trees sequentially. The current implementation of the LSP system in FORTRAN is described in Grishman (1973) and Grishman et al. (1973). Briefly, the top-down parser generates a parse tree from the context-free productions and attempts to match each successive terminal node of the tree with a word class assignment of the current sentence word,
NATURAL LANGUAGE INFORMATION FORMATTING
stepping from left to right through the sentence. If a terminal node X matches the X category of the current sentence word, a pointer is created from X in the parse tree to X in the lexical entry for that word. The subcategories of X in the lexicon thereby become attributes of the associated terminal node X in the parse tree. They can then be tested by procedures executed on the parse tree (restrictions).

Components of the Program
In the LSP system the algorithm that produces the parse tree(s) for a sentence from the BNF definitions is actually a small part of the total program, as can be seen in Table VII. Out of a total of 12,721 lines of program, which does not include the grammar or lexicon, the parsing algorithm proper along with the trace mechanism accounts for 966 lines, or 7.5% of the program. The largest program component is the Restriction Interpreter (2257 lines). Next in size (1417-1001 lines) are the
TABLE VII
LSP PROGRAM COMPONENTS

LSP System (as of August 1977)
  Size of program                                  12,721 lines
  Size of grammar                                   3,400 lines
  Size of lexicon (ca. 5,000 English words)         8,000 lines

Program (including comments)                    Number of lines
  Largest component
    Restriction interpreter                            2257
  Next largest components
    Lexical processors                                 1417
      (423 for compiler, 994 for English parser)
    Saving mechanism                                   1290
    Word dictionary update program                     1148
    Loading grammar (housing, directories, ...)        1001
  Smaller components
    Parser, including trace                             966
    Main program                                        902
    Code generating routines of RL compiler             887
    Transformational mechanism                          475
    Left recursion                                      164
    Print trees                                         128
    Trace (restrictions, BNF)                  included above
lexical processors, the subprogram for saving reusable parse subtrees, the dictionary update program, and the loading program. The main program and the parsing algorithm proper are thus among the smaller components of the system.

Among the more interesting devices which have been developed as part of the system are the following: a system of interrupts and dynamically generated definitions for conjunction strings; nondeterministic procedures for executing restrictions on conjunctional strings with implicit elements; node attributes with automatic erasure for cross-referencing linguistically related nodes and updating the linkage if the parse tree is changed; switches to control which portions of the grammar are to be used under different circumstances; a mechanism for saving subtrees that keeps track of portions of restrictions which must be re-executed when a saved subtree is inserted into a new parse tree context; and a method of distinguishing different types of ambiguity and printing out only alternative analyses of a specified type. These devices are described in a number of LSP papers and reports, chiefly Sager (1967, 1973), Raze (1967, 1976a), Grishman et al. (1973), and Sager and Grishman (1975).

Form of the Output

Figure 9 shows the output parse obtained for the same sentence that was analyzed informally in Fig. 8, namely, Patient was found to have sickle cell disease during first admission to Bellevue for H. influenzae meningitis. In this type of output, sibling nodes are connected by a horizontal line and the parent node is attached to the left-most daughter node only; branches end in terminal nodes (e.g., N, TV) or literals (e.g., "to") associated with sentence words or in NULL (not shown). This way of drawing the tree is appropriate for displaying a string analysis of the sentence. It will be seen in Fig. 9 that certain nodes in the parse tree immediately dominate a sequence of nodes rather than a single element, for example, ASSERTION, PN.
These nodes are in the class LINGUISTIC STRING. The sentence words subsumed by a LINGUISTIC STRING type node are those which would be recognized in string analysis as constituting a linguistic string in the sentence. Thus, associated with the terminal symbols under ASSERTION in Fig. 9, one reads the same sequence of words (patient was found to have sickle cell disease) as appear on the center string line of the string diagram of Fig. 8. If one reads the words associated with terminal symbols under each PN node (up to but not including the words subsumed by the next PN node) it is seen that each of these word sequences also corresponds to a component of the string diagram of Fig. 8.
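Reading off the words subsumed by each string-type node is a simple frontier traversal. The sketch below uses an invented tuple representation (not the LSP's internal one) to show the correspondence: the words under ASSERTION give the center string, and the words under a PN node give an adjunct component.

```python
# Sketch (representation invented): nodes are (name, children), terminals
# are (class, word); STRING_TYPES marks the LINGUISTIC STRING type nodes.

STRING_TYPES = {"ASSERTION", "PN"}

def frontier_words(node):
    """Words at the terminal frontier of a subtree, left to right."""
    name, content = node
    if isinstance(content, list):
        return [w for child in content for w in frontier_words(child)]
    return [content]  # terminal: (class, word)

def string_spans(tree):
    """List (name, words) for every string-type node, outermost first."""
    spans = []
    def walk(node):
        name, content = node
        if isinstance(content, list):
            if name in STRING_TYPES:
                spans.append((name, " ".join(frontier_words(node))))
            for child in content:
                walk(child)
    walk(tree)
    return spans

# Simplified parse of "Patient was admitted to hospital":
tree = ("ASSERTION", [
    ("SUBJECT", [("N", "patient")]),
    ("VERB", [("TV", "was"), ("VEN", "admitted")]),
    ("SA", [("PN", [("P", "to"), ("NSTG", [("N", "hospital")])])]),
])
```

Here `string_spans(tree)` yields the pair ("ASSERTION", "patient was admitted to hospital") followed by ("PN", "to hospital"), mirroring the center string and adjunct lines of a string diagram.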
Thus the string character of the analysis, which we saw was important for further information processing, is preserved in the parse tree by the fact that the structures generated by a distinguished subset of the BNF definitions are in direct correspondence with the word sequences constituting the linguistic strings in the sentence. The reason, of course, why the computer-generated parse tree is so much more complicated than the string diagram is that it records every choice made by the parser from among the grammatical alternatives specified by the BNF portion of the grammar. In the case of the LSP grammar the number of alternatives is large since the grammar covers virtually all the sentence forms one is likely to encounter in scientific writing.

3.4 Restrictions and Routines
Just as the context-free parsing algorithm is a small part of the total parsing program, so the context-free portion of the grammar is a small part of the total grammar. Some 200 BNF definitions (about 250 lines) suffice for specifying the syntactic structures of English, exclusive of conjunction strings, which are dynamically generated. This constitutes about 7% of the parsing grammar, and if the transformational component is included in the tally, then the percentage is much smaller. The remainder of the grammar (about 3000 lines, written in RL) consists mainly of routines and restrictions.

Restrictions
As noted earlier, restrictions are procedures that test the parse tree and attributes of the sentence words that have been associated with terminal nodes of the parse tree. In the LSP grammar, for convenience of reference, the restrictions have been divided into functional groups, as shown in Table VIII. Referring to the numbering in Table VIII, Sections 2, 3, 4, 6, 9, 11, 12, 13 are of the type that concern a particular linguistic element or construction. Thus are covered the comma as punctuation, comparatives, coordinate conjunctions, the noun phrase, quantifiers, sentence nominalizations (exclusive of wh-complements), tense-verb constraints, and wh-strings of all kinds. Sections 1, 8, 10, 14 each concern a particular type of linguistic constraint. Thus, all agreement restrictions are in one section, including those that concern the noun phrase or other constructions named by special sections. The so-called position restrictions check for particular subclasses in particular syntactic positions (e.g., a particular preposition depending on the governing verb). Selection restrictions concern the appropriateness of the combinations of word choices in given syntactic relations. Sublanguage constraints are of
[Fig. 9: computer-generated parse tree for the sentence Patient was found to have sickle cell disease during first admission to Bellevue for H. influenzae meningitis; the tree itself is not reproducible here.]

FIG. 9. Parse tree output. Node names: Terminal symbols: N noun, P preposition, ADJ adjective, TV tensed verb, VEN past participle. Types of nonterminal symbols: for X a terminal symbol: LX left adjuncts of X; RX right adjuncts of X; LXR sequence of LX + X + RX or of LX + XVAR + RX; XVAR local variants of X; XPOS position of X-occurrence among adjuncts. Other node names: SA sentence adjunct; NSTG noun string; NSTGT noun string of time; NSTGO noun string in PN prepositional phrase; ADJADJ repeating adjectives; LCDA left adjuncts of compound adjective; OBJECTBE object of be; VENPASS passive string; LVSA left adjuncts of verb in participial SA string; PASSOBJ object in passive string. Output convention: a prepositional phrase PN which has several possible positions of adjunction is assigned in the parse tree to the nearest (here BELLEVUE FOR MENINGITIS rather than ADMISSION FOR MENINGITIS). The later stages of processing correct the assignment on the basis of word co-occurrence classes.
TABLE VIII
LSP ENGLISH GRAMMAR COMPONENTS

Component of grammar                          Number of lines
BNF definitions                                      250
Type-lists                                           150
Routines                                             500
Restrictions                                        2500
  1. Agreement restrictions                          250
  2. Comma restrictions                              100
  3. Comparative restrictions                        150
  4. Conjunction restrictions                        300
  5. Min-word restrictions                            50
  6. Noun phrase restrictions                        250
  7. Optimization restrictions                       150
  8. Position restrictions                           400
  9. Quantifier restrictions                         200
  10. Selection restrictions                          50
  11. Sentence nominalization restrictions           250
  12. Verb and center string restrictions            100
  13. Wh-string restrictions                         150
  14. Zeroing restrictions                           100
this type. Zeroing restrictions control the acceptance of NULL elements in conjunctional strings with ellipsis and lay the groundwork for recovering the implicit word-occurrences in these strings. Min-word and optimization restrictions (Sections 5 and 7) increase the efficiency of parsing and limit the number of alternative analyses that are printed. A description of many of the restrictions in the form in which they appeared in earlier implementations of the grammar is given in Sager (1968) and Salkoff and Sager (1969). A fuller and more up-to-date documentation of the grammar is in preparation.

From the very start of efforts in computerized language processing it was clear that restrictions, or their equivalent in other terms, were an essential ingredient of the grammar. Parses that are syntactically correct according to the context-free component of the grammar but are clearly not correct syntactic analyses of the given sentence must be eliminated. For a grammar of any respectable size, there may be hundreds of such false parses for sentences even of modest length. The main constraints on the syntactic level that are needed to eliminate false parses are the well-known grammatical rules of agreement in number, case, etc. In addition, for applications, one needs sublanguage constraints to distinguish the intended reading from among alternative correct syntactic parses.

The Need for Global Routines
In applying constraints to putative parse trees the basic operation is to move a pointer from node to node in order to locate the nodes to which the constraint applies. These nodes are then tested to see if they have the desired attributes. By far the largest part of these procedures is devoted to locating the argument nodes of the test to be performed. Even the judicious use of registers does not eliminate the large number of tree-climbing operations that are required in order to cover all the different cases in all the restrictions. When these parse tree paths are specified in detail, the bulk and opacity of the grammar increases until it becomes clear that some more general solution is desirable. Our solution has been to define a set of global tree-climbing routines based on the linguistic relations of string analysis. The BNF definitions of the grammar are divided into types based on their role in the string grammar, and each type is written in a standard form. This results in a parse tree which has a modular structure. The global tree-climbing routines are then defined in terms of these node types and operate in a standard way in all modules of the parse tree. This eliminates the need to specify the parse tree paths in detail, since all restrictions can be expressed in terms of the relations defined for a standard module.

Modular Structure of Parse Tree
The main types of nodes are LINGUISTIC STRING, ADJUNCT SET, and TERMINAL SYMBOL. A generalized parse tree module illustrating the relations of these types in the parse tree is shown in Fig. 10. A module has as its root a node of type LINGUISTIC STRING (Si). The BNF definition of a LINGUISTIC STRING consists of a sequence of required elements Ei1...Ein (such as the SUBJECT, TENSE, VERB, and OBJECT nodes of an ASSERTION), interspersed with optional elements of the type ADJUNCT SET (a in Fig. 10). ADJUNCT SET nodes may be empty (because adjunct string occurrence is optional) or else they terminate in a terminal symbol X or a literal W of the grammar, or, as shown in Fig. 10 under the first a, in a node of the LINGUISTIC STRING type. If the latter, then this node is the root of another module. Required elements Eij of a string Si originate branches which either terminate in a LINGUISTIC STRING type node (again, the root of another module) or, as is more frequently the case, a three-element sequence consisting
[The module diagram of Fig. 10 is not reproducible here; its node types and relations are listed below.]

Node types: S linguistic string; E required element of linguistic string; a adjunct set; X terminal symbol; (word) sentence word.

Linguistic relations, illustrated on the module:
IMMEDIATE STRING: Si is the immediate string of every node dominated by Si in the module; Si has required elements Ei1, Ei2, Ei3 and adjunct elements.
COELEMENT: Ei1 has coelements Ei2 and Ei3.
CORE: Ei1 has core Xi1; Ei3 has core Si+4.
LEFT ADJUNCT: Xi1 has left adjunct Si+2.
RIGHT ADJUNCT: Xi1 has right adjunct Si+3.
HOST STRING: Si is the host string of Si+1.
HOST: Xi1 is the host of Si+2, Si+3.

FIG. 10. Parse tree module and linguistic relations.
of a TERMINAL SYMBOL X, flanked on either side by ADJUNCT SET nodes, corresponding to the left and right adjunct sets of X. To illustrate how parse trees for text sentences are composed of modules, the ASSERTION subtree of Fig. 9 (up to the branch labeled *1*) is redrawn node-for-node in Fig. 11, deleting the node names and labelling those nodes which are of the types noted above. In practice, it has been convenient in some cases to define local variants of a terminal symbol (e.g., pronouns as variants of nouns), giving rise to a type of definition XVAR that is seen in the module in the position usually occupied by X. X or one of its local variants is then the value (i.e., immediate descendent) of the XVAR node. In the sentence of Fig. 9 there are no adjunct occurrences of the type that occur between string elements, and the nodes corresponding to these adjunct set positions are suppressed in the output. Thus no interelement a's are shown in the node-for-node module representation of the ASSERTION parse tree.

[Fig. 11: module diagram, not reproducible here.]
FIG. 11. String modules in the parse tree (ASSERTION subtree of Fig. 9, relabelled).

Apart from this convention it will be seen that the actual parse tree is faithful to the modular representation. The node types in a module have a certain few relations in terms of which all linguistic constraints and conditions on transformations can be stated very succinctly. These relations are listed below the parse tree module in Fig. 10 and are illustrated by statements applying to the module. The relations are, in summary, that a linguistic string has ELEMENTS, each of which is a COELEMENT to the others. Every string element has a CORE value which corresponds to the word or word-string (exclusive of modifiers) which satisfies that element in the sentence. In Fig. 10, the CORE values of Ei1, Ei2, and Ei3 are, respectively, Xi1, Xi2, and Si+4 (the numbering of S-nodes is arbitrary). In Fig. 9 the CORE of SUBJECT in the ASSERTION subtree is N (patient). A CORE which is a terminal symbol can have LEFT-ADJUNCTS or RIGHT-ADJUNCTS or both. Usually the adjuncts are themselves linguistic strings (e.g., Si+3, right adjunct of Xi1 in Fig. 10) and the CORE they adjoin is their HOST. Likewise, a whole linguistic string can have adjuncts (e.g., usually in Usually he goes alone) in which case the string it adjoins is its HOST STRING. All of the nodes in a module have the same IMMEDIATE STRING, the root node of the module.
Linguistic Routines
Corresponding to each of the relations described above is a routine of the grammar. When executed at a node N of the module the routine locates the node having the stated relation to N. Since the routines are written in terms of the node-types, they apply to any module. For example, to find the CORE of any node N (except nodes of types LINGUISTIC STRING and TERMINAL SYMBOL, for which the relation is not defined), the routine descends to the first node of type LINGUISTIC STRING or TERMINAL SYMBOL, not entering nodes of type ADJUNCT SET. By invoking the CORE routine at any element position one reaches the word (or word string) which centrally satisfies the element in the given sentence, regardless of how many intervening nodes may have been constructed in the parse tree or whether there are modifiers of that word in the sentence. Thus, for example, to check agreement between the subject and verb of a declarative sentence, the restriction, executed at, say, the VERB element of the ASSERTION string, would invoke the CORE routine to reach the verb that carries the number attribute, the COELEMENT routine (with argument SUBJECT) to reach the SUBJECT node, and again the CORE routine at SUBJECT to reach the noun (or other core element) carrying the subject number attribute. It is then a simple matter to check that the attributes of the core words are compatible. Different restrictions test modules corresponding to different definitions of the grammar, and even a single restriction like subject-verb agreement applies to a variety of parse tree structures (e.g., different values of SUBJECT, the QUESTION versus ASSERTION string). Yet the same small set of routines suffices to locate the argument nodes for all cases of linguistic constraints and saves the user from having to specify the parse tree paths in detail.
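The subject-verb agreement procedure just described can be sketched as follows. The representation and all names are invented for illustration (the actual LSP routines are written in Restriction Language); the point is that CORE and COELEMENT are defined once, over node types, and the restriction is then stated purely in terms of them.

```python
# Sketch of the CORE and COELEMENT routines over a modular parse tree
# (representation invented; not the LSP's internal data structures).

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    name: str
    ntype: str = "ELEMENT"      # STRING | ELEMENT | ADJSET | TERMINAL
    word: Optional[str] = None
    attrs: frozenset = frozenset()
    children: list = field(default_factory=list)
    parent: Optional["Node"] = None

def attach(parent, *kids):
    for k in kids:
        k.parent = parent
        parent.children.append(k)
    return parent

def core(node):
    """Descend to the first STRING or TERMINAL node, skipping adjunct sets."""
    for child in node.children:
        if child.ntype == "ADJSET":
            continue
        if child.ntype in ("STRING", "TERMINAL"):
            return child
        return core(child)
    return None

def coelement(node, name):
    """Sister element with the given name in the same string module."""
    return next(c for c in node.parent.children if c.name == name)

def agree(verb_element):
    """Subject-verb agreement stated entirely in terms of the routines."""
    verb_core = core(verb_element)
    subj_core = core(coelement(verb_element, "SUBJECT"))
    return bool(verb_core.attrs & subj_core.attrs)

# ASSERTION module for "patient recovers" (adjunct sets left empty):
assertion = Node("ASSERTION", "STRING")
subj, verb = Node("SUBJECT"), Node("VERB")
attach(assertion, subj, verb)
attach(subj, Node("LN", "ADJSET"),
       attach(Node("NVAR"),
              Node("N", "TERMINAL", word="patient",
                   attrs=frozenset({"SINGULAR"}))),
       Node("RN", "ADJSET"))
attach(verb, Node("TV", "TERMINAL", word="recovers",
                  attrs=frozenset({"SINGULAR"})))
```

Because `core` skips ADJUNCT SET nodes and descends through intermediate definitions such as NVAR, the same call works however many nodes intervene between the element and the word that satisfies it.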
The definition of parse tree modules and the use of higher level parse tree routines based on the relations of node-types in the module greatly simplifies and shortens the parsing grammar. Its most important effect, however, is that it provides regular structures and global operators for the definition of yet more complex linguistic operations. For example, the expansion of conjunctional strings to full assertions (filling in implicit elements) is carried out in a general way based on the modular structure of the parse tree (Raze, 1976b). The expansion procedure locates an element in the host string which is to become the “filled-in” value of the corresponding element in the conjunction string. The ability to accomplish this task in all linguistic cases by means of a single procedure rests on the regularity of the representation of sentence structure. For another example, the denominalization transformation is a sizeable
body of RL code with numerous calls on routines of the grammar. Without the aid of the generalized linguistic routines the code would be many times longer and much more difficult to comprehend. Under such circumstances it would hardly be possible to write a full transformational component of the grammar, or to write sets of information-formatting transformations for different sublanguages.
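The expansion of conjunctional strings to full assertions, mentioned above, can be reduced to a very small sketch once the representation is regular. The dict representation below is invented for illustration (the actual procedure operates on parse tree modules in RL): each implicit element of the conjunct is filled in from the corresponding element of the host string.

```python
# Toy sketch (invented representation) of conjunction expansion: implicit
# (zeroed) elements of the conjunct are filled in from the corresponding
# elements of the host string, yielding two full assertions.

def expand_conjunction(host, conjunct):
    """host, conjunct: element name -> words; None marks an implicit element."""
    return {name: conjunct.get(name) or value for name, value in host.items()}

# "He was hospitalized and released":
host = {"SUBJECT": "he", "TENSE": "was", "VERB": "hospitalized"}
conjunct = {"SUBJECT": None, "TENSE": None, "VERB": "released"}
second = expand_conjunction(host, conjunct)
# second == {"SUBJECT": "he", "TENSE": "was", "VERB": "released"}
```

The single generic procedure works here because host and conjunct share one element inventory; this mirrors the claim in the text that the expansion can be done for all linguistic cases by one procedure only because the representation of sentence structure is regular.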
3.5 English Transformations

The need for a further stage of grammatical processing after a parse is obtained arises because language provides alternative grammatical forms for the same substantive information. In order to obtain a uniform representation of the information, it is desirable to reduce the number of different forms which have to be dealt with and to maximize the repetition of standard types. The most common among equivalent forms, therefore, is chosen as the base form and the program is equipped with procedures which transform occurrences of the equivalent forms into the base forms.

Transformational Decomposition
When the purpose of the sentence analysis is to obtain a decomposition from which tokens of a root word-form can be counted, then a relatively complete transformational analysis is required. The denominalization transformation, as well as many others, must be carried out in full. This is the case in preparing textual input for word cluster analysis as was described in Section 2.4. There, it will be recalled, the transformational decomposition was represented as a hierarchy of operators (mostly verbs and conjunctions) each operating on an n-tuple of arguments which were themselves either operators or concrete nouns (Fig. 7). In this form the number of occurrences of each operator-argument pair can be counted, providing the basis for a calculation of similarity coefficients. An example of a computer output for this type of sentence decomposition is shown in Fig. 12. The sentence in Fig. 12, This results from the slowing of the influx of potassium into the cell, is the same one (S1) for which a tree was drawn in Fig. 7. In the computer output every operator with its arguments appears under a node ASSERTION; the operator appears first, as in Polish notation, under the node VERB, with the ordered arguments following, under the nodes SUBJECT, OBJECT, OBJECT... as needed. The indication that a transformation was performed is given by the presence of a T-node ("T" followed by the name of the transformation) dominating the structure which results from the execution of the transformation. In Fig. 12, the first such node is T-THIS, standing for a transformation which recognized the referential this (here in the subject position of results) and places a node NULLN under
[Fig. 12: transformational decomposition tree for the sentence This results from the slowing of the influx of potassium into the cell; the tree itself is not reproducible here.]
FIG. 12. Transformational decomposition output.
NSTG under SUBJECT to stand for the referred-to noun or nominalized sentence. Further down in the tree, there are two instances of a T-DEFINITE transformation, which records the presence of a definite article, and one instance of a denominalizing transformation T-VN-ACT, which operates on nominal forms of verbs (VN) of the type which undergo "action nominalization."11 Aside from the form in which the operator-argument relations are presented in the output, the same operator-operand hierarchy which was pictured in the hand-drawn tree of Fig. 7 is seen in the computer output in Fig. 12 for the same sentence. In addition, in the computer decomposition a record is kept of the transformations which operated, since in some cases the transformation carries a certain increment of information.12

11 See Hobbs and Grishman (1976) for a description of "action" and "argument" nominalization types, and for references to the linguistic literature on nominalization.
12 One transformation which operated on the sentence of Fig. 12 is not represented by a T-node. That is the transformation which converted the slowing of the influx of potassium into the cell into an ASSERTION with the verb slow and the two arguments: (1) an unstated subject and (2) an object the influx of potassium into the cell. The latter is also transformed into an ASSERTION by the denominalizing transformation T-VN-ACT.
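The counting of operator-argument pairs that such a decomposition supports can be sketched briefly. The nested-tuple representation and the abridged decomposition below are invented for illustration (the actual output is a parse tree, as in Fig. 12, and the slowing subtree has an unstated subject):

```python
# Sketch (invented representation) of counting operator-argument pairs in a
# transformational decomposition: a node is either a word or an
# (operator, [arguments]) pair, with None for an unstated argument.

from collections import Counter

def op_arg_pairs(node, counts=None):
    """Count (operator, argument-head) pairs throughout the hierarchy."""
    if counts is None:
        counts = Counter()
    if isinstance(node, tuple):
        op, args = node
        for arg in args:
            head = arg[0] if isinstance(arg, tuple) else arg
            if head is not None:
                counts[(op, head)] += 1
            op_arg_pairs(arg, counts)
    return counts

# Abridged decomposition of "This results from the slowing of the influx
# of potassium into the cell" (cf. Fig. 7):
tree = ("result", ["this",
                   ("slow", [None,
                             ("influx", ["potassium", "cell"])])])
pairs = op_arg_pairs(tree)
```

Accumulating such counts over a corpus is what provides the frequencies from which similarity coefficients between words can then be calculated.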
Transformational Regularization
The transformational decomposition of sentences is in itself an informational representation of a very general kind. In science sentences it separates the description of the scientists' activities from those of the objects of study and displays the predicational hierarchy. It does this for all sentences, without relying on prior knowledge of sublanguage word classes or sublanguage information structures.13 However, in information formatting, the transformational decomposition is not the final form of the analysis. Rather, we use transformations to regularize parse tree structures, without changing the information, so that the mapping of the text sentences into a more specific information structure can be done with uniform procedures. The main English transformations which have been needed in formatting applications to date are the conjunction expansion, relative clause expansion, passive-to-active transformation, and denominalization.

The conjunction and relative clause transformations are important because they make explicit certain implicit word occurrences so as to provide two full assertions in place of one assertion with adjoined material of a different form. Thus, the relative clause transformation expands a sentence like Was treated with erythromycin which she took for five days to (in effect) Was treated with erythromycin such that she took erythromycin for five days. The second occurrence of erythromycin appeared in the original sentence in pronoun form in the word which. In cases where the relative pronoun (e.g., which) is not present, the relative clause expansion is triggered by the presence of adjuncts on the noun. Thus, in the sentence seen earlier, Patient was found to have sickle cell disease during first admission to Bellevue for H. influenzae meningitis, the presence of adjuncts on the noun admission triggers the relative clause expansion, as though the sentence read Patient was found to have sickle cell disease during an admission which was the first (admission) to Bellevue for H. influenzae meningitis. Since the noun in this case (admission) is a nominalized verb, the final form of the expanded relative clause (as in the format of Fig. 17, below) will show admission in the verb position of the expanded relative clause due to the effect of denominalization.14

In the medical information format of Fig. 2, the effect of the execution of the conjunction-expansion transformation was seen in lines 7 and 8, where a single assertion containing a conjoined verb phrase (He was hospitalized and released) was expanded to two full assertions (He was hospitalized and he was released) as a step towards obtaining two complete format lines. An example of the results of executing the passive-to-active transformation is seen in Fig. 17 below (find in place of was found, with reversal of subject and object). The effect of the passive-to-active transformation is also seen in Fig. 2, though there the original passive forms of the verbs are retained in the table for readability.

13 But disambiguation may require the feedback of sublanguage selectional constraints.
14 Currently, only those portions of the denominalization transformation that identify the arguments of the nominalized verb are executed; the nominal form of the verb is not changed into a verb. Thus, for example, admission remains admission in the format while its subject and object (i.e., of admit) are mapped into the correct argument slots.

3.6 Formatting Transformations

As previously outlined in Table VI, text sentences which are being information-formatted are first parsed using the LSP parsing program and English grammar (Sections 3.2-3.4). Then the parse tree structures are regularized by executing paraphrastic English transformations (Section 3.5). The regularized parse trees are then mapped into the information format for the given kind of textual material by means of a set of tree transformations that make use of the sublanguage word classes. These transformations will be described in this section. The final step (Section 3.7) is to normalize the formats by filling out format lines with word occurrences which can be reconstructed by reference to preceding formatted sentences (e.g., by resolving pronoun references). The data base is then ready for use (Section 4).

Information-formatting techniques have been under development by the LSP for several years. Formatting transformations were first written for a corpus of natural language radiology reports (Hirschman et al., 1976). The corpus consisted of 159 consecutive (i.e., not specially selected) follow-up X-ray reports on 13 patients who had had surgery for breast cancer.
The reports contained a total of 188 sentential units ranging in complexity from short partial sentences (e.g., X-rays negative), to full sentences of moderate length (e.g., chest film 6-5 shows enlargement of lesion on right hilum since film of 4-17), with occasional very long run-on sentences that could not always be handled by the parser. To parse the partial sentences, a FRAGMENT definition was added to the BNF grammar. The options of FRAGMENT cover the deletion forms found in notes and records, which, it turns out, are only a few types, mainly due to the dropping of the verb be or show or particular noun subjects (Anderson et al., 1975). With the addition of the FRAGMENT portion of the grammar and the formatting transformations, the program (parser + English transformations + formatting transformations) successfully formatted 176 of the original 188 sentences (94%). From the radiology data base, we moved on to consider the more
complex material represented by hospital discharge summaries, the type of document shown in Fig. 1. While this material contains a much greater variety of information and is more varied in style and content than the X-ray reports, it still has the restricted and repetitive features of a sublanguage. This has enabled us to formulate the word subclasses and syntactic regularities that are the basis for the information format for this type of document. An information format in its entirety is a schematic representation of all the possible grammatical sequences of sublanguage word classes which might be encountered in sentences of the subfield texts. As a rule, we reserve a separate column of the format for each major sublanguage word class, since each such word class has been found to correlate with a particular kind of subfield information. For example, we define a column DIAG to house diagnosis words (measles, sickle cell disease, etc.), another column S-S for sign and symptom words, and so forth. This elaboration of format columns facilitates later retrieval. In any one instance of mapping a sentence into the format, only a few of the defined columns will be filled. The formatting transformations use the same transformational mechanisms that were implemented for the English transformations in the LSP system. Because these mechanisms are designed to map trees into trees, the format is first created as a tree. After the sentences have been mapped into the format trees, the trees can be compressed and written out in the tabular form shown in Fig. 2.

FORMAT Tree for Discharge Summaries
An overview of the FORMAT tree for sentences of hospital discharge summaries (as of Fall 1977) is shown in Fig. 13.15 An initial transformation sets up this structure and assigns the value NULL to certain nodes, as shown. Later, a number of the NULL nodes are replaced by parse tree substructures that are transferred into the format by the formatting transformations. Only subtrees terminating in non-NULL nodes are printed in the formatting output. To explain the FORMAT tree a few preliminary remarks on the source of data are in order. We received our corpus of medical documents in machine-readable form from the Pediatrics Service of Bellevue Hospital. The information system in use there at the time provided that each type of document in the system (about 70 types in all) would have preset
15 Figure 13 was provided by Lynette Hirschman, who wrote and tested the formatting transformations for the hospital discharge summaries.
NAOMI SAGER
[Notes to Fig. 13: a node is replaced by one of its options unless it is empty; nodes marked + can have sister nodes MODS and TIME.]
FIG. 13. Format tree for hospital discharge summary.
paragraph and subparagraph headings, within which any amount of free narrative was accepted (Korein, 1970). Correspondingly, the FORMAT tree developed for these documents has at the highest level a node for the paragraph heading (PARAGR) and a sibling node for all the data (DATA). Under DATA, there are four nodes: PATIENT, TREATMT, TR-ST-CONN, and PT-STATUS. The material under these nodes identifies the patient (PATIENT), states medical actions taken in treating the patient (TREATMT), states a connection between the treatment given and the patient’s status (TR-ST-CONN), and describes the patient’s status (PT-STATUS). In some sentences, all four nodes are represented, e.g., Patient was hospitalized for meningitis; in others only PATIENT and TREATMT (Patient has been followed in Hematology Clinic every three months); in others only PT-STATUS (There is tenderness to pressure at the left knee laterally), etc. The substructure of each of the main nodes provides slots for all the words in occurrences of the given type, following the syntactic order of
transformationally regularized sentences. Thus, under TREATMT, the node sequence is INST (medical institution personnel), VERB-MD (medical action verb), MEDIC-TR (medication or treatment). These nodes, together with the PATIENT node, which has been moved to the front, correspond to a SUBJECT + VERB + OBJECT + MEANS sequence of a prototype sentence “Physician treats patient with X.” Variants of this basic form occur frequently in the material. Examples of the substructure of the TREATMT portion of the FORMAT tree are seen in the formatting outputs in Fig. 14. The format for the upper sentence, Treatment of ampicillin 200 mg/kg I.V. was given, shows the VERB-MD slot filled by give (with PAST tense information given under TIME) and the MEDIC-TR slot filled by both a treatment verb (in this case the nominalized verb treatment) and a medication noun (ampicillin, under MED), associated with a dose (under RXDOSE). The format for the second sentence in Fig. 14, A transfusion of 100 cc packed red blood cells was given for extreme anemia, shows MEDIC-TR under TREATMT filled by transfusion with its associated information. It also illustrates the use of TR-ST-CONN (for) and PT-STATUS. Here the nodes under PT-STATUS indicate that a FINDING of a QUALitative type, namely a Sign or Symptom (S-S: anemia), was recorded, associated with QUANTitative information of a NON-NUMerical type (extreme). Further options of PT-STATUS are illustrated in Fig. 15, by the formatting output for the sentence There was mild mucous discharge from the nose, lungs were clear, grade 2 over 6 systolic murmur along the left sternal border. This sentence was parsed into two ASSERTIONs and a FRAGMENT, and each part was mapped into a FORMAT tree.
All three FORMAT trees contain entries for BODY-PART (nose, border, lungs).16 The first FORMAT shows there be under the EVIDential node of MODS, associated with the FINDING, whereas in the third FORMAT the verb be appears in the VERB-BE-SHOW slot connecting the BODY-PART entry (lungs) to the FINDING entry (clear). The way be is formatted in the two cases accords with the two syntactic roles of be in the
16 A more detailed treatment of along the left sternal border would take sternal (→ sternum) as the entry under BODY-PART, associated with locative information along the left border. Words which are syntactically connected to the central word under a format node are placed in associated LEFT-ADJUNCT or RIGHT-ADJUNCT slots according to their position vis-à-vis the central word (e.g., in the first FORMAT, mild mucous as LEFT-ADJUNCT of discharge). But it should be noted that LEFT-ADJUNCT and RIGHT-ADJUNCT in the format have been defined in a manner slightly different from their use in the string grammar, e.g., in the same FORMAT, the LEFT-ADJUNCT from the on discharge.
[Fig. 14. Formatting outputs showing the TREATMT substructure of the FORMAT tree.]
sentence. The FINDING node in the first and third FORMAT trees is, as in Fig. 14, of the QUALitative type; in the first format it contains a Sign-Symptom word (discharge) and in the third a more neutral DESCRiptive term (clear). The FINDING node of the second FORMAT tree has a combined QUANT QUAL value, with systolic murmur under QUAL and grade 2 over 6 under QUANT. FORMAT trees can be connected to each other via a CONNECTIVE node, which was not shown in Fig. 13. The connective may be one of several types. One possibility is a conjunction, such as when and and, as shown in Fig. 2. Another possibility is a relational verb, such as confirm, as shown in the formatting output in Fig. 16, for the sentence Impression
FIG. 15. Formatting output showing PT-STATUS subtree of format.
of meningitis was confirmed by finding cloudy cerebrospinal fluid. Here the CONN node (confirm) joins the FORMAT tree for finding cloudy cerebrospinal fluid to the FORMAT tree for impression of meningitis. The arguments of CONN are written in Polish notation. Also in Fig. 16, the formatting of cloudy cerebrospinal fluid illustrates the LAB option of
PT-STATUS (containing the laboratory material cerebrospinal fluid17) associated with the QUALitative LAB-RESult (cloudy) under FINDING. Other possible values for the CONNECTIVE node that links FORMAT trees include the marker of an embedded construction (EMBEDDED) and the marker of a relative clause (REL-CLAUSE). Both of these connectives are seen in Fig. 17, the formatting output for the sentence whose parse tree was exhibited in Fig. 9. The first CONN node in Fig. 17 (value EMBEDDED with value in turn NTOVO) joins the FORMAT that contains the medical action verb find (V-MD) to the conjunction of two FORMAT trees that contain the rest of the information in the sentence. The CONN node in this latter case has the value REL-CLAUSE with value EXPAND-REFPT, which indicates that a word which is a time reference point in the first FORMAT under CONN (here, the word admission under REF-PT) is expanded in the manner of a full relative clause in the second FORMAT under CONN. Thus, in the case of Fig.
17 Presently, as a short cut in retrieval for the frequent case where culture is omitted from cerebrospinal fluid (or CSF) culture positive (or negative, etc.), these BODY-PART words are also classed as LAB words, and are put in the LAB column if no other test-word is given.
[Fig. 17 formats the sentence: SIGNIFICANT PAST HISTORY. PATIENT WAS FOUND TO HAVE SICKLE CELL DISEASE DURING FIRST ADMISSION TO BELLEVUE FOR H. INFLUENZA MENINGITIS.]
FIG. 17. Formatting output for sentence of Fig. 9.
17, admission appears a second time (in the last FORMAT tree), this time not as a time reference word but as a medical action verb, under V-MD, indicating that a medical event (the first admitting of the patient to Bellevue) is also noted in the text. Further details about both CONN and time reference words arise in connection with format normalization, described in Section 3.7 below.

A Formatting Transformation
Both English transformations and information formatting transformations are written in the Restriction Language. RL statements may be in declarative or command syntax, or a mixture of both, as the following fairly typical medical formatting transformation illustrates.

T-DIAG = IN LNR, NNN, LAR, LAR1:
IF BOTH CORE IS H-DIAG AND $CHECK-COOC
THEN ALL OF $PUT-IN-DIAG, $TFORM-LADJ-RADJ, $ERASE.
$PUT-IN-DIAG = BOTH FIND QUAL X30 IS EMPTY
AND BOTH REPLACE VALUE OF X30 BY (DIAG) X50 (PRESENT-ELEMENT-)
AND AT VALUE OF X50 STORE IN X51.
$CHECK-COOC = IF ASCEND TO LN +
THEN HOST- IS NOT H-TEST OR H-RX.

The T-DIAG transformation transfers words that are in the hospital-records class for diagnosis words (H-DIAG) from stated positions in the parse tree to the DIAG column of the medical information format, provided certain conditions are met. The transformation is executed at the nodes in the parse trees which are named following the word IN in the first line of the transformation:

T-DIAG = IN LNR, NNN, LAR, LAR1:
LNR and NNN are noun-centered structures, LAR and LAR1 are adjective-centered structures. The transformation states that if the core of the subtree dominated by the node in question is in the class H-DIAG and if the check on co-occurrence constraints ($CHECK-COOC) for words in this class succeeds, then three steps are carried out: (1) the subtree whose core is H-DIAG is copied into the appropriate slot in the format ($PUT-IN-DIAG); (2) the left and right adjunct subtrees of the core are transformed ($TFORM-LADJ-RADJ); and (3) the parse tree structures of which copies have been made are erased from the parse tree ($ERASE). The latter two procedures (not defined here) are global substatements used in many transformations whereas the $CHECK-COOC
procedure has local scope and is specific to the particular transformation in which it occurs. In this case the intent of the co-occurrence check is to prevent the placing of a diagnosis-type word in the DIAG column if that word is functioning as a modifier of a test (H-TEST) or treatment (H-RX) word, rather than as a stated diagnosis, as in, e.g., sickle cell preparation, tuberculosis immunization. The test is simply stated in RL:

IF ASCEND TO LN + THEN HOST- IS NOT H-TEST OR H-RX.
Translating, this states that if an ascent to LN can be made, which indicates that the H-DIAG noun or adjective is occurring as a modifier of another noun, then the noun which is modified should not be an H-TEST or H-RX word, as in the above examples. The path from the node LN to the noun which is to be tested is given by the routine HOST-.18 The steps in transferring words from the parse tree to the information format, in this case via the procedure stated in $PUT-IN-DIAG, are illustrated in Fig. 18, which shows the successive transformations of the relevant portion of the FORMAT tree for the sentence in Fig. 17. The QUAL node under FINDING in the FORMAT tree initially has a NULL value (Fig. 18a). The parse tree for this sentence, shown previously in Fig. 9, has a subtree (Fig. 18b) to which the transformation applies. The first part of $PUT-IN-DIAG

FIND QUAL X30 IS EMPTY

calls on a FIND routine with argument QUAL which searches the FORMAT tree for QUAL. The remainder of the statement causes QUAL to be stored in register X30 and tests that the node is empty. The statement

REPLACE VALUE OF X30 BY (DIAG) X50 (PRESENT-ELEMENT-)

18 The HOST- routine is distinguished from the HOST routine by the fact that it does not invoke an automatic re-execution of the restriction or transformation on conjoined structures as would otherwise be the case due to a device called STACKING (Raze, 1976a). Every routine R in the grammar which calls on the STACK operator has a nonstacking counterpart routine R-. The STACKING device is what makes it possible to obtain correct parses for sentences containing truncated conjunctional strings due to ellipsis. In order to obtain a correct parse the truncated string should be expanded so that restrictions can be executed on it, but it is inefficient, if not impossible, to expand all putative parse trees containing conjunctional strings before a correct parse of the sentence is obtained. The STACKING device is an answer to this dilemma.
It causes restrictions to be executed as though the conjunctional string were expanded, without restructuring the parse tree, and leaves pointers which make a later expansion (of the correct parse) a relatively straightforward matter.
FIG. 18. Operation of a formatting transformation.
causes the FINDING subtree of FORMAT to appear as in Fig. 18c. The node DIAG has been created and appears in place of NULL under QUAL; its value is a copy of the node located by the routine PRESENT-ELEMENT- in the parse tree, which in this case is the LNR node shown in Fig. 18b. Register assignments are also made in this and the subsequent statement for use in the two procedures which conclude the transformation. The procedure $TFORM-LADJ-RADJ transforms the LN and RN nodes of the LNR structure (the RN prepositional phrase in *1* becomes a TIME entry), and $ERASE erases the LNR structure from the parse tree since all its parts have been transferred to the FORMAT tree. The final form of the FINDING subtree is shown in Fig. 18d, and
as it appears in the output (suppressing certain nonessential nodes) in Fig. 18e.

3.7 Format Normalization
The last step in preparing a structured data base from input texts is to complete each line of the format table (each FORMAT tree) with information which can be supplied by reference to other format lines of the data base. For example, in the data base of radiology reports referred to earlier, each format line had slots for the type of test, the location of the test, the date of the test, the medical finding, etc. In formatted sentences where only the finding slot was filled, but the preceding sentence of the report named both the test and a finding (e.g., Chest films 10-15 RLL infiltrate clear. LUL scarring still present.), it could be inferred that the format line for the second sentence should contain the same test information as its predecessor (e.g., in this case, chest films 10-15 to be associated with LUL scarring still present as well as RLL infiltrate clear). Also, in the radiology data base, missing test locations and test dates could sometimes be supplied from the context of formatted sentences. The procedure for filling out the format in this way was called format normalization and is described in application to the radiology data base in Hirschman and Grishman (1977). In the data base of hospital discharge summaries, the major format normalization problem is to assign to each format line a unique event time. This is important not only because the time of occurrence of an event is essential medical information but because in searching the data base we must be sure not to consider a reference to a previously noted event to be the record of a second event of the same kind. For example, if the report notes that a transfusion was given and later describes events that were “prior to the transfusion,” we must not mistakenly take the two mentions of transfusion to indicate that there were two transfusion events.
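The fill-forward step used for the radiology reports can be sketched as follows. This is a hypothetical simplification, not the procedure of Hirschman and Grishman (1977); the function and slot names are invented for illustration:

```python
# Illustrative sketch of radiology-style format normalization: a format
# line lacking test information inherits the TEST and DATE entries of
# its predecessor.  Slot names are invented stand-ins.
def normalize_lines(lines):
    filled = []
    prev = {}
    for line in lines:
        line = dict(line)                    # do not mutate the input
        for slot in ("TEST", "DATE"):
            if not line.get(slot) and prev.get(slot):
                line[slot] = prev[slot]      # inherit from preceding line
        filled.append(line)
        prev = line
    return filled

report = [
    {"TEST": "chest films", "DATE": "10-15", "FINDING": "RLL infiltrate clear"},
    {"FINDING": "LUL scarring still present"},
]
# normalize_lines(report)[1] now carries TEST "chest films" and DATE "10-15"
```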
In this example, the referential status of transfusion in prior to the transfusion is clearly indicated by the presence of the definite article (the = the previously mentioned), and the occurrence of the word in a time prepositional phrase tells us explicitly that it is being used as the time reference point for another event. In other cases, the analysis is more complicated.

Representation of Time in the FORMAT
Time expressions can be divided linguistically (with some overlap) into those that specify or refer to the location in time of an event (e.g., yesterday, 10-3-66, on admission) and those that describe the time aspect
of the event, i.e., its duration, frequency, way of beginning or ending, etc. Accordingly the TIME portion of the medical information format (bottom section, Fig. 13) consists of an EVENT-TIME node and an ASPectual node, with several sibling nodes that contribute to either or both types of information (V-TENSE, CHANGE, REPetition). TIME subtrees can be attached to any node which carries a verb (V-MD, V-PT, V-BE-SHOW, V-TEST, TR-ST-V, and CONN in certain cases) and independently to FINDING nodes (LAB-RES, S-S, DIAG). The TIME subtree is only created when a time expression is present. Within EVENT-TIME the ordered slots correspond to the word order in the most common form of event-time expression, the prepositional phrase, e.g., on admission, one day prior to admission, etc. As illustrated in Table IX, the noun under REF-PT provides the reference point; the immediately preceding preposition (TPREP2) tells whether the event time is before, at, or after the REF-PT time; the sequence preceding this preposition (if present) provides data for a quantitative adjustment of the REF-PT time in the direction given by TPREP2.

TABLE IX
STRUCTURE OF EVENT-TIME

TPREP1   NUM       TIME-UNIT   TPREP2     REF-PT
                               on         admission
         one       day         prior to   admission
                               since      10-24-72
                               at         present
                               before     visit (the visit to the Emergency Room)
                               at         age 2 years
                               until      age 5
for      several   days
for      a         month
                               during     admission (the first admission)

Time Normalization Problem
The time normalization problem is twofold. Given a complete statement of the event time in terms of a given nonreferential REF-PT, the event time which is expressed in words should be translated into an absolute time. This is not difficult when the REF-PT is the current admission, discharge, or hospitalization, since these dates are supplied in the document header. Nevertheless, the adjustment of the REF-PT in accord with an analysis of the prepositional phrase(s) in EVENT-TIME is not trivial.
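The decomposition of an event-time prepositional phrase into the ordered slots of Table IX can be sketched as a left-to-right scan. This Python fragment is illustrative only; the vocabulary lists are made up, not the LSP lexicon:

```python
# Sketch of EVENT-TIME slot assignment (Table IX); illustrative word
# lists only.  The two-word preposition "prior to" is treated as one token.
PREPS = {"on", "at", "before", "after", "prior_to", "since", "until", "during", "for"}
NUMS = {"one", "two", "three", "several", "a"}
UNITS = {"day", "days", "week", "weeks", "month", "months", "year", "years"}

def parse_event_time(phrase):
    words = phrase.lower().replace("prior to", "prior_to").split()
    slots = dict.fromkeys(("TPREP1", "NUM", "TIME-UNIT", "TPREP2", "REF-PT"))
    i = 0
    # a leading preposition counts as TPREP1 only when a quantity follows;
    # otherwise it immediately precedes the reference point (TPREP2)
    if len(words) > 1 and words[0] in PREPS and words[1] in NUMS:
        slots["TPREP1"] = words[0]
        i = 1
    if i < len(words) and words[i] in NUMS:
        slots["NUM"] = words[i]; i += 1
    if i < len(words) and words[i] in UNITS:
        slots["TIME-UNIT"] = words[i]; i += 1
    if i < len(words) and words[i] in PREPS:
        slots["TPREP2"] = words[i].replace("_", " "); i += 1
    if i < len(words):
        slots["REF-PT"] = " ".join(words[i:])
    return slots

# parse_event_time("one day prior to admission")
#   -> NUM "one", TIME-UNIT "day", TPREP2 "prior to", REF-PT "admission"
```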
By far the harder problem is to assure that every format line has a REF-PT and to find the antecedent REF-PT when the given REF-PT is referential. The algorithm which is being implemented19 has the following general features. It proceeds in a forward direction from format line to format line to determine the correct event-time for each format line. It first examines the EVENT-TIME entries and if sufficient information is present performs the necessary calculations. If no REF-PT in EVENT-TIME is given, the AGE column in PT-DATA is examined; e.g., in the occurrence a well developed, well nourished one-year old female, the time associated with the information well developed, well nourished is given indirectly by one year old. If no AGE entry is present, the algorithm looks at TENSE, since some tenses (e.g., perfect, future, progressive) provide time information of a general kind. Then PARAGR is examined, since some paragraph headings give time information (e.g., EXAMINATION ON ADMISSION, STATUS AT DISCHARGE).20 If the event-time cannot be established by the above steps, then preceding format lines must be consulted. If the REF-PT entry of a format line is empty then, again, the paragraph that the format line occurs in is important. In narrative paragraphs, such as HISTORY or COURSE IN HOSPITAL, it can be assumed that unless specific time information is given, the time of each successive statement is the same or later than that of its predecessor (ignoring intervening subordinated format lines). In the case of subordinated format lines, e.g., those which are conjoined to a previous format line by a relative clause connective, it may be assumed that the event time of the subordinated line is the same as that of its host line unless contrary information is given.
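The fallback order just described, from EVENT-TIME entries through AGE, TENSE, and the paragraph heading to preceding format lines, can be sketched as follows. The function and field names are invented for illustration, not those of the implementation:

```python
# Hypothetical sketch of the event-time fallback chain; the field names
# (EVENT-TIME, AGE, TENSE, PARAGR, time) are illustrative stand-ins for
# the format-line structure.
TIMED_HEADINGS = {"EXAMINATION ON ADMISSION": "admission",
                  "STATUS AT DISCHARGE": "discharge"}
INFORMATIVE_TENSES = {"perfect", "future", "progressive"}

def event_time(line, preceding_lines):
    if line.get("EVENT-TIME"):                  # explicit time expression
        return line["EVENT-TIME"]
    if line.get("AGE"):                         # e.g., "one-year old female"
        return "age " + line["AGE"]
    if line.get("TENSE") in INFORMATIVE_TENSES: # general time information only
        return "tense:" + line["TENSE"]
    if line.get("PARAGR") in TIMED_HEADINGS:    # heading carries a time
        return TIMED_HEADINGS[line["PARAGR"]]
    for prev in reversed(preceding_lines):      # narrative default: same or
        if prev.get("time"):                    # later than the predecessor
            return prev["time"]
    return None
```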
Where the time is given relative to a particular event (e.g., at admission, two weeks following the last transfusion) we try to establish a (more) exact time by locating in one of the preceding format lines the original mention of the event referred to. Since time normalization proceeds on the individual format lines in a forward direction, this original mention will have had its time normalized. There will therefore be a time associated with this event that can be used to establish the time of the subsequent format line in which the reference occurs.21
19 The design of the time normalization algorithm described here and its implementation are the work of Lynette Hirschman.
20 In EXAMINATION and LAB paragraphs, the tense employed does not give time information.
21 Although one would expect in normal discourse not to find a reference to an event without there having been a prior mention of the event in the discourse, this sometimes occurs in this material; e.g., posttransfusion hematocrit, with no earlier mention of transfusion. Fortunately, such occurrences are rare.
When the reference is to the current admission, hospitalization or discharge, there is no need to search for the original mention since the exact dates for these events are given in the header information. A search for an antecedent is required in the case where the REF-PT word is not one mentioned in the header information, or in the case where such a word is mentioned but the adjuncts of the word indicate a hospital stay other than the present one (e.g., the previous hospitalization, the admission for septicemia). The adjuncts of the event in the time expression determine the search strategy. There are four principal cases:
(1) nth: The adjunct or the associated REPT (repetition) column is an ordinal: first, second, 3rd, etc. The search for the nth event begins at the start of the subsection or paragraph where the reference occurs. It counts mentions of events of the appropriate type until it gets to n. It makes use of the adjuncts and the normalized time on each mention to distinguish mentions of the same event from all others, and to aid in counting.
(2) Same: The adjunct or associated TPREP2 of EVENT-TIME has a word indicating “sameness” of reference, e.g., this transfusion, the same visit. We search backward to the most recent mention of this event (where the time has already been normalized).
(3) Previous: TPREP2 of the associated EVENT-TIME has a word of type “previous” in it (previous admission, the earlier transfusion). We search backwards for the most recent mention of the appropriate event that is not a mention of the same event: “same” events will have the same normalized time, whereas distinct events will show a difference in time (at least relative to other events).
(4) Subsequent: TPREP2 of the associated EVENT-TIME has a word of type “subsequent” in it (the next admission, the repeat determination). The time information that these words provide is that the event occurred after a previous occurrence.
Therefore we search (as in (3)) for the previous event and note the time as subsequent to this event, unless we have a future tense or a section heading indicating a future event (PLAN ON DISCHARGE, RETURN APPOINTMENT, etc.), in which case we mark the time as post discharge.
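The choice among the four search strategies can be sketched as a dispatch on the adjunct word. The word lists here are illustrative, not the system's sublanguage classes:

```python
# Schematic dispatch over the four antecedent-search cases described
# above: nth, Same, Previous, Subsequent.  Word lists are illustrative.
ORDINALS = {"first": 1, "second": 2, "third": 3, "3rd": 3}
SAME = {"this", "same"}
PREVIOUS = {"previous", "earlier"}
SUBSEQUENT = {"next", "repeat", "subsequent"}

def search_type(adjunct):
    word = adjunct.lower()
    if word in ORDINALS:
        return ("nth", ORDINALS[word])   # count events from section start
    if word in SAME:
        return ("same", None)            # most recent mention of this event
    if word in PREVIOUS:
        return ("previous", None)        # most recent mention of a distinct event
    if word in SUBSEQUENT:
        return ("subsequent", None)      # time marked after the previous event
    return ("unknown", None)
```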
Once the type of search is determined (nth, Same, Previous, Subsequent) and the word to be searched for is identified,22 the search for an antecedent begins. When a candidate word is found it must be established
22 Details of the identification procedure are omitted here.
that the word is not negated or otherwise modified in a way that disqualifies it as an antecedent. Therefore each time the procedure finds a format node with the word (or synonym of the word) that it is searching for, it checks that the word is not in the scope of negation (NEG in MODS on the noun or on an associated verb) and that it is not in the scope of a “nonreal” marker (e.g., future, possible in INDEF in MODS). Finally the procedure checks that the time of the proposed antecedent is consistent with the time information established so far: e.g., format Fn has time tn associated with it; format Fn+1 has a REF-PT (e.g., previous transfusion) in its TIME slot; the antecedent is in Fn-1, with time t0. To be consistent with the information so far, t0 ≤ tn (otherwise Fn-1 has a time later than Fn, a very unlikely occurrence in a linearly progressing narrative). These checks ensure that the event chosen as an antecedent will have the appropriate properties: it refers to the desired type of event; it will be unnegated, real, and have a time associated with it that is consistent with the accumulated time information so far.
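The three checks on a candidate antecedent can be sketched as follows; the field names are illustrative stand-ins for the format-node structure:

```python
# Sketch of the antecedent checks described above: a candidate must be
# unnegated, real, and no later in time than the referring line
# (t0 <= tn).  Field names are invented for illustration.
NONREAL = {"future", "possible"}

def acceptable_antecedent(candidate, current_time):
    if candidate.get("NEG"):                        # in the scope of negation
        return False
    if candidate.get("INDEF") in NONREAL:           # "nonreal" marker
        return False
    t0 = candidate.get("time")
    if t0 is not None and current_time is not None and t0 > current_time:
        return False                                # would violate t0 <= tn
    return True
```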
3.8 Performance

In this section some preliminary results of the automatic formatting of discharge summaries will be presented. Table X summarizes the performance of the parsing and information-formatting components of the program on the first small set of documents received from Bellevue Hospital.23

TABLE X
SUMMARY OF PRELIMINARY RESULTS

                               1st 5 documents    2nd 4 documents
Parsing results
  Number of sentences          261                198
  OK for formatting            88%                90%
  Wrong parse                  4%                 5%
  No parse                     8%                 5%
  Average parsing time         3.5 sec            less than 1.5 sec
Formatting results (1st 5 documents)
  Correct format               85%
  Wrong format                 0%
  No format                    15%

23 The parsing results in Table X are from Insolio and Sager (1977).

The nine documents were divided into two sets. The first five were
used mainly as a source of sentences to test and debug the additions to the system for medical records. The second set was run in a more routine manner, in batches, as a test of the system, though still on a very preliminary basis. The top half of Table X gives parsing results. The category “OK for formatting” arises because the test of success in parsing in this application is precisely that the parsing output should be such that when the English and formatting transformations operate on it they produce correctly formatted sentences. Thus, some parses which are syntactically correct but do not display the intended reading of the sentence (e.g., a prepositional adjunct is shown as modifier of a near noun when in the intended reading it should be a modifier of a more distant noun or of the verb or of the whole sentence) are nevertheless adequate input to the transformations, since the transformations compensate for a certain number of such syntactic variations. The denominalization transformation, for example, hunts for its arguments in all possible neighboring adjunct slots of the nominalized verb and thus compensates for the ambiguity of adjunct PN’s. Also, the formatting transformations require that words which are to be moved from the parse tree to particular format slots have particular sublanguage attributes, thereby supplying in effect a selectional (i.e., semantic) constraint on the parse tree output. As a result of this interdependence of components, demands on the parser are to some extent lessened. However, because the parsing is only part of a larger process, there is a contrary pressure to produce a usable output as a first parse. (The figures in Table X are for the first parse only.) Formerly we were satisfied if the intended analysis was among the first three parses obtained. But in information-formatting, we want to avoid having to return to the parsing stage to obtain a second or third parse once the formatting has begun.
We therefore attempt to order the options of the grammar and add selectional constraints so that a usable analysis is obtained as the first parse. In unedited note-based material there are further difficulties in obtaining a correct parse due to the variety of incomplete sentence constructions and the many functions of the comma: punctuation, conjunction, end-of-sentence marker, deletion-marker, and sometimes no function at all, leading to misinterpretation. Special constraints of various kinds are needed, and a certain failure rate is to be expected and should be used to signal the need for either editing or manual analysis. In a routine application of the program to documents, one would expect to coordinate document analysis with data capture so as to avoid spending program time on problems that are not germane to the narrative analysis. In
medical records this means mainly establishing conventions for punctuation and for reporting laboratory data and medications. With regard to the formatting results, a relevant question is whether failure signifies inability to format a sentence or the obtaining of a wrong result. At a later time a reliability index, such as the ratio of incorrect to correct analyses, can be measured. At this stage, since we are aiming at a zero or near-zero value of such an index, every wrong result is used as a signal to tighten the formatting constraints. In fact, there have been few such cases, partly because the parser screens out certain kinds of deviant sentences, and partly because the formatting transformations have been purposely written with tight constraints from the start. With regard to processing times, figures are presented in Table X as a part of the report on processing results, but it should be borne in mind that the programs exist in a research environment and have not been designed for efficient routine operation. The figures on processing times thus give little indication as to what could be achieved with an optimized, application-tailored system, and, like the other numerical data in Table X, are offered in the spirit of informing the reader of the current state of an experiment rather than as a measure of performance of a finished system, it being too early in the development of the system to make such a test. With all the limitations on the data of Table X noted above, one conclusion stands out: Automatic information-formatting can be done. A crucial fact is that the program and grammar were not written for these particular documents, nor even, except for a relatively small addition to the system, for this subject matter or type of document.
The English parsing grammar, for example, which is the crucial element in the successful processing, was implemented (in three successive versions) and used to parse texts well before its application in information-formatting was envisioned. Of course, many problems remain to be solved, and the application to a new subject area would require a considerable tooling-up effort. Nevertheless, the novel notion that freely written text in delineated subject areas can be automatically converted into a structured data base is being demonstrated by this experiment.
4. Applications
If indeed it is possible to obtain structured data bases from textual material, then the potential for new computer applications is large. In this section a few applications are sketched with reference to how they would
be realized on a data base of computer-formatted medical records. Analogous functions for other types of documents can be inferred. Thus, quality assessment by computer is illustrated below by the application of criteria for the evaluation of medical care to information-formatted hospital documents, but it is equally possible to imagine quality assessment being carried out on formatted documents in a different subject area by the application of criteria that are appropriate to that data base. In all the applications, the essential point is that the information in sentences of the documents is arranged in labelled columns of a table in such a way that assertions of a factual kind can be recognized (rather than the occurrence of a term apart from its context) and that the co-occurrence of particular features can also be tested for, permitting complex informational queries to be answered. A retrieval program operating on formatted discharge summaries, for example, can establish not only that a given diagnosis word was mentioned, but that a diagnosis was asserted. And it can, if requested, also supply the age at which the diagnosis was made, the method by which the diagnosis was confirmed, the symptoms that were manifested, etc., all by reference to the entries in the relevant columns of the format.
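A retrieval of this kind can be sketched over format lines represented as rows. The column names (DIAG, AGE) follow the text, but the function and the sample data are invented for illustration:

```python
# Illustrative retrieval over formatted lines: report lines in which a
# diagnosis is actually asserted (DIAG column filled, not negated),
# rather than merely matching the term anywhere in the document.
def assertions_of(diagnosis, format_lines):
    hits = []
    for line in format_lines:
        if line.get("DIAG") == diagnosis and not line.get("NEG"):
            hits.append({"age": line.get("AGE"), "time": line.get("time")})
    return hits

lines = [
    {"DIAG": "meningitis", "AGE": "1 year", "time": "10-24-72"},
    {"DIAG": "meningitis", "NEG": True},   # mentioned but negated
    {"S-S": "fever"},                      # no diagnosis asserted
]
# assertions_of("meningitis", lines) -> [{"age": "1 year", "time": "10-24-72"}]
```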
4.1 Fact Retrieval

In the information field, the ability of a system to supply a user with specific information in response to an information request, as opposed to a complete document or citation that may be relevant to the request, is sometimes referred to as "fact retrieval," in contradistinction to "document retrieval" or "bibliographic retrieval." By linking the natural language information-formatting system described above to a retrieval program that can search the format columns of the data base for particular combinations of entries, we obtain a fact retrieval program that operates on natural language information. Such a program might be used interactively to obtain quickly some particular item of information from a document somewhere in the data base. Or it might be used retrospectively, along with a counting and summarization procedure, to obtain a profile of the contents of the documents in the data base with respect to particular variables. Examples of applications of this type have been demonstrated using the data base of automatically formatted radiology reports referred to earlier (Hirschman and Grishman, 1977; Sager et al., 1977). A natural language question-answering program that operates on this type of data base is also being developed (Grishman and Hirschman, 1978).
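To make the idea concrete, the following Python sketch shows how a search over format columns can distinguish an asserted diagnosis from a mere mention and return the associated entries. It is an illustration only: the column labels (DIAGNOSIS, NEG, AGE, CONFIRMED-BY) and the flat dictionary representation are invented for the example; the LSP's actual formats are tree-structured and use different labels.

```python
# Hypothetical illustration: column names and records are invented,
# not the LSP's actual format labels.

def asserted_diagnoses(format_lines, term):
    """Select format lines that assert (not merely mention) a diagnosis."""
    return [line for line in format_lines
            if line.get("DIAGNOSIS") == term and not line.get("NEG")]

records = [
    {"DIAGNOSIS": "meningitis", "NEG": "",   "AGE": 4, "CONFIRMED-BY": "CSF culture"},
    {"DIAGNOSIS": "meningitis", "NEG": "no", "AGE": 7, "CONFIRMED-BY": ""},
]

# Only the first record asserts the diagnosis; the second negates it.
for line in asserted_diagnoses(records, "meningitis"):
    print(line["AGE"], line["CONFIRMED-BY"])   # prints: 4 CSF culture
```

Once a line passes the assertion test, the other columns of the same format line (here AGE and CONFIRMED-BY) supply the related facts, just as the QUAL and TIME columns do in the formats described above.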
NATURAL LANGUAGE INFORMATION FORMATTING
Reuse of Information
A hypothetical situation illustrates an important feature of the information-formatted data base. Suppose, as has happened a number of times in recent medical history, some element in the past environment, treatment, or habits of patients later becomes suspect. It could be desirable to search retrospectively in a given population for those whose medical history might have mentioned such facts, even though the facts may not have been considered particularly important at the time they were recorded. Such items might be the taking of a particular drug (e.g., a fertility drug considered harmless at the time), employment in a particular industry, amount of smoking, or the like. The purpose of the search might be to identify the high risk population so that an attempt could be made to contact the individuals, or it might be to establish a correlation of the implicated factor with later symptoms, or to provide background for a policy decision such as whether a mass screening of a segment of the population would be worthwhile.

What is significant here is that the perspective with which the recorded facts are viewed has changed. Most systems for storing textual information are based on a selection of content items which serve to identify the source documents and provide the points of access to the documents in the future. In a data base built up for research purposes, these selected content items may actually constitute the coded data base, i.e., they may be all that was asked for on a check list, or all that was kept when information was extracted from a set of source documents. In either case, there is very little room for reinterpreting the data, for accessing it from a different point of view, or, in economic terms, for amortizing the data base by making the information available for reuse in different contexts.
In contrast, the data base built up by information-formatting has the feature that it is neutral with regard to the purpose for which the information is to be used, and it is relatively complete with regard to the information originally contained in the source documents. Though it would undoubtedly be expensive to create such a data base on a large scale, that expense would have to be weighed in any real situation against the expense of acting without the information that such a data base could supply.

Case Matching
Given a large enough corpus of information-formatted medical documents, another application which would make use of the refinement in
search criteria that is possible with such a data base is to locate cases that can serve as a control population in clinical studies where ethical considerations preclude setting up a controlled experiment. It might be possible to match cases in the data base on a sufficiently refined set of features so that for each patient who is being treated in a given manner in the existing study, a "similar" patient who was not so treated can be located to serve as a control. Here again, as in the previously sketched application, a large data base is assumed. While this may be far in the future, the example illustrates how a data base of information-formatted documents can be used retrospectively in different ways for different purposes.

4.2 Quality Assessment
As an example of fact retrieval from formatted discharge summaries, the LSP is engaged in showing that some types of questions that arise in assessing whether proper medical care was administered in a hospital can be answered by computer programs operating on information-formatted discharge summaries (Sager and Lyman, 1978). While there is discussion in the medical community as to what criteria should be used to evaluate care (see, e.g., McDermott, 1975; Brook, 1977), certain elementary questions regarding the procedures that were carried out and the reasons for doing them are likely to be asked in many evaluation frameworks. To the extent that these questions could be answered automatically by reference to the formatted documents, the human investigator would be freed from routine screening of documents to turn attention to cases that are not of a routine nature. The automation of the steps in a health care audit of hospital documents could also contribute to the study of different sets of criteria for evaluating the process of health care in relation to the outcome of the process in terms of changes in the patients' health. The number of cases that could be examined would be increased, and once the data base was established, the different sets of criteria could be applied, at least in part, simply by changing the retrieval queries applied to the data base.

Applying Audit Criteria to the Data Base
Table XI shows the first 3 of 14 audit criteria for meningitis and/or septicemia in sickle cell disease, prepared for use in the Performance Evaluation Procedure (PEP) forms of the Joint Commission on Accreditation of Hospitals ("PEP Primer," 1974). A LISP program is being written by the LSP to apply the full set of criteria to information-formatted discharge summaries. It is not difficult to see how the retrieval
TABLE XI
EXCERPTED AUDIT CRITERIA

Diagnosis of meningitis

1. Positive CSF culture (or A + B)
   A. Admission history contains all of the following:
      (1) Fever
      (2) Stiff neck
      (3) Vomiting or headache
   B. First CSF shows 2 of the following:
      (1) Positive smear
      (2) WBC greater than 10/cmm
      (3) Glucose less than 30 mg%
      (4) Protein greater than 40 mg%

Diagnosis of septicemia

2. Positive blood culture

Diagnosis of sickle cell disease

3. One of the following:
   (1) Positive sickle cell preparation
   (2) Hemoglobin electrophoresis = HgS
   (3) Statement in history "known sickler" or equivalent
procedure works if one keeps in mind the format structure for discharge summaries, described in Section 3.6 above. To illustrate, the logic for applying the criteria for a diagnosis of meningitis (item 1 of Table XI) will be described. According to the criteria in Table XI, a diagnosis of meningitis is considered to be correctly established if there was a positive CSF culture or if the criteria stated in A and B under 1 are met as indicated. To establish that a positive CSF culture was reported, the retrieval procedure searches the LAB column (under FINDING under PATIENT STATUS) for CSF or cerebrospinal fluid, and if it finds such an entry, looks under the QUAL node of the same FINDING for LAB-RES, which should contain positive or a synonym of positive in this context. (The synonyms to look for are supplied by medical personnel in the form of directives to the clerks who presently carry out the manual procedure.) The list of synonyms of positive in this case includes the word cloudy. Thus, for example, the formatted sentence in Fig. 16, which shows LAB = cerebrospinal fluid and LAB-RES = cloudy, is an example of a formatted sentence that meets the test for a diagnosis of meningitis, up to this point.

If the test of the LAB and LAB-RES columns in FINDING is successful, two more tests must be made. It must be established that the finding is not negated, and that the time of the finding is within the present hospitalization. The information as to whether or not the finding is negated is found in the columns NEG and MODAL under MODS. The entries in NEG indicate negation (no, not, etc.) and those in MODAL some indefiniteness (e.g., impression). Hence, to establish a definite finding, both these nodes should be empty. Syntactically, negation and modal information may attach to the finding itself or to a verb governing the finding (e.g., It was not established that . . . , The culture did not grow out anything). Therefore the MODS subtrees in PATIENT-STATUS or associated with V-MD are all examined for the presence of NEG or MODAL entries. To establish that the time of the finding is within the hospitalization covered by the discharge summary, the EVENT-TIME entry in the TIME subtree is examined. As in the case of the MODS, above, the TIME subtree may be associated directly with the finding (in PATIENT-STATUS) or may be associated with a verb (V-MD). The time normalization procedure will have supplied an EVENT-TIME entry to every format line during the last step in preparation of the data base (Section 3.7). For example, in the formatted sentence in Fig. 16, where no event-time is stated, the fact that the sentence occurs in the paragraph COURSE IN HOSPITAL (indicated by the first two letters CO of the serialization) provides the normalization procedure with a default event-time entry "during this hospitalization." Thus, in regard to criterion 1 of Table XI for the diagnosis of meningitis, the retrieval procedure finds that the sentence whose formatting output is shown in Fig. 16 gives a positive answer: a positive CSF culture was reported, with no negation or doubt expressed, having been obtained at a time within the hospitalization described in the document.
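The sequence of column tests just described can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the format line is flattened to a dictionary, whereas the actual formats are trees with MODS and TIME subtrees, and the synonym set is only an example of what medical personnel would supply.

```python
# Simplified sketch of the criterion-1 CSF-culture test described above.
# Flat keys stand in for the tree-structured format; synonym list is illustrative.

POSITIVE_SYNONYMS = {"positive", "cloudy"}  # supplied by medical personnel

def csf_culture_positive(line):
    """Apply the CSF-culture test of criterion 1, Table XI, to one format line."""
    # Test 1: the LAB column must name the CSF finding.
    if line.get("LAB") not in ("CSF", "cerebrospinal fluid"):
        return False
    # Test 2: LAB-RES (under QUAL) must contain positive or a synonym.
    if line.get("LAB-RES") not in POSITIVE_SYNONYMS:
        return False
    # Test 3: a definite finding requires empty NEG and MODAL columns.
    if line.get("NEG") or line.get("MODAL"):
        return False
    # Test 4: the finding must fall within the present hospitalization.
    return line.get("EVENT-TIME") == "during this hospitalization"

# The Fig. 16 example: cloudy cerebrospinal fluid, no negation or doubt,
# default event-time supplied by the normalization procedure.
line = {"LAB": "cerebrospinal fluid", "LAB-RES": "cloudy",
        "NEG": "", "MODAL": "", "EVENT-TIME": "during this hospitalization"}
print(csf_culture_positive(line))  # prints: True
```

A negated finding (a nonempty NEG entry such as "no") or one hedged by a MODAL entry fails test 3, which is how the program distinguishes an asserted finding from a mentioned one.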
An alternative basis for the diagnosis of meningitis is given in 1A and 1B of Table XI, since the lack of a positive CSF culture does not rule out the possibility of meningitis. Criterion 1A states characteristic symptoms and 1B states significant laboratory findings. These are treated by the retrieval program in a manner similar to the CSF-culture criterion, above. In the case of 1A, the S-S (Sign-Symptom) and BODY-PART columns are examined for the appropriate entries; then MODS and TIME are examined, the MODS to check that there is no negation or doubt (as above) and the TIME, in this case, to test for a time "at admission" (or within 48 hours of admission). The criteria in 1B are applied by first testing the columns LAB and QUAL: LAB-RES (in a manner similar to the case above) for the qualitative finding 1B(1) "CSF shows positive smear," or LAB and QUANT
for the quantitative findings 1B(2)-(4) regarding the white blood cell, glucose, and protein determinations. Note that the criterion specifies "first CSF." To establish that a given CSF finding is "first," the search strategy for the nth event, described in Section 3.7 above, is applied to the document.

4.3 Data Summarization
Since, after information-formatting, the information in a document collection is arranged in columns that are labelled as to the type of information they contain, it is possible to count the number of occurrences of the different types of information and generate statistical summaries covering the contents of the data base. In situations where each document corresponds to a case treated by the institution (as in medical records), such summaries could provide those who are responsible for policy decisions with a current and cumulative profile of the types of cases that are entering the institution and the outcome of actions taken in each type of case. In the case of a hospital record system, such data might help to identify patient management problems and could be used to provide a summary of the hospital's experience as background for staff conferences and other activities in the area of continuing medical education. Continuing medical education, i.e., the formal, systematic, and ongoing review of diagnosis, treatment, and preventive health care, has become an important element in all efforts to improve the quality of medical care. Government legislation and regulations require that hospitals receiving Federal reimbursement for patient care monitor the quality of care rendered and take appropriate action when deficits are identified. One such action is to hold a teaching session or Hospital Staff Conference. Teaching conferences commonly have a standard format: one or more case presentations, a review of the hospital's own experience in the subject of the conference, and a discussion which draws heavily on the experience of others as reflected in the medical literature. The use of a case presentation as a focal point for review is well established in medical training programs.
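The counting over labelled columns described above can be sketched as follows. This is an illustration only: the mini data base of cases and its field names are invented for the example, and a real summary would be derived from the format columns themselves.

```python
# Invented mini data base of meningitis cases; field names are hypothetical.
from collections import Counter

cases = [
    {"stay_days": 18, "csf_growth": False, "route": "intravenous"},
    {"stay_days": 9,  "csf_growth": True,  "route": "oral"},
    {"stay_days": 21, "csf_growth": False, "route": "intravenous"},
]

# Tabulate answers to the kinds of questions a conference organizer might pose.
summary = {
    "stayed > 14 days": sum(1 for c in cases if c["stay_days"] > 14),
    "no CSF growth":    sum(1 for c in cases if not c["csf_growth"]),
    "by route":         Counter(c["route"] for c in cases),
}
print(summary["stayed > 14 days"])  # prints: 2
```

Because the counting is driven by column labels rather than by a fixed check list, a new question can be answered by writing a new query over the same formatted data, without reprocessing the source documents.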
There is a heavy burden of work on the persons conducting the conference, even for the specialist who is concentrating his reading in relatively small areas and keeping records of cases which have come to his attention. Most often it is a brief, intensive effort and rarely includes a complete review of the hospital’s own cases. The number of such formal conferences that can be given appears to be as much limited by the work involved as by the time available to the staff for attending conferences.
In this context, the availability of an automated method for analyzing patient case reports could make a significant difference both in the number of staff conferences that would be possible and in the relevance of the contents of the conference to ongoing patient care in the hospital. Suppose, for example, that a conference is to be devoted to the treatment of bacterial meningitis. Discharge summaries for cases of bacterial meningitis treated in the hospital over a certain period could be information-formatted, and a summary of features of these cases could be generated in response to questions posed by the person preparing the conference. Such questions might include: How many patients with this diagnosis stayed longer than two weeks? In this set, what were the signs or symptoms or additional diagnoses reported? In how many cases did the CSF culture show no growth? Of those patients who did not have positive CSF cultures, what were the symptoms on admission? How many patients were on antibiotics before diagnosis was established? For how many days? How many patients received antibiotics intravenously (for how many days?) and how many orally? How many patients were released with no residual sequelae? These questions, and others, can be answered from the formatted discharge summaries by a retrieval program similar to the one developed for quality assessment (Section 4.2). The answers to the questions can then be tabulated to provide a summary of the hospital's own experience in the form of a table as a basis for discussion at a staff conference or other forum. The hospital's performance can then be compared with that of other institutions as reported in such sources as "Medical Clinics of North America," and with up-to-date clinical review articles in the area. It is even conceivable that questions arising during such a conference could be answered on the spot if an on-line query capability were added to the fact retrieval and data summarization system.

4.4 Data Alert
If the information-formatting process is further developed so that it can be performed routinely within 48 hours of the receipt of the source document, a valuable service would be to alert the data collector to deficiencies in the source document. In many research situations much good data must be discounted at the end of the study because certain items are found to be missing which invalidate the inclusion of the case in the final summary. For example, in a long-term drug study, if the patient was to have responded at periodic intervals to a particular question, but at various times in the study no response to the question was recorded, this may invalidate the patient's inclusion in the final tally of
responses. In a large, expensive, long-term study, it might be worthwhile to monitor the incoming data soon after it is recorded so that there is still time to contact the subject (or, in cases not involving persons, to perform the test or whatever was omitted) and thus remedy the deficiency. The economic saving in research time and effort that would otherwise be wasted might be far greater than the cost of automatically converting the documents into a structured data base by the formatting program. Once the information is formatted, it would be straightforward to detect missing format entries and issue an alert. It has also been pointed out that a periodic summary of results to date over the whole study population, which could also be generated from the formatted documents, might indicate desirable changes in research strategy that otherwise might not emerge until the end of the study.

ACKNOWLEDGMENTS
I wish to acknowledge the contribution of Dr. Margaret Lyman to the research on processing hospital records. In particular, the medical applications sketched in Section 4 are based on suggestions of Dr. Lyman, who also answered many questions throughout the study. This research was supported in part by NIH Grant LM02616 from the National Library of Medicine, and in part by Research Grant SIS-75-22945 from the National Science Foundation, Division of Science Information. I am grateful to Edmond Schonberg for helpful comments on the manuscript.

REFERENCES

Anderson, B., Bross, I. D. J., and Sager, N. (1975). Grammatical compression in notes and records: Analysis and computation. Am. J. Comput. Linguist. 2, No. 4.
Bloomfield, L. (1926). A set of postulates for the science of language. Language 2, 153-164.
Bloomfield, L. (1933). "Language." Holt, New York.
Brook, R. H. (1977). Quality-Can we measure it? N. Engl. J. Med. 296, 170-172.
Bross, I. D. J., Shapiro, A., and Anderson, B. (1972). How information is carried in scientific sublanguages. Science 176, 1303-1307.
Charniak, E., and Wilks, Y. (1976). "Computational Semantics." North-Holland Publ., Amsterdam.
Chomsky, N. (1957). "Syntactic Structures." Mouton, The Hague.
Chomsky, N. (1964). "Aspects of the Theory of Syntax." MIT Press, Cambridge, Massachusetts.
Chomsky, N. (1972). "Studies on Semantics in Generative Grammar." Mouton, The Hague.
Chomsky, N. (1975). "Reflections on Language." Pantheon Books, New York.
Damerau, F. J. (1976). Automated language processing. Annu. Rev. Inf. Sci. Technol. 11, 107-161.
Fitzpatrick, E., and Sager, N. (1974). The lexical subclasses of the linguistic string parser. Am. J. Comput. Linguist. 2 (microfiche).
Grishman, R. (1973). The implementation of the string parser of English. In "Natural Language Processing" (R. Rustin, ed.), pp. 90-109. Algorithmics Press, New York.
Grishman, R. (1975). "A Survey of Syntactic Analysis Procedures," Courant Computer Science Rep. No. 8. New York University, New York.
Grishman, R., and Hirschman, L. (1978). Question answering from natural language medical data bases. Artif. Intell. (in press).
Grishman, R., Sager, N., Raze, C., and Bookchin, B. (1973). The linguistic string parser. Proc. 1973 Natl. Comput. Conf., pp. 427-434.
Harris, Z. S. (1951). "Methods in Structural Linguistics." Univ. of Chicago Press, Chicago, Illinois.
Harris, Z. S. (1957). Cooccurrence and transformation in linguistic structure. Language 33, 283-340.
Harris, Z. S. (1959). "The 1959 Computer Sentence Analyzer," Transf. Discourse Anal. Pap. 15-19. Department of Linguistics, University of Pennsylvania, Philadelphia. (Reprinted in part in Harris, 1970.)
Harris, Z. S. (1968). "Mathematical Structures in Language." Wiley (Interscience), New York.
Harris, Z. S. (1970). "Papers in Structural and Transformational Linguistics." Reidel Publ., Dordrecht, Netherlands.
Harris, Z. S. (1976). A theory of language structure. Am. Philos. Q. 13, 237-255.
Hill, D. R. (1971). Man-machine interaction using speech. Adv. Comput. 11, 166-230.
Hirschman, L., and Grishman, R. (1977). Fact retrieval from natural language medical records. In "Medinfo 77" (D. B. Shires and H. Wolf, eds.), pp. 247-251. North-Holland Publ., Amsterdam.
Hirschman, L., Grishman, R., and Sager, N. (1975). Grammatically-based automatic word class formation. Inf. Process. Manage. 11, 39-57.
Hirschman, L., Grishman, R., and Sager, N. (1976). From text to structured information: Automatic processing of medical reports. AFIPS Natl. Comput. Conf. Proc. 45, 267-275.
Hobbs, J., and Grishman, R. (1976). The automatic transformational analysis of English sentences: An implementation. Int. J. Comput. Math. 5, 267-283.
Insolio, C., and Sager, N. (1977). Parsing free narrative. Annu. Meet. Assoc. Comput. Linguist., Georgetown Univ., Washington, D.C.
Jespersen, O. (1914-1929). "A Modern English Grammar on Historical Principles." (Reprinted, Allen & Unwin, London, 1961.)
Joos, M., ed. (1957). "Readings in Linguistics." American Council of Learned Societies, New York.
Josselson, H. H. (1971). Automatic translation of language since 1960: A linguist's view. Adv. Comput. 11, 2-58.
Keyser, S. J., and Petrick, S. R. (1967). "Syntactic Analysis," Rep. AFCRL-67-0305. Air Force Cambridge Research Laboratory, Bedford, Massachusetts.
Korein, J. (1970). The computerized medical record: The variable field length format system and its applications. In "Information Processing of Medical Records" (J. Anderson and J. M. Forsythe, eds.), pp. 259-291. North-Holland Publ., Amsterdam.
Kuno, S., and Oettinger, A. G. (1963). Multiple-path syntactic analyzer. In "Information Processing 1962" (C. M. Popplewell, ed.), pp. 306-312. North-Holland Publ., Amsterdam.
Lancaster, F. W. (1977). The relevance of linguistics to information science. Proc. 1976 FID/LD Workshop Linguist. Inf. Sci., Int. Fed. Doc. 51, 19-43.
Lehmann, W. (1978). Machine translation. In "Encyclopedia of Computer Science and Technology" (J. Belzer, A. G. Holzman, and A. Kent, eds.), Vol. X, pp. 151-164. Dekker, New York.
Lindsay, P. H., and Norman, D. A. (1977). "Human Information Processing." Academic Press, New York.
McDermott, W. (1975). Evaluating the physician and his technology. Daedalus, Winter, pp. 135-157.
Otten, K. W. (1971). Approaches to the machine recognition of conversational speech. Adv. Comput. 11, 127-163.
"PEP Primer" (1974). 4th Ed. Joint Commission on Accreditation of Hospitals, Chicago, Illinois.
Petrick, S. R. (1975). On natural language based computer systems. IBM J. Res. Dev. 20, 314-325.
Plath, W. J. (1975). "The REQUEST System," IBM RC 5604. IBM T. J. Watson Research Center, Yorktown Heights, New York.
Raze, C. (1967). "The FAP Program for String Decomposition of Scientific Texts," String Program Rep., S.P.R. No. 2. Linguistic String Project, New York University, New York.
Raze, C. (1976a). A computational treatment of coordinate conjunctions. Am. J. Comput. Linguist. 52 (microfiche).
Raze, C. (1976b). "The Parsing and Transformational Expansion of Coordinate Conjunction Strings," String Program Rep., S.P.R. No. 11. Linguistic String Project, New York University, New York.
Reeker, L. H. (1976). The computational study of language acquisition. Adv. Comput. 15, 181-239.
Sager, N. (1967). Syntactic analysis of natural language. Adv. Comput. 8, 153-188.
Sager, N. (1968). "A Computer String Grammar of English," String Program Rep., S.P.R. No. 4. Linguistic String Project, New York University, New York.
Sager, N. (1972a). Syntactic formatting of scientific information. Proc. Fall Jt. Comput. Conf., 1972, AFIPS Conf. Proc. 41, 791-800.
Sager, N. (1972b). A two-stage BNF specification of natural language. J. Cybern. 2, 39-50.
Sager, N. (1973). The string parser for scientific literature. In "Natural Language Processing" (R. Rustin, ed.), pp. 61-85. Algorithmics Press, New York.
Sager, N. (1975). Sublanguage grammars in science information processing. J. Am. Soc. Inf. Sci. 26, 10-16.
Sager, N. (1977). Information structures in the language of science. In "The Many Faces of Information Science," AAAS Selected Symposium 3 (E. C. Weiss, ed.), pp. 53-73. Westview Press, Boulder.
Sager, N., and Grishman, R. (1975). The restriction language for computer grammars of natural language. Commun. ACM 18, 390-400.
Sager, N., and Lyman, M. (1978). Computerized language processing: Implications for health care evaluation. Med. Rec. News (JAMRA) 49, No. 3, 20-30.
Sager, N., Hirschman, L., Grishman, R., and Insolio, C. (1977). Computer programs for natural language files. Proc. 40th Annu. Meet. Am. Soc. Inf. Sci., ASIS, Washington, D.C. (in press).
Salkoff, M., and Sager, N. (1969). "Grammatical Restrictions on the IPLV and FAP String Programs," String Program Rep., S.P.R. No. 5. Linguistic String Project, New York University, New York.
Sapir, E. (1925). Sound patterns in language. Language 1, 37-51.
Schank, R. (1975). "Conceptual Information Processing." North-Holland Publ., Amsterdam, and Elsevier, New York.
Simmons, R. F. (1970). Natural language question answering systems: 1969. Commun. ACM 13, 15-30.
Thompson, F. B., and Thompson, B. H. (1975). Practical natural language processing: The REL system as prototype. Adv. Comput. 13, 110-170.
Walker, D. E. (1973). Automated language processing. Annu. Rev. Inf. Sci. Technol. 8, 69-119.
Walker, D. E. (1975). "Speech Understanding Research," SRI Annu. Tech. Rep.
Waltz, D. L. (1977). Natural language interfaces. ACM SIGART Newsl. 61, 16-65.
Winograd, T. (1972). "Understanding Natural Language." Academic Press, New York.
Woods, W. A., Kaplan, R. M., and Nash-Webber, B. (1972). "The Lunar Sciences Natural Language Information System: Final Report," BBN Rep. 2378. Bolt Beranek and Newman, Cambridge, Massachusetts.
Zwicky, A., Friedman, J., Hall, B., and Walker, D. (1965). The MITRE syntactic analysis procedure for transformational grammars. Proc. AFIPS Fall Jt. Comput. Conf. 1, 317.
ADVANCES IN COMPUTERS, VOL. 17
Distributed Loop Computer Networks*

MING T. LIU
Department of Computer and Information Science
The Ohio State University
Columbus, Ohio
1. Introduction
   1.1 Distributed Processing Systems
   1.2 Local Computer Networks
   1.3 Loop Computer Networks
2. Message Transmission Protocols and Formats
   2.1 Centralized-Control Loop Networks
   2.2 Newhall-Type Loop Networks
   2.3 Pierce-Type Loop Networks
   2.4 The Distributed Loop Computer Network (DLCN)
3. Loop Interface Design
   3.1 Specifications of the Loop Interface
   3.2 Architecture of the Loop Interface
   3.3 Extensions to the Loop Interface
4. Network Operating System Design
   4.1 Requirements of the Network Operating System
   4.2 Interprocess Communication
   4.3 Remote Program Calling
   4.4 Generalized Data Transfer
   4.5 Distributed Resource Management
5. User Access and Network Services
   5.1 A Network Command Language
   5.2 A Distributed Programming System
   5.3 A Distributed Data Base System
6. Performance Studies
   6.1 Analytical Comparison of Three Loop Subnetworks
   6.2 Simulation Results of Three Loop Subnetworks
   6.3 Performance Study of the Whole DLCN System
7. Conclusion
References
* This work was supported in part by the National Science Foundation under Grant No. US NSF MCS-7723496.
1. Introduction
The field of distributed processing and computer networking is growing at a very rapid rate within the industry, government, and university communities. While distributed processing systems may differ from computer networks both in perspective and in environment, they do have some common characteristics. There are several motivations for the development of distributed processing systems and computer networks; notable among them are the following:

(1) modularity and structural implementation,
(2) high availability and greater reliability,
(3) improved throughput and response time,
(4) load leveling and resource sharing, and
(5) distributed data processing and computing.
This article presents both an expository survey and research results on local computer networks in general and loop computer networks in particular. Different types of local loop computer networks are surveyed, along with typical design problem areas such as loop interface design, message transmission protocols, the network operating system, the network command language, the distributed programming system, the distributed data base system, and performance studies. Though attention is focused mainly on the design and operation of the network itself, the formation of a unified distributed processing system from a network of separate computer systems always remains the ultimate objective. Since the loop topology is adopted for the communication subnetwork and control is fully distributed among the nodes of the network, the resulting system is called a distributed loop computer network.

1.1 Distributed Processing Systems
Distributed processing systems are still evolving, so some confusion may exist as to what they are, what they do, and how best to design them (Jensen, 1975; Liebowitz and Carson, 1977; Stankovic and van Dam, 1977). The confusion stems partly from the fact that both physical and functional separations satisfy the adjective "distributed." Moreover, to some, the term "distributed processing" has become synonymous with terms like "parallel processing" or "concurrent processing." Basically, a distributed processing system may be considered as an interconnection of digital systems, each having certain processing capabilities and communicating with the others through the exchange of messages. In general, what people call "distributed systems" may vary in character
and scope from each other. So far there is no accepted definition of, or basis for classifying, these systems. To distinguish this class from conventional architectures, Enslow (1978) has used the term "distributed data processing system" and defines its members as having the following five essential characteristics:

(1) A multiplicity of general-purpose resource components, including both physical and logical resources, that can be assigned to specific tasks on a dynamic basis. Homogeneity of physical resources is not essential.
(2) A physical distribution of these physical and logical components of the system, interacting through a communication network. (A network uses a two-party cooperative protocol to control the transfers of information.)
(3) A high-level operating system that unifies and integrates the control of the distributed components. Individual processors each have their own local operating system, and these may be unique.
(4) System transparency, permitting services to be requested by name only. The server does not have to be identified.
(5) Cooperative autonomy, characterizing the operation and interaction of both physical and logical resources.

As also pointed out by Enslow, there are very few systems that currently meet all of the criteria for a distributed data processing system. Systems that are excluded are multiple processor systems with shared memory, intelligent terminal systems connected to single host CPUs, single-CPU systems with independent I/O processors, etc.

1.2 Local Computer Networks
The increased use of data communications over the past several years has spawned a variety of computer networks to support requirements for distributed processing (Abramson and Kuo, 1973; Davies and Barber, 1973; Greenberger et al., 1974; Schwartz, 1977). The success of national and international networks of large-scale computers in promoting the benefits of program sharing, data sharing, resource sharing, and load sharing is well known (Combs, 1973; Kimbleton and Schneider, 1975; Pouzin, 1973; Roberts, 1977; Wood, 1975). In particular, the pioneering efforts of the ARPANET in providing the cost and performance advantages of computer-communication networks are widely acclaimed in the literature (Kahn, 1972; McQuillan and Walden, 1977; Roberts, 1973; Roberts and Wessler, 1970). Just as computer networks have grown across continents and oceans to interconnect major computing facilities around the world, they are
now growing down corridors and between buildings to interconnect mini- and microcomputers in offices and laboratories. Such local networking environments are frequently found in many industrial, commercial, and university settings, with applications ranging from simple time-sharing services to complex data base and management information systems, transaction processing, process control, and distributed processing. Thus, some important technology trends are of interest in the application of computer networks serving these localized communities. However, it is doubtful that the techniques learned from implementing national and international large-scale networks like the ARPANET will be fully applicable to local distributed minicomputer networks, considering for example that a single IMP (interface message processor) in the ARPANET is more expensive than many complete minicomputer systems (Heart et al., 1970). It is thus obvious that new approaches must be developed for local computer networking. The network structures used for local networking must obviously be simple and very economical, yet need to be fairly powerful and flexible. In recent years many local computer networks have been developed, using different topologies (star, hierarchical, or loop), control disciplines (centralized or distributed), and communication media (cable, twisted-wire pair, or radio). Notable among these local networks are the ALOHANET (a packet radio network) (Abramson, 1970), the ETHERNET (a branching broadcast network) (Metcalfe and Boggs, 1976), SPIDER (a centralized-control loop network) (Fraser, 1975), DLCN (a distributed-control loop network) (Liu and Reames, 1975), MISS (a three-level hierarchical network) (Ashenhurst and Vonderrohe, 1975), the NWU minicomputer network (a star network) (Lennon, 1974), etc.
As pointed out by Metcalfe and Boggs (1976), differences among these systems are due to differences among their intended applications, differences among the cost constraints under which trade-offs were made, and differences of opinion among researchers. It is strongly felt that the distributed-control loop network is particularly well suited for meeting local networking requirements (simple, economical, powerful, and flexible) and for forming a distributed processing system, as explained below (Reames, 1976).

1.3 Loop Computer Networks

A loop network consists of a very high-speed (1 to 10 Mbps) digital communication channel (a twisted-wire pair or T1-based carrier system) that is arranged as a closed loop. Computers, terminals, and other peripheral devices (called components) are attached to the loop channel by
[FIG. 1. A loop computer network: computers attached through loop interfaces (I) to a high-speed digital communication channel arranged as a closed loop.]
a standard hardware device called a loop interface (see Fig. 1). Messages (programs, data, commands) are multiplexed onto the loop in the form of addressed blocks of data called frames. If a message is to be sent from a local to a remote component, the local interface forms the frame, giving the address of the destination interface, and multiplexes the frame onto the loop. Each interface downstream receives this frame, checks its destination address, and immediately relays it back onto the loop if the proper destination has not been reached. When a receiving interface does recognize its own address as the destination of an incoming frame, it receives the frame from the loop and delivers the message to its locally attached component. Notice that the frame can be removed from the loop either by the receiving interface upon reception or by the sending interface upon return, depending on the design. Thus, for use in a distributed processing system of the kind envisioned in Section 1.1, a loop computer network is very attractive for a number of reasons, as explained below:
(1) The problem of message routing in the communication subnetwork completely disappears, as there is only one path for a message to follow in reaching its destination (Heart et al., 1970; Davies and Barber, 1973). Thus, a transmitter need not know the location of its receiver. Also, broadcast message transmission is very easy to achieve with this kind of
network, since a message can be copied by every interface as it passes through and can then be removed when it returns to its transmitter. This will facilitate interprocess communication by name, as shown in Section 4.2.
(2) The problem of congestion control in the communication subnetwork also disappears, since a message frame waiting in a buffer of a loop interface can only be multiplexed onto the loop when the portion of the loop channel it needs is free and unused. Thus, various techniques of congestion control for other types of networks such as ARPANET and TYMNET are not needed at all for loop networks, and the deadlocks resulting therefrom are also avoided (McQuillan and Walden, 1977; Schwartz, 1977).
(3) Data transmission on a loop network is usually digital and at a very high rate. This is true because the relatively short distances involved allow digital transmission to be used over cables or T1-type telephone lines. Not only are expensive modems (data sets) not needed for every component attached to the loop, but digital transmission eliminates unnecessary data conversions and means that the typical rate is 1-10 Mbps. Thus transmission times on a loop network are very short, even when large amounts of data are transmitted.
(4) Mainly because of items (1) through (3) above, the functions required of the loop interface are very simple, thus making its construction very inexpensive (typically under $500). This being the case, it is therefore economically and technically feasible to attach all sorts of components directly to the network through individual loop interfaces and controllers. Thus, not only can computers be attached to the network, but terminals and special I/O devices may also be directly connected. For the same reasons, additional loop control functions may be incorporated in the interface hardware that must otherwise be performed less efficiently elsewhere, as shown in Section 3.
(5) Finally, the net result of items (3) and (4) above is that the total initial cost of a loop network is quite small and is directly proportional to the number of loop interfaces that are needed. This is the case because control of the network is completely distributed, and there is no large centralized computer that must be purchased. Similarly, later expansion of the network by adding more interfaces is also very simple and inexpensive.
In summary, a loop network offers easy message routing, simple node interfacing, high data rate, and low construction and expansion costs. The major advantage of the loop network, however, is the ease with which it lends itself to distributed control. With a distributed-control loop
communication subnetwork as its backbone, design and implementation of a loop computer network as a distributed processing system are considerably easier.
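The frame-relaying behavior described in Section 1.3 (each interface compares the destination address against its own and either delivers the frame locally or relays it downstream) can be sketched in modern notation. This is a schematic illustration only, not part of the original design: the dictionary frame representation and the receiver-removal policy are assumptions chosen for clarity.

```python
# Sketch of the forwarding decision made by each loop interface.
# Frame fields are illustrative; a real interface operates on bits.

def step_interface(my_addr, frame):
    """Return (delivered_message, frame_to_relay) for one interface."""
    if frame is None:
        return None, None
    if frame["dest"] == my_addr:
        # Receiver-removal policy: deliver locally, take frame off the loop.
        return frame["data"], None
    # Not addressed to us: relay unchanged to the next interface downstream.
    return None, frame

def send(loop_addrs, src, dest, data):
    """Propagate one frame around a unidirectional loop until delivered."""
    frame = {"dest": dest, "data": data}
    start = loop_addrs.index(src)
    n = len(loop_addrs)
    for hop in range(1, n + 1):          # at most one full revolution
        node = loop_addrs[(start + hop) % n]
        delivered, frame = step_interface(node, frame)
        if delivered is not None:
            return node, hop, delivered
    return None, n, None                 # returned to sender undelivered
```

Note that there is no routing table anywhere: the single closed path makes the forwarding decision a one-line address comparison, which is the basis of point (1) above.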
2. Message Transmission Protocols and Formats
The loop is a random-access channel shared by many nodes. There is a high probability that several different nodes will simultaneously have messages to transmit. Thus, some means must be found to resolve the resulting contention for service. As mentioned in Section 1.3, each node is connected to the loop via a loop interface that manages the transmission of outgoing messages and the extraction of incoming messages, according to some given message transmission protocol. In recent years a variety of message transmission protocols for use with loop networks has been studied (Fraser, 1974; Hafner, 1974; West, 1972). Some involve the use of time-division multiplexing with time-slots made available for transmitting messages. Time-slots may be of fixed or variable length, and may be permanently assigned (static assignment) or temporarily assigned upon request (demand assignment) to the nodes. Control of time-slot assignment and message traffic on the loop may be performed entirely by one central control node (centralized control) or distributed among all nodes on the loop (distributed control). In a centralized-control loop network, there is a single primary control node communicating with the remaining secondary nodes on the loop. Messages are transmitted over the loop directly between the control node and secondary nodes, but not directly between secondary nodes. If a secondary node has a message to send to another secondary node, the message must first be sent to the primary node, and then relayed or switched by the primary node to the destined secondary node. On the other hand, in a distributed-control loop network, all nodes on the loop are of equal status and any node can communicate directly with any other node over the loop. This difference in control disciplines manifests itself in the layout of the message formats.
In centralized-control loop networks, only one address field per message is needed to identify the secondary node (which may be the message transmitter or receiver, depending on the direction of data flow), since the single primary node is implicitly known to be the other address (see Fig. 2a). In distributed-control loop networks, however, two address fields are required with every message, one to identify the transmitter of the message and the other the receiver (see Fig. 2b).
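The difference between the two header layouts can be made concrete with a small sketch. The field widths below are illustrative assumptions (8-bit addresses, and a 1-bit direction flag patterned after SPIDER's, described in Section 2.1); they are not the formats of any particular system.

```python
# Illustrative headers: a centralized-control loop needs one address field
# (the primary node is implicitly the other party), while a
# distributed-control loop carries both transmitter and receiver addresses.

def pack_centralized(secondary_addr, direction, payload):
    # direction: 0 = to primary, 1 = from primary (a 1-bit flag, as in SPIDER)
    header = bytes([secondary_addr & 0x7F | (direction << 7)])
    return header + payload

def pack_distributed(from_addr, to_addr, payload):
    # Two explicit address fields identify transmitter and receiver.
    return bytes([to_addr, from_addr]) + payload
```

The centralized header is smaller, but only because the primary node is a fixed, implicit endpoint of every transfer.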
2.1 Centralized-Control Loop Networks
A very simple way of implementing a centralized-control loop network is to use synchronous time-division multiplexing on the loop that connects several secondary nodes (terminals) to a primary node (central computer). Each secondary node is permanently assigned a time-slot and must hold its message until the assigned time-slot arrives. If the secondary node has no message to send when the assigned time-slot arrives, the time-slot is unused and thus wasted. In general, this static assignment scheme requires less complex control circuitry in the loop interface. However, it uses the transmission loop less efficiently, thereby resulting in very low channel utilization. Notice that in this scheme the single address field in Fig. 2a may be dropped, since the time-slot is permanently assigned by the control node to a secondary node. For example, the IBM 2790 data communication system, designed for linking many computer terminals to a system controller, uses a transmission loop that is shared by static assignment of time-slots (Steward, 1970). This loop operates at 514.67 kbps on a twisted-wire pair with repeaters every 1000 ft. This is essentially a data collection system wherein all data go to or come from the central controller. Another example is the Collins C system that operates at a bit rate of 32 Mbps (Newhall and Venetsanopoulos, 1971). This system is intended primarily for short-range communications between processors and shared peripherals. Another way of implementing a centralized-control loop network is to use demand assignment of time-slots, which are assigned by the control node to secondary nodes upon demand or request. Since the flow of data into and out of a computer is "bursty," as observed by Fuchs and Jackson (1970) and by Jackson and Stubbs (1969), dynamic assignment of time-slots on a demand basis increases the loop channel utilization drastically.
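The utilization argument can be illustrated with a toy slot count. The model below is a deliberate oversimplification (one message per slot, idealized polling, hypothetical queue lengths), intended only to show why bursty traffic favors demand assignment.

```python
# Toy comparison of static vs. demand time-slot assignment on a loop.
# queues[i] = number of messages node i has ready to send.

def static_assignment(queues, rounds):
    """Each node owns one slot per round; an idle node's slot is wasted."""
    q, used, total = list(queues), 0, 0
    for _ in range(rounds):
        for i in range(len(q)):
            total += 1
            if q[i] > 0:
                q[i] -= 1
                used += 1
    return used / total          # fraction of slots actually carrying data

def demand_assignment(queues, rounds):
    """Slots go to whichever node has traffic (idealized demand polling)."""
    q, used, total = list(queues), 0, 0
    for _ in range(rounds):
        for _ in range(len(q)):
            total += 1
            busy = [i for i in range(len(q)) if q[i] > 0]
            if busy:
                q[busy[0]] -= 1
                used += 1
    return used / total
```

With a bursty load such as [10, 0, 0, 0] over three rounds on four nodes, static assignment achieves 25% utilization while demand assignment achieves over 80%, at the cost of the more complex control circuitry noted below.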
However, the frequent need to switch bandwidth between the nodes in a dynamic scheme can complicate the control circuitry in the loop interface. An example of demand assignment sharing is found in the Weller loop, which operates at 3.3 Mbps with a fixed packet size of 35 bits (Weller, 1971). In this loop the control node polls the secondary nodes to find which ones require service and assigns time-slots accordingly. The Weller loop is in effect an extended circular I/O bus for a time-sharing computer. It provides a convenient means of providing computing support to laboratory equipment in various parts of one building. The most sophisticated centralized-control loop network that has been implemented thus far is probably SPIDER, operating since 1972 at Bell Laboratories' Murray Hill, New Jersey, location (Fraser, 1975). SPIDER is a small switched data communications system that links 11 computers (four PDP-11/45's, two DDP-516's, two DDP-224's, one H-6070, one PDP-8, and one PDP-11/20). Each computer (called a terminal) connects to SPIDER through a terminal interface unit (TIU), which itself contains a microcomputer. Terminals are connected to a central switching computer (TEMPO I) by a transmission loop operating at 1.544 Mbps with fixed-length messages of two sizes: 32 bits and 304 bits. All traffic passes through the switching center, even when messages are being transmitted between two terminals on the loop. Every message carries a 7-bit terminal address and 1 bit to indicate direction of transmission (to or from the switching computer). Packets are routed by the switching computer according to information provided in a separate call establishment procedure. Cooperation between the microcomputers in the TIUs and the central switching computer effects traffic control, error control, and various other functions related to network operation and maintenance.

2.2 Newhall-Type Loop Networks
Pioneering work in loop computer networks was carried out at Bell Laboratories by Farmer and Newhall (1969), who proposed and constructed an experimental distributed loop switching system. Designed to handle "bursty" computer traffic, their loop connected two DDP-516 minicomputers, a PDP-9 minicomputer, a disk, a card reader, a plotter, and a TTY by a 3.156 Mbps communication channel formed from a shielded twisted-wire pair. To resolve contention for service, a control token is passed around the loop in a round-robin fashion from interface to interface. Only the interface currently in possession of the control token is allowed to transmit messages of arbitrary length onto the loop; the other interfaces are allowed only to receive during this time.
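The round-robin token discipline can be sketched as a simple scheduler. This is a schematic of the access rule only, not of the Farmer-Newhall hardware; the one-message-per-possession simplification and the list-based queues are assumptions.

```python
# Sketch of Newhall-style loop access: only the current token holder may
# transmit; it then passes the token to the next interface downstream.

def newhall_schedule(queues, token_passes):
    """queues[i] = list of pending messages at node i.
    Returns the order in which messages reach the loop, as (node, msg) pairs.
    One (variable-length) message may be sent per token possession."""
    order = []
    holder = 0
    n = len(queues)
    for _ in range(token_passes):
        if queues[holder]:
            order.append((holder, queues[holder].pop(0)))
        holder = (holder + 1) % n   # pass the control token downstream
    return order
```

The sketch makes the protocol's main limitation visible: at any instant exactly one node transmits, so the loop carries at most one message stream regardless of how many nodes have traffic waiting.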
[FIG. 3. Newhall message format: SOM, variable-length op-code and data fields, FROM and TO node addresses, and loop-access-control mnemonics (transfer to business machine, ready to receive, end transmission, sixteen-bit transfer, mnemonic control function, etc.).]
When transmission is finished, the control token is passed to the next interface downstream, thus giving it the chance to transmit. This is an example of a distributed-control loop network, since there is no single central node that controls access to the loop channel. In the Newhall loop, the message frame starts with SOM (start of message), followed by the header and data of variable length, and ends with EOM (end of message) (see Fig. 3). The single bit that follows the EOM and precedes the SOM of the next message is the control token that controls access to the loop. When a loop interface finishes sending a message, it affixes the control token at the end of the message to denote that it has completed transmission and passes the control token on to the next interface downstream. If this node has a message to transmit, it removes the control token and proceeds immediately to send its message; otherwise, it simply passes the control token to the next interface downstream. In order to provide a reliable system, a supervisor node (not used for loop access control) is included in the loop to perform such functions as loop synchronization, traffic overflow monitoring, detection of lost messages, and reinitiation of loop operations in the event of errors. The supervisor node can also serve as a storage and gateway to other loops. There are two other loop networks that also use the control-token passing scheme for loop access control. One is the distributed computing system (DCS) developed at the University of California, Irvine (Farber et al., 1973). In this system five minicomputers (three SUE and two
VARIAN 620/i) are connected to a loop operating at 2.3 Mbps. The system is used primarily for software development and for document production. The other is the MININET developed at the University of Waterloo (Manning and Peebles, 1977). A two-host implementation based on PDP-11 minicomputers is now operational. The network is designed as a turnkey system for transaction processing.

2.3 Pierce-Type Loop Networks
Also at Bell Laboratories, Pierce (1972a,b) extended the loop concept to include a hierarchy of interconnected loops for serving the entire country. Only one local experimental loop of the Pierce scheme was implemented, by Kropfl (1972) and Coker (1972). Two laboratory minicomputers (DDP-516) have been connected to the local loop for a variety of on-line applications in speech analysis, synthesis, and perception research. In the Pierce loop, communication space on the loop channel is divided into an integral number of fixed-size slots (522 bits per slot). Messages are also divided into fixed-size packets (522 bits per packet), so that a packet can fit into a slot on the loop channel. Each slot contains one bit to indicate whether the slot is empty or filled with a packet. The loop interface simply waits for the beginning of any empty slot and fills it with a packet. The single control bit is set to "filled" by the interface as it multiplexes a packet onto the loop, and is set to "empty" by the interface when it removes a packet destined for it from the loop. This is another example of a distributed-control loop network, since there is no single control node that controls access to the loop channel. In addition to the local source and destination addresses, regional and national addresses are provided in the message format (see Fig. 4) to take care of a hierarchy of networks of many loops. As in the Newhall loop, a supervisor node (not used for loop access control) is required to perform such functions as loop synchronization, detection and removal of lost messages, and other maintenance procedures.

2.4 The Distributed Loop Computer Network (DLCN)
Conceived as a means of investigating fundamental problems in distributed processing and local networking, the distributed loop computer network (DLCN) is envisioned as a powerful distributed processing system that interconnects midi-, mini-, and microcomputers, terminals, and other peripheral devices through careful integration of hardware, software, and a loop communication channel. The network is designed in such a manner that its users will see only a single, integrated computing
[FIG. 4. Pierce message format: SYNC, CNTRL, TO and FROM address fields, and DATA, with loop-hog-prevention bits (00 block vacant, 10 block full, 01 block full and passed once, 11 block full and passed twice).]
facility with great power and many available resources, without being aware of the system's actual organization and method of operation. Implementation of an experimental prototype is currently underway at The Ohio State University. The initial network consists of six nodes, interconnecting an IBM 370/168, DECsystem-10, PDP-8/S, PDP-9, PDP-11/45, MICRO-1600/21, and some special I/O devices. Design and implementation of DLCN is an ongoing research project at The Ohio State University (Liu and Reames, 1975, 1977; Reames and Liu, 1975, 1976; Liu and Ma, 1977; Liu et al., 1977a,b; Babic et al., 1977; Oh and Liu, 1977; Pardo et al., 1977, 1978). As mentioned in Sections 2.2 and 2.3 above, two message transmission protocols are in common use today for distributed-control loop networks. In Newhall-type loops, a round-robin control token circulates around the loop and allows only one interface at a time the opportunity of transmitting (see Fig. 5b). The selected interface may place one message of variable length onto the loop, or may simply pass the control token on to the next interface downstream. In Pierce-type loops, communication space on the loop channel is divided into fixed-size time-slots (see Fig. 5a). Messages are also divided into fixed-size packets, so that each packet will occupy exactly one time-slot on the loop channel. The transmission protocol is as simple as waiting for the beginning of an empty time-slot and filling it with a packet.
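The Pierce-type slot discipline reduces to two operations on the circulating slot: fill it if empty, and empty it on removing a packet addressed to you. The sketch below is schematic (a dictionary stands in for the 522-bit slot, and the receiver-removal variant is assumed).

```python
# Sketch of Pierce-style slotted access: one control bit per slot marks it
# empty or full; an interface fills the first empty slot it observes.

EMPTY, FULL = 0, 1

def try_transmit(slot, packet):
    """Fill the slot if it is empty; return (new_slot, sent?)."""
    if slot["flag"] == EMPTY:
        return {"flag": FULL, "packet": packet}, True
    return slot, False            # slot occupied: wait for the next one

def try_receive(slot, my_addr):
    """Remove a packet addressed to us, marking the slot empty again."""
    if slot["flag"] == FULL and slot["packet"]["dest"] == my_addr:
        return {"flag": EMPTY, "packet": None}, slot["packet"]
    return slot, None
```

Access requires no token and no central node: the control bit alone arbitrates the slot, which is why the scheme scales to a hierarchy of loops.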
[FIG. 5. Message transmission protocols for loop networks: (a) Pierce loop (F: full, E: empty), (b) Newhall loop (S: start of message, E: end of message), (c) DLCN loop (S: start of message, E: end of message).]
Both of the aforementioned transmission protocols are simple to implement, but suffer from certain inherent shortcomings. The control-token passing protocol limits message transmission to just one interface at a time and thus results in very inefficient loop channel utilization and long message delay. Dividing computer-generated messages (which are usually of variable length) into fixed-size packets can introduce other problems. Not only is there delay in waiting for an empty slot to arrive, but considerable communication space is wasted when dividing variable-length messages into fixed-size packets. In addition, all the facilities required for converting messages into packets and back again (disassembly, sequencing, buffering, and reassembly) must be provided by the loop interface or the attached component. Thus, neither protocol makes very efficient usage of the loop channel.
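The space wasted by fixed-size packetization (internal fragmentation of the last slot of each message) is easy to quantify. The calculation below uses the 522-bit Pierce slot size quoted earlier purely as an example; the message lengths are hypothetical.

```python
import math

# Internal fragmentation when variable-length messages are carried in
# fixed-size slots: the final slot of each message is only partly used.

def slots_needed(msg_bits, slot_bits):
    return math.ceil(msg_bits / slot_bits)

def wasted_fraction(msg_lengths, slot_bits):
    """Fraction of transmitted slot space that carries no message bits."""
    used = sum(msg_lengths)
    carried = sum(slots_needed(m, slot_bits) * slot_bits
                  for m in msg_lengths)
    return 1 - used / carried
```

For instance, a 100-bit message in a 522-bit slot wastes over 80% of the slot, while a 523-bit message requires two slots; only messages near a multiple of the slot size use the channel efficiently.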
With the elimination of these disadvantages in mind, a third transmission protocol has recently been developed for use in DLCN (Reames and Liu, 1975). Called the shift-register insertion scheme for the transmission of multiple variable-length messages, it combines the best features of the aforementioned transmission protocols. This protocol makes possible the simultaneous and direct transmission of multiple variable-length messages onto the loop without the use of any centralized control (see Fig. 5c). By buffering incoming messages when necessary, it guarantees nearly immediate access to the loop for infrequent users, regardless of the level of message traffic on the loop. This protocol also provides automatic regulation of the rate of message transmission in accordance with observed system load, favoring light, infrequent users at the possible expense of heavy, frequent users. Perhaps most importantly, it yields shorter message delay and makes better utilization of the loop channel than any existing protocol. The detailed operation of this transmission protocol was reported elsewhere (Liu and Reames, 1975; Reames and Liu, 1975). The superior performance of DLCN's transmission protocol has been verified both by queueing analysis (Liu et al., 1977a,b) and by GPSS simulation (Reames, 1976; Reames and Liu, 1976), which will be discussed in Sections 6.1 and 6.2, respectively. This fact is not surprising, since DLCN's protocol (multiple messages of variable length) is more general than the other two, and when constrained properly, it can be reduced to operate like either the Newhall protocol (single message of variable length) or the Pierce protocol (multiple messages of fixed length) (see Fig. 5). DLCN's overall message format is illustrated in Fig. 6. The flag is a special bit-sequence that denotes the start or end of a message frame.
The two 16-bit address fields are each decomposed into 8-bit loop interface addresses and 8-bit process numbers for identifying source and destination processes. This facilitates interprocess communication by name without using an associative memory in the loop interface, as explained in Section 4.2. The first two bits of the message control field are used to encode the message type: information (00), acknowledgment (01), control (10), or diagnostic (11). The next bit allows for broadcast message transmission, whereby a message is copied by every interface on the loop; this has considerable application in inquiry situations when the address of the desired receiver is not known in advance. The following 3-bit subfield is interpreted differently for each message type (i.e., function code, response code, etc.). The next 2 bits are used for lost message detection and removal, and the following bit is used for lockout prevention. The final 7-bit subfield of the message control field
[FIG. 6. DLCN message format: FLAG, DEST. ADDR., ORIGIN ADDR., CONTROL, DATA, CRC, FLAG.]
also has different uses for different types of messages (i.e., sequence number, function modifiers, etc.). The last field of the message format is the cyclic redundancy check field, in which a 16-bit checksum is stored by each message transmitter. Note that not only the information field but also the address and control fields are checked by this technique, thereby ensuring their integrity as well. The detailed description of each message type and function was also reported elsewhere (Liu and Reames, 1977; Reames, 1976). There are several important design objectives for DLCN (viz., simple, economical, powerful, flexible, etc.), but perhaps the most significant one is integration: integration of hardware, software, and communication technologies to form a unified distributed processing system. Evidence of such integration is apparent throughout the system design of DLCN, as in the loop interface design that incorporates the novel transmission protocol mentioned above, or in the control message format that facilitates low-level distributed control by the network operating system (to be discussed in Section 4), which in turn handles distributed process control and distributed resource management. Many distributed functions in DLCN are made possible because facilities like these are integrated into the hardware, software, and communication systems of the network (to be discussed in Section 5).
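The control-field widths given in Section 2.4 (2-bit type, a broadcast bit, a 3-bit type-dependent subfield, 2 lost-message bits, a lockout bit, and a 7-bit subfield) sum to exactly 16 bits, and packing them can be sketched directly. The bit ordering chosen below is an assumption for illustration; only the field widths come from the text.

```python
# Pack/unpack of a 16-bit DLCN-style message control field.
# Assumed msb-first layout; widths follow the text:
#   type(2) | broadcast(1) | sub3(3) | lost(2) | lockout(1) | sub7(7)

INFO, ACK, CONTROL, DIAG = 0b00, 0b01, 0b10, 0b11   # message types

def pack_control(mtype, broadcast, sub3, lost, lockout, sub7):
    assert mtype < 4 and broadcast < 2 and sub3 < 8
    assert lost < 4 and lockout < 2 and sub7 < 128
    return (mtype << 14) | (broadcast << 13) | (sub3 << 10) \
         | (lost << 8) | (lockout << 7) | sub7

def unpack_control(word):
    return {"type": word >> 14, "broadcast": (word >> 13) & 1,
            "sub3": (word >> 10) & 7, "lost": (word >> 8) & 3,
            "lockout": (word >> 7) & 1, "sub7": word & 127}
```

Because the entire field fits in one 16-bit word, a loop interface can test the type and broadcast bits with a couple of mask operations, which is consistent with the design goal of keeping the interface hardware simple.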
3. Loop Interface Design
The loop interface consists logically of five parts as shown in Fig. 7: (1) interface receiver, (2) interface transmitter, (3) loop control, (4) buffer space, and (5) attached component interface.
The interface receiver accepts incoming message frames from the loop, checks their destination addresses, and either accepts them for the attached local component if they are addressed to it or passes them over to the transmitter for relaying to the next interface downstream. The function of the interface transmitter is to place messages onto the loop, both incoming messages relayed from the receiver and locally generated messages from the attached component. The transmitter should be capable of merging these two message streams into one without interference and without the use of centralized control. The loop control regulates the traffic on the loop and performs control functions such as detection and removal of lost messages, prevention of transmission lockout, and others. Buffer space is provided for the input, output, and delay buffers to input, output, and relay message blocks, respectively. Direct memory access (DMA) transfers of data will also be provided if appropriate. The attached component interface will not be considered here, since its design depends on the type of components attached (processors, terminals, or I/O devices).
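The transmitter's merging of the relay stream and the local stream can be sketched with the delay-buffer idea: relayed traffic is held in a buffer (the role played by a hardware shift register in DLCN) while a local frame occupies the output. The simplified priority rule below (relay traffic always first) is an assumption for clarity, not the exact DLCN discipline.

```python
from collections import deque

# Sketch of a loop-interface transmitter merging two message streams
# without central control: buffered relay traffic takes precedence,
# and local frames are inserted only when the delay buffer is empty.

def transmitter(relay_arrivals, local_queue):
    """relay_arrivals[t] = frame arriving from the receiver at tick t
    (None if nothing arrives).  Returns the per-tick loop output."""
    delay_buf = deque()              # stands in for the delay shift register
    local = deque(local_queue)
    out = []
    t = 0
    while t < len(relay_arrivals) or delay_buf or local:
        if t < len(relay_arrivals) and relay_arrivals[t] is not None:
            delay_buf.append(relay_arrivals[t])
        if delay_buf:
            out.append(delay_buf.popleft())   # forward relay traffic
        elif local:
            out.append(local.popleft())       # insert a local frame
        else:
            out.append(None)                  # loop idle this tick
        t += 1
    return out
```

The essential point survives the simplification: neither stream is lost, no other node is consulted, and the buffer absorbs the timing conflicts between the two streams.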
[FIG. 7. Functional organization of loop interface: line receiver, interface receiver, interface transmitter, and line driver on the loop, with node input and output paths connecting through buffers to the attached component.]
3.1 Specifications of the Loop Interface
The major goal of the loop interface design is to provide a loop interface controller that uses standard, state-of-the-art, off-the-shelf LSI packages (chips). The loop interface is designed primarily for the distributed loop computer network (DLCN) discussed in Section 2.4; however, it should be general enough so that it can be used for other types of loop networks with little modification. Flexibility and expandability should also be taken into consideration for later implementation of some of the network operating system functions in the loop interface. Specifications of the loop interface are listed as follows:
(1) The loop interface should have the capability of interfacing to dual synchronous communication lines operating above 1 Mbps, and to two parallel channels directly connected to the attached component.
(2) The loop interface should provide basic arithmetic and logic functions, data movement functions, and other functions necessary for minimizing interrupt service overhead.
(3) The communication protocol algorithms should be built into the microcode, including lost message detection and removal, lockout prevention, and other measures necessary for regulating loop traffic.
(4) The loop interface should provide a mechanism for handling priority interrupts, so that interrupts can be acknowledged and processed promptly.
(5) The loop interface should provide enough buffer space for the input, output, and delay buffers, and should intelligently coordinate all message routings and control functions through these buffers.
(6) Bit-sliced LSI microprocessors should be used because of their speed, flexibility, and microprogrammability.
3.2 Architecture of the Loop Interface
The basic functional components necessary for the loop interface are hardware and firmware. The interface hardware consists of the following components as shown in Fig. 8: (1) loop channel adapter and control logic, (2) cyclic redundancy check (CRC) logic, (3) microprocessor and control logic, (4) read/write random access memory (RAM), and (5) attached component interface logic and control.
Figure 9 shows a block diagram of the loop interface using chips from the Am2900 series (“Am2900 Bipolar Microprocessor,” 1976). The Am2900 series was chosen because it is more general and considerably
[FIG. 8. Hardware components of loop interface.]
simpler than other bit-slices like the Intel 3000 series in overall design (Oh and Liu, 1977). The key building blocks of the loop interface, as shown in Fig. 9, are as follows: (1) three Am2901 microprocessor slices, (2) one Am2910 microprogram controller, (3) one microprogram control memory of 512 words by 64 bits, (4) one Am2914 priority interrupt encoder, (5) one RAM buffer space of 4K words by 8 bits, (6) one direct memory access (DMA) controller, and (7) one USRT with a line receiver and a line driver.
With the microprogrammed design, there are control and data manipulation circuits in Fig. 9. The main part of the data manipulation is performed by the Am2901 microprocessor slices, which contain arithmetic logic units (ALUs), register files, and associated control logic. Since the unit of data to be manipulated is 8 bits wide, at least two 4-bit slices are needed for the loop interface. In order to utilize the general register file of the ALU as memory pointers to directly address up to 4K words of the RAM buffer, however, three microprocessor slices are employed in the design. The most important support circuit is the control unit of the microprocessor slices, since the former determines the latter's power and flexibility. Using the Am2910 microprogram controller, the loop interface is
DISTRIBUTED LOOP COMPUTER NETWORKS
FIG. 9. LSI implementation of loop interface.
fully controlled by a microprogram stored in a bipolar PROM (programmable read-only memory). The Am2910 is also an address sequencer and can supply a 12-bit address to the microprogram memory. It has a 5-word by 12-bit stack, allowing up to five levels of microsubroutine nesting. The number of bits in the microinstruction depends on the number of functions to be performed simultaneously; a 64-bit microinstruction is used in this design (Oh, 1977). The microprocessor slices perform all housekeeping tasks, such as monitoring and updating loop conditions, updating the status of and adjusting the pointers to the buffer space, and managing the message transfer between the loop transmission facility and the attached component. By executing the microprogram, the microprogram controller performs all control functions, including detection and removal of lost messages, prevention of lockout, and automatic generation of acknowledgments for received messages. The read/write RAM provides buffer areas for inputting, outputting, and delaying message blocks; its size is one of the design parameters to be discussed in Section 6.3. In the initial design of the loop interface, an access time of less than 200 nsec is considered satisfactory. However, a faster access time would be desirable if much control information were to be stored in the RAM (to be discussed in Section 3.3). The transmission loop is connected to the programmable USRT (universal synchronous receiver/transmitter) via an RS-422 line receiver from
MING T. LIU
the loop input and via an RS-422 line driver to the loop output. The basic communication protocol and CRC logic are also provided by the USRT. The DMA controller provides direct memory transfer of data between the USRT and the buffer memory; the ALU of the microprocessor slices is thus free from performing any data transfer between the transmission loop and the RAM buffers, except for decoding the address and control functions. Input and output events, whether from the loop through the USRT or from the attached component, are attended to by interrupts, using the Am2914 priority interrupt encoder. Up to eight levels of interrupt are provided by the Am2914, and the highest priority is assigned to the power-on event, which puts the loop interface into a working state.

3.3 Extensions to the Loop Interface
The microprocessor-based implementation of the loop interface described above provides flexibility, power, and high-speed loop transmission at low cost, and performs the basic functions needed for proper operation of the distributed loop computer network (DLCN). As mentioned previously, however, additional functions such as network monitoring and operating system primitives can easily be added to the loop interface. In the following, some extensions to the loop interface design are examined.

First, the loop interface designed above can be used for both Newhall-type and Pierce-type loop networks with little modification. By assigning one or more bits next to the EOM as a loop access control field in the message format and by providing firmware to detect the control token in the loop interface, a Newhall-type loop can be realized. This is possible because the USRT in the loop interface can receive and transmit short-length data (1 to 8 bits) between the flags. Similarly, by dividing the communication space of the loop channel into fixed-size time-slots and by providing firmware to detect empty time-slots, a Pierce-type loop can also be realized. Even though separate firmware can be provided for implementing Newhall-type and Pierce-type loops, a better approach would be to develop generalized firmware, so that all three types of distributed-control loop networks (Newhall, Pierce, and DLCN) could be supported by the loop interface. The choice of which type to support could then be determined during the initialization of the loop interface.

As pointed out by many people, the major disadvantage of loop networks is their low reliability. Loop reliability is a problem, but it should not be overrated. Many techniques are available for improving the reliability and availability of loop networks (Hafner and Nenadal, 1976; Zafiropulo, 1974). The reliability and availability of loop networks can be greatly improved by the use of an interface bypassing mechanism and adoption of the double-ring structure (see Fig. 10). The bypassing mechanism prevents breakage of the entire loop due to malfunction of any loop interface. A redundant loop may be built by adding another loop communication channel adapter (a line receiver, line driver, and USRT) to each loop interface; each receive and transmit request would then be honored by a different level of interrupt. Microprocessor-based implementation is very attractive here, since automatic reconfiguration of the loop network after failure of a node or section of the loop can then be performed by firmware.

FIG. 10. A double-loop network.

Finally, more intelligence could be put into the loop interface, so that it could act as a network front-end processor and perform some of the network operating system functions, such as network communications, process scheduling and control, and peripheral management (Pierce and Moore, 1977). Through firmware, these front ends could easily be tailored to optimize the relationship between their attached components and the remainder of the loop network.
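The generalized-firmware approach described above, one interface supporting the Newhall, Pierce, and DLCN disciplines with the choice fixed at initialization, can be sketched in Python. The function names and state fields are hypothetical stand-ins; the real mechanism would be microcode, not a dispatch table.

```python
# Hypothetical access-control predicates for the three loop disciplines;
# each answers "may this interface transmit now?" for a given loop state.
def can_transmit_newhall(state):
    return state.get("holds_token", False)   # must hold the control token

def can_transmit_pierce(state):
    return state.get("slot_empty", False)    # must see an empty time-slot

def can_transmit_dlcn(state):
    return True                              # shift-register insertion: any time

ACCESS_CONTROL = {
    "newhall": can_transmit_newhall,
    "pierce": can_transmit_pierce,
    "dlcn": can_transmit_dlcn,
}

def init_interface(mode):
    """Select the loop access discipline once, when the interface starts."""
    return ACCESS_CONTROL[mode]
```

The point of the sketch is that only the access-control test differs among the three disciplines; the rest of the interface firmware can be shared.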
4. Network Operating System Design
The efficient operation of a resource-sharing computer network requires careful design of the interface between the user program and the network resources. Establishment of a network operating system is required to afford effective access to and use of network resources by application programmers. The network operating system may be regarded as an agent interposed between the user and the network resources so as to provide (1) ease and uniformity of access to the computing resources, and (2) system-wide control of the allocation of resources among multiple competing requests.

The design of such a network operating system becomes very complicated if the computers in the network have different architectural characteristics. Some network operating systems are currently being investigated, but operational experience is very limited (Cosell et al., 1975; Forsdick et al., 1978; Retz and Schafer, 1976; Robinson, 1977; Thomas, 1973). In fact, many problems of network operating systems have not even been identified, let alone solved (Kimbleton and Mandell, 1976).

The distributed loop computer network (DLCN) described in Section 2.4 is a loosely coupled heterogeneous network that consists of many different computer systems (small and large), each operating for the most part independently and under local control, yet occasionally sharing data and programs with, or borrowing resources from, other computer systems in the network. In order to make such cooperative resource and data sharing possible in a uniform manner, it is necessary that all processors in the network execute components of a commonly structured network operating system, called the distributed loop operating system (DLOS). Each local component of DLOS is responsible for converting between local and network-wide representations for all forms of information (both user and system) that must be exchanged through the network. Thus, the primary job of DLOS is to provide a unified interface to the network, so that the user is under the illusion that he or she is working with a single powerful system with many available resources (see Fig. 11). The user does not have to know the exact location of the resources he or she is requesting, the architecture of the network, or its method of operation.
Control of the network as a unified distributed processing system is made easier to achieve by the close cooperation of the message communication protocol and the design of DLOS, since all the required low-level operating system functions are implemented using the interchange of control messages through the loop communication subnetwork. Having a loop for the communication subnetwork, which makes message routing, broadcast transmission, and distributed control so easy (see Section 1.3), has had a profound effect on the structure and design of the network operating system. The development of DLOS is also intended as a research vehicle to study various basic requirements and alternatives as applied to a distributed environment, to explore the unique nature of the loop architecture,
FIG. 11. Internal structure of DLCN.
and to investigate problems intrinsic to a network environment (Liu and Reames, 1977; Reames, 1976). It is felt that many concepts developed for DLOS can be applied and extended to general network topologies with little modification.

4.1 Requirements of the Network Operating System
The distributed loop operating system (DLOS) is distributed among the host computers attached to the network; each local version can be implemented differently according to local requirements, but must coordinate and communicate with the other components in a uniform fashion in order to perform the functions required in the network environment. The basic requirements and design philosophy of DLOS can be summarized as follows (Metcalfe, 1972; Malhotra, 1975; Retz, 1975; Wecker, 1973):

(1) DLOS should support a variety of operating systems, such that the addition of DLOS does not require extensive modifications to the existing operating systems. To this end, DLOS can be implemented as a separate component. It can be viewed as an interface between processes of the various operating systems that are communicating and cooperating to perform the functions requested by the user.

(2) DLOS should be designed to handle requests locally if possible; otherwise, requests will be converted to a standard message format and broadcast through the loop to a remote component that services the requests and returns the results.

(3) There should be no centralized host in control of the network. Each system runs independently under its own operating system, but serves or uses other systems occasionally, so that the failure of a single host need not cause too serious a consequence to the rest of the network. Each system should be able to accept or reject requests coming from remote users, depending on its own situation and policy. Furthermore, if multiple acceptors are found, the requestor's own system should be able to choose which one to accept according to its own evaluation policies.

(4) DLOS should support automatic resource sharing, whereby programs, data, and other resources can be accessed by a process without its having to know the location of the resources. This concept is referred to as location transparency.

(5) DLOS should support complete interconnectability, that is, supply an interprocess communication (IPC) mechanism whereby any pair of processes and/or devices can communicate. Moreover, DLOS should support IPC by process name, that is, supply a mechanism whereby any pair of processes can communicate regardless of their location in a particular host computer. Since this IPC path can be dynamically created upon request, a process can communicate with several processes simultaneously and may also migrate from processor to processor.

(6) DLOS should provide a generalized process control structure for process management. This would be a facility whereby a process could request different control privileges from, or grant different control privileges to, other processes. Since a process is allowed to communicate simultaneously with several processes through the dynamic IPC mechanism, a generalized process control structure should be allowed, thereby permitting more than one process to exercise control over another process (with its consent) or allowing two processes to exercise control over each other (coroutines).

(7) DLOS should provide a standard data representation, since internal data representations differ in each host computer. A component, called the data format and transfer processor (DFTP), must be developed to perform functions like data preparation, selection, and translation. The DFTP at one end must translate the data into a standard form for transmission, while its remote counterpart converts the data back into its own internal representation (Levine, 1977).

(8) DLOS should be well structured, to allow future technological advances to be incorporated in the design or in a particular implementation and to aid debugging. The basic functions to be performed by DLOS should be divided into modules and structured in a layered manner (Brinch Hansen, 1970). This allows higher level network facilities to be built in an orderly fashion on the functions provided by lower level modules.

4.2 Interprocess Communication
Since DLCN operates in the context of processes, the discussion of DLOS will begin with the topic of interprocess communication (IPC), which is one of the most basic facilities provided by the network operating system (Akkoyunlu et al., 1974; Walden, 1972). Moreover, the data structures and operations developed for implementing process name-to-address translation will be used by many other modules of DLOS as well. As mentioned in Section 2.1, the message communication protocol used in a distributed-control loop network requires that physical addresses be given in every message to identify both the receiver and the sender (see Fig. 2b), since messages (unless broadcast) are directed to a particular user process executing on a particular host computer in the network. To accomplish this objective for DLCN, every process address is broken down into two components: an 8-bit loop interface address (LIA) for locating the proper machine in the network and an 8-bit process number (PN) for identifying the process itself (see Fig. 6). While this method of physical addressing was chosen for transmission efficiency, it is desirable that user processes be able to communicate with each other by logical process name, without having to know the actual physical locations of other processes. If IPC by name is to be supported by DLOS, while physical addressing of message frames is required for message transmission on the loop communication subnetwork, then DLOS must translate from process name to process address (and vice versa) for every message transmitted or received. The translation must be both dynamic (since the location of a process may change via migration) and efficient. To see how this is done by DLOS, consider the following problem and its step-by-step explanation.

Suppose that process A in machine 1 wants to send a message to
process B in machine 2 (see Fig. 11). Process A does not know the physical location of process B (nor does it wish to know), and thus it will send its message to process B by name. The action of transmitting this message from process A to process B can be explained in the following four steps (see Figs. 12-14):

(1) First, DLOS must set up a logical connection between the two named processes. A LOCATE control message is put on the loop in broadcast mode from machine 1, giving the address of process A and the names of processes A and B. Every machine on the network receives the LOCATE message and checks its local list of process names to see if process B is located there. If found, that machine (2 in this case) copies the physical address and logical name of process A into its PNATT (process name-to-address translation table), and then sends a LOCATE control message back to machine 1 to give it the physical address and logical name of process B. Machine 1 also copies that information into its own PNATT (see Fig. 12). Should the location of either process ever change, a single LOCATE control message can cause the appropriate translation tables to be updated.

(2) Next, a communication (talk) path must be established between processes A and B. Process A must request permission to talk to process B by issuing a request talk privilege command to process B (either explicitly, or implicitly by executing a send message command with process B as the destination). Moreover, process B must also issue a grant talk privilege command to process A (again explicitly, or implicitly by executing a receive message command with process A as the origin) in order to complete the talk connection. Once this communication path is established (and until it is broken by either party), process A is free to send messages to process B.

(3) When process A later prepares a message it wishes to send to process B, it can transmit the message by executing a send message command.
DLOS uses the parameters (see Fig. 13) of the send message command to construct an MQE (message queue element) for the message, which is then linked to the send queue chain pointed to by the PCB (process control block) of process A (see Fig. 14). If process B were in machine 1, then the message communication handler of DLOS would simply transfer the MQE just formed from the send queue of process A to the receive queue of process B. However, since process B is assumed to be in machine 2, the MQE is instead removed and added to the send queue of the loop communication handler for machine 1. The loop communication handler forms an information message frame from the MQE (using the PNATT addresses for processes A and B) and transmits the frame out onto the loop. The MQE is then removed from
FIG. 12. Process name-to-address translation tables (PNATT) and their linkage to process control blocks; each entry records a loop interface address (LIA), process number (PN), name length, and process name.

FIG. 13. Format of send/receive message parameter block.
the send queue and held until the frame returns with its acknowledgment; if the acknowledgment shows that the message was not received correctly, the frame is retransmitted onto the loop. If, however, the message was received correctly, the MQE is removed and destroyed, its buffer space is released, and the appropriate status is reported to process A (see Fig. 13).

(4) At machine 2, where the received message frame is removed from the loop by its own loop communication handler, the message frame is dismantled, the message text is placed into a system buffer, and an MQE is formed for the message and added to the receive queue chain of process B's PCB. The message sender (process A) has now been located in the PNATT, its talk privilege to process B verified, and the MQE associated with process B. The message will simply wait on the receive queue until process B executes a receive message command specifying process A by name as the desired sender. The first (oldest) MQE associated with process A will then be removed from the chain, the message text copied into the user's buffer area, proper status reported, and the MQE destroyed.

The data structures and the operations upon them presented above should explain how IPC by name is achieved under DLOS. Notice that this novel IPC mechanism can be extended to other types of networks, provided that appropriate broadcast procedures are employed. The explanation of this feature was rather detailed, since the same design philosophy and kinds of operations are found throughout DLOS. Thus, the features presented hereafter will not be described in quite so much depth.
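The steps above can be sketched in miniature. The Python below models only steps (1) and (4): the LOCATE broadcast that fills both PNATTs, and receipt by sender name from a receive queue of MQEs. All class and field names are hypothetical stand-ins for the DLOS structures, not their actual layouts.

```python
from collections import deque

class Machine:
    """Toy model of one host: a local name list and a PNATT."""
    def __init__(self, lia, local_processes):
        self.lia = lia                    # loop interface address
        self.local = local_processes      # process name -> process number
        self.pnatt = {}                   # process name -> (LIA, PN)

def locate(network, requester, req_name, target_name):
    """Step (1): broadcast a LOCATE; the machine holding the target
    records the requester's address and replies with its own."""
    for m in network:
        if target_name in m.local:
            m.pnatt[req_name] = (requester.lia, requester.local[req_name])
            requester.pnatt[target_name] = (m.lia, m.local[target_name])
            return True
    return False                          # target process not found anywhere

def deliver(receive_queue, sender, text):
    """Step (4): form an MQE and chain it onto the receive queue."""
    receive_queue.append({"sender": sender, "text": text})

def receive_message(receive_queue, wanted_sender):
    """Return the oldest message from `wanted_sender`, destroying its MQE."""
    for mqe in list(receive_queue):
        if mqe["sender"] == wanted_sender:
            receive_queue.remove(mqe)
            return mqe["text"]
    return None
```

A single LOCATE exchange fills both translation tables, after which every frame can carry physical (LIA, PN) addresses while user code deals only in names.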
4.3 Remote Program Calling

As its next feature, DLOS introduces the concept of global process control, in the sense that a process can be considered an entity whose existence and scope of influence are global to the entire distributed computing network, just as the latter is regarded as a single, unified computing system. The idea is that a single process should be able to control the sequential execution of program code that is physically scattered among several heterogeneous machines in the network.
FIG. 14. Message queue element format and linkage.
What will be explained here is essentially a generalized method of remote subroutine calling, whereby a piece of code (program X) being executed by a process A in machine 1 can call (transfer control to) another piece of code (program Y) located in machine 2, but still under the supervision of process A. The actual transfer of control (the call) from program X to program Y is effected by a call command in program X. If programs X and Y are both in the same machine, then the call is handled in the normal manner. Otherwise, a CALL control message is sent to machine 2 (containing program Y), specifying the name of program Y and the address of process A that is (logically) to control it. What actually happens is that a new process A' is created in machine 2; and it is this new process, acting as the remote agent of process A, that really controls the execution of program Y in machine 2. Thus, process A, which controlled the execution of program X in machine 1, is deactivated by the remote call, for machine 1 no longer has any work to perform for process A until the called remote program finishes and returns control to program X. When that event occurs (signalled in DLOS by the execution of the return command in program Y and the transmission of a corresponding RETURN control message to machine 1), several actions will be performed: process A will be reactivated, the specially
created process A' will be destroyed, and execution of program X will be resumed. Thus, as explained above, the remote program calling mechanism is fairly easy to implement and is totally transparent to its users. A user process need never know whether a called program is local or remote, for DLOS handles all the location and linkage details. The transfer of parameters to such a remote program is considered next.

4.4 Generalized Data Transfer
Data transfer between a local calling program and a remote called program (or between two processes) is in general extremely difficult, since the two programs may reside in different machines that have completely different internal data representations (Levine, 1977). However, all machines are capable of outputting information in human-readable form as character strings in some standard code (EBCDIC, ASCII, etc.) and can accept input expressed in the same general format. Thus, one solution to the data transfer problem is to establish a network-wide standard character string representation for all data and to map output into this standard form on transmission and input from it on reception. Naturally, the mappings will be different for each machine. Actual data transfer might well use two separate pieces of information: (1) the data itself, expressed as an ASCII (or EBCDIC) character string, and (2) the format description of that data, also represented as a character string (similar to a FORTRAN or PL/I format statement), extended to include all necessary device control information.
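As a toy illustration of this two-part scheme, the Python sketch below ships a list of values as a character string plus a one-letter-per-field format descriptor. The descriptor letters (I, F, A) and the field separator are invented for the example; a real implementation would use full FORTRAN- or PL/I-style format strings with device control extensions.

```python
# Hypothetical descriptor letters: I = integer, F = floating point,
# A = character string (loosely echoing FORTRAN edit descriptors).
CASTS = {"I": int, "F": float, "A": str}

def encode(values, fmt):
    """Sender side: map machine-internal values to a standard string."""
    return "|".join(str(v) for v in values), fmt

def decode(text, fmt):
    """Receiver side: map the standard string back to internal types."""
    return [CASTS[f](field) for f, field in zip(fmt, text.split("|"))]

# Parameters for a remote program travel the same way: encoded on the
# calling machine, decoded by the DFTP of the called machine.
wire_text, wire_fmt = encode([7, 3.5, "ok"], "IFA")
```

Because both pieces are plain character strings, the two machines need no prior agreement about each other's word sizes or internal encodings.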
By having such a generalized format capability to describe the data and its treatment, it is possible to transmit any kind of data between two machines, without having any prior agreements as to its exact format and interpretation. By having the data itself expressed as a character string, it is possible to ignore internal data representation differences and to use a standard external representation that most systems already use for communicating with the outside world. With such a powerful and generalized data transfer mechanism at its disposal, it should be possible for DLOS to support parameter transfers between remote programs (or even file transfer between file systems). Unquestionably, this method is inelegant, inefficient, and somewhat restrictive. Yet it is entirely workable, can be implemented fairly easily, and is quite general in nature. Every computer system that uses FORTRAN or similar languages already has elaborate data conversion and
formatting routines for the input and output of data. In such cases, it should be as easy for programs to exchange data as it is to read or punch a card. The data transfer component of DLOS in each machine could even be written to perform all needed conversion automatically, so that individual programs need never know that data conversion was necessary.

4.5 Distributed Resource Management
In order to accomplish distributed resource management under DLOS, it is envisioned that all resource requests in each machine will be made through the local resource manager (see Fig. 15). If the resource request can be satisfied locally, the resource manager will simply pass control over to the appropriate local resource allocator; otherwise, the resource manager will issue a RESOURCE control message in broadcast mode to determine whether some other machine in the network has the requested resource and is willing to allocate it. No response may be obtained, in which case the resource is simply not available at the moment, or perhaps several replies will be obtained from different machines. In the latter case, the resource manager can choose the one it feels is "best" in some appropriate sense and will pass the request on to that machine's resource manager, so that the latter can control the allocation of its own resources. All usage of the allocated resource would also have to pass through the local resource manager. For local resources, the appropriate local routines would directly service all requests as usual. Otherwise, a control message describing the type of resource access requested is passed through the network to the remote resource manager and thence to the remote access routine. The resulting reply, if any, is passed back by the same method to the original requestor. This method of resource management is workable and attractive for a distributed processing system, since it still allows local management to work without change. As far as the local operating system is concerned, the resource manager is just another program to which it assigns resources; if the resource manager wishes to reallocate them to remote programs, that is entirely its business. Moreover, it should also be possible to suballocate and exchange resources among processes, while DLOS still maintains ultimate control over them.
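The local-first, broadcast-second policy just described can be sketched as follows. The dictionary layout and the lowest-load tie-break are assumptions made for illustration; the chapter deliberately leaves the "best" selection policy open.

```python
def request_resource(local_mgr, remote_mgrs, resource):
    """Satisfy a request locally if possible; otherwise broadcast a
    RESOURCE message and pick the "best" willing machine (taken here,
    arbitrarily, as the one reporting the lowest load)."""
    if resource in local_mgr["free"]:
        return local_mgr["name"]            # local allocator handles it
    replies = [(m["load"], m["name"]) for m in remote_mgrs
               if resource in m["free"]]    # machines willing to allocate
    if not replies:
        return None                         # resource not available now
    return min(replies)[1]                  # lowest-load responder wins
```

Note that the selecting machine only names a candidate; actual allocation remains with the chosen machine's own resource manager, matching the text's insistence that each host controls its own resources.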
Obviously, there are many difficult problems to solve in an actual implementation of this approach, such as resource accounting, deadlock avoidance, or just ordinary resource access and usage. However, since resources are allocated and ultimately controlled by local resource managers, each under the control of a local operating system, it seems likely that these problems can be overcome and that a powerful, flexible, and easy-to-use resource management system can be obtained.

FIG. 15. Distributed resource management under DLOS: local and remote requests pass through request and response handlers, with conversion between local and network forms.

To summarize, the design philosophy of DLOS is that DLCN, even though actually a collection of separate computer systems connected by a loop subnetwork, should be viewed by its users as a single, unified distributed processing system. Thus, all functions in DLOS have been implemented with this idea in mind. Interprocess communication is by process name, not by address; remote programs (procedures located in other machines) are callable by a global network process, also by name, using the generalized data transfer mechanism. Similar capabilities are provided for distributed resource management. Consequently, by using the facilities specifically designed for it in the hardware and communication subnetwork, DLOS can give its users the illusion of executing on a single, powerful distributed processing system with many available resources.

5. User Access and Network Services
The basic characteristic of a computer network is that the user views the network as a collection of several computing systems with many varied resources available to him, such as software subroutines and packages, specialized hardware, and data bases. In a heterogeneous distributed network like the ARPANET, the user frequently needs to learn and remember how to use a variety of such service resources. Experience with the ARPANET has shown that its use by nonexpert users is fraught with frustration. In the early development of the ARPANET, the user had to manage the resources explicitly by (1) locating a desired resource, which is generally done through personal contact and is time consuming, and (2) learning the host-dependent commands necessary to invoke it. In an effort to improve the ARPANET's user environment, several systems, such as RSEXEC (Cosell et al., 1975; Thomas, 1973), NAM (Rosenthal and Watkins, 1974), and REX (Benoit and Graf-Webster, 1974), have been developed to provide on-line information concerning network resources and automatic acquisition of selected resources. However, these systems are all "added-on" features, require special software (TENEX) or hardware (minicomputers) to implement, and are thus not suitable for local networking.

The distributed loop computer network (DLCN) intends to provide a common user interface to all the components attached to the loop communication subnetwork through the process structure and novel interprocess communication (IPC) mechanism of the distributed loop operating system (DLOS), as described in the previous section. DLOS will determine for the user where a needed resource is located, whether it is a processor, a file, or a software routine, and will also provide him with the needed communication path to the appropriate managing process. It is through this uniform IPC mechanism and process structuring that DLCN's operating transparency is maintained, thereby making DLCN into a distributed processing system.
Network services provided by DLCN are as follows:

(1) processor-independent execution of jobs,
(2) uniform and flexible process-to-process communication,
(3) network-wide resource allocation, including I/O devices and data bases,
(4) remote program calling and file access, and
(5) generalized process control structures.
These combined facilities provide the user with a common network interface through a common network command language, to be presented below.

5.1 A Network Command Language

A command language suitable for a distributed network environment must be very flexible and powerful, and must somehow accommodate the wide variety of computer system architectures and user interfaces found in a heterogeneous network (Enslow, 1975; Gray, 1976). Since each computer system in a network may already have its own command language, it is very important that a common network command language be developed to hide the host-dependent specifics from the user, thereby allowing him to access the network in a standard way. Moreover, the format and sequence of commands that the user must employ should be natural, easy to learn, and easy to remember. In the following, the man-machine interface provided by the network command language for use in DLCN will be described. The aim is to provide the user with uniform access to network resources: the user is given a single tool that allows him to describe the environment and to coordinate the execution of his application programs (Schicker and Duenki, 1976; Tomlinson, 1976).

When a user logs on a local system, he or she is connected to a system process called the logger; after proper identification, a user process containing his or her profile is created to act as the user's agent in this system. This is required since DLCN operates in the context of processes, so that the user is treated very much like any other process in the network. Once logged onto a local system, the user can issue a sequence of commands to
(1) link to another user process (remote or local), (2) return to the old user process, (3) log on to a remote system by creating a remote agent, (4) send data and control messages to other processes, (5) receive data and control messages from other processes,
(6) locate and select an application program (possibly from a library), and (7) create and initiate processes at other sites for distributed processing.
A set of such network commands that are deemed essential for DLCN users is listed in Table I. It will be most illuminating to consider the appearance of DLCN from the user's point of view and then to describe how his requests are processed by DLOS through an example. This example also illustrates some of the capabilities that the network commands can provide. Consider that an interactive user at site A wants to run a FORTRAN program. Since site A has no FORTRAN compiler, the user wishes to run it on a remote machine and to have the results printed at his or her own site. A sequence of commands that the user must employ at site A and a corresponding sequence of commands generated at a remote site on the user's behalf are shown in Fig. 16, where the underlined items are produced by the system. After logging onto site A, the user types a LOCATE command to find a FORTRAN compiler available in the network. In this example, both
TABLE I
LIST OF NETWORK COMMANDS

Command      Function
WHERE        Find a user process by name
CONNECT      Establish communication path
DISCONNECT   Destroy communication path
SEND         Send a message
RECEIVE      Receive a message
LOCATE       Locate a program or resource
INVOKE       Invoke a procedure
CALL         Locate and invoke a remote procedure
ATTACH       Attach a procedure for concurrent processing
START        Initiate a remote process for the user
CLOSE        Deactivate the user's remote process
SUSPEND      Suspend temporarily communication with a remote process
RESUME       Restore communication with the suspended remote process
CREATE       Create a file
DELETE       Delete a file
SFILE        Send a file
RFILE        Receive a file
HELP         Request a list of commands and their functions
FIG. 16. Sequence of commands at sites A and B. (Site A: LOGON, LOCATE FORTRAN, START B, SFILE job, EXECUTE job, RECEIVE, CLOSE, LOGOFF; site B, on the user's behalf: LOGON, RFILE, EXECUTE job, SEND, LOGOFF. Underlined items in the original are produced by the system.)
sites B and C have the requested resource and are willing to provide service. The user chooses site B by issuing a START B command, which will create and initiate a remote process at site B on the user’s behalf, and will establish a communication path between the user’s local and remote processes through the IPC mechanism provided by DLOS. Once the communication path is established, the user is automatically logged on to site B. Suppose at this point that the user has a source program called “job” already stored in site A. In response to the user’s SFILE job command at site A, which causes the user’s source program to be sent to site B, the user’s remote agent at site B will issue RFILE to receive his source program at site B. Upon correct reception the user types EXECUTE job, which in turn will prompt the user’s remote agent to do the same at site B. The user at site A then issues a RECEIVE command, which will block the user’s local process temporarily to wait for receiving the results from site B. Upon receiving the EXECUTE job command from site A, the system at site B complies and executes the FORTRAN program, then sends the results back to site A by issuing a SEND command. The user process at site A is now awakened to receive the results and to print them out at the user’s own site. Now that the job is done, the user deactivates the remote process at site B by issuing a CLOSE command and logs off from site A by typing a LOGOFF command. If the user had some local processing to do after issuing the EXECUTE Job command, the user could type a SUSPEND command (instead of a RECEIVE command as in the example above). This would temporarily suspend the user’s communication with his remote process at site B, but the user is then free to interact with the local system for some local
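The session just described can be mimicked with a toy sketch in which the site-A user process and its site-B remote agent exchange messages over queue-based channels. Everything here (the function names and the queue-based "IPC") is an illustrative assumption; DLOS's actual command processing is not shown.

```python
import threading
from queue import Queue

# Toy re-enactment of the Fig. 16 session.  The queue-based "IPC" and all
# names are illustrative; DLOS's real interface is not shown.
a_to_b, b_to_a = Queue(), Queue()
log = []

def site_a(source_program):
    log.append("A: LOGON; LOCATE FORTRAN")   # compilers found at B and C
    log.append("A: START B")                 # create remote agent, auto-logon
    a_to_b.put(("SFILE", source_program))    # ship the source program to B
    a_to_b.put(("EXECUTE", "job"))           # ask B to compile and run it
    results = b_to_a.get()                   # RECEIVE: block until results arrive
    log.append(f"A: printed {results!r}")
    a_to_b.put(("CLOSE", None))              # deactivate the remote process
    log.append("A: LOGOFF")

def site_b():
    cmd, program = a_to_b.get()              # RFILE: receive the source file
    cmd, job = a_to_b.get()                  # EXECUTE it on B's compiler
    b_to_a.put(f"output of {job}")           # SEND the results back to A
    a_to_b.get()                             # CLOSE: remote agent logs off

agent = threading.Thread(target=site_b)
agent.start()
site_a("PROGRAM JOB ... END")
agent.join()
```

Note how the blocking RECEIVE in `site_a` mirrors the text: the local user process sleeps until the remote agent's SEND arrives.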
processing of the user's job. When the remote process at site B returns the results of the user's program, the message cannot be received and printed out immediately, but must be queued onto the user process's input list and wait there until the user issues RESUME and RECEIVE commands. The communication path between the user's local and remote processes will then be restored, the message received, and the results printed. This shows that a user can perform concurrent and distributed processing of his application by proper use of the SUSPEND and RESUME commands.

5.2 A Distributed Programming System
A distributed programming system is defined here as a programming system that supports sharing and distributing program and data modules of a single application program among processors, on a work unit at the task or subroutine level (Thomas and Henderson, 1972; White, 1976). Different modules can run at different sites within the system in order to take advantage of any specialized resources that may be available there. This definition does not necessarily mean that these modules must be run in parallel with each other. The nature of some jobs may require that these jobs be run on different machines in a sequential manner. Needless to say, whenever the situation allows, parallelism should be exploited. The motivations for distributed processing of an application program include the following:

(1) To access special resources (hardware, software, data bases) that are available to the system. A program (job) may have to access several special resources residing at different sites. Such a case may provide a good reason to partition a job into several modules or procedures, so that each of the partitions can be distributed to access the special resources in the constituent computers.
(2) To satisfy real-time urgency. Partitioning can be done with the aim of increasing the speed of computation by allowing the partitions to be run in parallel. Moreover, partitioning creates smaller units of work that will receive service more rapidly, since most scheduling policies tend to favor short jobs.
(3) To reduce turnaround time and to improve performance of large jobs by allowing division of labor among the machines (load leveling).
(4) To extend the memory space by dividing a large program into portions and by distributing them for execution on multiple processors.
Depending on the reason for distributed processing, remote programs and subroutines may have to be called, remote files and data bases may have to be accessed, procedures may have to be sent to remote sites for
processing, and additional facilities may have to be provided by the network operating system to facilitate distributed processing of an application program.

The idea of allowing several autonomous computer systems to solve complex problems by the division of labor and exploiting the parallelism inherent in this environment is not new. In fact, several models of parallel and distributed processing have been described in the literature (White, 1977). These models, however, all follow the hierarchical control structure that enforces a parent-son relationship and that is a very limited model for distributed processing. Such a static, regulated environment makes life easy for the operating system, but it can unduly hamper and restrict user processes. Why cannot a process have control over its creator, as long as both are willing to swap roles? Why cannot two or more separately created processes agree to cooperate and to help each other? Why cannot a process permit a group of other processes (instead of just one) to have joint control over it? It is felt that all these forms of control structure are reasonable and should be possible, especially in a distributed processing environment.

Thus, a more general model for distributed processing of an application program has been proposed (Liu and Ma, 1977). The general model is built on independent processes that coexist within the system on an equal status and that cooperate and interact through exchange of messages to provide service to the users. A job is defined to consist of a set of processes that may be residing and executed on different hosts. These processes are connected via communication paths as illustrated in Fig. 17. Facilities are also provided for synchronization and communication among the processes. Moreover, the traditional hierarchical control structure can be easily simulated as a special case of the general model (to be discussed below).
Processes are the primitive objects of the model and have the following properties:

(1) A process can create another process in the local machine or a remote machine by the CREATE or SEND commands. The new process thus created is independent of its creator and enjoys its own existence and privileges as an independent entity in the system. It competes for resources with other processes in the system and is scheduled for execution according to the scheduling policies of the system.
(2) Once created, a process is allowed to run in parallel with its creator and there is no transfer of control from the creating process to the created process when the latter is created. (This is contrary to the hierarchical control structure, which does have such a transfer of control.)
FIG. 17. A model of distributed processing (processes in different machines connected by a communication path across the machine boundary).
(3) A process, after creation, will enjoy its own existence until its work is done, whereupon the process destroys itself by reaching the end of the procedure or by executing an EXIT statement. Any values that need to be returned to the creator will be sent via the communication path by the operating system.
(4) Each process is modeled as comprising a set of communication channels, a local data space, and an input data space. The local data space contains a set of local variables and files that are private to this process. The input data space contains a set of variables whose values are determined (or updated) by other processes. It represents a set of data values that are inaccessible at that point in time to the process. The local and input data spaces are dynamic in nature and can be changed throughout the life span of the process.
(5) Although residing in different autonomous systems, processes can communicate with each other by explicit exchange of messages through established communication paths that are set up dynamically by the IPC mechanism of DLOS. A communication path is automatically established between the creator and the created process at creation time. A process, however, can communicate with processes other than its creator or created process by issuing a SEND command. The IPC mechanism will be invoked to set up a communication path, as explained in Section 4.2.
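The properties above can be caricatured in a few lines of threaded code: CREATE spawns an independent peer with no transfer of control, a communication path exists from creation time, and results flow back as messages rather than as returned values. The class and method names are illustrative assumptions, not DLOS primitives.

```python
import threading, queue

# Minimal sketch of the peer-process model: independent processes, a
# communication path established at creation, interaction only by
# explicit message exchange.  Names are illustrative, not DLOS's.
class Process:
    def __init__(self, body):
        self.inbox = queue.Queue()      # input data space: values set by peers
        self.thread = threading.Thread(target=body, args=(self,))

    def create(self, body):
        """CREATE: spawn an independent peer; no transfer of control."""
        peer = Process(body)
        peer.thread.start()             # creator keeps running in parallel
        return peer                     # both ends now hold the "path"

    def send(self, peer, msg):
        peer.inbox.put((self, msg))     # SEND along the communication path

    def receive(self):
        return self.inbox.get()         # RECEIVE blocks until a message arrives

def worker(self):
    creator, msg = self.receive()       # values arrive via the input data space
    self.send(creator, msg.upper())     # results go back by message, not RETURN

def main(self):
    peer = self.create(worker)          # peer runs concurrently with its creator
    self.send(peer, "results")
    _, reply = self.receive()
    self.result = reply

root = Process(main)
root.thread.start()
root.thread.join()
print(root.result)                      # -> RESULTS
```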
Modeling a job in such a manner, as a collection of independent processes, has the following advantages:

(1) There is no central locus of control within a job. As processes are created dynamically during the life span of the job, they can be executed independently in a parallel fashion.
(2) Each process can correspond in size and in function to a procedure or a subroutine, and each process has its own private virtual memory space and the protection offered by its own operating system.
FIG. 18a. Simulation of the CALL/RETURN construct.

FIG. 18b. Simulation of coroutines X and Y.
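The message patterns of Figs. 18a and 18b can be rendered as a short sketch: a CALL/RETURN and a coroutine both reduce to the same SEND/RECEIVE exchange over a communication path. The queue-based path and the doubling "procedure" are illustrative assumptions.

```python
import threading
from queue import Queue

# Sketch of Fig. 18: CALL/RETURN and a coroutine are the same SEND/RECEIVE
# pattern over a communication path (modeled here as a pair of queues).
to_y, to_x = Queue(), Queue()

def process_y():
    while True:
        msg = to_y.get()          # RECEIVE X
        if msg is None:
            break                 # no more work: Y terminates
        to_x.put(msg * 2)         # SEND Y: pass a value (and the turn) back

threading.Thread(target=process_y).start()

# Fig. 18a -- CALL/RETURN: SEND the argument, then RECEIVE the result.
to_y.put(21)
call_result = to_x.get()          # plays the role of the returned value

# Fig. 18b -- coroutine: the same exchange repeated, with X and Y
# alternating between SEND and RECEIVE.
coroutine_results = []
for v in (1, 2, 3):
    to_y.put(v)
    coroutine_results.append(to_x.get())

to_y.put(None)                    # let Y reach the end of its procedure
```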
(3) The passing of control from one procedure to another in conventional systems has been discarded. The passing of data by procedure parameters and shared variables has been replaced by exchange of messages between processes. This eliminates the one-to-one call-and-return restriction imposed by the traditional hierarchical control structure. Multiple passages of data arguments and results between processes, as directed by the application program, are allowed.

An example of applying the general model to a simplified data processing system is given in Liu and Ma (1977). The application program is divided into a batch and a real-time section, running in different machines. Each section calls other procedures to perform the various data processing routines necessary in each procedure. The example illustrates several features of the proposed model for distributed processing of an application program. Application programmers, who are unfamiliar with writing application programs that can span multiple hosts, do not have to leave their familiar working environment. Moreover, the remote procedure call and the coroutine can be handled in a uniform manner (see Fig. 18). Thus, the proposed model provides a more flexible control structure among interacting processes and eliminates the need for control primitives for different forms of control constructs.

5.3 A Distributed Data Base System
The design of network services that manipulate data has to take into consideration several properties of data. For example, data can be referenced by content (queries), data tend to be dynamic (updates), data
can be semantically damaged (consistency, integrity, and security), data may be accessed in parallel (concurrency), data can be viewed differently by many users (data bases), and data usage tends to follow special patterns (geographical, organizational, etc.). It is advantageous to distribute the data in a data base according to the goals described below:

(1) Increasing reliability. A data base contains such a valuable resource that users may not be able to afford a system failure. If a host crashes, users should be able to find the data needed in some other host.
(2) Improving responsiveness and throughput. Transactions processed entirely on local nodes do not incur delays introduced by the communication subnetwork and interhost message exchanges. On the other hand, the inherent parallelism in the network can increase transaction throughput.
(3) Satisfying user needs. Many companies (or branches within a company) like to share data resources while they still want to exercise control over their portion of the data.
(4) Reducing communication costs. If the system is such that most of the processing is local, then communication costs are minimized. Moreover, communication costs due to nonlocal processing can be shared among the user community.
Distributed data base systems are relatively new and still evolving. Here an approach to designing a distributed data base system, called the distributed loop data base system (DLDBS), is presented for the distributed loop computer network (DLCN). The system architecture of DLDBS is intended to be very general, so that new concepts can be applied to many system environments (Pardo et al., 1977). Design goals of DLDBS are to avoid major surgery to existing data base systems and to explore only those problems dealing with distribution of data. Thus, the model dissociates as much as possible the problems of distribution of data from conventional data base problems. A convenient way of viewing distributed software is to consider two levels of abstraction: a powerful communication protocol and the distributed processing algorithms built on top of it. A communication protocol for distributed systems should not be restricted to handling exchange of messages between two parties (processes) only (Carr et al., 1970; Crocker et al., 1972). In fact, it turns out that this class of protocols (two-process protocols) can support only some of the algorithms that are part of a powerful distributed system. Thus, a new type of communication protocol (N-process protocols) has been proposed to handle the exchange of messages
among N parties, where N is equal to or larger than two (Pardo et al., 1978). A distributed processing algorithm (DPA) is an algorithm whose execution involves interaction between two or more remote processes in a distributed processing system or computer network. It turns out that network services are a very important class of DPAs, and that the problem of handling interdatabase interactions is reduced to the problem of designing appropriate DPAs (Pardo et al., 1977).

DLDBS is used by means of a query language. In general, a query language manipulates data as seen by the user. A major difference between a data base and a file system is that a query in the former acts upon views of data that are different from the actual view of the data stored, whereas a query in the latter acts upon the only view, which is the view of the data stored. It turns out that the major problems of distributed data are those of a data base system and a file system as well. Thus, the system architecture of DLDBS and several of its design principles also apply to a distributed file system for DLCN.

DLDBS is intended to support data bases distributed across DLCN in many sites (hosts). In the model, these sites are called loop data nodes (LDNs). LDNs have processes that are active agents through which data are demanded, transmitted, and delivered. Figure 19 shows a functional division of the different types of processes supported by LDNs. User processes can be either processes representing some remote user who is interacting through on-line queries or the actual process controlling the application program. The queries made by user processes are called transactions in DLDBS. Local data base processes are the "experts" on the local data base. They interpret queries, implement data control features (local access control, integrity, etc.), make local mappings between views, and handle local access strategies, optimization, and related functions. DLDBS special processes for distributed services are the "experts" on foreign matters, since they handle all interdatabase interactions. Several distributed service functions for DLDBS have been identified, as shown in Fig. 20.

FIG. 19. Processes in LDNs (user processes, DLDBS special processes, and local data base processes within a loop data node).

FIG. 20. DLDBS process interaction (driver, consistency enforcer, multihost transaction processor, error recovery, and local data inquirer, interacting with user processes and the local data base).

In a typical scenario, a user process requests service from DLDBS by means of a transaction. The driver accepts the user transaction, attaches information needed by other components (transaction identification, time, etc.), and invokes the consistency enforcer. This component implements a DPA to maintain consistency among the LDNs. Since a transaction may or may not be satisfied locally, the consistency enforcer calls for assistance from the local data inquirer. This component interacts with local data base processes to find out whether the transaction is referring to local or foreign parts (or both, since duplication is
allowed). Thus, the consistency enforcer at any LDN can know about the locality of a given transaction. Note that update transactions are the main concern of the consistency enforcer. Once the transaction is found to be safe for processing (no violation of consistency), the multihost transaction processor is invoked. The function performed at that point is to process efficiently a transaction whose answer set (the data satisfying that transaction) spans more than one LDN. Note that this component also implements a DPA (see Fig. 20). When the transaction is completely processed, the user process is properly informed by the driver. Should crashes occur, the error recovery components at each LDN start communicating and interacting to handle such events. This function is also an example of a DPA implemented in DLDBS. Finally, during the progress of the transaction activity, all the aforementioned components send performance information to the statistical collector for gathering transaction statistics.

The design of data management functions in a network environment brings together several types of problems (Peebles and Manning, 1978). In a heterogeneous computer network, practical problems like data translation have to be solved. In general, there are no elegant solutions to syntactic translation problems (Levine, 1977) and semantic translation problems (Shu et al., 1975). Other problems of data management arising in the network environment are those of allocating data (Cheng, 1976; Morgan and Levin, 1977), accessing remote data (Passafiume and Wecker, 1977), storing catalog information (Stonebraker and Neuhold, 1977), locating nonlocal files (Thomas, 1973), recovering from network failures (Mills, 1975), etc. However, more fundamental problems, which may in fact deal indirectly with some of the aforementioned problems, are those of consistency and multihost transaction processing.
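The transaction scenario above can be sketched as a pipeline of the named components. The component names follow the text (driver, consistency enforcer, local data inquirer, multihost transaction processor), but the logic, the key-based locality test, and the `LOCAL_KEYS` set are toy assumptions.

```python
import time

# Toy sketch of a DLDBS transaction flow.  Component names follow the
# text; the logic and the data held at "this" LDN are illustrative.
LOCAL_KEYS = {"accounts", "inventory"}          # assumed local data

def driver(transaction):
    transaction["id"] = id(transaction)         # attach bookkeeping info
    transaction["time"] = time.time()
    result = consistency_enforcer(transaction)
    return result                               # driver informs the user process

def consistency_enforcer(txn):
    locality = local_data_inquirer(txn)         # "local", "foreign", or "both"
    # update transactions are the enforcer's main concern; here any
    # transaction touching foreign data is (naively) flagged for coordination
    txn["needs_coordination"] = locality != "local"
    return multihost_transaction_processor(txn, locality)

def local_data_inquirer(txn):
    keys = set(txn["keys"])
    if keys <= LOCAL_KEYS:
        return "local"
    return "both" if keys & LOCAL_KEYS else "foreign"

def multihost_transaction_processor(txn, locality):
    # the answer set may span several LDNs; pretend each key resolves somewhere
    return {k: f"answer from {'local' if k in LOCAL_KEYS else 'remote'} LDN"
            for k in txn["keys"]}

reply = driver({"op": "query", "keys": ["accounts", "orders"]})
```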
These problems have been very actively attacked by many researchers (Alsberg and Day, 1976; Ellis, 1977; Eswaran et al., 1976; Mullery, 1975; Thomas, 1975).

6. Performance Studies
Meaningful analysis of computer networks requires the use of quantitative analytical tools. Queueing-theoretic models of centralized computer systems consisting of one or more CPUs, several I/O processors, terminals, and other peripherals have been studied intensively (Kleinrock, 1975, 1976). Baskett et al. (1975) gave a solution to a very general model that includes different classes of customers and four types of servers.
Also, queueing analysis of different types of loop communication subnetworks has been performed by many researchers, e.g., Konheim (1972, 1976), Konheim and Meister (1972a,b, 1973, 1974), and Spragins (1972a,b) on IBM loops (centralized control, fixed-length messages); Hayes (1974) on SPIDER (centralized control, fixed-length messages); Carsten et al. (1977), Kaye (1972), Robillard (1974), and Yuen et al. (1972) on Newhall loops (distributed control, single message of variable length); Anderson et al. (1972) and Hayes and Sherman (1971) on Pierce loops (distributed control, multiple messages of fixed length); and Liu et al. (1977a,b) on DLCN loops (distributed control, multiple messages of variable length). Modeling and analysis of general computer-communication networks has also been reported by Kobayashi and Konheim (1977). However, all of the aforementioned papers dealt with either centralized computer systems or loop communication subnetworks only; none of them considered a network of computers as a whole system. Two recent papers, by Babic et al. (1977) and Labetoulle et al. (1977), are the only work that has analyzed the whole loop computer network.

6.1 Analytical Comparison of Three Loop Subnetworks
In Section 2.4, it was claimed that the novel message transmission protocol developed for use in the distributed loop computer network (DLCN) is faster and more efficient than those used in Newhall-type loops or in Pierce-type loops. The purpose of this section is to demonstrate the superior performance (shorter message delay and higher channel utilization) claimed for this novel protocol by analytical comparison (Liu et al., 1977a). Due to lack of space, the treatment will be brief and to the point. Analytical formulas for average message delay and channel utilization have been derived by Kaye (1972) for the Newhall loop, by Hayes and Sherman (1971) for the Pierce loop, and by Liu et al. (1977b) for the DLCN loop. For simplicity, only symmetric loops (symmetric traffic pattern and identical nodal characteristics) are considered here. A glossary of terms used in the formulas is given in Table II.

6.1.1 Analytical Results of the Newhall Loop
Formulas given below for the Newhall loop were derived by Kaye (1972). Data sources have the following characteristics: after a data source generates one message, it will be inactive until that message is multiplexed onto the loop. Afterward the data source behaves as a Poisson process with parameter λ, but only until the next message is produced. Then the data source is again inactive. This process generates
TABLE II
GLOSSARY OF TERMS
(symbols restored to match the formulas of Section 6.1; the symbol column is partly illegible in the scan)

C     Capacity of the communication channel (bps)
N     Number of nodes in the loop
λ     Arrival rate of messages from data sources (messages/sec)
1/μ   Average message length (bits)
Z     Second moment of message length (bits²)
1/μ′  Average duration of the active period of the data source (sec)
1/λ′  Average duration of the idle period of the data source (sec)
b     Bit rate during the active period (bps)
u     Utilization of the data source
B_p   Number of information bits per packet (bits)
T     Average message delay (including all waiting times and the time for multiplexing the message onto the loop) (sec)
U     Channel utilization
A     Number of bits in the address field (bits)
messages with effective average interarrival rate λ_eff = λ(1 − P_L) (P_L is given below).

(1) Channel utilization:

    U = λ(1 − P_L)/2μC                                    (6.1)

where P_L is the portion of lost messages, given by

    P_L = λT/(1 + λT).

(2) Average message delay: the expression for T given as Eq. (6.2), together with its auxiliary definitions and the equation determining the constant K, is not legible in the scanned original.
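As a numerical illustration, the Newhall-loop quantities can be evaluated directly. The parameter values below are arbitrary, and the utilization expression follows this reading of the partly legible scan, so it should be checked against Kaye (1972).

```python
# Numerical illustration of the Newhall-loop quantities.  Parameter values
# are arbitrary; the utilization formula follows our reading of the scan
# and should be checked against Kaye (1972).
lam = 5.0        # message arrival rate per source, lambda (messages/sec)
T = 0.05         # average message delay (sec)
mu_inv = 1000    # average message length 1/mu (bits)
C = 50_000       # channel capacity (bps)

P_L = lam * T / (1 + lam * T)        # portion of lost messages
lam_eff = lam * (1 - P_L)            # effective interarrival rate
U = lam_eff * mu_inv / (2 * C)       # channel utilization per Eq. (6.1)

print(P_L, lam_eff, U)
```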
6.1.2 Analytical Results of the Pierce Loop
Formulas given below for the Pierce loop were derived by Hayes and Sherman (1971). In their model, data sources are characterized by having active and idle periods, and both are assumed to be exponentially distributed. During an active period, a data source produces bits at a constant rate and generates one message.

(1) Channel utilization:
    U = rNB_p/2C                                          (6.3)

where

    r = μ′Qu,
    Q = 1/[1 − exp(−B_pμ′/b)],
    u = λ′/(λ′ + μ′).
(2) Average message delay: Eq. (6.4) for T is not legible in the scanned original. Its recoverable auxiliary quantities are

    θ* = R*B_p/(C − R*B_p),
    R* = r(N/2 − 1),
    U* = R*B_p/C,

with the remaining definitions (γ, θ, and M*) also illegible.

6.1.3 Analytical Results of the DLCN Loop
Formulas given below for the DLCN loop were derived by Liu et al. (1977b). Data sources are assumed to be characterized by instantaneous generation of messages according to the Poisson process, and distribution of message lengths is general. (1) Channel utilization:
    U = λN/2μC.                                           (6.5)

(2) Average message delay: the overall expression, Eq. (6.6), is not legible in the scanned original; its recoverable component terms are

    T′ = NλZμ/[4C(Cμ − λ)],
    T″ = NλZμ²/[2(Cμ − λ)(2Cμ − Nλ)].

6.1.4 Performance Comparison of Three Loops
Because of the different characteristics of data sources assumed in the analysis of each loop, the DLCN loop is compared separately with the Newhall loop and the Pierce loop. Figures 21 and 22 show the average message delay of DLCN as compared with that of the Newhall loop and the Pierce loop, respectively, and clearly demonstrate that the DLCN loop has a shorter message delay. Better channel utilization can be easily verified by comparing Eqs. (6.1), (6.3), and (6.5).

6.2 Simulation Results of Three Loop Subnetworks
Simulation models were written in GPSS and run on an IBM 370/168 for all three types of loop subnetworks. The primary quantity of interest in this study was again the average message delay, although many other quantities, such as buffer size, average message length, channel utilization, etc., were measured during the simulation (Reames and Liu, 1976; Reames, 1976). The general characteristics of all three loop subnetworks modeled were the same. Each consists of six nodes, with each message source being an identical independent Poisson process. Messages produced at each node were uniformly addressed among the other five nodes, so that message traffic was entirely symmetric and random. Message data lengths were exponentially distributed with a mean of 50 characters; 9 additional characters of header information were added to each message or packet produced. All timing was in arbitrary character-time units, so that no particular line rate was assumed. Propagation delay on the communication channel itself was ignored, while each interface contributed two units of delay: one unit in the receiver for address checking and one unit in the transmitter.

Figure 23 shows a graph of average message delay (both including and excluding acknowledgment time for DLCN) versus average source arrival rate for all three loop subnetworks. Again, DLCN is better than either of the other two. It is interesting to note that even if the time required for an acknowledgment message to return to the transmitter is included in DLCN's total time, it still performs better than the Pierce loop at all traffic loads and better than the Newhall loop except at very low utilization levels. Thus, automatic message acknowledgment is essentially being performed for free in DLCN.

FIG. 21. Average message delay (DLCN vs. Newhall).

6.3 Performance Study of the Whole DLCN System
FIG. 22. Average message delay (DLCN vs. Pierce).

In order to implement a prototype of the distributed loop computer network (DLCN), design parameters such as buffer size, channel utilization, message delay, user response time, system throughput, etc., must be obtained through analysis and simulation. The performance study of the whole DLCN system (consisting of the DLCN loop and attached host computers) has assumed a very important role in prototype design. The ultimate objective is to develop closed-form formulas for all system design parameters of interest. The study has followed the standard steps: (1) development and analysis of queueing-theoretic models, (2) development and use of simulation models, and (3) comparison of the results obtained from (1) and (2).

FIG. 23. Mean total transmission time for all three networks (curves: DLCN without ACK time, DLCN with ACK time, Pierce loop, Newhall loop; mean message arrival rate on the horizontal axis).

A simple and general approach to analyzing a loop computer network using arbitrary transmission protocols has recently been developed by Babic et al. (1977). In this approach the entire loop communication subnetwork is considered as one single server, the service rate of which is obtained from analysis of communication subnetworks, as discussed
FIG. 24. A simplified model of the loop network (the ith host interacting with the loop server through message streams A(i), B(i), and C(i)).
in Section 6.1. Another significant feature of this approach, as shown in Fig. 24, is that at each step (iteration) it considers only one host at a time, which interacts with other hosts through message streams Ai and Bi, and with the loop communication subnetwork, modeled as a single server, through message stream Ci. Stream Bi contains responses returned to the ith host from all other hosts on remote requests made by the ith host. Stream Ai contains remote requests made by all other hosts to be satisfied at the ith host. Stream Ci contains responses to requests from stream Ai and remote requests made from the ith host to all other hosts. Once the system in Fig. 24 has been solved for a particular host (the first iteration), another host can be taken into consideration (the second iteration), then the third host, etc., until all hosts in the system are exhausted. It is worth noting that, using the approach above, the problem of modeling and solving a whole computer network is reduced to the queueing analysis of a single centralized computer system with the addition of one (or more) FCFS server(s) and two (or more) Poisson input streams. Closed-form formulas have been derived by Babic et al. (1977) for calculating several design parameters of interest, namely, average response time to users, mean queue lengths of the servers, and utilization of processors, as a function of locality of references, system load, and service rates. Simulation was also carried out on an IBM 370/168, using GPSS, to verify and evaluate the analytical results and to determine their domain of validity. It turns out that analytical and simulation results agree over a wide range of interest. Due to lack of space, the reader is referred to the original paper covering this subject (Babic et al., 1977).
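The single-server view can be caricatured with textbook M/M/1 formulas: treat the loop as one FCFS server and compute each host's response time in turn. The rates, the traffic split, and the two-way loop crossing below are illustrative assumptions, not the actual model of Babic et al. (1977).

```python
# Toy version of the single-server view of the loop: the subnetwork is one
# FCFS server and each host is solved in turn.  M/M/1 formulas and all
# parameter values are illustrative, not the model of Babic et al. (1977).
def mm1_response(lam, mu):
    assert lam < mu, "server must be stable"
    return 1.0 / (mu - lam)          # mean response time of an M/M/1 server

hosts = [{"lam": 4.0, "mu": 10.0, "remote_frac": 0.3} for _ in range(6)]
loop_mu = 40.0                        # service rate of the loop "server"

# offered load on the loop server: each host's remote-request stream
loop_lam = sum(h["lam"] * h["remote_frac"] for h in hosts)

for h in hosts:                       # one host per iteration
    r_local = mm1_response(h["lam"], h["mu"])
    r_loop = mm1_response(loop_lam, loop_mu)
    # a remote request crosses the loop twice (request out, response back)
    h["response"] = ((1 - h["remote_frac"]) * r_local
                     + h["remote_frac"] * (r_local + 2 * r_loop))

print([round(h["response"], 3) for h in hosts])
```

With a symmetric configuration like this one, every iteration yields the same response time, which is what the symmetric-loop assumption of Section 6.1 would lead one to expect.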
DISTRIBUTED LOOP COMPUTER NETWORKS
215
7. Conclusion
The preceding sections have considered various aspects of the system design of the distributed loop computer network (DLCN), a distributed processing system envisioned as a means of investigating fundamental problems in distributed processing and local networking. Research concerning DLCN is primarily directed toward geographically local communities of semi-autonomous midi-, mini-, and microcomputer users, who occasionally have need of computing services or resources that are present elsewhere in the network, yet that are not available locally. Such a computing environment is typical of that frequently found today in many industrial, commercial, and university settings. Bringing the cost advantages and performance improvements of distributed processing to such computing groups would be a significant achievement and is, therefore, one of the major design goals for DLCN. The ARPANET has demonstrated that a national network connecting over 80 large-scale computer systems around the country is economically practical; the research on DLCN hopes to prove that networking can also be beneficial to midi-, mini-, and microcomputer users in a localized community. A university campus, which already contains dozens of small-scale computers scattered among many academic departments, seems a logical place to test the feasibility of this goal. Thus, implementation of an experimental prototype of DLCN at The Ohio State University is currently underway. The initial network consists of six nodes, interconnecting an IBM 370/168, DECsystem-10, PDP-8/S, PDP-9, PDP-11/45, MICRO-1600/21, and some special I/O devices. In support of this effort, additional research on distributed processing and computer networking is now being conducted in the areas of N-process communication protocols, distributed processing algorithms, distributed programming systems, distributed file and data base systems, etc. As new results are obtained from these efforts, additional reports can be expected.
It is hoped that with its careful integration of hardware, software, and communications technologies, DLCN can meet its expectations and become a forerunner of future distributed processing systems of this type.

ACKNOWLEDGMENTS

The author wishes to express his appreciation to the many people who have been involved with the DLCN project since its inception in 1974. Special thanks are due to Dr. Cecil C. Reames, currently with the Burroughs Corporation, Mission Viejo, California, who contributed a great deal in the initial stage of the project. Thanks are also due to Agnes Ma, Gojko Babic, Roberto Pardo, Young Oh, D. P. Tsay, and Jacob Wolf, who have contributed
directly or indirectly. Many parts of this paper are based on research results that were published in 11 technical papers during the past three years. Research reported herein was supported in part by grants from NSF and AFOSR. Acknowledgments are also due to the Graduate School for providing a research grant to stimulate the research, and to the Instruction and Research Computer Center for providing computing funds to carry out the simulation. Finally, the author would like to thank Professor Marshall C. Yovits for his constant encouragement, Professor Daniel J. Moore for his helpful suggestions, and Professor Jerome Rothstein for both.
REFERENCES

Abramson, N. (1970). The ALOHA system-another alternative for computer communications. Proc. AFIPS Fall Jt. Comput. Conf. pp. 281-285.
Abramson, N., and Kuo, F. F., eds. (1973). "Computer-Communication Networks." Prentice-Hall, Englewood Cliffs, New Jersey.
Akkoyunlu, E., Bernstein, A., and Schantz, R. (1974). Interprocess communication facilities for network operating systems. Computer 6(6), 46-55.
Alsberg, P. A., and Day, J. D. (1976). A principle for resilient sharing of distributed resources. Proc. Int. Conf. Software Eng., 2nd pp. 562-570.
"Am2900 Bipolar Microprocessor Family Data Book" (1976). Advanced Micro Devices, Sunnyvale, California.
Anderson, R. R., Hayes, J. F., and Sherman, D. N. (1972). Simulation performance of a ring-switched data network. IEEE Trans. Commun. COM-20, 576-591.
"ARPANET Protocol Handbook" (1976). NIC-7104. Network Information Center, Stanford Research Institute, Stanford, California.
Ashenhurst, R. A., and Vonderrohe, R. H. (1975). A hierarchical network. Datamation 21(2), 40-44.
Babic, G. A., Liu, M. T., and Pardo, R. (1977). A performance study of the distributed loop computer network (DLCN). Proc. Comput. Networking Symp. pp. 66-75.
Baskett, F., Chandy, K. M., Muntz, R. R., and Palacios, F. G. (1975). Open, closed, and mixed networks of queues with different classes of customers. J. Assoc. Comput. Mach. 22, 248-260.
Benoit, J. W., and Graf-Webster, E. (1974). Evolution of network user service-the network resource manager. Proc. Symp. Comput. Networks pp. 21-24.
Brinch Hansen, P. (1970). The nucleus of a multiprogramming system. Commun. ACM 13, 238-241.
Carr, C. S., Crocker, S. D., and Cerf, V. G. (1970). Host-host communication protocol in the ARPA network. Proc. AFIPS Spring Jt. Comput. Conf. pp. 589-597.
Carsten, R. T., Newhall, E. E., and Posner, M. J. M. (1977). A simplified analysis of scan time in an asymmetric Newhall loop with exhaustive service. IEEE Trans. Commun. COM-25, 951-957.
Cheng, T. T. (1976). Design consideration for distributed data bases in computer networks. Ph.D. Thesis, Department of Computer and Information Science, Ohio State University, Columbus.
Coker, C. H. (1972). An experimental interconnection of computers through a loop transmission system. Bell Syst. Tech. J. 51, 1167-1175.
Combs, R. (1973). TYMNET: A distributed network. Datamation 19(7), 40-43.
Cosell, B. P., Johnson, P. R., Malman, J. H., Schantz, R. E., Sussman, J., Thomas, R. H., and Walden, D. C. (1975). An operating system for computer resource sharing. ACM Oper. Syst. Rev. 9(5), 75-81.
Crocker, S. D., Heafner, J. F., Metcalfe, R. M., and Postel, J. B. (1972). Function-oriented protocols for the ARPA computer network. Proc. AFIPS Spring Jt. Comput. Conf. pp. 271-280.
Davies, D. W., and Barber, D. L. A. (1973). "Communication Networks for Computers." Wiley, New York.
Distributed processing workshop transcript, Part I. (1976). ACM Comput. Arch. News 5(5), 10-28.
Distributed processing workshop transcript, Part II. (1977). ACM Comput. Arch. News 5(6), 8-14.
EIA Standard RS-422 (1975). "Electrical Characteristics of Balanced Voltage Digital Interface Circuits." Engineering Department, Electronic Industries Association, Washington, D.C.
Ellis, C. A. (1977). A robust algorithm for updating duplicated databases. Proc. Berkeley Workshop Distributed Data Manage. Comput. Networks, 2nd pp. 146-158.
Elovitz, H. S., and Heitmeyer, C. L. (1974). What is a computer network? Proc. Natl. Telecommun. Conf. pp. 1007-1014.
Enslow, P. H. (1975). Operating system command languages, a brief history of their study. In "Command Languages" (C. Unger, ed.), pp. 5-24. North-Holland Publ., Amsterdam.
Enslow, P. H. (1978). What is a "distributed" data processing system? Computer 11(1), 13-21.
Eswaran, K. P., Gray, J. N., Lorie, R. A., and Traiger, I. L. (1976). The notions of consistency and predicate locks in a database system. Commun. ACM 19, 624-633.
Farber, D. J., Feldman, J., Heinrich, F. R., Hopwood, M. D., Larson, K. C., Loomis, D. C., and Rowe, L. A. (1973). The distributed computing system. Proc. COMPCON 73 pp. 31-34.
Farmer, W. D., and Newhall, E. E. (1969). An experimental distributed switching system to handle bursty computer traffic. Proc. ACM Symp. Probl. Optim. Data Commun. Syst. pp. 1-33.
Forsdick, H. C., Schantz, R. E., and Thomas, R. H. (1978). Operating systems for computer networks. Computer 11(1), 48-57.
Fraser, A. G. (1974). "Loops for Data Communication," Tech. Rep. No. 24. Bell Laboratories, Murray Hill, New Jersey.
Fraser, A. G. (1975). A virtual channel network. Datamation 21(2), 51-56.
Fuchs, E., and Jackson, P. E. (1970). Estimates of distributions of random variables for certain computer communications traffic models. Commun. ACM 13, 752-757.
Gray, T. E. (1976). Job control in a network computing environment. Proc. COMPCON Spring 76 pp. 146-149.
Greenberger, M., Aronofsky, J., McKenney, J. L., and Massy, W. F., eds. (1974). "Networks for Research and Education: Sharing of Computer and Information Resources Nationwide." MIT Press, Cambridge, Massachusetts.
Hafner, E. R. (1974). Digital communication loops-a survey. Proc. Int. Zurich Semin. pp. D1.1-D1.7.
Hafner, E. R., and Nenadal, Z. (1976). Enhancing the availability of a loop system by meshing. Proc. Int. Zurich Semin. pp. D4.1-D4.5.
Hafner, E. R., Nenadal, Z., and Tschanz, M. (1974). A digital loop communication system. IEEE Trans. Commun. COM-22, 877-881.
Hayes, J. F. (1974). Performance models of an experimental computer communication network. Bell Syst. Tech. J. 53, 225-259.
Hayes, J. F., and Sherman, D. N. (1971). Traffic analysis of a ring switched data transmission system. Bell Syst. Tech. J. 50, 2947-2978.
Heart, F. E., Kahn, R. E., Ornstein, S. M., Crowther, W. R., and Walden, D. C. (1970). The interface message processor for the ARPA computer network. Proc. AFIPS Spring Jt. Comput. Conf. pp. 551-567.
Jackson, J. R. (1963). Jobshop-like queueing systems. Manage. Sci. 10, 131-142.
Jackson, P. E., and Stubbs, C. D. (1969). A study of multi-access computer communications. Proc. AFIPS Fall Jt. Comput. Conf. pp. 491-504.
Jensen, E. D. (1975). The influence of microprocessors on computer architecture: distributed processing. Proc. ACM Annu. Conf. pp. 125-128.
Kahn, R. E. (1972). Resource-sharing computer communications networks. Proc. IEEE 60, 1397-1407.
Kaye, A. R. (1972). Analysis of a distributed control loop for data transmission. Proc. Symp. Comput.-Commun. Networks Teletraffic pp. 47-58.
Kimbleton, S. R., and Mandell, R. L. (1976). A perspective on network operating systems. AFIPS Natl. Comput. Conf. Expo., Conf. Proc. pp. 551-559.
Kimbleton, S. R., and Schneider, G. M. (1975). Computer communications networks: Approaches, objectives, and performance considerations. ACM Comput. Surv. 7, 129-173.
Kleinrock, L. (1974). "Queueing Systems," Vol. 1. Wiley, New York.
Kleinrock, L. (1976). "Queueing Systems," Vol. 2. Wiley, New York.
Kobayashi, H., and Konheim, A. G. (1977). Queueing models for computer communications system analysis. IEEE Trans. Commun. COM-25, 2-29.
Konheim, A. G. (1972). Service epochs in a loop system. Proc. Symp. Comput.-Commun. Networks Teletraffic pp. 125-143.
Konheim, A. G. (1976). Chaining in a loop system. IEEE Trans. Commun. COM-24, 203-209.
Konheim, A. G., and Meister, B. (1972a). Service in a loop system. J. Assoc. Comput. Mach. 19, 92-108.
Konheim, A. G., and Meister, B. (1972b). Two-way traffic in loop service systems. Networks 1, 291-301.
Konheim, A. G., and Meister, B. (1973). Distribution of queue lengths and waiting times in a loop with two-way traffic. J. Comput. Syst. Sci. 7, 506-521.
Konheim, A. G., and Meister, B. (1974). Waiting lines and times in a system with polling. J. Assoc. Comput. Mach. 21, 470-490.
Kropfl, W. J. (1972). An experimental data block switching system. Bell Syst. Tech. J. 51, 1147-1165.
Labetoulle, J., Manning, E. G., and Peebles, R. (1977). A homogeneous computer network: Analysis and simulation. Comput. Networks 1, 225-240.
Lennon, W. J. (1974). A mini-computer network for support of real time research. Proc. ACM Annu. Conf. pp. 595-604.
Levine, P. H. (1977). "Facilitating Interprocess Communication in a Heterogeneous Network Environment," TR-184. MIT Laboratory for Computer Science, Cambridge, Massachusetts.
Liebowitz, B. H., and Carson, J. H., eds. (1977). "Distributed Processing," IEEE Publ. No. EH-1027-1. IEEE Press, New York.
Lipsky, L., and Church, J. D. (1977). Applications of a queueing network model for a computer system. ACM Comput. Surv. 9, 205-221.
Liu, M. T., and Ma, A. L. (1977). Control and communication in distributed processing. Proc. Int. Comput. Symp. Vol. 1, pp. 123-138.
Liu, M. T., and Reames, C. C. (1975). The design of the distributed loop computer network. Proc. Int. Comput. Symp. Vol. 1, pp. 273-282.
Liu, M. T., and Reames, C. C. (1977). Message communication protocol and operating system design for the distributed loop computer network (DLCN). Proc. Annu. Symp. Comput. Arch., 4th pp. 193-200.
Liu, M. T., Pardo, R., and Babic, G. (1977a). A performance study of distributed control loop networks. Proc. Int. Conf. Parallel Process. pp. 137-138.
Liu, M. T., Babic, G., and Pardo, R. (1977b). Traffic analysis of the distributed loop computer network (DLCN). Proc. Natl. Telecommun. Conf. pp. 31.5.1-31.5.7.
McQuillan, J. M., and Walden, D. C. (1977). The ARPA network design decisions. Comput. Networks 1, 243-289.
Malhotra, R. (1975). Interaction monitor in a distributed system. AFIPS Natl. Comput. Conf. Expo., Conf. Proc. pp. 705-714.
Manning, E. G., and Peebles, R. W. (1977). A homogeneous network for data sharing. Comput. Networks 1, 211-224.
Metcalfe, R. M. (1972). Strategies for operating systems in computer networks. Proc. ACM Annu. Conf. pp. 278-281.
Metcalfe, R. M., and Boggs, D. R. (1976). Ethernet: Distributed packet switching for local computer networks. Commun. ACM 19, 395-404.
Mills, D. L. (1975). "Dynamic File Access in a Distributed Computer Network," Comput. Sci. TR-415. University of Maryland, College Park.
Mills, D. L. (1976). An overview of the distributed computer network. AFIPS Natl. Comput. Conf. Expo., Conf. Proc. pp. 523-531.
Morgan, H. L., and Levin, K. D. (1977). Optimal program and data location in computer networks. Commun. ACM 20, 315-322.
Mullery, A. P. (1975). "The Distributed Control of Multiple Copies of Data," IBM Tech. Rep. RC-5782. Yorktown Heights, New York.
Newhall, E. E., and Venetsanopoulos, A. N. (1971). Computer communications-representative systems. Proc. IFIP Congr. pp. 545-552.
Oh, Y. (1977). Interface design for distributed control loop networks. M.S. Thesis, Department of Computer and Information Science, Ohio State University, Columbus.
Oh, Y., and Liu, M. T. (1977). Interface design for distributed control loop networks. Proc. Natl. Telecommun. Conf. pp. 34.4.1-34.4.6.
Pardo, R., Liu, M. T., and Babic, G. A. (1977). Distributed services in computer networks: Designing the distributed loop data base system (DLDBS). Proc. Comput. Networking Symp. pp. 60-65.
Pardo, R., Liu, M. T., and Babic, G. A. (1978). An N-process communication protocol for distributed processing. Proc. Symp. Comput. Network Protocols pp. D7.10.
Passafiume, J. J., and Wecker, S. (1977). Distributed file access in DECNET. Proc. Berkeley Workshop Distributed Data Manage. Comput. Networks, 2nd pp. 114-129.
Peebles, R. W., and Manning, E. G. (1978). System architecture for distributed data management. Computer 11(1), 40-46.
Pierce, J. R. (1972a). How far can data loops go? IEEE Trans. Commun. COM-20, 527-530.
Pierce, J. R. (1972b). Network for block switching of data. Bell Syst. Tech. J. 51, 1133-1145.
Pierce, R. A., and Moore, D. H. (1977). Network operating system functions and microprocessor front-ends. Proc. COMPCON Spring 77 pp. 325-328.
Pouzin, L. (1973). Presentation and major design aspects of the CYCLADES computer network. Proc. DATACOM, 3rd pp. 80-87.
Reames, C. C. (1976). System design of the distributed loop computer network. Ph.D. Thesis, Department of Computer and Information Science, Ohio State University, Columbus.
Reames, C. C., and Liu, M. T. (1975). A loop network for simultaneous transmission of variable-length messages. Proc. Annu. Symp. Comput. Arch., 2nd pp. 7-12.
Reames, C. C., and Liu, M. T. (1976). Design and simulation of the distributed loop computer network (DLCN). Proc. Annu. Symp. Comput. Arch., 3rd pp. 124-129.
Retz, D. L. (1975). Operating system design consideration for the packet-switching environment. AFIPS Natl. Comput. Conf. Expo., Conf. Proc. pp. 155-160.
Retz, D. L., and Schafer, B. W. (1976). Structure of the ELF operating system. AFIPS Natl. Comput. Conf. Expo., Conf. Proc. pp. 1007-1016.
Roberts, L. G. (1973). Network rationale, a 5-year reevaluation. Proc. COMPCON 73 pp. 3-5.
Roberts, L. G. (1977). Packet network design-the third generation. Proc. IFIP Congr. pp. 541-546.
Roberts, L. G., and Wessler, B. D. (1970). Computer network development to achieve resource sharing. Proc. AFIPS Spring Jt. Comput. Conf. pp. 543-549.
Robillard, P. N. (1974). An analysis of a loop switching system with multirank buffers based on the Markov process. IEEE Trans. Commun. COM-22, 1772-1778.
Robinson, R. A. (1977). National Software Works: Overview and status. Proc. COMPCON Fall 77 pp. 210-273.
Rosenthal, R., and Watkins, S. W. (1974). Automated access to network resources, a network access machine. Proc. Symp. Comput. Networks pp. 47-50.
Rothnie, J. B., and Goodman, N. (1977). An overview of the preliminary design of SDD-1: A system for distributed databases. Proc. Berkeley Workshop Distributed Data Manage. Comput. Networks, 2nd pp. 39-57.
Schicker, P., and Duenki, A. (1976). Network job control and its supporting services. Proc. Int. Conf. Comput. Commun., 3rd pp. 303-307.
Schwartz, M. (1977). "Computer Communication Network Design and Analysis." Prentice-Hall, Englewood Cliffs, New Jersey.
Shu, N. C., Housel, B. C., and Lum, V. Y. (1975). CONVERT: A high level translation definition language for data conversion. Commun. ACM 18, 557-567.
Spragins, J. D. (1972a). Loops used for data collection. Proc. Symp. Comput.-Commun. Networks Teletraffic pp. 59-76.
Spragins, J. D. (1972b). Loop transmission systems-mean value analysis. IEEE Trans. Commun. COM-20, 592-602.
Stankovic, J., and van Dam, A. (1977). "The Distributed Processing Workshop," Tech. Rep. CS-32. Brown University, Providence, Rhode Island.
Steward, E. H. (1970). A loop transmission system. Proc. Int. Commun. Conf. pp. 36.1-36.9.
Stonebraker, M., and Neuhold, E. (1977). A distributed data base version of INGRES. Proc. Berkeley Workshop Distributed Data Manage. Comput. Networks, 2nd pp. 19-36.
Thomas, R. H. (1973). A resource sharing executive for the ARPANET. AFIPS Natl. Comput. Conf. Expo., Conf. Proc. pp. 155-163.
Thomas, R. H. (1975). "A Solution to the Update Problem for Multiple Copy Databases which uses Distributed Control," BBN Rep. No. 3340. Bolt Beranek and Newman, Cambridge, Massachusetts.
Thomas, R. H., and Henderson, D. A. (1972). McROSS-a multi-computer programming system. Proc. AFIPS Spring Jt. Comput. Conf. pp. 281-293.
Tomlinson, R. T. (1976). A high level computer control language. Proc. ACM Annu. Conf. pp. 381-386.
Walden, D. C. (1972). A system for interprocess communication in a resource sharing computer network. Commun. ACM 15, 221-230.
Wecker, S. (1973). A design for a multiple processor operating environment. Proc. COMPCON 73 pp. 143-146.
Weller, D. R. (1971). A loop communication system for I/O to a small multiuser computer. Proc. IEEE Comput. Soc. Conf. pp. 77-80.
West, L. P. (1972). Loop transmission control structure. IEEE Trans. Commun. COM-20, 531-539.
White, J. E. (1976). A high-level framework for network-based resource sharing. AFIPS Natl. Comput. Conf. Expo., Conf. Proc. pp. 561-570.
White, J. E. (1977). Elements of a distributed programming system. Comput. Languages 2(4), 117-134.
Wood, D. C. (1975). A survey of the capabilities of 8 packet switching networks. Proc. Symp. Comput. Networks pp. 1-7.
Yuen, M. L. T., Black, B. A., Newhall, E. E., and Venetsanopoulos, A. N. (1972). Traffic flow in a distributed loop switching system. Proc. Symp. Comput.-Commun. Networks Teletraffic pp. 29-46.
Zafiropulo, P. (1974). Performance evaluation of reliability improvement techniques for single loop communication systems. IEEE Trans. Commun. COM-22, 742-751.
ADVANCES IN COMPUTERS, VOL. 17
Magnetic Bubble Memory and Logic

TIEN CHI CHEN
IBM San Jose Research Laboratory, San Jose, California

AND

HSU CHANG
IBM Watson Research Center, Yorktown Heights, New York
Introduction 224
1. The Magnetic Bubble Phenomenon 225
   1.1 The Formation of Bubbles 225
   1.2 Field-Induced Propagation 226
   1.3 Detailed Control by Electrical Currents 228
   1.4 Detection and Replication 231
   1.5 Bubble Generation and Annihilation 232
2. Bubbles as Memory 232
   2.1 The Major-Minor Loop Scheme 233
   2.2 Fabrication of Bubble Memory Chips 235
   2.3 Factors Affecting Density and Cost 236
   2.4 Current Development 237
   2.5 Unconventional Bubble Memory Devices 240
3. Magnetic Bubble Logic 243
   3.1 Overcoming Physical Constraints 243
   3.2 AND-OR Logic 247
   3.3 Multitrack Logic 248
   3.4 Dynamic and Static Deflectors 249
   3.5 Pipelined Array Logic 251
   3.6 Bubble Logic Systems 251
   3.7 Boolean Logic and Bubbles 252
4. Steering of Bubbles for Text Editing 252
   4.1 Character Manipulation 253
   4.2 Line and Page Management 256
5. Storage Management 257
   5.1 Storage Hierarchies 258
   5.2 Dynamic Reordering 259
   5.3 A Steering Switch in Two Forms 261
   5.4 Linked Loops and the Ladder 262
   5.5 The Uniform Ladder 266
6. Sorting 269
   6.1 The Odd-Even Transposition Sort 270
   6.2 Folding the OETS Scheme 271
   6.3 Sorting within Input/Output Time 272
7. Information Selection and Retrieval 274
   7.1 Selection by Address 274
   7.2 Associative Search 276
   7.3 Data Base Implications and Intelligent Storage 277
8. Summary and Outlook: More than Memory 278
References 279

Copyright © 1978 by Academic Press, Inc. All rights of reproduction in any form reserved. ISBN 0-12-012117-4
Introduction
Bubbles are small (1-10 μm in diameter) magnetic domains of uniform size, capable of carrying digital information. Once generated, they can be deformed, moved, or replicated easily. Bubbles move synchronously under a global drive field, but also yield to local control by electric currents. They are a major contender for a high-density, nonvolatile, electronically accessed memory. Magnetic bubbles were first observed and explained by Kooy and Enz (1960) in magneto-plumbite platelets. Barely one decade after Bobeck (1967) suggested the use of bubbles for computer information storage, bubble memory chips have become available commercially (Lee, 1977; Bobeck, 1977; Juliussen et al., 1977; Ypma and Swanson, 1977). This rapid development has been all the more remarkable in view of the routine use of new magnetic crystalline materials unknown before 1970, the adoption of precision fabrication and packaging techniques never attempted on the industrial scale, and the solution of many unusual physical problems heretofore unsuspected by the investigators. A 92-thousand-bit bubble memory chip is now available. A million-bit chip will probably be announced officially by 1978-1979 (Archer, 1977). The next area of investigation is naturally bubble logic, especially the kind which takes advantage of the synchronized movement of bubble memory information. This survey starts with a brief account of the physical basis of the bubble phenomenon, then its exploitation in the form of bubble memories. The bulk of the article, however, is a discussion of bubble logic, taken in the generalized context to include not only Boolean logic, but the manipulation of bubble information flow as well. Several books on magnetic bubble technology have become available. Examples are Bobeck and Della Torre (1975), O'Dell (1974), and compendia of important reprints edited by Smith (1974) and by Chang (1975), the latter including 200 pages of a broad survey, plus an extensive bibliography.
There are also comprehensive reviews by Bobeck et al. (1975),
Cohen and Chang (1975), and Chang (1978a). Surveys of recent industrial development have also appeared in Computer (Myers, 1977) and in Digital Design (Anonymous, 1977).

1. The Magnetic Bubble Phenomenon
1.1 The Formation of Bubbles
Certain magnetic materials, notably orthoferrites and magnetic garnets, exhibit strong magnetic anisotropy to sustain an "easy" magnetization direction. Within these crystals there are two types of magnetic domains, with opposing magnetization vectors along the easy direction. It is possible to cut thin platelets or grow thin films of the crystal, with the easy direction perpendicular to the plane, as shown in Fig. 1a. The
FIG. 1. The formation of bubbles. (a) Serpentine domains when vertical magnetic field is low. (b) Bubbles form when field exceeds criticality.
magnetization vector points up in the white domains; it points down in the shaded domains. By the Faraday magnetooptic effect, these two domain types rotate the plane of polarized light differently: this makes them visible when sandwiched between two polarizing filters (for a description, see Carey and Isaac, 1966). At room temperature and in the absence of stress or external magnetic field, the two types of domains are roughly equal in volume. They intertwine in a complicated, serpentine fashion, to minimize the total energy. The imposition of a vertical bias magnetic field encourages one kind of domain, say the white one, to grow at the expense of the other. The shaded domains first become narrower, then start to shorten. The effectiveness of a material in supporting bubbles is measured by a quality factor Q, which is the anisotropy energy divided by the magnetostatic energy. Q should be much larger than two for practical bubble applications.
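The quality-factor criterion can be made concrete. The chapter defines Q only as the ratio of anisotropy to magnetostatic energy; the explicit form Q = K_u / (2πM_s²) (uniaxial anisotropy constant over magnetostatic energy density, Gaussian units) is the form commonly used in the bubble literature, and the material numbers below are illustrative assumptions, not values from the text.

```python
import math

def quality_factor(K_u, M_s):
    """Q = K_u / (2*pi*M_s**2): anisotropy energy density divided by
    magnetostatic energy density (Gaussian/cgs units)."""
    return K_u / (2.0 * math.pi * M_s**2)

# Hypothetical garnet-like numbers (K_u in erg/cm^3, M_s in emu/cm^3):
Q = quality_factor(K_u=1.0e4, M_s=15.0)
print(f"Q = {Q:.2f}")
print("supports stable bubbles:", Q > 2)   # the criterion quoted in the text
```

With these hypothetical numbers Q comes out well above two, i.e. the anisotropy is strong enough to keep the cylindrical domains from toppling.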
When the bias field exceeds a critical size (Hc), the shaded serpents finally shrink to appear as tiny discs on the surface; see Fig. 1b. These discs are the tops of magnetic bubbles. The latter are actually circular cylinders of uniform cross section, extending the full thickness of the film. It is convenient to consider bubbles as cylindrical magnets, suspended vertically in an ocean of opposite magnetic polarity. In a thin film of uniform composition and thickness, the bubble diameter depends only on the bias field, which can be made uniform to within 15% over a large film area. When the bias field continues to grow, the bubbles shrink in diameter; when it reaches Hd, another critical limit, the bubbles vanish altogether. From the foregoing we see that bubbles can exist only in the right material, and under a vertical bias field of appropriate size. When these conditions are met, bubbles will form. Throughout this survey we shall consider the bias field to be pointing upward; the magnetization vector of the bubbles then points down. The bubble tops are then magnetically negative.

1.2 Field-Induced Propagation
Bubbles, being vertically suspended magnets, can change shape and position in response to magnetic fields. Figure 2 shows the effect on a bubble due to a magnetic pole lying on the upper surface. The positive pole attracts the bubble top and repels the bubble bottom; the repulsion being weaker than the attraction because of distance. There is thus an attempt to topple the bubble, and a net attraction towards the pole. Prevented from toppling because of the anisotropy of the material, the bubble moves as a whole toward the positive pole, to rest directly under it eventually. This movement is a continuous displacement of the domain walls
FIG. 2. Effect of magnetic pole on bubble.
bounding the bubble, involving no transfer of matter. It is a low-energy process, nearly frictionless, and consumes little power. A bar of a soft magnetic material (permalloy) becomes an induced magnet when exposed to a magnetic field along its long axis. When the external field is withdrawn, the soft magnetic material returns to the original unmagnetized state. We have therefore a means to provide temporary centers of attraction for bubbles. Using a time-varying field over a periodic permalloy pattern, one can create the effect of moving attractive poles to guide the bubbles. This is done by enclosing the bubble film with two orthogonally intersecting coils, in which are passed a pair of alternating currents with a phase difference of 90°. These currents generate at the center of the coil a steadily rotating magnetic field of fixed magnitude, and the plane of rotation can coincide with the film plane. If the film plane is overlaid with a periodic permalloy pattern, then at any time the tips of all permalloy bars pointing in the field direction are strongly magnetized. As the field rotates, the set of induced magnetic poles changes with time. Bubbles follow the positive poles, and avoid the negative ones, provided that the distances between successive poles are short enough, and the changes of poles are sufficiently infrequent, to suit the bubble mobility. Figure 3 shows the so-called T-bar permalloy patterns. All points labeled 1, 2, 3, and 4 become attractive when the field angle is 0°, 90°, 180°, and 270°, respectively. The four types of poles are switched on in succession, and the sequence is repeated to define a loop. Bubbles along the path will traverse the loop over and over again, once every seven field-rotation cycles.
FIG. 3. Field propagation in permalloy loop pattern. During subcycle k, bubbles are attracted to poles labeled k. This results in the circulation of binary information inside the loop.
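The circulation described in the caption can be sketched as a toy model: during subcycle k the poles labeled k attract, so one full field rotation advances every bubble by one cell, and the whole bit pattern (1 = bubble, 0 = void) rotates around the closed loop. Reversing the field rotation, where the track design permits it, moves the data backwards. This is an illustration of the behavior, not a model of any specific device.

```python
# Toy model of the Fig. 3 propagation loop.

def active_pole(field_angle_deg):
    """Pole label (1-4) that is attractive at the given drive-field angle."""
    return (int(field_angle_deg) % 360) // 90 + 1

def rotate_loop(cells, cycles=1, forward=True):
    """Bit pattern after `cycles` field rotations around the closed loop."""
    n = len(cells)
    shift = (cycles if forward else -cycles) % n
    return cells[-shift:] + cells[:-shift]

loop = [1, 0, 1, 1, 0, 0, 0, 1]                      # 1 = bubble, 0 = void
print([active_pole(a) for a in (0, 90, 180, 270)])   # [1, 2, 3, 4]
print(rotate_loop(loop))                             # [1, 1, 0, 1, 1, 0, 0, 0]
print(rotate_loop(rotate_loop(loop, 3), 3, forward=False) == loop)  # True
```

Note that only the bubbles physically move; the apparent motion of the whole pattern, one cell per cycle, is exactly what the shift-register description in the next paragraphs relies on.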
The sequence of poles between repeated labels, say 1, 2, 3, 4, can define a "memory cell." A cell may contain a bubble, or a "void" (no bubble), representing the binary digits "1" and "0", respectively. Under normal circumstances, no cell can contain more than one bubble, and the exact location of a bubble in an occupied cell is determined by the field angle. A cell-to-cell distance of about four bubble diameters is needed to avoid mutual interference. The state of occupancy of the collection of memory cells defines a bit pattern. Though only the bubbles move, the uniform background of voids gives the appearance that the entire bit pattern within the loop moves at the rate of one cell distance per cycle. The loop thus retains a piece of shifting binary information, and is a basis for a shift-register memory. The time required to move a bit between cell boundaries is a bit cycle. It exactly equals a field rotation cycle, which in turn matches the cycle of the alternating currents used. The bubble shift register memory has the following characteristics:
a) All binary information under the same rotating field is synchronized in motion, unless overruled by local control.

b) When the rotating field stops, all bubble motion is suspended, to be resumed with the restart of field rotation. Meanwhile, as long as the bias field persists, the information remains intact. This start-stop capability is a distinct asset for bubble memories.

c) Finally, bubble tracks can be engineered to allow two-way data motion, with the bubble data moving backwards when the field reverses its rotation.

The T-bar patterns are not the only ones used for bubble propagation. All the patterns in Fig. 4 move bits to the right under a counterclockwise rotating magnetic drive field. The Y-bars have roughly the same characteristics as the T-bars. The half-disks (Gergis et al., 1976; Bonyhard and Smith, 1976) and the asymmetric chevrons (Bobeck, 1977) allow wider gaps between the permalloy shapes, reducing the precision demands for high-density fabrication. The chevron stacks can stretch bubbles laterally, and are important in bubble detection and replication.

1.3 Detailed Control by Electric Currents
FIG. 4. Permalloy patterns for field propagation: T-bar, Y-bar, half-disk, asymmetric half-disk, asymmetric chevron, and chevron stack.

It is very important to be able to manipulate bubble information locally, down to a single bubble position, and within a fraction of a cycle. This resolution can only be achieved by electric currents flowing through specially constructed conductors. The switching of the current on and off can take place well within a quarter of a cycle, and since currents travel very much faster than bubbles, their effect on the moving bubble is effectively instantaneous. When the current is off, the conductors have no influence on bubble behavior, unless they are themselves magnetic. When a current flows through a conductor of uniform thickness, the magnetic field it generates is strongest where the width is least. Thus, by varying the width of the conductor, one can focus magnetic effects into a small area, while creating negligible magnetic disturbance along the way.

FIG. 5. The effect of line current on a bubble.

In Fig. 5, a current flows above the bubble film, into the plane of the paper. It generates a circular magnetic field, which reinforces the bias field on the left, and opposes the bias field on the right. A bubble on the left will tend to cross under the conductor, to rest somewhere on the right side. When supported by special permalloy patterns, this is a convenient way to move many bubbles in parallel across the conductor.

A current loop above the film generates a strong field inside, to attract or repel a bubble depending on the current direction. This is often used to control single bubbles in real time. A strong loop current can create or destroy a bubble below (Section 1.5). Very weak currents are used to detect bubbles (Section 1.4). Intermediate currents are used to replicate (Section 1.4) or to propagate bubbles; for an account of current propagation of bubbles see the monograph by Bobeck and Della Torre (1975, pp. 160-166).

Permalloy can serve a variety of functions in bubble systems. It is magnetic, hence useful in propagating elements; magnetoresistive, hence important in bubble detection; and finally it is conductive, hence useful as a conductor. The last property allows the use of permalloy for all conductor requirements, resulting in a single-level masking overlay, which eliminates the mask registration problem during chip fabrication, at the price of increased geometric constraint and heating (Bobeck et al., 1975).

It is customary to use the term "field access" to designate the use of a prevailing in-plane field to drive bubbles, and "conductor access" to designate propagation by means of electric currents through conductors. The word "access" carries a strong memory connotation, reflecting the intended use by the investigators.
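The propagation behavior described in Section 1.2 — synchronized motion under the rotating field, start-stop retention, and reversal — can be summarized in a toy shift-register model. This is a sketch under simplifying assumptions (the class and its method names are ours), not a device simulation:

```python
# Toy model of a bubble shift-register loop: bits are 1 (bubble) and
# 0 (void); one field-rotation cycle moves the whole pattern one cell.

class BubbleLoop:
    def __init__(self, bits):
        self.cells = list(bits)      # one bit per memory cell
        self.field_on = True         # rotating in-plane drive field

    def cycle(self, reverse=False):
        """Advance the bit pattern one cell per field rotation cycle."""
        if not self.field_on:        # field stopped: bubbles hold position;
            return                   # data is retained by the bias field
        if reverse:                  # reversed field rotation moves data back
            self.cells.append(self.cells.pop(0))
        else:
            self.cells.insert(0, self.cells.pop())

loop = BubbleLoop([1, 0, 1, 1, 0, 0, 0, 0])
loop.cycle()                         # pattern shifts one cell
loop.field_on = False
loop.cycle()                         # stopped field: nothing moves
loop.field_on = True
loop.cycle(reverse=True)             # reversal restores the previous state
print(loop.cells)                    # back to [1, 0, 1, 1, 0, 0, 0, 0]
```

The three characteristics a)-c) above correspond to the shared `cycle` call, the `field_on` guard, and the `reverse` flag, respectively.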
1.4 Detection and Replication
In the "void-bubble" representation, decoding of bubble data requires the detection of bubbles as distinct from voids. This is a difficult problem; the tiny bubble exerts too little influence on its surroundings to be easily discerned from the background. A magnetic conductor changes its electrical resistance by a few percent when a bubble is in its vicinity; this magnetoresistance effect is most commonly used for detection. The net resistance change increases with elongated bubbles, and methods have been devised to stretch bubbles for stronger signals. The best technique uses lateral stretching (namely, in a direction perpendicular to the direction of bubble travel), using a cascade of chevron stacks of increasing height. In the largest one, the chevrons are linked together to form a permalloy conductor, the resistance of which is measured by a dc Wheatstone bridge circuit. Nevertheless, bubble detection is still costly, using 10-20% of the entire chip space. The sensing chevron stacks often span the entire width of the bubble chip (see Fig. 6).
FIG. 6. Magnetoresistive detection of bubbles using a cascade of chevron stacks. (From Bobeck and Della Torre, 1975, with permission.)
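To see why stretching matters, consider the magnitude of the signal involved. A rough Wheatstone-bridge calculation follows; the supply voltage and the size of the resistance change are illustrative assumptions of ours, not chip specifications:

```python
# A magnetoresistive element changes resistance by only a few percent,
# so a dc Wheatstone bridge converts that small change into a voltage.
# One arm sits at R*(1 + delta); the other three arms are R.

def bridge_output(v_supply, delta):
    """Differential output of a bridge with one arm at R*(1 + delta)."""
    return v_supply * ((1 + delta) / (2 + delta) - 0.5)

v = bridge_output(5.0, 0.02)   # 2% magnetoresistance change, 5 V supply
print(round(v * 1e3, 2))       # ~24.75 mV: small, hence stretched bubbles
```

For small delta the output is approximately V*delta/4, which is why a few percent of resistance change yields only millivolts and motivates the chevron-stack stretching described above.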
A stretched bubble can also provide a convenient fan-out of two, by feeding two outlets, each leading to a different track. A current pulse is often used to split the stretched bubble. A chevron stack also serves as a convenient wired-OR, to receive signals from two inlets, at most one of which carries a bubble.

1.5 Bubble Generation and Annihilation
The generation of bubbles by shrinking magnetic domains is an uncontrolled process at best. For digital data processing one needs to generate an arbitrary number of bubbles, in any distribution required. This can be done via a seed domain, or more colloquially, a "mother bubble." A piece of permalloy about the size of a memory cell behaves like a deep energy well holding one bubble. With proper track design, and often with current assist, the mother bubble can be stretched until it breaks. Each of the fragments then becomes a full-grown bubble; the daughter moves away along the bubble track, and the mother is ready to beget a new offspring within another bit cycle.

A simple and popular method for bubble generation is by direct nucleation. Simply stated, a current loop can generate a magnetic field strong enough to counteract the bias field, creating a bubble directly from the ocean below. By the same token, a reversed current through the same loop can reinforce the bias field locally beyond the annihilation limit, to destroy a bubble directly below. A bubble can also be eliminated by merging with a mother bubble, or just by being thrown towards a guardrail consisting of chevron stacks, which then expels it beyond the useful bubble territory. For a discussion of generation and annihilation see Bobeck and Della Torre (1975, pp. 178-184).

2. Bubbles as Memory
Bubbles, at least in the "void-bubble" representation, can carry binary information. Bubble information can be generated, destroyed, decoded, moved, and stored: all requirements for a memory are satisfied. Efficient fabrication techniques have been perfected, and the race is on to produce bubble memory chips with the largest capacity at the lowest cost. While most of the techniques have been borrowed from the semiconductor industry, chip fabrication for bubbles is actually easier than that for semiconductors. Consequently, bubble memories use smaller bit areas and larger chips.
Together with charge-coupled devices, bubble memories are known as electronic disks, due to their small physical size, freedom in loop design, electronic loop selection, and, most important, short looping time, hence fast access. In sharp contrast, rotating storage devices, constrained by mechanical inertia, have access times more than ten thousand times those of computer main memories; this well-publicized storage gap can only be filled by the electronic disk devices (see Pugh, 1971). Pohm (1975) has shown that the electronic disks can have a very much higher price per bit and still be immensely worthwhile in computer systems.

Bubble memories are no match for semiconductor memories in speed; however, many bubble tracks can be mobilized in parallel to reach high data rates, and new bubble materials hold the promise of a twentyfold speed increase (Breed et al., 1977). Nonvolatility is an advantage shared by bubbles and rotational devices, but is absent in electronic storage; stoppage and reversal are additional advantages not available in rotational devices.

Bubble memories have still another unique advantage: their information bits are naturally quantized by themselves. In the void-bubble representation, a void means a zero, a bubble means a one. The dichotomy is absolute; there is never any signal deterioration calling for deliberate decay-tolerant designs or real-time reshaping.

2.1 The Major-Minor Loop Scheme
A single, long shift register loop could serve as a bubble memory. However, the average access time would be correspondingly long, and a single defect along the loop could destroy the bubble information completely. It is worthwhile to consider an assemblage of many smaller, identical loops. The major-minor loop scheme uses a number of “minor loops” to hold information, and a single “major loop” for reading and writing. The major loop need not be closed and thus may not be a loop at all. All the minor loops, and usually the major loop as well, are driven under the same in-plane rotating field. Figure 7 shows a typical major-minor loop design. The major loop is associated with a generator, a replicator which doubles as an annihilator, and a detector. It is the only interface between the minor loops and the outside world. At the major-minor loop boundary, special permalloy and conductor configurations permit three modes of behavior: a) No transfer current. Bubble information continues to traverse the minor loops.
FIG. 7. The major-minor loop memory organization: a generator, a replicator/annihilator, and a detector on the major loop, serving a bank of minor loops. (After Bobeck et al., 1975.)
b) Current flows in a forward direction, for a fraction of a cycle. This causes one bit from each minor loop to cross under to the major loop, leaving a void behind, in a destructive readout.

c) Current flows in a reversed direction. Bits from the major loop cross under, to fill voids in the minor loops, leaving voids behind, clearing the major loop.

In addition, the major loop is used to transmit data to the detector, completing the readout from the chip; to generate bubble data in response to off-chip stimuli, to be written into the minor loops; to destroy major loop data selectively; and/or to return a copy of major loop data to the minor loops.

The major-minor loop design is often likened to a magnetic drum, with the minor loops being the drum tracks, and the major loop serving as a collection of reading heads. This analogy is only partly valid. The bubble drum is very much smaller and lighter, and the minor loops can be designed to satisfy the access requirements without regard to mechanical factors, unlike conventional drum tracks. Further, bubble motion can be stopped or reversed, providing a unique degree of flexibility. Moreover, bubble motion is strictly synchronized with the rotating drive field, while in mechanically rotating memories the synchronization across the available reading heads is a very difficult problem. Thus a bit slice of every minor loop naturally forms a record, and is read serially
from the major loop; but a bit slice of drum tracks is hard to access with certainty. In magnetic drums, therefore, the reading tends to be one track at a time.

2.2 Fabrication of Bubble Memory Chips
A typical bubble chip today consists of a nonmagnetic garnet substrate, on which is deposited a magnetic garnet film several microns in thickness, then one or two layers of metal separated by insulation material. A representative sketch is shown in Fig. 8 (for discussions, see Chang, 1975, pp. 167-193; Bobeck et al., 1975). The bubble memory module contains not only the chip, but also magnets and coils, as well as standard electronic connections. The garnets used have the chemical formula

A3B5O12
where A is yttrium, a rare-earth metal, or combinations, and B is iron, aluminum, gallium, or combinations. Substitutions by other metals, such as calcium and germanium, are commonly made to enhance anisotropy, temperature stability, bubble mobility, and other desirable properties. A typical bubble garnet has the formula

Y1.92Sm0.1Ca0.98Ge0.98Fe4.02O12
(For a comprehensive review of bubble materials, see Nielsen, 1976.)

FIG. 8. Bubble chip fabrication: a garnet epitaxial layer grown on a non-magnetic garnet substrate, with conductor and permalloy overlays. (From Bobeck et al., 1975, with permission.)

The substrate wafer is preselected and cut from a nonmagnetic garnet crystal, to approximate the crystal structure and orientation of the magnetic bubble garnet material to be grown from it. The epitaxial growth of the bubble film can be conducted in the liquid phase by just immersing the substrate wafer in a supercooled melt of the correct chemical composition for a few minutes. The upper surface of the epitaxial layer is often treated by ion implantation (Wolfe and North, 1972) to eliminate the possible creation of "hard bubbles"; these are bubbles with erratic propagation behavior due to their domain-wall complexity.

Typically two metallic layers are deposited. The conductor patterns are laid down first, for precision bubble control, followed by the permalloy patterns. Metal deposition is actually a complicated process (deposition of metal, deposition of photoresist, optical exposure of mask pattern, removal of exposed resist, removal of unwanted metal, then removal of the remaining photoresist). The deposition of all but the lowest metal layer further requires a critical alignment step. The single-level masking technique, still in development, certainly represents a great simplification. Even with two levels of metal, bubble chip fabrication is much simpler than that for semiconductor chips, which call for several levels of precise masking. The garnet crystals, both for the substrate and for the epitaxial bubble material, are remarkably free from crystal defects; this, in combination with the simpler production steps, contributes to much higher yield.

However, there must be a permanent magnetic field to ensure the existence and stability of bubbles, and provision must be made for an in-plane rotating field to propagate the bubbles. Both fields should be fairly uniform over the chip, or the multi-chip module. These provisions represent an extra packaging cost not borne by semiconductor chips, and much ingenuity is needed to enable mass production (for an example, see Ypma and Swanson, 1977).

2.3 Factors Affecting Density and Cost
It is well known in the semiconductor industry that the ultimate manufacturing cost of an LSI chip is insensitive to the complexity of the internal structure, after a "learning period" to reach a satisfactory yield level. This is largely true for the magnetic bubble chips, which have already exceeded semiconductor chips in storage capacity.

Bubble materials being relatively free from crystal defects, there is a definite trend towards chip size increase, beyond the semiconductor norm of a quarter inch square (6 mm by 6 mm). However, the chief means to increase chip capacity is still by decreasing cell size, and this,
in turn, means smaller bubbles, thinner lines, and narrower gaps between lines. For rectangular memory cells with an area of s square microns, the areal density is 10⁸/s bits per square centimeter, or about 6.45 × 10⁸/s bits per square inch. To avoid mutual interference, the centers of bubbles should be separated by about 4d (d = diameter). This means that a memory cell should be no smaller than 16d². To minimize it one needs smaller bubbles, and this may mean new garnet crystals or even amorphous materials, but the more stringent line width and gap size problems must first be examined.

For the T-bar patterns, the lines which form the T's and I's must be no wider than a bubble radius. There is a stronger requirement: the gaps between lines must be no wider than a third of the bubble diameter. Let the line width and the gap size be designated, respectively, by w and g; then for T-bar designs the cell size is no smaller than any of 16d², 64w², or 144g². Y-bars appear to have similar limitations to the T-bars. Chevron stacks are known to be more gap-tolerant, but their size precludes their general use.

Reporting of new gap-tolerant permalloy patterns began in 1976. Examples are the half-disks (Gergis et al., 1976; Bonyhard and Smith, 1976) and the asymmetric chevrons (Bobeck, 1977); see Fig. 4. These new designs allow gaps as wide as the line widths. The gap width is now no worse than the line width as a limiting factor to high density.

But the line width remains a problem. The standard technique of optical masking is effective only if the line width is at least several times the wavelength used, lest diffraction affect image quality. Using conventional optical masking, the line width can be pushed down to about 1 μm. Beyond this, one may have to resort to electron beams and X-rays, and a completely new masking technology.

2.4 Current Development
The Texas Instruments TBM0103 chip (Juliussen et al., 1977), announced in 1977, has 92,304 informational bits per chip. A memory module contains one bubble chip, two drive coils, two permanent magnets, and shielding to protect the chip from external fields up to 40 Oe. The module is a dual in-line 14-pin package which weighs 27 gm, and measures 1.0 x 1.1 x 0.4 in.

The chip uses a major-minor loop design, with 641 bits per minor loop, and 144 informational minor loops. There are actually 157 minor loops, 13 of which are selectively disabled to ensure that the rest are defect-free. The bubbles are 5.4 μm in diameter, and are driven by the double coils at 100 kHz.
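The loop geometry and drive frequency alone give a rough lower bound on access time. The back-of-envelope check below is our simplification: the 4.0 msec figure quoted for the chip also includes transfer and detection overheads.

```python
# Back-of-envelope timing for a major-minor loop chip like the TBM0103,
# ignoring transfer and detection overhead (measured figures are larger).

loop_bits = 641          # bits per minor loop
f_drive = 100e3          # field rotation (and hence bit) rate, Hz

bit_time = 1.0 / f_drive                 # one cell distance per cycle
avg_rotate = (loop_bits / 2) * bit_time  # average wait for a target bit
print(avg_rotate * 1e3)                  # ~3.2 ms of pure loop rotation
```

The remaining ~0.8 msec of the quoted 4.0 msec first-bit access time would then be accounted for by transfer to the major loop and propagation to the detector.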
TABLE I
TBM0103 Bubble Memory Characteristics

1. The memory module

Bubble diameter: 5.4 μm
Minor loop capacity: 641 bits
Number of minor loops: 144 (+13 spares)
Total number of bits: 92,304 (+8333 spares)
Bias field: 60 Oe (permanent)
Rotating drive field: 50 Oe
Rotating field frequency: 100 kHz
Minor loop bit rate: 100 kHz
Input/output rate: 50 kHz maximum
Average access time: 4.0 msec (first bit)
Average cycle time: 12.8 msec for a 144 (+13)-bit block
Currents: nucleation 250 mA, 2.5 Ω, 0.3 μsec; replication 120 mA, 5.5 Ω, 1.56 μsec; annihilation 60 mA, 2.5 Ω, 4.7 μsec; transfer out 46 mA, 1.71 μsec; transfer in 27 mA, 25 μsec; detection (dc) 5.5 mA; coil 800 mA
Power dissipation: 0.6 watts
Module size: 1.0 x 1.1 x 0.4 in. (including two coils, permanent magnets, shielding, and packaging)
Pin count: 14
Weight: 20 gm
Max operable temperature: 0°C to 70°C
Nonvolatile storage temperature range: -40°C to 85°C (any direction)
Max permissible external magnetic field: 40 Oe

2. Multiple module mass storage

Memory cell: 1 bubble memory module, 1 diode array, 1 set of coil drivers, 1/4 of shared sense amplifier and termination network
Mass storage: 8 memory cells, 1 controller (function timing, data handling circuits)
Power supply: 12 V, 5 V
Total weight: 312 gm
Total volume: 38 in.³
Data rate with redundancy removed: 44 kHz
Max I/O bandwidth for sense amplifier: 50 kHz

Operation characteristics (power dissipation with at most one module on):
Module switched on only when addressed: standby 1.0 W, active 5.4 W
Module on all the time: standby 9.9 W, active 11.5 W
A bubble-based mass storage employs up to eight bubble memory modules under the same control. The delivered data bandwidth is 50 kHz, or one bit every 20 μsec. This chip is now being used to replace floppy disks in small computer systems and in the Silent 700 portable terminals (Anonymous, 1977). For a summary of the specifications of the TBM0103 see Table I.

The Bell Telephone System initially planned to use bubbles for a telephone repertory dialing system, to contain the frequently used telephone numbers for rapid recall. It is now using bubbles for voice generation. A voice announcing system has been installed in Detroit, Michigan, using eight bubble memory cards each of four chips; each chip contains a 68 kilobit single loop using 3-μm bubbles (Bobeck et al., 1975).

North American Rockwell also uses single loops in its bubble modules, being developed for a 100 million-bit inflight data recorder and for its 0.8 million-bit POS-8 point-of-sale terminal. The unit package involves eight chips of 100 kilobits each, using 3.8-μm diameter bubbles (Ypma and Swanson, 1977). In addition, Rockwell has tested an experimental million-bit chip, using 1.8-μm bubbles (Archer, 1977; Myers, 1977). Plessey in England has also developed a 16 kilobit chip for use in terminals.

In Japan, the most ambitious bubble application announced to date is for Nippon Telegraph and Telephone's electronic switching systems, with an estimated annual requirement of 5 × 10ⁿ bits. Bubbles are used for this application because they are compact, easy to transport, maintenance-free, and highly reliable (read error rate less than 9 × 10⁻ⁿ per read). All three major computer companies in Japan, namely Fujitsu, Hitachi, and Nippon Electric, are active in bubble development; the applications under consideration include intelligent terminals, data cassettes, and Chinese character generation.
Fujitsu Laboratory has an 83,354-bit chip with 3-μm bubbles driven at 300-500 kHz, using special Y-bar patterns to facilitate stoppage and reversal (Takasu et al., 1976). Advances towards 1-2 μm bubbles and larger chips are well under way (Tsubaya et al., 1977).
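The cell-size bounds of Section 2.3 reduce to simple arithmetic. A short worked calculation follows; the specific bubble, line, and gap dimensions used are hypothetical values of ours, chosen only for illustration:

```python
# Areal-density arithmetic from Section 2.3: a cell of s square microns
# gives 1e8/s bits per square centimeter, and for T-bar-style patterns
# the cell area is bounded below by 16*d^2, 64*w^2, and 144*g^2.

def min_cell_area(d_um, w_um, g_um):
    """Smallest admissible cell area (um^2) for bubble diameter d,
    line width w, and gap size g, all in microns."""
    return max(16 * d_um**2, 64 * w_um**2, 144 * g_um**2)

def bits_per_cm2(cell_area_um2):
    return 1e8 / cell_area_um2       # 1 cm^2 = 1e8 um^2

# A 5-um bubble with 2.5-um lines and 1.5-um gaps (hypothetical):
s = min_cell_area(5.0, 2.5, 1.5)
print(s, bits_per_cm2(s))            # 400 um^2 -> 2.5e5 bits/cm^2
```

With these numbers the bubble spacing is the binding constraint (16d² = 400 ≥ 64w² = 400 ≥ 144g² = 324); widening the lines or gaps quickly makes fabrication, not bubble physics, the limit.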
2.5 Unconventional Bubble Memory Devices

As is the case with the semiconductor industry, improvement in chip density can be due to technology improvement, such as larger chips and thinner lines, or to design cleverness. In the gap-tolerant designs, design cleverness has already doubled the bubble chip density. Memory designs along radically different lines are needed for even greater rewards. A thousandfold increase in areal density over the current bubble memory products is expected; the novel designs to be described here should play an important role in achieving this gain (Chang, 1978a).

Figure 9 shows the contiguous disk design, which allows very small bubbles to move near the periphery of relatively large permalloy disks. Rather than using narrow T-bar patterns to drive relatively large bubbles, the contiguous disks have no linewidth limitation, and have been tested successfully using 1-μm bubbles (3 × 10⁷ bits/in²) even with 4-μm minimum overlay features, easily achievable with photolithography (Lin et al., 1977). The unit cell size for contiguous disks is 16d² or 4w². Note that the gaps between cells are completely absent.

FIG. 9. The contiguous disk bubble memory: sketch (showing the propagation pattern, write control line, and the paths to the sensor and annihilator) and photomicrographs of actual operation.

To transcend the final cell size limitation of 16d², the bubble lattice file concept was developed. The scheme is based on the observation that in the hexagonal closest packing of bubbles, the bubble-bubble distance
can be as small as 1.2 diameters, instead of the four-diameter separation in the "void-bubble" representation. This close packing offers the promise of a tenfold areal density increase. For such a scheme to work, both 1 and 0 must be represented by bubbles. Fortunately, several different species of bubbles, with different domain-wall twist patterns, can be distinguished by their characteristic deflection angles while moving in an inhomogeneous magnetic field (Calhoun et al., 1975, 1976). Figure 10 shows an experimental 1024-bit bubble lattice file chip by Hu et al. (1978). The details are outside the scope of the present paper; for a brief discussion see the review by Cohen and Chang (1975).

FIG. 10. A 1024-bit experimental bubble lattice storage chip.

As first described, the bubble lattice file employs a current access scheme. A recent study by Ho and Chang (1977) shows that field access
can also be employed to move the entire lattice, using permalloy patterns and rotating in-plane fields. Each permalloy shape drives one bubble directly and several other bubbles indirectly, via interbubble repulsion. Field access is a preferred approach, since it simplifies fabrication (discrete patterns rather than long, unbroken lines; a multifunction single overlay rather than multiple overlays of single functions), reduces on-chip heat dissipation, and, above all, avails itself of the many techniques already developed for discrete bubble devices.

3. Magnetic Bubble Logic
Bubbles are vertically suspended magnets. When two bubbles approach each other (within four diameters), mutual repulsion becomes strong enough to deform and displace them, though not to annihilate each other or to create new ones. This phenomenon is the basis for Boolean logic in the "void-bubble" representation. This is an active form of logic, in that the bits (voids and bubbles) furnish the functions automatically, on specially constructed tracks.

Bubble logic requires meticulous implementation to overcome the strong geometric constraints of planar design and time synchronism. Fortunately, it has been found possible to allow bubble tracks to cross with no information loss, and requirements of fan-in and fan-out can be reinterpreted to mean binary interactions with well-spaced time lags.

Though much slower than semiconductor logic, bubble logic has a natural pipelined character, capable of producing one set of outcomes at the end of every cycle. Moreover, bubble information is unambiguously binary, requiring no signal reshaping. Even the strict synchronism means there are no extra clock pulses to contend with. (For a critical survey of bubble logic up to 1974, see Lee and Chang, 1974b.)

3.1 Overcoming Physical Constraints
Semiconductor logic devices, reputed to obey a planar design discipline, can and do use subterranean passages to surmount the planar restrictions. The charge carriers move at high speeds, and can be arbitrarily delayed when needed. Fan-ins and fan-outs of four or more are commonplace. Individual circuits can be powered independently, without any synchronization requirements.

Bubble motion is inherently two-dimensional and slow, varying over a very narrow speed range. In addition, under field access, all bubbles under the same drive field move in strict synchronism. These facts pose
severe geometric constraints on bubble logic designs. For instance, there are no instantaneous fan-ins or fan-outs of more than two. These constraints, though serious, turned out to be much less forbidding than at first thought. High-degree fan-ins and fan-outs can often be replaced by repeated use of their binary counterparts and suitable delays. An example is outlined in Fig. 11, where effective fan-in and fan-out multiprocessing is replaced by staggered individual processing involving their binary counterparts.

FIG. 11. Creating the effect of multiple fan-in and fan-out. (a) Fan-in and fan-out of 4. (b) Effective fan-in and fan-out using a time-lagged design.

The most intriguing problem is the crossing of magnetic bubble information tracks. This is possible using an idler, which is a trap holding a bubble (Morrow and Perneski, 1970). Figure 12 shows a sketch of a trap. It consists of four permalloy bars radiating from the same position.

FIG. 12. The bubble trap. It becomes an idler after trapping a bubble. (After Morrow and Perneski, 1970.)

When a bubble enters from any one of the bars, it will be trapped into a tight spin. When another bubble comes in, possibly via a different route, the trapped bubble will be freed, and
the new bubble now becomes trapped. In doing so, it appears as if the second bubble has moved two cell distances in one cycle. In the "void-bubble" representation of binary data, the voids are not really affected by the idler design. But the speed-up of bubbles over a background of voids is fully equivalent to the speed-up of the entire bit pattern. The apparent skipping occurs even when both input tracks carry bubbles; both bubbles will be relayed.

Table II summarizes the detailed happenings for all cases. The trap is represented by the positions P, Q, R, and S. If the trap is empty, a southbound bubble will be trapped at the end of subcycle 3, but an eastbound bubble will be trapped at the end of subcycle 4. A trapped bubble, by itself, traverses the positions P, Q, R, and S at subcycles 1, 2, 3, and 4, respectively. When a bubble approaches an occupied trap, it becomes trapped as if the trap were empty; the trapped bubble is freed, to lend the appearance of skipping. When both input paths are occupied by bubbles, again the incoming bubble falls into the trap in exactly the same fashion; the trapped bubble skips to continue the southbound journey, and the trapped southbound bubble, immediately freed by the eastbound bubble, skips east. The final bubble remaining in the trap is the previous eastbound bubble.

TABLE II. Behavior of Bubbles Near a Trap. The cases delineated are: (1) a southbound bubble enters the trap; (2) an eastbound bubble enters the trap; (3) southbound and eastbound bubbles meet near the trap; (4) a southbound bubble approaches an idler; (5) an eastbound bubble approaches an idler; (6) southbound and eastbound bubbles approach an idler. In cases 1 and 2 the incoming bubble is trapped; in case 3 the southbound bubble skips and the eastbound bubble is trapped; in cases 4 and 5 the trapped bubble skips and the incoming bubble is trapped; in case 6 the trapped southbound bubble, freed by the eastbound bubble, is pushed east, and the eastbound bubble is trapped. (V, T, and H refer to the vertical track, the trap, and the horizontal track, respectively; see Fig. 12. The subcycle-by-subcycle position listings are not reproduced.)

3.2 AND-OR Logic
In Fig. 13, the tracks are configured to allow easy bubble movement downward, but not to the right. Let us imagine two bits (a, b) starting simultaneously at positions V1 and H1. After a lapse of six cycles, the bit positions V7 and H7 are examined. Because of the bias installed in the tracks, a 1 (represented by a bubble) appears at V7 unless both of the original inputs had been voids (representing zeros). Clearly, the downward track yields the logical OR of the original inputs.

The corresponding position H7, along the track to the right, shows an entirely different behavior. It usually contains a 0, because of the energy barrier built into the intersection. When both inputs are bubbles (1's), however, only one can enter the downward track. The bubble from the top is now eased into the rightward track by bubble-bubble repulsion. Therefore, H7 shows "a AND b", while V7 shows "a OR b", six cycles after the input time.

It is interesting, and also important, that the total number of bubbles during output exactly equals the total number during input six cycles earlier. This is simply because interbubble repulsion is not energetic enough to destroy or create any bubbles, nor to change their synchronization. We thus have a time-delayed version of Kirchhoff's current law: the total number of input bubbles at any instant equals the total number of output bubbles after a fixed time delay.

Figure 14 shows a track design for AND-OR logic by Sandfort and Burke (1971). Bubble logic is naturally pipelined. At every bit-time a new set of inputs is accepted and a set of outputs emerges. The processing rate is one set of operands per cycle.

FIG. 13. AND-OR logic.
FIG. 14. Implementation of AND-OR logic. (After Sandfort and Burke, 1971.)
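The conservation property of the AND-OR interaction is easy to check exhaustively. A truth-table sketch in our notation (the bubble device itself is, of course, a track layout, not a program):

```python
# Bubble AND-OR logic (Fig. 13) as a conservative gate: interbubble
# repulsion neither creates nor destroys bubbles, so the total bubble
# count is preserved (the time-delayed "Kirchhoff's current law").

def and_or_gate(a, b):
    """Return (a OR b, a AND b) for input bits a, b in {0, 1}."""
    return (a | b, a & b)

for a in (0, 1):
    for b in (0, 1):
        or_out, and_out = and_or_gate(a, b)
        assert or_out + and_out == a + b   # bubbles are conserved
        print(a, b, "->", or_out, and_out)
```

The assertion holds for all four input combinations: the pair (OR, AND) is exactly a sorting of the input pair, which is why no bubble is ever gained or lost.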
3.3 Multitrack Logic

Classically, AND, OR, and NOT define a complete set of Boolean logic connectives. Since interbubble repulsion alone is too weak to create or destroy bubbles, the implementation of the NOT function usually requires a bubble source or a bubble sink nearby. The NOT is then represented as a displacement of bubbles between two alternative positions. The systematic extension of this view leads to the "two normal cells per bit" scheme, in which a pair of normal cells contains exactly one bubble: (1, 0) might represent a 1, and (0, 1) would represent the binary 0 (see Bobeck and Scovil, 1971).

Another generalization leads to a three-track logic, using three input tracks and three output tracks. (For a systematic account of three-track logic, including the cases of bubble nonconservation, see Minnick et al., 1972.) An example of three-track logic is seen in Fig. 15, where a bubble tends to gravitate towards the dense-chevron center track regardless of its input position. When there is a second bubble, the interbubble repulsion overrides the "gravitation," and both bubbles take the outer tracks upon exit. When corresponding positions along all three input tracks contain bubbles, all three output tracks are populated. As a result, one obtains a parity function along the middle track, and two majority functions along the outer tracks.

If one of the input tracks is empty, the output tracks will give the sum (a EXCLUSIVE-OR b) and carry (a AND b) bits, with the latter replicated. The device then behaves like a half-adder. With all three inputs active, it is a full adder; this is the basis of Williams' (1977) serial arithmetic adder.
MAGNETIC BUBBLE MEMORY AND LOGIC
Tendency to drift to center

p = abc ∨ a′bc ∨ ab′c ∨ abc′ = majority of (a, b, c)
q = abc ∨ ab′c′ ∨ a′b′c ∨ a′bc′ = parity of (a, b, c)
r = p
FIG. 15. Three-track full adder. (After Minnick et al., 1972a.)
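A truth-table check of the three-track behavior (again a sketch of my own; the function name is an assumption):

```python
def three_track(a, b, c):
    # Outer tracks carry the majority function -> the carry bit.
    p = (a & b) | (b & c) | (c & a)
    # Middle (dense-chevron) track carries the parity -> the sum bit.
    q = a ^ b ^ c
    # The majority output is replicated on the other outer track.
    r = p
    return p, q, r

# With c = 0 the device degenerates to a half-adder: q = a XOR b, p = a AND b.
# As conservative bubble logic, the total bubble count is preserved:
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            p, q, r = three_track(a, b, c)
            assert p + q + r == a + b + c
```

Majority-plus-parity is exactly the carry/sum pair of a full adder, which is why the device supports Williams' serial arithmetic.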
3.4 Dynamic and Static Deflectors
When the need for fan-out is replaced by well-timed reappearance of the same operand at the diverse target sites, it is desirable to consider Boolean logic elements which can leave one of the inputs unchanged, to be reused elsewhere. Figure 16 shows one such device, which shall be called a dynamic deflector. The downward track carries a bubble if and only if an input bit a has not been deflected by a bubble from control stream b. This track therefore represents the Boolean connective "a AND NOT b." The
FIG. 16. The dynamic deflector.
rightward stream likewise shows a 1 (a bubble) if and only if a bubble has been deflected by a bubble from the control stream; it represents the connective "a AND b". The control stream is left intact, ready to be used again. The dynamic deflector design has often been mentioned in the literature, though seldom with emphasis. It is the Group 4 logic element of Sandfort and Burke (1971), the ZB element of Kinoshita et al. (1976), and the AND-INVERT gate of Williams (1977). It can also be obtained from many of the three-track devices of Minnick et al. (1972) by setting appropriate inputs to 1 or 0. One use of the dynamic deflector immediately presents itself. When a is 1, the downward stream gives "NOT b," which together with the AND, OR connectives forms a system of complete Boolean logic connectives. Another use is for systematic Boolean logic with multiple inputs. An m-level binary tree arrangement of deflectors can generate 2^m output tracks representing all minterms of the input variables, exactly one of which will carry a bubble. An example will be seen in the next section. Closely allied with the dynamic deflector is the static deflector, which uses a resident bubble as a one-bit control memory (see Fig. 17). This stationary bit can be held by a permalloy disk roughly twice the area of a bubble top. A bubble introduced under the disk will tend to rotate along the periphery of the disk, deflecting bubbles nearby without itself being destroyed or dislodged in the process. This resident bubble can be loaded or unloaded at will, or can be generated and annihilated by a current loop. The static deflector logic is thus "repersonalizable."
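Both uses can be sketched in a few lines (an illustrative model of my own; names and the track ordering are assumptions):

```python
def deflector(a, b):
    # Dynamic deflector: control bubble b deflects data bubble a.
    # Returns (downward track, rightward track, control stream).
    return a & (1 - b), a & b, b

def minterm_tracks(bits):
    # An m-level binary tree of deflectors resolves a single input bubble
    # into 2**m output tracks; exactly one -- the minterm of the inputs --
    # carries the bubble.
    tracks = [1]                      # one bubble enters the tree
    for b in bits:                    # each level splits every track on one variable
        nxt = []
        for t in tracks:
            not_b, and_b, _ = deflector(t, b)
            nxt.extend([not_b, and_b])
        tracks = nxt
    return tracks

out = minterm_tracks([1, 0, 1])       # inputs (a, b, c) = (1, 0, 1)
assert sum(out) == 1                  # exactly one minterm track holds the bubble
assert out.index(1) == 0b101          # the track for the minterm a AND NOT-b AND c
```

Note that the control stream emerges from `deflector` unchanged, which is precisely what removes the need for fan-out.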
FIG. 17. The static deflector.
3.5 Pipelined Array Logic
A combination of the two types of deflectors leads to a Boolean logic array. In the first stage, the input streams are resolved via dynamic deflectors into minterm tracks. For a given set of inputs, exactly one minterm track contains a bubble; all other tracks carry voids in corresponding positions. The next stage consists of a programmed set of static deflectors to deflect the unwanted minterm contributions, so that only the desired subset of minterms contributes to the final result. This scheme, shown in Fig. 18, is a simplification of an earlier one reported by Chen et al. (1975), which uses the EXCLUSIVE-OR to generate the minterms and requires many more bubble logic elements. Both schemes, however, share the same characteristics of a repersonalizable, pipelined, Boolean logic array. This array represents a general solution of the systematic bubble Boolean logic problem, at least in principle.

3.6 Bubble Logic Systems
Boolean logic devices can combine to form entire systems. Notable in the literature are the discussion of an all-bubble computer system by Minnick et al. (1972b), and the serial arithmetic system examined by Williams (1977). We should also mention the proposed resident bubble logic system of Garey (1972), which performs logic without moving bubbles across cell boundaries. The idea is very close to cellular logic, the bubble implementation of which has been described in a patent issued to
FIG. 18. A repersonalizable, pipelined Boolean logic array.
Bobeck et al. (1970). Takahashi and Kohara (1975) have designed a bubble-based symbol-string pattern recognition system exploiting the variability of the in-plane drive field.
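As an aside, the two-stage array of Section 3.5 (Fig. 18) can be sketched in software; the function name and minterm encoding below are illustrative assumptions, not from the original:

```python
def pla(bits, selected_minterms):
    # Stage 1: dynamic deflectors resolve the inputs into 2**m minterm
    # tracks; exactly one track carries the bubble.
    m = len(bits)
    index = 0
    for b in bits:                    # track index is the input word, MSB first
        index = (index << 1) | b
    tracks = [1 if i == index else 0 for i in range(2 ** m)]
    # Stage 2: programmed static deflectors discard unwanted minterms;
    # surviving bubbles merge into the final result.
    survivors = [t for i, t in enumerate(tracks) if i in selected_minterms]
    return int(any(survivors))

# "Personalize" the array for XOR of two inputs: minterms 0b01 and 0b10.
xor_minterms = {0b01, 0b10}
assert pla([0, 1], xor_minterms) == 1
assert pla([1, 1], xor_minterms) == 0
```

Repersonalizing is just changing the `selected_minterms` set, which in the bubble device means reloading the resident bubbles of the static deflectors.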
3.7 Boolean Logic and Bubbles

It is clear that any form of Boolean logic can be performed on bubble information in the "void-bubble" representation. This logic is naturally pipelined, and can be completely systematic and freely repersonalizable. Further, bubble logic is nonvolatile, and can tolerate indefinite power stoppage with no ill effects after resumption. A rule of thumb for a simple bubble logic connective is four times the memory bit area; this is comparable to the space requirement for semiconductor Boolean logic. However, bubble logic is far slower than semiconductor logic, and the strict geometric constraints demand careful design and implementation. Only under special circumstances can bubbles replace semiconductors in logic. This happens within systems based on bubble memories, where the pipelined bubble logic can synchronize naturally with the data stream. The use of bubble logic here is usually more economical than the alternative of converting into semiconductor electronics, performing the logic, then converting back. Extensive conversion would mean extra I/O pins on the bubble chip. But the most important concern is the very large chip area needed for bubble detection (Section 1.4); this naturally precludes indiscriminate conversions. There is a different form of logic, namely the steering of information flow based on judicious switching. As most bubble devices involve constantly moving bit patterns, it is simple and effective to direct the traffic in real time, to achieve the goal of information processing, without leaving the bubble medium. The control can still reside in fast semiconductor electronics, but the requirement for bubble detection can be held to a minimum. The sections that follow are devoted to different aspects of the logic of data-stream steering.

4. Steering of Bubbles for Text Editing
The handling of textual material has become an overwhelming necessity in our modern society. It is natural that the power of new digital logic be directed to speed up document generation and editing, to relieve the tedium of retyping. Present-day text-processing equipment usually includes a keyboard for
text generation, some type of display (printed paper or optical screen) for user scrutiny, a magnetic storage medium to contain the text, and semiconductor electronics to change the text to suit the user. While comprehensive computer power can be tapped from a central computer system, considerations of convenience, privacy, and the new LSI economics favor local logic and storage, at least for simpler editing tasks. Magnetic bubbles can combine both storage and logic in the same bubble medium, indeed even on the same chip. The bubble text being in constant motion, its flow can easily conform to the sequencing of most alphabetic languages. Most editing requires simple information reshuffling, often trivially achieved by steering the text flow at strategic junctions. Backward flow is also possible, though probably at extra cost. The requirement for logic speed is modest, just enough to keep up with human reaction time during the editing phase. Nonvolatility of information at all levels, often considered a necessity, is natural with bubbles. The bubble memory-cum-logic chip, being nonmechanical, tends to cost less per bit than even the low-cost floppy disks, and is just as nonvolatile, ready to be detached for security or transportation. It should be said, however, that for simple removable storage requirements, a bubble chip module is more expensive than a floppy disk surface. In text processing, the unit of information is a character, which occupies a "byte" of seven or eight information bits, plus perhaps a check bit. In the following, we shall take a byte to be eight information bits plus a parity bit. The bubble information streams being bit-oriented, it is convenient always to take a bundle of nine bit streams together to form a "byte stream." The important text-editing operations fall into two distinct types: character manipulation within a given line of text, and line-and-page management over the whole "volume." These will be discussed separately.
4.1 Character Manipulation
The use of bubbles for character manipulation has been studied by Lee et al. (1974). Figure 19 shows a sketch of their design. It uses a system of nine identically constructed bubble loops working in unison, so that a bit slice represents a character. The loop usually contains at least as many bits as the number of characters allowed in a line, which is about 60 for typewritten text on a letter-sized page. At one end of the loop assembly is a specially constructed region containing a port position serving as a one-character input-output port for the entire loop. The normal mode of data movement involves the port
FIG. 19. Character manipulation.
as part of the data path. Such a movement over one bit cycle shall be called a global shift, symbolized by G. With current flowing through the conductors, a different type of data motion occurs. The port bit is "frozen" and bypassed by the rest of the data. The application of this bypass mode over one bit time shall be called a bypass shift, symbolized by B. In addition, each of the port bits in the nine-loop assembly can be changed via external contact. Such a change, altering the port character, is designated by C; it can take place within one bit time, and can be concurrent with a bypass shift. The character manipulations can now be described in terms of the operators G, B, and C, operating on a file F which happens to be a character string closing upon itself. We have

G^N = B^(N-1) = I, the identity operator.
The operators obey the associative law:

P(QR) = (PQ)R = PQR.

We note also that C and B, being independent activities, commute:

CB = BC,

in the sense that the resultant file is the same. The "logical line" meaningful to the user is a contiguous string of characters; it should be strictly shorter than the "physical line" of N characters involved in the global shift. A logical line is said to be "normalized" if it is contained entirely in the bypass loop, with the leading character at the top left position, one unit distance from the port. The line then is ready to move either towards the port in a global shift, or to the right in a bypass shift. During manipulation, the logical line may move along the bypass path, with the port character always poised for insertion between the two top characters of the bypass loop. This insertion takes place automatically whenever the mode changes from bypass to global. The resultant line then follows the global path. Conversely, while the logical line is moving along the global path, an act of deletion occurs automatically when the mode is altered from global to bypass. The port character is "frozen" while the rest of the line moves ahead. The transition between the two modes thus has the effect of altering the number of characters in the logical line, except when the end of the logical line has already passed either the top left or the port positions. This latter situation always occurs when the logical line is normalized, but it can happen sooner. The normalized state can always be reached by simple waiting. But normalization is often unnecessary if repeated operations are needed on the same line. In any case, even for the slow-moving bubbles, it takes less than a millisecond for a 60-character line to pass through any bit position; this is very much shorter than human reaction time, measured in hundreds of milliseconds. In the following, the indices i, j are taken to be nonnegative integers. The head character of a line is the zeroth character. Assuming that all unwanted characters are properly encoded as "garbage" and put at the end of the physical line, the following operations can be done easily on normalized lines:
a) Change the top character: CF.
b) Move the jth character to the port: G^(j+1)F.
c) Move the jth character to the top left: B^jF, or G^jF.
d) Insert a new character in front of the jth character, and normalize: G^(N-j-1)B^jCF.
e) Delete the jth character, normalize, and insert the "garbage character" at line-end: CB^(N-j)G^jF.
f) Exchange the ith character with the jth (i < j): G^(N-j-1)B^iGB^(j-i-1)G^iF.
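A minimal software model of the operator algebra above (illustrative only: Python lists stand in for the bubble loops, with index 0 as the port and index 1 as the top left):

```python
def G(f):
    # Global shift: every character advances one step through the port (index 0).
    return f[1:] + f[:1]

def B(f):
    # Bypass shift: the port character is frozen; the bypass loop advances.
    return f[:1] + f[2:] + f[1:2]

def C(f, ch):
    # Change: overwrite the port character via external contact.
    return [ch] + f[1:]

def power(op, k, f):
    # Apply an operator k times (the exponent notation of the text).
    for _ in range(k):
        f = op(f)
    return f

N = 8
f = ['*'] + list("abcdefg")          # normalized line "abcdefg"; garbage '*' at the port
assert power(G, N, f) == f           # G^N = I
assert power(B, N - 1, f) == f       # B^(N-1) = I
j = 2
assert power(G, j + 1, f)[0] == 'c'  # operation (b): G^(j+1) brings the jth character to the port
assert power(B, j, f)[1] == 'c'      # operation (c): B^j brings it to the top left
```

The identities G^N = B^(N-1) = I fall out of the model because G rotates all N positions while B rotates only the N-1 bypass positions.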
These operations are simple and effective. While the mechanism handles only one character at a time, cascaded operations can often be combined for speed, though the latter is not a critical concern in user-oriented text processing.

4.2 Line and Page Management
Lee and Chang (1974a) have described an all-bubble text-editing system; a modification is shown in Fig. 20. The system holds a volume of up to N = 2^n pages in the passive storage, controlled at the interface by two single-throw, multiple-pole switches S1 and S2. S1 is linked on the left by wired-ORs to a write decoder, and can have two states, "up" or "down." S2 has two states, "up" or "right," and is connected to a read decoder on the right, and the active storage beyond. (A decoder is a gating device with n processing stages, to make a selection based on an n-bit address. A write decoder directs a data source to a selected target track. A read decoder directs one data source out of many towards the output; see Section 7.1.) When S1 is down and S2 is up, the pages in the volume are held in their respective loops. This is the idle state for the volume. With both switches up, the entire volume is linked into a giant, moving character string, with the first page leading. The entire volume can be read out, and meanwhile can be displaced by a new volume. With S1 up and S2 pointing to the right, all pages are linked individually to the read decoder. By current selection, exactly one of the N pages is chosen to enter the active storage. The disqualified pages are channeled back into their own loops. While the one selected page moves to the active store, a new page can enter via the write decoder to take its place
FIG. 20. A text-editing system.
through the wired-OR, with no fear of conflict. Thus the page selection can be used for selective replacement. The active storage is divided into upper and lower subpages, controlled independently by single-throw multipole switches S3 and S4, so that either subpage can be in the idle mode or the streaming mode. In the streaming mode, the upper subpage streams towards a read/write station, and the lower subpage streams away from the same station. When both subpages are set to idle, the active storage is in a waiting state. When both are streaming, the subpage boundary is being shifted constantly. During this synchronous streaming, the information read by the read/write station can be written back immediately, or entirely new information can be read in, to produce synchronized replacement at the subpage boundary. When the upper page alone is streaming towards the read/write station, deletion occurs automatically at the subpage boundary. When the lower page alone is streaming, away from the read/write station, new information is inserted at the subpage boundary. Thus the all-bubble editing system performs most of the editing functions needed for text editing, except for data permutation, which can be provided by coupling the read/write station to a character management mechanism.
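The streaming modes of the active store can be mimicked with a small sketch (names and the string model are my assumptions, not from the original):

```python
def stream(upper, lower, up_on, low_on, new=None):
    # One character time of the active store: the upper subpage streams a
    # character into the read/write station, the lower subpage streams one
    # out of it. Both on: replace/copy at the boundary; upper only: delete;
    # lower only: insert.
    upper, lower = list(upper), list(lower)
    ch = upper.pop() if up_on and upper else None   # character reaching the station
    if low_on:
        lower.insert(0, new if new is not None else ch)
    return "".join(upper), "".join(lower)

# Delete the character at the subpage boundary:
assert stream("abc", "def", True, False) == ("ab", "def")
# Insert a new character at the boundary:
assert stream("abc", "def", False, True, "X") == ("abc", "Xdef")
# Synchronous streaming shifts the boundary, rewriting or copying as it goes:
assert stream("abc", "def", True, True) == ("ab", "cdef")
```

The point of the model is that insertion and deletion need no data permutation at all, only a choice of which subpage streams.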
5. Storage Management
One of the most important features of a computer system is its systematic control over large amounts of coded information. A typical computer installation today contains billions of bytes of information distributed over a number of devices, from the very fast registers, through a random access memory, to disks and tapes. Over the years, it came to be known that the pattern of computer access to memory information is not uniform. While all data accessible to the machine may have equal a priori probability of being needed, at any given time the computer CPU tends to concentrate on a slowly varying subset. Once a memory item is needed, there is a high probability that it will be needed again very soon; in other words, this memory item is probably a member of the frequently needed subset. This phenomenon is known as the locality of reference (Belady, 1966); its exploitation leads to the eminently practical discipline of storage management, aiming at making the subset (the working set) more accessible (for analysis and details, see Mattson et al., 1970; Coffman and Denning, 1973). The management of bubble storage will be discussed in the following.
5.1 Storage Hierarchies

The challenge in storage management is to move storage data without the benefit of detailed knowledge of the computation, yet succeed in a significant reduction of the average access time over the heterogeneous memory. One of the simplest assumptions to make is that the most recently used (MRU) memory item (which shall be called a record) has the highest probability of being needed again in the immediate future; also, the least recently used (LRU) record has the lowest. This assumption has proven very effective for a two-level hierarchy consisting of a large main memory, and a cache memory an order of magnitude faster, but also an order of magnitude smaller. The two-level management scheme ensures that the most recently used record is in the cache. If it is already in the cache, no special action need be taken; otherwise it should be promoted, i.e., brought in from the main memory. As the cache capacity is fixed, a promotion must be accompanied by a corresponding demotion of another record from the cache to the main memory. The candidate for demotion is the least recently used (LRU) record in the cache, which has probably outlived its usefulness there anyway (see Fig. 21a). This scheme is very effective and deservedly popular. Simulated results show that most accesses have a 98% probability of referring to the cache (Gibson, 1974). Thus the entire memory complex, with ten times the cache capacity, has an average access time only fractionally greater than the raw cache access time. Without storage management, the average access time would be close to the main memory access time, an order of magnitude greater. The actual algorithm and its implementation involve extensive table management and much educated guesswork in real time. The treatment of the store operation is also a major concern. At the present time, two hierarchies are being managed separately, each with two levels.
These are the "cache-random access memory" hierarchy, mentioned earlier, and the "memory-disk" hierarchy. These two hierarchies employ different data block sizes and quite dissimilar algorithms. It is hoped that the electronic disks, typified by magnetic bubbles, will bridge the storage gap, and allow a smooth-running multilevel hierarchy. For serial memories, a multilevel hierarchy makes sense within the same storage technology, indeed, within the same storage device. Consider a storage mechanism holding a file of N equal-length records, one on top of the other, their sole link to the outside world being one input-output port at the top. A record is said to be at level d if it is the
FIG. 21. Memory hierarchies. (a) Two-level. (b) Multilevel.
dth record from the top. The topmost record is at level zero (see Fig. 21b). For the storage structures of interest here, a record at level d has an access time proportional to d. When this record is needed, it becomes the most recently used (MRU) record, and is probably the record in greatest demand for the short term. It should move to the top to be accessed, and further should stay on top afterwards. The intervening records are assumed to have decreased in importance uniformly, and each is to be displaced downward by one record level. This data movement shall be called "topping from a depth of d," meaning the access of a record from level d to the top, with the concurrent demotion of each intervening record by one level. It is interesting that there is no need for extensive bookkeeping to identify the LRU record; it is naturally at the bottom, just as the MRU record is always at the top. In the following we shall call the time needed to move a record past a given marker position a record period, or a period for short.

5.2 Dynamic Reordering
Beausoleil et al. (1972) have proposed a multilevel storage management scheme based on the stoppage and reversal of the magnetic driving field.
FIG. 22. Dynamic reordering operators G and R.
A similar technique is given by Bonyhard and Nelson (1973). We shall describe Beausoleil's scheme briefly. Consider a system of field-driven bubble memory loops, each with one access port and a means to bypass the access port (see Fig. 22). Between accesses, the drive field is reset to zero, and all data movement is halted. This way all records maintain their respective distances from the top. Whenever there is a need to top the record from a depth of d, the drive field is set for forward data movement, through the access port. The desired record reaches the access port in exactly d periods. Immediately afterwards, the drive field is reversed; this causes reverse data movement, bypassing the access port. Meanwhile the data at the port is held by a local current-generated magnetic field. Exactly d record periods later, the desired topping operation is complete. The record at the access port is copied into the CPU, and the drive field is reset to zero, until the next access. The operations involved can be symbolized by two data-rearrangement operators, G and R, operating on the file F. Their pictorial representation is given in Fig. 22. G (global shift) represents a shift for one record period in the forward direction along the long path which includes the access port. R (reverse shift) represents a shift in the reversed direction along the shorter path involving the bypass, also for one record period. These operators, dealing with permutations of objects, are known to be associative, but not necessarily commutative.
For an N-record file, it is clear that

G^N = R^(N-1) = I, the identity operator.
The topping of a record from a depth of d in a file F is achieved by

T(d)F = R^d G^d F,

as can easily be verified. Beausoleil's scheme actually uses an asymmetric bubble switch, with a one-bit memory along the forward path, which is bypassed on reversed data movement. As this is a one-bit bypass scheme, the bypass of an entire record of m bits is accomplished by using m loops, each of N bits, so that a bit slice of all loops represents one record. The detailed scheme also stores an address field within each record, to allow searches based on real-time matching of addresses. In this bit-slice format a record period becomes one bit cycle. This high-speed access is paid for by the need for m bubble detection mechanisms, very wide data channels, and very long data buffers. The field stoppage and reversal further call for careful chip design, especially for the switches. Clearly, all chips under the same drive field are affected equally. There is no effective magnetic shielding technique to allow two different modes of field-driven data movement to coexist. This is believed to be a serious restriction. All parts of the chip must be designed to change drive modes at the same time. In the next section we shall describe a data-steering switch for storage management requiring neither field reversal nor bit slicing, thus minimizing design and fabrication cost.
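A quick software check of the topping formula (an illustrative model of my own: a list with index 0 at the access port, index d at level d):

```python
def G(f):
    # Forward drive, one record period: records advance through the access port.
    return f[1:] + f[:1]

def R(f):
    # Reversed drive, one period: the port record (index 0) is held by a
    # local field while the rest of the file moves backwards along the bypass.
    return f[:1] + f[-1:] + f[1:-1]

def T(f, d):
    # Topping from depth d: T(d)F = R^d G^d F.
    for _ in range(d):
        f = G(f)
    for _ in range(d):
        f = R(f)
    return f

f = ["r0", "r1", "r2", "r3", "r4"]
# r2 moves to the top; r0 and r1 are each demoted one level; r3, r4 stay put.
assert T(f, 2) == ["r2", "r0", "r1", "r3", "r4"]
```

The model also confirms the identities: G is a rotation of all N positions (G^N = I) and R rotates only the N-1 bypassed positions (R^(N-1) = I).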
5.3 A Steering Switch in Two Forms
Tung et al. (1975) have described a bubble switch with the following properties (see Fig. 23):
a) It has two input tracks (A, B) and two output tracks (C, D);
b) It operates in two modes:
   Off Mode (0): A, B are linked to C, D, respectively, with no path crossing;
   On Mode (1): A, B are linked to D, C, respectively, resulting in crossing.
A bubble implementation is sketched in Fig. 24, where the Off Mode calls for local control by current.
FIG. 23. A steering switch. (a) Switch. (b) Avoidance (off). (c) Crossover (on).
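In software, the switch is a one-line crossbar (the function name below is my own):

```python
def steering_switch(mode, a, b):
    # Mode 0 (off): inputs (A, B) pass straight through to (C, D).
    # Mode 1 (on): the paths cross, so (A, B) emerge as (D, C).
    c, d = (a, b) if mode == 0 else (b, a)
    return c, d

assert steering_switch(0, "A-data", "B-data") == ("A-data", "B-data")
assert steering_switch(1, "A-data", "B-data") == ("B-data", "A-data")
```

All of the storage-management structures that follow are built by wiring such two-by-two exchanges between loops.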
There is an alternative form of the switch, controlling the choice between two orthogonal flow patterns, neither of which involves path crossing (see Fig. 25). The implementation (Chang and Cohen, 1976) is sketched in Fig. 26. We note that Fig. 25 can be obtained conceptually from Fig. 23 by holding the top half of the picture while rotating the bottom half out of the plane of the paper by 180°. The following discussions shall be in terms of the first type of switch, with the understanding that simple topological transformations can map one into the other.

5.4 Linked Loops and the Ladder
We shall now use a single steering switch for the purpose of controlling the bypass of entire records. Consider the two-loop arrangement in Fig. 27. The top loop has the capacity of exactly one record, while the bottom loop holds N - 1 records. When the switch S is set to 1 for one record period, a global shift is achieved, corresponding to the operator G in Section 5.2. With S set to 0 for one period (but without any field reversal), the data moves along the bottom loop, while the top record circulates within the upper loop. The operator corresponding to this operation shall be designated by B (for bypass shift).
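A list model of this two-loop manager (my sketch; index 0 holds the top-loop record). It demonstrates that d global shifts followed by N - d - 1 bypass shifts top the record from depth d, at a constant cost of N - 1 periods:

```python
def G(f):
    # S = 1 for one period: a global shift around both loops.
    return f[1:] + f[:1]

def B(f):
    # S = 0 for one period: the top record circulates in place while the
    # bottom (N-1)-record loop shifts -- no field reversal needed.
    return f[:1] + f[2:] + f[1:2]

def top_two_loop(f, d):
    # Topping from depth d: B^(N-d-1) G^d, always N-1 periods in total.
    N = len(f)
    for _ in range(d):
        f = G(f)
    for _ in range(N - d - 1):
        f = B(f)
    return f

f = ["r0", "r1", "r2", "r3", "r4"]
assert top_two_loop(f, 2) == ["r2", "r0", "r1", "r3", "r4"]
```

The constant (N - 1)-period cost, regardless of d, is exactly the weakness the ladder structures below this point are designed to remove.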
FIG. 24. Implementation of the steering switch.
FIG. 25. A noncrossing steering switch. (a) Switch. (b) Horizontal flow (off). (c) Vertical flow (on).
We note that the scheme for character manipulation (Section 4.1) is a bit-slice analog of the double-loop arrangement here. The record there is a nine-bit byte, and the switch is different. The topping from a depth of d can be done using

T(d)F = B^(N-d-1) G^d F,

and the total time required, found by summing the exponents of the operators, is a constant, namely (N - 1) periods, for all but the trivial case with d = 0. While this scheme enhances immediate reaccess, the sizable constant cost of topping fails to improve the access, except that from the very top. The situation is aggravated by the fact that, even if no special attempt whatever were made for LRU demotion, the average access would only take N/2 periods. By mentally twisting the bottom loop, and by installing steering switches at loop boundaries, we obtain a multiloop configuration called a ladder. The ladder in Fig. 28 has a small top rung holding exactly one record; each of the lower rungs has twice the capacity. The original switch has been renamed S0. The additional switches are collectively designated {S_j} and are controlled uniformly to minimize control cost. This structure,
FIG. 26. Implementation of the noncrossing steering switch.
FIG. 27. A two-loop storage manager.
with four distinct switch settings, shall be called a four-state ladder. We note incidentally that this ladder holds an odd number of records. The four possible switch settings lead to the four processing patterns shown in Fig. 29. Two of these, with {S_j} set to 1, are essentially the G, B operators as before (with redefined time periods), but two extra operators are now available, corresponding to {S_j} = 0, adding greatly to the topping efficiency. With S0 = 0 in addition, the left and right sides of the large loops are exchanged in one period; this is designated by the operator X (for exchange). The same situation with S0 = 1 also leads to a left-right exchange, except that the three records in the top two loops now join in a three-way rotation by one record length. The corresponding operator is designated D (for delta exchange). We can derive

G^N = B^(N-1) = X^2 = D^6 = I.
If the ladder is set in the X state, every two periods will see a repetition
FIG. 28. A four-state ladder.
FIG. 29. Operations involving the four-state ladder.
of the same data arrangement. This idle feature alone can hold information in place without invoking drive field stoppage. In addition we have

G^j = X G^(N-j) X,
B^j = X B^(N-j-1) X,
D = BXG.

The last expression means that the successive application of G, X, and B can be replaced simply by a single delta shift. A link with Beausoleil's R operator can now be established:

R^j = X B^j X.

Then the topping from a depth of d can be done by

T(d)F = R^d G^d F = X B^d X G^d F = X B^(d-1) D G^(d-1) F,

appropriate for d < N/2. For d > N/2, (N - d) is smaller than d itself, and the simulation of reverse shifts via the exchange operations will be more economical. We have then

T(d)F = R^d G^d F = (X B^d X) G^d F
      = (X (X B^(N-d-1) X) X)(X G^(N-d) X)F
      = B^(N-d-1) X G^(N-d) X F
      = B^(N-d-2) D G^(N-d-1) X F.
The last expression is beneficial for N/2 < d < N - 1. For d = N - 1, however, the formula leads to a negative exponent, and the expression just above it is more proper. The corresponding cost in time, measured by summing the exponents on the operators, gives

time delay = 2d periods,           for d < N/2;
           = 2N - 2d - 1 periods,  for N - 1 > d > N/2;
           = 3 periods,            for d = N - 1.
These operations have been shown to be optimal by Wong and Coppersmith (1976). Throughout the topping operation, the drive field can be set to rotate in the forward direction. Field stoppage and reversal are not required. The discussion above involves a single ladder. Actually a number, say k, of identical ladders can act in unison; the record period will then be reduced proportionately.

5.5 The Uniform Ladder
We now consider a ladder with all loops equal, and with all switches capable of being set independently. This data structure is called a uniform ladder by Chen and Tung (1976). Each loop in the uniform ladder holds exactly one record. It is convenient to specialize to the case of an even number of bubble bits per loop. The uniform ladder in Fig. 30 can behave like the ladder described in the last section if all odd-labeled switches are set to 1 (crossing mode), and if all even- (but nonzero-) labeled switches act in unison. The freedom available through independent switch settings shall now be exploited. Let us start with a record in each loop. By setting all switches to zero, all records circulate within their own loops and exactly the same configuration recurs at the beginning of each period; this is the idle state, achieved without any field reversal. This idle state recurs twice as frequently as that of the four-state ladder discussed earlier. Now let us set one switch, say S_j, to 1. Records will flow across the switch simultaneously, resulting in the complete exchange of data between the two loops linked by the switch. It so happens that topping from a depth of d can be accomplished using a succession of record exchanges, each exchange moving the desired record one rung closer to the top loop. By the time this record has
FIG. 30. The uniform ladder.
reached, and coiled within, the top loop, all the intervening records will have been demoted properly, by one loop position. This way topping is done by d exchanges, each taking precisely one period. The total time of d periods is half the time required by the four-state ladder in the last section, for the important case of d < N/2. Again one can employ k uniform ladders in parallel; then each ladder will feature much smaller loops and correspondingly shorter periods. It turns out that this topping time is still not optimal, and can almost be halved again. The best way to top is by setting switch S_(d-k) to 1 at the kth half period (rather than at the kth period as called for in complete exchange). A switch, once set to 1, should still be held at 1 for exactly one period. This way the desired record slithers to the top, then coils up in the top loop, in exactly (d + 1)/2 periods. A snapshot sequence of the slithering is given in Fig. 31. This speed-up has a simple geometric explanation. During slithering the traveling record touches only one of the two arms of every intervening loop, but complete exchange would require traversing both arms. Hence slithering is twice as fast for all but the top loop. The final coiling
TABLE III
COMPARISON OF TOPPING SCHEMES

                                Dynamic    Character     Uniform ladder  Uniform ladder  Twin     Nonuniform
Topping schemes                 ordering   manipulation  (exchange)      (slither)       loops    ladder
Available operators             G, R       G, B          S_j = 0, 1      S_j = 0, 1      G, B     G, B, X, D
Topping delay (periods)         2d         (N - 1)d      d               (d + 1)/2       2d + 1   (N - 1)d
Definition of period (cycles)   1          1             m/k             m/k             m/k      m/k
Data width (number of loops)    m          m (= 9)       k               k               k        k
Figure of merit (width x delay) 2md        (N - 1)md     md              m(d + 1)/2      2md*     (N - 1)md
Field stoppage and reversal?    yes        no            no              no              no       no
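The exchange and slither entries of Table III can be sanity-checked with a small simulation. The following is a hedged sketch in Python, modeling loops as list slots rather than device physics; the function names `top_by_exchange` and `slither_delay` are ours, not the authors'.

```python
def top_by_exchange(ladder, d):
    """Top the record at depth d by d successive complete exchanges,
    each exchange (one period) moving it one rung closer to the top
    loop while demoting the intervening record by one loop position."""
    ladder = list(ladder)
    periods = 0
    for depth in range(d, 0, -1):
        # one period: complete exchange between loops depth-1 and depth
        ladder[depth - 1], ladder[depth] = ladder[depth], ladder[depth - 1]
        periods += 1
    return ladder, periods

def slither_delay(d):
    """Periods for slithering: the record reaches and coils in the
    top loop in (d + 1)/2 periods, roughly half the exchange time."""
    return (d + 1) / 2
```

For example, topping depth 2 in the ladder A B C D by exchanges yields C A B D after 2 periods, while slithering would need only 1.5 periods.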
FIG. 31. Topping by slithering.
takes another half period. The two-fold speed-up via slithering turns out to be quite general; another example will be given in the next section. A comparison of the various topping schemes is given in Table III. In general, the dynamic ordering scheme is fast but requires field stoppage and reversal. For m-bit records, the requisite bit-slice approach calls for m detectors, as well as field reversal and stoppage. The ladder schemes allow a steadily rotating field with neither reversal nor stoppage. k ladders can operate in parallel if k is a factor of m. The number of loops times the topping delay gives a figure of merit which perhaps captures the essence of the schemes, and allows the trade-offs to be discussed separately. It is, however, important to note that the use of too many ladders in parallel will result in too few bits per loop, and the cost of the switches may then become noticeable.

6. Sorting
One of the most important uses of computer equipment is sorting, namely rearranging records in a given file in linear order, by a prescribed
TIEN CHI CHEN AND HSU CHANG
criterion. The latter usually is based on the values of a particular key field in each record. It has been estimated that 25% of all large machine time is spent on some kind of sorting (Knuth, 1973, p. 3). It is therefore interesting to see how sorting can be accomplished using magnetic bubbles. For the sorting task, the bubble and semiconductor technologies are used together. Bubble ladders are used to rearrange data in multiple loops simultaneously. Fast semiconductor circuits are used to perform the comparison function on the key fields of the records, and to determine the strategy of the complex data movement of the bubble records, at a speed too fast to be noted by the bubble devices. This way bubble sorting can exploit multiprocessing techniques while under unified semiconductor control.

6.1 The Odd-Even Transposition Sort
A technique for systematic sorting by multiprocessing was proposed many years ago, but has seen little actual use. This is the odd-even transposition sort scheme (OETS), which guarantees sorting in linear time, rather than (N log N) time (Knuth, 1973, pp. 241, 640). We first note that, in a linear chain of records, there are exactly two ways to pair up the neighboring records, to cover all available records except possibly at the two extremes. We shall call these two pairing schemes even and odd. For example, if the records are ABCDEF, then the two pairing schemes are
(AB) (CD) (EF)

and

A(BC) (DE)F,

respectively. Note that one technique will pair up two records which in the other technique would belong to different pairs. OETS uses the two pairing techniques in alternation. Within each pairing scheme the two operands in every pair are compared and rearranged if need be, to achieve a prescribed local ordering. The theorem then states that:
N pairing-permutation stages suffice to guarantee the complete sorting of N records, whether one starts with an odd stage or an even stage. The sorting of N = 6 numbers starting with either stage is shown in Fig. 32. It is clear that here no more than six stages are needed, in contrast to the ten stages using a single comparison mechanism.
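The alternating pairings and the N-stage guarantee can be rendered in a few lines of Python; this is our illustrative sketch, not a bubble implementation:

```python
def oets(records, start_odd=False):
    """Odd-even transposition sort: N pairing-permutation stages,
    alternating the even pairing (0,1)(2,3)... and the odd pairing
    (1,2)(3,4)..., sort N records whichever parity comes first."""
    a = list(records)
    n = len(a)
    for stage in range(n):
        first = (stage + (1 if start_odd else 0)) % 2
        # pairs within a stage are disjoint, so the compare-exchanges
        # could all be performed concurrently
        for i in range(first, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```

For instance, `oets([5, 4, 3, 2, 1, 0])` returns `[0, 1, 2, 3, 4, 5]` after exactly six stages, matching the N = 6 example of Fig. 32.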
FIG. 32. The odd-even transposition sort.
Using the OETS algorithm in the form given, the time saving is gained at a tremendous cost of equipment. For N records to be sorted, the straightforward implementation of OETS calls for N(N - 1)/2 comparators, storage for N^2 records, as well as N input and N output ports. This large investment actually brings pipelined processing power: upon repeated usage, the system can deliver a complete set of sorted output at the end of every comparison-exchange period. On the other hand, this pipelining capability is seldom needed.

6.2 Folding the OETS Scheme
It is possible to economize on hardware via repetitive use (for details, see Chen et al., 1978a). Briefly, the N comparison stages can be folded into one physical stage, to be used N times. The N input ports and N output ports can be reduced to one each. The net result is a structure for which the data movement can be carried out in an N-loop uniform ladder, with the switch settings determined by a fast arithmetic-logic unit capable of performing up to N/2 comparisons in one record period. After the folding, the resultant equipment still sorts N records in N stages, and each stage takes exactly the same time as before. The pipelined throughput has been traded for hardware economy. The folded sorting engine does inflate the loading and unloading time, because the operands now need to travel the length of the ladder before reaching the starting configuration for OETS sorting. An implementation is sketched in Fig. 33. The ladder is loaded externally, with the key fields either stripped or copied into a fast memory,
FIG. 33. A ladder-based sort system.
where they are permuted by the OETS technique using semiconductor electronics. At the end of each key permutation stage, control bits are issued to the steering switches, which in turn control the movement of the bubble records. A single ladder suffices to contain all records during the permutation, and all exchanges of neighboring records within a stage can be done concurrently, with no interference. OETS by explicit exchanges would require up to N periods to complete; but again slithering halves this time to (N + 1)/2 periods, as long as semiconductor electronics can decide on N/2 comparison-exchanges in a half-period. An example of the slithering OETS sort is given in Fig. 34.
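The folded arrangement of Fig. 33 can be sketched as follows, a hedged Python model rather than the actual hardware: the key copy is permuted "electronically," and each compare-exchange decision is echoed as a switch setting that exchanges the corresponding bubble records. The function and parameter names are ours.

```python
def folded_oets_sort(records, key=lambda r: r[0]):
    """One N-loop ladder used N times: keys live in fast memory,
    records stay in the ladder and follow the switch settings."""
    keys = [key(r) for r in records]    # key fields copied to fast memory
    ladder = list(records)              # bubble records in the ladder loops
    n = len(ladder)
    for stage in range(n):              # N stages on one physical stage
        for i in range(stage % 2, n - 1, 2):
            if keys[i] > keys[i + 1]:   # fast electronic comparison
                keys[i], keys[i + 1] = keys[i + 1], keys[i]
                # control bit to the steering switch exchanges the records
                ladder[i], ladder[i + 1] = ladder[i + 1], ladder[i]
    return ladder
```

For example, `folded_oets_sort([(3, "c"), (1, "a"), (2, "b")])` yields the records ordered by their key fields.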
6.3 Sorting within Input/Output Time

The only unsatisfactory aspect of our sorting scheme is that it is too fast compared to the load-unload overhead of N periods each. The sorting takes (N + 1)/2 periods, only 20% of the total time of (5N + 1)/2 periods. One could argue that sorting in the future will be applied directly in memory and no load-unload time need be expended. Before this time, however, one may consider doing the next best thing: namely, if one must be charged with load-unload time, then the sorting time can be hidden partially or completely.
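The 20% figure is simple arithmetic under the stated model (load N periods, sort (N + 1)/2 periods, unload N periods); a quick check:

```python
def sort_fraction(n):
    """Fraction of total time spent sorting in the folded,
    slithering OETS engine, under the load/sort/unload model above."""
    load, sort, unload = n, (n + 1) / 2, n
    total = load + sort + unload    # = (5n + 1)/2 periods
    return sort / total
```

For large N this tends to 1/5: `sort_fraction(1000)` is about 0.2002.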
FIG. 34. Sorting by slithering.
This can be done using the arrangement of ladders in Fig. 35. Consider the problem of loading the ladders serially, starting from the leftmost ladder. As soon as one ladder has been loaded, sorting begins within it, while the next ladder is being loaded. This way, during the loading of a single ladder, a number of ladders to its left are undergoing sorting internally. The length of each ladder is so designed that it finishes its sorting task within the time needed to load all ladders on its right. When the last ladder has completed its loading, all sorting activity also ceases. Also, for certain ladders, there is no need to sort (e.g., when the ladder has only one loop), and for others, the sorting is so simple that concurrence with loading is trivial; it turns out that a ladder with three loops can satisfy this requirement.
FIG. 35. A multiladder sorter (ladders holding 72, 24, 8, and 3 records).
At the conclusion of all input, all ladders will have completed their sorting. The ladder contents can now be merged by simultaneous readout and electronic merging into one long file, which could also be in bubble form. Semiconductor electronics is fast enough to merge a hundred ladders concurrently, which means more than 3^100 records (for details and related schemes, see Chen et al., 1978a). It turns out that the sorting within input/output time can be done even within a single ladder under a modified sorting scheme. A detailed discussion (Chen et al., 1978b) is outside the scope of the present survey.

7. Information Selection and Retrieval
Magnetic bubbles are expected to play a significant role in storing information. The retrieval of information can be by external attributes (explicit addresses) or internal attributes (content addressing), or can be based on complex queries, requiring further processing and remapping into human-understandable form. We shall consider some information selection and retrieval schemes for the magnetic bubble medium. For magnetic bubble shift registers, the accessing of information by address must exploit the fact that bubble data movement is essentially free of charge. For the major-minor loop memory, the access consists of an electric current applied at exactly the right time, to move the bit-slice nearest the conductor to the major loop. In the situations below, exact timing alone will not suffice. A good strategy is to create an opening for the qualified data to pour forth by themselves.

7.1 Selection by Address
Consider the problem of selecting the contents of exactly one shift register out of N, N = 2^n. The contents of all registers are under the same drive field, ready to move to the right, say. The selection can be made by a gating device called the decoder (see Fig. 36), which puts n processing stages in the path of each register, each stage eliminating half of the remaining contenders, until the desired lone survivor emerges. The stages are energized by electrical current; the collection of n such currents suffices to define an n-bit address exactly. The figure contains N parallel double tracks. Through the decoding control all unqualified data drop into the secondary tracks. This general decoder can serve the purpose of either read selection or write selection. If all N tracks carry the same information, the scheme allows only the intended primary track to carry the data through, achieving the location selection purpose of a write decoder. On the other hand, if the data in all
FIG. 36. A general decoder.
N tracks are potentially different, the scheme will allow only one intended piece of data to cross over to the right, on a primary track, thus achieving the data selection purpose of a read decoder. Actually one can remove the redundant tracks and define specific write decoders and read decoders, to do the job more economically. A write decoder would gate a single piece of source information toward a specific intended track, and a read decoder would gate data from a specific track through a common output port. The decoder principle can be generalized as follows: The selection process directs the data stream into either an active track or an inactive track. Along the active tracks read, write, or other logic functions can take place; these activities can be arranged in sequence, each enabled by a dedicated control line affecting all active tracks. The activation of the kth control line causes the kth type of function to be performed on data being carried on all active tracks. Numerous device designs have evolved to enhance the versatility of the decoder (see Chang, 1975, pp. 89-93, for details). The decoder is efficient, requiring a delay proportional to log N. Its usefulness has already been shown in text-processing (Section 4.2). (For a discussion of the combined use of decoders with major-minor loop memory devices, see Chen et al., 1976.) While Fig. 36 describes a current-controlled decoder, there could also be a bubble-controlled version, with initiating currents depositing deflecting control bubbles, in the same spirit as the static deflector of Section 3.4, memorizing, so to speak, the decoding pattern in bubble form. This would eliminate the large power dissipation needed for the current control, and is certainly a preferred approach in address-based steering of information.

7.2 Associative Search
The most common form of content-addressing consists of examining a given field of every candidate record for a match with a known key. The record with a matching field is selected; in the case of multiple qualified records, a tie-breaking scheme is enforced so that only one is selected. Lee and Chang (1975) described a scheme to encode the key in (reversible) current form, to interact with bubble streams, so that an unqualified record disqualifies itself by dropping a bubble to deflect the rest of the record. An improved version of their scheme is shown in Fig. 37. Here the record bit stream controls a dynamic deflector to deflect a stream of 1's, producing as output both the original and the 1's complement of the record. We now have a copy of the record in two-cell-per-bit encoding. Then the key-encoded current is made to cause a bubble (from either the true or the complement stream) to enter a static deflector control position. There are two deflectors, one from the true stream, the other from the complement stream. These deflectors then automatically deflect the remaining parts of the unqualified record. Using this technique, the selection is made in the real time of a single record streaming pass. All qualified records have the right to cross the associative search threshold, as do all qualified portions of the key fields of the unqualified records. Further steps can be taken to remove
FIG. 37. An associative search scheme (records shown partially and fully screened).
all traces of unqualified records, and to select no more than one qualified record; these will not be elaborated here. The search operation uses one pair of "key lines," independent of the length of the key in question. The same pair of conductor lines can transmit two or more sets of key signals, in well-timed sequence, to achieve the effect of successive screening based on several fields. Searches based on inequalities or other complex criteria can also be honored by sending different conductor signals, and/or by using more conductor lines (Lee and Chang, 1978). The bubble associative search is enhanced by a number of intrinsic advantages. All data contained in a long shift register flow through all points in the register. Therefore logic to be applied to all data need not be replicated and distributed throughout the register; it needs only to be installed at one point along the data path. The cost is a tiny increment to the register cost, and is not proportional to the number of bits in the memory. The considerable saving in cost allows the installation of very sophisticated search techniques. The associative search operation is thoroughly pipelined, and can be performed in flight repeatedly, interspersed with Boolean logic and steering control, as the candidate records converge toward the memory output port. The steady-state data flow rate of the qualified records is not affected by the act of screening.

7.3 Data Base Implications and Intelligent Storage
The term data base connotes huge amounts of data serviced by a complex assortment of software to facilitate user retrieval, system self-preservation, and orderly growth. Magnetic bubbles should have a definite place in future data bases not only as passive memory, but also as part of the retrieval logic. Several data base machines have been proposed, most of them predicated on an associative search capability based on head-per-track rotating devices. Baum and Hsiao (1976) pointed out that simple associative search will not suffice for the future; what will be needed are mechanisms which process data with a knowledge of the underlying data structure. Electronic disks are expected to play a significant role in this regard. Bubbles are already becoming price-competitive with head-per-track devices, and simple associative searches are but one instance of their potential logic power. The ability to perform pipelined Boolean logic, data rearrangement, and sorting will indeed make bubbles a serious contender for the data base structure memory, which in turn is an example of intelligent storage, relieving the burden on the central processor and the paths leading to it. Bubble processing power can be installed almost at
will throughout strategic positions in the shifting memory. The stored data, under constant motion, can be steered through these distributed logic centers for processing during transit. Bubbles thus continue the new trend of combining memory with processing power. In the important area of data base processing, this combination holds the promise of allowing many storage modules to work more or less independently toward the same goal of service to the human community. Two magnetic bubble memory chips have been proposed for relational-model associative search data base applications. The first one (Chang, 1978b) employs bubble ladders in parallel for storage, each connected to an associative search cell, and all search cells are subjected to control by the same set of conductors. The crosslinked loops provide distinct and interchangeable columns. When a decoder is used, all rows are equally accessible. The quick simultaneous search on chip limits the readout to the qualified records only. It has been demonstrated that this chip can perform Cartesian product, union, intersection, difference, projection, join, and restriction operations, and thus has complete selection, update, and interrogation capabilities. The second chip uses more standard bubble memory components. It is based on the major-minor loop memory, with the major loop segmented for possible independent access. It employs semiconductor devices to provide associative search, and uses a marker loop to identify the qualified records by a bit. This marker loop can facilitate successive interrogations, by preventing the unqualified records from participating in the next search operation (for details, see Chang and Nigam, 1978).
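For orientation, the relational operations credited to the first chip can be modeled on sets of tuples. This Python sketch illustrates their semantics only; it says nothing about the chip's bubble-level implementation, and all function names are ours:

```python
def product(r, s):            # Cartesian product of two relations
    return {a + b for a in r for b in s}

def projection(r, cols):      # keep only the listed column positions
    return {tuple(t[c] for c in cols) for t in r}

def restriction(r, pred):     # select the tuples satisfying a predicate
    return {t for t in r if pred(t)}

def join(r, s, rc, sc):       # equijoin on column rc of r and column sc of s
    return {a + b for a in r for b in s if a[rc] == b[sc]}

# union, intersection, and difference are the built-in set
# operators |, & and - on relations represented as sets of tuples
```

For example, with `r = {(1, "a"), (2, "b")}`, `projection(r, [1])` yields the single-column relation of second fields.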
8. Summary and Outlook: More than Memory
In earlier pages we have endeavored to give a coherent account of magnetic bubble technology. Many of the key ideas are fundamentally simple, yet by no means obvious before their discovery. Here is a domain rich in potential, one that can stimulate theoretical analysis and practical applications. Bubble memory is already a reality after a mere decade of development. There is no question that its high density, simple fabrication, low power dissipation, and nonvolatility will create a deserved demand in the marketplace. It would be a mistake to stop here. Bubble logic is a natural next step, and should be just as rewarding. While much slower than its semiconductor counterpart, bubble logic comes into its own in the vicinity of a bubble memory. For there it can
practically be thrown in for free. Electronic logic for the purpose would require considerable cost in bubble generation and detection. Bubble logic requires only simple additional controls, and the pipelined versions only add to path length, without decreasing the data flow rate. Also, the nonvolatility of bubble logic is a definite asset for guaranteed performance in hostile environments. The logic of information steering is still in its infancy. We have sketched its implications for storage management, sorting, and data base needs. Some of the steering techniques, developed for bubbles, can also be applied to other forms of shift-register technology, notably the charge-coupled devices. For text-processing, bubbles can serve as the single technology for processing logic, memory, and portable storage. Traditionally, all nonmemory operations in a computer system are done outside the memory, at the central processing unit. This has created congestion at the data-access channels and much unneeded processing. A case in point is the associative search for records with matched keys, not an uncommon operation in data base processing. With the availability of bubble devices of very large capacity, very low cost, and insignificant power dissipation, truly massive associative search will soon be practicable. It is provocative to consider their role in future intelligent storage, where a small embedded set of well-chosen, personalizable logic tools can eliminate the bulk of unneeded data access.
ACKNOWLEDGMENTS

The authors are grateful to their colleagues, past and present, for making this survey possible. Much of the heretofore unpublished work reported here has been the result of our collaboration with Dr. Chin Tung, now of the IBM Office Products Division, San Jose, California. We acknowledge the stimulation received through conversations with Dr. Ta-Lin Hsu, of the IBM San Jose Research Laboratory, and Professor Share-Young Lee, University of Taiwan. One of us (T.C.C.) benefited from discussions with Dr. Werner Kluge, Gesellschaft für Mathematik und Datenverarbeitung mbH, Bonn, Federal Republic of Germany.

REFERENCES

Anonymous (1977). Magnetic bubble memories, past-present-future. Digital Des. May, pp. 50-66.
Archer, J. L. (1977). Megabit bubble chip. 1977 INTERMAG Conf. Digest (Anaheim, California) Pap. 11-1.
Baum, R. I., and Hsiao, D. K. (1976). Database computers, a step towards data utilities. IEEE Trans. Comput. C-25, 1254-1259.
Beausoleil, W. F., Brown, D. T., and Phelps, D. F. (1972). Magnetic bubble memory organization. IBM J. Res. Dev. 16, 587-691.
Belady, L. A. (1966). A study of replacement algorithms for a virtual storage computer. IBM Syst. J. 5, 78-101.
Bobeck, A. H. (1967). Properties and device applications of magnetic domains in orthoferrites. Bell Syst. Tech. J. 46, 1901-1925.
Bobeck, A. H. (1977). The development of bubble memory devices. ELECTRO 77 Professional Program Pap. 12/1.
Bobeck, A. H., and Della Torre, E. (1975). "Magnetic Bubbles." North-Holland Publ., Amsterdam.
Bobeck, A. H., and Scovil, H. E. D. (1971). Magnetic bubbles. Sci. Am. 229, 78-90.
Bobeck, A. H., Danylchuk, I., Rossol, F. C., and Strauss, W. (1973). Evolution of bubble circuits processed by a single mask level. IEEE Trans. Magn. MAG-9, 474-480.
Bobeck, A. H., Bonyhard, P. I., and Geusic, J. E. (1975). Magnetic bubbles: an emerging new memory technology. Proc. IEEE 63, 1176-1195.
Bobeck, A. H., Scovil, H. E. D., and Shockley, W. (1970). Magnetic logic arrangement. U.S. Patent 3,541,322.
Bonyhard, P. I., and Nelson, T. J. (1973). Dynamic data reallocation in bubble memories. Bell Syst. Tech. J. 52, 307-317.
Bonyhard, P. I., and Smith, J. L. (1976). 68 kbit capacity 16 μm period magnetic bubble memory chip design with 2 μm minimum features. IEEE Trans. Magn. MAG-12, 614-617.
Breed, D. J., Voermans, W. T., Logmans, H., and van de Heijden, A. M. J. (1977). New bubble materials with high peak velocity. IEEE Trans. Magn. MAG-13, 1087-1092.
Calhoun, B. A., Voegeli, O., Rosier, L., and Slonczewski, J. (1975). The use of bubble lattices for information storage. AIP Conf. Proc. 24, 617-619.
Calhoun, B. A., Eggenberger, J. S., Rosier, L. L., and Shew, L. F. (1976). Column access of a bubble lattice: Column translation and lattice translation. IBM J. Res. Dev. 20, 368-375.
Carey, R., and Isaac, E. D. (1966). "Magnetic Domains and Techniques for their Observation." Academic Press, New York.
Chang, H., ed. (1975). "Magnetic Bubble Technology." IEEE Press, New York.
Chang, H. (1978a).
Magnetic bubble technology. In "The Encyclopedia of Computer Science and Technology" (J. Belzer et al., eds.). Dekker, New York. Also issued as separate monograph.
Chang, H. (1978b). "Bubbles for Relational Data Base," IBM Res. Rep. IBM Watson Research Center, Yorktown Heights, New York.
Chang, H., and Cohen, M. S. (1976). "Joinable Magnetic Bubble Loops Using Current-Controlled Switches," IBM Tech. Disclosure Bull. No. 18, 3856-3858. IBM Watson Research Center, Yorktown Heights, New York.
Chang, H., and Nigam, A. (1978). "Modified Major/Minor Loop Chips for Relational Data Base," 1978 INTERMAG Conf. Digest (Florence, Italy) Pap. 1-5.
Chen, T. C., and Tung, C. (1976). Storage management operations in linked uniform shift-register loops. IBM J. Res. Dev. 20, 123-131.
Chen, T. C., Chang, H., and Lee, S. Y. (1975). A magnetic-bubble pipelined rewritable universal logic array. 1975 INTERMAG Conf. Digest (London) Pap. 25.9.
Chen, T. C., Eswaran, K. P., Lum, V. Y., and Tung, C. (1978a). Simplified odd-even sort using multiple shift-register loops. Int. J. Comput. Inf. Sci. 7, 295-314.
Chen, T. C., Lum, V. Y., and Tung, C. (1978b). "The rebound sorter: an efficient sort engine for large files," Proc. Fourth Int. Conf. on Very Large Data Bases (W. Berlin, Germany).
Chen, T. T., Oeffinger, T. R., and Gergis, I. S. (1976). A hybrid decoder bubble memory organization. IEEE Trans. Magn. MAG-12, 630-632.
Coffman, E. G., Jr., and Denning, P. J. (1973). "Operating Systems Theory," Ch. 7. Prentice-Hall, Englewood Cliffs, New Jersey.
Cohen, M. S., and Chang, H. (1975). The frontiers of magnetic bubble technology. Proc. IEEE 63, 1196-1206.
Garey, M. R. (1972). Resident-bubble cellular logic magnetic domains. IEEE Trans. Comput. C-21, 392-396.
Gergis, I. S., George, P. K., and Kobayashi, T. (1976). Gap tolerant bubble propagation circuit. IEEE Trans. Magn. MAG-12, 631-653.
Gibson, D. H. (1974). The cache concept for large scale computers. In "Rechnerstrukturen" (H. Hasselmeier and W. G. Spruth, eds.), pp. 199-223. Oldenbourg Verlag, Munich.
Ho, C. P., and Chang, H. (1977). Field access bubble lattice propagation devices. IEEE Trans. Magn. MAG-13, 945-952.
Hu, H. L., Beaulieu, T. J., Chapman, D. W., Franich, D. M., Henry, G. R., Rosier, L. L., and Shew, L. F. (1978). 1K bit bubble storage device: initial tests. J. Appl. Phys. 49, 1913-1917.
Juliussen, J. E., Lee, D. M., and Cox, G. M. (1977). Bubbles appearing as microprocessor mass storage. Electronics August, pp. 81-86.
Kinoshita, K., Sasao, T., and Matsuda, J. (1976). On magnetic bubble logic circuits. IEEE Trans. Comput. C-25, 247-253.
Knuth, D. E. (1973). "The Art of Computer Programming," Vol. 3, "Sorting and Searching." Addison-Wesley, Reading, Massachusetts.
Kooy, C., and Enz, V. (1960). Experimental and theoretical study of the domain configuration in thin layers of BaFe12O19. Philips Res. Rep. 15, 7.
Lee, D. M. (1977). Bubble memory for microprocessor storage. IEEE COMPCON Spring 77 Dig. Pap. pp. 232-235.
Lee, S. Y., and Chang, H. (1974a). An all-bubble text-editing system. IEEE Trans. Magn. MAG-10, 746-749.
Lee, S. Y., and Chang, H. (1974b). Magnetic bubble logic. IEEE Trans. Magn. MAG-10, 1059-1066.
Lee, S. Y., and Chang, H. (1975). Associative search bubble devices for content addressable memories. IEEE COMPCON Fall 75 Dig. Pap. pp. 91-92.
Lee, S.
Y., and Chang, H. (1978). "Associative Search Bubble Devices for Content-Addressable Memory and Logic," IBM Res. Rep. IBM Watson Research Center, Yorktown Heights, New York.
Lee, S. Y., Chang, H., Chen, T. C., and Tung, C. (1974). Text editing using magnetic bubbles. IEEE COMPCON Fall 74 Dig. Pap. pp. 69-72.
Lin, Y. S., Almasi, G. S., and Keefe, G. E. (1977). Manipulation of 1 μm bubble with coarse (greater than 4 μm) overlay patterns. 1977 INTERMAG Conf. Digest (Anaheim, California) Pap. 11-6.
Mattson, R. L., Gecsei, J., Slutz, D. R., and Traiger, I. L. (1970). Evaluation techniques for storage hierarchies. IBM Syst. J. 9, 78-117.
Minnick, R. C., Bailey, P. T., Sandfort, R. M., and Semon, W. L. (1972a). "Magnetic Bubble Logic," IEEE WESCON 1972 Proc. Paper 814.
Minnick, R. C., Bailey, P. T., Sandfort, R. M., and Semon, W. L. (1972b). Magnetic bubble computer system. Proc. AFIPS Fall Jt. Comput. Conf. pp. 1279-1298.
Morrow, R. H., and Perneski, A. J. (1970). "Single Wall Domain Apparatus Having Intersecting Propagating Channels," U.S. Patent 3,543,255.
Myers, W. (1977). Current developments in magnetic bubble technology. Computer, August, pp. 73-82.
Nielsen, J. W. (1976). Bubble domain memory materials. IEEE Trans. Magn. MAG-12, 327-345.
O'Dell, T. H. (1974). "Magnetic Bubbles." Macmillan, New York.
Pohm, A. V. (1975). Cost-performance perspectives of paging with electronic and electromechanical backing stores. Proc. IEEE.
Pugh, E. W. (1971). Storage hierarchies, gaps, cliffs and trends. IEEE Trans. Magn. MAG-7, 810.
Sandfort, R. M., and Burke, E. R. (1971). Logic function for magnetic bubble devices. IEEE Trans. Magn. MAG-7, 358-361.
Smith, A. B., ed. (1974). "Bubble Domain Memory Devices." Artech House, Dedham, Massachusetts.
Takahashi, K., and Kohara, H. (1975). Symbol string recognition by magnetic bubble devices. IEEE COMPCON Fall 75 Dig. Pap. pp. 93-96.
Takasu, M., Maegawa, H., Furuichi, S., Okada, M., and Yamagishi, K. (1976). A fast access memory design using 3 μm bubble 80K chip. IEEE Trans. Magn. MAG-12, 633-635.
Tsubaya, I., Saito, M., Hattenda, T., Yamaguchi, N., and Arai, Y. (1977). 2M-bit magnetic bubble memory. IEEE Trans. Magn. MAG-13, 1360-1363.
Tung, C., Chen, T. C., and Chang, H. (1975). Bubble ladder for information processing. IEEE Trans. Magn. MAG-11, 1163-1165.
Williams, R. P. (1977). Serial arithmetic with magnetic bubbles. IEEE Trans. Comput. C-26, 260-264.
Wolfe, R., and North, J. C. (1972). Suppression of hard bubbles in magnetic garnet films by ion implantation. Bell Syst. Tech. J. 51, 1436-1440.
Wong, C. K., and Coppersmith, D. (1976). The generation of permutations in magnetic bubble memories. IEEE Trans. Comput. C-25, 254-262.
Ypma, J. E., and Swanson, P. (1977). Design and performance of a 100K-byte serial bubble recorder. IEEE COMPCON Spring 77 Dig. Pap. pp. 239-242.
ADVANCES IN COMPUTERS, VOL. 17
Computers and the Public's Right of Access to Government Information

ALAN F. WESTIN
Department of Political Science
Columbia University
New York, New York

1. Information Technology and Government Secrecy . . . . . . . . . . 283
   1.1 The Precomputer Setting of Government Secrecy Issues . . . . . 284
   1.2 General Patterns of Computer Usage and Their Organizational Impact in Government . . . . . 289
2. Computer Impact on Public Access: Reports from the Information-Holders and Information-Seekers . . . . . 292
   2.1 Reports from the Federal Information-Holders . . . . . 294
   2.2 Reports from the Information-Seekers . . . . . 303
3. An Analysis of the Access Situation . . . . . 309
4. Recommendations for Action . . . . . 311
5. The Future of Information Technology and Democratic Government . . . . . 314
   References . . . . . 315
1. Information Technology and Government Secrecy
In the first 15 years of large-scale computerization, substantial attention has been paid to the impact of computerization on individual privacy, organizational decision-making, citizen participation in group and political life, and changing power relationships in organizational structures. Yet only in the past few years have we begun to recognize an impact of information technology equally vital to democratic societies: the effects of computer use on the public's right of access to government information (see Westin, 1974).* Clearly, an increasing proportion of the information that government agencies store about people, property, and transactions is today going into automated files. So many files of the executive branches of local, state, regional, and national government in the United States have been computerized that our government's client services, administrative operations, program evaluation, and management planning are now permanently embedded in computer and communication systems.

* Substantial portions have been used for this article, with the permission of the copyright holder, The Committee for Public Justice, and the publisher, Viking Press.

Such heavy reliance on computers raises important issues in terms of government secrecy. Putting aside science-fiction approaches ("The machines will take over!"), fears have been voiced that computerization may

* make it harder for the average citizen, public-interest group, etc., to know what information is in automated files and how it is used, because the specialized procedures of ADP are not yet well known;
* create delays in furnishing information, as agencies cite data-conversion problems and system "bugs" as reasons why they cannot furnish information in computer storage;
* make it more difficult for persons to "browse" in public records after these are converted from traditional, eye-readable registers, kept on open shelves, to storage on computer tapes or discs, available only on special call;
* create large computer data banks whose software for extracting information is geared to what agency directors wish to see produced, making it a costly special effort to produce information sought by public-access groups;
* lead to significant reliance by government officials on computer-based decision-making aids (models, simulation, etc.), whose use is not made known to the public.

Such concerns about computers and secrecy have received little empirical investigation and little treatment by legislative committees. The public's right to know may well suffer if we allow this inattention to continue. To put this inquiry into proper perspective, two pieces of groundwork are needed: 1) the legal and political settings of government secrecy must be understood, in order to compare public access in manual systems with computerized record systems (for an excellent discussion of these issues, see Westin, 1974); and 2) the basic patterns of computer usage in those government agencies that are automating must be examined.

1.1 The Precomputer Setting of Government Secrecy Issues
The public’s right to know (and thereby control) what the executive branch of government is doing is a pivotal element in the American constitutional scheme. The Framers believed that executive affairs were not the private preserve of a President or his agents in the classic mode of royal households, but an instrument of popular government to be
conducted mainly in public view or subject to public inspection. To carry out this policy, Congress was given power to investigate and oversee the operations of executive agencies (with authority to compel disclosure of information). American courts could compel the production of information by executive agencies whenever necessary to insure a fair trial to defendants in criminal prosecutions or to test the rational basis of government regulatory action involving property rights or personal liberties. Under a combination of common law and statutory rules, the general public had a right of access to government information. These rules provided that certain records kept by government (for example, tax and land registers) were public documents, open to inspection by anyone. Other records could be inspected by persons with a particular, legitimate interest in seeing them (such as a parent inspecting a child’s official school record). Access by the press and by individuals without a specific legal interest would generally depend on the wording of state and federal statutes. [For articles recounting the development of public-access laws, see U.S. Congress (1964).] Generally, the courts would balance claims to access by Congress, litigants, and the public against what was usually termed the broad public interest in protecting the integrity of the governmental process. This produced rulings exempting from disclosure such matters as state secrets, investigative files, internal memoranda by agency officials, information obtained from citizens under promise of confidentiality, and certain information protected by a vague but real doctrine called executive privilege. 
These rulings show that demands to make government information public, central as this is to the citizen's knowledge and control of public affairs, can conflict with two other values of democratic society: the right of privacy of individuals about whom sensitive information is stored in government files (such as census, tax, medical, and social-services records), and the need of all formal organizations, governmental or private, for periods of temporary privacy, in which to gather confidential information, solicit frank advice, conduct secret negotiations, formulate positions, and reach executive decisions (see, generally, Westin, 1967). In general, statutes and judicial rulings prior to World War II were very generous in accepting government claims that disclosing certain information would jeopardize the orderly and efficient operations of government. But since World War II, there has been increasing government activity in defense and social-welfare areas and strong pressures from the press, Congress, and public-interest groups to make government more responsive. The clear trend has been toward opening up executive-agency files to public access. The single most important step to date was passage of the federal Freedom of Information Act and various state counterparts.
The federal act, which went into effect in 1967, declared as its central principle that when a request is made by “any person” for identifiable records, in accordance with published rules as to time, place, and fees, each agency “shall make the records promptly available . . . ” To codify certain types of information that would still not be opened for inspection under this general principle of access, Congress set forth nine categories of exempt data:
(1) defense or foreign policy secrets authorized to be kept secret by executive order;
(2) matters which relate solely to internal personnel rules and practices of an agency;
(3) matters specifically exempted from disclosure by statute;
(4) trade secrets and other types of confidential commercial information obtained from private parties;
(5) interagency or intra-agency memoranda or letters dealing solely with matters of law and policy, which would not be available by law to a private party in litigation with the agency;
(6) personnel and medical records and similar matters "the disclosure of which would constitute a clearly unwarranted invasion of personal privacy";
(7) investigatory files compiled for law-enforcement purposes, except to the extent available by law to a private party;
(8) certain reports prepared in regulation of financial institutions; and
(9) geological information concerning wells.

In terms of procedures, a person denied information was given the right to sue the executive agency in federal court, with the burden of proof resting on the agency to justify its action in withholding the data. The courts can enjoin agencies from continuing to withhold information that should be furnished under the Act, and punish refusal to comply as contempt. While the Act opened up important areas of government information to public access, and provided a general legal basis for claiming such rights, the way in which executive agencies have operated under it since 1967 drew heavy criticism from Congressional spokesmen, the press, and public-interest groups.
They stressed that government refusals under the nine exemption sections of the Act have been disturbingly broad and frequent; that unreasonable delays took place in responding to requests for information; that the charges for providing copies of records were often beyond anything needed for recovery of costs; and that the readiness of agencies to refuse access and maintain lengthy suits in the courts made use of the Act extremely difficult for those without substantial
money resources (such as individuals and public-interest groups) or without the time to wait (such as the press, with news deadlines; see, e.g., Wellford, 1974). And, while the courts overruled many claims of privilege under the Act in the almost 200 cases litigated between 1967 and 1973, critics stated that only an extensive revision of the Act could overcome the defects in its wording and machinery (Wellford, 1974). Congressional hearings into the operations of the Act were held during 1972 and 1973,² and major amendments were enacted in 1974. In November of 1974, Congress overrode a veto by President Gerald Ford and these amendments became law.³ The 1974 Amendments strengthened the public's right to information by broadening the categories of federal-agency records that must be disclosed. They tightened up some of the exemptions from access that may be claimed by government agencies and accelerated the time within which agencies must respond to requests for information (responses must be within 10 business days of receipt of a request). If agencies refuse to furnish information and an individual or organization brings suit to compel disclosure, the burden rests with the agency to establish that the records can be lawfully withheld, and courts can assess attorneys' fees and litigation expenses against an agency whose claim to withhold is found to be groundless. In addition, any government official found to have withheld information "arbitrarily or capriciously" is subject to disciplinary action by the U.S. Civil Service Commission. Apart from the legal framework, the public policy issue of how much information government should have to reveal about its "sensitive" activities, foreign or domestic, has clearly been caught up in the turbulent politics of the past decade.
² For a summary of these hearings, see Congressional Quarterly Almanac, 1974, index headings under "Freedom of Information."
³ The Freedom of Information Act is found in Section 552 of Title 5, U.S. Code. For a clear, popular discussion of the 1974 Amendments, see American Civil Liberties Union (1975).

The secretive conduct of the Vietnam War led anti-war groups, Congressional critics, and leading press and TV forces to demand that far more information about military and foreign affairs be released than either the Johnson or Nixon administrations were willing to disclose; the revelations of deliberate falsification of reports and lying by executive officials (as in the now-admitted deceptions over bombing in Cambodia in 1971 and in the Watergate coverups) have sharpened the attacks upon claims of executive privilege. Another fundamentally political dispute has involved the conduct of federal regulatory agencies. As against the traditional mode of regulation (fostering industry well-being and moving against only the most flagrant violations of public interest and safety), public-interest groups in the Nader mold have sought to force regulatory agencies to be more zealous advocates of the consuming public. The resulting demands for the production of "internal" agency records as to standards, procedures, and relations with the regulated have been resisted by government spokesmen (and business spokesmen) as threatening government's capacity to operate and business willingness to cooperate with informational requests. Because such leading right-to-know challenges are fundamentally political, involving matters of ideology, socio-economic power, and basic institutional reform, the governmental actors involved (the President, Congress, and the courts) look in the moments of ultimate decision to see what the American public thinks about the central issue. When the public concludes that disclosure in a given situation is really necessary to expose wrong-doing, and believes that it would not really cripple the executive function, then Presidents usually negotiate compromises with Congressional committees to make the information available, or courts feel ready to order disclosure. When the public is strongly behind the claim of executive privilege, either out of support for the policy being pursued or out of a sense that a president is being assailed unfairly in Congress and the press, the executive usually stands behind the claim of privilege; courts will then usually find ways to avoid deciding the issue or to uphold the executive, and Congressional forces will find it prudent not to press the challenge too aggressively. One other broad generalization about the policy setting is worth noting. Resort to claims of privilege and secrecy by executive agencies has not followed lines of political party, section, or ideology in our national history. Rather, such claims adhere almost as a badge of office.
Each new administration tends to stress at the outset its deep commitment to open government, promising to be more accessible to public view than its predecessor. But it soon finds the need to withhold from the press, Congressional committees, the opposition party, and others certain sensitive data the release of which, it asserts, would deeply jeopardize the national interest, and the administration’s sense of its embodiment of that high purpose. This means that a substantial tension between all occupants of the executive branch and various information-seekers should be seen as a permanent phenomenon. In terms of that situation, continuing pressure to penetrate the secrets of government operations in the politically critical disputes of each era represents a necessary force to counteract the self-protective tendency of executive officials to resist disclosures that are inconvenient to the current administration. What is important to recognize, therefore, is that computerization
began to spread through the American government establishment at a time when there were disputes over 1) what information should be open to public access under freedom-of-information laws and 2) what constitutes prompt, complete, and reasonably priced compliance with demands for information to which the public was entitled. Whether computers, adding machines, or three-by-five cards are used to manage information is separate from the policy issue. But a basic factual question is whether the move to automated files and procedures has made compliance with existing law easier or harder, or has had no significant effect on it.

1.2 General Patterns of Computer Usage and Their Organizational Impact in Government
Patterns of computer usage vary widely in government, from department to department, even from bureau to bureau within a department. In organizational terms, this is because each unit starts with records and files unique to its mission, with distinct functions to perform through its collection and use of information, and with distinct styles of management (including attitudes toward providing information to the public). Furthermore, government officials have a wide range of choices about technology: what kind of computer hardware to adopt and software to buy or develop; which files to automate first; what aids to decision-making or management-reporting to attempt through computer resources. Few centralized controls (either executive or legislative) have yet attempted to set uniform guidelines for computerization at a given level of government. So automation today is largely a process of agency-by-agency decision.

Nevertheless, there are some patterns to computerization in government. These help in analyzing computer impact on freedom-of-information problems. Computer usage enters most organizations through payroll and financial applications, then moves into automation of the largest and most frequently used files. In addition, computers are used to produce various management reports and statistical summaries. This is the stage most agencies have currently reached in computerization. Some departments or bureaus, however, have begun to develop multifile applications, merging several files into a city, state, or federal data bank in a particular subject area (social services, for example). A few cities, counties, and states are attempting to create a jurisdiction-wide data bank containing files from many different agencies (e.g., police, tax, welfare, health).
In addition, government agencies are using computers to augment data-sharing among government jurisdictions at the same level (police information systems in metropolitan areas) or at different levels (the FBI's National Crime Information Center or the National Drivers Registration system).
The report by the National Academy of Sciences (NAS) Project on Computer Data Banks in 1972, a three-year study into computers and privacy, found that, so far, computer use has reproduced rules and practices of the manual era rather than transforming them (Westin and Baker, 1972). Computer use has not yet led to the collection of more detailed and intrusive information about people, to sharing data with different types of agencies than had been the practice before computerization, or to any lessening of the individual citizen's ability to know and contest the information that was being collected about him or her in a given file. Whatever rules the agency had before computerization (whether respectful of privacy and due process or hostile to them) continue to be reflected in the computerized operations. Where laws or public values have changed secrecy rules, the NAS project found that managers of computerized systems were as able to comply (and were complying) as were managers with manual files. And highly intrusive new record systems created during the past decade, such as the Army's monitoring of civilian protest groups, were either wholly manual record systems or were being aided only marginally by computers.

The NAS report also showed that there are no fully automated government agencies today. In every department or bureau, some files remain wholly manual. Some decisions are made after examining records or conducting interviews and negotiations. Some reports are prepared and distributed without machine assistance. Even when a file has been automated, there is often a back-up system with the paper or card records that provided the initial entry; or a microfilm or microfiche record is created for archival use in addition to the primary storage on computer tape or disks.
The general rule, for reasons of ease and cost in technological usage, is that files which contain the most objective and easily recorded information and which are needed and consulted most frequently will be the first automated. At the other end of the computerization process, the files least likely to be automated are those with extensive narrative material, highly subjective reports, and sensitive information.

The most important results of computerization discovered by the NAS study were more complete and up-to-date files. There was more analysis and use of information already in the files. Automation produces greater accuracy in some areas of record-keeping, but also introduces the possibility (and likelihood) of other errors. Computerization is leading to the creation of larger and more extensive networks of information-sharing groups. Computers are making possible the creation of some very large data banks, which would probably not have been built, or built as rapidly, but for computer resources.
What the NAS study showed is that computerization has not solved certain problems important to privacy and secrecy issues. Computers do not eliminate the power struggles over control of information within an organization or the self-protective manipulation of information among "cooperating" agencies (whether in intelligence, business regulation, or taxation). The competitions and conflicts of the American federal system have been faithfully mirrored in computer usage, so that in any given geographic area, separate computer systems will be maintained by municipalities, counties, special authorities, state agencies, and federal bodies covering services to the same population. There has been no amalgamation of different government functions or jurisdictions into a common computer management.

Finally, several popular notions about how computerization affects government operations should be set aside as red herrings. They confuse questions of public policy with changes in the technological execution of those policies. One is that computers foster the creation of secret files. Government agencies have for centuries been able to maintain secret files about people or transactions so that their existence was unknown to the public. Using index cards, file folders, microfilm, and other precomputer storage media, millions of records were efficiently maintained in these government files. Using the mails, teletype, telephone, and other means of precomputer data communications, some file systems collected and distributed information on a nationwide basis among hundreds of local and regional offices. Computers are not an essential element in the creation of either secret files or secret national data networks. There is no evidence that computers are making it significantly easier for government to create or manage secret files.
Agencies creating secret computerized files still face the problems of hiding fund expenditures from legislative or press view, preventing leaks by defecting employees, and keeping the sources of data secret when reports are produced and used elsewhere in government. The fundamental issues remain: Should any secret files be permitted by law? How can the public discover unlawful secret files, whatever the form of data storage and communication?

A second red herring is that computers substantially facilitate government manipulation of information. For centuries, some government officials have lied. For centuries, there have been subordinates who falsified reports about what was happening in local areas (warfare, diplomatic developments, social trends) in order to provide superiors back home with the reports they demanded. Computers continue the capacity for these deceptions. Inaccurate reports about "safe" villages in South Vietnam, contributed during our pacification policy in the mid-1960s, went into computer systems. These produced the glowing reports
of success the Johnson administration wanted to hear. Similarly, false reports sent by base commanders from Cambodia in 1969-1970, about where American bombers were flying their sorties, enabled the Defense Department's computers to print out incorrect data. These lies were communicated to Congress, in keeping with the policy of secret bombings ordered by President Nixon and Secretary of Defense Laird. Computer systems simply represent another tool officials can use to falsify or distort events or conditions. Computers did not create this capacity, nor do they ease the difficult task for government critics of learning the facts and exposing manipulation of data.

Although computers are not creating or inevitably leading to secret files or falsification of information, they do have important physical and administrative consequences that may require new legal rules and supervisory mechanisms to prevent abusive secrecy practices. The goal is to identify these consequences and to frame responsive policies for them. This brief portrait of computer use by government presents only highlights of the current situation. But it provides a framework for my main question: how computers affect compliance with freedom-of-information laws.

2. Computer Impact on Public Access: Reports from the Information-Holders and Information-Seekers
To gather data about computer impact, I sent letters of inquiry and conducted follow-up interviews with two groups: the information-holders and the information-seekers. I wrote to 28 federal bureaus, agencies, or departments in 1973, asking them, first, whether their use of computers made it harder or easier to comply with freedom-of-information laws, or had no significant effect. This question was followed with more detailed questions about possible differences with respect to the character of groups seeking access, the type of inquiry, effects on the costs of compliance, and similar matters. Specific examples were requested. Of the 28 agencies, 23 sent full replies. Some provided a general report on their computer operations, usually where the agency was a relatively small or single-purpose one or where its computerization was limited.⁴ Other agencies, usually the larger federal departments, concluded that no composite reply could reconcile the differences among various units. These agencies provided individual responses from their units.⁵ In all, there were commentaries from 43 offices, bureaus, agencies, and departments of the federal government.

Of course, these present a manager's view. One does not expect official spokesmen to respond that they are not complying with the law. If computer systems or temporary problems of computer operations were making compliance with access policies slower, more costly, or less complete than under manual procedures, the replies would not be likely to say this. Finally, the replies did not present a set of answers for each major information file maintained by a given bureau or agency. So we do not have a comprehensive picture of practice in each agency. An attempt to verify managerial views of computer impact on public access would require study of precomputer practices, analysis of procedures in the automated files, and detailed comparisons in each agency. Since this was not possible with the resources then at hand, I compensated for the self-protective tendencies in managerial reporting by sending letters and conducting interviews with information-seekers, by writing to those who regularly battle to pry information from government agencies. I wrote to spokesmen for five categories of groups.

⁴ Agencies furnishing a general reply were the Atomic Energy Commission, Department of Agriculture, Civil Aeronautics Board, Department of Housing and Urban Development, National Labor Relations Board, Federal Communications Commission, Federal Power Commission, and Federal Trade Commission.
The groups, and the number in each responding with substantive replies, were:

General Counsel to Congressional committees: 6
Public-interest law firms and research groups: 8
"Guardian groups" (civil rights, civil liberties, consumer interest, women's rights): 10
Freedom-of-information committees of media associations: 5
Investigative writers: 6

⁵ Agencies supplying responses from various constituent units were the Department of Commerce (Patent Office, Social and Economic Statistics Administration, Assistant Secretary for Administration, Office of Organization and Management Systems), Department of Defense (United States Air Force, Office of the Secretary), Department of Health, Education, and Welfare (Social Security Administration, National Institute of Education, Office of Education, National Institutes of Health), Department of Justice (Bureau of Narcotics and Dangerous Drugs, Immigration and Naturalization Service, Law Enforcement Assistance Administration, Federal Bureau of Investigation, Assistant Attorney General for Administration), Department of Labor (Office of the Secretary, Occupational Safety and Health Administration), Department of State (Record Services Division, Agency for International Development), Department of Transportation (Office of the Secretary-Director for Management Systems, United States Coast Guard, Federal Aviation Administration, Federal Highway Administration, National Transportation Safety Board, Office of Policy, Plans, and International Affairs, Office for Environment, Safety, and Consumer Affairs, Office of Systems Development and Technology, National Highway Traffic Safety Administration, Office of the General Counsel, Urban Mass Transportation Administration), and Veterans Administration (Department of Medicine and Surgery, Department of Veterans Benefits, Information Service, Controller, Department of Data Management).
These 35 groups were asked whether they found that automation made a difference in efforts to obtain information from executive agencies. I asked for specific instances in which the presence of computerization had made information retrieval easier or harder. This section contains reports on computerization and public access by spokesmen for the two "opposing camps" in the freedom-of-information contest. My main analysis of the developing trends, and of what needs to be done to advance the cause of public access, follows later.

2.1 Reports from the Federal Information-Holders
About a fifth of the agencies report that computerization has not progressed far enough to have an effect on compliance with freedom-of-information laws. They note that computerization has been limited to payroll, financial, and other housekeeping operations not subject to disclosure. Principal files subject to query under freedom-of-information laws remain in manual form. Among those reaching this conclusion were the State Department, the Federal Power Commission, the Federal Trade Commission, the Immigration and Naturalization Service, and the Federal Bureau of Investigation.

The FBI, for example, responded that the Bureau has automated "payroll, fiscal, and some selected FBI personnel data" used only for "internal administrative purposes." Investigative files as such "have not been computerized." Legal rules restrict access to wanted-person and stolen-property data in the National Crime Information Center (NCIC) to law-enforcement agencies, state licensing authorities, and certain federally insured banks. Though similar restrictions apply to the summaries of individual criminal histories now being added to NCIC, it is the administrative policy of the NCIC that an individual may "see and challenge the contents of his record . . . subject to reasonable administrative procedures." Because of the small number of criminal-history summaries in NCIC as of February 1973, the FBI reported that no one had yet sought to examine his NCIC record. In conclusion, Acting Director L. Patrick Gray wrote, "computerization is not a significant factor in generating response information" by the FBI to persons and groups seeking access under freedom-of-information laws.⁶ Where the records are legally open, he stressed, manual files and procedures are still the basic mechanisms for response.
⁶ Letter from L. Patrick Gray III, Acting Director, Federal Bureau of Investigation, February 23, 1973.
A few other agencies report that, although they have automated files that are open under public-access rules, computerization has not had any appreciable effect on furnishing information. The U.S. Coast Guard wrote, for example, "We can identify no significant effects of computerization on our ability to release data to the press, public or Congressional groups. As a relatively small organization, we have always been able to quite rapidly derive answers to questions, identify it as releasable data and make prompt releases."⁷

A more detailed description of "no appreciable effect," based on the types of requests the agency receives, was supplied by the Federal Highway Administration⁸:

Computerization has had no effect on this Administration's handling of requests for information under the provisions of the Freedom of Information Act. Most of the requests we receive (approximately 85 percent) are for copies of our directives, or copies of motor carrier accident investigation reports. Requests for these items are handled promptly (one to three days on the average). Extra copies of our directives are maintained for this purpose, and copies of accident reports are duplicated. We do not believe that computerization would be effective in facilitating our handling of these types of requests. Most of the other requests we receive are for narrative-type information (correspondence, etc.) on specific subject areas. This information is not of the type that would lend itself to computerization. Requests for this kind of material are forwarded directly to the office of primary interest and are normally answered promptly, even when questions arise as to the releasability of the information requested. We do have extensive statistical data in our computer systems, most of which is available to the public. We provide for the sale of copies of either printouts or tapes in accordance with Departmental regulations. . . .
7 Memorandum from T. McDonald, Chief, Public Affairs Division, U.S. Coast Guard, March 26, 1973, enclosed in letter from John L. McGruder, Director of Management Systems, Department of Transportation, June 13, 1973.
8 Memorandum from John R. Provan, Associate Administrator for Administration, Federal Highway Administration, March 30, 1973, enclosed in letter from John L. McGruder, Director of Management Systems, Department of Transportation.

A few other agencies reported what might be called “mixed effects” on the distribution of information as a result of computer efforts. A particularly frank reply came from the National Labor Relations Board (NLRB). The NLRB maintains most of its data as case-oriented legal files, kept in manual form. It does not produce overall statistics about industries, unions, or collective-bargaining trends, except in the “infrequent instances in which such information is offered as evidence in a specific case proceeding.” The Board does not have a computer of its own; it purchases computer services. These have been used principally to produce statistics about the Board’s own operations and for computer-based photocomposition of NLRB decisions, in bound volumes. As to these, the NLRB reports:

We would have to conclude that the effect of computers upon our operating statistics (Management Information Systems) has been mixed. A favorable example is that a system recently installed, using terminals connected to a commercial vendor’s time-sharing service, has given the Board members much more information than was ever available before on the performance of their staffs in processing cases and seems to have helped us improve the timeliness of our service to the public. Most of the information from that system is not directly available to the public. In our over-all, agency-wide production of casehandling statistics for purposes of budget review and managerial control, computer processing of reports previously compiled manually and with punch-cards has increased the productivity of our data processing staff. But, as a small agency without a computer of our own, we have encountered nearly chronic shared-computer service problems whose net effect has been that monthly operating reports (some of which are publicly available) are not infrequently issued later than under the prior, noncomputerized system. On the other hand, our use of shared time on computers of other agencies has shortened, by nearly half, the number of months which elapse between the end of a fiscal year and the release of the statistics contained in our Annual Report.
On the use of computerized photocomposition of Board decisions, the report concluded, “The technical problems we have encountered thus far have actually impeded the public availability of this large body of information.”9 Over-all, the NLRB felt that its statistical services, photocomposition program, and production of statistics requested by Congress (such as “detailed elapsed time statistics” for NLRB administrative proceedings) would be aided in the future by more effective computer usage. Such improvement “is, of course, our justification for the effort.”

A similar response, from an agency with its own computers, came from the Federal Communications Commission (FCC). The FCC uses computers to record station reports on political broadcasting (as required by Congress); equal employment opportunities reports; financial and ownership information; technical service studies; radio-operator licenses; license and renewal notices; statistical reports on bureau work; and FCC payroll and personnel files. Though the report noted that “as a regulatory agency, the FCC has an open-door policy on information availability,” some of the computerized data it keeps are not available for public access, such as “highly sensitive financial data” and “internal administrative systems.” As a whole, the FCC reported, requests for information are met equally well from both the computerized and manual files.

9 Letter from Lee D. Vincent, Chief, Organization and Methods Branch, National Labor Relations Board, March 2, 1973.
The FCC response concluded that “there is no direct correlation here between computerization and access to information” and computerization “may not in itself make for any greater availability of information.”10

While one fifth of the federal agencies surveyed reported either no effect or mixed effects, four fifths of the agencies stated that their computerization of files subject to public access has aided compliance with freedom-of-information laws. In analyzing these replies, it should be noted that my survey had adopted a deliberately broad view of the “public,” listing five types of claimants to executive-branch information:

- individuals or organizations seeking access to their own files or those relating to claims they are pressing;
- members of the press seeking access for investigative purposes;
- Congressional committees requesting information;
- scholars doing studies of government policies;
- officials of business, public-interest, consumer or other civic groups looking into government policies and programs.

This expansive listing of information-seekers helps explain why four fifths of the agency spokesmen believed they had improved compliance with freedom-of-information laws. Computers were being used to improve internal administrative operations or the agency’s public duties. As a result, the agencies were generally able to provide information more easily from large-scale administrative files on clients and subject area. They have more easily retrievable data about their own personnel. They have better management reports about agency operations and decisions. They can produce better statistical summaries about industries, procedures, or services under their jurisdiction.11

10 Letter from Leonard Wienles, Chief, Office of Information, Federal Communications Commission, April 11, 1973.
11 Many of the agency responses featured examples of improvement in the furnishing of information to various groups where the provision of such data was the function of a given unit or bureau within the federal establishment, such as the Patent Office supplying patent information to persons paying the required fees; the National Technical Information Service of the Department of Commerce supplying indexes and copies of government-sponsored research reports; the Office of Education’s information system on research into schools and educational problems; the Social and Economic Statistics Administration’s preparation of data for general distribution; the Defense Department Documentation Center’s On-Line Retrieval System for research reports; or the Atomic Energy Commission’s computerized data base of bibliographic citations and indexes to world-wide literature on nuclear science and technology. There is no doubt that the improvement of such services helps the flow of better information to user groups in the public. However, since freedom-of-information issues involve securing data about the policies and practices executive agencies are following, improvement in providing information services per se is not basically relevant to my inquiry here.
These improvements in data production and handling were seen by the reporting agencies as producing better distribution of data to each of the five groups about which I had asked.

As to individuals or organizations seeking access to their own files, agencies such as the Veterans Administration and Social Security Administration reported that requests from record-subjects about claims and benefits were being fulfilled much faster and more cheaply than in manual file procedures.12 This was a direct goal of their computerization efforts. The Veterans Administration noted that veterans seeking access to information in their files generally apply to their local VA regional office. However, faster access is now possible through various fast-response computer systems “designed to handle veterans’ regional office inquiries. This is the computerized Beneficiary Index Records Locator System (BIRLS) at the Austin (Texas) Data Processing Center, which is interconnected with regional office teletype equipment.” Another VA system is a computer at the Philadelphia Data Processing Center that generates print-outs to veteran inquiries on their insurance premiums, loans and dividends. Still a third computerized system generates extracts used to answer inquiries about compensation, pensions, and GI education.13

Most of the agencies replied that better statistical data and management reports had improved the quality and quantity of what they were able to provide to the press, either in the form of reports they published or in response to specific requests for information. For example, the Information Service of the Veterans Administration noted that “the wide range of statistics available (to the press) and the currency of such data is largely the result of computerization.
The fact that an operating element within (VA) is able to respond to most requests for data, usually statistical, within a period of time that we and the press would characterize as ‘reasonable’ is evidence enough that computerization has affected the availability of this type of information in a most positive way.”14

Among groups with legal and political power to obtain information from executive agencies, Congressional committees rank at the top of the list. This was reflected in agency replies. Except in the relatively rare cases where there are issues of executive privilege, Congressional committees make tens of thousands of requests for information yearly to
12 Letter from Louis Zawatzky, Acting Assistant Commissioner for Administration, Social Security Administration, March 21, 1973; letter from John J. Corcoran, General Counsel, Veterans Administration, June 21, 1973.
13 Memorandum from Department of Data Management (no author or date), enclosed in letter from John J. Corcoran, General Counsel, Veterans Administration.
14 Memorandum from Information Service, Veterans Administration (no author or date), enclosed in letter from John J. Corcoran, General Counsel, Veterans Administration.
executive agencies (through appropriation and oversight proceedings and in direct inquiries on specific topics). Executive agencies have always spent considerable time answering these requests. The agencies reported that computerization had enabled them to improve these services to Congress, either because of better data and reports available to the agency’s management or because computer programming made it possible to extract the information the legislators sought more easily. For example, Congressional committees and the press have obtained better information about the status of contracts and grants processed by federal agencies as a result of computerized reporting systems employed by many agencies, such as the Law Enforcement Assistance Administration, U.S. Office of Education, and Department of Transportation.15 The Department of Transportation explained16:

For example, in 1971, an individual who wished to analyze contents and direction of the DOT R & D program would have to communicate and obtain cooperation of perhaps 20 key departmental officials who in turn would have to mount an effort of record identification and processing of some 2000 active R & D work units (contracts), costing anywhere between $20-80K. Most likely, the entire process would have taken three to five months, at which time the requester would have received resumes of R & D contracts or evaluated tabulations. Today, the same individual, whether in DOT or an outsider, is able to get almost 100 percent of the required data at a cost of approximately $1000 and within a week or less. Even this can be improved by those individuals who have access to remote computer typewriters or video consoles by subscribing to the TRIS on-line service at an initial cost of $300 (for hook-up and training) and a charge of 75¢ per computer minute; they are able to have their questions answered in less than one hour, and at a cost of $20 to $30 per question.
Scholarly research requests are seen by many agencies as an area in which computerization has been particularly helpful. Once a data base has been automated, and assuming that the software programming is adequate for such purposes, a wide variety of special requests can be filled more quickly and at much less expense than before. As the Federal Aviation Administration noted, “studies of Government programs and policies, whether done by scholars, business or civic groups, or others,

15 Memorandum from Dean St. Dennis, Public Information Office, Law Enforcement Assistance Administration, enclosed in letter from Robert G. Dixon, Jr., Assistant Attorney General, Office of Legal Counsel, Department of Justice, July 6, 1973; letter from Patricia L. Cahn, Assistant Commissioner for Public Affairs, Office of Education, March 1, 1973; memorandum from Robert E. Parsons, Director, Office of R & D Plans and Resources, Department of Transportation, April 5, 1973, enclosed in letter from John L. McGruder, Director of Management Systems, Department of Transportation.
16 Memorandum from Robert E. Parsons, Director, Office of R & D Plans and Resources, Department of Transportation.
generally depend upon using information in ‘bulk’ - hundreds and thousands of cases. These studies are more likely to be economically feasible because of computerization than without.”17

Many replies gave examples of providing better information to business, public-interest, consumer, and other civic groups looking into government policies and programs when the queries concerned substantive matters (patents, social-science reports, scientific information) as opposed to information on government policies and programs. Other replies addressed this issue and cited improvements in statistical reports, management reports, grant-or-contract-information systems, and large-file searches. Frequently, one government agency provided data which regulated or public-interest groups could use to judge the adequacy of policies pursued by other agencies of the federal, state, or local governments. For example, the National Transportation Safety Board has automated files on aircraft accidents and incidents. This provided the data for a study of accident prevention and government safety standards. Citing many special studies and data it has furnished to groups involved in air safety, the Board concluded, “It would be virtually impossible to examine accident files of this magnitude and compile in-depth analyses of this nature without the use of computer technology.”18

It is noteworthy that the major public-access results of computerization are essentially by-products of the primary goals of improving data services to clients and management. Improving the production of information to other parties, such as the press or public-interest groups investigating government operations, was not a goal of the computerization. Several of the agency replies stated this explicitly.19 The replies also showed that effects of computerization vary considerably among units within a department.
The Department of Transportation, for example, sent responses for 11 of its constituent bureaus, offices, and boards. These ranged from judgments that computerization was having no effect on public access (the U.S. Coast Guard, Federal Highway Administration, and General Counsel’s Office) to estimates that it was greatly aiding responses to public requests (Office of the Secretary, Federal Aviation Administration, and Office for Environment, Safety, and Consumer Affairs).20 Similar diversity was reported in responses from units of the Veterans Administration, Department of Justice, and Department of Health, Education, and Welfare.21

Within such units, the effects of computerization also vary according to the kind of information requested (and, therefore, the files to be drawn on for producing replies). The U.S. Office of Education (OE), for example, reported that questions about its policies are “answered without significant recourse to computerized information.” Questions about OE personnel, program operations, and agency contracts and grants are answered somewhat more easily as a result of automation of files in those areas. Statistical studies on school populations and programs have been computerized, and “are thus much more quickly available.” The area most positively affected, OE reported, is information on educational research projects. OE developed the Educational Resources Information Center (ERIC), an automated system for making data on research results available nationwide.22

This variation in impact reflects the principle noted earlier: the most cost-effective use of computers is to automate the largest, most frequently used files, often files containing factual and statistical matter. Files containing lengthy narrative texts and those that are not often used (e.g., individual case files in a regulatory or law-enforcement agency) are not prime candidates for computerization and have not generally been automated. Public requests for information from a case-type file (on one individual or about one episode or investigation) were not often affected by computerization. Those would still be filed manually.

17 Letter from J. Meisel, Director of Management Systems, Federal Aviation Administration, April 2, 1973.
18 Memorandum from Richard L. Spears, General Manager, National Transportation Safety Board, March 30, 1973, enclosed in letter from John L. McGruder, Director of Management Systems, Department of Transportation.
19 The Veterans Administration wrote: “As can readily be understood, the primary purpose of accumulating and storing data in our computer-based systems is for the internal processing and operation of the many systems of benefits available to veterans and their beneficiaries,” letter from John J. Corcoran, General Counsel, Veterans Administration. See also letter from W. Fletcher Lutz, Director, Bureau of Accounts and Statistics, Civil Aeronautics Board, March 21, 1973.
Requests for large-scale data or for searches of large, automated individual case files to spot patterns or trends are significantly aided by automation. In terms of the federal Freedom of Information Act, the agency reports offered several important judgments about the effects of automation.

20 The full replies are contained with the letter from John L. McGruder, Director of Management Systems, Department of Transportation.
21 Letter from Robert G. Dixon, Jr., Assistant Attorney General, Office of Legal Counsel, Department of Justice; letter from John J. Corcoran, General Counsel, Veterans Administration; letter of Patricia L. Cahn, Assistant Commissioner for Public Affairs, Office of Education; letter of Louis Zawatzky, Acting Assistant Commissioner for Administration, Social Security Administration; and letter of John F. Sherman, Acting Director, Public Health Service, National Institutes of Health, Department of Health, Education and Welfare, April 18, 1973.
22 Letter from Patricia L. Cahn, Assistant Commissioner for Public Affairs, Office of Education.
Computerization was not significant in making the determination whether a particular record, report, or file was to be supplied to someone outside the executive branch. That remained a legal question, governed by the federal statute, court rulings, and agency regulations. The law does not give any special status to information simply because it is stored in machine-readable rather than eye-readable form. But the agencies did report that computers were enabling them to comply more effectively with the provisions of access laws than had been possible before automation.

Some agencies reported that computers were helping them separate material in a record or file that was available for public access from legally privileged material. The Commerce Department stated: “The use of computer methods unquestionably facilitates the separation and protection of information that is privileged under the stated exceptions under the Act. The suppression techniques available and used by this Department to prevent unwarranted disclosure of privileged information are far more effective than any methods that might be employed in a manual process.”23 Similarly, the Labor Department said that “computerization has improved our capacity in this respect. Data identification and control procedures are more vigorously documented and adhered to as a result of automation.”24

Several other officials reported that automation was not having such an effect in their agencies. The Federal Aviation Administration commented25:

Computerization seems not to significantly affect our ability to screen out information not available for public inspection such as airmen medical records and detailed FAA personnel records. In both hard copy and computerized files procedural safeguards are necessary. The difference is largely the difference between establishing clerical procedural safeguards and computer program procedural safeguards.
Our experience to date is not conclusive but it could be that carefully designed and tested computer screening may be more effective than human screening, subject as it is to human error.
And, the General Counsel of the Transportation Department observed26:

While computerization would make it easier to locate requested records, it would probably not change the capacity or ease with which information items

23 Letter from Joseph O. Smiroldo, Acting Director, Office of Organization and Management Systems, Department of Commerce, March 30, 1973.
24 Letter from Frank S. Johnson, Jr., Director, Public Affairs, Department of Labor, March 28, 1973.
25 Letter from J. Meisel, Director of Management Systems, Federal Aviation Administration.
26 Memorandum from John W. Barnum, General Counsel, April 2, 1973, enclosed in letter from John L. McGruder, Director of Management Systems, Department of Transportation.
that are available for public inspection can be separated from those that are exempt under the Freedom of Information Act. After a requested record is located, it must still be reviewed and evaluated by persons competent to determine whether the document is exempt, whether it should be disclosed as a matter of policy even though exempt, or whether it should be released with appropriate deletions. Making such a determination as to each document before it is put into a computer system is obviously not feasible.
A second way in which some agencies believe computers have made compliance with the Freedom of Information Act more effective is in the capacity of computer systems to produce special lists, statistical reports, and surveys in direct answer to requests by public groups. These, according to the agencies, could not have been produced with manual files because of the high clerical costs or because the time needed was longer than the inquiring party could allow.

Finally, improvement in grant, contract, and licensing information is seen as a major step toward greater responsiveness. To whom government agencies award grants, contracts, and licenses is an active public-policy issue. The Law Enforcement Assistance Administration (LEAA) Grant Management Information System, indexing some 30,000 grants and subgrants that have been awarded by LEAA since 1968, has been a boon to Congressional and public-interest groups studying the work of that agency. Furthermore, specialized publications prepared and distributed from the LEAA’s automated files, such as a volume with extensive details about each of the automated criminal justice information systems being funded, state by state, through the LEAA, have facilitated analysis by legal, civil liberties, and public-interest groups of the privacy and security aspects of the LEAA grants and the systems they support (Law Enforcement Assistance Administration, 1972).
2.2 Reports from the Information-Seekers
The most striking thing about the replies from the five groups of information-seekers is how often they agreed with the estimates of computer impact offered by federal executive officials, though they start from different interest positions and often make different judgments about how healthy the state of public access currently is.

Congressional committee counsel replies uniformly corroborated the picture presented by agency reports. Richard Sullivan, from the House Committee on Public Works, wrote: “This Committee and its Subcommittees have not experienced any difficulty in continuing to obtain needed information for the purpose of legislative oversight. The ability of the agencies to furnish detailed statistical information on short notice has
been greatly facilitated as a result of the installation of computer systems.”27 A similar report came from Donald Knapp, Counsel to the House Committee on Veterans’ Affairs28:

I am pleased to report that we have experienced no major difficulty with the computerization of Veterans Administration files and records. As a matter of fact, for the most part we have found computerization has expedited the collection of information needed by the Committee to make timely decisions concerning policy matters. As an example, I am enclosing a copy of a House Committee Print, 93rd Congress, which was programmed and computerized. In past years, the compilation of this information (relating to the operations of Veterans Administration hospital and medical programs) has required several months. During the past two Congresses, through computerization, we have been able to produce the report in about two weeks.
Lawrence Baskir, Counsel to the Senate Subcommittee on Constitutional Rights, stressed that the central problem in securing access to government data was persuading the department head that he should release the information. If he wanted it released, Baskir said, his subcommittee obtained the data whether it was in manual or computerized storage. The heart of the issue was one of policy, a political question, and “this is not being determined today by the presence or absence of computerization.”29

The most extensive response from a Congressional counsel came from L. James Kronfeld. Because the Government Operations Subcommittee of the House Committee on Government Operations, for which he works, regularly investigates complaints from persons who have had difficulties in getting data or have been refused information by executive agencies, its experiences are among the broadest to be found in Congress. Kronfeld wrote30:

What we have found is that the computerization of information does not necessarily have any bearing on its availability. Agencies make their decisions
27 Letter from Richard J. Sullivan, Chief Counsel, House Committee on Public Works, April 30, 1973.
28 Letter from Donald C. Knapp, Counsel, House Committee on Veterans’ Affairs, May 3, 1973.
29 Telephone interview with Lawrence Baskir, Counsel, Senate Subcommittee on Constitutional Rights, May 8, 1973. Some of the telephone interviews conducted with information-seekers were made by my research assistant, Ms. Caryn Leland, whose help I gratefully acknowledge.
30 Letter from L. James Kronfeld, Special Counsel, Subcommittee on Foreign Operations and Government Information, January 16, 1973. For a detailed presentation of the additional capacities that computers give federal agencies to produce special lists from their computerized base files, see U.S. Congress (1972).
on availability based on the subject matter of the information rather than its form of storage. Of course, our recent hearings (the Moorhead Committee hearings on revision of the freedom of information law, held during 1972-1973) have shown serious deficiencies in these agency decisions. The main problem with computerization is the cost to the requester for receipt of the information. For instance, if the information were kept in paper files and a request is made for a specific piece of information which is readily available, it is relatively cheap for the agency to pull the file and make copies of the specific information. However, when a series of files, such as the record of payments under a specific subsidy program, is computerized, the costs assessed by the agencies are quite high, as they are generally related to computer time costs. As an example, the most common cost for a copy of a tape is $62.00. Agency personnel have told me that it is much easier to supply a complete duplicate tape if a requester wants only a small part of the information on the tape rather than to pull and print that specific information. A duplicate tape, however, is only useful if the requester has print-out facilities. In cases where a print-out of the data is requested, the usual agency practice is to charge a certain amount for the time taken to locate the information on the tape, plus $5.00 per page for the print-out itself. The time charges are generally computed on the basis of the agency costs per hour for use of the computer. . . . Therefore, what would have been a minor charge for pulling and copying a paper file can be an expensive proposition if the material is on tape and a time-use and print-out charge is levied.
However, Mr. Kronfeld went on to note that the computer provided considerable savings when someone wants large bodies of data. “In the case where the requester wants the whole category of computerized information and has the facilities to process the tape, the cost savings can be substantial.” Mr. Kronfeld cited the production of computer-generated mailing lists for commercial advertisers and organizations as a prime example.

Most of the other groups of information-seekers reported either no experience with computerized files or no special problems with them. Such replies came from the Consumer Federation of America and the National Consumers League; from public-interest law firms, such as the Institute for Public Interest Representation and the Citizens Communications Center for Responsive Media; from the National Capital Areas Chapter of the American Civil Liberties Union; from the freedom of information committees of the AP Broadcasters Association and Aviation/Space Writers Association; and from the Washington Office of the League of Women Voters.

However, some of these information-seekers did have experiences with computerization. Among the public-interest law firms the most interesting comments came from two of Ralph Nader’s groups, the Center for the Study of Responsive Law and the Public Interest Research Group. Harrison Wellford and Ronald Plesser said they had had some experiences
in which computerization had made it possible to obtain information that would have been difficult if not impossible to get from the previous manual file. Their example was from the Securities and Exchange Commission (SEC). An SEC code provision says that anyone who is more than a 10-percent owner of shares in a listed corporation must file an ownership form with the SEC. This is a public document. However, it has always been filed under the company name, not the individual’s name. When the Nader Congress project was under way in 1972, the researchers wanted to find out what stocks were owned by each Congressman and candidate for Congress in 1972. But, Mr. Plesser noted, “We would have had to go through 10,000 files to get the names.”31 However, the Nader group learned that the SEC had a name-access program for this ownership file, and asked to have a list compiled for each Congressman and candidate for Congress in 1972. At first, the SEC hesitated, on the ground that it might be an invasion of privacy. It could lead to use of their files for commercial mailings, political solicitations, etc. However, the Nader group persuaded the SEC to supply the list, and it aided the Nader Congress research greatly.

On the other hand, the Nader associates cited several examples where files had been computerized, but the absence of a software program to produce the desired information prevented them from obtaining the data they needed. In one case, litigants in a damage suit against the manufacturer of a helicopter, suing after an air crash, wanted the maintenance background records for a particular ball bearing believed to have been defective, or defectively maintained. The Air Force keeps a computer log of these maintenance records, but its retrieval program is only for major parts, not minor ones like the ball bearing. The Air Force could print out all of the “nonmajor items,” but this would provide a print-out 300 sheets long.
The issue was whether the Air Force would reprogram their computer system. When asked, they replied that the computer was busy 24 hours a day, 7 days a week. Any free time that developed was needed for system repair. Even if there had been time available, the cost of reprogramming was estimated at several thousand dollars.32

Another example involved meat-inspection reports and pesticide data. The Center for Responsive Law won the right to inspect these at the Agriculture Department. Some of these data were computerized, but when the Center wanted information on a statewide basis, it was told

31 Letter from Harrison Wellford, January (no date) 1973; telephone interviews with Wellford and Ronald Plesser, March 1973.
32 A computer expert has said that a tape file copy could have been produced and the desired information retrieved on another system for perhaps $1000.
RIGHT OF ACCESS TO GOVERNMENT INFORMATION
that it would require costly reprogramming which, Mr. Wellford observed, was “beyond our ability to pay.” The Nader group experience underscores the fact that more information of the kind sought by public-interest groups is potentially available in the computerized files than had been provided in manual records. But where the agency has not provided software programming to extract what these groups want, it is not yet clear under interpretations of the Freedom of Information Act whether such expensive reprogramming, often interfering with vital computer services, can be required of the government. If so, who must pay for it?

Among the consumer representatives a typical statement of consumer outlook came from Benjamin Kass, a lawyer for consumer interests and formerly a staff member of two Congressional committees that specialized in supervising government information policies. Mr. Kass observed33:

Whenever my consumer groups or people have ever needed information, we put in a request and get back volumes of microfiche or print-out copies. I don’t think computerized information presents a unique problem. If some bureaucrat doesn’t want to give information that should be public, it doesn’t matter what the nature of the information is, whether it is computerized or not. Generally, it is easier to get information if it is computerized because it is harder to give C information and deny it to A or B. It all boils down to the attitude of the agency bureaucrats. If they don’t want to give it out, that’s what counts.
Further confirmation that computerized information is often useful but that groups have to fight to get it came from two leaders of women’s rights activities. Myra Barrer, of Women Today, related that she had been trying for some time to get access to computer print-out sheets showing the number of women employed in each federal agency at each grade level, at what salary, etc. She pressed for access at the individual agencies, and often got it, “after haggling,” but what she really wanted was the collection of agency computer print-out sheets kept by the Civil Service Commission and bound in one volume titled Women in Government. According to Ms. Barrer, this book is “long past due.” In terms of its usefulness once it is “pried out of the agency,” she commented, “the computer print-outs have the advantage of being a lot easier to use for comparing year-to-year figures. Each agency in the print-out lists the supergrades and the women’s names for various years. All the information is in one place.”34
33 Telephone interview with Benjamin L. Kass, April 26, 1973.
34 Telephone interview with Myra Barrer, April 20, 1973.
A similar appraisal of the value of computerized data came from KO Kimbel of the Women’s Training and Resource Corporation, in Portland, Maine. The Corporation applied for a loan from the Small Business Administration (SBA) but encountered “all kinds of snags and difficulties that we suspected were because we were women. So we decided to find out what percentage of total moneys and services were given to women by the SBA.” When they contacted the Director of SBA, he said that the information was not available. Ms. Kimbel then went to Congressman William Hathaway, who asked for these data from SBA, got them, and turned them over to her; she was thus able to secure a computer print-out containing a monthly breakdown of the moneys given as loans to various businesses, and to determine which of these were run by women. It turned out to be less than half of one percent. SBA officials said that the print-out was not “reliable,” but Ms. Kimbel noted that it came from the Reports Management Division of SBA, which “is regarded as a very reliable source.”35 The SBA data was released to the press and various women’s groups, and served as highly useful ammunition with which to attack SBA’s record on equality.

One response from each of two other categories of information-seekers indicates that these groups have the same reaction as those already discussed. Sam Archibald, formerly Staff Director of the House Committee on Government Operations and now Washington Director of the Freedom of Information Center, wrote these words36:

It is my impression that computerized records are harder to obtain, although easier to amass. The government collects information in an easily retrievable form but is reluctant to regurgitate that information because it might, just might, contain more facts than the requester asked for or the government administrator wants to divulge.
There have been instances of absurd restrictions placed on the search and retrieval of computerized government records and of absurd search and copying charges levied by government agencies. But I do not have at hand the details of those records. My material, unfortunately, is not computerized.
Mr. Archibald’s commentary, it seems to me, coincides with that of the Nader groups in emphasizing the cost factors in obtaining search and print-out access to large computerized data bases. Finally, let me quote from a letter received from James Ridgeway, a leading investigative writer and book author, now an editor of Ramparts magazine: “I have had considerable difficulty obtaining information of a
35 Telephone interview with KO Kimbel, April 20, 1973.
36 Letter from Samuel J. Archibald, February 21, 1973.
nonsecurity nature from the federal government, but not because of any mechanical reason, such as computers. These reasons all pertain to policy. In certain agencies the government hinders dissemination of information by charging for documents, some of which are computer print-outs.”37

Following are the two major conclusions suggested by the information-seekers who responded to my inquiry:

1) The strongest agreement that computerization has improved the usefulness and availability of information to be obtained from executive agencies comes from Congressional committee counsel. The willingness of executive agencies to furnish this information without cost to legislative committees, and to suppress arguments about searches taking too much agency personnel time, helps to explain why the legislative counsel have such a generally untroubled perspective on cost and inconvenience.

2) Among nongovernment information-seekers, computerized records have often been found to be extremely useful, providing material and services that were not previously available. The private seekers agreed with executive-agency officials that the most important issues are ones of policy (is the information open to public inspection or not?), not of the particular form of information storage and retrieval. Antisecrecy forces see agency officials extending secrecy-oriented approaches to the computerized files. The private information-seekers pointed to problems of cost in certain requests for information, especially whether agencies are developing access programs that serve public disclosure needs as well as managerial ones.
3. An Analysis of the Access Situation
So far, I have reported on and summarized the replies to my survey of information-holders and information-seekers. Now, drawing on these materials but adding my own experiences in studying the effects of computerization in organizations, let me present my analysis of what has been happening, and what it means to public access.

First, computerization has unfolded as a process in which organizational managers decide what they want to do with this new technology to carry out their missions more efficiently. It is in pursuit of this organizational efficiency goal that decisions have been made whether computers should be installed, which files to automate, what “better services” to pursue through software programming, what program

37 Letter from James Ridgeway, December 23, 1972.
evaluation data or policy-planning data to seek from the computerized data bases, and what reports should be generated to management. Legislative, public, and judicial pressures in the past few years have elevated the issues of privacy to the level where they now command some executive attention in computerization plans and procedures, and there has been some legislative examination of the cost-justifications and cost-effectiveness of agency automation. But virtually no attention has been paid by the managers of organizations to the public-access issues in their computer decisions, nor have these issues been put before the managers by the usual public-interest and legislative-watchdog forces. No Congressional hearings have yet focused on these matters, no Nader group has taken up this question as its prime concern, no group of computer professionals has accepted this as an obligation of their civic and professional duties, and no judicial rulings have dealt with the issues of access presented by computerization. In short, the major decisions about the use of computer technology, in terms of whose interests are to be served by the development of this powerful new tool, have been entirely in the hands of executive-agency officials.

Second, even though the reports of agency officials and the experiences of many information-seekers indicate that there have been some increases in availability and usefulness of information as a result of computerization, these have been basically serendipitous effects. That is, they have been fall-outs from better control and utilization of information. To the extent that these techniques, designed to help agencies use information for their own purposes, have helped information-seekers, it has not been because the agencies have set out to provide better services to the press, scholars, public-interest groups, or government critics.
This point needs to be understood, not because it brands the officials of agencies as deliberate culprits but because it indicates that the forces seeking to improve the public’s access to information have not begun to appreciate what computer technology might do, with the proper inputs to agency computerization plans and procedures, if public-interest spokesmen had some say in how the public’s money is spent on this costly technology.

Third, this failure to realize what is at stake, and to force the development of new laws, procedures, and institutions to bring public-interest groups into the computerization process and to insure better public access to government data, has taken place at a distressingly critical time in the expansion of computer technology in government. In the late 1960s and early 1970s, third-generation computer systems have been installed in the major government agencies and departments. Basic decisions about the architecture of the systems have been made, money has been allotted for the vital software programming that will control how the files are used,
new networks for information exchange and dissemination have been created, and the technical priesthood to manage these systems has been assembled and invested with important authority to control the machines. While computerization is basically an ongoing process, and many things can be done to modify or alter the systems, it is very difficult, very costly, and very disruptive of organizational functions to attempt major changes once they have been designed and programmed in a particular way.

Furthermore, developments in computer technology unfolding in the 1970s threaten to make the problem more acute. For example, the proliferation of minicomputers means that government agencies will be able to place many “small” files of information on relatively inexpensive machines. This will make it harder to keep track of where information is stored and how it is used.

In the first 15 years of computerization in government, I do not believe the record reveals a significant lessening of public access to government information as a result of automation. What it does reveal is a case of lost opportunities and of potentially great danger in the future. We have not appreciated how to bring public participation into the computerization decisions, and we have failed to develop the standards and procedures that will prevent computerization from upsetting the desired balances between public access and government secrecy.

4. Recommendations for Action
I see four main objectives in the effort to bring computer technology under greater public control:

1) Create a public right to effective participation in the decisions of government agencies regarding computerization. It is also necessary to develop the groups, institutions, and procedures that are able to take advantage of this right and to provide the financial support needed. There is a special need for computer professionals, both individually and in professional societies, to come forward as independent expert witnesses in these proceedings.

2) Legislate at all levels of government a right of access by the individual to records kept about him in government agencies, as the Federal Privacy Act of 1974 did for federal-agency files and as nine states have done (as of 1977) for state files. This right would cover all but very small groups of government files for which secrecy is justified. It would include provisions for giving citizens an easily obtainable guide to the individually identified records kept by government agencies. There would
also be a right to challenge the continued retention of inaccurate, incomplete, or misleading information. These rights are essential if the right of the public to know what government is doing is to extend beyond access only by representative groups. It must also be secured for millions of individuals who have lost confidence that government agencies (in social services, taxation, health, law enforcement, etc.) always collect proper and accurate information. For these citizens, only a direct and personal right of inspection of their own record will satisfy the access principle in an age of large-scale data banks.38

3) Identify and develop techniques to make computer systems help the cause of public information as well as agency operations, such as requiring publicly available indexes to file content and software programs, instituting requirements to develop the software necessary to produce information that is in a data base but is not retrievable, and requiring audit-trails that record the uses made of information in files. Again, providing the funds to finance such operations is critical.

4) Pay attention to the effect of antisecrecy actions on the rights of personal privacy of persons and groups about whom there is sensitive information in files. Develop the exceptions and special procedures that will not make these civil-liberty interests the victim of the need for public disclosure. Protection of the government’s need for temporary privacy for its decision-making processes should be incorporated in the new policies and procedures.

It will take considerable effort to define and implement such a program, and I hope to conduct an update study in 1979-80 to see how further federal automation and the 1974 FOI Act amendments have affected public access. But some actions seem needed now.
A major first step is to create a duty in executive agencies to give specific consideration to the improvement of public access in their plans for and operations of automatic data processing (ADP), with a review of the adequacy of their compliance vested in a legislative committee or legislatively responsible agency or commission. An excellent model for this provision can be found in a bill sponsored in 1974 by Senator Lee Metcalf, S.770, to establish an Intergovernmental Office of Consumers’ Counsel. One purpose of the Metcalf bill is “to improve methods for obtaining and disseminating information with respect to the operations of regulated companies of interest to the Federal

38 For a discussion of how the Federal Privacy Act has operated since its institution in 1975, see the Privacy Protection Study Commission (1977).
Government and other consumers. . . .” To make sure that this purpose is served, the Metcalf bill provides39:

Sec. 301 Automatic Data Processing
(a) Federal agencies are hereby authorized and directed to make full use of automatic data processing in preparing the information required under this Act and other Acts to which they are subject, to the end that the Counsel, the Congress, and the public shall receive information in a timely and understandable manner. Federal agencies are hereby directed to include, in their annual reports, accounts of their progress toward full use of automatic data processing.
(b) The Comptroller General of the United States shall review the activities of the Federal agencies in compliance with this section, and shall report to the Congress the results of such compliance, and where full use of automatic data processing is not being made in the preparation of such information he shall recommend those actions which should be taken to achieve such full use of automatic data processing techniques and equipment.
While the provision Metcalf suggested is there to create a particular office affecting consumers, I see no reason why such a provision should not be written directly into the amendments to the Freedom of Information Act. It would need Congressional funds expressly authorized for its implementation, perhaps a set percentage of the cost of each computer system (5 percent) assigned to public-access facilities.

The second legislative action needed is the requirement of a public proceeding whenever a new computer system is installed by a government agency or a major expansion of an existing system is contemplated. Notice of these hearings should be published in the Federal Register and opportunities for submissions and appearances should be available to interested parties or groups. The agency would have to present, in addition to other items, detailed privacy-impact and public-access-impact statements. The record of the hearing and an official response to objections and protests submitted by the head of the agency should be filed with an appropriate Congressional committee, perhaps the committee with subject-matter authority or appropriation authority over the agency’s function, or perhaps a committee like Government Operations.

This underscores a dilemma that deserves far more sustained attention than it has received from civil libertarians and democrats: to deal effectively with issues of computers, privacy, and public access means creating working groups of persons with legal, technological, and social-science skills, to provide the blend of expertise needed to examine the

39 The Metcalf bill is reprinted in Congressional Record, February 6, 1973, pp. S2112-16, with the ADP provision at p. S2115.
314
ALAN F. WESTIN
computerization proposals of executive agencies and to inspect their actual operations. While there have been some still-born early efforts, like the attempt by Ralph Nader to enlist volunteers from the Washington-area Association for Computing Machinery in projects to use computer resources to get consumer information directly to shoppers, and to distribute consumer-agency violation data on business establishments directly to buyers, these have been few and have not had much of an influence. There are many groups that could be enlisted, if the proper organizational format and funding could be supplied.

Again, there exists precedent to draw on for this idea. The Federal Power Commission (FPC), in April 1972, issued a notice that it proposed to create a “Fully Automated Computer Regulatory Information System,” as an aid in the FPC’s basic responsibilities under the Federal Power Act. The general plan for the system was described in considerable detail, and interested parties were apprised of their opportunity to appear and comment on the proposed system. When the proposal was being considered for adoption, various parties filed letters, briefs, and comments with FPC. When I examined the docket entries indicating who had filed such comments, I found a long parade of power companies, oil companies, and state public-service commissions, but no public-interest law firms, spokesmen for the press, or other guardian groups for public access. This suggests that the excellent procedure that the FPC is following will not do much for public access (or for privacy) if there are no organizations, financially and technically capable, ready to speak for these values.40

5. The Future of Information Technology and Democratic Government
Thomas Jefferson, the leading apostle of a free press and open government, was also a devoted enthusiast of science and technology. While he was drafting the First Amendment, composing Virginia’s statute on religious liberty, and fighting Chief Justice John Marshall’s use of judicial power to enhance the rights of property holders, Jefferson was also involved in what eighteenth-century writers called “useful invention,” the application of science to liberate mankind from enslavement to nature

40 For the FPC notice, see Docket No. R-438, “Development of a Fully Automated Computer Regulatory System: Revisions in Title 18, Code of Federal Regulations,” “Notice of a Proposed Rulemaking and Request for Comments,” April 13, 1972; letter from William L. Webb, Director of Public Information, FPC, March 2, 1973; Docket Entries for No. R-438, through April 16, 1973.
and endless physical labor. He mastered architecture and designed and built Monticello. He invented what he called the “polygraph,” an instrument that attached one pen to another, so that a copy was made automatically as a person wrote a letter or document. For Jefferson, science was not an enemy of liberty; it was an ally, and even though he had read deeply about the uses of science to develop new instruments of warfare, he rested his faith in the belief that the rational spirit, in science as well as in republican government, offered man his best chance for progress.

We need the Jeffersonian spirit today, more than ever. Technology has grown increasingly powerful, and the struggle over who will control and use it has grown more intense. Were a Jefferson to return to the United States today, I believe he would be fascinated by computers, by the way that automation of clerical functions had replaced armies of petty clerks and minor officialdom. But after his wonder had subsided, he would ask: For whose benefit are these new tools being used? Who controls them? Are they on the side of civil liberty or of arbitrary authority?

We owe it to the Jeffersonian heritage to develop the necessary technical information and skills to be able to monitor this new technology, within the tradition of open government and protection of personal privacy that Jefferson would have insisted upon for the preservation of a free society. The skilled and persuasive advocates for applying computers to serve organizational goals have, so far, dominated serious discussion. It is time that those concerned with access to data, by the individual and by representatives of an informed public, also be heard, in the kind of adversary process from which wise public policies have traditionally emerged.
REFERENCES

American Civil Liberties Union (1975). “Your Right to Government Information: How to Use the FOIA.” ACLU, New York, New York.
Law Enforcement Assistance Administration (1972). “Grant Management Information System.” LEAA, Information Systems Division, Office of Operations Support. U.S. Gov. Print. Off., Washington, D.C.
Privacy Protection Study Commission (1977). “Personal Privacy in an Information Society: The Report of the Privacy Protection Study Commission.” U.S. Gov. Print. Off., Washington, D.C.
U.S. Congress (1964). “Availability of Information from Federal Departments and Agencies,” Bibliography, House Committee on Government Operations, January. U.S. Gov. Print. Off., Washington, D.C.
U.S. Congress (1972). “Sale or Distribution of Mailing Lists by Federal Agencies,” Hearings, House Subcommittee on Government Operations, June 13 and 15. U.S. Gov. Print. Off., Washington, D.C.
Wellford, H. (1974). Rights of people: The Freedom of Information Act. In “None of Your Business: Government Secrecy in America” (N. Dorsen and S. Gillers, eds.), pp. 195-216. Viking Press, New York.
Westin, A. F. (1967). “Privacy and Freedom.” Atheneum, New York.
Westin, A. F. (1974). The technology of secrecy. In “None of Your Business: Government Secrecy in America” (N. Dorsen and S. Gillers, eds.), pp. 288-323. Viking Press, New York.
Westin, A. F., and Baker, M. A. (1972). “Data Banks in a Free Society.” Atheneum, New York.
Author Index
A
Abramson, N., 165, 166,216 Akkoyunlu, E., 187,216 Almasi, G. S., 240,281 Alsberg, P. A., 206,216 Anderson, B., 98, 134,159 Anderson, R. R., 207,216 Arai, Y., 240,282 Archer, J. L., 224, 239,279 Aronofsky, J., 165,217 Ash, W., 80, 81, 82,86 Ashenhurst, R. A., 166,216

B
Babic, G. A., 174, 176, 203, 204, 207, 209, 213, 214,216, 219 Backer, P. O., 46,86 Bailey, P. T., 248, 249, 250, 251,281 Baker, M. A., 290,316 Barber, D. L. A., 165, 167,217 Baskett, F., 206,216 Bates, M., 3, 8, 72,87 Baum, R. I., 277,279 Beaulieu, T. J., 242,281 Beausoleil, W. F., 259,279 Belady, L. A., 257,280 Benoit, J. W., 195,217 Bernstein, A., 187,216 Black, B. A., 207,221 Bloomfield, L., 96, 159 Bobeck, A. H., 224, 228, 230, 231, 232, 234, 235, 237, 239, 248, 252,280 Bobrow, D. G., 10,86 Bobrow, R., 76, 80, 81, 82,86 Boggs, D. R., 166,219 Bohnert, H. G., 46,86 Bonyhard, P. I., 224, 228, 230, 234, 235, 237, 239, 260,280 Bookchin, B., 120, 122, 160 Breed, D. J., 233,280 Brinch Hansen, P., 187,217 Brook, R. H., 154, 159 Bross, I. D. J., 98, 134, 159 Brown, D. T., 259,279 Brown, G., 3, 8, 72,87 Brown, J. S., 71, 76,86 Bruce, B., 3, 8, 72,87 Burke, E. R., 247, 248, 250,282 Burton, R., 71, 81,86

C
Calhoun, B. A., 242,280 Carey, R., 225,280 Carnap, R., 5, 15,86 Carr, C. S., 203,216 Carson, J. H., 164,218 Carsten, R. T., 207,216 Cerf, V. G., 203,216 Chandy, K. M., 206,216 Chang, H., 224, 225, 235, 240, 242, 243, 251, 253, 256, 261, 262, 275, 276, 277, 278,280, 281, 282 Chapman, D. W., 242,281 Charniak, E., 98, 159 Chen, T. C., 251, 253, 261, 266, 271, 274, 278,280, 281, 282 Chen, T. T., 275,280 Cheng, T. T., 206,216 Chomsky, N., 9,86, 96, 97, 159 Church, J. D., 218 Coffman, E. G., Jr., 257,281 Cohen, M. S., 225, 242, 262,280, 281 Coker, C. H., 173,216 Combs, R., 165,216 Cook, C., 3, 8, 72,87 Coppersmith, D., 266,282 Cosell, B. P., 184, 195,216 Cox, G. M., 224, 237,281
Crocker, S. D., 203,216 Crowther, W. R., 166, 167,218
D
Damerau, F. J., 90, 159 Danylchuk, I., 235,280 Davies, D. W., 165, 167,217 Day, J. D., 206,216 Della Torre, E., 224, 230, 231, 232,280 Denning, P. J., 257,281 Duenki, A., 196,220
E
Eggenberger, J. S., 242,280 Ellis, C. A., 206,217 Elovitz, H. S., 217 Enslow, P. H., 165, 196,217 Enz, V., 224,281 Eswaran, K. P., 206,217, 271, 274,280
F
Farber, D. J., 172,217 Farmer, W. D., 171,217 Feldman, J., 172,217 Fitzpatrick, E., 120, 159 Forsdick, H. C., 184,217 Franich, D. M., 242,281 Fraser, A. G., 166, 169, 171,217 Friedman, J., 91, 162 Fuchs, E., 170,217 Furuichi, S., 240,282
G
Garey, M. R., 251,280 Gecsei, J., 257,281 George, P. K., 228, 237,281 Gergis, I. S., 228, 237, 275,280 Geusic, J. E., 224, 230, 234, 235, 239,280 Gibson, D. H., 258,281 Goodman, N., 220 Graf-Webster, E., 195,217 Gray, J. N., 206,217 Gray, T. E., 196,217 Green, S., 56, 74,86 Greenberger, M., 165,217 Grignetti, M., 80, 81, 82,86 Grishman, R., 119, 120, 122, 132, 134, 145, 152, 159, 160

H
Hafner, E. R., 169, 182,217 Hall, B., 91, 162 Harris, Z. S., 91, 96, 97, 98, 116, 160 Hartley, A., 80, 81, 82,86 Hattenda, T., 240, 209,282 Hayes, J. F., 207,216, 217, 218 Heafner, J. F., 203,217 Heart, F. E., 166, 167,218 Heinrich, F. R., 172,217 Heitmeyer, C. L., 217 Henderson, D. A., 199,220 Henry, G. R., 242,281 Hill, D. R., 90, 160 Hirschman, L., 134, 145, 152, 160, 161 Ho, C. P., 242,281 Hobbs, J., 119, 132,160 Hopwood, M. D., 172,217 Housel, B. C., 206,220 Hsiao, D. K., 277,279 Hu, H. L., 242,280

I
Insolio, C., 149, 152, 160, 161 Isaac, E. D., 225,280

J
Jackson, J. R., 217 Jackson, P. E., 170,218 Jensen, E. D., 164,218 Jespersen, O., 97, 160 Johnson, P. R., 184, 195,216 Joos, M., 96, 160 Josselson, H. H., 90, 160 Juliussen, J. E., 224, 237,281
K
Kahn, R. E., 165, 166, 167,218 Kaplan, R. M., 3, 6, 8, 55,87, 91, 162 Kaye, A. R., 207,218 Keefe, G. E., 240,281 Keyser, S. J., 91, 160 Kimbleton, S. R., 165, 184,218 Kinoshita, K., 250,281 Kleinrock, L., 206,218 Klovstad, J., 3, 8, 72,87 Knuth, D. E., 270,281 Kobayashi, H., 207,218 Kobayashi, T., 228, 237,281 Kohara, H., 252,282 Konheim, A. G., 207,218 Kooy, C., 224,281 Korein, J., 136, 160 Kropfl, W. J., 173,218 Kuno, S., 91, 160 Kuo, F. F., 165,216

L
Labetoulle, J., 207,218 Lancaster, F. W., 91, 160 Larson, K. C., 172,217 Lee, D. M., 224, 237,281 Lee, S. Y., 243, 251, 253, 256, 276, 277,280, 281 Lehman, W., 90, 160 Lemon, W. J., 166,218 Levin, K. D., 206,219 Levine, P. H., 187, 192, 206,218 Liebowitz, B. H., 164,218 Lin, Y. S., 240,281 Lindsay, P. H., 90, 161 Lipsky, L., 218 Liu, M. T., 166, 174, 176, 177, 180, 185, 200, 202, 203, 204, 207, 209, 210, 213, 214,216, 218, 219, 220 Logmans, H., 233,280 Loomis, D., 172,217 Lorie, R. A., 206,217 Lum, V. Y., 206,220, 271, 274, 278,280 Lyman, M., 154, 161
M Ma, A. L., 174, 200, 202, 218 McDermott, W., 154, 161 McKenney, J . L., 165, 217 McQuillan, J . M., 165, 168,219 Maegawa, H., 240,282 Makhoul, J., 3, 8, 72,87 Malhortra, R., 185,219 Malman, J. H., 184, 195,216 Mandell, R. L . , 184, 218 Manning, E. G., 173, 206, 207,218, 219 Massy, W. F., 165,217 Matsuda, J., 250,281 Mattson, R. I., 257,281 Meister, B., 207, 218 Metcalfe, R. M., 166, 185, 203, 217, 219 Mills, D. L., 206, 219 Minnick, R. C., 248, 249, 250, 251,281 Moore, D. H., 183,219 Morgan, H. L., 206,219 Morrow, R. H., 244, 245,281 Mullery, A. P., 206,219 Muntz, R. R., 206,216 Murphy, D. P., 10.86 Myers, W., 225, 239,281
N
Nash-Webber, B. L., 3, 6, 8, 17, 55, 71, 72, 86, 87, 91, 162 (see also Webber, B. L., 87) Nelson, T. J., 260,280 Nenadal, Z., 182,217 Neuhold, E., 206,220 Newhall, E. E., 170, 171, 207,216, 217, 219, 221 Nielsen, J. W., 235,281 Nigram, A., 278,280 Norman, D. A., 90, 161 North, J. C., 236,282
O
O'Dell, T. H., 224,281 Oefinger, T. R., 275,280 Oettinger, A. G., 91, 160 Oh, Y., 174, 180, 181,219 Okada, M., 240,282 Ornstein, S. M., 166, 167,218 Otten, K. W., 90, 161
P
Palacios, F. G., 206,216 Pardo, R., 174, 176, 203, 204, 207, 209, 213, 214,216, 219 Partee, B. H., 76,87 Passatiume, J. J., 206,219 Peebles, R. W., 173, 206, 207,218, 219 Perneski, A. J., 244, 245,281 Petrick, S. R., 91, 160, 161 Phelps, D. F., 259,279 Pierce, J. R., 173,219 Pierce, R. A., 183,219 Plath, W. J., 91, 161 Pohm, A. V., 233,282 Posner, M. J. M., 207,216 Postel, J. B., 203,217 Pouzin, L., 165,219 Pugh, E. W., 233,282
R
Raze, C., 119, 120, 122, 130, 143, 160, 161
Reames, C. C., 166, 174, 176, 177, 185, 210, 219, 220
Reeker, L. H., 90, 161
Reiter, R., 17, 55, 71, 86
Retz, D. L., 184, 185, 220
Roberts, L. G., 165, 220
Robillard, P. N., 207, 220
Robinson, J. A., 55, 87
Robinson, J. J., 81, 82, 87
Robinson, R. A., 184, 220
Rosenthal, R., 195, 220
Rosier, I., 242, 280
Rosier, L. L., 242, 280, 281
Rossol, F. C., 235, 280
Rothnie, J. B., 220
Rowe, L. A., 172, 217
S
Sager, N., 100, 101, 116, 119, 120, 122, 126, 134, 149, 152, 154, 159, 160, 161
Saito, M., 240, 282
Salkoff, M., 126, 161
Sandfort, R. M., 247, 248, 249, 250, 251, 281, 282
Sapir, E., 96, 161
Sasao, T., 250, 281
Schacter, P., 76, 87
Schafer, B. W., 184, 220
Schank, R., 90, 161
Schantz, R. E., 184, 187, 195, 216
Schicker, P., 196, 220
Schneider, G. M., 165, 218
Schwartz, M., 165, 168, 220
Schwartz, R., 3, 8, 72, 87
Scovil, H. E. D., 248, 252, 280
Semon, N. S., 248, 250, 251, 281
Semon, W. L., 248, 249, 250, 251, 281
Shapiro, A., 98, 159
Sherman, D. N., 207, 209, 216, 218
Shew, L. F., 242, 280, 281
Shockley, W., 252, 280
Shu, N. C., 206, 220
Simmons, R. F., 4, 87, 91, 161
Slonczewski, J., 242, 280
Slutz, D. R., 257, 281
Smith, A. B., 224, 282
Smith, J. L., 228, 237, 280
Spragins, J. D., 207, 220
Stankovic, J., 164, 220
Steward, E. H., 170, 220
Stockwell, R. P., 76, 87
Stonebraker, M., 206, 220
Strauss, W., 235, 280
Stubbs, C. D., 170, 218
Sussman, J., 184, 195, 216
Swanson, P., 224, 236, 239, 282
T
Takahashi, K., 252, 282
Takasu, M., 240, 282
Teitelman, W., 10, 86
Thomas, R. H., 184, 195, 199, 206, 216, 220
Thompson, B. H., 90, 162
Thompson, F. B., 90, 162
Tomlinson, R. T., 196, 221
Traiger, I. L., 206, 217, 257, 281
Tschanz, M., 217
Tsubaya, I., 240, 282
Tung, C., 253, 261, 266, 271, 274, 278, 280, 281, 282
V
van Dam, A., 164, 220
van de Heijden, A. M. J., 233, 280
Venetsanopoulos, A. N., 170, 207, 219, 221
Voegeli, O., 242, 280
Voermans, W. T., 233, 280
Vonderrohe, R. H., 166, 216
W
Walden, D. C., 165, 166, 167, 168, 184, 187, 195, 217, 218, 219, 221
Walker, D. E., 90, 91, 162
Waltz, D. J., 90, 91, 162
Watkins, S. W., 195, 220
Wecker, S., 185, 206, 219, 221
Weller, D. R., 171, 221
Wellford, H., 287, 316
Wessler, B. D., 165, 220
West, L. P., 169, 221
Westin, A. F., 283, 284, 285, 290, 316
White, J. E., 199, 200, 221
Wilks, Y., 98, 159
Williams, R. P., 248, 250, 251, 282
Winograd, T., 74, 81, 87, 90, 162
Wolf, J., 3, 8, 72, 87
Wolfe, R., 236, 282
Wong, C. K., 266, 282
Wood, D. C., 165, 221
Woods, W. A., 3, 4, 5, 6, 7, 8, 9, 55, 65, 67, 69, 72, 74, 87, 91, 162

Y
Yamagishi, K., 240, 282
Yamaguchi, N., 240, 282
Ypma, J. E., 224, 236, 239, 282
Yuen, M. L. T., 207, 221

Z
Zafiropulo, P., 182, 221
Zue, V., 3, 8, 72, 87
Zwicky, A., 91, 162
Subject Index
A
Adjunct strings, 116
Air Force, U.S., 306
Airlines flight schedules, gedanken system and, 4-5
Alternative interpretations, plausibility of, 70
American Civil Liberties Union, 305
Anaphoric reference, in LUNAR system, 70-71
AND-OR logic, in magnetic bubble logic, 247
ARPANET network, 166, 195
ATN grammar, see Augmented transition network grammar
Augmented transition network grammar, 3
  answering questions about, 6-8
  grammar model of, 9
  path enumeration in, 56
  semantic interpretation system and, 5
Automatic data processing, 312
Automatic information formatting, 92-98
B
Beneficiary Index Records Location System, 298
Bubble memory characteristics, 238-239
  see also Magnetic bubble logic; Magnetic bubble memory
Bubble shift register memory, 228
C
Center for the Study of Responsive Law, 305-306
Centralized-control loop network, 169
Character manipulation, in magnetic bubble memory, 253-256
Citizen Communications Center for Responsive Media, 305
Communication, man-machine, 2-3
Computer impact, on information holders and seekers, 292-309
Computerization, organizational efficiency of, 309-310
Computer networks, distributed loop, see Distributed loop computer networks
Computer program
  linguistic routines in, 130-131
  representation of grammar in, 119
Computer programs
  for information formatting, 115-151
  linguistic framework in, 116-117
Computer routines
  English transformations in, 131-136
  formatting transformations in, 134-135
  performance of parsing-formatting components in, 124
Computer technology
  developments in, 311
  public control of, 311-314
Computer usage, organizational impact of on government, 289-292
Conjunctive strings, 116-117
Constituents, multiple uses of, 68-69
Consumer Federation of America, 305
D
Decoding, defined, 90
DEFAULT template, 61
defined, 164-165
Democracy, information technology and, 314-315
Digitalis sublanguage grammar, "classes" in, 100
Distributed data base system, 202-206
Distributed loop computer networks, 163-183, 193-195
  centralized control loop network and, 169-171
  distributed frequency system and, 199-200
  extensions to loop interface, 182
  generalized data transfer in, 192-193
  loop interface design in, 178-183
  message formats in, 169-183
  message transmission protocols in, 169-193
  network command language in, 196-199
  network operating system design in, 183-195
  Newhall type loop networks in, 171-173
  performance studies in, 206-214
  Pierce-type loop networks in, 173
  schematic diagram of, 167
  user-access and network users of, 195-206
Distributed loop computer network loop, 173-177
Distributed loop operating system, 184-185
Distributed processing systems, 164-165
DLDBS, see Distributed loop data base system, 203-206
DLOS, see Distributed loop operating system
DRULE, for wh-determiners, 49-50
Dynamic and static deflections, 249-250
E
Education Resources Information Center, 301
Ellipsis, in natural language question answering, 69-70
English transformations
  transformational decomposition in, 131-134
  transformational regularization in, 133-134
ETHERNET network, 166
Extension, vs. intension, 15
F
Fact, vs. "meta-fact," 102-103
Fact retrieval, in natural language information formatting, 152-154
Federal Aviation Administration, 299, 302
Federal Bureau of Investigation, 294
Federal Communications Commission, 296
Federal Highway Administration, 295, 301
Federal information holders, computer impact on, 294-303
Federal Power Commission, 314
Federal Privacy Act (1974), 311
Federal Register, 313
Flight schedules question-answering system, 3
FORMAT, time representation in, 145-146
Formatting transformations, examples of, 142-147
FORTRAN, LSP system in, 120-121
Four-state ladder, in magnetic bubble logic, 264
Freedom of Information Act, 285-286, 301-303, 307, 313
Functional nesting, quantifier removal and, 39-41
G
Global routines, 127
Global rule tree, 59
Government information, reports from information-seekers about, 303-309
Government information computers, access to, 283-315
Government secrecy
  information technology and, 283-292
  precomputer setting of, 284-289
Grammar, representations of, 119-120
H
HANET network, 166
House Committee on Government Operations, 304
House Committee on Public Works, 303-304
House Committee on Veterans Affairs, 304
HWIM ("Hear What I Mean"), TRIPSIS system and, 8
HWIM speech understanding system, 72-73
I
Idler
  bubble trap and, 245
  in magnetic bubble information tracks, 244
Inference, intensional and extensional, 15, 73-75
Information-formatting
  automatic, 151
  natural language, see Natural language information formatting
Information technology
  future of, 314-315
  government secrecy and, 283-292
Intension, vs. extension, 15, 73-75
Interpretations
  alternative, 70
  grammar induced phasing of, 77-78
  partial, 71-73
Interpretation problems
  averages and quantifiers in, 44-45
  functional nesting and quantifier reversal in, 39-41
  interrogative determinants in, 48-51
  interrogative pronouns in, 51-53
  negations vs. quantifiers in, 38
  other types of modifiers in, 42-44
  predicates as modifiers in, 42
  quantifier nesting in, 37-38
  relative clauses in, 41-42
  short scope/broad scope distinctions in, 45-47
  "Wh" questions in, 47-51
Interrogative determinants, 50-51
  see also "Wh" questions
Interrogative pronouns, 51-53
IPC, see Interprocess communication
K
Keyphrase index, 7
L
Ladder-based sort system, 272
LAMBDA function, 16
Law Enforcement Assistance Administration, 299, 303
Linguistic framework
  relation of, to information, 117
  relation of, to transformations, 118
  string analysis in, 116-117
LINGUISTIC STRING, 116, 122
Local computer networks, 165-166
Loop computer networks, 166-169
Loop interface, architecture of, 179-183
Loop interface address, 187
Loop interface design, 178-183
Loop subnetworks, analytic comparison of, 207
LSP sentence analyzer, 107
LSP English grammar system, 107, 116, 120, 126
LUNAR grammar
  semantic interpretations of, 76
  syntactic representations in, 76
LUNAR meaning representation language, 11
  see also Meaning representation language
LUNAR system, 3, 6-7
  see also Natural language question answering; Semantic interpretation
  alternative interpretations in, 70
  anaphoric reference in, 70-71
  approximate solutions in, 64
  document retrieval in, 57
  extensional inference in, 15
  ill-formed input and partial interpretation in, 71-73
  intensional inference in, 73-75
  isolated noun phrases in, 80
  quantification in, 19
  node vs. topic description in, 29
  rule trees in, 34
  semantic interpretation procedure in, 25-26
  semantic rules vs. syntax in, 29-30
  semantics in, 10-11
  SEQ enumeration function in, 23
  structure in, 9-10
  "Wh" questions in, 53-54
Lunar meaning representation language
  see also Meaning representation language
  functions and classes in, 20
  generic quantifier in, 18-19
M
Magnetic bubble(s)
  behavior of near trap, 246
  field-induced propagation of, 226-228
  formation of, 225
  phenomenon of, 225-232
  steering of, for text editing, 252-253
Magnetic bubble logic, 243-252
  AND-OR type, 247
  associative search in, 276-277
  Boolean logic and, 252
  character manipulation and, 253-257
  data base implications and intelligence storage in, 277-278
  dynamic and static deflectors in, 249-250
  dynamic reordering of, 259-261
  four-state ladder and, 265
  general decoder in, 275
  information selection and retrieval in, 274-276
  line and page management in, 256-257
  linked loop and ladder in, 262-266
  locality of reference in, 257
  physical constraints in, 243-247
  sorting with input-output time, 272-274
  steering switch in, 261-262
  storage hierarchies in, 258-259
  storage management in, 257-269
  summary and outlook for, 278-279
  topping schemes in, 268
  uniform ladder in, 266-269
Magnetic bubble memory, 233-243
  bubble logic systems and, 251-252
  chip fabrication for, 235-236
  control of, by electric currents, 228-232
  current developments in, 237-240
  density and cost factors in, 256-257
  detection and replication of, 231-232
  folding the OETS scheme in, 271-272
  major-minor loop scheme in, 233-235
  mechanism in, 232-243
  memory hierarchies in, 259
  multitract logic and, 248
  odd-even transposition sort in, 270-271
  pipeline array logic in, 251
  topping by slithering in, 269
  unconventional devices in, 240-243
Magnetic bubble track, 245
Major-minor loop scheme, 233-234
Man-machine communication, history of, 2-3
Memory hierarchies, magnetic bubble and, 259
Meaning representation language, 11-21
  see also LUNAR meaning representation language; LUNAR system
  commands in, 12
  defined, 10
  designators in, 11
  functions and classes in, 20
  nonstandard quantifiers and, 17-19
  opaque contexts in, 15
  procedural/declarative duality in, 14-15
  propositions in, 12
  quantification in, 12, 17-19
  restricted class, quantified, in, 17
  syntax in, 13-14
  unanticipated requirements in, 20-21
Message transmission protocol, 169
Misplaced modifiers, in LUNAR system, 65-68
Modifier placement, selective, 66-67
Modifiers
  misplaced, 67-68
  predicates and role fillers as, 42-43
  specializers as, 43
MRL, see Meaning representation language
N
National Academy of Sciences, 290-291
National Consumers League, 305
National Crime Information Center, 294
National Labor Relations Board, 295
National Transportation Safety Board, 300
Natural language data bases, 91-92
Natural language information formatting, 89-159
  applications of, 151-159
  approximate solutions in, 64-65
  a priori semantic analysis in, 94
  audit criteria application in, 154-157
  automatic generation of subfield word classes in, 106-112
  automatic information formatting in, 92-96
  case matching in, 153
  computer programs for, 115-151
  data alert in, 158-159
  data summarization in, 187-188
  experimental results in, 110-112
  format normalization in, 145-149
  and information structures in science subfields, 99-106
  modifier placement in, 65-68
  principles and methods of analysis in, 94-115
  quality assessment in, 154
  reuse of information in, 153
  sublanguage grammar in, 97-99
  sublanguage method summarized in, 112-115
  text analysis in, 91
Natural language question answering
  alternative interpretations in, 70
  approximate solutions in, 64-65
  interpretation problems in, 54
  constituent use in, 68-69
  ellipsis in, 69-70
  example of, 58-64
  intensional inference in, 73-75
  loose ends, problems, and future directions in, 64-75
  LUNAR and, see LUNAR
  postinterpretive processing in, 54-58
  semantic interpretations in, 24-37
  semantics of notation in, 21-24
  semantics and quantification in, 1-85
  syntactic/semantic interactions in, 78-84
Network command language, 196-199
Network operating system, requirements of, 185-187
Newhall loop, 171-173, 208-209
NLAMBDA function, 16
Notation semantics
  procedural, 21
  quantified commands in, 23-24
Noun phrase question, in interpretation problems, 49
NRULES, in semantic interpretation, 33
O
Odd-even transposition sort, 270-271
Office of Education, U.S., 299, 301
Official Airline Guide, 5
P
Parse tree
  modular structure of, 127-129
  string modules in, 129
Parsing
  generating quantifiers in, 84
  semantic interpretation as part of, 79
Parsing algorithm, 120
Parsing program, 120
  components of, 120-121
Pharmacology sublanguage study, 101
Phased interpretation, in semantic interpretation, 31-32
Pierce message format, 174
Post-interpretive processing, 54-58
  LUNAR document retrieval in, 57
  printing quantifier dependencies in, 57-58
  smart quantifier in, 55-57
Pragmatic grammars, 80-81
Precomputer setting, of government secrecy issues, 284-289
Predicates, as modifiers and interpretation problems, 42
Procedural semantics, 5
Public access
  computer and, 292-309
  to government information, 283-292
Public Interest Research Group, 305
"Public's right to know" issue, 284-289
Q
QUANT, as "collar" of higher operations, 35
  see also Quantifier passing operations
Quantified commands, in notation semantics, 23-24
Quantifier dependencies, printing of, 57-59
Quantifier passing operators, 36-37
Quantifier reversal, functional nesting of, 39-41
Quantifiers
  averages and, 44-45
  generation of, 35-36
  in interpretation problems, 44-45
  vs. negations, in interpretation problems, 38-39
  "smart," 55-56
Questions, formal semantic interpretations for, 5-6
QUOTE operator, 57
R
Relative clauses, in interpretation problems, 41-42
RELTAG function, 41
Remote program calling, 190
"Right to know" laws, 285-289
Role fillers, as modifiers, 43
R:REL rule, in interpretation problems, 41
RRULE R:REL, in subordinate interpretations, 50
RRULES
  in interpretation problems, 41
  in semantic interpretation, 33, 60-62
S
SBA, see Small Business Administration
Science subfields
  data structures vs. argument in, 103-105
  elementary fact units in, 101-102
  facts vs. meta-facts in, 102-103
  information structures in, 99-106
  properties of information and, 105-106
S:CONTAIN rule, in semantic interpretation, 24-25
Securities and Exchange Commission, 306
SEM, as node interpretation, 30, 36-37
Semantic actions, see Semantic rules
Semantic interpretation, 24-37
  complications due to quantifiers in, 26
  context-dependent interpretation in, 30-31
  data base in, 4
  generation of quantifier in, 35-36
  multiple matches in, 3-5
  NRULES in, 33
  organization in, 30, 33-35
  overall operation in, 58-60
  phased interpretation in, 31
  problems with alternate approach in, 26
  RRULES in, 33
  rules governed by verb in, 33
  rule trees in, 34
  quantifier passing operations in, 36-37
  semantic rule structure in, 27-29
  steps in, 36
  TYPEFLAG in, 34
  whole parsing in, 78-79
Semantic notation
  enumeration functions in, 21-23
  in LUNAR system, 24
  S:CONTAIN rule in, 24-25
Semantic rules
  right-hand sides of, 28-29
  structure of, 27-29
  syntax and, 29-30
Senate Subcommittee on Constitutional Rights, 304
Sentence normalization string, 116
SEQ TYPECS, defined, 11
Shift-register memory, 228
Short scope/broad scope distinctions, in interpretation problems, 45-47
Small Business Administration, 308
Smart quantifiers, in grammar information system, 55-56
Specializers, as modifiers, 43
SPIDER network, 166
String analysis, in linguistic framework, 116-117
Subfield word classes, automatic generation of, 106-110
Sublanguage
  grammatical structure of, 100
  semantic structure and, 99
Sublanguage method, discovery of information structures in, 115
Syntactic/semantic interactions, 81-84
  top-down vs. bottom-up interpretation in, 79-80
Syntactic structure, role of, 75-78
Syntax trees, 5
T
T-bar permalloy patterns, 227
TENEX software, 195
Text analysis, in natural language information formatting, 91
Texts, automatic conversion of to structured data base, 89-159
Three-loop networks, 210
Time normalization problem, in computer program, 150
Transportation Department, U.S., 299-300, 302
TRIPSIS system, 8
  pragmatic grammar of, 81
  semantic information and, 81
  semantic interpretations in, 25
  verb interpretation in, 16
Truth conditions, 5
TYPEAS, 7
TYPEFLAG element, 29, 31
  in semantic interpretation, 34
TYPEFLAG NIL, 59-61, 79
TYPEFLAG RRULES, 60
TYPEFLAG SET, 79
U
Ungrammatical sentences, understanding of, 85

V
Verbs, rules governed by, 33
Veterans Administration, 298

W
"Wh" connective relative clause, 100
"Wh" determiners, DRULE for, 49-50
"Wh" questions
  see also Interrogative pronouns
  in interpretation problems, 47-48
  other kinds of, 53-54
Women in Government, 307
Women's Training and Resource Corporation, 308
Women Today, 307
Contents of Previous Volumes

Volume 1
General-Purpose Programming for Business Applications
CALVIN C. GOTLIEB
Numerical Weather Prediction
NORMAN A. PHILLIPS
The Present Status of Automatic Translation of Languages
YEHOSHUA BAR-HILLEL
Programming Computers to Play Games
ARTHUR L. SAMUEL
Machine Recognition of Spoken Words
RICHARD FATEHCHAND
Binary Arithmetic
GEORGE W. REITWIESNER

Volume 2
A Survey of Numerical Methods for Parabolic Differential Equations
JIM DOUGLAS, JR.
Advances in Orthonormalizing Computation
PHILIP J. DAVIS AND PHILIP RABINOWITZ
Microelectronics Using Electron-Beam-Activated Machining Techniques
KENNETH R. SHOULDERS
Recent Developments in Linear Programming
SAUL I. GASS
The Theory of Automata, a Survey
ROBERT MCNAUGHTON

Volume 3
The Computation of Satellite Orbit Trajectories
SAMUEL D. CONTE
Multiprogramming
E. F. CODD
Recent Developments of Nonlinear Programming
PHILIP WOLFE
Alternating Direction Implicit Methods
GARRETT BIRKHOFF, RICHARD S. VARGA, AND DAVID YOUNG
Combined Analog-Digital Techniques in Simulation
HAROLD F. SKRAMSTAD
Information Technology and the Law
REED C. LAWLOR

Volume 4
The Formulation of Data Processing Problems for Computers
WILLIAM C. MCGEE
All-Magnetic Circuit Techniques
DAVID R. BENNION AND HEWITT D. CRANE
Computer Education
HOWARD E. TOMPKINS
Digital Fluid Logic Elements
H. H. GLAETTLI
Multiple Computer Systems
WILLIAM A. CURTIN

Volume 5
The Role of Computers in Election Night Broadcasting
JACK MOSHMAN
Some Results of Research on Automatic Programming in Eastern Europe
WLADYSLAW TURSKI
A Discussion of Artificial Intelligence and Self-Organization
GORDON PASK
Automatic Optical Design
ORESTES N. STAVROUDIS
Computing Problems and Methods in X-Ray Crystallography
CHARLES L. COULTER
Digital Computers in Nuclear Reactor Design
ELIZABETH CUTHILL
An Introduction to Procedure-Oriented Languages
HARRY D. HUSKEY

Volume 6
Information Retrieval
CLAUDE E. WALSTON
Speculations Concerning the First Ultraintelligent Machine
IRVING JOHN GOOD
Digital Training Devices
CHARLES R. WICKMAN
Number Systems and Arithmetic
HARVEY L. GARNER
Considerations on Man versus Machine for Space Probing
P. L. BARGELLINI
Data Collection and Reduction for Nuclear Particle Trace Detectors
HERBERT GELERNTER

Volume 7
Highly Parallel Information Processing Systems
JOHN C. MURTHA
Programming Language Processors
RUTH M. DAVIS
The Man-Machine Combination for Computer-Assisted Copy Editing
WAYNE A. DANIELSON
Computer-Aided Typesetting
WILLIAM R. BOZMAN
Programming Languages for Computational Linguistics
ARNOLD C. SATTERTHWAIT
Computer Driven Displays and Their Use in Man/Machine Interaction
ANDRIES VAN DAM

Volume 8
Time-shared Computer Systems
THOMAS N. PYKE, JR.
Formula Manipulation by Computer
JEAN E. SAMMET
Standards for Computers and Information Processing
T. B. STEEL, JR.
Syntactic Analysis of Natural Language
NAOMI SAGER
Programming Languages and Computers: A Unified Metatheory
R. NARASIMHAN
Incremental Computation
LIONELLO A. LOMBARDI

Volume 9
What Next in Computer Technology
W. J. POPPELBAUM
Advances in Simulation
JOHN MCLEOD
Symbol Manipulation Languages
PAUL W. ABRAHAMS
Legal Information Retrieval
AVIEZRI S. FRAENKEL
Large Scale Integration-an Appraisal
L. M. SPANDORFER
Aerospace Computers
A. S. BUCHMAN
The Distributed Processor Organization
L. J. KOCZELA

Volume 10
Humanism, Technology, and Language
CHARLES DECARLO
Three Computer Cultures: Computer Technology, Computer Mathematics, and Computer Science
PETER WEGNER
Mathematics in 1984-The Impact of Computers
BRYAN THWAITES
Computing from the Communication Point of View
E. E. DAVID, JR.
Computer-Man Communication: Using Computer Graphics in the Instructional Process
FREDERICK P. BROOKS, JR.
Computers and Publishing: Writing, Editing, and Printing
ANDRIES VAN DAM AND DAVID E. RICE
A Unified Approach to Pattern Analysis
ULF GRENANDER
Use of Computers in Biomedical Pattern Recognition
ROBERT S. LEDLEY
Numerical Methods of Stress Analysis
WILLIAM PRAGER
Spline Approximation and Computer-Aided Design
J. H. AHLBERG
Logic per Track Devices
D. L. SLOTNICK

Volume 11
Automatic Translation of Languages Since 1960: A Linguist's View
HARRY H. JOSSELSON
Classification, Relevance, and Information Retrieval
D. M. JACKSON
Approaches to the Machine Recognition of Conversational Speech
KLAUS W. OTTEN
Man-Machine Interaction Using Speech
DAVID R. HILL
Balanced Magnetic Circuits for Logic and Memory Devices
R. B. KIEBURTZ AND E. E. NEWHALL
Command and Control: Technology and Social Impact
ANTHONY DEBONS

Volume 12
Information Security in a Multi-User Computer Environment
JAMES P. ANDERSON
Managers, Deterministic Models, and Computers
G. M. FERRERO DIROCCAFERRERA
Uses of the Computer in Music Composition and Research
HARRY B. LINCOLN
File Organization Techniques
DAVID C. ROBERTS
Systems Programming Languages
R. D. BERGERON, J. D. GANNON, D. P. SHECHTER, F. W. TOMPA, AND A. VAN DAM
Parametric and Nonparametric Recognition by Computer: An Application to Leukocyte Image Processing
JUDITH M. S. PREWITT

Volume 13
Programmed Control of Asynchronous Program Interrupts
RICHARD L. WEXELBLAT
Poetry Generation and Analysis
JAMES JOYCE
Mapping and Computers
PATRICIA FULTON
Practical Natural Language Processing: The REL System as Prototype
FREDERICK B. THOMPSON AND BOZENA HENISZ THOMPSON
Artificial Intelligence-The Past Decade
B. CHANDRASEKARAN

Volume 14
On the Structure of Feasible Computations
J. HARTMANIS AND J. SIMON
A Look at Programming and Programming Systems
T. E. CHEATHAM, JR., AND JUDY A. TOWNLEY
Parsing of General Context-Free Languages
SUSAN L. GRAHAM AND MICHAEL A. HARRISON
Statistical Processors
W. J. POPPELBAUM
Information Secure Systems
DAVID K. HSIAO AND RICHARD I. BAUM

Volume 15
Approaches to Automatic Programming
ALAN W. BIERMANN
The Algorithm Selection Problem
JOHN R. RICE
Parallel Processing of Ordinary Programs
DAVID J. KUCK
The Computational Study of Language Acquisition
LARRY H. REEKER
The Wide World of Computer-Based Education
DONALD BITZER

Volume 16
3-D Computer Animation
CHARLES A. CSURI
Automatic Generation of Computer Programs
NOAH S. PRYWES
Perspectives in Clinical Computing
KEVIN C. O'KANE AND EDWARD A. HALUSKA
The Design and Development of Resource-Sharing Services in Computer Communications Networks: A Survey
SANDRA A. MAMRAK
Privacy Protection in Information Systems
REIN TURN